Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Brian Paul Kroth bpkr...@gmail.com 2012-10-11 14:06:
> Jonathan Nieder jrnie...@gmail.com 2012-10-01 01:25:
>> <snip/>
>
> Once again very sorry for the delay :(
>
> I forgot to disable the DEBUG_INFO and kept filling up my build VM's disk during compile. Then realized I had grabbed the 3.7 rc code, which these patches don't apply against. git checkout remotes/stable/linux-3.2.y (results in head c74a5e1fe4d0672936c8fb63d7484dfeaa30669c and 3.2.28), seemed to fix that.
>
> <snip/>
>
> Anyways, I just started running that on a machine, so I'll let you know if I notice anything there first before I think about pushing it to further places.
>
> Thanks,
> Brian

Got another panic using this kernel/set of patches. The dump is attached. Let me know if you need anything else.

Thanks,
Brian

Oct 12 13:43:01 kefka [14595.129262] FS-Cache: Unsupported event 2 [44/7] in state OBJECT_DEAD
Oct 12 13:43:01 kefka [14595.129317] [ cut here ]
Oct 12 13:43:01 kefka [14595.129338] kernel BUG at fs/fscache/object.c:357!
Oct 12 13:43:01 kefka [14595.129358] invalid opcode: [#1] SMP
Oct 12 13:43:01 kefka [14595.129390] CPU 1
Oct 12 13:43:01 kefka [14595.129395] Modules linked in: acpi_cpufreq mperf cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 kvm_intel kvm cachefiles binfmt_misc nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc netconsole configfs ext3 jbd coretemp ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler fuse uhci_hcd ohci_hcd tpm_infineon snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi button hp_wmi snd_rawmidi snd_seq_midi_event processor sparse_keymap rfkill snd_seq psmouse thermal_sys serio_raw joydev evdev tpm_tis tpm i2c_i801 tpm_bios i2c_core wmi snd_timer snd_seq_device snd soundcore snd_page_alloc ext4 mbcache jbd2 crc16 dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod hid_microsoft usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ahci libahci libata scsi_mod ehci_hcd usbcore e1000e usb_common [last unloaded: microcode]
Oct 12 13:43:01 kefka [14595.130083]
Oct 12 13:43:01 kefka [14595.130101] Pid: 25732, comm: kworker/u:0 Not tainted 3.2.28+ #8 Hewlett-Packard HP Compaq 8200 Elite CMT PC /1494
Oct 12 13:43:01 kefka [14595.130149] RIP: 0010:[a0411fe5] [a0411fe5] fscache_object_work_func+0x79c/0x7db [fscache]
Oct 12 13:43:01 kefka [14595.130192] RSP: 0018:88021ed15e20 EFLAGS: 00010286
Oct 12 13:43:01 kefka [14595.130217] RAX: 004f RBX: 88021f6406c0 RCX:
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Brian Kroth wrote:

> Sorry, the labs went into their dormant period and all of my test cases ran away for the rest of the summer (the find cmd didn't seem to trigger the __fscache problem), so I hadn't moved any further on this. Now that they're back, I'm definitely seeing it again (about 20 different machines in two days last week), so I've started the process of hunting down a trigger cause again. I'll let you know if I find something.

Thanks for the update. The human test cases can work fine for vetting a fix.

I'd also be interested to hear whether the series I sent was completely borked, so I'd recommend trying on a test machine for a day or two before putting such a patched kernel into production, though.

Jonathan

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20121001082554.GA7957@elie.Belkin
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Jonathan Nieder jrnie...@gmail.com 2012-09-29 15:22:
> Hi again,
>
> In July, 2012, Brian Kroth wrote:
>> Jonathan Nieder jrnie...@gmail.com 2012-07-21 12:04:
>>> Please test the attached patches, for example following the instructions below:
>> [...]
>> Anyways, I'll wait on the results of my previous test first to see if we have a reliable test case from it before moving forward. At this point the grep -r abc ... test is just hitting the cache over and over again, so it's not showing a whole lot.
>>
>> One other thing I'd tried before was something like this, run a couple of times in a row (hmm, I suppose I could try them in parallel too):
>>
>>   find /fsc_mounted_nfs -type f -exec cat {} > /dev/null \;
>>
>> A couple of them panicked, but with inconsistent messages, so I had left them out for now. Perhaps that's worth another shot ...
>
> So, how did it go? Did some test case prove reliable? Any other new observations?

Sorry, the labs went into their dormant period and all of my test cases ran away for the rest of the summer (the find cmd didn't seem to trigger the __fscache problem), so I hadn't moved any further on this. Now that they're back, I'm definitely seeing it again (about 20 different machines in two days last week), so I've started the process of hunting down a trigger cause again. I'll let you know if I find something.

Thanks,
Brian
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Hi again,

In July, 2012, Brian Kroth wrote:
> Jonathan Nieder jrnie...@gmail.com 2012-07-21 12:04:
>> Please test the attached patches, for example following the instructions below:
> [...]
> Anyways, I'll wait on the results of my previous test first to see if we have a reliable test case from it before moving forward. At this point the grep -r abc ... test is just hitting the cache over and over again, so it's not showing a whole lot.
>
> One other thing I'd tried before was something like this, run a couple of times in a row (hmm, I suppose I could try them in parallel too):
>
>   find /fsc_mounted_nfs -type f -exec cat {} > /dev/null \;
>
> A couple of them panicked, but with inconsistent messages, so I had left them out for now. Perhaps that's worth another shot ...

So, how did it go? Did some test case prove reliable? Any other new observations?

Thanks,
Jonathan

Archive: http://lists.debian.org/2012092902.GA12884@elie.Belkin
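
For anyone wanting to retry the "read every file" test quoted in this thread with several passes running in parallel, a minimal sketch follows. The function name read_tree and its two arguments are hypothetical and not from the thread; it simply wraps the quoted find command, and per Brian's later report that command alone did not seem to trigger the __fscache problem, so treat it as a load generator rather than a known reproducer.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read every regular file
# under a directory, discarding the data, with several passes at once.
read_tree() {
    dir=$1      # directory to walk, e.g. the fsc-mounted NFS path
    passes=$2   # number of concurrent passes to start
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Same command as quoted in the thread, run in the background.
        find "$dir" -type f -exec cat {} \; > /dev/null &
        i=$((i + 1))
    done
    wait        # block until every background pass has finished
    echo "completed $passes passes over $dir"
}
```

For example, `read_tree /fsc_mounted_nfs 4` would start four passes over the mount point mentioned in the thread and return once all of them finish.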