Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2013-03-27 Thread Stefan Behrens
On Tue, 3 Jul 2012 02:01:21 +0300, Sami Liedes wrote:
 Hi,
 
 I just got this oops on a computer running 3.4.2.
 
 A few minutes before I had started btrfs device scrub / and had a
 watcher process running btrfs scrub status / every 5 seconds. After
 a few gigabytes of scrubbing, I got this crash.
 
 The oops is transcribed from photos, so it may contain some errors. I
 tried to be careful, and double checked the backtrace.
 
   Sami
 
 
 general protection fault:  [#1] SMP
 CPU 4
 Modules linked in: tcp_diag inet_diag nfnetlink_log nfnetlink ufs qnx4 
 hfsplus hfs minix ntfs vfat msdos fat jfs reiserfs ext3 jbd ext2 ip6_tables 
 ebtable_nat ebtables cn rfcomm bnep
 parport_pc ppdev lp parport tun cpufreq_userspace cpufreq_stats 
 cpufreq_powersave cpufreq_conservative binfmt_misc fuse nfsd nfs nfs_acl 
 auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE
 ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
 ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm 
 i2c_algo_bit loop kvm_intel kvm snd_hda_codec_hdmi
 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio 
 snd_usbmidi_lib snd_hwdep snd_pcm_oss snd_mixer_oss joydev snd_pcm 
 acpi_cpufreq snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmi
 di ath3k snd_seq snd_seq_device snd_timer iTCO_wdt bluetooth eeepci_wmi 
 asus_wmi sparse_keymap crc16 rfkill pcspkr psmouse coretemp serio_raw evdev 
 mperf pci_hotplug i2c_i801 i2c_core processor button
 intel_agp snd mxm_wmi video wmi intel_gtt microcode soundcore sha256_generic 
 dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor 
 async_tx raid6_pq md_mod nbd btrfs libcrc32c
 zlib_deflate sd_mod crc_t10dif crc32c_intel ghash_cmulni_intel firewire_ohci 
 r8196 firewire_core ahci aesni_intel libahci mii crc_itu_t aes_x86_64 libata 
 aes_generic cryptd scsi_mod e1000e thermal fa
 n thermal_sys [last unloaded: scsi_wait_scan]
 
 Pid: 30863, comm: btrfs-endio-met Tainted: GW3.4.2 #1 System 
 manufacturer System Product Name/P8P67 EVO
 RIP: 0010:[811e83bd]  [811e83bd] memcpy+0xd/0x110
 RSP: :88003174dba8  EFLAGS: 00010202
 RAX: 88003174dc8f RBX: 0011 RCX: 0002
 RDX: 0001 RSI: 00050803 RDI: 88003174dc8f
 RBP: 88003174dbf0 R08: 000a R09: 
 R10:  R11:  R12: 88003174dca0
 R13: 8800659f42b0 R14: 0048 R15: 0011
 FS:  () GS:88021ed0() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 0973c000 CR3: 000167ef3000 CR4: 000407e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process btrfs-endio-met (pid: 30863, threadinfo 88003174c000, task 
 88006f818000)
 Stack:
  a026bd6b 8801960f5000 8003 1000
  88003174dc58 03dd 88000ac13c60 88003174dc58
  696f70203a61685f 88003174dc00 a026904d 88003174dcd0
 Call Trace:
  [a026bd6b] ? read_extent_buffer+0xbb/0x110 [btrfs]
  [a026304d] btrfs_node_key+0x1d/0x20 [btrfs]
  [a02994e0] __readahead_hook.isra.5+0x3c0/0x420 [btrfs]
  [a029986f] btree_readahead_hook+0x1f/0x40 [btrfs]
  [a023f841] btree_readpage_end_io_hook+0x111/0x260 [btrfs]
  [a0267452] ? find_first_extent_bit_state+0x22/0x80 [btrfs]
  [a026809b] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
  [a023ee61] ? end_workqueue_fn+0x31/0x50 [btrfs]
  [81158958] bio_endio+0x18/0x30
  [a023ee6c] end_workqueue_fn+0x3c/0x50 [btrfs]
  [a0275857] worker_loop+0x157/0x560 [btrfs]
  [a0275700] ? btrfs_queue_worker+0x310/0x310 [btrfs]
  [81058e5e] kthread+0x8e/0xa0
  [81418fe4] kernel_thread_helper+0x4/0x10
  [81058dd0] ? flush_kthread_worker+0x70/0x70
  [81418fe0] ? gs_change+0x13/0x13
 Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 
 90 90 90 90...
 8 4c 8b 56 10 4c
 RIP  [811e83bd] memcpy+0xd/0x110
  RSP 88003174dba8
 


Tonight my box had the same on a system running 3.7.10. Also with a Btrfs scrub 
running.

general protection fault:  [#1] SMP
Modules linked in: xt_LOG xt_limit xt_multiport iptable_nat nf_nat_ipv4 nf_nat 
ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter 
ip6_tables iptable_mangle ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 
xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth 
sp5100_tco edac_core edac_mce_amd k8temp i2c_piix4 kvm_amd kvm e1000e serio_raw 
mac_hid shpchp lp parport btrfs zlib_deflate libcrc32c raid10 raid456 async_pq 
async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx 

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-28 Thread Sami Liedes
On Tue, Jul 17, 2012 at 12:29:33AM +0300, Sami Liedes wrote:
 So, currently my idea is to boot the machine with a live USB stick,
 install kvm and make qemu qcow images backed by the real (2*1.1T)
 devices, but writing changes to the qcow images (I dare not mess with
 the actual devices, and don't happen to have quite 2.2T extra space
 outside of them...), and try to run scrub there. If that succeeds and
 the bug happens there too, debugging *should* be easier, and it
 *should* be possible to run it under KMEMCHECK too. If the bug doesn't
 happen inside a virtual machine, that would be interesting information
 too.

I have now been able to reproduce the bug in KVM with the setup
described above.

I think it's safe to say now that the bug depends on some kind of
interaction between btrfs and dm-crypt. With the following setup, the
bug does NOT happen:

* kvm, single cpu

* sees 3 disks, /dev/vda=root, /dev/vdb=btrfs-dev1, /dev/vdc=btrfs-dev2

* The btrfs devices are essentially snapshots of the real btrfs
  devices in raid-1 configuration (2*1.1T). As the real devices are
  encrypted, the decryption is done outside the KVM, i.e. the KVM
  snapshots are backed by the decrypted devices.

With the following setup, the bug DOES happen:

* kvm, single cpu

* sees 3 disks, /dev/vda=root, /dev/vdb=part1, /dev/vdc=part2, where
  part[12] is are LUKS containers containing the individual btrfs
  devices

* inside kvm, they are opened using

cryptsetup luksOpen /dev/vdb root1
cryptsetup luksOpen /dev/vdc root2

* after this, the filesystem is mounted with

mount /dev/mapper/root1 /media -o 
device=/dev/mapper/root1,device=/dev/mapper/root2

* The devices are snapshots of the actual physical encrypted
  partitions containing the btrfs devices.

I have not yet figured out if this can be reproduced using a pristine,
smaller btrfs filesystem in raid-1 configuration inside KVM or if
there's something about my specific filesystem that causes this. I can
investigate that too; it's easier to do for me than the above testing,
as I don't need to have continuous physical access to the computer to
do that.

Here's the .config of the kernel I used inside KVM to reproduce this:

  http://www.niksula.hut.fi/~sliedes/btrfs/config.3.4.4

I also ran the same tests with KMEMCHECK. Both with and without
crypto, there were quite a number of (of course possibly false)
warnings from btrfs code. I doubt any of them are related to this bug
- there were no KMEMCHECK warnings during the scrub operation. Here
are the logs, anyway:

  http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.nocrypto.gz
  http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.crypto.gz

Sami


signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-16 Thread Arne Jansen
Any news on this? I you give me some hints, I can try to reproduce
it here.

-Arne

On 10.07.2012 08:57, Arne Jansen wrote:
 On 10.07.2012 06:16, Sami Liedes wrote:
 On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote:
 * Just before the crash:
   btrfs: invalid parameters for read_extent_buffer: start (32771)  
 eb-len (32768). eb start is 2261163409408, level 100, generation 
 4412718571037421157, nritems 538968254. len param 17. debug 
 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0


 At a first glance: the generation converted to ascii is: ent() ==,
 so someone is patching the memory with ascii text, possibly C source.
 It might be interesting to dump the full contents of the eb, to get
 a clue on the source of the data.

 I changed the code to dump the contents of the eb struct at the point
 where that error (btrfs: invalid parameters...) is printed, at the
 checksum mismatch 4 site and at the node seems invalid now site.
 Now I have a big log of 1795 corrupted ebs. So far nothing that looks
 remotely like ascii text, though. But I have two different versions of
 the eb that caused that warning, a less corrupted one and a more
 corrupted one:

 
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
 ...f
 btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
 ...0...
 btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff  
 X..X..
 btrfs: 8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff  
 p..
 btrfs: 8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
 p..
 btrfs: 8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff  
 ._..
 btrfs: 8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff  
 @L...f..
 btrfs: 8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff  
 @...
 btrfs: 8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff  
 @...
 btrfs: 8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
 ...
 btrfs: 8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc658: 00 00 00 00 00 00 00 00  
 
 
 this one looks good so far.
 
 btrfs: --- end eb contents at 8801b13cc4c8 ---
 btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 
 3579FB59 level 160
 btrfs: node seems invalid now. checksum ok = 1
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 [... identical dump to above ...]
 btrfs: --- end eb contents at 8801b13cc4c8 ---
 btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
 (32768). eb start is 2239404212224, level 160, generation 
 4716553384049587249, nritems 295705211. len param 17. debug 
 2/989/295705211/4716553384049587249/0x0/0/0x0/0x0
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
 ...f
 btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
 ...0...
 btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc528: 00 00 00 00 

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-16 Thread Sami Liedes
On Mon, Jul 16, 2012 at 10:20:28AM +0200, Arne Jansen wrote:
 Any news on this? I you give me some hints, I can try to reproduce
 it here.

I've been planning for a few days now to try if it's reproducible in a
virtual machine with the same filesystem images. However I haven't
gotten around to doing this yet... I tried to get KMEMCHECK working on
my computer, but failed (and generated a bug report). It seems to work
in KVM though.

So, currently my idea is to boot the machine with a live USB stick,
install kvm and make qemu qcow images backed by the real (2*1.1T)
devices, but writing changes to the qcow images (I dare not mess with
the actual devices, and don't happen to have quite 2.2T extra space
outside of them...), and try to run scrub there. If that succeeds and
the bug happens there too, debugging *should* be easier, and it
*should* be possible to run it under KMEMCHECK too. If the bug doesn't
happen inside a virtual machine, that would be interesting information
too.

Another idea might be to use LVM snapshots of the actual devices, but
recent messages have made be wary of that approach - plus it's
somewhat of a pain, because the snapshotting would have to be done
before mounting anyway, since I have two devices and I doubt LVM
supports snapshotting two separate devices at the exact same moment.

However, I'm a bit ill at the moment, so I assume it might be at least
a few days until I get to actually implementing this...

Sami


signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-10 Thread Arne Jansen
Thanks for doing this, I'll start looking into it right away.
One thing though: it's probably not the contents of struct eb
that's being corrupted, but the data the eb points to. See for
example read_extent_buffer to see how to access it. Sorry for
being unclear on this.

Thanks,
Arne

On 10.07.2012 06:16, Sami Liedes wrote:
 On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote:
 * Just before the crash:
   btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
 (32768). eb start is 2261163409408, level 100, generation 
 4412718571037421157, nritems 538968254. len param 17. debug 
 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0


 At a first glance: the generation converted to ascii is: ent() ==,
 so someone is patching the memory with ascii text, possibly C source.
 It might be interesting to dump the full contents of the eb, to get
 a clue on the source of the data.
 
 I changed the code to dump the contents of the eb struct at the point
 where that error (btrfs: invalid parameters...) is printed, at the
 checksum mismatch 4 site and at the node seems invalid now site.
 Now I have a big log of 1795 corrupted ebs. So far nothing that looks
 remotely like ascii text, though. But I have two different versions of
 the eb that caused that warning, a less corrupted one and a more
 corrupted one:
 
 
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
 ...f
 btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
 ...0...
 btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff  
 X..X..
 btrfs: 8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff  
 p..
 btrfs: 8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
 p..
 btrfs: 8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff  
 ._..
 btrfs: 8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff  
 @L...f..
 btrfs: 8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff  
 @...
 btrfs: 8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff  
 @...
 btrfs: 8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
 ...
 btrfs: 8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc658: 00 00 00 00 00 00 00 00  
 
 btrfs: --- end eb contents at 8801b13cc4c8 ---
 btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 
 3579FB59 level 160
 btrfs: node seems invalid now. checksum ok = 1
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 [... identical dump to above ...]
 btrfs: --- end eb contents at 8801b13cc4c8 ---
 btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
 (32768). eb start is 2239404212224, level 160, generation 
 4716553384049587249, nritems 295705211. len param 17. debug 
 2/989/295705211/4716553384049587249/0x0/0/0x0/0x0
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
 ...f
 btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
 ...0...
 btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 

Re: *** GMX Spamverdacht *** Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-10 Thread Arne Jansen
On 10.07.2012 06:16, Sami Liedes wrote:
 On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote:
 * Just before the crash:
   btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
 (32768). eb start is 2261163409408, level 100, generation 
 4412718571037421157, nritems 538968254. len param 17. debug 
 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0


 At a first glance: the generation converted to ascii is: ent() ==,
 so someone is patching the memory with ascii text, possibly C source.
 It might be interesting to dump the full contents of the eb, to get
 a clue on the source of the data.
 
 I changed the code to dump the contents of the eb struct at the point
 where that error (btrfs: invalid parameters...) is printed, at the
 checksum mismatch 4 site and at the node seems invalid now site.
 Now I have a big log of 1795 corrupted ebs. So far nothing that looks
 remotely like ascii text, though. But I have two different versions of
 the eb that caused that warning, a less corrupted one and a more
 corrupted one:
 
 
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
 ...f
 btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
 ...0...
 btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff  
 X..X..
 btrfs: 8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff  
 p..
 btrfs: 8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
 p..
 btrfs: 8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff  
 ._..
 btrfs: 8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff  
 @L...f..
 btrfs: 8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff  
 @...
 btrfs: 8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff  
 @...
 btrfs: 8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
 ...
 btrfs: 8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc658: 00 00 00 00 00 00 00 00  
 

this one looks good so far.

 btrfs: --- end eb contents at 8801b13cc4c8 ---
 btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 
 3579FB59 level 160
 btrfs: node seems invalid now. checksum ok = 1
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 [... identical dump to above ...]
 btrfs: --- end eb contents at 8801b13cc4c8 ---
 btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
 (32768). eb start is 2239404212224, level 160, generation 
 4716553384049587249, nritems 295705211. len param 17. debug 
 2/989/295705211/4716553384049587249/0x0/0/0x0/0x0
 btrfs: --- start eb contents at 8801b13cc4c8 ---
 btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
 ...f
 btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
 ...0...
 btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 
 btrfs: 8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

Re: *** GMX Spamverdacht *** Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-09 Thread Arne Jansen
On 07.07.2012 01:44, Sami Liedes wrote:
 [Retry: I think this mail didn't make it to the list, probably because
 of the 73 kilobyte attached log. Here's a URL to the file:]
 
http://www.niksula.hut.fi/~sliedes/btrfs-scrub-debug.log.gz
 
   Sami
 
 
 
 On Fri, Jul 06, 2012 at 05:09:00PM +0200, Jan Schmidt wrote:
 Oh I see. root-node can be NULL during mount. Please add this on top:
 
 Ok. So, ran it with DEBUG_PAGEALLOC and slub debugging on. This time
 it took half an hour to crash, and there's _lots_ of checksum mismatch
 [234] messages even before the crash. gzipped dmesg attached.
 
 At 781 seconds there's an irq 17: nobody cared. That's a known bug
 with this (and other Asus) motherboards and happens every now and
 then. I doubt it has anything to do with this.
 
 I think I might try running it overnight with KMEMCHECK to see if it
 reports something. But for now, what there's in the log:
 
 * lots of checksum mismatch [234], no 1s
 
 * a fair number of csum_tree_block: [0-9]+ callbacks suppressed
   lines
 
 * two btrfs: node seems invalid now. checksum ok = 1 messages, one
   at 1499 seconds and another just before the crash at 1973
 
 * Just before the crash:
   btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
 (32768). eb start is 2261163409408, level 100, generation 
 4412718571037421157, nritems 538968254. len param 17. debug 
 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0
 

At a first glance: the generation converted to ascii is: ent() ==,
so someone is patching the memory with ascii text, possibly C source.
It might be interesting to dump the full contents of the eb, to get
a clue on the source of the data.


 * the oopses
 
 By the way, something seems to be untabifying your patches. I don't
 know if it's on my side or yours, but at least some other patches I
 receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
 mostly makes them apply cleanly for me.

 Oh, I'm sorry. Should have been on my side. I hope it's better with the 
 current
 diff?
 
 Yes. No problem :)
 
 [See attachment for dmesg log.]
 
   Sami
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-09 Thread Sami Liedes
On Mon, Jul 09, 2012 at 11:05:47AM +0200, Arne Jansen wrote:
  * Just before the crash:
btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
  (32768). eb start is 2261163409408, level 100, generation 
  4412718571037421157, nritems 538968254. len param 17. debug 
  2/989/538968254/4412718571037421157/0x0/0/0x0/0x0
  
 
 At a first glance: the generation converted to ascii is: ent() ==,
 so someone is patching the memory with ascii text, possibly C source.
 It might be interesting to dump the full contents of the eb, to get
 a clue on the source of the data.

I changed the code to dump the contents of the eb struct at the point
where that error (btrfs: invalid parameters...) is printed, at the
checksum mismatch 4 site and at the node seems invalid now site.
Now I have a big log of 1795 corrupted ebs. So far nothing that looks
remotely like ascii text, though. But I have two different versions of
the eb that caused that warning, a less corrupted one and a more
corrupted one:


btrfs: --- start eb contents at 8801b13cc4c8 ---
btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
...f
btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
...0...
btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  

btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc558: 58 c5 3c b1 01 88 ff ff 58 c5 3c b1 01 88 ff ff  
X..X..
btrfs: 8801b13cc568: 00 00 00 00 00 00 00 00 70 c5 3c b1 01 88 ff ff  
p..
btrfs: 8801b13cc578: 70 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
p..
btrfs: 8801b13cc588: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc598: 80 5f 9a 06 00 ea ff ff 00 86 9b 06 00 ea ff ff  
._..
btrfs: 8801b13cc5a8: 40 4c 9a 06 00 ea ff ff 80 66 9a 06 00 ea ff ff  
@L...f..
btrfs: 8801b13cc5b8: 80 eb 9b 06 00 ea ff ff 40 05 a2 06 00 ea ff ff  
@...
btrfs: 8801b13cc5c8: 40 e1 9b 06 00 ea ff ff 80 c4 9c 06 00 ea ff ff  
@...
btrfs: 8801b13cc5d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc5e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc5f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc608: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc618: 98 c5 3c b1 01 88 ff ff 00 00 00 00 00 00 00 00  
...
btrfs: 8801b13cc628: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc638: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc648: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc658: 00 00 00 00 00 00 00 00  

btrfs: --- end eb contents at 8801b13cc4c8 ---
btrfs: dm-6 checksum verify failed on 2239404212224 wanted B5F632BC found 
3579FB59 level 160
btrfs: node seems invalid now. checksum ok = 1
btrfs: --- start eb contents at 8801b13cc4c8 ---
[... identical dump to above ...]
btrfs: --- end eb contents at 8801b13cc4c8 ---
btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
(32768). eb start is 2239404212224, level 160, generation 4716553384049587249, 
nritems 295705211. len param 17. debug 
2/989/295705211/4716553384049587249/0x0/0/0x0/0x0
btrfs: --- start eb contents at 8801b13cc4c8 ---
btrfs: 8801b13cc4c8: 00 80 e4 66 09 02 00 00 00 80 00 00 00 00 00 00  
...f
btrfs: 8801b13cc4d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc4e8: 20 00 00 00 00 00 00 00 30 00 d7 10 02 88 ff ff   
...0...
btrfs: 8801b13cc4f8: 02 02 00 00 03 00 00 00 06 00 00 00 00 00 00 00  

btrfs: 8801b13cc508: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc518: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc528: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc538: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc548: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00  

btrfs: 8801b13cc558: 58 c5 3c b1 

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Chris Mason
On Fri, Jul 06, 2012 at 04:42:50AM -0600, Jan Schmidt wrote:
 ... down here in the stack. The warning is printed from two levels above,
 __readahead_hook.
 
 Either I'm absolutely blind and there's code along the (rather short) road
 between those two that might do this I haven't seen. Or someone else messes
 with our extent buffers or the underlying pages. What really confuses me is
 that it happens so reproducibly.
 
 I've no good idea at the moment how to go on. It might help to get a feeling 
 if
 it's shifting around at least a little bit or really constant in the timing of
 occurrence. So can you please apply the next patch on top of the other two and
 give it some more failure tries? The checksum mismatch [1234] line will be 
 of
 most interest. I'm also curious what the additional debug variables will say 
 in
 the extended version of the very first printk. You can leave out the stack
 traces if you like, they won't matter much anyway.

I would suggest turning on slab debug and CONFIG_DEBUG_PAGEALLOC.
Something really strange is happening here.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Sami Liedes
On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
 I've no good idea at the moment how to go on. It might help to get a feeling 
 if
 it's shifting around at least a little bit or really constant in the timing of
 occurrence. So can you please apply the next patch on top of the other two and
 give it some more failure tries? The checksum mismatch [1234] line will be 
 of
 most interest. I'm also curious what the additional debug variables will say 
 in
 the extended version of the very first printk. You can leave out the stack
 traces if you like, they won't matter much anyway.

Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
suggested by Chris Mason.

With those and the latest patch, there's an oops already at boot. I
don't have netconsole yet at that point, but here's the important
parts (sure I can capture it fully if you need).

By the way, something seems to be untabifying your patches. I don't
know if it's on my side or yours, but at least some other patches I
receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
mostly makes them apply cleanly for me.

Sami



btrfs: disk space caching is enabled
BUG: unable to handle kernel NULL pointer dereference at 0150
IP: [a0223568] check_node+0x138/0x250 [btrfs]
PGD 0
Oops:  [#1] SMP DEBUG_PAGEALLOC
CPU 6
Modules linked in: omitted [last unloaded: scsi_wait_scan]

Pid: 1176, comm: btrfs-endio-met Tainted: GW 3.4.4+btrfsdebug2 #2 
System Product Name/P8P67 EVO
RIP: 0010:[a0223568]  [a0223568] check_node+0x138/0x250 
[btrfs
[...]
Process btrfs-endio-met (pid: 1176, [...])
Call trace:
 [...] btree_readpage_end_io_hook+0x1e5/0x2d0 [btrfs]
 [...] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
 [...] ? end_workqueue_fn+0x31/0x50 [btrfs]
 [...] bio_endio+0x18/0x30
 [...] end_workqueue_fn+0x3c/0x50 [btrfs]
 [...] worker_loop+0x157/0x560 [btrfs]
 [...] ? btrfs_queue_worker+0x310/0x310 [btrfs]
 [...] kthead+0x8e/0xa0
 [...] kernel_thread_helper+0x4/0x10
 [...] ? flush_kthread_worker+0x70/0x70
 [...] ? gs_change+0x13/0x13
Code: [...]
RIP  [a0223568] check_node+0x138/0x250 [btrfs]
 RSP 8801f3843cb0



signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Chris Mason
On Fri, Jul 06, 2012 at 08:33:51AM -0600, Sami Liedes wrote:
 On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
  I've no good idea at the moment how to go on. It might help to get a 
  feeling if
  it's shifting around at least a little bit or really constant in the timing 
  of
  occurrence. So can you please apply the next patch on top of the other two 
  and
  give it some more failure tries? The checksum mismatch [1234] line will 
  be of
  most interest. I'm also curious what the additional debug variables will 
  say in
  the extended version of the very first printk. You can leave out the stack
  traces if you like, they won't matter much anyway.
 
 Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
 suggested by Chris Mason.
 
 With those and the latest patch, there's an oops already at boot. I
 don't have netconsole yet at that point, but here's the important
 parts (sure I can capture it fully if you need).
 
 By the way, something seems to be untabifying your patches. I don't
 know if it's on my side or yours, but at least some other patches I
 receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
 mostly makes them apply cleanly for me.
 
   Sami
 
 
 
 btrfs: disk space caching is enabled
 BUG: unable to handle kernel NULL pointer dereference at 0150
 IP: [a0223568] check_node+0x138/0x250 [btrfs]

This isn't from any of the new debugging.  Can you please try it on an
unpatched kernel?

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Jan Schmidt
On Fri, July 06, 2012 at 16:40 (+0200), Chris Mason wrote:
 On Fri, Jul 06, 2012 at 08:33:51AM -0600, Sami Liedes wrote:
 On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
 I've no good idea at the moment how to go on. It might help to get a 
 feeling if
 it's shifting around at least a little bit or really constant in the timing 
 of
 occurrence. So can you please apply the next patch on top of the other two 
 and
 give it some more failure tries? The checksum mismatch [1234] line will 
 be of
 most interest. I'm also curious what the additional debug variables will 
 say in
 the extended version of the very first printk. You can leave out the stack
 traces if you like, they won't matter much anyway.

 Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
 suggested by Chris Mason.

 With those and the latest patch, there's an oops already at boot. I
 don't have netconsole yet at that point, but here's the important
 parts (sure I can capture it fully if you need).

 By the way, something seems to be untabifying your patches. I don't
 know if it's on my side or yours, but at least some other patches I
 receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
 mostly makes them apply cleanly for me.

  Sami


 
 btrfs: disk space caching is enabled
 BUG: unable to handle kernel NULL pointer dereference at 0150
 IP: [a0223568] check_node+0x138/0x250 [btrfs]
 
 This isn't from any of the new debugging.  Can you please try it on an
 unpatched kernel?

You're confusing that with check_leaf. I added check_node along the way, see my
mail from Thu, July 05, 2012 at 15:41 (+0200). I'd really like to add something
similar for the 3.6 series.

Checking for the null pointer dereference.

-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Jan Schmidt
On Fri, July 06, 2012 at 16:33 (+0200), Sami Liedes wrote:
 On Fri, Jul 06, 2012 at 12:42:50PM +0200, Jan Schmidt wrote:
 I've no good idea at the moment how to go on. It might help to get a feeling 
 if
 it's shifting around at least a little bit or really constant in the timing 
 of
 occurrence. So can you please apply the next patch on top of the other two 
 and
 give it some more failure tries? The checksum mismatch [1234] line will be 
 of
 most interest. I'm also curious what the additional debug variables will say 
 in
 the extended version of the very first printk. You can leave out the stack
 traces if you like, they won't matter much anyway.
 
 Ok. Also turned on CONFIG_DEBUG_PAGEALLOC and CONFIG_SLUB_DEBUG_ON as
 suggested by Chris Mason.
 
 With those and the latest patch, there's an oops already at boot. I
 don't have netconsole yet at that point, but here's the important
 parts (sure I can capture it fully if you need).

Oh I see. root-node can be NULL during mount. Please add this on top:

--
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index df0b347..22838a3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -578,7 +578,8 @@ static noinline int check_node(struct btrfs_root *root,
}
node-debug[5] = node-start;
node-debug[6] = btrfs_header_level(node);
-   node-debug[6] |= btrfs_header_level(root-node)  16;
+   if (root-node)
+   node-debug[6] |= btrfs_header_level(root-node)  16;
node-debug[7] = 0xb22f50b22f5;

return 0;
--


 By the way, something seems to be untabifying your patches. I don't
 know if it's on my side or yours, but at least some other patches I
 receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
 mostly makes them apply cleanly for me.

Oh, I'm sorry. Should have been on my side. I hope it's better with the current
diff?

-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Sami Liedes
On Fri, Jul 06, 2012 at 10:59:24PM +0300, Sami Liedes wrote:
 I think I might try running it overnight with KMEMCHECK to see if it
 reports something. But for now, what there's in the log:

My KMEMCHECK kernel didn't even boot (due to some weird KMEMCHECK/ACPI
interaction), so I won't pursue this idea further at the moment...

 * lots of checksum mismatch [234], no 1s

One thing to notice from the logs, too, is that the device seems to
always be dm-6, the second device of the filesystem. This never seems
to happen to dm-5. There are 1583 lines of btrfs: dm-6 checksum
verify failed.

Sami


signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-06 Thread Sami Liedes
[Retry: I think this mail didn't make it to the list, probably because
of the 73 kilobyte attached log. Here's a URL to the file:]

   http://www.niksula.hut.fi/~sliedes/btrfs-scrub-debug.log.gz

Sami



On Fri, Jul 06, 2012 at 05:09:00PM +0200, Jan Schmidt wrote:
 Oh I see. root-node can be NULL during mount. Please add this on top:

Ok. So, ran it with DEBUG_PAGEALLOC and slub debugging on. This time
it took half an hour to crash, and there's _lots_ of checksum mismatch
[234] messages even before the crash. gzipped dmesg attached.

At 781 seconds there's an irq 17: nobody cared. That's a known bug
with this (and other Asus) motherboards and happens every now and
then. I doubt it has anything to do with this.

I think I might try running it overnight with KMEMCHECK to see if it
reports something. But for now, what there's in the log:

* lots of checksum mismatch [234], no 1s

* a fair number of csum_tree_block: [0-9]+ callbacks suppressed
  lines

* two btrfs: node seems invalid now. checksum ok = 1 messages, one
  at 1499 seconds and another just before the crash at 1973

* Just before the crash:
  btrfs: invalid parameters for read_extent_buffer: start (32771)  eb-len 
(32768). eb start is 2261163409408, level 100, generation 4412718571037421157, 
nritems 538968254. len param 17. debug 
2/989/538968254/4412718571037421157/0x0/0/0x0/0x0

* the oopses

  By the way, something seems to be untabifying your patches. I don't
  know if it's on my side or yours, but at least some other patches I
  receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
  mostly makes them apply cleanly for me.
 
 Oh, I'm sorry. Should have been on my side. I hope it's better with the 
 current
 diff?

Yes. No problem :)

[See attachment for dmesg log.]

Sami
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-05 Thread Jan Schmidt


On 04.07.2012 22:24, Sami Liedes wrote:
 On Wed, Jul 04, 2012 at 06:38:00PM +0200, Jan Schmidt wrote:
 [  200.980496] btrfs: invalid parameters for read_extent_buffer: start 
 (32771)  eb-len (32768). eb start is 2243489562624, level 26, generation 
 3144240307695375391, nritems 620178657. len param 17. debug 
 2/989/620178657/3144240307695375391
 
 Let's call this try 1. I ran it three more times, so below we have
 tries 2, 3 and 4.
 
 Wow, that's strange. Can you repeat your test once or twice and paste that 
 line,
 please? I'd like to get a feeling if the values are completely random.
 
 Curiously, it clearly takes longer for it to crash after starting the
 scrub each time I run it. Also on try 4 I got an entirely different
 crash (backtrace below). Now it scrubs maybe the first 200G or so of
 both devices of the (raid-1) 2.2T filesystem before it crashes.

Can you double check that there's nothing about corrected errors in your
logs? Scrub will correct errors along the way and log that. So maybe
we've only a few tries left to find the root cause.

 start and eb-len seem to be the same (32771 and 32768) every time.

That's the tree block size in your setup.

 eb start varies, but there's some pattern if you view them in hex:
 
   Try 1  20a5a66
   Try 2  20bb0018000
   Try 3  20a8bc28000
   Try 4  (no output, different crash)

I fail to identify a real pattern there.

 The rest of the values seem to me to be completely different every time.

Which is itself interesting.

 Try 4:

That one is not that much different. We read some garbage from a tree
block and started the next read ahead cycle for the alledged children.
That way we came to a logical address that's out of bounds, instead of a
logical address with even more garbage.

I'd like to see if you corrupted your trees on disk in a really strange
manner (with matching checksums?), or if data comes from the disk intact
and becomes damaged thereafter.

Could you store the output of
btrfs-debug-tree /dev/[whatever]
before try number 5 and afterwards? It will be quite a lot if you've got
a lot of files in there. Don't send it anywhere right now, just store it
away if possible.

What I'd like to get in the next reply is the output of the attached
patch, a single pass should do this time.

NB: As we've already check_leaf doing exta leaf checks after reading
them, we should probably add something like check_node as a general
manner to make btrfs more robust.

Thank you,
-Jan

---
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7ffc88..34122c2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -316,6 +316,11 @@ static int csum_tree_block(struct btrfs_root *root, struct 
extent_buffer *buf,
return 0;
 }

+int btrfs_csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf)
+{
+   return csum_tree_block(root, buf, 1);
+}
+
 /*
  * we can't consider a given block up to date unless the transid of the
  * block matches the transid in the parent node's pointer.  This is how we
@@ -471,6 +476,12 @@ static int check_tree_block_fsid(struct btrfs_root *root,
   (unsigned long long)btrfs_header_bytenr(eb), \
   (unsigned long long)root-objectid, slot)

+#define CORRUPT_NODE(root, node, reason, ...)  \
+   printk(KERN_CRIT btrfs: corrupt node block=%llu,  \
+  root=%llu:  reason,\
+  (unsigned long long)btrfs_header_bytenr(node),   \
+  (unsigned long long)root-objectid, ##__VA_ARGS__)
+
 static noinline int check_leaf(struct btrfs_root *root,
   struct extent_buffer *leaf)
 {
@@ -532,6 +543,42 @@ static noinline int check_leaf(struct btrfs_root *root,
return 0;
 }

+static noinline int check_node(struct btrfs_root *root,
+  struct extent_buffer *node)
+{
+   int i;
+   u32 nritems = btrfs_header_nritems(node);
+   u64 generation;
+
+   if (nritems == 0)
+   return 0;
+
+   if (nritems  BTRFS_NODEPTRS_PER_BLOCK(root)) {
+   CORRUPT_NODE(root, node, nritems (%lu) too large (%lu)\n,
+(unsigned long)nritems,
+BTRFS_NODEPTRS_PER_BLOCK(root));
+   return -EIO;
+   }
+
+   if (node-len  root-nodesize) {
+   CORRUPT_NODE(root, node, length (%lu) too large (%lu)\n,
+node-len, (unsigned long)root-nodesize);
+   return -EIO;
+   }
+
+   generation = btrfs_super_generation(root-fs_info-super_copy);
+   for (i = 0; i  nritems; i++) {
+   if (btrfs_node_ptr_generation(node, i)  generation) {
+   CORRUPT_NODE(root, node, generation (%llu) too new in 
slot %d (maximum expected %llu)\n,
+btrfs_node_ptr_generation(node, i), i,
+generation);
+   

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-05 Thread Sami Liedes
On Thu, Jul 05, 2012 at 03:41:33PM +0200, Jan Schmidt wrote:
 I'd like to see if you corrupted your trees on disk in a really strange
 manner (with matching checksums?), or if data comes from the disk intact
 and becomes damaged thereafter.
 
 Could you store the output of
   btrfs-debug-tree /dev/[whatever]
 before try number 5 and afterwards? It will be quite a lot if you've got
 a lot of files in there. Don't send it anywhere right now, just store it
 away if possible.

Ok, I have it stored. I mounted the fs read-only to do that just to be
safe, because when I did it on a live fs I got some (~15) parent
transid verify failed messages.

 Can you double check that there's nothing about corrected errors in your
 logs? Scrub will correct errors along the way and log that. So maybe
 we've only a few tries left to find the root cause.

Nope, definitely nothing there. A few unlinked orphans messages at
mount time, but that's all. I'm sure I would have captured any further
messages on my netconsole. Also nothing in any earlier logs that I can
see.

Well, until now (try 6) anyway. I actually ran btrfs scrub / twice now
after running btrfs-debug-tree (tries 5 and 6), because on the first
try I ran on wrong kernel (without the debug patch). On try 5 it
crashed very quickly, probably in less than a second. I also noticed
that it doesn't actually crash the machine or even make the filesystem
unusable in any way (apart from unmount blocking at shutdown), so I
also have the same oopses in my log files stored on the same
filesystem. But no, nothing about corrected errors or checksum
failures. So:

Try 5 (without your new patch, but after running btrfs-debug-tree); no
checksum failures as before, similar crash to before:


[ 2036.512656] btrfs: invalid parameters for read_extent_buffer: start (32771) 
 eb-len (32768). eb start is 2239379931136, level 102, generation 
309534250463128, nritems 56187918. len param 17. debug 
2/989/56187918/309534250463128
[ 2036.512770] [ cut here ]
[ 2036.512834] WARNING: at fs/btrfs/extent_io.c:4528 
read_extent_buffer+0x167/0x1a0 [btrfs]()
[ 2036.512880] Hardware name: System Product Name
[ 2036.512906] Modules linked in: omitted  [last unloaded: netconsole]
[ 2036.512954] Pid: 1135, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #2
[...]


Try 6, with your new patch - as a new thing, got checksum errors,
including one very slightly before the crash and with the same eb
start as in the invalid parameters message. Crashed in a couple of
minutes.

Now that I look at the the log below it actually has two crashes in
it, one of btrfs-endio-met pid 1212 and another one of btrfs-endio-met
pid some 72 seconds later. I didn't notice that until after the fact.

dm-6 (the device with checksum failures) is on a disk on which I don't
have anything else besides this partition, so I suppose it could be
corrupting data without me having noticed. I know it used to work at
one time, but that doesn't mean too much. Anyway, I'm sure I haven't
seen any checksum verify failed messages until now. None in the logs
either.

Ah, one thing I have forgotten to mention. I think I've been using
this filesystem for maybe a month now. It started as a single-device
filesystem, but I did restriping to raid-1 on June 12. dm-5 is the
original device, while dm-6 (the one with the checksum failures in try
6) is the added one. I still have logs of the restriping and
everything after that; grep reveals no unusual lines containing
btrfs in all the logs since then, apart from the crashes. Besides
the oopses and lines that mention btrfs just because of the kernel
version 3.4.4+btrfsdebug, this is a summary of all the lines that
mention btrfs:

  btrfs: disk space caching is enabled
  btrfs flagging fs with big metadata feature
  btrfs: new size for /dev/mapper/rootvg-scratch_crypt is 751617179648
 * this device is not on the filesystem with problems - although I
   haven't tried scrubbing this filesystem
  btrfs: unlinked 104 orphans (10 times with 4 different numbers)

And during restriping

  btrfs: found 10008 extents (and similar)
  btrfs: relocating block group 1019010351104 flags 1 (and similar)

And two different warnings in the last few days, but I think this is
only because the debug patches changed the code:

  WARNING: at fs/btrfs/extent_io.c:4522 read_extent_buffer+0xe6/0x110 [btrfs]()
  WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()

Another thing that comes to mind that I might well be the only one
that uses btrfs with raid-1 on two separate dm-crypted devices
(contrary to the way suggested on Btrfs FAQ - in my experience that
can actually speed up things, especially on computers without hardware
crypto, because the kernel has only one kcryptd thread per dm-crypt
device and therefore only can use a single core 

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-04 Thread Sami Liedes
On Wed, Jul 04, 2012 at 01:26:46PM +0200, Jan Schmidt wrote:
 On 04.07.2012 02:17, Sami Liedes wrote:
  On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
  I've seen this before: An overly long Modules linked in: line causes
  a large gap in netconsole output.
  
  I managed to capture the entire output using netconsole by modifying
  the kernel to not output the list of modules.
 
 Okay, thanks for the output. Can you please apply the patch below and capture
 especially the line printed before the cut here line?

Here you go.

Sami


[  121.524803] netpoll: netconsole: local port 
[  121.524831] netpoll: netconsole: local IP 192.168.1.2
[  121.524853] netpoll: netconsole: interface 'eth0'
[  121.524874] netpoll: netconsole: remote port 1194
[  121.524894] netpoll: netconsole: remote IP 192.168.1.73
[  121.524917] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c
[  121.525055] console [netcon0] enabled
[  121.525074] netconsole: network logging started
[  200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) 
 eb-len (32768). eb start is 2243489562624, level 26, generation 
3144240307695375391, nritems 620178657. len param 17. debug 
2/989/620178657/3144240307695375391
[  200.980594] [ cut here ]
[  200.980644] WARNING: at fs/btrfs/extent_io.c:4528 
read_extent_buffer+0x167/0x1a0 [btrfs]()
[  200.980681] Hardware name: System Product Name
[  200.980701] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[  200.980739] Pid: 1145, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #2
[  200.980774] Call Trace:
[  200.980792]  [8103905a] warn_slowpath_common+0x7a/0xb0
[  200.980821]  [810390a5] warn_slowpath_null+0x15/0x20
[  200.980860]  [a01d6e07] read_extent_buffer+0x167/0x1a0 [btrfs]
[  200.980902]  [a01ce03d] btrfs_node_key+0x1d/0x20 [btrfs]
[  200.980941]  [a020459f] __readahead_hook.isra.5+0x3ff/0x460 [btrfs]
[  200.980982]  [a020492f] btree_readahead_hook+0x1f/0x40 [btrfs]
[  200.981022]  [a01aa841] btree_readpage_end_io_hook+0x111/0x260 
[btrfs]
[  200.981065]  [a01d2442] ? find_first_extent_bit_state+0x22/0x80 
[btrfs]
[  200.981109]  [a01d308b] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[  200.981150]  [a01a9e61] ? end_workqueue_fn+0x31/0x50 [btrfs]
[  200.981182]  [81158988] bio_endio+0x18/0x30
[  200.981214]  [a01a9e6c] end_workqueue_fn+0x3c/0x50 [btrfs]
[  200.981253]  [a01e08d7] worker_loop+0x157/0x560 [btrfs]
[  200.981291]  [a01e0780] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[  200.981323]  [81058e5e] kthread+0x8e/0xa0
[  200.981348]  [81419064] kernel_thread_helper+0x4/0x10
[  200.981377]  [81058dd0] ? flush_kthread_worker+0x70/0x70
[  200.981406]  [81419060] ? gs_change+0x13/0x13
[  200.981430] ---[ end trace e93713a9d40cd06e ]---
[  200.981459] general protection fault:  [#1] SMP 
[  200.981487] CPU 2 
[  200.981498] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[  200.981540] 
[  200.981550] Pid: 1145, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #2 System manufacturer System Product Name/P8P67 EVO
[  200.981612] RIP: 0010:[811e83ed]  [811e83ed] 
memcpy+0xd/0x110
[  200.981650] RSP: :8801f4bf7b68  EFLAGS: 00010202
[  200.981675] RAX: 8801f4bf7c8f RBX: 0011 RCX: 0002
[  200.981707] RDX: 0001 RSI: 00050803 RDI: 8801f4bf7c8f
[  200.981738] RBP: 8801f4bf7be0 R08:  R09: 
[  200.981769] R10: 8801f4bf7c8f R11: 8801f3930780 R12: 8801f4bf7ca0
[  200.981800] R13: 8801f7286178 R14: 0048 R15: 0011
[  200.981832] FS:  () GS:88021ec8() 
knlGS:
[  200.981868] CS:  0010 DS:  ES:  CR0: 8005003b
[  200.981894] CR2: f773a000 CR3: 00020d0de000 CR4: 000407e0
[  200.981925] DR0:  DR1:  DR2: 
[  200.981956] DR3:  DR6: 0ff0 DR7: 0400
[  200.981988] Process btrfs-endio-met (pid: 1145, threadinfo 8801f4bf6000, 
task 8801f4ff42f0)
[  200.982026] Stack:
[  200.982038]  a01d6d5b 880124f72ce1 0011 
0002
[  200.982084]  03dd 24f72ce1 2ba295e6a4a5d41f 
88020dc65000
[  200.982130]  8801f4bf7c8f 1000 8801f4bf7c58 
03dd
[  200.982175] Call Trace:
[  200.982201]  [a01d6d5b] ? read_extent_buffer+0xbb/0x1a0 [btrfs]
[  200.982243]  [a01ce03d] btrfs_node_key+0x1d/0x20 [btrfs]
[  200.982281]  [a020459f] __readahead_hook.isra.5+0x3ff/0x460 [btrfs]
[  200.982322]  [a020492f] btree_readahead_hook+0x1f/0x40 [btrfs]
[  200.982362]  

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-04 Thread Jan Schmidt
On 04.07.2012 18:03, Sami Liedes wrote:
 Here you go.
 
   Sami
 [...]
 [  200.980496] btrfs: invalid parameters for read_extent_buffer: start 
 (32771)  eb-len (32768). eb start is 2243489562624, level 26, generation 
 3144240307695375391, nritems 620178657. len param 17. debug 
 2/989/620178657/3144240307695375391

Wow, that's strange. Can you repeat your test once or twice and paste that line,
please? I'd like to get a feeling if the values are completely random.

Reading more of the readahead code now...

Thanks,
-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-04 Thread Sami Liedes
On Wed, Jul 04, 2012 at 06:38:00PM +0200, Jan Schmidt wrote:
  [  200.980496] btrfs: invalid parameters for read_extent_buffer: start 
  (32771)  eb-len (32768). eb start is 2243489562624, level 26, generation 
  3144240307695375391, nritems 620178657. len param 17. debug 
  2/989/620178657/3144240307695375391

Let's call this try 1. I ran it three more times, so below we have
tries 2, 3 and 4.

 Wow, that's strange. Can you repeat your test once or twice and paste that 
 line,
 please? I'd like to get a feeling if the values are completely random.

Curiously, it clearly takes longer for it to crash after starting the
scrub each time I run it. Also on try 4 I got an entirely different
crash (backtrace below). Now it scrubs maybe the first 200G or so of
both devices of the (raid-1) 2.2T filesystem before it crashes.

start and eb-len seem to be the same (32771 and 32768) every time. eb
start varies, but there's some pattern if you view them in hex:

  Try 1  20a5a66
  Try 2  20bb0018000
  Try 3  20a8bc28000
  Try 4  (no output, different crash)

The rest of the values seem to me to be completely different every time.

Sami


Try 2:


[12961.870107] btrfs: invalid parameters for read_extent_buffer: start (32771) 
 eb-len (32768). eb start is 2249220784128, level 14, generation 
2242260605927040034, nritems 117835525. len param 17. debug 
2/989/117835525/2242260605927040034
[12961.870204] [ cut here ]
[12961.870264] WARNING: at fs/btrfs/extent_io.c:4528 
read_extent_buffer+0x167/0x1a0 [btrfs]()
[12961.870302] Hardware name: System Product Name
[12961.870322] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[12961.870367] Pid: 1144, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #2
[12961.870403] Call Trace:
[12961.870421]  [8103905a] warn_slowpath_common+0x7a/0xb0
[12961.870449]  [810390a5] warn_slowpath_null+0x15/0x20
[...]


Try 3:


[  531.770984] btrfs: invalid parameters for read_extent_buffer: start (32771) 
 eb-len (32768). eb start is 2244317708288, level 170, generation 
13639284858109917187, nritems 6171943. len param 17. debug 
2/989/6171943/13639284858109917187
[  531.771081] [ cut here ]
[  531.771133] WARNING: at fs/btrfs/extent_io.c:4528 
read_extent_buffer+0x167/0x1a0 [btrfs]()
[  531.771169] Hardware name: System Product Name
[  531.771191] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[  531.771229] Pid: 1132, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #2
[  531.771265] Call Trace:
[  531.771282]  [8103905a] warn_slowpath_common+0x7a/0xb0
[...]


Try 4:


[   95.933108] netconsole: network logging started
[  982.651987] unable to find logical 691402650139365534 len 32768
[  982.652060] [ cut here ]
[  982.652085] kernel BUG at fs/btrfs/volumes.c:3725!
[  982.652109] invalid opcode:  [#1] SMP 
[  982.652138] CPU 4 
[  982.652149] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[  982.652190] 
[  982.652201] Pid: 1127, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #2 System manufacturer System Product Name/P8P67 EVO
[  982.652264] RIP: 0010:[a0257778]  [a0257778] 
__btrfs_map_block+0x668/0x680 [btrfs]
[  982.652323] RSP: :8801f43cfa70  EFLAGS: 00010286
[  982.652347] RAX: 0049 RBX: 09985b0c0e52109e RCX: 0082
[  982.652379] RDX: 00e8 RSI: 0046 RDI: 0246
[  982.652411] RBP: 8801f43cfb10 R08:  R09: 
[  982.652443] R10: 8801f1fbf680 R11: 8801f3b00780 R12: 8801f0026108
[  982.652475] R13: 8000 R14: 8801f0026fe0 R15: 8801f43cfbb0
[  982.652507] FS:  () GS:88021ed0() 
knlGS:
[  982.652542] CS:  0010 DS:  ES:  CR0: 8005003b
[  982.652568] CR2: f775c000 CR3: 0002110a1000 CR4: 000407e0
[  982.652600] DR0:  DR1:  DR2: 
[  982.652632] DR3:  DR6: 0ff0 DR7: 0400
[  982.652664] Process btrfs-endio-met (pid: 1127, threadinfo 8801f43ce000, 
task 8801f401c2f0)
[  982.652703] Stack:
[  982.652715]  8801f43cfa90 811dc553 88021ec0ce88 
88021ed0ce80
[  982.652761]  8801f43cfb00 0086 8801f43cfaf0 
0086
[  982.652807]  00fdf43cfad0 8801f43cfba8 0024 
0004
[  982.652852] Call Trace:
[  982.652869]  [811dc553] ? cpumask_next_and+0x23/0x40
[  982.652897]  [8111d7e2] ? 

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-04 Thread Jan Schmidt
On 04.07.2012 02:17, Sami Liedes wrote:
 On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
 I've seen this before: An overly long Modules linked in: line causes
 a large gap in netconsole output.
 
 I managed to capture the entire output using netconsole by modifying
 the kernel to not output the list of modules.

Okay, thanks for the output. Can you please apply the patch below and capture
especially the line printed before the cut here line?

Thanks!
-Jan

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9018a0..beabe99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4519,7 +4519,14 @@ void read_extent_buffer(struct extent_buffer *eb, void 
*dstv,
size_t start_offset = eb-start  ((u64)PAGE_CACHE_SIZE - 1);
unsigned long i = (start_offset + start)  PAGE_CACHE_SHIFT;
 
-   WARN_ON(start  eb-len);
+   if (start  eb-len) {
+   printk(KERN_ERR btrfs: invalid parameters for 
read_extent_buffer: start (%lu)  eb-len (%lu). eb start is %llu, level %d, 
generation %llu, nritems %d. len param %lu. debug %llu/%llu/%llu/%llu\n,
+   start, eb-len, eb-start, btrfs_header_level(eb),
+   btrfs_header_generation(eb), btrfs_header_nritems(eb),
+   len,
+   eb-debug[0], eb-debug[1], eb-debug[2], eb-debug[3]);
+   WARN_ON(1);
+   }
WARN_ON(start + len  eb-start + eb-len);
 
offset = (start_offset + start)  ((unsigned long)PAGE_CACHE_SIZE - 1);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b516c3b..1bbf823 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -164,6 +164,8 @@ struct extent_buffer {
wait_queue_head_t lock_wq;
struct page *inline_pages[INLINE_EXTENT_BUFFER_PAGES];
struct page **pages;
+
+   u64 debug[4];
 };
 
 static inline void extent_set_compress_type(unsigned long *bio_flags,
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ac5d010..d9c1146 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -168,10 +168,15 @@ static int __readahead_hook(struct btrfs_root *root, 
struct extent_buffer *eb,
struct btrfs_key key;
struct btrfs_key next_key;
 
+   eb-debug[0] = 1;
+   eb-debug[1] = i;
+   eb-debug[2] = nritems;
+   eb-debug[3] = generation;
btrfs_node_key_to_cpu(eb, key, i);
-   if (i + 1  nritems)
+   if (i + 1  nritems) {
+   eb-debug[0] = 2;
btrfs_node_key_to_cpu(eb, next_key, i + 1);
-   else
+   } else
next_key = re-top;
bytenr = btrfs_node_blockptr(eb, i);
n_gen = btrfs_node_ptr_generation(eb, i);
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-03 Thread Jan Schmidt
 I just got this oops on a computer running 3.4.2.
 
 A few minutes before I had started btrfs device scrub / and had a
 watcher process running btrfs scrub status / every 5 seconds. After
 a few gigabytes of scrubbing, I got this crash.
 
 The oops is transcribed from photos, so it may contain some errors. I

You did *what*? :-) Uploading a photo would be fine, just in case that's easier
for you the next time.

 tried to be careful, and double checked the backtrace.
 
   Sami
 
 
 general protection fault:  [#1] SMP
 CPU 4
 Modules linked in: tcp_diag inet_diag nfnetlink_log nfnetlink ufs qnx4 
 hfsplus hfs minix ntfs vfat msdos fat jfs reiserfs ext3 jbd ext2 ip6_tables 
 ebtable_nat ebtables cn rfcomm bnep
 parport_pc ppdev lp parport tun cpufreq_userspace cpufreq_stats 
 cpufreq_powersave cpufreq_conservative binfmt_misc fuse nfsd nfs nfs_acl 
 auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE
 ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
 ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm 
 i2c_algo_bit loop kvm_intel kvm snd_hda_codec_hdmi
 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio 
 snd_usbmidi_lib snd_hwdep snd_pcm_oss snd_mixer_oss joydev snd_pcm 
 acpi_cpufreq snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmi
 di ath3k snd_seq snd_seq_device snd_timer iTCO_wdt bluetooth eeepci_wmi 
 asus_wmi sparse_keymap crc16 rfkill pcspkr psmouse coretemp serio_raw evdev 
 mperf pci_hotplug i2c_i801 i2c_core processor button
 intel_agp snd mxm_wmi video wmi intel_gtt microcode soundcore sha256_generic 
 dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor 
 async_tx raid6_pq md_mod nbd btrfs libcrc32c
 zlib_deflate sd_mod crc_t10dif crc32c_intel ghash_cmulni_intel firewire_ohci 
 r8196 firewire_core ahci aesni_intel libahci mii crc_itu_t aes_x86_64 libata 
 aes_generic cryptd scsi_mod e1000e thermal fa
 n thermal_sys [last unloaded: scsi_wait_scan]
 
 Pid: 30863, comm: btrfs-endio-met Tainted: GW3.4.2 #1 System 
 manufacturer System Product Name/P8P67 EVO
 RIP: 0010:[811e83bd]  [811e83bd] memcpy+0xd/0x110
 RSP: :88003174dba8  EFLAGS: 00010202
 RAX: 88003174dc8f RBX: 0011 RCX: 0002
 RDX: 0001 RSI: 00050803 RDI: 88003174dc8f
 RBP: 88003174dbf0 R08: 000a R09: 
 R10:  R11:  R12: 88003174dca0
 R13: 8800659f42b0 R14: 0048 R15: 0011
 FS:  () GS:88021ed0() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 0973c000 CR3: 000167ef3000 CR4: 000407e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process btrfs-endio-met (pid: 30863, threadinfo 88003174c000, task 
 88006f818000)
 Stack:
  a026bd6b 8801960f5000 8003 1000
  88003174dc58 03dd 88000ac13c60 88003174dc58
  696f70203a61685f 88003174dc00 a026904d 88003174dcd0
 Call Trace:
  [a026bd6b] ? read_extent_buffer+0xbb/0x110 [btrfs]
  [a026304d] btrfs_node_key+0x1d/0x20 [btrfs]
  [a02994e0] __readahead_hook.isra.5+0x3c0/0x420 [btrfs]
  [a029986f] btree_readahead_hook+0x1f/0x40 [btrfs]
  [a023f841] btree_readpage_end_io_hook+0x111/0x260 [btrfs]
  [a0267452] ? find_first_extent_bit_state+0x22/0x80 [btrfs]
  [a026809b] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
  [a023ee61] ? end_workqueue_fn+0x31/0x50 [btrfs]
  [81158958] bio_endio+0x18/0x30
  [a023ee6c] end_workqueue_fn+0x3c/0x50 [btrfs]
  [a0275857] worker_loop+0x157/0x560 [btrfs]
  [a0275700] ? btrfs_queue_worker+0x310/0x310 [btrfs]
  [81058e5e] kthread+0x8e/0xa0
  [81418fe4] kernel_thread_helper+0x4/0x10
  [81058dd0] ? flush_kthread_worker+0x70/0x70
  [81418fe0] ? gs_change+0x13/0x13
 Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 
 90 90 90 90...
 8 4c 8b 56 10 4c
 RIP  [811e83bd] memcpy+0xd/0x110
  RSP 88003174dba8

That's looking strange.

I checked the readahead code again: It deliberately skips locking and uses
btrfs_node_key with a counter variable. This means, we might end up reading a
key that's no longer actually there. However, it only operates on nodes of
trees, not leaves. Node entries have a fixed size, so no matter what changes in
the node, you won't reach behind the end of that node with an index that was
valid the moment before.

As far as I see it, that algorithm is safe. It could miss some keys or do some
extra work that's not strictly required, but it should never reach a GPF from
btrfs_node_key.

If no other ideas come 

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-03 Thread Sami Liedes
On Tue, Jul 03, 2012 at 02:01:21AM +0300, Sami Liedes wrote:
 I just got this oops on a computer running 3.4.2.

I now repeated this on 3.4.4. Merely running a btrfs scrub start /
causes this after a couple of minutes of running. This time I didn't
run btrfs scrub status / in a loop, so that's not the cause. And
judging from the two tries and two crashes, this seems very
repeatable.

The backtrace is basically exactly the same as with 3.4.4.

Sami


signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-03 Thread Sami Liedes
On Tue, Jul 03, 2012 at 03:11:04PM +0200, Jan Schmidt wrote:
  The oops is transcribed from photos, so it may contain some errors. I
 
 You did *what*? :-) Uploading a photo would be fine, just in case that's 
 easier
 for you the next time.

Nah, it's sometimes refreshing to do something that doesn't take too
much thinking for a while. Besides I fault myself for not having had
netconsole logging on that machine and needed to punish myself for
that omission ;)

 That's looking strange.
 
 I checked the readahead code again: It deliberately skips locking and uses
 btrfs_node_key with a counter variable. This means, we might end up reading a
 key that's no longer actually there. However, it only operates on nodes of
 trees, not leaves. Node entries have a fixed size, so no matter what changes 
 in
 the node, you won't reach behind the end of that node with an index that was
 valid the moment before.
 
 As far as I see it, that algorithm is safe. It could miss some keys or do some
 extra work that's not strictly required, but it should never reach a GPF from
 btrfs_node_key.
 
 If no other ideas come up, I'd try memtesting that machine.

Ran a full pass of memtest86+ v4.20, with no errors found. Also the
machine works very well in all other respects under heavy load.

I think I might try setting up that netconsole to see if there are any
interesting console messages before the oops... As I said, I also was
able to reproduce this on 3.4.4, so ATM I assume I'm able to reproduce
this at will.

Sami


signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-03 Thread Jan Schmidt
On 03.07.2012 15:58, Sami Liedes wrote:
 I think I might try setting up that netconsole to see if there are any
 interesting console messages before the oops... As I said, I also was
 able to reproduce this on 3.4.4, so ATM I assume I'm able to reproduce
 this at will.

That would be helpful. It's good that it's reproducible. Please paste the entire
output if possible.

If there's nothing immediately useful in there, I'll try to make up some
debugging patches for that problem.

Thanks!
-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-03 Thread Sami Liedes
On Tue, Jul 03, 2012 at 04:35:25PM +0200, Jan Schmidt wrote:
 On 03.07.2012 15:58, Sami Liedes wrote:
  I think I might try setting up that netconsole to see if there are any
  interesting console messages before the oops... As I said, I also was
  able to reproduce this on 3.4.4, so ATM I assume I'm able to reproduce
  this at will.
 
 That would be helpful. It's good that it's reproducible. Please paste the 
 entire
 output if possible.
 
 If there's nothing immediately useful in there, I'll try to make up some
 debugging patches for that problem.

There's certainly something there; unfortunately netconsole eats most
of it. (Is there a better way to capture full dmesg from panic?
Assuming a computer without a serial port.)

I've seen this before: An overly long Modules linked in: line causes
a large gap in netconsole output.

Hovever perhaps the most interesting piece is there:

   [30846.822395] WARNING: at fs/btrfs/extent_io.c:4522 
read_extent_buffer+0xe6/0x110 [btrfs]()

Unfortunately, no backtrace for that :( Ideas?

Sami



[30638.258946] netpoll: netconsole: local port 
[30638.258950] netpoll: netconsole: local IP 192.168.1.2
[30638.258952] netpoll: netconsole: interface 'eth0'
[30638.258953] netpoll: netconsole: remote port 1194
[30638.258954] netpoll: netconsole: remote IP 192.168.1.73
[30638.258956] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c
[30638.259002] console [netcon0] enabled
[30638.259004] netconsole: network logging started
[30846.822356] [ cut here ]
[30846.822395] WARNING: at fs/btrfs/extent_io.c:4522 
read_extent_buffer+0xe6/0x110 [btrfs]()
[30846.822400] Hardware name: System Product Name
[30846.822402] Modules linked in: netconsole configfs tun cn ip6_tables 
ebtable_nat ebtables parport_pc ppdev lp parport bnep rfcomm cpufreq_userspace 
cpufreq_stats cpufreq_powersave cpufreq_conservative binfmt_misc nfsd nfs 
nfs_acl auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE 
ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm 
i2c_algo_bit loop fuse kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_usb_audio snd_hda_intel snd_usbmidi_lib snd_hda_codec snd_hwdep snd_pcm_oss 
joydev snd_mixer_oss snd_pcm snd_page_alloc eeepc_wmi90 90 90 48 89 f8 48 89 90 
48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 
10 4c 
[30846.825219] RIP  [811e84dd] memcpy+0xd/0x110
[30846.825248]  RSP 88009ecebba8
 xord1  raid6_pq md_mod nbd btrfs libcrc32c zlib_deflate sd_mod crc_t10dif ahci 
libahci crc32c_intel r8169 pci_hotplug bluetooth snd_seq_device snd_timer snd 
intel_agp acpi_cpufreq mperf mxm_wmi psmouse wmi coretemp soundcore processor 
serio_raw i2c_i801 iTCO_wdt video rfkill microcode crc16 evdev i2c_core button 
intel_gtt[30846.822679]  [a0249841] 
btree_readpage_end_io_hook+0x111/0x260 [btrfs]
 sha256_generic dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq 
async_xor xor async_tx[30846.822773]  [81419120] ? gs_change+0x13/0x13
 md_mod nbd btrfs libcrc32c zlib_deflate sd_mod crc_t10dif ahci libahci 
raid6_pq r8169 mii firewire_ohci ghash_clmulni_intel firewire_core aesni_intel 
crc_itu_t aes_x86_64 bnep aes_generic cryptd scsi_mod e1000e fan thermal nfsd 
thermal_sys [last unloaded: netconsole] auth_rpcgss[30846.823704] Pid: 16224, 
comm: btrfs-endio-met Tainted: GW3.4.4 #1 System manufacturer 
System Product Name/P8P67 EVO
[30846.823767] RIP: 0010:[811e84dd]  [811e84dd] 
memcpy+0xd/0x110
[30846.823805] RSP: :88009ecebba8  EFLAGS: 00010202
[30846.823830] RAX: 88009ecebc8f RBX: 0011 RCX: 0002
[30846.823865] RDX: 0001 RSI: 00050803 RDI: 88009ecebc8f

[30846.823933] R10: 88001d2395c0 R11: 8801f198c780 R12: 88009ecebca0
[30846.823968] R13: 88008eb01178 R14: 0048 R15: 0011
[30846.824002] FS:  () GS:88021ec8() 
knlGS:
[30846.824042] CS:  0010 DS:  ES:  CR0: 8005003b
[30846.824069] CR2: 0add813c CR3: 000163305000 CR4: 000407e0
[30846.823694] 
[30846.824101] DR0:  DR1:  DR2: 
[30846.824136] DR3:  DR6: 0ff0 DR7: 0400
[30846.824171] Process btrfs-endio-met (pid: 16224, threadinfo 
88009ecea000, task 88020f3142f0)
[30846.824225]  a0275d5b 880167d3 8003 
1000
[30846.824273]  88009ecebc58 03dd[30846.824214] Stack:
 880099cdea20
[30846.824318]  6172745f69706d20 88009ecebc00 a026d03d 
88009ecebc58 88009ecebcd0
[30846.824363] Call Trace:
[30846.824434]  [a026d03d] btrfs_node_key+0x1d/0x20 [btrfs]
[30846.824474]  

Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-03 Thread Sami Liedes
On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
 I've seen this before: An overly long Modules linked in: line causes
 a large gap in netconsole output.

I managed to capture the entire output using netconsole by modifying
the kernel to not output the list of modules.

Sami



[  125.827919] netpoll: netconsole: local port 
[  125.827946] netpoll: netconsole: local IP 192.168.1.2
[  125.827969] netpoll: netconsole: interface 'eth0'
[  125.827990] netpoll: netconsole: remote port 1194
[  125.828011] netpoll: netconsole: remote IP 192.168.1.73
[  125.828034] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c
[  125.828169] console [netcon0] enabled
[  125.828193] netconsole: network logging started
[  247.787472] [ cut here ]
[  247.787536] WARNING: at fs/btrfs/extent_io.c:4522 
read_extent_buffer+0xe6/0x110 [btrfs]()
[  247.787573] Hardware name: System Product Name
[  247.787594] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[  247.787632] Pid: 1146, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #1
[  247.787674] Call Trace:
[  247.787692]  [8103905a] warn_slowpath_common+0x7a/0xb0
[  247.787794]  [810390a5] warn_slowpath_null+0x15/0x20
[  247.787835]  [a01d7d86] read_extent_buffer+0xe6/0x110 [btrfs]
[  247.787877]  [a01cf03d] btrfs_node_key+0x1d/0x20 [btrfs]
[  247.787917]  [a02054d0] __readahead_hook.isra.5+0x3c0/0x420 [btrfs]
[  247.787959]  [a020585f] btree_readahead_hook+0x1f/0x40 [btrfs]
[  247.787999]  [a01ab841] btree_readpage_end_io_hook+0x111/0x260 
[btrfs]
[  247.788043]  [a01d3442] ? find_first_extent_bit_state+0x22/0x80 
[btrfs]
[  247.788087]  [a01d408b] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[  247.788129]  [a01aae61] ? end_workqueue_fn+0x31/0x50 [btrfs]
[  247.788161]  [81158988] bio_endio+0x18/0x30
[  247.788194]  [a01aae6c] end_workqueue_fn+0x3c/0x50 [btrfs]
[  247.788234]  [a01e1847] worker_loop+0x157/0x560 [btrfs]
[  247.788272]  [a01e16f0] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[  247.788306]  [81058e5e] kthread+0x8e/0xa0
[  247.788332]  [81419064] kernel_thread_helper+0x4/0x10
[  247.788360]  [81058dd0] ? flush_kthread_worker+0x70/0x70
[  247.788389]  [81419060] ? gs_change+0x13/0x13
[  247.788413] ---[ end trace e93713a9d40cd06e ]---
[  247.788444] general protection fault:  [#1] SMP 
[  247.788473] CPU 5 
[  247.788484] Modules linked in: omitted  [last unloaded: scsi_wait_scan]
[  247.788523] 
[  247.788533] Pid: 1146, comm: btrfs-endio-met Tainted: GW
3.4.4-modded-oops+ #1 System manufacturer System Product Name/P8P67 EVO
[  247.788595] RIP: 0010:[811e83ed]  [811e83ed] 
memcpy+0xd/0x110
[  247.788632] RSP: :8802152f3ba8  EFLAGS: 00010202
[  247.788656] RAX: 8802152f3c8f RBX: 0011 RCX: 0002
[  247.788687] RDX: 0001 RSI: 00050803 RDI: 8802152f3c8f
[  247.788719] RBP: 8802152f3bf0 R08:  R09: 
[  247.788750] R10: 880207dd89c0 R11: 8801f3a90780 R12: 8802152f3ca0
[  247.788782] R13: 8801c1d5b830 R14: 0048 R15: 0011
[  247.788814] FS:  () GS:88021ed4() 
knlGS:
[  247.788850] CS:  0010 DS:  ES:  CR0: 8005003b
[  247.788876] CR2: 038942a0 CR3: 0001c26ce000 CR4: 000407e0
[  247.788907] DR0:  DR1:  DR2: 
[  247.788939] DR3:  DR6: 0ff0 DR7: 0400
[  247.788971] Process btrfs-endio-met (pid: 1146, threadinfo 8802152f2000, 
task 8801f4192ca0)
[  247.789009] Stack:
[  247.789021]  a01d7d5b 8802010b7000 8003 
1000
[  247.789066]  8802152f3c58 03dd 88021032fa20 
8802152f3c58
[  247.789112]  6669646e65230a29 8802152f3c00 a01cf03d 
8802152f3cd0
[  247.789157] Call Trace:
[  247.789184]  [a01d7d5b] ? read_extent_buffer+0xbb/0x110 [btrfs]
[  247.789226]  [a01cf03d] btrfs_node_key+0x1d/0x20 [btrfs]
[  247.789265]  [a02054d0] __readahead_hook.isra.5+0x3c0/0x420 [btrfs]
[  247.789308]  [a020585f] btree_readahead_hook+0x1f/0x40 [btrfs]
[  247.789348]  [a01ab841] btree_readpage_end_io_hook+0x111/0x260 
[btrfs]
[  247.789391]  [a01d3442] ? find_first_extent_bit_state+0x22/0x80 
[btrfs]
[  247.789434]  [a01d408b] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[  247.789475]  [a01aae61] ? end_workqueue_fn+0x31/0x50 [btrfs]
[  247.789506]  [81158988] bio_endio+0x18/0x30
[  247.789539]  [a01aae6c] end_workqueue_fn+0x3c/0x50 [btrfs]
[  247.789578]  [a01e1847] worker_loop+0x157/0x560 [btrfs]
[  247.789616]  

btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-02 Thread Sami Liedes
Hi,

I just got this oops on a computer running 3.4.2.

A few minutes before I had started btrfs device scrub / and had a
watcher process running btrfs scrub status / every 5 seconds. After
a few gigabytes of scrubbing, I got this crash.

The oops is transcribed from photos, so it may contain some errors. I
tried to be careful, and double checked the backtrace.

Sami


general protection fault:  [#1] SMP
CPU 4
Modules linked in: tcp_diag inet_diag nfnetlink_log nfnetlink ufs qnx4 hfsplus 
hfs minix ntfs vfat msdos fat jfs reiserfs ext3 jbd ext2 ip6_tables ebtable_nat 
ebtables cn rfcomm bnep
parport_pc ppdev lp parport tun cpufreq_userspace cpufreq_stats 
cpufreq_powersave cpufreq_conservative binfmt_misc fuse nfsd nfs nfs_acl 
auth_rpcgss fscache lockd sunrpc iptable_filter ipt_MASQUERADE
ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
ip_tables x_tables xfs ext4 jbd2 mbcache radeon drm_kms_helper ttm drm 
i2c_algo_bit loop kvm_intel kvm snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_usb_audio snd_usbmidi_lib 
snd_hwdep snd_pcm_oss snd_mixer_oss joydev snd_pcm acpi_cpufreq snd_page_alloc 
snd_seq_midi snd_seq_midi_event snd_rawmi
di ath3k snd_seq snd_seq_device snd_timer iTCO_wdt bluetooth eeepci_wmi 
asus_wmi sparse_keymap crc16 rfkill pcspkr psmouse coretemp serio_raw evdev 
mperf pci_hotplug i2c_i801 i2c_core processor button
intel_agp snd mxm_wmi video wmi intel_gtt microcode soundcore sha256_generic 
dm_crypt dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor xor 
async_tx raid6_pq md_mod nbd btrfs libcrc32c
zlib_deflate sd_mod crc_t10dif crc32c_intel ghash_cmulni_intel firewire_ohci 
r8196 firewire_core ahci aesni_intel libahci mii crc_itu_t aes_x86_64 libata 
aes_generic cryptd scsi_mod e1000e thermal fa
n thermal_sys [last unloaded: scsi_wait_scan]

Pid: 30863, comm: btrfs-endio-met Tainted: GW3.4.2 #1 System 
manufacturer System Product Name/P8P67 EVO
RIP: 0010:[811e83bd]  [811e83bd] memcpy+0xd/0x110
RSP: :88003174dba8  EFLAGS: 00010202
RAX: 88003174dc8f RBX: 0011 RCX: 0002
RDX: 0001 RSI: 00050803 RDI: 88003174dc8f
RBP: 88003174dbf0 R08: 000a R09: 
R10:  R11:  R12: 88003174dca0
R13: 8800659f42b0 R14: 0048 R15: 0011
FS:  () GS:88021ed0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0973c000 CR3: 000167ef3000 CR4: 000407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs-endio-met (pid: 30863, threadinfo 88003174c000, task 
88006f818000)
Stack:
 a026bd6b 8801960f5000 8003 1000
 88003174dc58 03dd 88000ac13c60 88003174dc58
 696f70203a61685f 88003174dc00 a026904d 88003174dcd0
Call Trace:
 [a026bd6b] ? read_extent_buffer+0xbb/0x110 [btrfs]
 [a026304d] btrfs_node_key+0x1d/0x20 [btrfs]
 [a02994e0] __readahead_hook.isra.5+0x3c0/0x420 [btrfs]
 [a029986f] btree_readahead_hook+0x1f/0x40 [btrfs]
 [a023f841] btree_readpage_end_io_hook+0x111/0x260 [btrfs]
 [a0267452] ? find_first_extent_bit_state+0x22/0x80 [btrfs]
 [a026809b] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
 [a023ee61] ? end_workqueue_fn+0x31/0x50 [btrfs]
 [81158958] bio_endio+0x18/0x30
 [a023ee6c] end_workqueue_fn+0x3c/0x50 [btrfs]
 [a0275857] worker_loop+0x157/0x560 [btrfs]
 [a0275700] ? btrfs_queue_worker+0x310/0x310 [btrfs]
 [81058e5e] kthread+0x8e/0xa0
 [81418fe4] kernel_thread_helper+0x4/0x10
 [81058dd0] ? flush_kthread_worker+0x70/0x70
 [81418fe0] ? gs_change+0x13/0x13
Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 
90 90 90 90...
8 4c 8b 56 10 4c
RIP  [811e83bd] memcpy+0xd/0x110
 RSP 88003174dba8


signature.asc
Description: Digital signature


Re: btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

2012-07-02 Thread Sami Liedes
On Tue, Jul 03, 2012 at 02:01:21AM +0300, Sami Liedes wrote:
 The oops is transcribed from photos, so it may contain some errors. I
 tried to be careful, and double checked the backtrace.

Forgot to menion, the fs is a two-partition filesystem, with both data
and metadata mirrored.


Label: none  uuid: f26f08b1-89e6-4f2d-b7f9-857010dd2517
Total devices 2 FS bytes used 778.31GB
devid2 size 1.07TB used 1.05TB path /dev/dm-6
devid1 size 1.07TB used 1.05TB path /dev/dm-5


Sami


signature.asc
Description: Digital signature