On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev <mator...@gmail.com> wrote:
>
> Hello!
>
> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
> shows the following :
>
> mator@nvg5120:~/btrfs-progs$ git log -1 --oneline
> 40650bf Btrfs progs v4.6.1
>
> root@nvg5120:/home/mator/xfstests# gdb
> GNU gdb (Debian 7.11.1-2) 7.11.1
> (gdb) file /opt/btrfs/bin/mkfs.btrfs
> Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
> (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> (gdb) run
> Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5
> /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
> btrfs-progs v4.6.1
> See http://btrfs.wiki.kernel.org for more information.
>
> ERROR: superblock checksum mismatch
> ERROR: superblock checksum mismatch
> ERROR: superblock checksum mismatch
> Performing full device TRIM (2.00GiB) ...
> Performing full device TRIM (2.00GiB) ...
> Performing full device TRIM (2.00GiB) ...
> Performing full device TRIM (2.00GiB) ...
>
> Program received signal SIGBUS, Bus error.
> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
> volumes.c:2156
> 2156 *(unsigned long *)(p_eb->data + i) ^=
> (gdb) bt
> #0 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570)
> at volumes.c:2156
> #1 0x0000000000119b30 in write_and_map_eb (trans=0x2cc250,
> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426
> #2 0x0000000000119e74 in write_tree_block (trans=0x2cc250,
> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459
> #3 0x000000000011a4ac in __commit_transaction (trans=0x2cc250,
> root=0x2c7d80) at disk-io.c:562
> #4 0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250,
> root=0x2c7d80) at disk-io.c:598
> #5 0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786
> (gdb)
>
> Can someone help please? Thanks.
>

The code that faults:

    *(unsigned long *)(p_eb->data + i) ^=
            *(unsigned long *)(ebs[j]->data + i);

Because struct extent_buffer declares 'data' as a char[], the array is
only guaranteed 1-byte alignment. Casting an address inside it to
unsigned long * and dereferencing it is a misaligned access whenever
that address is not a multiple of 8, and that raises SIGBUS on sparc64
and probably on a number of other strict-alignment RISC architectures.

If i advances by 1 on each iteration, the loop reads an 8-byte chunk at
data + i, XORs, writes the 8-byte chunk back, and repeats one byte
further on. In that case 7 out of 8 accesses would fault, even if both
`data` pointers were 8-byte aligned.

This would probably fix it, though it looks ugly:

    unsigned long a, b;

    memcpy(&a, p_eb->data + i, sizeof(a));   /* Read 8 bytes from p_eb->data+i */
    memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i */
    a ^= b;                                  /* XOR */
    memcpy(p_eb->data + i, &a, sizeof(a));   /* Write back to p_eb->data+i */

I'm not familiar with btrfs, but the result seems to depend on
sizeof(unsigned long). Given that they used parentheses, I assume the
byte offset was intentional. However, if this was supposed to do an XOR
operation 8 bytes at a time, with i counting words, then it would need
to be something like:

    *(((unsigned long *)p_eb->data) + i) ^=
            *(((unsigned long *)ebs[j]->data) + i);

i.e. cast the pointer to unsigned long * first, then add i (which would
then index an array of unsigned long, not char). Note that this form
still faults unless both `data` pointers are themselves 8-byte aligned.

--Patrick
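
For reference, here is a minimal standalone sketch of the memcpy approach described above. The helper name xor_bytes and its signature are my own invention for illustration, not anything in btrfs-progs. It assumes only that the two buffers do not overlap, and it steps by sizeof(unsigned long) rather than by one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a btrfs-progs function): XOR len bytes of src
 * into dst. Neither pointer needs any particular alignment, and len does
 * not need to be a multiple of the word size.
 */
void xor_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
	size_t i = 0;

	assert(len == 0 || (dst != NULL && src != NULL));

	/*
	 * Word-sized chunks. memcpy into locals avoids dereferencing a
	 * possibly misaligned unsigned long pointer; the compiler is free
	 * to emit whatever load/store sequence is safe for the target.
	 */
	for (; i + sizeof(unsigned long) <= len; i += sizeof(unsigned long)) {
		unsigned long a, b;

		memcpy(&a, dst + i, sizeof(a));
		memcpy(&b, src + i, sizeof(b));
		a ^= b;
		memcpy(dst + i, &a, sizeof(a));
	}

	/* Remaining tail bytes, one at a time. */
	for (; i < len; i++)
		dst[i] ^= src[i];
}
```

In the faulting function this would presumably be called once per data stripe, along the lines of xor_bytes((unsigned char *)p_eb->data, (unsigned char *)ebs[j]->data, stripe_len), but I have not tried that against the real tree.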