Re: [sparc64] mkfs.btrfs bus error / align issue?

Patrick Baggett Wed, 27 Jul 2016 13:42:06 -0700

On Wed, Jul 27, 2016 at 3:39 PM, Patrick Baggett
<baggett.patr...@gmail.com> wrote:
> On Wed, Jul 27, 2016 at 8:59 AM, Anatoly Pugachev <mator...@gmail.com> wrote:
>>
>> Hello!
>>
>> Running xfstests suite, got in logs mkfs.btrfs bus error, debugging it
>> shows the following :
>>
>> mator@nvg5120:~/btrfs-progs$ git log -1 --oneline
>> 40650bf Btrfs progs v4.6.1
>>
>> root@nvg5120:/home/mator/xfstests# gdb
>> GNU gdb (Debian 7.11.1-2) 7.11.1
>> (gdb) file /opt/btrfs/bin/mkfs.btrfs
>> Reading symbols from /opt/btrfs/bin/mkfs.btrfs...done.
>> (gdb) set args -f -draid5 -mraid5 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>> (gdb) run
>> Starting program: /opt/btrfs/bin/mkfs.btrfs -f -draid5 -mraid5
>> /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
>> btrfs-progs v4.6.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> ERROR: superblock checksum mismatch
>> ERROR: superblock checksum mismatch
>> ERROR: superblock checksum mismatch
>> Performing full device TRIM (2.00GiB) ...
>> Performing full device TRIM (2.00GiB) ...
>> Performing full device TRIM (2.00GiB) ...
>> Performing full device TRIM (2.00GiB) ...
>>
>> Program received signal SIGBUS, Bus error.
>> 0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570) at
>> volumes.c:2156
>> 2156                                    *(unsigned long *)(p_eb->data + i) ^=
>> (gdb) bt
>> #0  0x000000000015e160 in write_raid56_with_parity (info=0x2b17b0,
>> eb=0x2c7fe0, multi=0x2c2870, stripe_len=65536, raid_map=0x2c2570)
>>     at volumes.c:2156
>> #1  0x0000000000119b30 in write_and_map_eb (trans=0x2cc250,
>> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:426
>> #2  0x0000000000119e74 in write_tree_block (trans=0x2cc250,
>> root=0x2c7d80, eb=0x2c7fe0) at disk-io.c:459
>> #3  0x000000000011a4ac in __commit_transaction (trans=0x2cc250,
>> root=0x2c7d80) at disk-io.c:562
>> #4  0x000000000011a7b8 in btrfs_commit_transaction (trans=0x2cc250,
>> root=0x2c7d80) at disk-io.c:598
>> #5  0x00000000001a2b04 in main (argc=8, argv=0x7fefffff698) at mkfs.c:1786
>> (gdb)
>>
>> Can someone help please? Thanks.
>>
>
> The code that faults:
>
> (unsigned long *)(p_eb->data + i) ^= *(unsigned long *)(ebs[j]->data + i);
>
> Because struct extent_buffer has 'data' as a char[], this will always
> fault on sparc64 and probably a number of other RISC architectures. It
> increments the address by 1, then reads an 8-byte chunk, XORs, and
> writes the 8-byte chunk, repeat. In other words, 7 out of 8 reads
> would fault, even if both `data` pointers were 8-byte aligned.


Actually, I misread that: the code increments i by sizeof(unsigned
long), not by 1 so only if either data point is misaligned, then this
will fault.

Disregard that.

>
> This would probably fix it, though it looks ugly.
> unsigned long a, b;
> memcpy(&a, p_eb->data + i, sizeof(a));  /* Read 8 bytes from p_eb->data+i */
> memcpy(&b, ebs[j]->data + i, sizeof(b)); /* Read 8 bytes from ebs[j]->data+i 
> */
> a ^= b; /* XOR */
> memcpy(p_eb->data + i, &a, sizeof(a)); /* Write back to p_eb->data+i */
>
> I'm not familiar with btrfs, but the results seems like they depend on
> the sizeof(unsigned long). Given that they used parentheses, I assume
> it was intentional. However, if this was supposed to do an XOR
> operation 8 bytes at a time, then it would need to be something like:
>
> *(((unsigned long *)p_eb->data)+i) ^= *(((unsigned long *)ebs[j]->data) + i);
>
> i.e. cast pointer to unsigned long*, then add i (which would index
> array of unsigned long, not char).
>
> --Patrick
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [sparc64] mkfs.btrfs bus error / align issue?

Reply via email to