On Sun, Jun 2, 2013 at 9:05 PM, John Haller <john.h.haller@...> wrote:
> Hi,
>
> I added a new drive to an existing RAID 0 array. Every
> attempt to rebalance the array fails:
> # btrfs filesystem balance /share/bd8
> ERROR: error during balancing '/share/bd8' - Input/output error
> # dmesg | tail
> btrfs: found 1 extents
> btrfs: relocating block group 10752513540096 flags 1
> btrfs: found 5 extents
> btrfs: found 5 extents
> btrfs: relocating block group 10751439798272 flags 1
> btrfs: found 1 extents
> btrfs: found 1 extents
> btrfs: relocating block group 10048138903552 flags 1
> btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
> btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
>
> An earlier rebalance attempt had the same csum error on a different inode:
> btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
> btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
>
> Every rebalance attempt fails the same way, but with a different inum.
>
> Here is the array:
> # btrfs filesystem show
> Label: 'bd8' uuid: b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
>         Total devices 4 FS bytes used 7.37TB
>         devid    4 size 3.64TB used 52.00GB path /dev/sde
>         devid    1 size 3.64TB used 3.32TB path /dev/sdf1
>         devid    3 size 3.64TB used 2.92TB path /dev/sdc
>         devid    2 size 3.64TB used 2.97TB path /dev/sdb
>
> While I didn't finish the scrub, no errors were found:
> # btrfs scrub status -d /share/bd8
> scrub status for b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
> scrub device /dev/sdf1 (id 1) status
>         scrub resumed at Sun Jun 2 20:29:06 2013, running for 10360 seconds
>         total bytes scrubbed: 845.53GB with 0 errors
> scrub device /dev/sdb (id 2) status
>         scrub resumed at Sun Jun 2 20:29:06 2013, running for 10360 seconds
>         total bytes scrubbed: 869.38GB with 0 errors
> scrub device /dev/sdc (id 3) status
>         scrub resumed at Sun Jun 2 20:29:06 2013, running for 10360 seconds
>         total bytes scrubbed: 706.04GB with 0 errors
> scrub device /dev/sde (id 4) history
>         scrub started at Sun Jun 2 12:48:36 2013 and finished after 0 seconds
>         total bytes scrubbed: 0.00 with 0 errors
>
> Mount options:
> /dev/sdf1 on /share/bd8 type btrfs (rw,flushoncommit)
>
> Kernel 3.9.4
>
> John
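A side note on those "csum failed" lines: the inode numbers they report
can be resolved back to file paths, which makes it easier to decide what
to delete or restore from backup. A rough sketch, assuming a btrfs-progs
build that has the inspect-internal subcommand (365 is just the inode
from the dmesg output quoted above):

# btrfs inspect-internal inode-resolve 365 /share/bd8

If that subcommand is not available, a plain find also works, with the
caveat that btrfs inode numbers are only unique within a subvolume:

# find /share/bd8 -inum 365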
After cleaning up the files the scrub flagged, the balance succeeded.
The dmesg output from the failed balance runs was not helpful in
locating the bad data; only the dmesg output from the scrub pointed to
the files with errors. Now the question is why the balance left the
allocation so uneven compared with the listing above:

# btrfs scrub status /share/bd8
scrub status for b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
        scrub started at Mon Jun 17 23:07:01 2013 and finished after 39209 seconds
        total bytes scrubbed: 7.49TB with 0 errors

# btrfs filesystem show
Label: 'bd8' uuid: b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
        Total devices 4 FS bytes used 7.49TB
        devid    4 size 3.64TB used 1.99TB path /dev/sdf
        devid    1 size 3.64TB used 3.32TB path /dev/sdg1
        devid    3 size 3.64TB used 1.99TB path /dev/sdc
        devid    2 size 3.64TB used 1.97TB path /dev/sdb
Btrfs v0.20-rc1

It appears that devid 1 was never balanced. Note that the device names
differ from the earlier listing because I still have the backup device
connected, which holds the originals of the corrupted files. The
filesystem started with devid 1 alone, was filled to the capacity shown
above, and the other drives were added later, so it did not start out
as a RAID 0 system. The metadata is RAID 1.

John
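P.S. Since devid 1 still holds 3.32TB while the other three devices sit
around 2TB, a possible follow-up is a filtered balance that only touches
block groups with a chunk on that device. This is only a sketch, assuming
the kernel and btrfs-progs in use support balance filters (kernel 3.3 and
later); -d restricts the operation to data block groups, and devid=1
selects the device that appears untouched:

# btrfs balance start -ddevid=1 /share/bd8

Untested on this array, so treat it as a suggestion rather than a
confirmed fix.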