Hi all,
Hoping you all can help. I have a strange problem; I think I know what's going on, but I could use some verification. I set up a raid1 btrfs filesystem on an Ubuntu 16.04 system, and here's what it looks like:

btrfs fi show
Label: none  uuid: 597ee185-36ac-4b68-8961-d4adc13f95d4
    Total devices 10 FS bytes used 3.42TiB
    devid    1 size 1.82TiB used 1.18TiB path /dev/sdd
    devid    2 size 698.64GiB used 47.00GiB path /dev/sdk
    devid    3 size 931.51GiB used 280.03GiB path /dev/sdm
    devid    4 size 931.51GiB used 280.00GiB path /dev/sdl
    devid    5 size 1.82TiB used 1.17TiB path /dev/sdi
    devid    6 size 1.82TiB used 823.03GiB path /dev/sdj
    devid    7 size 698.64GiB used 47.00GiB path /dev/sdg
    devid    8 size 1.82TiB used 1.18TiB path /dev/sda
    devid    9 size 1.82TiB used 1.18TiB path /dev/sdb
    devid   10 size 1.36TiB used 745.03GiB path /dev/sdh
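
For reference, I originally created the filesystem with something along these lines (typed from memory; the device names and the /mnt/pool mountpoint are just placeholders):

mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY /dev/sdZ ...
mount /dev/sdX /mnt/pool    # mounting any one member device brings in the whole array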

I added a couple of disks and then ran a balance, which took about 3 days to finish. When it did finish, I tried a scrub and got this message:

scrub status for 597ee185-36ac-4b68-8961-d4adc13f95d4
scrub started at Sun Jun 26 18:19:28 2016 and was aborted after 01:16:35
    total bytes scrubbed: 926.45GiB with 18849935 errors
    error details: read=18849935
corrected errors: 5860, uncorrectable errors: 18844075, unverified errors: 0
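
In case the exact sequence matters, the add/balance/scrub steps were roughly these (again from memory; /mnt/pool and the device names are placeholders):

btrfs device add /dev/sdX /dev/sdY /mnt/pool
btrfs balance start /mnt/pool      # this is the part that took about 3 days
btrfs scrub start /mnt/pool
btrfs scrub status /mnt/pool       # printed the status above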

So those scrub results seem bad. I took a look at the device stats, and a few of the disks have errors:
...
[/dev/sdi].generation_errs 0
[/dev/sdj].write_io_errs   289436740
[/dev/sdj].read_io_errs    289492820
[/dev/sdj].flush_io_errs   12411
[/dev/sdj].corruption_errs 0
[/dev/sdj].generation_errs 0
[/dev/sdg].write_io_errs   0
...
[/dev/sda].generation_errs 0
[/dev/sdb].write_io_errs   3490143
[/dev/sdb].read_io_errs    111
[/dev/sdb].flush_io_errs   268
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdh].write_io_errs   5839
[/dev/sdh].read_io_errs    2188
[/dev/sdh].flush_io_errs   11
[/dev/sdh].corruption_errs 1
[/dev/sdh].generation_errs 16373
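
Those counters are from btrfs device stats on the mountpoint (placeholder path again); my plan is to zero them with -z once the underlying problem is fixed, so any new errors stand out:

btrfs device stats /mnt/pool
btrfs device stats -z /mnt/pool    # print and then reset the error counters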

So I checked the SMART data for those disks, and they look perfect: no reallocated sectors, no problems. One thing I did notice, though, is that they are all WD Green drives. My guess is that if they spin down and come back up under a new /dev/sd* letter, that could lead to data corruption. I used idle3ctl to turn off the idle/head-parking timer on all the Green drives in the system, but I'm still having trouble getting the filesystem to work without errors. I tried 'btrfs check --repair' on it; it finds a lot of verification errors, but it doesn't look like anything actually gets fixed. I do have all the data backed up on another system, so I can recreate the filesystem if I need to. Here's what I want to know:
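
To be concrete, here's roughly what I ran on each of the Green drives and against the filesystem (device names are examples; the check was done with the filesystem unmounted):

smartctl -a /dev/sdX             # no reallocated or pending sectors on any of them
idle3ctl -g /dev/sdX             # show the current idle3 (head-parking) timer
idle3ctl -d /dev/sdX             # disable it; takes effect after the drive is power-cycled
btrfs check --repair /dev/sdX    # pointed at one member device, filesystem unmounted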

1. Am I right about the WD Green drives? If they change /dev/sd* device names during disk operations, will that corrupt data?
2. If that is the case:
a.) Is there any way I can stop the /dev/sd* device names from changing? Or can I set up / mount the filesystem using UUIDs or something more stable? I googled this but found conflicting info (see the sketch after this list for what I had in mind).
b.) Or, is there something else changing my drive device names? I have most of the drives on an LSI SAS 9201-16i card; is there something I need to do there to keep them fixed?
c.) Or, is there a script or tool I can use to figure out whether the disks are going to change names?
d.) Or, if I wipe everything and rebuild, will the disks behave now that the idle3ctl fix is in place?
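
For 2a, what I had in mind was something like the following; please tell me if I'm off base (/mnt/pool is a placeholder mountpoint):

ls -l /dev/disk/by-id/    # persistent names that don't depend on the sd* ordering
blkid /dev/sdh            # shows the filesystem UUID
# /etc/fstab entry using the filesystem UUID instead of a /dev/sd* node:
UUID=597ee185-36ac-4b68-8961-d4adc13f95d4  /mnt/pool  btrfs  defaults  0  0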

Regardless of whether it's a WD Green issue, should I just wipefs all the disks and rebuild? Or is there some way to recover this filesystem? Thanks for any help!
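
(If the answer is to start over, I'm assuming the rebuild is roughly the following for each member disk, followed by the same kind of mkfs.btrfs raid1 command as above; correct me if not.)

umount /mnt/pool
wipefs -a /dev/sdX    # clear the old btrfs signature on each member disk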


    ------- Corey