Re: Reliability questions / "me too" for bugs

Josef Bacik Tue, 15 Nov 2011 06:17:49 -0800

On Mon, Nov 14, 2011 at 05:12:13PM -0500, Jérôme Carretero wrote:
> I have a couple of questions concerning btrfs reliability.
> 
> I'm currently using btrfs in my internal drives (strong advantages) and have 
> used it on external drives, but I've recently migrated the external ones to 
> ext4, for reliability reasons.
> The kernel seems to be able to handle ext4 partition disconnections (drive 
> error, cable gets eaten by rodent, or most commonly, unplugged too early on 
> removable drives...) quite gracefully.
> This is not yet the case for btrfs partitions (deadlocks, various oopses, 
> need to reboot).
> Any idea when this will be available ?
>


At some point in the future.  We know it's a problem, but as you can see from
your nice list at the end, we have lots of problems working properly when the
drives are plugged in and behaving fine ;).

> How to handle bad blocks (sometimes, they are very localized on HDDs, and 
> they will happen on old SSDs) ?
> 
> Imagine the following use case:
>  - get untrusted drive from dumpster
>  - check that it runs, and has an acceptable amount of bad block clusters
>  - add the drive to a btrfs pool, which guarantees that its data will be 
> duplicated somewhere else
>  - enjoy the drive while it lasts
>  - ability to retrieve bad blocks map later on
>  - ability to cleanly remove the drive from the pool if it becomes useless 
> (found a better one) or dies (see first question)
>    when that happens, data gets replicated to other locations...
>    data replication could be done automatically by background scrubbing with 
> some mount flag or ioctl
> How far are we from that ? Will we get there some day ?
> 
> 
> Since I'm here, a few random and useless notes, as I'm currently testing 
> v3.2-rc1-284-g52e4c2a and I see a few bugs, deadlocks and weirdnesses.
> I don't know if it's normal for -rc1, maybe.
> My current workload is "rsync 1.5TB from SATA to USB2+3 (500+1000GB in raid0) 
> and vice versa".
> The load average can grow to 15.
> I've run into BUG at fs/btrfs/inode.c:1795 
> (http://comments.gmane.org/gmane.comp.file-systems.btrfs/14128).
> I've run into WARNING: at fs/btrfs/free-space-cache.c:1847 
> btrfs_remove_free_space+0x1a3/0x287() [1]
> I've also run into INFO: task btrfs-transacti:1465 blocked for more than 120 
> seconds. [2]
> Sometimes linux is writing I don't know what for a looooooong time on drives, 
> and there's nothing in cache.
> Sometimes rsync stops, doing nothing. It will somehow restart after I do a 
> "echo 3 > /proc/sys/vm/drop_caches"...
> I see that a lot of features will be added for 3.2 but I hope they will be 
> well tested !

The inode one was fixed recently; Chris sent the patch to Linus this weekend,
so upgrade to that and you should be good.  The free_space_cache one is new
to me; how often do you hit it?  As for the transaction thing, if it
happens again can you do sysrq+w?  Sometimes the guy hanging everybody up
doesn't get printed out, so we only get part of the picture; sysrq+w will give
us all waiters so we can figure out what's going on.  Thanks,

Josef
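
For reference, the sysrq+w dump requested above can also be triggered without
a keyboard by writing "w" to /proc/sysrq-trigger, which needs root and a kernel
built with magic SysRq support.  A minimal Python sketch under those
assumptions; the function names are illustrative and not part of any btrfs
tool, and the string matched in the log is the hung-task message quoted
earlier in this thread:

```python
from pathlib import Path
from typing import List

# Standard procfs trigger file; writing to it requires root.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def dump_blocked_tasks(trigger: Path = SYSRQ_TRIGGER) -> None:
    """Ask the kernel to log every task in uninterruptible (blocked) state.

    Equivalent to pressing sysrq+w; the output goes to the kernel log.
    """
    trigger.write_text("w")


def hung_task_lines(log_text: str) -> List[str]:
    """Return the hung-task warnings found in a chunk of kernel log text."""
    return [line for line in log_text.splitlines()
            if "blocked for more than" in line]
```

Calling dump_blocked_tasks() as root and then saving the output of dmesg
should capture the stacks of all waiters, which is the part of the picture the
hung-task warning alone can miss.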
