Re: btrfs stability

2016-05-26 Thread Roman Mamedov
On Fri, 27 May 2016 00:42:07 +0200
Diego Torres  wrote:

> Btrfs is the only fs that can add drives one by one to an existing raid
> setup, and use the new space immediately, without replacing all the drives.

Ext4, XFS, JFS or pretty much any FS which can be resized upwards can also do
that, when placed on top of mdadm RAID5/6. It's not like you are absolutely
locked in to using Btrfs if you need that particular feature.
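
To put rough numbers on the "add drives one by one" case, here is a minimal
sketch, assuming equal-size drives and ignoring md and filesystem metadata
overhead. The function name and the 4 TB drive size are made up for
illustration; nothing here comes from mdadm itself.

```python
def usable_capacity_tb(num_drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable space of a parity RAID array built from equal-size drives.

    parity_drives is 1 for RAID5 and 2 for RAID6. Metadata overhead is
    ignored, so real arrays come out slightly smaller.
    """
    if num_drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (num_drives - parity_drives) * drive_tb


# Growing a hypothetical RAID5 array of 4 TB drives one drive at a time.
for n in range(3, 7):
    raid5 = usable_capacity_tb(n, 4.0, parity_drives=1)
    print(f"{n} drives: RAID5 {raid5:.0f} TB usable")
```

Each added drive contributes one full drive of usable space once the array
has been reshaped and the filesystem on top has been resized.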

"Some of us" also prefer to use Btrfs on top of mdadm RAID, to benefit both
from Btrfs' advanced features such as snapshots, compression and checksum
verification (but not corruption resilience in this case), and from mdadm's
mature, well-tested and performant RAID implementations.

--
With respect,
Roman




Re: btrfs stability

2013-01-28 Thread Josef Bacik
On Sat, Jan 26, 2013 at 01:27:11PM -0700, Andrew McNabb wrote:
> Here's an update.  I tried the new kernel, and I seem to be having some
> new (possibly worse) problems.  In my ssh session, I'm seeing many errors
> of this sort:
>
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.223834] BUG: soft lockup - CPU#0 stuck for 23s!
>  [btrfs-endio-wri:2073]
>
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.248754] BUG: soft lockup - CPU#2 stuck for 23s!
>  [btrfs-delalloc-:594]
>
> In the logs, I'm seeing several warnings and bugs, including:
>
> WARNING: at fs/btrfs/extent_map.c:78 free_extent_map+0x79/0x90 [btrfs]()
> WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> BUG: unable to handle kernel NULL pointer dereference at (null)
> BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]
> BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-delalloc-:607]
>
> Kernel logs (across a few reboots) are at:
>
> http://students.cs.byu.edu/~amcnabb/messages2

Hrm well I didn't expect that.  I will look into this and see what I can come up
with.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs stability

2013-01-28 Thread Josef Bacik
On Sat, Jan 26, 2013 at 01:27:11PM -0700, Andrew McNabb wrote:
> Here's an update.  I tried the new kernel, and I seem to be having some
> new (possibly worse) problems.  In my ssh session, I'm seeing many errors
> of this sort:
>
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.223834] BUG: soft lockup - CPU#0 stuck for 23s!
>  [btrfs-endio-wri:2073]
>
> Message from syslogd@guru at Jan 26 13:13:14 ...
>  kernel:[  308.248754] BUG: soft lockup - CPU#2 stuck for 23s!
>  [btrfs-delalloc-:594]
>
> In the logs, I'm seeing several warnings and bugs, including:
>
> WARNING: at fs/btrfs/extent_map.c:78 free_extent_map+0x79/0x90 [btrfs]()
> WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
> BUG: unable to handle kernel NULL pointer dereference at (null)
> BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]
> BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-delalloc-:607]
>
> Kernel logs (across a few reboots) are at:
>
> http://students.cs.byu.edu/~amcnabb/messages2

OK, I think I figured it out.  Can you give this a whirl?  Let me know when you
get tester's fatigue ;)

http://koji.fedoraproject.org/koji/taskinfo?taskID=4908932

Thanks,

Josef


Re: btrfs stability

2013-01-26 Thread Andrew McNabb
Here's an update.  I tried the new kernel, and I seem to be having some
new (possibly worse) problems.  In my ssh session, I'm seeing many errors
of this sort:

Message from syslogd@guru at Jan 26 13:13:14 ...
 kernel:[  308.223834] BUG: soft lockup - CPU#0 stuck for 23s!
 [btrfs-endio-wri:2073]

Message from syslogd@guru at Jan 26 13:13:14 ...
 kernel:[  308.248754] BUG: soft lockup - CPU#2 stuck for 23s!
 [btrfs-delalloc-:594]

In the logs, I'm seeing several warnings and bugs, including:

WARNING: at fs/btrfs/extent_map.c:78 free_extent_map+0x79/0x90 [btrfs]()
WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
BUG: unable to handle kernel NULL pointer dereference at (null)
BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]
BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-delalloc-:607]

Kernel logs (across a few reboots) are at:

http://students.cs.byu.edu/~amcnabb/messages2
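
For anyone who wants to tally reports like these from a saved kernel log, a
minimal sketch follows. It assumes the log is plain text with one message per
line, in the same shape as the lines quoted above; the function names and the
normalization rules (hiding CPU numbers, stall durations and thread PIDs so
that similar reports group together) are illustrative choices, not part of
any existing tool.

```python
import re
from collections import Counter

# Start of a kernel report, e.g. "WARNING: at fs/btrfs/extent_map.c:78 ..."
# or "BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-endio-wri:1489]".
REPORT_RE = re.compile(r"\b(WARNING|BUG): (.*)")


def normalize(text):
    """Replace per-occurrence details so similar reports share one key."""
    text = re.sub(r"CPU#\d+", "CPU#N", text)   # which CPU stalled
    text = re.sub(r"\b\d+s!", "Ns!", text)     # how long it stalled
    text = re.sub(r":\d+\]", "]", text)        # PID after the thread name
    return text.strip()


def summarize(lines):
    """Count WARNING/BUG reports in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = REPORT_RE.search(line)
        if match:
            counts[f"{match.group(1)}: {normalize(match.group(2))}"] += 1
    return counts

# Example use on a saved copy of the log (the file name is hypothetical):
#   with open("messages2", errors="replace") as log:
#       for report, count in summarize(log).most_common():
#           print(count, report)
```

Grouping by normalized text makes it easier to see whether a thousand log
lines come from a handful of distinct problems or from many.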

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868


Re: btrfs stability

2013-01-25 Thread Josef Bacik
On Fri, Jan 25, 2013 at 01:05:14PM -0700, Andrew McNabb wrote:
> I tried creating a multi-device btrfs filesystem for the first time (on
> Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems.  I
> had heard that btrfs is now reasonably stable, and though I expected to
> possibly see a problem here or there, I was a little surprised at just
> how many problems I encountered in such a short period of time.  I now
> have about a thousand error messages in my kernel logs related to
> several different problems.  Is this roughly the expected level of
> stability for btrfs with multiple devices, or am I just particularly
> lucky? :)
>
> Am I correct in assuming that I'll need to switch to md for a few months
> and try btrfs again later, or are there known problems in the specific
> kernel I'm running that I could avoid by trying a different version?
>
> For the sake of being specific, I'll detail a few of the problems I've
> hit:
>
> These two may have been caused by a possibly faulty disk (I'm still
> trying to determine whether it was faulty or whether the bug was purely
> in btrfs):
>
> https://bugzilla.redhat.com/show_bug.cgi?id=903794

This one is just an allocator warning because the relocator doesn't do the right
accounting for relocation.  It's just complaining; we need to fix it, but it won't
keep the filesystem from working.

> https://bugzilla.redhat.com/show_bug.cgi?id=904143

This I'm almost certain (I have to check) was just a result of me making fsync
faster and forgetting to remove this WARN_ON.  It's fixed upstream.  Again,
nothing to worry about, but annoying.

 
> This one was triggered when I tried to remove a possibly faulty disk:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=904197

Ok this is a bug, I can fix this.  Basically we tried to read from the faulty
disk, it failed, we read from the other copy, and then tried to write the good
copy back to the failed disk, and when we saw that the IO wasn't actually going
to go to the bad disk we panicked.  Silly but easy enough to understand/fix.
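
The sequence described above (read fails, fall back to the other copy, write
the good copy back) can be sketched with a toy model. This is not btrfs code
and says nothing about how the actual fix was written; the `Mirror` class,
its `present` flag and `read_with_repair` are hypothetical names, and
skipping the write-back for a missing device is only one plausible way to
avoid treating that case as fatal.

```python
class Mirror:
    """One copy of the data; present=False models a removed or dead disk."""

    def __init__(self, name, blocks, present=True):
        self.name = name
        self.blocks = dict(blocks)
        self.present = present

    def read(self, block_no):
        if not self.present or block_no not in self.blocks:
            raise OSError(f"read of block {block_no} failed on {self.name}")
        return self.blocks[block_no]

    def write(self, block_no, data):
        if not self.present:
            raise OSError(f"write of block {block_no} failed on {self.name}")
        self.blocks[block_no] = data


def read_with_repair(mirrors, block_no):
    """Return the first readable copy and repair mirrors that failed."""
    failed = []
    for mirror in mirrors:
        try:
            data = mirror.read(block_no)
        except OSError:
            failed.append(mirror)
            continue
        # Write the good copy back to mirrors whose read failed, skipping
        # any device that is gone rather than treating that as fatal.
        for bad in failed:
            if bad.present:
                bad.write(block_no, data)
        return data
    raise OSError(f"no readable copy of block {block_no}")
```

In this model a read succeeds as long as one copy is readable, and a missing
device simply does not get repaired.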

> With a freshly created filesystem, I got a kernel bug, associated with a
> hang in most filesystem operations.  This occurred in the middle of
> ordinary operation and without any sort of hardware-related errors in
> the kernel logs.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=904223

So this is from the fsync stuff, and I'm sure I fixed this somewhere but I can't
account for where I did it.  Can you give btrfs-next a try and see if you can
still reproduce it?  Thanks,

Josef


Re: btrfs stability

2013-01-25 Thread Josef Bacik
On Fri, Jan 25, 2013 at 01:05:14PM -0700, Andrew McNabb wrote:
> I tried creating a multi-device btrfs filesystem for the first time (on
> Fedora 18 with 3.7.2-204.fc18.x86_64), and I ran into some problems.  I
> had heard that btrfs is now reasonably stable, and though I expected to
> possibly see a problem here or there, I was a little surprised at just
> how many problems I encountered in such a short period of time.  I now
> have about a thousand error messages in my kernel logs related to
> several different problems.  Is this roughly the expected level of
> stability for btrfs with multiple devices, or am I just particularly
> lucky? :)
>
> Am I correct in assuming that I'll need to switch to md for a few months
> and try btrfs again later, or are there known problems in the specific
> kernel I'm running that I could avoid by trying a different version?
>
> For the sake of being specific, I'll detail a few of the problems I've
> hit:
>
> These two may have been caused by a possibly faulty disk (I'm still
> trying to determine whether it was faulty or whether the bug was purely
> in btrfs):
>
> https://bugzilla.redhat.com/show_bug.cgi?id=903794
> https://bugzilla.redhat.com/show_bug.cgi?id=904143
>
> This one was triggered when I tried to remove a possibly faulty disk:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=904197

Actually for this one, how did you remove the disk?  Did you just yank it out
while the box was running?  Did you mount -o degraded and then delete the device
and then remove it?  How exactly did you get to this situation?  Thanks,

Josef


Re: btrfs stability

2013-01-25 Thread Andrew McNabb
On Fri, Jan 25, 2013 at 03:37:17PM -0500, Josef Bacik wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=903794
>
> This one is just an allocator warning because the relocator doesn't do the
> right accounting for relocation.  It's just complaining, we need to fix it
> but it won't keep it from working.

I won't worry about this one, then.

> > https://bugzilla.redhat.com/show_bug.cgi?id=904143
>
> This I'm almost certain (I have to check) was just a result of me making fsync
> faster and forgetting to remove this WARN_ON.  It's fixed upstream.  Again,
> nothing to worry about, but annoying.

Sounds good.

> > This one was triggered when I tried to remove a possibly faulty disk:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=904197
>
> Ok this is a bug, I can fix this.  Basically we tried to read from the faulty
> disk, it failed, we read from the other copy, and then tried to write the good
> copy back to the failed disk, and when we saw that the IO wasn't actually going
> to go to the bad disk we panicked.  Silly but easy enough to understand/fix.

I was a little surprised that this happened after I had already done a
btrfs dev delete--is there a way to tell btrfs that a disk really is
gone?

> > With a freshly created filesystem, I got a kernel bug, associated with a
> > hang in most filesystem operations.  This occurred in the middle of
> > ordinary operation and without any sort of hardware-related errors in
> > the kernel logs.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=904223
>
> So this is from the fsync stuff, and I'm sure I fixed this somewhere but I
> can't account for where I did it.

Would this also be the cause of the hangs that I'm seeing?  In the end,
a hang with the load rising to 260.10 is the most serious problem.  It's
happened a few times, and it gets temporarily fixed by a reboot, but
then tends to recur fairly soon.

> Can you give btrfs-next a try and see if you can
> still reproduce.  Thanks,

Is there a pre-built RPM for btrfs-next, or what's the best way to try
it out in Fedora without breaking other things?

Thanks for your quick response, and sorry for not responding sooner
(I've been interrupted by a few phone calls).

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868


Re: btrfs stability

2013-01-25 Thread Andrew McNabb
On Fri, Jan 25, 2013 at 03:53:22PM -0500, Josef Bacik wrote:
 
> Actually for this one, how did you remove the disk?  Did you just yank it out
> while the box was running?  Did you mount -o degraded and then delete the
> device and then remove it?  How exactly did you get to this situation.  Thanks,

I've moved my answer over to IRC to reduce the latency in the
conversation.  Thanks again for all the help.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868