I've seen deadlocks on 3.16.3. Personally, I'm staying with 3.14 until something newer stabilizes; I haven't had any issues with it. You might want to try the latest 3.14, though I expect a new point release fairly soon with quite a few btrfs patches.
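For reference (not from the thread itself), a quick way to confirm exactly which kernel and btrfs userspace are in use before and after an upgrade; these are standard commands, and the package-manager steps for the upgrade itself vary by distribution so they are not shown:

  uname -r                # running kernel version
  btrfs --version         # btrfs-progs (userspace tools) version
  dmesg | grep -i btrfs   # btrfs messages logged since boot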
On Tue, Oct 28, 2014 at 7:33 AM, Stephan Alz <stephan...@gmx.com> wrote:
> Hello Folks,
>
> Thanks for the help I have gotten so far. I did what you recommended and upgraded the kernel to 3.16.
>
> After reboot it automatically resumed the balancing operation. For about 2 hours it went well:
>
> Label: 'backup' ...
> Total devices 5 FS bytes used 5.81TiB
> devid 1 size 3.64TiB used 2.77TiB path /dev/sdc
> devid 2 size 3.64TiB used 2.77TiB path /dev/sdb
> devid 3 size 3.64TiB used 2.77TiB path /dev/sda
> devid 4 size 3.64TiB used 2.76TiB path /dev/sdd
> devid 5 size 3.64TiB used 572.00GiB path /dev/sdf   <- interestingly, the used is now lower than it was
>
> After that, all of a sudden I just lost the machine. As I thought, it crashed with a kernel panic, but this wasn't like with 3.13: it killed the whole system. Not even the magic SysRq keys worked.
>
> http://i59.tinypic.com/5we5ib.jpg
>
> Then when I tried to reboot with 3.16, the system always segfaulted at boot time when it tried to mount the btrfs filesystem.
>
> With 3.13 it at least didn't crash the entire system, so I booted back to that and managed to stop the balancing:
>
>>btrfs filesystem balance status /mnt/backup
> Balance on '/mnt/backup' is paused
> 1 out of about 10 chunks balanced (1 considered), 90% left
>
> Now my filesystem is fortunately back to RW again. Backups can continue tonight.
>
> And about the data "not being important enough to be backed up": hell yes it is, so yesterday I did a backup of the backups to a good old XFS filesystem (something which is reliable). The problem is that our whole backup system was designed to use BTRFS. It rsyncs from a lot of servers to the backup server every night and then creates snapshots. Changing this and going back to another filesystem would require a lot of time and effort, possibly rewriting all of our backup scripts.
>
> What else can I do?
> Should I try an even later kernel, such as 3.18?
> Can this happen because it really doesn't have enough space?
>
> The counter now says:
> btrfs 19534313824 12468488824 3753187048 77%
>
> The whole point of adding the new drive is that it was running out of space.
> Could somebody explain how this balancing works in RAID10 mode? What I want to know is: if ANY of the drives fails, do we lose data or not? And does the fact that the balancing is now paused change this? If any of the five drives completely failed right now, would I lose all the data? I definitely don't want to leave the system in an inconsistent state like this. At least the backups are only done at night, so if I can get the backup drive mounted RW by the end of the day, that's enough.
>
> Thanks
>
> At the end I attached some recent 3.13 crash logs (maybe they are of some help).
>
> [Tue Oct 28 12:01:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Oct 28 12:01:35 2014] btrfs D ffff88007fc14280 0 3820 3202 0x00000000
> [Tue Oct 28 12:01:35 2014] ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480
> [Tue Oct 28 12:01:35 2014] 0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800
> [Tue Oct 28 12:01:35 2014] 0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0
> [Tue Oct 28 12:01:35 2014] Call Trace:
> [Tue Oct 28 12:01:35 2014] [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs]
> [Tue Oct 28 12:01:35 2014] [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10
> [Tue Oct 28 12:01:35 2014] [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs]
> [Tue Oct 28 12:01:35 2014] [<ffffffff81199ea1>] ? path_openat+0xd1/0x630
> [Tue Oct 28 12:01:35 2014] [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0
> [Tue Oct 28 12:01:35 2014] [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540
> [Tue Oct 28 12:01:35 2014] [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0
> [Tue Oct 28 12:01:35 2014] [<ffffffff81154a88>] ? do_brk+0x198/0x2f0
> [Tue Oct 28 12:01:35 2014] [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0
> [Tue Oct 28 12:01:35 2014] [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b
> [Tue Oct 28 12:03:35 2014] INFO: task btrfs:3820 blocked for more than 120 seconds.
> [Tue Oct 28 12:03:35 2014] Not tainted 3.13-0.bpo.1-amd64 #1
> [Tue Oct 28 12:03:35 2014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Oct 28 12:03:35 2014] btrfs D ffff88007fc14280 0 3820 3202 0x00000000
> [Tue Oct 28 12:03:35 2014] ffff88003735e800 0000000000000086 0000000000000000 ffffffff81813480
> [Tue Oct 28 12:03:35 2014] 0000000000014280 ffff880048feffd8 0000000000014280 ffff88003735e800
> [Tue Oct 28 12:03:35 2014] 0000000000000246 ffff880036c8a000 ffff880036c8b260 ffff880036c8b2a0
> [Tue Oct 28 12:03:35 2014] Call Trace:
> [Tue Oct 28 12:03:35 2014] [<ffffffffa02c486d>] ? btrfs_pause_balance+0x7d/0xf0 [btrfs]
> [Tue Oct 28 12:03:35 2014] [<ffffffff8109e400>] ? __wake_up_sync+0x10/0x10
> [Tue Oct 28 12:03:35 2014] [<ffffffffa02d1692>] ? btrfs_ioctl+0x1652/0x1f00 [btrfs]
> [Tue Oct 28 12:03:35 2014] [<ffffffff81199ea1>] ? path_openat+0xd1/0x630
> [Tue Oct 28 12:03:35 2014] [<ffffffff811956ac>] ? getname_flags+0xbc/0x1a0
> [Tue Oct 28 12:03:35 2014] [<ffffffff814dad78>] ? __do_page_fault+0x298/0x540
> [Tue Oct 28 12:03:35 2014] [<ffffffff8119c4c1>] ? do_vfs_ioctl+0x81/0x4d0
> [Tue Oct 28 12:03:35 2014] [<ffffffff81154a88>] ? do_brk+0x198/0x2f0
> [Tue Oct 28 12:03:35 2014] [<ffffffff8119c9b0>] ? SyS_ioctl+0xa0/0xc0
> [Tue Oct 28 12:03:35 2014] [<ffffffff814deef9>] ? system_call_fastpath+0x16/0x1b
> [Tue Oct 28 12:03:48 2014] btrfs: found 16561 extents
>
> Sent: Tuesday, October 28, 2014 at 1:07 AM
> From: Duncan <1i5t5.dun...@cox.net>
> To: linux-btrfs@vger.kernel.org
> Subject: Re: BTRFS balance segfault, where to go from here
>
> Chris Murphy posted on Mon, 27 Oct 2014 10:51:16 -0600 as excerpted:
>
> >> On Oct 27, 2014, at 3:26 AM, Stephan Alz <stephan...@gmx.com> wrote:
> >>>
> >>> My question is where to go from here? What I'm going to do right now is copy the most important data to another, separate XFS drive.
> >>> What I'm planning to do is:
> >>>
> >>> 1. Upgrade the kernel
> >>> 2. Upgrade BTRFS
> >>> 3. Continue the balancing.
> >>
> >> Definitely upgrade the kernel and see how that goes; there have been many, many changes since 3.13. I would upgrade the user space tools also, but that's not as important.
>
> Just emphasizing...
>
> Because btrfs is still under heavy development and not yet fully stable, keeping particularly the kernel updated is vital, because running an old kernel often means running a kernel with known btrfs bugs, fixed in newer kernels.
>
> The userspace isn't quite as important, since under normal operation it mostly just tells the kernel what operations to perform, and an older userspace simply means you might be missing newer features. However, commands such as btrfs check (the old btrfsck) and btrfs restore work from userspace, so having a current btrfs-progs is important when you run into trouble and you're trying to fix things.
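As an illustration of that point (these commands are not from the thread; the device node is taken from the listing above and the rescue path is just an example), the recovery tools in btrfs-progs can be pointed at a member device even when the filesystem won't mount:

  btrfs check /dev/sdc                    # read-only consistency check; avoid --repair unless advised
  mkdir -p /mnt/rescue
  btrfs restore -v /dev/sdc /mnt/rescue   # copy files out of an unmountable filesystem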
>
> That said, a couple of recent kernels have known issues. Don't use the 3.15 series at all, and be sure you're on 3.16.3 or newer for the 3.16 series. 3.17 introduced another bug, with the fix hopefully in 3.17.2 (it didn't make 3.17.1) and in the 3.18-rcs.
>
> So 3.16.3 or later for a stable kernel, or the latest 3.18-rc or live-git kernel, is what I'd recommend. The other alternative, if you're really conservative, is the latest long-term stable series kernel, 3.14.x, as it gets critical bugfixes as well, tho it won't be quite as current as 3.16.x or 3.18-rc. But anything older than the latest 3.14.x stable release is old and outdated in btrfs terms, and is thus not recommended. And 3.15, 3.16 before 3.16.3, and 3.17 before 3.17.2 (hopefully) are blackout versions due to known btrfs bugs. Avoid them.
>
> Of course, with btrfs still not fully stable, the usual sysadmin rule of thumb applies more than ever: if you don't have a tested backup you don't have a backup, and if you don't have a backup, by definition you don't care if you lose the data. If you're on not-yet-fully-stable btrfs and you don't have backups, by definition you don't care if you lose that data. There are people having to learn that the hard way, tho btrfs restore can often recover at least some of what would otherwise be lost.
>
> >> FYI, you can mount with the skip_balance mount option to inhibit resuming a balance; sometimes pausing the balance isn't fast enough when there are balance problems.
>
> =:^)
>
> >>> Could someone please also explain how exactly the raid10 setup works with an ODD number of drives with btrfs?
> >>> Raid10 should be a stripe of mirrors. So is this sdf drive mirrored, striped, or what?
> >>
> >> I have no idea, honestly. Btrfs is very tolerant of adding odd numbers and sizes of devices, but things get a bit nutty in actual operation sometimes.
>
> In btrfs, raid1, including the raid1 side of raid10, is defined as exactly two copies of the data, one on each of two different devices. These copies are allocated by chunk: 1 GiB chunks for data, quarter-GiB chunks for metadata, and chunks are normally allocated on the device with the most unallocated space available, provided the other constraints (such as don't put both copies on the same device) are met.
>
> Btrfs raid0 stripes will be as wide as possible, but again are allocated a chunk at a time, in sub-chunk-size strips.
>
> While I've not run btrfs raid10 personally and thus (as a sysadmin, not a dev) can't say for sure, what this implies to me is that, assuming equal-sized devices, an odd number of devices in raid10 will alternate skipping one device at each chunk allocation.
>
> So with a btrfs raid10 on five same-size devices, if I'm not mistaken, btrfs will allocate chunks from four devices at once, two mirrors, two stripes, with the fifth one unused for that chunk allocation. However, at the next chunk allocation, the device skipped in the previous allocation will now have the most free space and will thus get the first allocation, with one of the other four devices skipped in that round. After five allocation rounds (assuming all of them were 1 GiB data chunks, not quarter-GiB metadata), usage should thus be balanced across all five devices.
>
> Of course with six same-size devices, because btrfs raid1 does exactly two copies, no more, each stripe will be three devices wide.
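To make that allocation pattern concrete, here is a worked illustration of the reasoning above (an illustration only, not verified against the allocator code), assuming five equal-size devices and repeated 1 GiB raid10 data-chunk allocations, each using four devices and skipping one; ties in free space are broken arbitrarily, so the exact order may differ:

  round 1: uses sda sdb sdc sdd, skips sdf
  round 2: uses sdf plus three of the others, skips one of sda-sdd (sdf now has the most unallocated space)
  rounds 3-5: the skipped device keeps rotating in the same way

After five such rounds every device has been used in four allocations and skipped once, so usage stays roughly even across all five devices, which matches the behaviour described above.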
>
> As for the data-loss question: unlike, say, raid56 mode, which is known to be effectively little more than expensive raid0 at this point, raid10 should be as reliable as raid1, etc. But I'd refer again to that sysadmin's rule of thumb above. If you don't have tested backups, you don't have backups, and if you don't have backups, the data is by definition not valuable enough to be worth the hassle of backing it up; the calculated risk cost of data loss is lower than the time required to make, test and keep current the backups. After that, it's your decision whether you value that data more than the time required to make and maintain those backups, given the risk factor, including the fact that btrfs is still under heavy development and is not yet fully stable.
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
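Putting the suggestions in this thread together, a sketch of the practical next steps (the mount point and device node are the poster's; the commands are standard btrfs-progs and mount usage, shown here only as examples):

  mount -o skip_balance /dev/sdc /mnt/backup   # Chris Murphy's tip: mount without resuming the paused balance
  btrfs balance status /mnt/backup             # confirm the balance is still paused
  btrfs balance cancel /mnt/backup             # abandon the balance entirely, or ...
  btrfs balance resume /mnt/backup             # ... resume it later, once running a newer kernel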