Re: Poll: time to switch skinny-metadata on by default?

2014-10-25 Thread Marc Joliet
On Mon, 20 Oct 2014 18:34:03 +0200, David Sterba wrote:

> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
> > I'd like to make it default with the 3.17 release of btrfs-progs.
> > Please let me know if you have objections.
> 
> For the record, 3.17 will not change the defaults. The timing of the
> poll was very bad to get enough feedback before the release. Let's keep
> it open for now.

Two points:

First of all: does grub2 support booting from a btrfs file system with
skinny-metadata, or is it irrelevant?

And secondly, I've gotten a BUG after trying to convert my external backup
partition to skinny-metadata (the same one from the bug report mentioned
previously in this thread, I believe). Below is a more detailed account.

To begin with, my setup (as of *now*, not before the BUG):

  # btrfs filesystem show
  Label: none  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
          Total devices 1 FS bytes used 41.42GiB
          devid    1 size 107.79GiB used 53.06GiB path /dev/sdf1

  Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
          Total devices 4 FS bytes used 514.54GiB
          devid    1 size 298.09GiB used 259.03GiB path /dev/sda
          devid    2 size 298.09GiB used 259.03GiB path /dev/sdb
          devid    3 size 298.09GiB used 259.03GiB path /dev/sdc
          devid    4 size 298.09GiB used 259.03GiB path /dev/sdd

  Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
          Total devices 1 FS bytes used 169.31GiB
          devid    1 size 976.56GiB used 175.06GiB path /dev/sdg2

  Btrfs v3.17

  # btrfs filesystem df /
  Data, single: total=48.00GiB, used=39.94GiB
  System, DUP: total=32.00MiB, used=12.00KiB
  Metadata, DUP: total=2.50GiB, used=1.48GiB
  GlobalReserve, single: total=508.00MiB, used=0.00B

  # btrfs filesystem df /home
  Data, RAID10: total=516.00GiB, used=513.38GiB
  System, RAID10: total=64.00MiB, used=96.00KiB
  Metadata, RAID10: total=2.00GiB, used=1.16GiB
  GlobalReserve, single: total=400.00MiB, used=0.00B

  # btrfs filesystem df /media/MARCEC_BACKUP
  Data, single: total=167.00GiB, used=166.53GiB
  System, DUP: total=32.00MiB, used=28.00KiB
  Metadata, DUP: total=4.00GiB, used=2.79GiB
  GlobalReserve, single: total=512.00MiB, used=1.33MiB

  # uname -a
  Linux marcec 3.16.6-gentoo #1 SMP PREEMPT Fri Oct 24 01:06:49 CEST 2014 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux

  # btrfs --version
  Btrfs v3.17

Now, what I was trying to do - motivated by this thread - was convert /home
and /media/MARCEC_BACKUP to skinny-metadata, using "btrfstune -x".  That in
itself worked fine, and MARCEC_BACKUP has since seen filesystem activity
(running rsync, creating and deleting snapshots).  *Then* I started a "btrfs
balance -m" on /home (which completed without errors) and then one
on /media/MARCEC_BACKUP, which is when the BUG happened (see the dmesg output
below).
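
For reference, the commands were roughly the following (btrfstune needs the
filesystem unmounted; device and mount point as in the setup above):

  # enable the skinny-metadata feature flag on the existing (unmounted) filesystem
  btrfstune -x /dev/sdg2

  # after mounting again, rewrite the metadata so existing refs use the new format
  btrfs balance start -m /media/MARCEC_BACKUP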

The result in user space was that "btrfs balance" segfaulted.  "btrfs balance
status" showed the balance still running, so I tried to cancel it, which ended
up hanging (the btrfs program has yet to return to the shell).  I also tried
running "sync" (as root), which hung in the same way.
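
(For reference, the status/cancel commands in question, with the mount point
from above:)

  btrfs balance status /media/MARCEC_BACKUP
  btrfs balance cancel /media/MARCEC_BACKUP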

I can still access files on MARCEC_BACKUP just fine, and the snapshots are
still there ("btrfs subvolume list" succeeds).

Is there anything else I can do, or any other information you might need?

 dmesg output (starting with the start of the balance) 

  [ 4651.448883] BTRFS info (device sdb): relocating block group 1492765376512 flags 66
  [ 4652.259501] BTRFS info (device sdb): found 2 extents
  [ 4652.987753] BTRFS info (device sdb): relocating block group 1491691634688 flags 68
  [ 4688.655390] BTRFS info (device sdb): found 13744 extents
  [ 4689.382109] BTRFS info (device sdb): relocating block group 1485249183744 flags 68
  [ 4753.879520] BTRFS info (device sdb): found 62519 extents
  [ 4791.123268] BTRFS info (device sdg2): relocating block group 2499670966272 flags 36
  [ 4830.811665] BTRFS info (device sdg2): found 1793 extents
  [ 4831.240909] BTRFS info (device sdg2): relocating block group 2499134095360 flags 36
  [ 5407.582370] BTRFS info (device sdg2): found 51182 extents
  [ 5407.959115] BTRFS info (device sdg2): relocating block group 2498597224448 flags 36
  [ 5724.487824] BTRFS info (device sdg2): found 51435 extents
  [ 5725.006401] BTRFS info (device sdg2): relocating block group 2473867608064 flags 34
  [ 5725.817513] BTRFS info (device sdg2): found 7 extents
  [ 5726.328413] BTRFS info (device sdg2): relocating block group 2469002215424 flags 36
  [ 5844.148295] [ cut here ]
  [ 5844.148307] WARNING: CPU: 1 PID: 7270 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x48c/0x4c0()
  [ 5844.148308] Modules linked in: uas usb_storage joydev hid_logitech_dj bridge stp llc ipt_REJECT xt_tcpudp iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_n

Fw: Heavy nocow'd VM image fragmentation

2014-10-25 Thread Duncan


Begin forwarded message (forgot to send to list):

Date: Sat, 25 Oct 2014 03:57:41 -0700
From: Duncan <1i5t5.dun...@cox.net>
To: Marc MERLIN 
Subject: Re: Heavy nocow'd VM image fragmentation


On Fri, 24 Oct 2014 21:48:56 -0700
Marc MERLIN  wrote:

> On Oct 25, 2014 11:28 AM, "Duncan" <1i5t5.dun...@cox.net> wrote:
> 
> > Yes, but the OP said he hadn't snapshotted since creating the file,
> > and MM's a regular that actually wrote much of the wiki
> > documentation on raid56 modes, so he better know about the
> > snapshotting problem too.
> 
> Yes and no. I use btrfs send receive, so I have to use snapshots on
> the subvolume my VM file is on.

That kind of screws things up, since even though you can delete the snapshots
afterward, anything that changed while a snapshot existed still forces a
one-time COW on it.  As long as the send doesn't take "forever" the time
window in question can be reasonably short, but if the VM must remain active
over that period, there's likely to still be /some/ effect.

The only thing you can do about that, I guess, is periodically defrag the
images, but of course without snapshot-aware defrag that breaks any snapshot
sharing, multiplying the space required.  And with send/receive requiring a
reference snapshot for incrementals, that snapshot is always present, so
space-doubling on anything actually defragged is unfortunately a given. =:^(
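
For completeness, the periodic defrag would be something along these lines
(the path is illustrative; as said, without snapshot-aware defrag this
un-shares extents with existing snapshots):

  # recursively defragment the subvolume holding the VM images
  btrfs filesystem defragment -r /path/to/vm-subvol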

> Can you reply to this post to show my reply to others since my
> Android post to the list will get rejected?

Dug out of the trash here too due to the HTML, but OK... 

-- 
Duncan - HTML messages treated as spam
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin


Re: Poll: time to switch skinny-metadata on by default?

2014-10-25 Thread Marc Joliet
On Sat, 25 Oct 2014 14:24:58 +0200, Marc Joliet wrote:

> I can still access files on MARCEC_BACKUP just fine, and the snapshots are
> still there ("btrfs subvolume list" succeeds).

Just an update: that was true for a while, but at one point listing directories
and accessing the file system in general stopped working (all processes that
touched the FS hung/zombified). This necessitated a hard reboot, since "reboot"
and "halt" (so... "shutdown", really) didn't do anything other than spit out the
usual "the system is rebooting" message.

Interestingly enough, the file system was (apparently) fine after that (just as
Petr Janecek wrote), other than an invalid space cache file:

  [   65.477006] BTRFS info (device sdg2): The free space cache file (2466854731776) is invalid. skip it

That is, running my backup routine worked just as before, and I can access
files on the FS just fine.

Oh, and apparently the rebalance continued successfully?!

  [  342.540865] BTRFS info (device sdg2): continuing balance
  [  342.51] BTRFS info (device sdg2): relocating block group 2502355320832 flags 34
  [  342.821608] BTRFS info (device sdg2): found 4 extents
  [  343.056915] BTRFS info (device sdg2): relocating block group 2501818449920 flags 36
  [  437.932405] BTRFS info (device sdg2): found 25086 extents
  [  438.727197] BTRFS info (device sdg2): relocating block group 2501281579008 flags 36
  [  557.319354] BTRFS info (device sdg2): found 83875 extents

  # btrfs balance status /media/MARCEC_BACKUP
  No balance found on '/media/MARCEC_BACKUP'

No segfault anywhere.  All I can say right now is "huh".  I'll try starting a
"balance -m" again tomorrow, though, because the continued balance only
took about 3-4 minutes (maybe it ...).

HTH
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: Poll: time to switch skinny-metadata on by default?

2014-10-25 Thread Chris Murphy

On Oct 25, 2014, at 6:24 AM, Marc Joliet  wrote:
> 
> First of all: does grub2 support booting from a btrfs file system with
> skinny-metadata, or is it irrelevant?

It seems plausible that if older kernels don't understand skinny-metadata, GRUB2 
won't either. So I just tested it with grub2-2.02-0.8.fc21, and it works. I'm 
surprised, actually.

The way I did this was to create a whole new fs with -O skinny-metadata and use 
btrfs send/receive to copy an existing system over (roughly as sketched below). 
The kernel reports at boot time that the volume uses skinny extents.
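
Roughly, with device names and paths as examples only:

  # create a fresh filesystem with the skinny-metadata feature enabled
  mkfs.btrfs -O skinny-metadata /dev/sdXn
  mount /dev/sdXn /mnt/new

  # replicate the existing system from a read-only snapshot
  btrfs subvolume snapshot -r / /root-for-send
  btrfs send /root-for-send | btrfs receive /mnt/new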

Chris Murphy


Re: Poll: time to switch skinny-metadata on by default?

2014-10-25 Thread Chris Murphy

On Oct 25, 2014, at 2:33 PM, Chris Murphy  wrote:

> 
> On Oct 25, 2014, at 6:24 AM, Marc Joliet  wrote:
>> 
>> First of all: does grub2 support booting from a btrfs file system with
>> skinny-metadata, or is it irrelevant?
> 
> It seems plausible that if older kernels don't understand skinny-metadata, GRUB2 
> won't either. So I just tested it with grub2-2.02-0.8.fc21, and it works. I'm 
> surprised, actually.

I don't understand the nature of the incompatibility with older kernels. Can 
they not mount a Btrfs volume with skinny extents even read-only? If so, I'd 
expect GRUB to have a problem too, so I'm going to guess that maybe a 3.9 or 
older kernel can still mount such a volume read-only and that the 
incompatibility only affects writing.
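
One way to check would be to look at the incompat flags in the superblock, 
since skinny-metadata sets an incompat bit - something like this (assuming the 
btrfs-show-super tool from btrfs-progs; the exact field names may vary by 
version):

  # dump the superblock and look for the incompat flags field
  btrfs-show-super /dev/sdg2 | grep -i incompat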

Chris Murphy


Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727

2014-10-25 Thread Rich Freeman
On Mon, Oct 13, 2014 at 11:12 AM, Rich Freeman
 wrote:
> On Thu, Oct 9, 2014 at 10:19 AM, Petr Janecek  wrote:
>>
>>   I have trouble finishing a btrfs balance on a five-disk raid10 fs.
>> I added a disk to a 4x3TB raid10 fs and ran "btrfs balance start
>> /mnt/b3", which segfaulted after a few hours, probably because of the BUG
>> below. "btrfs check" does not find any errors, either before the balance
>> or after the reboot (the fs becomes impossible to umount).
>>
>> [22744.238559] WARNING: CPU: 0 PID: 4211 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]()
>>
>> [22744.532378] kernel BUG at fs/btrfs/extent-tree.c:7727!
>
> I am running into something similar. I just added a 3TB drive to my
> raid1 btrfs and started a balance.  The balance segfaulted, and I find
> this in dmesg:

I got another one of these crashes during a balance today, and this was
on 3.17.1 with the "Btrfs: race free update of commit root for ro
snapshots" patch.  So there is something else in 3.17.1 that causes
this problem.  I did see mention of a fix for an extent error of some
kind on the lists, and I don't have that patch - I believe it is planned
for 3.17.2.

After the crash the filesystem became read-only.

I didn't have any way to easily capture the logs, but I got repeated
crashes when trying to re-mount the filesystem after rebooting.  The
dmesg log showed read errors from one of the devices (bdev /dev/sdb2
errs: wr 0, rd 1361, flush 0, corrupt 0, gen 0).  When I ran btrfs
check on the filesystem with btrfs-progs 3.17, it terminated abruptly
with an error saying it could not find extent items, followed by "root"
and a very large number.

I finally managed to recover by mounting the device with the skip_balance
option - I suspect that it was crashing due to attempts to restart the
failing balance.  Then, after letting the filesystem settle down, I
unmounted it cleanly, rebooted, and everything was back to normal.
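
For anyone hitting the same thing, the recovery looked roughly like this
(device and mount point are just examples):

  # mount without resuming the interrupted balance
  mount -o skip_balance /dev/sdb2 /mnt

  # then cancel the paused balance so it doesn't restart later
  btrfs balance cancel /mnt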

However, I'm still getting "bdev /dev/sdb2 errs: wr 0, rd 1361, flush
0, corrupt 0, gen 0" in my dmesg logs.  I have tried scrubbing the
device with no errors found.

--
Rich


Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727

2014-10-25 Thread Chris Samuel
On Sat, 25 Oct 2014 09:41:27 PM Rich Freeman wrote:

> So, there is something else in 3.17.1 that causes
> this problem.  I did see mention of an extent error of some kind on
> the lists and I don't have that patch - I believe it is planned for
> 3.17.2.

There are currently 13 patches for btrfs queued for 3.17.2:

queue-3.17/btrfs-add-missing-compression-property-remove-in-btrfs_ioctl_setflags.patch
queue-3.17/btrfs-cleanup-error-handling-in-build_backref_tree.patch
queue-3.17/btrfs-don-t-do-async-reclaim-during-log-replay.patch
queue-3.17/btrfs-don-t-go-readonly-on-existing-qgroup-items.patch
queue-3.17/btrfs-fix-a-deadlock-in-btrfs_dev_replace_finishing.patch
queue-3.17/btrfs-fix-and-enhance-merge_extent_mapping-to-insert-best-fitted-extent-map.patch
queue-3.17/btrfs-fix-build_backref_tree-issue-with-multiple-shared-blocks.patch
queue-3.17/btrfs-fix-race-in-wait_sync-ioctl.patch
queue-3.17/btrfs-fix-the-wrong-condition-judgment-about-subset-extent-map.patch
queue-3.17/btrfs-fix-up-bounds-checking-in-lseek.patch
queue-3.17/btrfs-try-not-to-enospc-on-log-replay.patch
queue-3.17/btrfs-wake-up-transaction-thread-from-sync_fs-ioctl.patch
queue-3.17/revert-btrfs-race-free-update-of-commit-root-for-ro-snapshots.patch
You can grab them here:

http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.17
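
If you want to try them before 3.17.2 is out, one way (a sketch; the queue
directory also contains a series file with the intended apply order) is to
apply them on top of a 3.17.1 tree:

  git clone git://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git
  cd linux-3.17.1
  for p in ../stable-queue/queue-3.17/*btrfs*.patch; do patch -p1 < "$p"; done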

Hope this helps!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727

2014-10-25 Thread Duncan
Rich Freeman posted on Sat, 25 Oct 2014 21:41:27 -0400 as excerpted:

> However, i'm still getting "bdev /dev/sdb2 errs: wr 0, rd 1361, flush 0,
> corrupt 0, gen 0" in my dmesg logs.  I have tried scrubbing the device
> with no errors found.

Note that the error counts do /not/ reset at boot.  The counts therefore 
cover everything since either the last mkfs or the last time they were reset 
manually, so if you know you've had errors (as you did here), all you need 
to do is take note of the count and make sure it isn't increasing 
unexpectedly.

Meanwhile, "btrfs device stats" can be used to print the error counts on 
demand, and its -z option resets them after that print - that's the manual 
reset I mentioned above.
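
For example (the mount point is illustrative):

  # print per-device error counters for the mounted filesystem
  btrfs device stats /mnt

  # print them and then reset the counters to zero
  btrfs device stats -z /mnt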

So chances are those read errors are the same ones you had previously.  
As long as the number isn't increasing, you're not registering any 
further errors.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
