Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Chris Murphy
On Tue, Aug 11, 2015 at 2:32 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 If I lose the array, I won't cry.  The backup appears to be complete.
 But it would be convenient to avoid having to restore from scratch,
 and I'm hoping this might help you guys too in some way.  I really
 like btrfs, and I would like to provide you with whatever info might
 contribute something.

Well it seems fine if it mounts rw,degraded. Just do a 'btrfs replace
start...' with a new drive. Or if you're going to try the old drive
that's failing, good luck with that. You might want to at least zero
out all the superblocks. Check the wiki for their locations; wipefs
only removes the signature of the first superblock. I don't know if
that's enough for the purposes of a btrfs replace start (it probably
is, but I haven't tested it).
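
To spell that out, the rough sequence I have in mind (device names,
devid and mount point here are made up, adjust to your setup, and I
haven't tested this on a volume in your state):

# replace the missing device; check 'btrfs fi show' for its real devid
mount -o degraded /dev/sdc /mnt/btrfs
btrfs replace start 1 /dev/sdf /mnt/btrfs
btrfs replace status /mnt/btrfs

# if you reuse the old failing drive instead, zero its superblock
# copies first; they live at 64KiB, 64MiB and 256GiB (4KiB each,
# the last one only present on devices larger than 256GiB)
dd if=/dev/zero of=/dev/sdX bs=4K count=1 seek=16        # 64KiB
dd if=/dev/zero of=/dev/sdX bs=4K count=1 seek=16384     # 64MiB
dd if=/dev/zero of=/dev/sdX bs=4K count=1 seek=67108864  # 256GiB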

But then you need to fix this nodatacow thing by not using it as a
mount option, and setting it as a subvolume or directory option with
chattr +C. That way everything else is checksummed. Then you can use
btrfs check --init-csum-tree to compute checksums for everything that
currently has none due to nodatacow.
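
Concretely, something like this (the vms path is from your fstab; note
that chattr +C on a directory only affects files created after it is set):

# drop nodatacow from the fstab options for the vms subvolume, remount, then:
chattr +C /mnt/vms
lsattr -d /mnt/vms      # should now show the 'C' flag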

-- 
Chris Murphy


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Chris Murphy
On Tue, Aug 11, 2015 at 3:00 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 4:48 PM, Chris Murphy li...@colorremedies.com wrote:


 The compress is ignored, and it looks like nodatasum and nodatacow
 apply to everything. The nodatasum means no raid1 self-healing is
 possible for any data on the entire volume. Metadata checksumming is
 still enabled.

 Ugh.  So I need to change my fstab file.  I swear, some expert on IRC
 told me that this should work fine, which is why I did it.  In fact, I
 think they recommended it on the basis that I wanted to put VM images
 on one of the subvolumes.  This discussion occurred a long time ago,
 well before RAID5 was even partially implemented.

 There is still data redundancy.  Will a scrub at least notice that the
 copies differ?

No, that's what I mean: with nodatasum, no raid1 self-healing is
possible. You have data redundancy, but without checksums btrfs has
no way to know whether the copies differ. It doesn't do two reads and
compare them; just like md raid, it picks one device, and so long as
there's no read error from that device, that copy of the data is
assumed to be good.



-- 
Chris Murphy


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 5:24 PM, Chris Murphy li...@colorremedies.com wrote:

 There is still data redundancy.  Will a scrub at least notice that the
 copies differ?

 No, that's what I mean by nodatasum means no raid1 self-healing is
 possible. You have data redundancy, but without checksums btrfs has
 no way to know if they differ. It doesn't do two reads and compares
 them, it's just like md raid, it picks one device, and so long as
 there's no read error from the device, that copy of the data is
 assumed to be good.

Ok, that makes sense.  I'm guessing it wouldn't be worth it to add a
feature like this because (a) few people use nodatacow or end up in my
situation, and (b) if they did, and the two copies were inconsistent,
what would you do?  I suppose for me, it would be nice to know which
files were affected.


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 4:48 PM, Chris Murphy li...@colorremedies.com wrote:


 The compress is ignored, and it looks like nodatasum and nodatacow
 apply to everything. The nodatasum means no raid1 self-healing is
 possible for any data on the entire volume. Metadata checksumming is
 still enabled.

Ugh.  So I need to change my fstab file.  I swear, some expert on IRC
told me that this should work fine, which is why I did it.  In fact, I
think they recommended it on the basis that I wanted to put VM images
on one of the subvolumes.  This discussion occurred a long time ago,
well before RAID5 was even partially implemented.

There is still data redundancy.  Will a scrub at least notice that the
copies differ?


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Missing dedupe/locking patch in integration-4.2 tree?

2015-08-11 Thread Mark Fasheh
On Tue, Aug 11, 2015 at 09:42:10PM +0200, Holger Hoffstätte wrote:
 I saw this morning that it went into integration-4.3:
 
 https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=integration-4.3id=293a8489f300536dc6d996c35a6ebb89aa03bab2
 
 So probably just an oversight.

Ok thanks for pointing that out Holger!
--Mark

--
Mark Fasheh


btrfs-progs: btrfs balance returns enospc error on a system with 80% free space

2015-08-11 Thread Catalin
I have recently installed an Arch Linux x86_64 system on a 50GB
btrfs partition, and every time I try btrfs balance start it gives me
an enospc error even though less than 20% of the available space is
used.

I have tried the recommended method (from
https://btrfs.wiki.kernel.org/index.php/Balance_Filters): with
-dusage I can go up to -dusage=100 with no problems, but with -musage
it works up to 34 and then at -musage=35 it fails with the enospc
error.
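
(For reference, the commands are of this form:

btrfs balance start -dusage=100 /
btrfs balance start -musage=34 /
btrfs balance start -musage=35 /    # this one always fails with enospc
)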

I tested whether that free space is real by mounting the system
without compression and filling the free space with a file written by
dd from /dev/zero; the maximum file size is exactly the free space
that is reported.
I have tried deleting all my snapshots and deleting things until I
was only using 6 GB out of 50 GB, and got the exact same errors.
I have tried adding more files until I reached 11 GB to see if I get
write errors when I add lots of small files; there were no problems,
but still the same balance error (like I said above, I have also
filled all the free space with one single large file).

Here is more detailed information about my setup and output of several commands:

uname -a
Linux ArchLinux 4.1.4-1-ARCH #1 SMP PREEMPT Mon Aug 3 21:30:37 UTC
2015 x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.1.2

btrfs fi show
Label: 'ArchLinux'  uuid: 6816726f-71ed-4b64-9071-60684a445e71
Total devices 1 FS bytes used 9.86GiB
devid    1 size 50.00GiB used 12.31GiB path /dev/sda2

btrfs fi df /
Data, single: total=10.00GiB, used=9.52GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.12GiB, used=354.31MiB
GlobalReserve, single: total=128.00MiB, used=0.00B

lsblk -o NAME,SIZE,FSTYPE,UUID,PARTLABEL
NAME    SIZE FSTYPE UUID                                 PARTLABEL
sda      50G
├─sda1    2M                                             BIOS boot partition
└─sda2   50G btrfs  6816726f-71ed-4b64-9071-60684a445e71 Linux x86-64 root (/)
sr0    1024M

btrfs filesystem usage /

Overall:
Device size:  50.00GiB
Device allocated:  12.31GiB
Device unallocated:  37.68GiB
Device missing: 0.00B
Used:  10.21GiB
Free (estimated):  38.17GiB  (min: 19.33GiB)
Data ratio:  1.00
Metadata ratio:  2.00
Global reserve: 128.00MiB  (used: 0.00B)

Data,single: Size:10.00GiB, Used:9.52GiB
   /dev/sda2  10.00GiB

Metadata,DUP: Size:1.12GiB, Used:354.16MiB
   /dev/sda2   2.25GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda2  64.00MiB

Unallocated:
   /dev/sda2  37.68GiB

- This is with the most data I have had on it; I have also tested with
only 6 GiB used.
- I suspected that there was a problem with the metadata, so I added
metadata_ratio=20 in fstab for all subvolumes, rebooted, and nothing
changed.

sudo btrfs scrub start -B /
scrub done for 6816726f-71ed-4b64-9071-60684a445e71
scrub started at Tue Aug 11 11:07:36 2015 and finished after 00:01:42
total bytes scrubbed: 10.21GiB with 0 errors

btrfs check output when run from the rescue cd with the partition unmounted:

Checking filesystem on /dev/sda2
UUID: 6816726f-71ed-4b64-9071-60684a445e71
found 10541920267 bytes used err is 0
total csum bytes: 9906264
total tree bytes: 370245632
total fs tree bytes: 337903616
total extent tree bytes: 20758528
btree space waste bytes: 63326339
file data blocks allocated: 10473455616
referenced 14596616192
btrfs-progs v4.1.2

btrfs balance start output with the following options:

-dusage 100:
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=100
Done, had to relocate 2 out of 13 chunks

-dusage 100, second, third, ... run:
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=100
Done, had to relocate 1 out of 13 chunks

-musage 33, first run:
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=33
  SYSTEM (flags 0x2): balancing, usage=33
Done, had to relocate 2 out of 13 chunks

-musage 33, second, third, run:
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=33
  SYSTEM (flags 0x2): balancing, usage=33
Done, had to relocate 1 out of 12 chunks

-musage 35 always gives an error:
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=35
  SYSTEM (flags 0x2): balancing, usage=35
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail

output of dmesg | tail (after repeated trying):
[ 2481.262199] BTRFS info (device sda2): found 1 extents
[ 2487.331921] BTRFS info (device sda2): relocating block group
683432476672 flags 34
[ 2498.583018] BTRFS info (device sda2): relocating block group
683466031104 flags 34
[ 2503.843304] BTRFS info (device sda2): relocating block group
683499585536 flags 34
[ 2511.407124] BTRFS info (device sda2): relocating block group
683533139968 flags 34
[ 

Re: Usage of new added disk not updated while doing a balance

2015-08-11 Thread Austin S Hemmelgarn

On 2015-08-11 07:08, Juan Orti Alcaine wrote:

Hello,

I have added a new disk to my filesystem and I'm doing a balance right
now, but I'm a bit worried that the disk usage does not get updated as
it should. I remember from earlier versions that you could see the
disk usage being balanced across all disks.

These are the commands I've run:
# btrfs device add /dev/sdb2 /mnt/btrfs_raid1
# btrfs fi balance /mnt/btrfs_raid1

I see the unallocated space of sdc2 and sdd2 increasing, but for sdb2
(the new disk), it doesn't change. sdb2 doesn't even appear in the
btrfs usage command for data, metadata and system.

Is this normal? It's very strange that the disk is not showing up in the usage report.

How much slack space was allocated by BTRFS before running the balance 
(ie, how big a difference was there between the allocated and used 
space), and did the balance run to completion?  If you had a lot of 
mostly empty chunks and stopped the balance part way through, then this 
is what I would expect to happen (balance back-fills partial chunks 
before it starts allocating new ones).
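
(If it helps, a quick way to watch both of those while the balance is
still running, using your mount point from above:

btrfs fi usage /mnt/btrfs_raid1
btrfs balance status /mnt/btrfs_raid1
)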


If that is not the case however, then this is very much _not_ normal, 
and is almost certainly a bug, in which case you should make sure any 
important data on the filesystem is backed up before doing anything 
further with it (including unmounting it or rebooting the system).






Usage of new added disk not updated while doing a balance

2015-08-11 Thread Juan Orti Alcaine
Hello,

I have added a new disk to my filesystem and I'm doing a balance right
now, but I'm a bit worried that the disk usage does not get updated as
it should. I remember from earlier versions that you could see the
disk usage being balanced across all disks.

These are the commands I've run:
# btrfs device add /dev/sdb2 /mnt/btrfs_raid1
# btrfs fi balance /mnt/btrfs_raid1

I see the unallocated space of sdc2 and sdd2 increasing, but for sdb2
(the new disk), it doesn't change. sdb2 doesn't even appear in the
btrfs usage command for data, metadata and system.

Is this normal? It's very strange that the disk is not showing up in the usage report.

# btrfs --version
btrfs-progs v4.1

# uname -a
Linux xenon 4.1.3-201.fc22.x86_64 #1 SMP Wed Jul 29 19:50:22 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

# btrfs fi usage /mnt/btrfs_raid1
Overall:
Device size:   5.44TiB
Device allocated:  2.74TiB
Device unallocated:2.70TiB
Device missing:  0.00B
Used:  2.61TiB
Free (estimated):  1.42TiB  (min: 1.42TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 16.00KiB)

Data,RAID1: Size:1.36TiB, Used:1.30TiB
   /dev/sdc2   1.36TiB
   /dev/sdd2   1.36TiB

Metadata,RAID1: Size:10.00GiB, Used:8.11GiB
   /dev/sdc2  10.00GiB
   /dev/sdd2  10.00GiB

System,RAID1: Size:32.00MiB, Used:224.00KiB
   /dev/sdc2  32.00MiB
   /dev/sdd2  32.00MiB

Unallocated:
   /dev/sdb2   1.81TiB
   /dev/sdc2 454.48GiB
   /dev/sdd2 454.48GiB

# btrfs fi df /mnt/btrfs_raid1
Data, RAID1: total=1.36TiB, used=1.30TiB
System, RAID1: total=32.00MiB, used=224.00KiB
Metadata, RAID1: total=10.00GiB, used=8.13GiB
GlobalReserve, single: total=512.00MiB, used=18.83MiB

# btrfs fi show /mnt/btrfs_raid1
Label: 'btrfs_raid1'  uuid: 03eeb44b-de69-4f1f-9261-70bd7a5c6de0
Total devices 3 FS bytes used 1.30TiB
devid    1 size 1.81TiB used 1.37TiB path /dev/sdc2
devid    2 size 1.81TiB used 1.37TiB path /dev/sdd2
devid    3 size 1.81TiB used 0.00B path /dev/sdb2

btrfs-progs v4.1

And the kernel log:
ago 11 11:54:45 xenon kernel: BTRFS info (device sdd2): disk added /dev/sdb2
ago 11 11:56:18 xenon kernel: BTRFS info (device sdd2): relocating
block group 1715902349312 flags 17
ago 11 11:56:36 xenon kernel: BTRFS info (device sdd2): found 12127 extents
ago 11 12:09:52 xenon kernel: BTRFS info (device sdd2): found 12127 extents
ago 11 12:09:56 xenon kernel: BTRFS info (device sdd2): relocating
block group 1714828607488 flags 17
ago 11 12:10:11 xenon kernel: BTRFS info (device sdd2): found 1076 extents
ago 11 12:11:24 xenon kernel: BTRFS info (device sdd2): found 1076 extents
ago 11 12:11:25 xenon kernel: BTRFS info (device sdd2): relocating
block group 1713754865664 flags 17
ago 11 12:11:37 xenon kernel: BTRFS info (device sdd2): found 8 extents
ago 11 12:11:50 xenon kernel: BTRFS info (device sdd2): found 8 extents
ago 11 12:11:50 xenon kernel: BTRFS info (device sdd2): relocating
block group 1712681123840 flags 17
ago 11 12:12:16 xenon kernel: BTRFS info (device sdd2): found 1432 extents
ago 11 12:13:17 xenon kernel: BTRFS info (device sdd2): found 1432 extents
[...]

-- 
Juan Orti
https://miceliux.com


Re: Usage of new added disk not updated while doing a balance

2015-08-11 Thread Juan Orti Alcaine
2015-08-11 15:20 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com:
 How much slack space was allocated by BTRFS before running the balance (ie,
 how big a difference was there between the allocated and used space), and
 did the balance run to completion?  If you had a lot of mostly empty chunks
 and stopped the balance part way through, then this is what I would expect
 to happen (balance back-fills partial chunks before it starts allocating new
 ones).

 If that is not the case however, then this is very much _not_ normal, and is
 almost certainly a bug, in which case you should make sure any important
 data on the filesystem is backed up before doing anything further with it
 (including unmounting it or rebooting the system).


I don't have the usage numbers before running the balance, but I have
around 1000 readonly snapshots, so maybe that's a factor.

It keeps running and using both CPU and IO, so I'll wait to see what happens.

Thank you.

-- 
Juan Orti
https://miceliux.com


Re: kernel BUG at fs/btrfs/extent-tree.c:8113! (4.1.3 kernel)

2015-08-11 Thread Josef Bacik

On 08/11/2015 01:07 AM, Marc MERLIN wrote:

On Sun, Aug 02, 2015 at 08:51:30PM -0700, Marc MERLIN wrote:

On Fri, Jul 24, 2015 at 09:24:46AM -0700, Marc MERLIN wrote:

Screenshot: http://marc.merlins.org/tmp/btrfs_crash.jpg


So it's 32bit system, 3.19.8, crashing during snapshot deletion and
backref walking. EIP is in do_walk_down+0x142. I've tried to match it to
the sources on a local 32bit build, but it does not point to the
expected crash site:


Thanks for looking.
Unfortunately it's a mythtv box; if I put a 64bit kernel on it, other
things go wrong with the 32bit userland / 64bit kernel split.
But I'll put a newer 64bit kernel on it to see what happens and report
back.


I got home, built the last kernel and got netconsole working.
4.1.3/64bit and 32bit crash the same way.


So, it's been several weeks that I can't use this filesystem.
Is anyone interested in fixing the kernel bug before I wipe it?
(as in, even if the FS is corrupted, it should not crash the kernel)




From a48cf7a9ae44a17d927df5542c8b0be287aee9ed Mon Sep 17 00:00:00 2001
From: Josef Bacik jba...@fb.com
Date: Tue, 11 Aug 2015 11:39:37 -0400
Subject: [PATCH] Btrfs: kill BUG_ON() in btrfs_lookup_extent_info()

Replace it with an ASSERT(0) for developers and an error return for
non-developers.

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/extent-tree.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5411f0a..f7fb120 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -818,7 +818,11 @@ search_again:
BUG();
 #endif
}
-   BUG_ON(num_refs == 0);
+   if (num_refs == 0) {
+   ASSERT(0);
+   ret = -EIO;
+   goto out_free;
+   }
} else {
num_refs = 0;
extent_flags = 0;
@@ -859,7 +863,6 @@ search_again:
}
spin_unlock(&delayed_refs->lock);
 out:
-   WARN_ON(num_refs == 0);
if (refs)
*refs = num_refs;
if (flags)
--
2.1.0



Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy li...@colorremedies.com wrote:
 On Mon, Aug 10, 2015 at 7:23 PM, Timothy Normand Miller
 theo...@gmail.com wrote:
 On Mon, Aug 10, 2015 at 6:52 PM, Chris Murphy li...@colorremedies.com 
 wrote:

 - complete dmesg for the failed mount

 It really doesn't say much.  I have things like this:
 [8.643535] BTRFS info (device sdc): disk space caching is enabled
 [8.643789] BTRFS: failed to read the system array on sdc
 [8.706062] BTRFS: open_ctree failed
 [8.707124] BTRFS info (device sdc): disk space caching is enabled
 [8.710924] BTRFS: failed to read the system array on sdc
 [8.766080] BTRFS: open_ctree failed
 [8.766903] BTRFS info (device sdc): setting nodatacow, compression 
 disabled
 [8.766905] BTRFS info (device sdc): disk space caching is enabled
 [8.767152] BTRFS: failed to read the system array on sdc
 [8.936019] BTRFS: open_ctree failed
 [8.936906] BTRFS info (device sdc): disk space caching is enabled
 [8.939922] BTRFS: failed to read the system array on sdc
 [8.995984] BTRFS: open_ctree failed
 [8.996796] BTRFS info (device sdc): disk space caching is enabled
 [8.997093] BTRFS: failed to read the system array on sdc
 [9.125936] BTRFS: open_ctree failed

 It looks like there's not enough redundancy remaining to mount and in
 such a case there's really not much to be done.

 I don't see nodatacow in your fstab, so I don't know why that's
 happening. That means no checksumming for data.

Sorry.  I was dumb.  I only showed you the entry for what I was trying
to mount manually.  I have subvolumes, and this is what is in my
fstab:

UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /home btrfs
compress=lzo,noatime,space_cache,subvol=home 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/btrfs btrfs
compress=lzo,noatime,space_cache 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/vms btrfs
noatime,nodatacow,space_cache,subvol=vms 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/oldfiles btrfs
compress=lzo,noatime,space_cache,subvol=oldfiles 0 2
UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/backup btrfs
compress=lzo,noatime,space_cache,subvol=backup 0 2





 Also, when I manually try to mount, I get things like this:

 # mount /mnt/btrfs
 mount: wrong fs type, bad option, bad superblock on /dev/sdc,
missing codepage or helper program, or other error

 Have you tried to mount with -o degraded?

Ooh!  I can do that!

Mounting ro,degraded, I see this:

[94197.902443] BTRFS info (device sdc): allowing degraded mounts
[94197.902448] BTRFS info (device sdc): disk space caching is enabled
[94198.240621] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
corrupt 0, gen 2

Mounting rw,degraded, I see this:

[94312.091613] BTRFS info (device sdc): allowing degraded mounts
[94312.091618] BTRFS info (device sdc): disk space caching is enabled
[94312.194513] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
corrupt 0, gen 2
[94319.824563] BTRFS: checking UUID tree
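
(Concretely, those mounts were along the lines of the following, with
device and mount point as in the dmesg output and my fstab above:

mount -o ro,degraded /dev/sdc /mnt/btrfs
umount /mnt/btrfs
mount -o rw,degraded /dev/sdc /mnt/btrfs
)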





 Well, if I get something lengthy, I'll attach it to my bug report.
 Did the information I reported help at all?

 The entire dmesg is still useful because it should show libata errors
 if these aren't fully failed drives. So you should file a bug and
 include, literally, the entire unedited dmesg.

Alright, I'll do that.  Thanks!



 --
 Chris Murphy



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 1:56 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy li...@colorremedies.com 
 wrote:

 The entire dmesg is still useful because it should show libata errors
 if these aren't fully failed drives. So you should file a bug and
 include, literally, the entire unedited dmesg.

 Alright, I'll do that.  Thanks!


Here you go:

https://bugzilla.kernel.org/show_bug.cgi?id=102691

-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Scaling to 100k+ snapshots/subvolumes

2015-08-11 Thread Tristan Zajonc
Hi,

In an earlier thread Duncan mentioned that btrfs does not scale well in
the number of subvolumes (including snapshots).  He recommended
keeping the total number under 1000.  I just wanted to understand this
limitation further.  Is this something that has been resolved or will
be resolved in the future, or is it something inherent to the design of
btrfs?

We have an application that could easily generate 100k-1M snapshots
and 10s of thousands of subvolumes.  We use snapshots to track very
fine-grained filesystem histories and subvolumes to enforce quotas
across a large number of distinct projects.

Thanks,
Tristan

Duncan 
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/43910

The question of number of subvolumes normally occurs in the context of
snapshots, since snapshots are a special kind of subvolume.  Ideally,
you'll want to keep the total number of subvolumes (including snapshots)
to under 1000, with the number of snapshots of any single subvolume
limited to 250-ish (say under 300).  However, even just four subvolumes
being snapshotted to this level will reach the thousand, and 2000-3000
total isn't /too/ bad as long as it's no more than 250-300 snapshots per
subvolume.  But DEFINITELY try to keep it under 3000, and preferably
under 2000, as the scaling really does start to go badly as the number of
subvolumes increases beyond that.  If you're dealing with 10k subvolumes/
snapshots, that's too many and you ARE likely to find yourself with
problems.

(With something like snapper, configuring it for say half-hour or hourly
snapshots at the shortest time, with twice-daily or daily being more
reasonable in many circumstances, and then thinning it down to say daily
after a few days and weekly after four weeks, goes quite a long way
toward reducing the number of snapshots per subvolume.  Keeping it near
250-ish per subvolume is WELL within reason, and considering that a month
or a year out, you're not likely to /care/ whether it's this hour or
that, just pick a day or a week and if it's not what you want, go back or
forward a day or a week, is actually likely to be more practical than
having hundreds of half-hourly snapshots a year old to choose from.  And
250-ish snapshots per subvolume really does turn out to be VERY
reasonable, provided you're doing reasonable thinning.)


Re: Missing dedupe/locking patch in integration-4.2 tree?

2015-08-11 Thread Mark Fasheh
On Fri, Aug 07, 2015 at 10:11:46AM +0200, Holger Hoffstätte wrote:
 
 Mark's patch titled
 
   [PATCH 3/5] btrfs: fix clone / extent-same deadlocks [1]
 
 from his btrfs: dedupe fixes, features series is missing from the 
 integration-4.2 tree and 4.2-rc5, where it still applies cleanly (as of 5 
 mins ago).
 
 Any particular reason why this was silently dropped?

Same question here. I noticed this shortly after everything went upstream
and was planning a resend, but it would be great if you could tell us whether
something was wrong with the patch or if it just got lost in the shuffle (no
big deal).
--Mark

--
Mark Fasheh


Re: Missing dedupe/locking patch in integration-4.2 tree?

2015-08-11 Thread Holger Hoffstätte
On 08/11/15 20:58, Mark Fasheh wrote:
 On Fri, Aug 07, 2015 at 10:11:46AM +0200, Holger Hoffstätte wrote:

 Mark's patch titled

   [PATCH 3/5] btrfs: fix clone / extent-same deadlocks [1]

 from his btrfs: dedupe fixes, features series is missing from the 
 integration-4.2 tree and 4.2-rc5, where it still applies cleanly (as of 5 
 mins ago).

 Any particular reason why this was silently dropped?
 
 Same question here, I noticed this shortly after everything went upstream
 and was planning a resend but it would be great if you could tell us whether
 something was wrong with the patch or if it just got lost in the shuffle (no
 big deal).
   --Mark

I saw this morning that it went into integration-4.3:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=integration-4.3id=293a8489f300536dc6d996c35a6ebb89aa03bab2

So probably just an oversight.

-h



Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Timothy Normand Miller
On Tue, Aug 11, 2015 at 3:57 PM, Chris Murphy li...@colorremedies.com wrote:
 On Tue, Aug 11, 2015 at 12:04 PM, Timothy Normand Miller
 theo...@gmail.com wrote:

 https://bugzilla.kernel.org/show_bug.cgi?id=102691

 [7.729124] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
 devid 2 transid 226237 /dev/sdd
 [7.746115] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
 devid 4 transid 226237 /dev/sdb
 [7.826493] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
 devid 3 transid 226237 /dev/sdc

 What do you get for 'btrfs fi show'

# btrfs fi show
Label: none  uuid: 49ac9ad2-b529-4e6e-aef9-1c5b9e8a72f8
Total devices 1 FS bytes used 28.33GiB
devid    1 size 79.69GiB used 41.03GiB path /dev/sda3

Label: none  uuid: ecdff84d-b4a2-4286-a1c1-cd7e5396901c
Total devices 4 FS bytes used 1.46TiB
devid    2 size 931.51GiB used 767.00GiB path /dev/sdd
devid    3 size 931.51GiB used 760.03GiB path /dev/sdc
devid    4 size 931.51GiB used 767.00GiB path /dev/sdb
*** Some devices missing

Label: none  uuid: f9331766-e50a-43d5-98dc-fabf5c68321d
Total devices 1 FS bytes used 2.99TiB
devid    1 size 3.64TiB used 3.01TiB path /dev/sde1

btrfs-progs v4.1.2


 I see devid 2, 3, 4 only for this volume UUID. So you definitely
 appear to have a failed device and that's why it doesn't mount
 automatically at boot time. You just need to use -o degraded, and that
 should work assuming no problems with the other three devices. If it
 does work, 'btrfs replace start...' is the ideal way to replace the
 failed drive.

It's missing because I physically disconnected it.  Someone on IRC
suggested I try this in case the drive with the bad sector was
interfering.  Of course, now that I've done this and mounted
read/write, we can't reintegrate the failing drive.

If I lose the array, I won't cry.  The backup appears to be complete.
But it would be convenient to avoid having to restore from scratch,
and I'm hoping this might help you guys too in some way.  I really
like btrfs, and I would like to provide you with whatever info might
contribute something.


 Maybe someone else can say whether nodatacow as a subvolume mount
 option will apply this to the entire volume.

At the moment, I'm only trying to mount the whole volume, just so I
could recover and scrub it, although as I mentioned in my earlier
email, the scrub aborts with no report of why and with 0 errors.



-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Chris Murphy
On Tue, Aug 11, 2015 at 12:04 PM, Timothy Normand Miller
theo...@gmail.com wrote:

 https://bugzilla.kernel.org/show_bug.cgi?id=102691

[7.729124] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
devid 2 transid 226237 /dev/sdd
[7.746115] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
devid 4 transid 226237 /dev/sdb
[7.826493] BTRFS: device fsid ecdff84d-b4a2-4286-a1c1-cd7e5396901c
devid 3 transid 226237 /dev/sdc

What do you get for 'btrfs fi show'

I see devid 2, 3, 4 only for this volume UUID. So you definitely
appear to have a failed device and that's why it doesn't mount
automatically at boot time. You just need to use -o degraded, and that
should work assuming no problems with the other three devices. If it
does work, 'btrfs replace start...' is the ideal way to replace the
failed drive.

Maybe someone else can say whether nodatacow as a subvolume mount
option will apply this to the entire volume.

-- 
Chris Murphy


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Chris Murphy
On Tue, Aug 11, 2015 at 11:56 AM, Timothy Normand Miller
theo...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 12:21 AM, Chris Murphy li...@colorremedies.com 
 wrote:

 I don't see nodatacow in your fstab, so I don't know why that's
 happening. That means no checksumming for data.

 Sorry.  I was dumb.  I only showed you the entry for what I was trying
 to mount manually.  I have subvolumes, and this is what is in my
 fstab:

 UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /home btrfs
 compress=lzo,noatime,space_cache,subvol=home 0 2
 UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/btrfs btrfs
 compress=lzo,noatime,space_cache 0 2
 UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/vms btrfs
 noatime,nodatacow,space_cache,subvol=vms 0 2
 UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/oldfiles btrfs
 compress=lzo,noatime,space_cache,subvol=oldfiles 0 2
 UUID=ecdff84d-b4a2-4286-a1c1-cd7e5396901c /mnt/backup btrfs
 compress=lzo,noatime,space_cache,subvol=backup 0 2

Huh. I thought nodatacow applies to an entire volume only, not per
subvolume unless you use chattr +C (in which case it can be per
subvolume, directory or per file). I could be confused, but I think
you have mutually exclusive mount options.


 Have you tried to mount with -o degraded?

 Ooh!  I can do that!

 Mounting ro,degraded, I see this:

 [94197.902443] BTRFS info (device sdc): allowing degraded mounts
 [94197.902448] BTRFS info (device sdc): disk space caching is enabled
 [94198.240621] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
 corrupt 0, gen 2

 Mounting rw,degraded, I see this:

 [94312.091613] BTRFS info (device sdc): allowing degraded mounts
 [94312.091618] BTRFS info (device sdc): disk space caching is enabled
 [94312.194513] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
 corrupt 0, gen 2
 [94319.824563] BTRFS: checking UUID tree

I don't see any mount failure message. It worked then?


-- 
Chris Murphy


Re: Scaling to 100k+ snapshots/subvolumes

2015-08-11 Thread Michael Darling
If someone can answer Tristan's question, can they also say whether
large volumes of frequently created and destroyed snapshots/subvolumes
will cause issues?  Or, if they're deleted quickly after being made,
is it just the number that exists at any given time that matters?
(I'm thinking of building source in chroot subvolumes, each with its
own OS install as in Arch's devtools scripts, used to create *separate*
subvolumes under each source build rather than one global shared one.)

On Tue, Aug 11, 2015 at 6:33 PM, Tristan Zajonc tris...@sense.io wrote:

 Hi,

 In an early thread Duncan mentioned that btrfs does not scale well in
 the number of subvolumes (including snapshots).  He recommended
 keeping the total number under 1000.  I just wanted to understand this
 limitation further.  Is this something that has been resolved or will
 be resolved in the future or is it something inherent to the design of
 btrfs?

 We have an application that could easily generate 100k-1M snapshots
 and 10s of thousands of subvolumes.  We use snapshots to track very
 fine-grained filesystem histories and subvolumes to enforce quotas
 across a large number of distinct projects.

 Thanks,
 Tristan

 Duncan 
 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/43910

 The question of number of subvolumes normally occurs in the context of
 snapshots, since snapshots are a special kind of subvolume.  Ideally,
 you'll want to keep the total number of subvolumes (including snapshots)
 to under 1000, with the number of snapshots of any single subvolume
 limited to 250-ish (say under 300).  However, even just four subvolumes
 being snapshotted to this level will reach the thousand, and 2000-3000
 total isn't /too/ bad as long as it's no more than 250-300 snapshots per
 subvolume.  But DEFINITELY try to keep it under 3000, and preferably
 under 2000, as the scaling really does start to go badly as the number of
 subvolumes increases beyond that.  If you're dealing with 10k subvolumes/
 snapshots, that's too many and you ARE likely to find yourself with
 problems.

 (With something like snapper, configuring it for say half-hour or hourly
 snapshots at the shortest time, with twice-daily or daily being more
 reasonable in many circumstances, and then thinning it down to say daily
 after a few days and weekly after four weeks, goes quite a long way
 toward reducing the number of snapshots per subvolume.  Keeping it near
 250-ish per subvolume is WELL within reason, and considering that a month
 or a year out, you're not likely to /care/ whether it's this hour or
 that, just pick a day or a week and if it's not what you want, go back or
 forward a day or a week, is actually likely to be more practical than
 having hundreds of half-hourly snapshots a year old to choose from.  And
 250-ish snapshots per subvolume really does turn out to be VERY
 reasonable, provided you're doing reasonable thinning.)


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Chris Murphy
On Tue, Aug 11, 2015 at 2:26 PM, Timothy Normand Miller
theo...@gmail.com wrote:
 On Tue, Aug 11, 2015 at 3:47 PM, Chris Murphy li...@colorremedies.com wrote:


 Huh. I thought nodatacow applies to an entire volume only, not per
 subvolume unless you use chattr +C (in which case it can be per
 subvolume, directory or per file). I could be confused, but I think
 you have mutually exclusive mount options.

 Well, at the time I set up this system, I asked on IRC, and people
 said it should work.  I've never seen any errors from this.

Error implies a mistake with some sort of reference. All Btrfs does is
inform you of conflicting options:
[8.766903] BTRFS info (device sdc): setting nodatacow, compression disabled

Unfortunately that message should reference a volume label or UUID
rather than just one device, in my opinion.

When I test a manual mount of one subvolume first with -o compress,
followed by a mount of another subvolume with -o nodatacow, this is
the result from mount:

/dev/sdb on /var/mnt/root type btrfs
(rw,relatime,seclabel,compress=zlib,space_cache)
/dev/sdb on /var/mnt/home type btrfs
(rw,relatime,seclabel,compress=zlib,space_cache)

When I do -o nodatacow first, followed by -o compress:

/dev/sdb on /var/mnt/root type btrfs
(rw,relatime,seclabel,nodatasum,nodatacow,space_cache)
/dev/sdb on /var/mnt/home type btrfs
(rw,relatime,seclabel,nodatasum,nodatacow,space_cache)
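
(For reference, the test mounts behind that output were along these
lines, with the subvolume names assumed from the mount points:

mount -o subvol=root,compress /dev/sdb /var/mnt/root
mount -o subvol=home,nodatacow /dev/sdb /var/mnt/home

and then the same two mounts with nodatacow done first and compress
second for the other case.)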

The compress is ignored, and it looks like nodatasum and nodatacow
apply to everything. The nodatasum means no raid1 self-healing is
possible for any data on the entire volume. Metadata checksumming is
still enabled.


 [94312.091613] BTRFS info (device sdc): allowing degraded mounts
 [94312.091618] BTRFS info (device sdc): disk space caching is enabled
 [94312.194513] BTRFS: bdev (null) errs: wr 1724, rd 305, flush 45,
 corrupt 0, gen 2
 [94319.824563] BTRFS: checking UUID tree

 I don't see any mount failure message. It worked then?

 Yes and no.  It's mounted, but a scrub aborts silently:

 # btrfs scrub status /mnt/btrfs/
 scrub status for ecdff84d-b4a2-4286-a1c1-cd7e5396901c
 scrub started at Tue Aug 11 13:56:36 2015 and was aborted after 01:31:55
 total bytes scrubbed: 2.19TiB with 0 errors

 No new messages appeared in dmesg, so I can't tell why it aborted.
 It's also odd that it reports zero errors, given that it aborted.

Well, I wouldn't expect a scrub to completely work in this situation,
even though it probably should fail more gracefully than this:
a.) you don't have a complete array, and I don't know what a scrub of
a degraded volume even means
b.) the data has no checksums, so the only thing that can really be
scrubbed is metadata

So I'd say there are three UI/UX bugs here:
a.) The info message about nodatacow overriding compression should
refer to the label and/or UUID, not a device.
b.) When degraded, a scrub should give more meaningful information on
the scope of what can and can't be done, or at the least say scrub
isn't possible on degraded volumes.
c.) With nodatacow, scrub should scrub metadata and inform the user
that metadata was scrubbed but data can't be scrubbed due to nodatacow.




-- 
Chris Murphy


Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Duncan
Timothy Normand Miller posted on Tue, 11 Aug 2015 17:32:12 -0400 as
excerpted:

 On Tue, Aug 11, 2015 at 5:24 PM, Chris Murphy li...@colorremedies.com
 wrote:
 
 There is still data redundancy.  Will a scrub at least notice that the
 copies differ?

 No, that's what I mean by nodatasum means no raid1 self-healing is
 possible. You have data redundancy, but without checksums btrfs has no
 way to know if they differ. It doesn't do two reads and compares them,
 it's just like md raid, it picks one device, and so long as there's no
 read error from the device, that copy of the data is assumed to be
 good.
 
 Ok, that makes sense.  I'm guessing it wouldn't be worth it to add a
 feature like this because (a) few people use nodatacow or end up in my
 situation, and (b) if they did, and the two copies were inconsistent,
 what would you do?  I suppose for me, it would be nice to know which
 files were affected.

FWIW, nodatacow and nodatasum are intended to /eventually/ be per-
subvolume mount options.  The infrastructure is there to make it so.  
It's just that the code to actually handle those mount options separately 
per subvolume doesn't exist yet, so they apply globally.

Similarly, the intention is to eventually allow per-subvolume and 
possibly even per-file raid-level specifications, while currently, the 
whole filesystem must be set to the same raid level (except that data and 
metadata raid levels are set separately).  It is currently possible to 
have multiple raid levels, but only because a raid-level conversion was 
started (either due to a balance-convert, or due to adding a second 
device changing the metadata default to raid1 from dup, for instance) and 
never finished.

So it's not so much a question of not worth it to add the no-checksum 
data redundancy scrub feature, it's that nodatacow and nodatasum are 
really intended to be exceptions where the admin has specifically 
disabled the checksumming, and are not intended to ever apply to a full 
filesystem, only, at most, to a particular subvolume.  The fact that if 
the mount option is used today it applies to the full filesystem is 
simply a temporary situational accident of not having the per-subvolume 
mount-option code implemented yet.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: btrfs-progs: btrfs balance returns enospc error on a system with 80% free space

2015-08-11 Thread Duncan
Catalin posted on Tue, 11 Aug 2015 12:18:28 +0300 as excerpted:

 I have a recently installed an Arch Linux x86_64 system on a 50GB btrfs
 partition and every time I try btrfs balance start it gives me an enospc
 error even though I have less than 20% of the available space full.
 
 I have tried the recommended method (from
 https://btrfs.wiki.kernel.org/index.php/Balance_Filters) and with
 -dusage I can go up to -dusage=100 with no problems but with -musage it
 works until 34 and then at musage=35 it fails with the enospc error.
 
 Here is more detailed information about my setup and output of several
 commands:
 
 uname -[r] 4.1.4-1-ARCH
 
 btrfs --version btrfs-progs v4.1.2

Thanks.  That's about the first thing we ask for, and you're current on 
both kernel and userspace. =:^)

 btrfs fi show
 Label: 'ArchLinux'  uuid: 6816726f-71ed-4b64-9071-60684a445e71
 Total devices 1 FS bytes used 9.86GiB
 devid1 size 50.00GiB used 12.31GiB path /dev/sda2
 
 btrfs fi df /
 Data, single: total=10.00GiB, used=9.52GiB
 System, DUP: total=32.00MiB, used=16.00KiB
 Metadata, DUP: total=1.12GiB, used=354.31MiB
 GlobalReserve, single: total=128.00MiB, used=0.00B

Second thing we ask for. =:^)  (FWIW, usage is a newer command that 
basically combines the info of both of these, printing it in an often 
more understandable format.  But regulars are used to dealing with these 
older ones, so I omitted your usage output.)

50 GiB single-device filesystem, only 12.31 GiB allocated, default single 
data, dup metadata.  All healthy here. =:^)

[btrfs check and scrub returned no errors]

 btrfs balance start output with the following options:
 
 -dusage 100:
 Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=100
 Done, had to relocate 2 out of 13 chunks
 
 -dusage 100, second, third, ... run:
 Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=100
 Done, had to relocate 1 out of 13 chunks

It's unlikely to help, but when you're doing 100% anyway, you can simply 
use -d, IOW, tell balance data-only, but no filters.  Again, -d should 
work, but shouldn't help.  Of course you can do the same with metadata, 
but that's unlikely to work, since we already know a metadata balance 
dies with a chunk that's between 33 and 35 percent full, and as soon as 
it hits it...
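
(In other words, something like:

btrfs balance start -d /
btrfs balance start -m /

with the -m run being the one I'd expect to die once it hits that chunk.)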

 -musage 33, first run:
 Dumping filters: flags 0x6, state 0x0, force is off
   METADATA (flags 0x2): balancing, usage=33 SYSTEM (flags 0x2):
   balancing, usage=33
 Done, had to relocate 2 out of 13 chunks
 
 -musage 33, second, third, run:
 Dumping filters: flags 0x6, state 0x0, force is off
   METADATA (flags 0x2): balancing, usage=33 SYSTEM (flags 0x2):
   balancing, usage=33
 Done, had to relocate 1 out of 12 chunks
 
 -musage 35 always gives an error:
 Dumping filters: flags 0x6, state 0x0, force is off
   METADATA (flags 0x2): balancing, usage=35 SYSTEM (flags 0x2):
   balancing, usage=35
 ERROR: error during balancing '/' - No space left on device There may be
 more info in syslog - try dmesg | tail




 output of dmesg | tail (after repeated trying):

[Nothing much, reallocating blocks, ENOSPC error.]

 cat /etc/fstab
 # /dev/sda2 LABEL=ArchLinux
 UUID=6816726f-71ed-4b64-9071-60684a445e71  /  btrfs
 rw,noatime,compress-force=lzo,space_cache,autodefrag 0 0

[subvolume mounts of the same btrfs omitted, subvolume/snapshot list 
omitted.]

 (like I said I have also tried with all the snapshots deleted)
 
 I have tried running the command both from inside the system and mounted
 from a rescue cd, with different combinations of mount options:
 enabling and disabling space_cache / nospace_cache, clear_cache,
 enospc_debug, and enabling and disabling compression or autodefrag.
 I have tried defragmenting everything, filling all the space, adding
 files, deleting files, making snapshots, deleting snapshots; still the
 same problem.
 I have run the balance command on both the root subvolume and on
 subvolid=0.
 I have tried putting the balance commands that work inside a for loop
 to run 1000 times, hoping that maybe the one relocated chunk it reports
 might actually solve something over time, but it doesn't (I am new to
 btrfs and not 100% sure about how balance works).
 
 Everything else works fine, the system is very fast, good compression,
 no other errors and I have no other problems but the fact that I have
 this error means something is wrong and I don't know what is the problem
 and how to solve it.

You really have both included all sorts of info, and tried all sorts of 
stuff.  Top marks on that!  But unfortunately it's not helping with the 
problem...

One question.  You said you _recently_ installed.  Just how recently, or 
more directly, what version of btrfs-progs did you use for the 
mkfs.btrfs?  Or was it perhaps a conversion from ext*?

I ask, because...  the mkfs.btrfs from btrfs-progs v4.1.1 had a critical 
bug, with v4.1.2 released along with a message 

Re: Scaling to 100k+ snapshots/subvolumes

2015-08-11 Thread Duncan
Tristan Zajonc posted on Tue, 11 Aug 2015 11:33:45 -0700 as excerpted:

 In an early thread Duncan mentioned that btrfs does not scale well in
 the number of subvolumes (including snapshots).  He recommended keeping
 the total number under 1000.  I just wanted to understand this
 limitation further.  Is this something that has been resolved or will be
 resolved in the future or is it something inherent to the design of
 btrfs?

It is not resolved yet, but it's definitely on the radar.  I don't 
personally understand the details well enough to know if the problem is 
inherent to btrfs, or if some optimized rewrite down the road is likely 
to at least yield linear scaling.

On the practical side, one related thing I do know is that this is the 
reason snapshot-aware-defrag was disabled a few kernel cycles after being 
introduced -- it simply didn't scale, and the thought was, better a 
defrag that at least worked for the snapshot you pointed it at, even at 
the cost of increasing usage due to COW if other snapshots pointed at the 
same file extents, than a defrag that basically didn't work at all.

But the intent remains to at least get scaling working well enough to 
have snapshot-aware-defrag again.  So when snapshot-aware-defrag is 
enabled again, that's your clue that things should be scaling at least 
/reasonably/ well, and it's time to reexamine the situation.  Until then, 
I'd not recommend trying it.

 We have an application that could easily generate 100k-1M snapshots and
 10s of thousands of subvolumes.  We use snapshots to track very
 fine-grained filesystem histories and subvolumes to enforce quotas
 across a large number of distinct projects.

Btrfs quotas... have been another sticky wicket on btrfs, both because 
earlier the code was simply broken (tho AFAIK that's fixed in general, now), 
and because, due to the way it works, quota tracking multiplies the scaling 
issues several fold (certainly in the original code form).  AFAIK they've 
actually done at least two partial rewrites, so are on the third quota 
code version now.  The third-try quota code is fresh enough I don't think 
people know yet how well it's going to perform in deployment.

As a result of that quota code history, my recommendation has been that 
unless you're deliberately testing it, if you don't need quotas, keep it 
turned off on btrfs and avoid the issues it has been known, at least 
historically, to trigger.  As btrfs quota code is demonstrably not yet 
stable and reliable enough to use, if you *do* actually depend on quotas, 
you should definitely be on some other filesystem where the quota code is 
well tested and known to be dependable, as that simply doesn't describe 
btrfs quota code at this point.

But there's actually some pretty big effort going into the quota code at 
the moment, hence the fact that we're on the third version now, and 
they're definitely planning on it actually working, or they'd not be 
sinking the effort into it that they are.

And as I said, the quota code was multiplying the scaling issues several 
fold, so getting quotas actually working well is a big part of getting 
the scaling issues fixed as well.

But beyond that; in particular, whether it's ever likely to work at the 
scales you mention above, is something you'd have to ask the devs, as I'm 
just a list regular and btrfs-using admin, with a use-case that doesn't 
directly involve either quotas or subvolumes/snapshotting to any great 
degree.  So while I can point to the current situation and the current 
trend and work areas, I have effectively no idea if scaling to the 
numbers you mention above is even technically possible, or not.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Damaged filesystem, can read, can't repair, error says to contact devs

2015-08-11 Thread Duncan
Russell Coker posted on Wed, 12 Aug 2015 13:04:27 +1000 as excerpted:

 Linux Software RAID scrub will copy the data from one disk to the other
 to make them identical; the theory is that it's best to at least be
 consistent if you can't be sure you are right.
 
 Will a BTRFS scrub do this on a non-CoW file?

While I honestly don't know, a reasonably educated guess is no: btrfs 
scrub assumes COW and checksumming, and doesn't do anything if there are 
no checksums to verify against.  (I know scrub skips verification of items 
without checksums in the normally checksummed case, so it's reasonable to 
assume it would skip all data, verifying only metadata, if no data has 
checksums.)

In a case like this where nodatacow has disabled checksumming, the best 
thing one can do is manually check at least some samples, and if there's 
no visible corruption, assume the existing data is correct and (after a 
scrub to verify at least metadata) do a btrfs check --init-csum-tree, to 
initialize the checksums to at least cover the existing situation, 
whatever it may be.  Scrub should be able to work after that, since it 
has csums to work with.
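
(Roughly, with device and mount point as placeholders:

btrfs scrub start -B /mnt               # verifies at least the metadata
umount /mnt
btrfs check --init-csum-tree /dev/sdX   # must be run on the unmounted fs
mount /dev/sdX /mnt
btrfs scrub start -B /mnt               # now there are data csums to verify
)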

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
