Re: Will big metadata blocks fix # of hardlinks?

2012-05-29 Thread Hugo Mills
On Tue, May 29, 2012 at 02:09:03PM +0100, Martin wrote:
 Thanks for noting this one. That is one very surprising and unexpected
 limit!... And a killer for some not completely rare applications...

   There have been substantially-complete patches posted to this list
which fix the problem (see extended inode refs patches by Mark
Fasheh in the archives). I don't think they're quite ready for
inclusion yet, but work is ongoing to fix the issue.

 On 26/05/12 19:22, Sami Liedes wrote:
  Hi!
  
  I see that Linux 3.4 supports bigger metadata blocks for btrfs.
  
  Will using them allow a bigger number of hardlinks on a single file
  (i.e. the bug that has bitten at least git users on Debian[1,2], and
  BackupPC[3])? As far as I understand correctly, the problem has been
  that the hard links are stored in the same metadata block with some
  other metadata, so the size of the block is an inherent limitation?
  
  If so, I think it would be worth for me to try Btrfs again :)
  
  Sami
  
  
  [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603
  [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603
  [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762
 
 One example failure case hits the limit at just 13 hard links. Even at four
 times that (16k blocks), that only gives 52 links for the same case.
 
 
 The brief summary for those is:
 
 * It's a rare corner case that needs a format change to fix, so won't-fix;

   Definitely not won't-fix (see above).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great oxymorons of the world, no. 7: The Simple Truth ---  




Re: Help with data recovering

2012-06-04 Thread Hugo Mills
 doesn't match,
 have=9066, want=9096
 Well block 3674785320960 seems great, but generation doesn't match,
 have=9067, want=9096
 Well block 3674788827136 seems great, but generation doesn't match,
 have=9069, want=9096
 Well block 3674792534016 seems great, but generation doesn't match,
 have=9068, want=9096
 Well block 3674808315904 seems great, but generation doesn't match,
 have=9071, want=9096
 Well block 3728604938240 seems great, but generation doesn't match,
 have=5297, want=9096
 Well block 3728635133952 seems great, but generation doesn't match,
 have=7598, want=9096
 Well block 3728682438656 seems great, but generation doesn't match,
 have=7599, want=9096
 Well block 3728770461696 seems great, but generation doesn't match,
 have=9074, want=9096
 Well block 3728819929088 seems great, but generation doesn't match,
 have=9073, want=9096
 Well block 3820340637696 seems great, but generation doesn't match,
 have=9075, want=9096
 Well block 3960145862656 seems great, but generation doesn't match,
 have=9076, want=9096
 Well block 4046161489920 seems great, but generation doesn't match,
 have=9077, want=9096
 Well block 4046213595136 seems great, but generation doesn't match,
 have=9079, want=9096
 Well block 4046217637888 seems great, but generation doesn't match,
 have=9081, want=9096
 Well block 4046217846784 seems great, but generation doesn't match,
 have=9080, want=9096
 Well block 4046252736512 seems great, but generation doesn't match,
 have=9083, want=9096
 Well block 4046301515776 seems great, but generation doesn't match,
 have=9085, want=9096
 Well block 4046302756864 seems great, but generation doesn't match,
 have=9084, want=9096
 Well block 4046358921216 seems great, but generation doesn't match,
 have=9086, want=9096
 Well block 4046409486336 seems great, but generation doesn't match,
 have=9087, want=9096
 Well block 4046414626816 seems great, but generation doesn't match,
 have=9088, want=9096
 Well block 4148447113216 seems great, but generation doesn't match,
 have=7618, want=9096
 Well block 4148522024960 seems great, but generation doesn't match,
 have=9089, want=9096
 Well block 4148539457536 seems great, but generation doesn't match,
 have=9090, want=9096
 Well block 4455562448896 seems great, but generation doesn't match,
 have=9092, want=9096
 Well block 4455568302080 seems great, but generation doesn't match,
 have=9091, want=9096
 Well block 4848395739136 seems great, but generation doesn't match,
 have=9093, want=9096
 Well block 4923796594688 seems great, but generation doesn't match,
 have=9094, want=9096
 Well block 4923798065152 seems great, but generation doesn't match,
 have=9095, want=9096
 Found tree root at 5532762525696
 
 
 On 06/04/2012 07:49 AM, Hugo Mills wrote:
 On Mon, Jun 04, 2012 at 07:43:40AM -0400, Maxim Mikheev wrote:
 Hi Arne,
 
 Can you advice how can I recover data?
 I tried almost everything what I found on https://btrfs.wiki.kernel.org
 
 /btrfs-restore restored some files but it is not what was stored.
 Can you post the complete output of find-root please?
 
 I have seen this command
 
 --
 In case of a corrupted superblock, start by asking btrfsck to use an
 alternate copy of the superblock instead of the superblock #0. This
 is achieved via the -s option followed by the number of the
 alternate copy you wish to use. In the following example we ask for
 using the superblock copy #2 of /dev/sda7:
 
 # ./btrfsck -s 2 /dev/sda7
 
 -
 but it gave me:
 $ sudo btrfsck -s 2 /dev/sdb
 btrfsck: invalid option -- 's'
 usage: btrfsck dev
 Btrfs Btrfs v0.19
 What exact version of the package do you have? Did you compile from
 a recent git, or do you have a distribution -progs package installed?
 If the latter, what date does it have in the version number?
 
 Hugo.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   ---   __(_'  Squeak!   ---   




Re: Help with data recovering

2012-06-04 Thread Hugo Mills
[trimmed Arne & Jan from cc by request]

On Mon, Jun 04, 2012 at 08:28:22AM -0400, Maxim Mikheev wrote:
 adding -v, as an example:
 sudo btrfs-find-root -v -v -v -v -v /dev/sdb
 
 didn't change output at all.

   OK, then all I can suggest is what I said below -- work through the
potential tree roots in order from largest generation id to smallest.
Given that it's not reporting any trees, though, I'm not certain that
you'll get any success with it.
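
   For what it's worth, a rough way to script that loop (a sketch only: it
assumes the find-root output format quoted below, restores into a scratch
directory you'd have to create first, and should be sanity-checked against
your btrfs-restore build before running):

   # Sort the candidate roots by generation (the have= value), highest
   # first, and try btrfs-restore against each block number in turn.
   sudo btrfs-find-root /dev/sdb 2>&1 |
     sed -n 's/.*block \([0-9]*\).*have=\([0-9]*\).*/\2 \1/p' |
     sort -rn |
     while read gen block; do
         echo "trying tree root at $block (generation $gen)"
         sudo btrfs-restore -t "$block" /dev/sdb /mnt/recovery && break
     done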

   Did you have your data in a subvolume?

   Hugo.

 On 06/04/2012 08:11 AM, Hugo Mills wrote:
 On Mon, Jun 04, 2012 at 08:01:32AM -0400, Maxim Mikheev wrote:
 Thank you for helping.
 I'm not sure I can be of much help, but there were a few things
 missing from the earlier conversation that I wanted to check the
 details of.
 
 ~$ uname -a
 Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02
 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 
 I compiled progs from recent git (week or two ago). I can compile it
 again if there updates.
 No, that should be recent enough. I don't think there have been any
 major updates since then.
 
 The output of btrfs-find-root is pretty long and below:
 max@s0:~$ sudo btrfs-find-root /dev/sdb
 Super think's the tree root is at 5532762525696, chunk root 20979712
 Well block 619435147264 seems great, but generation doesn't match,
 have=8746, want=9096
 This is not long enough, unfortunately. At least some of these
 should have a list of trees before them. At the moment, it's not
 reporting any trees at all. (At least, it should be doing this unless
 Chris took that line of code out). Do you get anything extra from
 adding a few -v options to the command?
 
 I would suggest, in the absence of any better ideas, sorting this
 list by the have= value, and systematically working down from the
 largest to the smallest, running btrfs-restore -t $n for each one
 (where $n is corresponding block number).
 
 Hugo.
[snip]

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   ---   __(_'  Squeak!   ---   




Re: Help with data recovering

2012-06-04 Thread Hugo Mills
On Mon, Jun 04, 2012 at 06:04:22PM +0100, Hugo Mills wrote:
I'm out of ideas.

   ... but that's not to say that someone else may have some ideas. I
wouldn't get your hopes up too much, though.

At this point, though, you're probably looking at somebody writing
 custom code to scan the FS and attempt to find and retrieve anything
 that's recoverable.
 
You might try writing a tool to scan all the disks for useful
 fragments of old trees, and see if you can find some of the tree roots
 independently of the tree of tree roots (which clearly isn't
 particularly functional right now). You might try simply scanning the
 disks looking for your lost data, and try to reconstruct as much of it
 as you can from that. You could try to find a company specialising in
 data recovery and pay them to try to get your data back. Or you might
 just have to accept that the data's gone and work on reconstructing
 it.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- A linked list is still a binary tree.  Just a very unbalanced ---  
 one.  -- dragon 




Re: [btrfs-progs] [bug][patch] Leaking file handle in scrub_fs_info()

2012-06-05 Thread Hugo Mills
;
   }
 
 -	ret = scrub_fs_info(fdmnt, path, &fi_args, &di_args);
 +	ret = scrub_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		ERR(!do_quiet, "ERROR: getting dev info for scrub failed: "
 		               "%s\n", strerror(-ret));
 @@ -1586,7 +1601,6 @@ static int cmd_scrub_status(int argc, char **argv)
   .sun_family = AF_UNIX,
   };
   int ret;
 - int fdmnt;
   int i;
   int print_raw = 0;
   int do_stats_per_dev = 0;
 @@ -1615,13 +1629,7 @@ static int cmd_scrub_status(int argc, char **argv)
 
   path = argv[optind];
 
 -	fdmnt = open_file_or_dir(path);
 -	if (fdmnt < 0) {
 -		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
 -		return 12;
 -	}
 -
 -	ret = scrub_fs_info(fdmnt, path, &fi_args, &di_args);
 +	ret = scrub_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for scrub failed: "
 			"%s\n", strerror(-ret));
 @@ -1698,7 +1706,6 @@ static int cmd_scrub_status(int argc, char **argv)
  out:
   free_history(past_scrubs);
   free(di_args);
 - close(fdmnt);
 	if (fdres > -1)
   close(fdres);
 
 
 
 
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Computer Science is not about computers,  any more than --- 
 astronomy is about telescopes.  




Re: delete disk proceedure

2012-06-05 Thread Hugo Mills
On Tue, Jun 05, 2012 at 10:38:11AM -0400, Jim wrote:
 Good morning btrfs list,
 I had written about 2 weeks ago about using extra btrfs space in an
 nfs file system setup.  Nfs seems to export the files but the mounts
 don't work on older machines without btrfs kernels.

   The mounts don't work -- can you be more specific here?

   It would seem that if we can get to the bottom of that problem, you
won't have to muck around with your current set-up at all.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I am an opera lover from planet Zog.  Take me to your lieder. ---  




Re: delete disk proceedure

2012-06-05 Thread Hugo Mills
On Tue, Jun 05, 2012 at 01:12:17PM -0400, Jim wrote:
 [sorry for the resend, signature again]
 I am waiting for a window (later tonight) when I can try mounting
 the btrfs export.  Am I reading you guys correctly, that you think I
 should be deleting drives from the array?  Or is this a just in
 case?  Thanks.

   Try the modified exports as I suggested in the other part of the
thread first. If that turns out to be problematic still, then we can
discuss any migration strategies.

   Hugo.

 Jim Maloney
 
 On 06/05/2012 01:04 PM, Hugo Mills wrote:
 On Tue, Jun 05, 2012 at 06:19:00PM +0200, Helmut Hullen wrote:
 Hello, Jim,
 
 You wrote on 05.06.12:
 
 /dev/sda   11T  4.9T  6.0T  46% /btrfs
 [root@advanced ~]# btrfs fi show
 failed to read /dev/sr0
 Label: none  uuid: c21f1221-a224-4ba4-92e5-cdea0fa6d0f9
   Total devices 12 FS bytes used 4.76TB
   devid    6 size 930.99GB used 429.32GB path /dev/sdf
   devid    5 size 930.99GB used 429.32GB path /dev/sde
   devid    8 size 930.99GB used 429.32GB path /dev/sdh
   devid    9 size 930.99GB used 429.32GB path /dev/sdi
   devid    4 size 930.99GB used 429.32GB path /dev/sdd
   devid    3 size 930.99GB used 429.32GB path /dev/sdc
   devid   11 size 930.99GB used 429.08GB path /dev/sdk
   devid    2 size 930.99GB used 429.32GB path /dev/sdb
   devid   10 size 930.99GB used 429.32GB path /dev/sdj
   devid   12 size 930.99GB used 429.33GB path /dev/sdl
   devid    7 size 930.99GB used 429.32GB path /dev/sdg
   devid    1 size 930.99GB used 429.09GB path /dev/sda
 Btrfs v0.19-35-g1b444cd
 df -h and btrfs fi show seem to be in good size agreement.  Btrfs was
 created as raid1 metadata and raid0 data.  I would like to delete the
 last 4 drives leaving 7T of space to hold 4.9T of data.  My plan
 would be to remove /dev/sdi, j, k, l one at a time.  After all are
 deleted run btrfs fi balance /btrfs.
 I'd prefer
 
  btrfs device delete /dev/sdi /btrfs
  btrfs filesystem balance /btrfs
  btrfs device delete /dev/sdj /btrfs
  btrfs filesystem balance /btrfs
 
 etc. -- after every delete, its own balance run.
 That's not necessary. Delete will move the blocks from the device
 being removed into spare space on the other devices. The balance is
 unnecessary. (In fact, delete and balance share quite a lot of code)
 
 That may take a lot of hours - I use the last lines of dmesg to
 extrapolate the needed time (btrfs produces a message about every
 minute).
 
 And you can't use the console from where you have started the balance
 command. Therefore I wrap this command:
 
echo 'btrfs filesystem balance /btrfs' | at now
 ... or just put it into the background with btrfs bal start
 /mountpoint. You know, like everyone else does. :)
 
 Hugo.
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Quantum est ille canis in fenestra? ---   




New btrfs-progs integration branch

2012-06-05 Thread Hugo Mills
   I've just pushed out a new integration branch to my git repo. This
is purely bugfix patches -- there are no new features in this issue of
the integration branch. I've got a stack of about a dozen more patches
with new features in them still to go. I'll be working on those
tomorrow. As always, there's minimal testing involved here, but it
does at least compile on my system(*).

   The branch is fetchable with git from:

http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ integration-20120605

   And viewable in human-readable form at:

http://git.darksatanic.net/cgi/gitweb.cgi?p=btrfs-progs-unstable.git
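
   If you already have a btrfs-progs clone, one way to pull the branch into
it is something like this (the remote and local branch names are arbitrary):

   git remote add hugo http://git.darksatanic.net/repo/btrfs-progs-unstable.git
   git fetch hugo
   git checkout -b integration-20120605 hugo/integration-20120605
   make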

   Shortlog is below.

   Hugo.

(*) I don't care about works-on-my-machine. We are not shipping your
machine!



Akira Fujita (1):
  Btrfs-progs: Fix manual of btrfs command

Chris Samuel (1):
  Fix set-dafault typo in cmds-subvolume.c

Csaba Tóth (1):
  mkfs.btrfs on ARM

Goffredo Baroncelli (1):
  scrub_fs_info( ) file handle leaking

Hubert Kario (2):
  Fix segmentation fault when opening invalid file system
  man: fix btrfs man page formatting

Jan Kara (1):
  mkfs: Handle creation of filesystem larger than the first device

Jim Meyering (5):
  btrfs_scan_one_dir: avoid use-after-free on error path
  mkfs: use strdup in place of strlen,malloc,strcpy sequence
  restore: don't corrupt stack for a zero-length command-line argument
  avoid several strncpy-induced buffer overruns
  mkfs: avoid heap-buffer-read-underrun for zero-length size arg

Josef Bacik (3):
  Btrfs-progs: make btrfsck aware of free space inodes
  Btrfs-progs: make btrfs filesystem show uuid actually work
  btrfs-progs: enforce block count on all devices in mkfs

Miao Xie (3):
  Btrfs-progs: fix btrfsck's snapshot wrong unresolved refs
  Btrfs-progs, btrfs-corrupt-block: fix the wrong usage
  Btrfs-progs, btrfs-map-logical: Fix typo in usage

Phillip Susi (2):
  btrfs-progs: removed extraneous whitespace from mkfs man page
  btrfs-progs: document --rootdir mkfs switch

Sergei Trofimovich (2):
  Makefile: use $(CC) as a compilers instead of $(CC)/gcc
  Makefile: use $(MAKE) instead of hardcoded 'make'

Shawn Bohrer (1):
  btrfs-progs: Update resize documentation

Wang Sheng-Hui (1):
  btrfs-progs: cleanup: remove the redundant BTRFS_CSUM_TYPE_CRC32 macro def

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Great oxymorons of the world, no.  5: Manifesto Promise --- 




Re: New btrfs-progs integration branch

2012-06-06 Thread Hugo Mills
On Wed, Jun 06, 2012 at 01:48:00PM +0200, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 05.06.12:
 
 The branch is fetchable with git from:
 
  http://git.darksatanic.net/repo/btrfs-progs-unstable.git/
  integration-20120605
 
 There seems to be a bug inside:
 
 [...]
 
 gcc -g -O0 -o btrfsck btrfsck.o ctree.o disk-io.o radix-tree.o extent- 
 tree.o print-tree.o root-tree.o dir-item.o file-item.o inode-item.o  
 inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o volumes.o  
 utils.o btrfs-list.o btrfslabel.o repair.o  -luuid
 
 gcc -g -O0 -o btrfs-convert ctree.o disk-io.o radix-tree.o extent-tree.o 
 print-tree.o root-tree.o dir-item.o file-item.o inode-item.o inode-map.o 
 crc32c.o rbtree.o extent-cache.o extent_io.o volumes.o utils.o btrfs-list.o 
 btrfslabel.o repair.o convert.o -lext2fs -lcom_err  -luuid
 
 gcc   convert.o   -o convert
 convert.o: In function `btrfs_item_key':
 /tmp/btrfs-progs-unstable/ctree.h:1404: undefined reference to 
 `read_extent_buffer'
 convert.o: In function `btrfs_dir_item_key':
 /tmp/btrfs-progs-unstable/ctree.h:1437: undefined reference to 
 `read_extent_buffer'
 convert.o: In function `btrfs_del_item':

   Odd. I've just tried this on a clean clone of my repo, and it's
building fine. It's declared in extent_io.h, and defined in
extent_io.c.

   However, it does look like there's a problem with the make process:
my Makefile says:

btrfs-convert: $(objects) convert.o
	$(CC) $(CFLAGS) -o btrfs-convert $(objects) convert.o -lext2fs -lcom_err $(LDFLAGS) $(LIBS)

... which seems to be what the second line you quoted is doing.
However, the third line with the problem looks like something out of
date. Possibly a mis-merge?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I am but mad north-north-west:  when the wind is southerly, I ---  
   know a hawk from a handsaw.   




Re: New btrfs-progs integration branch

2012-06-06 Thread Hugo Mills
On Wed, Jun 06, 2012 at 05:03:00PM +0200, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 06.06.12:
 
  However, the third line with the problem looks like something out of
  date. Possibly a mis-merge?
 
 Where should I search?

   Well, the first thing would be to try a completely new clone of the
repo, then git co integration-20120605, and run make again. If that's
OK, then take a look with gitk in the broken repo and see what kind of
history you've got in there -- it should be a single unbroken sequence
from master (1957076ab4fefa47b6efed3da541bc974c83eed7) to
integration-20120605 (d4c539067d1cb2476c7fb6003625de26e84059af).

   Also have a look in the Makefile of the broken repo -- all of the
commands (listed near the top, assigned to the progs variable)
should start with btrfs, and there should be no rule for convert
in there. Again, if that's not the case, you've managed to mis-merge
or check out the wrong branch.
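
   One way to check both of those from a shell (the commit IDs are the ones
above; treat this as a sketch):

   # History check: this range should show one unbroken run of commits.
   git log --oneline 1957076ab4fefa47b6efed3da541bc974c83eed7..d4c539067d1cb2476c7fb6003625de26e84059af

   # Makefile check: "progs" should only list btrfs* binaries, and there
   # should be no bare "convert" target.
   grep -n '^progs' Makefile
   grep -n '^convert' Makefile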

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- We are all lying in the gutter,  but some of us are looking ---   
  at the stars.  




Re: New btrfs-progs integration branch

2012-06-06 Thread Hugo Mills
On Wed, Jun 06, 2012 at 05:52:00PM +0200, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 06.06.12:
 
  However, the third line with the problem looks like something out
  of date. Possibly a mis-merge?
 
  Where should I search?
 
 Well, the first thing would be to try a completely new clone of
  the repo, then git co integration-20120605, and run make again.
 
 I had a brand new git clone.
 
 Produced with
 
 [...]
 git clone http://git.darksatanic.net/repo/btrfs-progs-unstable.git
 cd btrfs-progs-unstable
 git checkout integration-20120605
 
 (and btrfs-progs-unstable had been empty before checkout)
 
  If
  that's OK, then take a look with gitk in the broken repo and see what
  kind of history you've got in there -- it should be a single unbroken
  sequence from master (1957076ab4fefa47b6efed3da541bc974c83eed7) to
  integration-20120605 (d4c539067d1cb2476c7fb6003625de26e84059af).
 
 I don't know much about working with git ... but I suppose I'm not  
 working with such things as a (broken) repo.
 
 It's the same way I had successfully compiled your version from 20111012  
 and from 20111030.
 
 Is there any change in how to compile the new version?

   No, just type make from the directory.

   Can you compare your Makefile with the one at [1] -- in particular
the "progs" variable at lines 21-23, the "all" target on line 37, and the
"btrfs-convert" target on line 97. There definitely should not be a
plain "convert" target in there, but that seems to be what your system
was failing on.

   Hugo.

[1] 
http://git.darksatanic.net/cgi/gitweb.cgi?p=btrfs-progs-unstable.git;a=blob;f=Makefile;h=9699366d506918db711245aa771d103698a7;hb=integration-20120605

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- We are all lying in the gutter,  but some of us are looking ---   
  at the stars.  




Re: New btrfs-progs integration branch

2012-06-06 Thread Hugo Mills
On Wed, Jun 06, 2012 at 06:18:00PM +0200, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 06.06.12:
 
  git checkout integration-20120605
 
 [...]
 Can you compare your Makefile with the one at [1] -- in particular
  the progs variable at line 21-23, the all target on line 37, and
  the btrfs-convert target on line 97. There definitely should not be
  a plain convert target in there, but that seems to be what your
  system was failing on.
 
 Makefile with 3888 Bytes.
 
 md5sum Makefile shows
 
 (my file)
 deef961e3ecd560ad8710cf0b58f5570  Makefile
 
 (the file from your link)
 deef961e3ecd560ad8710cf0b58f5570  Makefile
 
 The problem is somewhere on another place ...

   OK, can you send through the complete output of:

$ make clean
$ gcc --version
$ make --version
$ make
$ for f in .*.d; do echo "== $f"; cat "$f"; done

   My guess is that the dependency generation is going wrong somewhere.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's many a slip 'twixt wicket-keeper and gully. ---   




Re: [PATCH v5 2/3] Btrfs-progs: make two utility functions globally available

2012-06-06 Thread Hugo Mills
 btrfs_ioctl_fs_info_args *fi_args,
 + struct btrfs_ioctl_dev_info_args **di_ret);
  #endif

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Happiness is mandatory.  Are you happy? --- 




Re: [PATCHv2 3/4] avoid several strncpy-induced buffer overruns

2012-06-06 Thread Hugo Mills
);
 + args.name[BTRFS_PATH_NAME_MAX-1] = 0;
 	res = ioctl(fddst, BTRFS_IOC_SUBVOL_CREATE, &args);
   e = errno;
 
 @@ -202,6 +203,7 @@ static int cmd_subvol_delete(int argc, char **argv)
 
 	printf("Delete subvolume '%s/%s'\n", dname, vname);
 	strncpy(args.name, vname, BTRFS_PATH_NAME_MAX);
 +	args.name[BTRFS_PATH_NAME_MAX-1] = 0;
 	res = ioctl(fd, BTRFS_IOC_SNAP_DESTROY, &args);
   e = errno;
 
 @@ -378,6 +380,7 @@ static int cmd_snapshot(int argc, char **argv)
 
   args.fd = fd;
   strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
 + args.name[BTRFS_PATH_NAME_MAX-1] = 0;

   This, however, is wrong. args here is a struct
btrfs_ioctl_vol_args_v2, and the name field is BTRFS_SUBVOL_NAME_MAX+1
long, so it should be:

-   strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
+   strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX+1);
+   args.name[BTRFS_SUBVOL_NAME_MAX] = 0;

 	res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);
   e = errno;
 
 diff --git a/restore.c b/restore.c
 index 2674832..d1ac542 100644
 --- a/restore.c
 +++ b/restore.c
 @@ -846,7 +846,8 @@ int main(int argc, char **argv)
 
   memset(path_name, 0, 4096);
 
 - strncpy(dir_name, argv[optind + 1], 128);
 + strncpy(dir_name, argv[optind + 1], sizeof dir_name);
 + dir_name[sizeof dir_name - 1] = 0;
 
   /* Strip the trailing / on the dir name */
   len = strlen(dir_name);
 diff --git a/utils.c b/utils.c
 index ee7fa1b..5240c2c 100644
 --- a/utils.c
 +++ b/utils.c
 @@ -657,9 +657,11 @@ int resolve_loop_device(const char* loop_dev, char* 
 loop_file, int max_len)
 	ret_ioctl = ioctl(loop_fd, LOOP_GET_STATUS, &loopinfo);
   close(loop_fd);
 
 - if (ret_ioctl == 0)
 + if (ret_ioctl == 0) {
   strncpy(loop_file, loopinfo.lo_name, max_len);
 - else
 +		if (max_len > 0)
 + loop_file[max_len-1] = 0;
 + } else
   return -errno;
 
   return 0;
 @@ -860,8 +862,10 @@ int check_mounted_where(int fd, const char *file, char 
 *where, int size,
   }
 
   /* Did we find an entry in mnt table? */
 -	if (mnt && size && where)
 +	if (mnt && size && where) {
 		strncpy(where, mnt->mnt_dir, size);
 + where[size-1] = 0;
 + }
   if (fs_dev_ret)
   *fs_dev_ret = fs_devices_mnt;
 
 @@ -893,6 +897,8 @@ int get_mountpt(char *dev, char *mntpt, size_t size)
 	if (strcmp(dev, mnt->mnt_fsname) == 0)
 	{
 		strncpy(mntpt, mnt->mnt_dir, size);
 +   if (size)
 +mntpt[size-1] = 0;
 break;
 }
 }
 @@ -925,6 +931,7 @@ void btrfs_register_one_device(char *fname)
   return;
   }
   strncpy(args.name, fname, BTRFS_PATH_NAME_MAX);
 + args.name[BTRFS_PATH_NAME_MAX-1] = 0;

   Same comment about the length of the name field in struct
btrfs_ioctl_vol_args as the 6 or 7 places above.

 	ret = ioctl(fd, BTRFS_IOC_SCAN_DEV, &args);
 	e = errno;
 	if (ret < 0) {

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Your problem is that you've got too much taste to be ---   
a web developer. 




Re: [PATCHv2 4/4] mkfs: avoid heap-buffer-read-underrun for zero-length size arg

2012-06-06 Thread Hugo Mills
On Fri, Apr 20, 2012 at 09:27:26PM +0200, Jim Meyering wrote:
 From: Jim Meyering meyer...@redhat.com
 
 * mkfs.c (parse_size): ./mkfs.btrfs -A '' would read and possibly
 write the byte before beginning of strdup'd heap buffer.  All other
 size-accepting options were similarly affected.
 
 Reviewed-by: Josef Bacik jo...@redhat.com
 ---
  cmds-subvolume.c |2 +-
  mkfs.c   |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index fc749f1..a01c830 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -380,7 +380,7 @@ static int cmd_snapshot(int argc, char **argv)
 
   args.fd = fd;
   strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
 ^ +1

 - args.name[BTRFS_PATH_NAME_MAX-1] = 0;
 + args.name[BTRFS_SUBVOL_NAME_MAX-1] = 0;
 
   args.name[BTRFS_SUBVOL_NAME_MAX] = 0;

 	res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);
   e = errno;
 
 diff --git a/mkfs.c b/mkfs.c
 index 03239fb..4aff2fd 100644
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -63,7 +63,7 @@ static u64 parse_size(char *s)
 
   s = strdup(s);
 
 - if (!isdigit(s[len - 1])) {
 +	if (len && !isdigit(s[len - 1])) {

   I think I'd prefer that len is a size_t, not an int here. (Or that
len is tested to be > 0).

   c = tolower(s[len - 1]);
   switch (c) {
   case 'g':

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Your problem is that you've got too much taste to be ---   
a web developer. 




Re: [PATCHv2 3/4] avoid several strncpy-induced buffer overruns

2012-06-06 Thread Hugo Mills
On Wed, Jun 06, 2012 at 08:31:47PM +0100, Hugo Mills wrote:
  @@ -378,6 +380,7 @@ static int cmd_snapshot(int argc, char **argv)
  
  args.fd = fd;
  strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
  +   args.name[BTRFS_PATH_NAME_MAX-1] = 0;
 
This, however, is wrong. args here is a struct
 btrfs_ioctl_vol_args_v2, and the name field is BTRFS_SUBVOL_NAME_MAX+1
 long, so it should be:
 
 - strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
 + strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX+1);
 + args.name[BTRFS_SUBVOL_NAME_MAX] = 0;

   Oops, just spotted the v3 with this fix in. Ignore this comment.
(I'm actually using the v3 in integration, but I reviewed the mail
from a different mailbox and got the wrong series...)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Your problem is that you've got too much taste to be ---   
a web developer. 




Re: locating a func in btrfs-progs

2012-06-06 Thread Hugo Mills
On Thu, Jun 07, 2012 at 09:38:13AM +0800, Sonu wrote:
 
 
 Hi 
 
   Any clues on where I can find the function  'btrfs_header_level' in 
 btrfs-progs ?  

   It's a getter/setter pair. See line 1555 of ctree.h.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Don't worry, he's not drunk. He's like that all the time. ---




Re: Bug in btrfs-debug-tree for two or more devices.

2012-06-12 Thread Hugo Mills
On Tue, Jun 12, 2012 at 06:53:00AM +, Santosh Hosamani wrote:
 
 Hi btrfs folks,
 I am working on the btrfs filesystem, looking at how it manages free
 space, and found that btrfs maintains a tree which manages the physical
 location of the chunks and stripes of the filesystem.
 Btrfs-debug-tree also gives information on the chunk tree.
 
 I created btrfs on a single device and on two devices, and have attached the
 output of btrfs-debug-tree for both.
 For the single device, the sum of all the chunk lengths adds up to the total
 used bytes, which is the expected behavior.
 
 But for two devices, the sum of all the chunk lengths does not add up to the
 total bytes. Am I missing something?

   Without actually seeing the details of your technique and
expectations, I shall make a guess that you're not accounting for the
double-counting of RAID-1 metadata. In other words, you will find that
all of the metadata device extents (or chunks) will appear twice --
once on each device.

   Actually, this isn't quite right either -- what you really need to
do is look at the RAID-1, RAID-10 and DUP bits in the chunk flags, add
up all of those chunks, divide by two, and then add in the remaining
(RAID-0 and single) chunks. That total should then add up to the total
value of allocated space that you get from the output of btrfs fi df.
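
   A rough sketch of that arithmetic in shell/awk (illustrative only: it
assumes btrfs-debug-tree prints each chunk as a line containing "chunk
length <bytes>" with the type flags such as RAID1/DUP on the same line;
adjust the patterns if your version splits them across lines):

   btrfs-debug-tree /dev/sdb | awk '
       /chunk length/ {
           len = $3                      # bytes in this chunk
           if (/RAID1|RAID10|DUP/)       # stored twice
               mirrored += len
           else                          # single, RAID-0: stored once
               plain += len
       }
       END { printf "allocated: %.0f bytes\n", mirrored / 2 + plain }'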

 Also I notice that for the second device the superblock location 0x1 is 
 not considered as used .
 
 I would be really grateful if you folks can answer my query.
 
 I have run these tests on SLES11-SP2-x86
 Kernel 3.0.13.0.27-default

   This is pretty old, but shouldn't affect the results. It will cause
reliability problems if you try running it seriously.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold.




Re: Computing size of snapshots approximatly

2012-06-13 Thread Hugo Mills
On Wed, Jun 13, 2012 at 02:15:33PM +0200, Jan-Hendrik Palic wrote:
 Hi,
 
 we are using several LVM volumes with btrfs on a server. We want to keep
 nightly snapshots for some days as an alternative to backups.
 
 Now I want to get the size of the snapshots in detail.

   There are basically two figures you can get for each snapshot.
These values may differ wildly. Which one do you want?

(A) The first, larger, value is the total computed size of the
   files in the subvolume. This is what du returns.

(B) The second, smaller, value is the amount of space that would be
   freed by deleting the subvolume. (Alternatively, this is the amount
   of data in the subvolume which is not shared with some other
   subvolume). It is currently a difficult process to work out this
   value in general, but the qgroups patch set will track this
   information automatically, and expose an API that will allow you to
   retrieve it.

   The qgroups patches aren't complete yet.
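
   For (A), plain du already gives the figure today (the path here is just
an example); there's no equivalent one-liner for (B) until qgroups lands:

   # (A): apparent total size of the files in the snapshot
   du -sh /mnt/pool/snap1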

 Therefore I
 played with
 
   btrfs subvolume find-new $snapshot $gen-id.

 And I know that this is quite complicated and not implemented.
 Therefore I'm trying to go my own way:
 
 Now assume there are two snapshots of one subvolume, snap1 and
 snap2. Further, get the find-new information for these snapshots with
 $gen-id=1 and save them into different files. A diff of these files
 shows the changes between snap1 and snap2, right?
 
 Ok.
 
 There are three operations on a filesystem, I think,
 
 1. copy a file on the filesystem
 2. change a file on the filesystem
 3. delete a file on the filesystem
 
 Am I right to assume that operations 1 and 2 do not change the size of
 a snapshot much, and that the delete operation increases the size of a
 snapshot by the size of the deleted files?

   It depends on which measure of the two above you're trying to use,
and whether the subvolume (and file) you're modifying still has
extents shared with some other subvolume.

1. Copying a file (without --reflink) will increase both the (A) and
   the (B) size of the snapshot. Copying a file with --reflink will
   increase (A) and leave (B) much the same.

2. Changing a file will, obviously, cause (A) to change by the
   difference between the old file and the new. If that file shares no
   extents with anything else, then (B) will also change by that
   amount. Otherwise, if it shares extents with anything else (another
   subvolume, or a reflink copy), then (B) will increase by the amount
   of data modified.

3. Deleting a file will reduce (A) by the size of the file. (B) will
   reduce by the size of non-shared extents owned by that file.

   Note that btrfs sub find-new will not allow you to track file
deletions.

 If so, it would be enough for me to get the deletions of files
 between two snapshots and their sizes. But is there another way to
 get this information besides btrfs subvolume find-new? Perhaps it
 makes sense to use ioctl for it? What about the send/receive
 feature, which is upcoming?
 
 Are there any hints?

   Wait for qgroups to land, because that actually does it the right
way, and will avoid you having to track all kinds of awkward (and
hard-to-find) corner cases.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Summoning his Cosmic Powers, and glowing slightly ---
from his toes... 




Re: cannot remove files: rm gives no space left on device, 3.2.0-24, ubuntu

2012-06-16 Thread Hugo Mills
On Sat, Jun 16, 2012 at 02:18:15PM +0300, Andrei Popa wrote:
 
 https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_I_ran_out_of_disk_space.21

   Also, please note the top box on

https://btrfs.wiki.kernel.org/index.php/Getting_started

   It may help, or it may not, but it's worth doing anyway.

   Hugo.

 On Sat, 2012-06-16 at 13:16 +0200, rupert THURNER wrote:
 how would I be able to delete something off this btrfs partition
 again? I saw the following messages in the archives which seem to be a
 little similar ... except that a reboot, and therefore a remount, did not help:
  * http://article.gmane.org/gmane.linux.kernel/1265666/match=enospc
  
  rt@tv:~$ rm -rf /media/388gb-data/.Trash-1000/info/foto.trashinfo
  rm: cannot remove `/media/388gb-data/.Trash-1000/info/foto.trashinfo':
  No space left on device
  
  rt@tv:~$ btrfs filesystem df /media/388gb-data/
  Data: total=260.59GB, used=254.56GB
  System: total=32.00MB, used=24.00KB
  Metadata: total=128.00GB, used=120.01GB
  
  rt@tv:~$ sudo btrfs filesystem show /dev/sda6
  failed to read /dev/sr0
  Label: '388gb-data'  uuid: 19223a9e-7840-4798-8ee4-02b5bf9c2899
  Total devices 1 FS bytes used 374.56GB
  devid    1 size 388.62GB used 388.62GB path /dev/sda6
  
  rt@tv:~$ uname -a
  Linux tv 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:51:22 UTC 2012
  i686 i686 i386 GNU/Linux
  
  the only snapshot is the one created during converting from ext4:
  $ sudo btrfs subvolume list /media/388gb-data/
  ID 256 top level 5 path ext2_saved
  
  
  open(/usr/lib/locale/locale-archive, O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
  fstat64(3, {st_mode=S_IFREG|0644, st_size=4480528, ...}) = 0
  mmap2(NULL, 262144, PROT_READ, MAP_PRIVATE, 3, 0x2bd) = 0xb7553000
  mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0x43a) = 0xb7552000
  close(3)= 0
  ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo 
  ...}) = 0
  lstat64(/, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
  fstatat64(AT_FDCWD,
  /media/388gb-data/.Trash-1000/info/foto.trashinfo,
  {st_mode=S_IFREG|0644, st_size=56, ...}, AT_SYMLINK_NOFOLLOW) = 0
  unlinkat(AT_FDCWD,
  /media/388gb-data/.Trash-1000/info/foto.trashinfo, 0) = -1 ENOSPC
  (No space left on device)
  open(/usr/share/locale/locale.alias, O_RDONLY|O_CLOEXEC) = 3
  fstat64(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0
  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  0) = 0xb7551000
  read(3, # Locale name alias data base.\n#..., 4096) = 2570
  read(3, , 4096)   = 0
  close(3)= 0
  munmap(0xb7551000, 4096)= 0
  open(/usr/share/locale/en/LC_MESSAGES/coreutils.mo, O_RDONLY) = -1
  ENOENT (No such file or directory)
  open(/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo, O_RDONLY) = 3
  fstat64(3, {st_mode=S_IFREG|0644, st_size=619, ...}) = 0
  mmap2(NULL, 619, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7551000
  close(3)= 0
  write(2, rm: , 4rm: ) = 4
  write(2, cannot remove `/media/388gb-data..., 65cannot remove
  `/media/388gb-data/.Trash-1000/info/foto.trashinfo') = 65
  open(/usr/share/locale/en/LC_MESSAGES/libc.mo, O_RDONLY) = -1 ENOENT
  (No such file or directory)
  open(/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo, O_RDONLY) =
  -1 ENOENT (No such file or directory)
  write(2, : No space left on device, 25: No space left on device) = 25
  write(2, \n, 1
  )   = 1
  _llseek(0, 0, 0xbfe0d210, SEEK_CUR) = -1 ESPIPE (Illegal seek)
  close(0)= 0
  
  
  rupert

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- No names... I want to remain anomalous. --- 




Re: Subvolumes and /proc/self/mountinfo

2012-06-20 Thread Hugo Mills
On Tue, Jun 19, 2012 at 04:35:59PM -0700, H. Peter Anvin wrote:
 On 06/19/2012 07:22 AM, Calvin Walton wrote:
  
  All subvolumes are accessible from the volume mounted when you use -o
  subvolid=0. (Note that 0 is not the real ID of the root volume, it's
  just a shortcut for mounting it.)
  
 
 Could you clarify this bit?  Specifically, what is the real ID of the
 root volume, then?
 
 I found that after having set the default subvolume to something other
 than the root, and then mounting it without the -o subvol= option, then
 the subvolume name does *not* show in /proc/self/mountinfo; the same
 happens if a subvolume is mounted by -o subvolid= rather than -o subvol=.
 
 Is this a bug?  This would seem to give the worst of both worlds in
 terms of actually knowing what the underlying filesystem path would end
 up looking like.

   Yes, it's a bug, and rather an irritating one at that. I know that
David Sterba looked at fixing it, but apparently it was trickier to
fix than was expected. (I don't recall the reason, and probably
wouldn't have understood it anyway, so I'll leave it to Dave to tell
you about it in detail).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Doughnut furs ache me, Omar Dorlin. ---   




Re: Leaving Red Hat

2012-06-20 Thread Hugo Mills
On Wed, Jun 20, 2012 at 08:59:15AM -0400, Josef Bacik wrote:
 Hello,
 
 Today is my last day at Red Hat, I will be joining Chris at Fusion IO.

   Blimey. It's all change round here, isn't it? Congratulations.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- We are all lying in the gutter,  but some of us are looking ---   
  at the stars.  




Re: Knowing how much space is taken by each snapshot?

2012-06-25 Thread Hugo Mills
On Mon, Jun 25, 2012 at 07:58:40AM -0700, Marc MERLIN wrote:
 Howdy,
 
 My btrfs pool looks like this:
 usr
 usr_daily_20120622_00:01:01
 usr_daily_20120623_00:18:25
 usr_daily_20120624_00:01:01
 usr_daily_20120625_00:01:01
 usr_hourly_20120625_05:00:02
 usr_hourly_20120625_06:00:01
 usr_hourly_20120625_07:00:01
 usr_weekly_20120610_00:02:01
 usr_weekly_20120617_00:02:01
 usr_weekly_20120624_00:02:01
 
 Sometimes I run low on space and I have to start dropping snapshots. I
 realize that due to COW blocks, it's hard to say exactly how much space each
 snapshot uses, but is there some way to get an idea how much each snapshot
 will free if I delete it?

   When the relevant bit of qgroups lands, yes. Until then, not really.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Python is executable pseudocode; perl ---  
is executable line-noise.




Re: Feature request: true RAID-1 mode

2012-06-25 Thread Hugo Mills
On Mon, Jun 25, 2012 at 10:46:01AM -0700, H. Peter Anvin wrote:
 On 06/25/2012 08:21 AM, Chris Mason wrote:
  Yes and no.  If you have 2 drives and you add one more, we can make it
  do all new chunks over 3 drives.  But, turning the existing double
  mirror chunks into a triple mirror requires a balance.
  
  -chris
 
 So trigger one.  This is the exact analogue to the resync pass that is
 required in classic RAID after adding new media.

   You'd have to cancel and restart if a second new disk was added
while the first balance was ongoing. Fortunately, this isn't a problem
these days.

   Also, it occurs to me that I should just check -- are you aware
that the btrfs implementation of RAID-1 makes no guarantees about the
location of any given piece of data? i.e. if I have a piece of data
stored at block X on disk 1, it's not guaranteed to be stored at block
X on disks 2, 3, 4, ... I'm not sure if this is important to you, but
it's a significant difference between the btrfs implementation of
RAID-1 and the MD implementation.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Never underestimate the bandwidth of a Volvo filled ---   
   with backup tapes.




Re: New btrfs-progs integration branch

2012-06-26 Thread Hugo Mills
On Tue, Jun 26, 2012 at 11:58:41AM +0300, Alex Lyakas wrote:
 Hi Hugo,
 forgive me, but I am somewhat confused.
 What is the main repo of btrfs-progs, if there is such thing?
 I see patches coming in, but no updates to
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git,
 which I thought was the one.
 
 Can you pls clarify where should I pull updates from for btrfs-progs?

   The official source for btrfs-progs is Chris's one, at the URL
above. The integration repo is kind of a staging area where I pull in
as many patches as I can and get them a bit more visibility. We don't
really have a well-defined workflow here.

   It depends on what you intend doing: if you want to make packages
for your distribution, use Chris's repo. If you want something
reasonably stable and tested, use Chris's repo. If there's some
experimental kernel feature you want to test out, use integration. If
you want to be helpful and test out new patches and report problems
with them, use integration.

   Hugo.

 Thanks,
 Alex.
 
 
 
 On Tue, Jun 5, 2012 at 10:09 PM, Hugo Mills h...@carfax.org.uk wrote:
    I've just pushed out a new integration branch to my git repo. This
  is purely bugfix patches -- there are no new features in this issue of
  the integration branch. I've got a stack of about a dozen more patches
  with new features in them still to go. I'll be working on those
  tomorrow. As always, there's minimal testing involved here, but it
  does at least compile on my system(*).
 
    The branch is fetchable with git from:
 
  http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ 
  integration-20120605
 
    And viewable in human-readable form at:
 
  http://git.darksatanic.net/cgi/gitweb.cgi?p=btrfs-progs-unstable.git
 
    Shortlog is below.
 
    Hugo.
 
  (*) I don't care about works-on-my-machine. We are not shipping your
  machine!
 
  
 
  Akira Fujita (1):
       Btrfs-progs: Fix manual of btrfs command
 
  Chris Samuel (1):
       Fix set-dafault typo in cmds-subvolume.c
 
  Csaba Tóth (1):
       mkfs.btrfs on ARM
 
  Goffredo Baroncelli (1):
       scrub_fs_info( ) file handle leaking
 
  Hubert Kario (2):
       Fix segmentation fault when opening invalid file system
       man: fix btrfs man page formatting
 
  Jan Kara (1):
       mkfs: Handle creation of filesystem larger than the first device
 
  Jim Meyering (5):
       btrfs_scan_one_dir: avoid use-after-free on error path
       mkfs: use strdup in place of strlen,malloc,strcpy sequence
       restore: don't corrupt stack for a zero-length command-line argument
       avoid several strncpy-induced buffer overruns
       mkfs: avoid heap-buffer-read-underrun for zero-length size arg
 
  Josef Bacik (3):
       Btrfs-progs: make btrfsck aware of free space inodes
       Btrfs-progs: make btrfs filesystem show uuid actually work
       btrfs-progs: enforce block count on all devices in mkfs
 
  Miao Xie (3):
       Btrfs-progs: fix btrfsck's snapshot wrong unresolved refs
       Btrfs-progs, btrfs-corrupt-block: fix the wrong usage
       Btrfs-progs, btrfs-map-logical: Fix typo in usage
 
  Phillip Susi (2):
       btrfs-progs: removed extraneous whitespace from mkfs man page
       btrfs-progs: document --rootdir mkfs switch
 
  Sergei Trofimovich (2):
       Makefile: use $(CC) as a compilers instead of $(CC)/gcc
       Makefile: use $(MAKE) instead of hardcoded 'make'
 
  Shawn Bohrer (1):
       btrfs-progs: Update resize documentation
 
  Wang Sheng-Hui (1):
       btrfs-progs: cleanup: remove the redundant BTRFS_CSUM_TYPE_CRC32 macro 
  def
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  




Re: btrfs filesystem defragment exits with non-zero return code (20) upon success

2012-06-27 Thread Hugo Mills
On Wed, Jun 27, 2012 at 02:05:55PM +0200, Lenz Grimmer wrote:
 Hi,
 
 running btrfs filesystem defrag somehow always returns a non-zero exit code,
 even when it succeeds:

   Yes, this is a known problem, and one that's on my list of things
to deal with. Thanks for the reminder, though.
 
 I'm no C programmer, but looking at the end of the do_defrag() function in
 btrfs_cmds.c, I wonder whether the final "return errors + 20" is correct. If
 errors is greater than zero, the function would be left via the exit(1)
 anyway, wouldn't it? In that case, wouldn't "return 0" at the end be more
 appropriate?

   Yeah, basically, it's doing something silly and unexpected with
return codes.

   Hugo.

 [SNIP]
   if (errors) {
 	fprintf(stderr, "total %d failures\n", errors);
 exit(1);
   }
 
   free(av);
   return errors + 20;
 [SNIP]
 
 Thanks!
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Mixing mathematics and alcohol is dangerous.  Don't ---   
drink and derive.




Re: Can give some help?

2012-06-29 Thread Hugo Mills
On Fri, Jun 29, 2012 at 09:41:47PM +0800, Zhi Yong Wu wrote:
 HI,
 
 Can anyone let me know where the functions are declared or defined,
 such as btrfs_header_nritems(), btrfs_header_level(), etc.? Thanks.

   ctree.h, somewhere around or after line 1550. They're all accessor
functions, defined by a set of macros. Look for the *_SETGET_* macros.
The actual definitions of BTRFS_SETGET_FUNCS are in struct-funcs.h
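
   For example, a grep along these lines (run from the top of the source
tree; the exact file layout varies between versions, so treat the paths as
illustrative) shows where the accessors are generated:

   grep -n 'SETGET.*header_level\|SETGET.*header_nritems' ctree.h
   grep -rn 'define BTRFS_SETGET' .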

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no. 1: Family Holiday ---   




Re: Kernel panic from btrfs subvolume delete

2012-06-29 Thread Hugo Mills
On Fri, Jun 29, 2012 at 03:23:13PM +0100, Richard Cooper wrote:
 
 On 29 Jun 2012, at 11:42, Fajar A. Nugraha wrote:
  What should I do now? Do I need to upgrade to a more recent btrfs?
  
  Yep
  
  If so, how?
  
  https://blogs.oracle.com/linux/entry/oracle_unbreakable_enterprise_kernel_release
  http://elrepo.org/tiki/kernel-ml
 
 Perfect, thank you! I was looking for a mainline kernel yum repo but my 
 google-fu was failing me. That looks like just what I need.
 
 I've installed kernel v3.4.4 from http://elrepo.org/tiki/kernel-ml and that 
 seems to have fixed my kernel panic. I'm still using the default Cent OS 6 
 versions of the btrfs userspace programs (v0.19). Any reason why that might 
 be a bad idea?

   You miss out on new features (like scrub and btrfsck). Note that
0.19 could actually be any version from the last 3 years or so. Most
distributions these days are putting a date in their package names --
anything from 20120328 or so is good.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Charting the inexorable advance of Western syphilisation... ---   




Re: Btrfs RAID space utilization and bitrot reconstruction

2012-07-01 Thread Hugo Mills
On Sun, Jul 01, 2012 at 01:50:39PM +0200, Waxhead wrote:
 As far as I understand btrfs stores all data in huge chunks that are
 striped, mirrored or raid5/6'ed throughout all the disks added to
 the filesystem/volume.

   Well, RAID-5/6 hasn't landed yet, but yes.

 How does btrfs deal with different sized disks? let's say that you
 for example have 10 different disks that are
 100GB,200GB,300GB...1000GB and you create a btrfs filesystem with
 all the disks. How will the raid5 implementation distribute chunks
 in such a setup.

   We haven't seen the code for that bit yet.

 I assume the stripe+stripe+parity are separate chunks that are
 placed on separate disks but how does btrfs select the best disk to
 store a chunk on? In short will a slow disk slow down the entire
 array, parts of it or will btrfs attempt to use the fastest disks
 first?

   Chunks are allocated by ordering the devices by the amount of free
(=unallocated) space left on each, and picking the chunks from devices
in that order. For RAID-1 chunks are picked in pairs. For RAID-0, as
many as possible are picked, down to a minimum of 2 (I think). For
RAID-10, the largest even number possible is picked, down to a minimum
of 4. I _believe_ that RAID-5 and -6 will pick as many as possible,
down to some minimum -- but as I said, we haven't seen the code yet.

 Also since btrfs checksums both data and metadata I am thinking that
 at least the raid6 implementation perhaps can (try to) reconstruct
 corrupt data (and try to rewrite it) before reading an alternate
 copy. Can someone please fill me in on the details here?

   Yes, it should be possible to do that with RAID-5 as well. (Read
the data stripes, verify checksums, if one fails, read the parity,
verify that, and reconstruct the bad block from the known-good data).

 Finally, how does btrfs deal with Advanced Format (4k-sector) drives
 when the entire drive (and not a partition) is used to build a btrfs
 filesystem? Is proper alignment achieved?

   I don't know about that. However, the native block size in btrfs is
4k, so I'd imagine that it's all good.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You stay in the theatre because you're afraid of having no ---
 money? There's irony... 




Re: BTRFS fsck apparent errors

2012-07-03 Thread Hugo Mills
On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
 A couple days ago, I have converted my Ubuntu Precise machine from
 ext4 to BTRFS using btrfs-convert.
[snip]
 After I had shifted, I tried to defragment and compress my FS using
 commands such as :
 
 find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
 
 During execution of such commands, my kernel oopsed, so I restarted.
 
 Afterwards, I noticed that, during the execution of such a command,
 my FS free space was quickly dropping, where I would have expected
 it to increase...

   What you're seeing is the fact that you've still got the complete
ext4 filesystem and all of its data sitting untouched on the disk as
well. The defrag will have taken a complete new copy of the data but
not removed the ext4 copy.

   If you delete the conversion recovery directory (ext2_subvol), then
you'll see the space usage drop again. Of course, doing that will also
mean that you won't be able to roll back to ext4 without reformatting
and restoring from your backups. (You have got backups, right?)
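
   Roughly like this -- check the exact name with the list command first
(btrfs-convert usually calls the saved subvolume ext2_saved, and the mount
point here is just taken from your paths above):

   btrfs subvolume list /mnt/STORAGEFS
   btrfs subvolume delete /mnt/STORAGEFS/ext2_saved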

 Once finished, I checked a couple of BTRFS FSes using btrfsck, but I
 interpret the results as having some errors :
 
 root@fnix:/# btrfsck /dev/VG1/DEBMINT
 checking extents
 checking fs roots
 root 256 inode 257 errors 800
 found 7814565888 bytes used err is 1
 total csum bytes: 6264636
 total tree bytes: 394928128
 total fs tree bytes: 365121536
 btree space waste bytes: 101451531
 file data blocks allocated: 20067590144
  referenced 13270241280
 Btrfs Btrfs v0.19
 
 root@fnix:/# btrfsck /dev/VG1/STORAGE
 checking extents
 checking fs roots
 root 301 inode 10644 errors 1000
 root 301 inode 10687 errors 1000
 root 301 inode 10688 errors 1000
 root 301 inode 10749 errors 1000
 found 55683117056 bytes used err is 1
 total csum bytes: 54188580
 total tree bytes: 191500288
 total fs tree bytes: 103596032
 btree space waste bytes: 49730472
 file data blocks allocated: 55640522752
  referenced 56466059264
 Btrfs Btrfs v0.19
 
 It doesn't seem that btrfsck attempts to fix these errors in any
 way... It just displays them.

   Correct, by default it just checks the filesystem. Just to be sure:
the filesystems in question weren't mounted, were they?

   I would also suggest using a 3.4 kernel. There's at least one FS
corruption bug known to exist in 3.2 that's been fixed in 3.4.
(Probably not what's happened in this case, but it's best to try to
avoid these kinds of issues).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- emacs: Eats Memory and Crashes. --- 


signature.asc
Description: Digital signature


Re: btrfs data dup on single device?

2014-06-25 Thread Hugo Mills
On Wed, Jun 25, 2014 at 09:25:57AM +0200, Daniel Landstedt wrote:
 Will it be possible to use DUP for data as well as for metadata on a
 single device?

   This has variously been possible and not over the last few years. I
think it's finally come down on the side of not, but by all means try
it (mkfs.btrfs -d dup).
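
   That is, something like this -- /dev/sdX is a placeholder, and
everything on it will be destroyed:

mkfs.btrfs -m dup -d dup /dev/sdX

   Whether mkfs actually accepts -d dup on a single device will depend
on the btrfs-progs version, as above.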

 And if so, am I going to be able to specify more than 1 copy of the data?

   It'll be exactly 2 copies at the moment. Note that performance on
an SSD will at least halve, and performance on a rotational device
will probably suck quite badly. Neither will help you in the case of a
full-device failure. You still need backups, kept on a separate machine.

 Storage is pretty cheap now, and to have multiple copies in btrfs is
 something that I think could be used a lot. I know I will use multiple
 copies of my data if made possible.

   The question is, why? If you have enough disk media errors to make
it worth using multiple copies, then your storage device is basically
broken and needs replacing, and it can't really be relied on for very
much longer.

 Is it something that might be available when RAID1 gets N mirrors
 instead of just 1 mirror?

   The n-copies code will probably support n-copies DUP as well.
There's no reason particularly to restrict it that way.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Do not meddle in the affairs of wizards, for they are subtle, ---  
   and quick to anger.   


signature.asc
Description: Digital signature


Re: RAID1 3+ drives

2014-06-28 Thread Hugo Mills
On Sat, Jun 28, 2014 at 09:38:00AM +0200, Martin Steigerwald wrote:
 Am Samstag, 28. Juni 2014, 16:28:23 schrieb Russell Coker:
   So look for N-way-mirroring when you go RAID shopping, and no, btrfs does
   not have it at this time, altho it is roadmapped for implementation after
   completion of the raid5/6 code.
  
   
  
   FWIW, N-way-mirroring is my #1 btrfs wish-list item too, not just for
   device redundancy, but to take full advantage of btrfs data integrity
   features, allowing to scrub a checksum-mismatch copy with the content
   of a checksum-validated copy if available.  That's currently possible,
   but due to the pair-mirroring-only restriction, there's only one
   additional copy, and if it happens to be bad as well, there's no
   possibility of a third copy to scrub from.  As it happens my personal
   sweet-spot between cost/performance and reliability would be 3-way
   mirroring, but once they code beyond N=2, N should go unlimited, so N=3,
   N=4, N=50 if you have a way to hook them all up... should all be possible.
  
  What I want is the ZFS copies= feature.
 
 Something like this, even more flexible, was planned to be added. There was 
 some discussion on how to specify complex redundancy patterns completely 
 flexibly: exactly how much redundancy, how many spares, and so on.
 
 I haven't read anything about this for a long time. I wonder what happened to 
 the idea.

   It's moving slowly in fits and starts. I haven't forgotten it.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- But people have always eaten people,  / what else is there to ---  
 eat?  / If the Juju had meant us not to eat people / he 
 wouldn't have made us of meat.  


signature.asc
Description: Digital signature


Re: Question about debugfs on btrfs

2014-07-02 Thread Hugo Mills
On Wed, Jul 02, 2014 at 12:23:01PM -0400, Zhe Zhang wrote:
 Hi,
 
 I'm trying to use a functionality like debugfs blocks or
 dump_extents on a btrfs partition. The current debugfs user space
 program doesn't seem to support it (from e2fsprogs). I cannot find
 debugfs in btrfs-progs either. Any advice on how to do it?

   I don't know what the ext* debugfs does, but the odds of it
actually working on btrfs are pretty slim, given that they're
completely different filesystems. :)

   If you want a human-readable view of the filesystem's metadata,
then btrfs-debug-tree is the tool you need.
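
   For example (the device name is a placeholder; run it against an
unmounted, or at least idle, filesystem):

btrfs-debug-tree /dev/sdX | less

   The output is large, so you will almost certainly want to page it
or grep for the parts you're interested in.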

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Someone's been throwing dead sheep down my Fun Well ---   


signature.asc
Description: Digital signature


Re: BTRFS claims that empty directory is not empty and refuses to delete it

2014-07-15 Thread Hugo Mills
On Tue, Jul 15, 2014 at 11:09:53AM +0200, Martin Steigerwald wrote:
 Hello!
 
 This is with 3.16-rc4 – stepped back to this one after having two hangs in 
 one day with 3.16-rc5, see other thread started by me:
 
 martin@merkaba:~/Zeit/undeletable/db_data ls -lid akonadi
 450598 drwx-- 1 martin martin 1232 Jun 22 14:11 akonadi
 martin@merkaba:~/Zeit/undeletable/db_data ls -lai akonadi 
 insgesamt 0
 450598 drwx-- 1 martin martin 1232 Jun 22 14:11 .
 450595 drwxr-xr-x 1 martin martin   14 Jun 22 14:11 ..
 martin@merkaba:~/Zeit/undeletable/db_data LANG=C rmdir akonadi
 rmdir: failed to remove 'akonadi': Directory not empty
 martin@merkaba:~/Zeit/undeletable/db_data#1 LANG=C rm -r akonadi
 rm: cannot remove 'akonadi': Directory not empty
 martin@merkaba:~/Zeit/undeletable/db_data#1 LANG=C rm -rf akonadi
 rm: cannot remove 'akonadi': Directory not empty
 martin@merkaba:~/Zeit/undeletable/db_data#1
 
 
 Whats this?
 
 I had this weeks ago already and just moved it out of the way at that time, 
 just now stumbled upon it again.

   That is symptomatic of a bug from a couple of kernel versions ago
(now fixed, so it won't happen again). If it is that bug, then btrfs
check will report something along the lines of "directory isize
wrong", and the problem can be fixed by running a btrfs check
--repair. If you get anything else from btrfs check (or it checks
cleanly), then let us know first.
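
   That is, with the filesystem unmounted (the device name is an
example):

btrfs check /dev/sdXn             # read-only check first
btrfs check --repair /dev/sdXn    # only if it reports the isize error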

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- If you see something, say nothing and drink to forget ---  


signature.asc
Description: Digital signature


Re: btrfs hanging since 3.16-rc3 or so

2014-07-15 Thread Hugo Mills
cc linux-btrfs list

On Tue, Jul 15, 2014 at 10:40:46PM +0900, Norbert Preining wrote:
 Dear all
 
 (please keep Cc)
 
 Since 3.16-rc3 or so I regularly get btrfs hanging in some transactions.
 
 Usually during apt-get upgrade or some other large file operations
 (cowbuilder building of packages).
 
 The log files give me for loads of processes things like:
 [ 6236.746546] INFO: task aptitude:22775 blocked for more than 120 seconds.
 [ 6236.746547]   Tainted: GW  O  3.16.0-rc5 #27
 [ 6236.746548] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [ 6236.746549] aptitudeD 8800b21a3868 0 22775  22709 
 0x
 [ 6236.746550]  88003644fd10 0082 81a15500 
 88003644ffd8
 [ 6236.746552]  8800b21a3430 00011c00 880147da9c30 
 880147da9c30
 [ 6236.746553]  88003644fd58 880034b3ed48 880034b3ed38 
 88003644fd20
 [ 6236.746555] Call Trace:
 [ 6236.746557]  [81585b4a] schedule+0x64/0x66
 [ 6236.746560]  [811bb22e] btrfs_wait_logged_extents+0xa4/0xdc
 [ 6236.746561]  [810635c1] ? finish_wait+0x5d/0x5d
 [ 6236.746564]  [811d9489] btrfs_sync_log+0x5ef/0x8a2
 [ 6236.746567]  [811b43cf] btrfs_sync_file+0x21b/0x24d
 [ 6236.746569]  [811b43cf] ? btrfs_sync_file+0x21b/0x24d
 [ 6236.746571]  [8110db8a] vfs_fsync_range+0x1c/0x1e
 [ 6236.746574]  [810d1681] SyS_msync+0x15d/0x1ea
 [ 6236.746575]  [81588712] system_call_fastpath+0x16/0x1b
 
 This is aptitude, but I have all the other tasks accessing the disk
 hanging, too.
 
 This time, issuing a Sysrq-s for emergency syncing got the laptop
 out of the hang.
 
 Hardware: Sony VAIO Pro 13
 Distribution: Debian/sid
 self compiled kernel, config on request.
 
 Please let me know if there is anything else I can provide.
 
 Thanks a lot
 
 Norbert
 
 
 PREINING, Norbert   http://www.preining.info
 JAIST, Japan TeX Live  Debian Developer
 GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0  ACF0 6CAC A448 860C DC13
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Normaliser unix c'est comme pasteuriser le Camembert ---   


signature.asc
Description: Digital signature


Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 12:18:37AM +0200, Sebastian Ochmann wrote:
 I'm sharing a btrfs-formatted drive between multiple computers and each of
 the machines has a separate home directory on that drive. The root of the
 drive is mounted at /mnt/tray and the home directory for machine {hostname}
 is under /mnt/tray/Homes/{hostname}. Up until now, I have mounted /mnt/tray
 like a normal volume and then did an additional bind-mount of
 /mnt/tray/Homes/{hostname} to /home.

   You've said you're not sharing it concurrently, which is good -- as
long as you've only got one machine accessing it at the same time,
you're fine there.

 Now I have a new drive and wanted to do things a bit more advanced by
 creating subvolumes for each of the machines' home directories so that I can
 also do independent snapshotting. I guess I could use the bind-mount method
 like before but my question is if it is considered safe to do an additional,
 regular mount of one of the subvolumes to /home instead, like
 
 mount /dev/sdxN /mnt/tray
 mount -o subvol=/Homes/{hostname} /dev/sdxN /home
 
 When I experimented with such additional mounts of subvolumes of
 already-mounted volumes, I noticed that the mount options of the additional
 subvolume mount might differ from the original mount. For instance, the
 root volume might be mounted with noatime while the subvolume mount may
 have relatime.
 
 So my questions are: Is mounting a subvolume of an already mounted volume
 considered safe

   Yes, absolutely:

hrm@amelia:~$ mount | grep btrfs
/dev/sda2 on /boot type btrfs (rw,noatime,space_cache)
/dev/sda2 on /home type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/video type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/pipeline type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/snarf type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/audio type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/home type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/video type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/testing type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/pipeline type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/audio type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/nadja type btrfs (rw,noatime,space_cache)

 and are there any combinations of possibly conflicting mount
 options one should be aware of (compression, autodefrag, cache clearing)? Is
 it advisable to use the same mount options for all mounts pointing to the
 same physical device?

   If you assume that the first mount options are the ones used for
everything, regardless of any different options provided in subsequent
mounts, then you probably won't go far wrong. It's not quite true:
some options do work on a per-mount basis, but most are
per-filesystem. I'm sure there was a list of them on the wiki at some
point, but I can't seem to track it down right now.
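
   The simplest approach is just to give every mount of the filesystem
the same options. A sketch of what that might look like in fstab (the
UUID and subvolume name are placeholders for your own):

UUID=xxxx-xxxx  /mnt/tray  btrfs  noatime                        0  0
UUID=xxxx-xxxx  /home      btrfs  noatime,subvol=/Homes/myhost   0  0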

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  


signature.asc
Description: Digital signature


Re: btrfs fi df shows unknown ?

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 10:02:01AM +0200, Swâmi Petaramesh wrote:
 Hi there,
 
 For a few days now, I have noticed that btrfs fi df / displays an entry about 
 "unknown" used space, and I can see this on several Fedora machines, so it is 
 not an issue related to a single system...
 
 Does anybody know what these unknown data are ?

   It's the block reserve, which used to be part of metadata, but is
now split out to its own type. An updated userspace should be able to
show it properly.
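
   For illustration (reusing your numbers), a newer btrfs-progs will
show that same line as something like:

GlobalReserve, single: total=176.00MiB, used=0.00B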

   Hugo.

 i.e:
 
 # btrfs fi df /
 Data, single: total=106.00GiB, used=88.28GiB
 System, DUP: total=32.00MiB, used=24.00KiB
 Metadata, DUP: total=1.00GiB, used=520.36MiB
 unknown, single: total=176.00MiB, used=0.00
 
 # btrfs --version
 Btrfs v3.14.2
 
 # uname -r
 3.15.5-200.fc20.x86_64
 
 TIA, kind regards.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  


signature.asc
Description: Digital signature


Re: NFS FILE ID not unique when exporting many brtfs subvolumes

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 10:40:14AM +, philippe.simo...@swisscom.com wrote:
 I have a problem using btrfs/nfs to store my vmware images.
[snip]
 - vmware bases its NFS file locks on the nfs fileid field returned from 
 an NFS GETATTR request for the file being locked
   
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007909
 vmware assumes that these nfs fileids are unique per storage.
 
 - it seems that these nfs fileids are only unique per-subvolume, but 
 because my nfs export contains many subvolumes,
 the export then contains files (in different subvolumes) with the same nfs 
 fileid.
 
 - no problem when I start each machine alone, but when 2 machines are running 
 at the same time, vmware seems to mix up its lock file references and 
 sometimes kills one vm.
 
   in esx server, following messages : /var/log/vmkwarning.log : 
 
   2014-07-17T06:31:46.854Z cpu2:268913)WARNING: NFSLock: 1315: Inode 
 (Dup: 260 Orig: 260) has been recycled by server, freeing lock info for 
 .lck-0401
   2014-07-17T06:34:47.925Z cpu2:114740)WARNING: NFSLock: 2348: Unable to 
 remove lockfile .invalid, not found
   2014-07-17T10:18:50.320Z cpu0:32824)WARNING: NFSLock: 2348: Unable to 
 remove lockfile .invalid, not found
 
   and in machine log : 
   Message from sncubeesx02: The lock protecting vm-w7-sysp.vmdk 
 has been lost, 
   possibly due to underlying storage issues. If this virtual 
 machine is configured to be highly 
   available, ensure that the virtual machine is running on some 
 other host before clicking OK. 
   
 - vmware tries to do its own file locking for the following file types: 
   
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051
 
   VMNAME.vswp 
   DISKNAME-flat.vmdk 
   DISKNAME-ITERATION-delta.vmdk 
   VMNAME.vmx 
   VMNAME.vmxf 
   vmware.log
 
 Is there a way to deal with this problem ? is that a bug ? 

   Add an arbitrary and unique fsid=0x12345 value to the exports
declaration. For example, my server exports a number of subvolumes
from the same FS with:

/srv/nfs/nadja -rw,async,fsid=0x1729,no_subtree_check,no_root_squash \
   10.0.0.20 fe80::20
/srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash \
   fe80::/64
/srv/nfs/video -ro,async,fsid=0x1731,no_subtree_check \
   10.0.0.0/24 fe80::/64

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You can get more with a kind word and a two-by-four than you ---   
   can with just a kind word.


signature.asc
Description: Digital signature


Re: NFS FILE ID not unique when exporting many brtfs subvolumes

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 01:02:06PM +, philippe.simo...@swisscom.com wrote:
 Hi Hugo
 
  -Original Message-
  From: Hugo Mills [mailto:h...@carfax.org.uk]
  Sent: Thursday, July 17, 2014 1:13 PM
  To: Simonet Philippe, INI-ON-FIT-NW-IPE
  Cc: linux-btrfs@vger.kernel.org
  Subject: Re: NFS FILE ID not unique when exporting many brtfs subvolumes
  
  On Thu, Jul 17, 2014 at 10:40:14AM +, philippe.simo...@swisscom.com
  wrote:
   I have a problem using btrfs/nfs to store my vmware images.
  [snip]
   - vmware is basing its NFS files locks on the nfs fileid field returned 
   from a NFS
  GETATTR request for the file being locked
  
  http://kb.vmware.com/selfservice/microsites/search.do?language=en_
  UScmd=displayKCexternalId=1007909
  vmware assumes that these nfs fileid are unique per storage.
  
   - it seemed that these nfs fileid are only unique 'per-subvolume', but 
   because
  my nfs export contains many subvolumes,
   the nfs export has then my files (in different subvolume) with the same 
   nfs
  fileid.
  
   - no problem when I start all machine alone, but when 2 machines are 
   running
  at the same time, vmware seems to mix its reference to lock file and
   sometimes kills one vm.
  
 in esx server, following messages : /var/log/vmkwarning.log :
  
 2014-07-17T06:31:46.854Z cpu2:268913)WARNING: NFSLock: 1315:
  Inode (Dup: 260 Orig: 260) has been recycled by server, freeing lock info 
  for
  .lck-0401
 2014-07-17T06:34:47.925Z cpu2:114740)WARNING: NFSLock: 2348:
  Unable to remove lockfile .invalid, not found
 2014-07-17T10:18:50.320Z cpu0:32824)WARNING: NFSLock: 2348:
  Unable to remove lockfile .invalid, not found
  
 and in machine log :
 Message from sncubeesx02: The lock protecting vm-w7-
  sysp.vmdk has been lost,
 possibly due to underlying storage issues. If this virtual 
   machine
  is configured to be highly
 available, ensure that the virtual machine is running on some
  other host before clicking OK.
  
   - vmware try to make its own file locking for flowing file type :
  
  http://kb.vmware.com/selfservice/microsites/search.do?language=en_
  UScmd=displayKCexternalId=10051
  
 VMNAME.vswp
 DISKNAME-flat.vmdk
 DISKNAME-ITERATION-delta.vmdk
 VMNAME.vmx
 VMNAME.vmxf
 vmware.log
  
   Is there a way to deal with this problem ? is that a bug ?
  
 Add an arbitrary and unique fsid=0x12345 value to the exports
  declaration. For example, my server exports a number of subvolumes
  from the same FS with:
  
  /srv/nfs/nadja-rw,async,fsid=0x1729,no_subtree_check,no_root_squash \
 10.0.0.20 fe80::20
  /srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash \
 fe80::/64
  /srv/nfs/video-ro,async,fsid=0x1731,no_subtree_check \
 10.0.0.0/24 fe80::/64
  
 Hugo.
  

 first of all, thanks for your answer!

 on my system, I have one export, which is the root btrfs subvolume
 and itself contains one subvolume per vm.
 if I change the NFS export fsid, it does not change any of the
 file IDs in the whole NFS export.
 (I cross-checked it just to be sure, with tshark -V -nlp -t a port 2049
 | egrep 'Entry: name|File ID', and indeed
 fsid has no impact on the file ids)

   Aaah, that's interesting. I suspect that you'll have to make the
mounts explicit, so for every subvolume exported from the server,
there's a line in fstab to mount it to the place it's exported from.
This happens as a side-effect of the recommended filesystem/subvol
layout[1] anyway, since it doesn't use nested subvolumes at all, so
I've never actually noticed the situation you mention.
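
   A sketch of what that looks like in the server's fstab -- the device
and subvolume names here are only examples matching the exports above:

/dev/sdb1  /srv/nfs/home   btrfs  subvol=home,noatime   0  0
/dev/sdb1  /srv/nfs/video  btrfs  subvol=video,noatime  0  0

   Each exported subvolume then has its own explicit mount, and each
export line can carry its own fsid as before.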

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold.


signature.asc
Description: Digital signature


Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)

2014-07-25 Thread Hugo Mills
[this time, to the mailing list as well]

On Fri, Jul 25, 2014 at 09:02:44AM +0100, Hugo Mills wrote:
 On Thu, Jul 24, 2014 at 11:06:34PM -0400, Nick Krause wrote:
  On Thu, Jul 24, 2014 at 10:32 PM, Duncan 1i5t5.dun...@cox.net wrote:
 [snip]
  Hey Duncan and others,
  I have read this and this seems to need some working on.
  If you want my help please ask; I am new to the kernel,
  so I may ask a dumb question or two, but if that's fine with
  you I have no problem helping out here. I would like
  a log of printk statements leading to the hang, if that's
  not too much work, in order for me to trace this back.
 
Note that btrfs is complex -- there's something around 100k lines
 of code in it. My first piece of kernel work in btrfs was simply
 documenting the way that the on-disk data structures related to each
 other[1]. That on its own took me two to three weeks of solid
 full-time effort, reading the code to find where each structure was
 used and how its elements related to other structures. You can't just
 wander up and dive in without putting in the effort of learning first.
 Whilst people will help you (come over to #btrfs on Freenode for more
 real-time interaction), they can't do the basic work of sitting down
 and understanding the code in detail for you.
 
Chris, who designed and wrote the filesystem, has spent the last
 couple of weeks tracking down this particular problem. Do you think
 it's appropriate to leap into the middle of the discussion on this
 subtle bug as someone with absolutely no experience in the area?
 
Your first task is to reproduce the bug on your own machine. If you
 can do that, _then_ you might be able to start tracking down its
 cause. But I wouldn't recommend doing that, as (a) it's a nasty subtle
 bug, and (b) Chris seems to be close to tracking it down anyway.
 
My recommendations for you, if you want to work on btrfs, are:
 
  * Build and install the latest kernel from Linus's git repo
 
  * Read and understand the user documentation [2]
 
  * Create one or several btrfs filesystems with different
configurations and learn how they work in userspace -- what are the
features, what are the problems you see? Actually use at least one
of the filesystems you created for real data in daily use (with
backups)
 
  * Build the userspace tools from git
 
  * Pick up one of the userspace projects from [3] and implement that.
If you pick the right one(s), you'll have to learn about some of
the internal structures of the FS anyway. Compile and test your
patch. If you're adding a new feature, write an automated xfstest
for it as well.
 
  * Get that patch accepted. This will probably involve a sequence of
revisions to it, multiple versions over a period of several weeks
or more, with a review process. You should also send your test to
xfstests and get that accepted.
 
  * Do the above again, until you get used to the processes involved,
and have demonstrated that you can work well with the other people
in the subsystem, and are generally producing useful and sane code.
It's all about trust -- can you be trusted to mostly do the right
thing? (So far on linux-kernel, you've rather demonstrated the
opposite: your intentions are good, but your execution leaves a lot
to be desired)
 
  * Use the documentation at [4], and the output of btrfs-debug-tree to
understand the internal structure of the FS
 
  * Pick up one of the smaller, more self-contained ideas from the
projects page [5] (say, [6] or [7]) and try to implement it. Again:
build, write test code, test thoroughly, submit patch for review,
modify as suggested by reviewers, and repeat as often as necessary
 
Hugo.
 
 [1] https://btrfs.wiki.kernel.org/index.php/Data_Structures
 [2] 
 https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information
 [3] 
 https://btrfs.wiki.kernel.org/index.php/Project_ideas#Userspace_tools_projects
 [4] https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation
 [5] https://btrfs.wiki.kernel.org/index.php/Project_ideas
 [6] 
 https://btrfs.wiki.kernel.org/index.php/Project_ideas#Cancellable_operations
 [7] 
 https://btrfs.wiki.kernel.org/index.php/Project_ideas#Implement_new_FALLOC_FL_.2A_modes
 
 -- 
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- But people have always eaten people,  / what else is there to ---  
  eat?  / If the Juju had meant us not to eat people / he 
  wouldn't have made us of meat.  



-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- ORLY? IÄ! R'LYH! ---


signature.asc
Description: Digital signature


Re: Help with Project on brtfs wiki

2014-07-26 Thread Hugo Mills
implemented. Start asking _yourself_ the questions of "if I want to
achieve this effect, what does the FS need to do? What behaviour would
need to be changed, and how?". When you think you have an answer to
those questions, you can start having a real and useful conversation.
You will probably be wrong, but that's where the process starts, and
you will get better at it over time. If at some point there's
something you don't understand, do ask, but make sure that you can say
what you think you know, and why you can't understand the thing you
are having trouble with. Think of the person responding to the
question: Make it easy to write the reply by ensuring that the reply
can be as short as possible. Some of the time, you will actually
answer your own question by trying to ask it in a sensible way. If you
find that happening, you're asking sensible questions.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I must be musical:  I've got *loads* of CDs ---   


signature.asc
Description: Digital signature


Re: Multi Core Support for compression in compression.c

2014-07-28 Thread Hugo Mills
On Sun, Jul 27, 2014 at 11:21:53PM -0400, Nick Krause wrote:
 On Sun, Jul 27, 2014 at 10:56 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
  On 07/27/2014 04:47 PM, Nick Krause wrote:
  This may be a bad idea, but compression in btrfs seems to use only
  one core to compress. Depending on the CPU used and the number of
  cores in it, we could make this much faster with multiple cores.
  This seems bad by my reading; at the least, I would recommend that
  for writing compression we write a function that uses a certain
  number of cores based on the system's CPU load, not using more than
  75% of the system's CPU resources, as my system when idle has never
  needed more than one core of my i5 2500k, even with the interrupts
  for opening eclipse running. For reading, compression on one good
  core seems fine to me, as when testing other compression software
  for reads it's far less CPU intensive.
  Cheers Nick
  We would probably get a bigger benefit from taking an approach like
  SquashFS has recently added, that is, allowing multi-threaded
  decompression fro reads, and decompressing directly into the pagecache.
   Such an approach would likely make zlib compression much more scalable
  on large systems.
 
 
 
 Austin,
 That seems better than my idea, as you seem to be more up to date on
 btrfs development.
 If you and the other developers of btrfs are interested in adding this
 as a feature, please let
 me know, as I would like to help improve btrfs: the file system as
 an idea is great, it just
 seems like it needs a lot of work :).

   Yes, it probably does need a lot of work. This is (at least one
reason) why it's not been done yet. If you want to work on doing this,
then please do. However, don't expect anyone else to give you a
detailed plan of what code to write. Don't expect anyone else to write
the code for you. You will have to come up with your own ideas as to
how to implement it, and actually do it yourself, including building
it, and testing it.

   That's not to say that you are on your own, though. People will
help -- provided that you aren't asking them to do all the work. You
are not an empty vessel to be filled with the wisdom of the ancients.
This means that *you* have to take action. You have to take yourself
as far as you can in learning how things work. When you get stuck,
work out what it is that you don't know, and then ask about that one
thing. This makes it easier to answer, it shows that you're putting in
effort on your side, and it means that you *actually learn things*.
Questions like what function should I be modifying?, or how do you
want me to do this? show that you haven't put in even the smallest
piece of effort, and will be ignored (f you're lucky). Questions like
I'm trying to implement a crumble filter, but in the mix_breadcrumbs
function, how does it take account of the prestressed_yoghurt field?
show that you've read and understood at least some of the code, and
have thought about what it's doing.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Alert status mauve ocelot: Slight chance of brimstone. Be ---
   prepared to make a nice cup of tea.   


signature.asc
Description: Digital signature


Re: Help with Brtfs Bugs

2014-07-28 Thread Hugo Mills
On Mon, Jul 28, 2014 at 12:00:03AM -0400, Nick Krause wrote:
 Hey Josef,
 Seems there are a lot of btrfs bugs open on the kernel Bugzilla. I am
 new to the btrfs
 side of development, so please let me know if you want help cleaning
 up some of the
 bugs here that are actually valid and still open.

   Make up your mind... this is the third unrelated idea you've had
about working in the area of btrfs. You're bouncing around all over
the place like a hyperactive puppy. Pick *one* thing, and just do it.
Put in the effort to learn about the subsystem (read my earlier emails
for a good approach here). Accept that there are no easy one-liners in
the kernel. The path to writing your first kernel patch is *hard*.
Don't give up at the first hint that each thing isn't going to be
solved in 5 minutes.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- What part of gestalt don't you understand? ---   


signature.asc
Description: Digital signature


Re: Work Queue for btrfs compression writes

2014-07-30 Thread Hugo Mills
On Tue, Jul 29, 2014 at 11:54:20PM -0400, Nick Krause wrote:
 Hey Guys ,
 I am new to reading and writing kernel code. I got interested in
 writing code for btrfs as it seems to
 need more work than other file systems, and this seems, other than
 drivers, a good use of time on my part.
 I am interested in helping improve the compression of btrfs by using a
 set of threads using work queues, like XFS does
 for reads, and keeping the page cache after reading compressed blocks, as
 these seem to be a great way to improve
 compression performance, mostly with large partitions of compressed
 data. I am not asking you to write the code
 for me, but as I am new, a little guidance and help would be greatly
 appreciated, as this seems like too much work for just a newbie.

 * Documentation/workqueue.txt (in general, grep in Documentation
   usually throws up something useful)

 * grep -r alloc_workqueue fs/ shows a lot of uses (including in
   btrfs), so it should be fairly easy to see how to create and manage
   a workqueue.

   I suspect that this may be a medium-sized project, rather than a
small one. My gut feeling (based on limited experience) is that the
fallocate extensions project would be considerably simpler.

   I also noticed from the public reply to the private mail (don't do
this without getting permission from the other person) that you posted
to LKML in this thread (don't switch mailing lists mid-thread) that
you anticipated having problems testing with limited disks -- what you
will find is that testing new kernel code is something that you don't
do on your main development OS installation. Instead, you will need
either a scratch machine that you can easily update, or one or more
virtual machines. qemu/kvm is good for this, because it has a mode
that bypasses the BIOS and bootloader emulation, and just directly
runs a kernel from a file on the host machine. This is fast. You can
pass large sparse files to the VM to act as scratch disks, plus keep
another smaller file for the guest OS (and a copy of it so that you
can throw one away and make another one quickly and easily).
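
   As a rough sketch of the kind of invocation I mean (paths, image
names and sizes are all placeholders, and your kernel config needs
virtio block and serial console support):

truncate -s 20G scratch1.img
qemu-system-x86_64 -enable-kvm -m 2048 -nographic \
    -kernel arch/x86/boot/bzImage \
    -append "root=/dev/vda console=ttyS0" \
    -drive file=guest.img,if=virtio,format=raw \
    -drive file=scratch1.img,if=virtio,format=raw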

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You've read the project plan.  Forget that. We're going to Do ---  
  Stuff and Have Fun doing it.   


signature.asc
Description: Digital signature


Re: [PATCH] Remove certain calls for releasing page cache

2014-07-31 Thread Hugo Mills
On Wed, Jul 30, 2014 at 10:05:16PM -0400, Nick Krause wrote:
 On Wed, Jul 30, 2014 at 7:30 PM, Dave Airlie airl...@gmail.com wrote:
  This patch removes the lines for releasing the page cache in certain
  files, as this may aid performance with writes in the compression
  routines of btrfs. Please note that this patch has not been tested
  on my own hardware, due to having no compression-based btrfs volumes of my
  own.
 
 
  For all that is sacred, STOP.
[snip]
  But if you want to work on the kernel, this isn't the way to do it, and
  nobody will ever take a patch from you seriously if you continue in this
  fashion.
 
  Dave.
 Dave ,
 Seems I need to have tested this code first.

   You've said this before, having made exactly the same error (not
testing a patch). Yet you do it again. You seem to be ignoring all the
advice you've been given -- or at least not learning from it, and not
learning from your experiences. Could you please, for half an hour or
so, stop thinking about the immediate goal of getting a patch into the
kernel, and take a short while to think about your process of
learning. Look at all the advice you've had (from me, from Ted, from
others), actually understand it, and consider all the things you need
to do which *aren't* hacking up a lump of C. Actually learn these
things -- have them in your mind all the time.

   I would appreciate it if you could actually engage with someone
(doesn't have to be me) about this -- why are you ignoring the advice?
Is it because you don't understand it? Is it because you think you can
cut corners? Is it because you're concentrating on the code so much that
you're forgetting it?

   The main thing you're doing which is making people angry is not
because you're submitting bad patches (although you are). It's because
you're not listening to advice, and you're not apparently learning
anything from the feedback you're given. Your behaviour is not
changing over time, which makes you look like a waste of time to all
those people trying to help you.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- That's not rain, that's a lake with slots in it --- 


signature.asc
Description: Digital signature


Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-07-31 Thread Hugo Mills
On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
 This adds checks for the stated modes, so that if they are crap we will return
 the error "not supported".

   You've just enabled two options, but you haven't actually
implemented the code behind them. I would tell you *NOT* to do anything
else on this work until you can answer the question: what happens if
you apply this patch, create a large file called "foo.txt", and then a
userspace program executes the following code?

int fd = open("foo.txt", O_RDWR);
fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);

   Try it on a btrfs filesystem, both with and without your patch.
Also try it on an ext4 filesystem.

   Once you've done all of that, reply to this mail and tell me what
the problem is with this patch. You need to make two answers: what are
the technical problems with the patch? What errors have you made in
the development process?

   *Only* if you can answer those questions sensibly, should you write
any more patches, of any kind.

   Hugo.

 Signed-off-by: Nicholas Krause xerofo...@gmail.com
 ---
  fs/btrfs/file.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
 index 1f2b99c..599495a 100644
 --- a/fs/btrfs/file.c
 +++ b/fs/btrfs/file.c
 @@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode,
   alloc_end = round_up(offset + len, blocksize);
  
   /* Make sure we aren't being give some crap mode */
  - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
  + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
  + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
    return -EOPNOTSUPP;
   
    if (mode & FALLOC_FL_PUNCH_HOLE)
 -- 
 1.7.10.4
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The glass is neither half-full nor half-empty; it is twice as ---  
large as it needs to be. 


signature.asc
Description: Digital signature


Re: Implement new FALLOC_FL_* modes

2014-07-31 Thread Hugo Mills
On Thu, Jul 31, 2014 at 02:08:15PM -0400, Nick Krause wrote:
 I am doing this project from the btrfs wiki. Since I am new, after
 reading the code using lxr I am wondering if
 we can base the code on what is already in ext4 for these modes, as they
 seem to work rather well. I am wondering,
 though, as a newbie, since some of the data structures are ext4-based and the
 same goes for some of the functions,
 what the equivalent structures and functions are in
 btrfs, as I can't seem to find them after reading
 the code as a newbie for the last few hours in lxr. Maybe I am just
 missing something?

   The fundamental on-disk structures for btrfs and ext4 are totally
different. You will get very confused if you expect the ext4 code to
work in the btrfs module, or even if you expect structures to be
similar.

   But first -- answer my questions in the reply I made to your patch
just now. Do nothing else until you can answer all three of those
questions sensibly.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Great films about cricket:  200/1: A Pace Odyssey ---


signature.asc
Description: Digital signature


Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-08-01 Thread Hugo Mills
On Thu, Jul 31, 2014 at 09:53:15PM -0400, Nick Krause wrote:
 On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills h...@carfax.org.uk wrote:
  On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
  This adds checks for the stated modes, so that if they are crap we will return 
  the error "not supported".
 
  You've just enabled two options, but you haven't actually
  implemented the code behind them. I would tell you *NOT* to do anything
  else on this work until you can answer the question: what happens if
  you apply this patch, create a large file called "foo.txt", and then a
  userspace program executes the following code?
 
  int fd = open("foo.txt", O_RDWR);
  fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);
 
 Try it on a btrfs filesystem, both with and without your patch.
  Also try it on an ext4 filesystem.
 
 Once you've done all of that, reply to this mail and tell me what
  the problem is with this patch. You need to make two answers: what are
  the technical problems with the patch? What errors have you made in
  the development process?
 
 *Only* if you can answer those questions sensibly, should you write
  any more patches, of any kind.
[snip]

 The calls are there in btrfs, therefore it will either kernel panic or
 cause an oops.

   That's a guess. I can tell it's a guess, because I've actually read
(some of) the rest of that function, so I've got a good idea of what I
think it will do -- and panic or oops is not the answer. Try again.
You can answer this question two ways: by test (see my suggestion
above), or by reading and understanding the code. Either will work in
this case, but doing neither is not an option for someone who wants to
change the function.

 I need to test this patch, as this is a very easy bug to catch.

   So why didn't you? It's your patch, testing it is your job --
*before* it gets out into the outside world.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- But people have always eaten people,  / what else is there to ---  
 eat?  / If the Juju had meant us not to eat people / he 
 wouldn't have made us of meat.  


signature.asc
Description: Digital signature


Re: ENOSPC with mkdir and rename

2014-08-04 Thread Hugo Mills
 for dealing with early
ENOSPC problems, so other things should probably point at that.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You stay in the theatre because you're afraid of having no ---
 money? There's irony... 


signature.asc
Description: Digital signature


Re: ENOSPC with mkdir and rename

2014-08-04 Thread Hugo Mills
On Mon, Aug 04, 2014 at 11:31:57AM +0100, Peter Waller wrote:
 Thanks Hugo, this is the most informative e-mail yet! (more inline)
 
 On 4 August 2014 11:22, Hugo Mills h...@carfax.org.uk wrote:
 
   * btrfs fi show
   - look at the total and used values. If used < total, you're OK.
If used == total, then you could potentially hit ENOSPC.
 
 Another thing which is unclear and undocumented anywhere I can find is
 what the meaning of `btrfs fi show` is.
 
 I'm sure it is totally obvious if you are a developer or if you have
 used it for long enough. But it isn't covered in the manpage, nor in
 the oracle documentation, nor anywhere on the wiki that I could find.
 
 When I looked at it in my problematic situation, it said 500 GiB /
 500 GiB. That sounded fine to me because I interpreted the output as
 what fraction of which RAID devices BTRFS was using. In other words, I
 thought "Oh, BTRFS will just make use of the whole device that's
 available to it." I thought that `btrfs fi df` was the source of
 information for how much space was free inside of that.

   That's actually pretty much accurate. The problem is that btrfs
distinguishes between space available for data and space available
for metadata, and doesn't trade off one for the other once they've
been allocated. The balance operation frees up some of the allocation,
allowing the newly-freed space to be allocated again for something
else.

   All of the information about the data/metadata split, and what's
used out of that, is revealed by btrfs fi df.

   * btrfs fi df
   - look at metadata used vs total. If these are close to zero (on
 3.15+) or close to 512 MiB (on < 3.15), then you are in danger of
ENOSPC.
 
 Hmm. It's unfortunate that this could indicate an amount of space
 which is free when it actually isn't.

   That's why the 512 MiB block reserve was split out of metadata --
 so that you don't look at metadata and say "oh, I've got half a gig
 free, that's OK".

  - look at data used vs total. If the used is much smaller than
total, you can reclaim some of the allocation with a filtered
balance (btrfs balance start -dusage=5), which will then give
you unallocated space again (see the btrfs fi show test).
 
 So the filtered balance didn't help in my situation. I understand it's
 something to do with the 5 parameter. But I do not understand what
 the impact of changing this parameter is. It is something to do with a
 fraction of something, but those things are still not present in my
 mental model despite a large amount of reading. Is there an
 illustration which could clear this up?

   The 5 is 5%. So, it'll only look at chunks which are less than 5%
full. David Sterba published a patch that would balance the
(approximately N) least-used chunks, which is a considerably more
usable approach, but I don't know what happened to that one.

 Among other things I also got the kernel stack trace I pasted at the
 bottom of the first e-mail to this thread when I did the rebalance.

   OK, I'll go back and read that. You probably shouldn't have had it,
though. :)

 This FAQ entry is pretty horrible, I'm afraid. I actually started
  rewriting it here to try to make it clearer what's going on. I'll try
  to work on it a bit more this week and put out a better version for
  the wiki.
 
 This is great to hear! :)
 
 Thanks for your response Hugo, that really cleared up a lot of mental
 model problems. I hope the documentation can be improved so that
 others can learn from my mistakes.

   I do try to work on it every so often. Note to self: win lottery,
or get cloned.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You stay in the theatre because you're afraid of having no ---
 money? There's irony... 


signature.asc
Description: Digital signature


Re: ENOSPC with mkdir and rename

2014-08-04 Thread Hugo Mills
On Mon, Aug 04, 2014 at 01:04:25PM +0200, Clemens Eisserer wrote:
 Hi Hugo,
 
 On the 3.15+ kernels, the block reserve is split out of metadata
  and reported separately. This helps with the following process:
 
 Thanks a lot for pointing this out, I hadn't noticed this change until now.
 
 One thing I didn't find any information about is the overhead
 introduced by mixed mode.
 It would be great if you could explain it in a few sentences.

   I don't know, I'm afraid. I don't think we've got any benchmarks on
the scale of the slowdown.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Reading Mein Kampf won't make you a Nazi. Reading Das Kapital ---  
 won't make you a communist. But most trolls started out 
with a copy of Lord of the Rings.


signature.asc
Description: Digital signature


Re: ENOSPC with mkdir and rename

2014-08-04 Thread Hugo Mills
On Mon, Aug 04, 2014 at 02:17:02PM +0100, Peter Waller wrote:
 For anyone else having this problem, this article is fairly useful for
 understanding disk full problems and rebalance:
 
 http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
 
 It actually covers the problem that I had, which is that a rebalance
 can't take place because it is full.
 
 I am still unsure what is really wrong with this whole situation. Is
 it that I wasn't careful to do a rebalance when I should have
 done? Is it that BTRFS doesn't do a rebalance automatically when it
 could in principle?

   This latter one.

   Well, actually two things: the FS should be capable of autonomously
rebalancing at low bandwidth to prevent this problem, but nobody's got
round to implementing it yet. Secondly, it should not be possible to
get into a state where you can't run the balance -- Josef spent about
three kernel revisions fixing the block reserve code to that end.
However, since about 3.14, there's been more cases like yours show up,
so I think there's been a regression. It's not very common, though. I
think we've had maybe a dozen reported instances in the last 6 months.
Someone on IRC had it just now, though, and captured a metadata image,
so at least we've got some (meta)data to work with now.
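
   (For anyone else who hits this: a metadata image is captured with
something along the lines of

btrfs-image -c9 -s /dev/sdX /tmp/fsmeta.img

on the unmounted filesystem -- it stores metadata only, no file
contents, and -s additionally obscures the file names.)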

 It's pretty bad to end up in a situation (with spare space) where the
 only way out is to add more storage, which may be impractical,
 difficult or expensive.
 
 The other thing that I still don't understand, which I've seen repeated in a
 few places, is from the above article:
 
 because the filesystem is only 55% full, I can ask balance to rewrite
 all chunks that are more than 55% full
 
 Then he uses `btrfs balance start -dusage=55 /mnt/btrfs_pool1`. I
 don't understand the relationship between the FS is 55% full and
 chunks more than 55% full. What's going on here?

   Pigeonhole principle -- if the FS is 55% full, there must be at
 least one chunk >= 55% full (if every chunk were less than 55% full,
 the filesystem as a whole would have to be less than 55% full too).

 I conclude that now since I have added more storage, the rebalance
 won't fail and if I keep rebalancing from a cron job I won't hit this
 problem again (unless the filesystem fills up very fast! what then?).
 I don't know however what value to assign to `-dusage` in general for
 the cron rebalance. Any hints?

   Try with increasing values until you've moved as many chunks as you
want to. This is what David's balance at least N chunks patch did.
I'd suggest start with 5, and go up in increments of 5, if you're
making it an automatic process. Stop when you reach some threshold
(like, say, 80), or when it reports that it's actually moved some
chunks.

   Doing it manually, I usually recommend 5, 10, 20, 50, 80.
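
   A rough sketch of how that could be scripted -- the mountpoint is a
placeholder, and the stop conditions above are left out for brevity:

for u in 5 10 20 50 80; do
    btrfs balance start -dusage=$u /mnt/point
done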

   Hugo.

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Well, you don't get to be a kernel hacker simply by looking ---   
good in Speedos. -- Rusty Russell


signature.asc
Description: Digital signature


btrfs sub list output

2014-08-04 Thread Hugo Mills
   The output options of btrfs sub list seem a bit... arbitrary?
awkward? unhelpful?

   Here's my problem: Given a path at some arbitrary point into a
mounted btrfs (sub)volume, find all subvolumes visible under that
point, and identify their absolute path names.

   My test btrfs filesystem looks like this:

TOP_LEVEL
   root
   home
   test
  subdir (a subdir, not a subvol)
 foo
  bar

# mount -osubvol=test /dev/sda2 /mnt

so I want to be able to go from that configuration (knowing nothing
about the mountpoint), and map (both ways) between UUID and the
(e.g.) /mnt/foo path. But:

# btrfs sub list -oau /mnt  # and
# btrfs sub list -au /mnt
ID 259 gen 549115 top level 5 uuid 6a50af8d-83dd-9943-b5b7-4f8b0a7f3fa7 path 
FS_TREE/root
ID 260 gen 548768 top level 5 uuid c73d4296-7c30-074e-b647-e6e83025a125 path 
FS_TREE/home
ID 11826 gen 549045 top level 272 uuid f78aed0d-db5a-a342-b422-87abfa18efe0 
path test/subdir/foo
ID 11827 gen 549046 top level 272 uuid a5cea7ae-3fdd-c247-8905-40cbb7f39017 
path test/bar

   Here, I can easily filter out the subvols I want (they're the ones
without FS_TREE), but I have to know the mountpoint (which I can
find) and the subvol= parameter (which I think I can't).

# btrfs sub list -ou /mnt/subdir/
ID 11826 gen 549045 top level 272 uuid f78aed0d-db5a-a342-b422-87abfa18efe0 
path test/subdir/foo
ID 11827 gen 549046 top level 272 uuid a5cea7ae-3fdd-c247-8905-40cbb7f39017 
path test/bar

   This filters the subvols correctly, but otherwise has the same
drawbacks as above.

# btrfs sub list -u /mnt
ID 259 gen 549114 top level 5 uuid 6a50af8d-83dd-9943-b5b7-4f8b0a7f3fa7 path 
root
ID 260 gen 548768 top level 5 uuid c73d4296-7c30-074e-b647-e6e83025a125 path 
home
ID 11826 gen 549045 top level 272 uuid f78aed0d-db5a-a342-b422-87abfa18efe0 
path subdir/foo
ID 11827 gen 549046 top level 272 uuid a5cea7ae-3fdd-c247-8905-40cbb7f39017 
path bar

   Here, I get the paths relative to the mountpoint, which is what I
want, but mixed up with paths outside the mountpoint as well, which I
don't, and have no way of distinguishing the two classes without
making a separate call to btrfs sub list -a and filtering out the
UUIDs with FS_TREE in the name.

   Incidentally, if the parameter to btrfs sub list is inside another
subvolume within the mount, then the relative effects are all
relative to that subvol, not to the mountpoint.

   I'm finding it hard to work out how the variants with -o or -a (or
both) are actually helpful at all, now that I come to use them in more
than a vague human-readable form. Have I missed something, or is this
actually an awkward furball of confusing and mostly unhelpful options?
Are these options actually doing what the original author intended? If
so, what was that intent?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Summoning his Cosmic Powers, and glowing slightly ---
from his toes... 


signature.asc
Description: Digital signature


Re: Stack dumps in use_block_rsv while rebalancing (block rsv returned -28)

2014-08-05 Thread Hugo Mills
On Tue, Aug 05, 2014 at 10:34:13AM +0100, Peter Waller wrote:
 I already posted this in the thread ENOSPC with mkdir and rename,
 but now I have a device with 100GB unallocated on the btrfs fi sh
 output, and when I run a rebalance of the form:
 
  btrfs filesystem balance start -dusage=50 -musage=10 $mount
 
 I get more than 75 of such stack traces contaminating the klog. I've
 put some of them up in a gist here:
 https://gist.github.com/pwaller/1df8a7efc2f10343f2e3 and one of them
 is reproduced below.
 
 Is this harmful or expected? Are there any workarounds?

   It's a warning, not an oops, so it's less immediately dangerous.
The other key thing is block rsv returned -28, which says it's an
ENOSPC. My guess would be that you've got ENOSPC debugging enabled in
the kernel, and that the backtraces, while scary, are essentially
harmless (if irritating).

   Hugo.

 Thanks,
 
 - Peter
 
 [376007.681938] [ cut here ]
 [376007.681957] WARNING: CPU: 1 PID: 27021 at
 /home/apw/COD/linux/fs/btrfs/
 extent-tree.c:6946
 use_block_rsv+0xfd/0x1a0 [btrfs]()
 [376007.681958] BTRFS: block rsv returned -28
 [376007.681959] Modules linked in: softdog tcp_diag inet_diag dm_crypt
 ppdev xen_fbfront fb_sys_fops syscopyarea sysfillrect sysimgblt
 i2c_piix4 serio_raw parport_pc parport mac_hid isofs xt_tcpudp
 iptable_filter xt_owner ip_tables x_tables btrfs xor raid6_pq
 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
 aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd floppy psmouse
 [376007.681980] CPU: 1 PID: 27021 Comm: pam_script_ses_ Tainted: G
W 3.15.7-031507-generic #201407281235
 [376007.681981] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
 [376007.681983]  1b22 8800acca39d8 8176f115
 0007
 [376007.681986]  8800acca3a28 8800acca3a18 8106ceac
 8801efc37870
 [376007.681989]  88017db0ff00 8801aedcd800 1000
 88001c987000
 [376007.681992] Call Trace:
 [376007.682000]  [8176f115] dump_stack+0x46/0x58
 [376007.682005]  [8106ceac] warn_slowpath_common+0x8c/0xc0
 [376007.682008]  [8106cf96] warn_slowpath_fmt+0x46/0x50
 [376007.682016]  [a00d9d1d] use_block_rsv+0xfd/0x1a0 [btrfs]
 [376007.682024]  [a00de687] btrfs_alloc_free_block+0x57/0x220 
 [btrfs]
 [376007.682027]  [8178033c] ? __do_page_fault+0x28c/0x550
 [376007.682031]  [8119749f] ? page_add_file_rmap+0x6f/0xb0
 [376007.682037]  [a00c8a3c] btrfs_copy_root+0xfc/0x2b0 [btrfs]
 [376007.682041]  [811c60b9] ? memcg_check_events+0x29/0x50
 [376007.682051]  [a013a583] ? create_reloc_root+0x33/0x2c0 [btrfs]
 [376007.682061]  [a013a743] create_reloc_root+0x1f3/0x2c0 [btrfs]
 [376007.682064]  [811dd073] ? generic_permission+0xf3/0x120
 [376007.682073]  [a0140eb8] btrfs_init_reloc_root+0xb8/0xd0 [btrfs]
 [376007.682082]  [a00ee967]
 record_root_in_trans.part.30+0x97/0x100 [btrfs]
 [376007.682090]  [a00ee9f4] record_root_in_trans+0x24/0x30 [btrfs]
 [376007.682098]  [a00efeb1]
 btrfs_record_root_in_trans+0x51/0x80 [btrfs]
 [376007.682106]  [a00f13d6]
 start_transaction.part.35+0x86/0x560 [btrfs]
 [376007.682109]  [8132c197] ? apparmor_capable+0x27/0x80
 [376007.682117]  [a00f18d9] start_transaction+0x29/0x30 [btrfs]
 [376007.682125]  [a00f19a7] btrfs_join_transaction+0x17/0x20 [btrfs]
 [376007.682133]  [a00f7fa8] btrfs_dirty_inode+0x58/0xe0 [btrfs]
 [376007.682141]  [a00fcaf2] btrfs_setattr+0xa2/0xf0 [btrfs]
 [376007.682144]  [811eec74] notify_change+0x1c4/0x3b0
 [376007.682146]  [811dde96] ? final_putname+0x26/0x50
 [376007.682149]  [811d088d] chown_common+0x16d/0x1a0
 [376007.682153]  [811f2b08] ? __mnt_want_write+0x58/0x70
 [376007.682156]  [811d1a8f] SyS_fchownat+0xbf/0x100
 [376007.682159]  [811d1aed] SyS_chown+0x1d/0x20
 [376007.682163]  [817858bf] tracesys+0xe1/0xe6
 [376007.682165] ---[ end trace 1853311c87a5cd94 ]---
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- UNIX: Italian pen maker --- 


signature.asc
Description: Digital signature


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Hugo Mills
On Wed, Aug 06, 2014 at 12:21:59PM +0200, Martin Steigerwald wrote:
 It basically happened on about the first heavy write I/O occasion after
 the BTRFS trees filled the complete device:
 
 I am now balancing the trees down to lower sizes manually with
 
 btrfs balance start -dusage=10 /home
 
 btrfs balance start -musage=10 /home

   Note that balance has nothing to do with balancing the metadata
trees. The tree structures are automatically balanced as part of their
normal operation. A btrfs balance start is a much higher-level
operation. It's called balance because the overall effect is to
balance the data usage evenly across multiple devices. (Actually, to
balance the available space evenly).

   Also note that the data part isn't tree-structured, so referring to
balancing the trees with a -d flag is doubly misleading. :)
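   As a concrete sketch (device and mount point names invented), the
sort of thing balance is really for is re-spreading existing chunks
when the shape of the filesystem changes, e.g. after adding a disk:

# btrfs device add /dev/sdc /mnt
# btrfs balance start /mnt
# btrfs filesystem show /mnt

The last command should show the per-device used figures evening out
once the balance has finished.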

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You know... I'm sure this code would seem a lot better if I ---   
 never tried running it. 


signature.asc
Description: Digital signature


Re: File system stuck in scrub

2014-08-11 Thread Hugo Mills
On Mon, Aug 11, 2014 at 08:12:46AM -0700, Nikolaus Rath wrote:
 I started a scrub of one of my btrfs filesystem and then had to restart
 the system. `systemctl restart` seemed to terminate all processes, but
 then got stuck at the end. The disk activity led was still flashing
 rapidly at that point, so I assume that the active scrub was preventing
 the reboot (is that a bug or a feature?).

   It shouldn't have stopped the reboot.

 In any case, I could not wait for that so I power cycled. But now my
 file system seems to be stuck in a scrub that can neither be completed
 nor cancelled:
 
 $ sudo btrfs scrub status /home/nikratio/
 scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
 scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
 total bytes scrubbed: 209.97GiB with 0 errors
 
 $ date
 Sun Aug 10 22:00:44 PDT 2014
 
 $ sudo btrfs scrub cancel /home/nikratio/
 ERROR: scrub cancel failed on /home/nikratio/: not running
 
 $ sudo btrfs scrub start /home/nikratio/
 ERROR: scrub is already running.
 To cancel use 'btrfs scrub cancel /home/nikratio/'.
 To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
 
 Note that the scrub was started more than 3 hours ago, but claims to
 have been running for only 1562 seconds.

   This is a regrettably common problem -- fortunately with a simple
solution. The userspace scrub monitor died in the reboot, leaving the
status file present. If you delete the status file, which is in
/var/lib/btrfs/, that should allow you to start a new scrub.
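   Something along these lines, assuming the file is named after the
filesystem UUID (which is what I remember, but check what's actually
in that directory first):

# ls /var/lib/btrfs/
# rm /var/lib/btrfs/scrub.status.8742472d-a9b0-4ab6-b67a-5d21f14f7a38
# btrfs scrub start /home/nikratio/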

 I then figured that maybe I need to run btrfsck. This gave the following
 output:
 
 checking extents
 checking free space cache
 checking fs roots
 root 5 inode 3149791 errors 400, nbytes wrong
 root 5 inode 3150233 errors 400, nbytes wrong
 root 5 inode 3150238 errors 400, nbytes wrong
 [102 similar lines]
 Checking filesystem on /dev/mapper/vg0-nikratio_crypt
 UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
 free space inode generation (0) did not match free space cache generation 
 (161262)
[snip]
 found 216444746042 bytes used err is 1
 total csum bytes: 383160676
 total tree bytes: 875753472
 total fs tree bytes: 284246016
 total extent tree bytes: 69320704
 btree space waste bytes: 205021777
 file data blocks allocated: 3701556121600
  referenced 388107321344
 Btrfs v3.14.1
 
 So nothing about the scrub, but apparently some other errors.

   The free space inode generation errors are harmless. The wrong
nbytes is probably not horrifically damaging, but I don't know so much
about that one.

 Can someone tell me:
 
  * Should I be able to restart while a scrub is in progress, or is that
deliberately prevented by btrfs?

   Restart the machine? Yes.

  * How can I resume or cancel the scrub?

   It's probably simply not running -- see above.

  * Is it more risky to leave the above errors uncorrected, or to run
btrfsck with --repair?

   I would, I think, leave them.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- We are all lying in the gutter,  but some of us are looking ---   
  at the stars.  


signature.asc
Description: Digital signature


[PATCH] btrfs-progs: Add -R to list UUIDs of original received subvolume

2014-08-12 Thread Hugo Mills
When using send/receive, it is useful to be able to match up source
subvols on the send side (as, say, for -p or -c clone sources) with their
corresponding copies on the receive side. This patch adds a -R option to
btrfs sub list to show the received subvolume UUID on the receive side,
allowing the user to perform that matching correctly.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs-list.c | 32 +++-
 btrfs-list.h |  2 ++
 cmds-subvolume.c |  6 +-
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 542dfe0..01ccca9 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -85,6 +85,11 @@ static struct {
.need_print = 0,
},
{
+   .name   = received_uuid,
+   .column_name= Received UUID,
+   .need_print = 0,
+   },
+   {
.name   = uuid,
.column_name= UUID,
.need_print = 0,
@@ -391,7 +396,7 @@ static struct root_info *root_tree_search(struct 
root_lookup *root_tree,
 static int update_root(struct root_lookup *root_lookup,
   u64 root_id, u64 ref_tree, u64 root_offset, u64 flags,
   u64 dir_id, char *name, int name_len, u64 ogen, u64 gen,
-  time_t ot, void *uuid, void *puuid)
+  time_t ot, void *uuid, void *puuid, void *ruuid)
 {
struct root_info *ri;
 
@@ -429,6 +434,8 @@ static int update_root(struct root_lookup *root_lookup,
memcpy(ri-uuid, uuid, BTRFS_UUID_SIZE);
if (puuid)
memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE);
+   if (ruuid)
+   memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE);
 
return 0;
 }
@@ -447,17 +454,19 @@ static int update_root(struct root_lookup *root_lookup,
  * ot: the original time(create time) of the root
  * uuid: uuid of the root
  * puuid: uuid of the root parent if any
+ * ruuid: uuid of the received subvol, if any
  */
 static int add_root(struct root_lookup *root_lookup,
u64 root_id, u64 ref_tree, u64 root_offset, u64 flags,
u64 dir_id, char *name, int name_len, u64 ogen, u64 gen,
-   time_t ot, void *uuid, void *puuid)
+   time_t ot, void *uuid, void *puuid, void *ruuid)
 {
struct root_info *ri;
int ret;
 
ret = update_root(root_lookup, root_id, ref_tree, root_offset, flags,
- dir_id, name, name_len, ogen, gen, ot, uuid, puuid);
+ dir_id, name, name_len, ogen, gen, ot,
+ uuid, puuid, ruuid);
if (!ret)
return 0;
 
@@ -501,6 +510,9 @@ static int add_root(struct root_lookup *root_lookup,
if (puuid)
memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE);
 
+   if (ruuid)
+   memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE);
+
ret = root_tree_insert(root_lookup, ri);
if (ret) {
printf(failed to insert tree %llu\n, (unsigned long 
long)root_id);
@@ -978,6 +990,7 @@ static int __list_subvol_search(int fd, struct root_lookup 
*root_lookup)
time_t t;
u8 uuid[BTRFS_UUID_SIZE];
u8 puuid[BTRFS_UUID_SIZE];
+   u8 ruuid[BTRFS_UUID_SIZE];
 
root_lookup_init(root_lookup);
memset(args, 0, sizeof(args));
@@ -1030,7 +1043,7 @@ static int __list_subvol_search(int fd, struct 
root_lookup *root_lookup)
 
add_root(root_lookup, sh.objectid, sh.offset,
 0, 0, dir_id, name, name_len, 0, 0, 0,
-NULL, NULL);
+NULL, NULL, NULL);
} else if (sh.type == BTRFS_ROOT_ITEM_KEY) {
ri = (struct btrfs_root_item *)(args.buf + off);
gen = btrfs_root_generation(ri);
@@ -1041,16 +1054,18 @@ static int __list_subvol_search(int fd, struct 
root_lookup *root_lookup)
ogen = btrfs_root_otransid(ri);
memcpy(uuid, ri-uuid, BTRFS_UUID_SIZE);
memcpy(puuid, ri-parent_uuid, 
BTRFS_UUID_SIZE);
+   memcpy(ruuid, ri-received_uuid, 
BTRFS_UUID_SIZE);
} else {
t = 0;
ogen = 0;
memset(uuid, 0, BTRFS_UUID_SIZE);
memset(puuid, 0, BTRFS_UUID_SIZE);
+   memset(ruuid, 0, BTRFS_UUID_SIZE);
}
 
add_root(root_lookup, sh.objectid, 0,
 sh.offset, flags, 0, NULL, 0

[PATCH] btrfs-progs: Fix spelling in btrfs sub list help

2014-08-12 Thread Hugo Mills
below, not bellow

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 cmds-subvolume.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 5216e53..349d0db 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -390,7 +390,7 @@ static const char * const cmd_subvol_list_usage[] = {
 to the given path,
-c   print the ogeneration of the subvolume,
-g   print the generation of the subvolume,
-   -o   print only subvolumes bellow specified path,
+   -o   print only subvolumes below specified path,
-u   print the uuid of subvolumes (and snapshots),
-q   print the parent uuid of the snapshots,
-R   print the uuid of the received snapshots,
-- 
2.0.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs-progs: Add -R to list UUIDs of original received subvolume

2014-08-13 Thread Hugo Mills
When using send/receive, it is useful to be able to match up source
subvols on the send side (as, say, for -p or -c clone sources) with their
corresponding copies on the receive side. This patch adds a -R option to
btrfs sub list to show the received subvolume UUID on the receive side,
allowing the user to perform that matching correctly.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
v1 - v2: Update man page as well.

 Documentation/btrfs-subvolume.txt |  2 ++
 btrfs-list.c  | 32 +++-
 btrfs-list.h  |  2 ++
 cmds-subvolume.c  |  6 +-
 4 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/Documentation/btrfs-subvolume.txt 
b/Documentation/btrfs-subvolume.txt
index a519131..789e462 100644
--- a/Documentation/btrfs-subvolume.txt
+++ b/Documentation/btrfs-subvolume.txt
@@ -104,6 +104,8 @@ print only subvolumes bellow specified path.
 print the UUID of the subvolume.
 -q
 print the parent uuid of subvolumes (and snapshots).
+-R
+print the UUID of the sent subvolume, where the subvolume is the result of a 
receive operation
 -t
 print the result as a table.
 -s
diff --git a/btrfs-list.c b/btrfs-list.c
index 542dfe0..01ccca9 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -85,6 +85,11 @@ static struct {
.need_print = 0,
},
{
+   .name   = received_uuid,
+   .column_name= Received UUID,
+   .need_print = 0,
+   },
+   {
.name   = uuid,
.column_name= UUID,
.need_print = 0,
@@ -391,7 +396,7 @@ static struct root_info *root_tree_search(struct 
root_lookup *root_tree,
 static int update_root(struct root_lookup *root_lookup,
   u64 root_id, u64 ref_tree, u64 root_offset, u64 flags,
   u64 dir_id, char *name, int name_len, u64 ogen, u64 gen,
-  time_t ot, void *uuid, void *puuid)
+  time_t ot, void *uuid, void *puuid, void *ruuid)
 {
struct root_info *ri;
 
@@ -429,6 +434,8 @@ static int update_root(struct root_lookup *root_lookup,
memcpy(ri-uuid, uuid, BTRFS_UUID_SIZE);
if (puuid)
memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE);
+   if (ruuid)
+   memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE);
 
return 0;
 }
@@ -447,17 +454,19 @@ static int update_root(struct root_lookup *root_lookup,
  * ot: the original time(create time) of the root
  * uuid: uuid of the root
  * puuid: uuid of the root parent if any
+ * ruuid: uuid of the received subvol, if any
  */
 static int add_root(struct root_lookup *root_lookup,
u64 root_id, u64 ref_tree, u64 root_offset, u64 flags,
u64 dir_id, char *name, int name_len, u64 ogen, u64 gen,
-   time_t ot, void *uuid, void *puuid)
+   time_t ot, void *uuid, void *puuid, void *ruuid)
 {
struct root_info *ri;
int ret;
 
ret = update_root(root_lookup, root_id, ref_tree, root_offset, flags,
- dir_id, name, name_len, ogen, gen, ot, uuid, puuid);
+ dir_id, name, name_len, ogen, gen, ot,
+ uuid, puuid, ruuid);
if (!ret)
return 0;
 
@@ -501,6 +510,9 @@ static int add_root(struct root_lookup *root_lookup,
if (puuid)
memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE);
 
+   if (ruuid)
+   memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE);
+
ret = root_tree_insert(root_lookup, ri);
if (ret) {
printf(failed to insert tree %llu\n, (unsigned long 
long)root_id);
@@ -978,6 +990,7 @@ static int __list_subvol_search(int fd, struct root_lookup 
*root_lookup)
time_t t;
u8 uuid[BTRFS_UUID_SIZE];
u8 puuid[BTRFS_UUID_SIZE];
+   u8 ruuid[BTRFS_UUID_SIZE];
 
root_lookup_init(root_lookup);
memset(args, 0, sizeof(args));
@@ -1030,7 +1043,7 @@ static int __list_subvol_search(int fd, struct 
root_lookup *root_lookup)
 
add_root(root_lookup, sh.objectid, sh.offset,
 0, 0, dir_id, name, name_len, 0, 0, 0,
-NULL, NULL);
+NULL, NULL, NULL);
} else if (sh.type == BTRFS_ROOT_ITEM_KEY) {
ri = (struct btrfs_root_item *)(args.buf + off);
gen = btrfs_root_generation(ri);
@@ -1041,16 +1054,18 @@ static int __list_subvol_search(int fd, struct 
root_lookup *root_lookup)
ogen = btrfs_root_otransid(ri);
memcpy(uuid, ri-uuid, BTRFS_UUID_SIZE);
memcpy(puuid, ri-parent_uuid, 
BTRFS_UUID_SIZE

Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2

2014-08-19 Thread Hugo Mills
On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote:
 On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:
  Hello list,
  
  I want to use an ARM kirkwood based NSA325v2 NAS (dubbed Receiver) for
  receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
  running kubuntu 14.04 LTS (dubbed Source), storing them on a 3TB WD
  red disk (having GPT label, partitions created with parted).
  
  But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File
  too large
  ... and that stops reception/snapshot creation.
 
 ...
 
  Increasing the verbosity with -v -v for btrfs receive shows the
  following differences between receive operations on 'Receiver' and
  'OtherHost', both of them using the identical inputfile
  /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send
  
  * the chown and chmod operations are different - resulting in
  weird/wrong permissions and sizes on 'Receiver' side.
  * what's stransid, this is the first line that differs
 
 This is interesting, thanks for going to the trouble to show those
 diffs.
 
 That the commands and strings match up show us that the basic tlv header
 chaining is working.  But the u64 attribute values are sometimes messed
 up.  And messed up in a specific way.  A variable number of low order
 bytes are magically appearing.
 
 (gdb) print/x 11709972488
 $2 = 0x2b9f80008
 (gdb) print/x 178680
 $3 = 0x2b9f8
 
 (gdb) print/x 588032
 $6 = 0x8f900
 (gdb) print/x 2297
 $7 = 0x8f9
 
 Some light googling makes me think that the Marvell Kirkwood is not
 friendly at all to unaligned accesses.

   ARM isn't in general -- it never has been, even 20 years ago in the
ARM3 days when I was writing code in ARM assembler. We've been bitten
by this before in btrfs (mkfs on ARM works, mounting it fails fast,
because userspace has a trap to fix unaligned accesses, and the kernel
doesn't).

 The (biting tongue) send and receive code is playing some games with
 casting aligned and unaligned pointers.  Maybe that's upsetting the arm
 toolchain/kirkwood.

   Almost certainly the toolchain isn't identifying the unaligned
accesses, and thus building code that uses them causes stuff to break.

   There's a workaround for userspace that you can use to verify that
this is indeed the problem: echo 2 > /proc/cpu/alignment will tell the
kernel to fix up unaligned accesses initiated in userspace. It's a
performance killer, but it should serve to identify whether the
problem is actually this.

   Hugo.

  Does this completely untested patch to btrfs-progs,
 to be run on the receiver, do anything?
 
 - z
 
 diff --git a/send-stream.c b/send-stream.c
 index 88e18e2..4f8dd83 100644
 --- a/send-stream.c
 +++ b/send-stream.c
 @@ -204,7 +204,7 @@ out:
 int __len; \
 TLV_GET(s, attr, (void**)__tmp, __len); \
 TLV_CHECK_LEN(sizeof(*__tmp), __len); \
 -   *v = le##bits##_to_cpu(*__tmp); \
 +   *v = get_unaligned_le##bits(__tmp); \
 } while (0)
  
  #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold.


signature.asc
Description: Digital signature


Re: Putting very big and small files in one subvolume?

2014-08-29 Thread Hugo Mills
On Fri, Aug 29, 2014 at 09:34:54PM +0530, Shriramana Sharma wrote:
 On 8/17/14, Shriramana Sharma samj...@gmail.com wrote:
  Hello. One more Q re generic BTRFS behaviour.
  https://btrfs.wiki.kernel.org/index.php/Main_Page specifically
  advertises BTRFS's Space-efficient packing of small files.
 
 Hello. I realized that while I got lots of interesting advice on how
 to best layout my FS on multiple devices/FSs, I would like to
 specifically know how exactly the above works (in not-too-technical
 terms) so I'd like to decide for myself if the above feature of BTRFS
 would suit my particular purpose.

   In brief: For small files (typically under about 3.5k), the FS can
put the file's data in the metadata -- specifically, the extent tree
-- so that the data is directly available without a second seek to
find it.

   The longer version: btrfs has a number of B-trees in its metadata.
These are trees with a high fan-out (from memory, it's something like
30-240 children each, depending on the block size), and with the
actual data being stored at the leaves of the tree. Each leaf of the
tree is a fixed size, depending on the options passed to mkfs.
Typically 4k-32k.

   The data in the trees is stored as a key and a value -- the tree
indexes the keys efficiently, and stores the values (usually some data
structure like an inode or file extent information) in the same leaf
node as the key -- keys at the front of the leaf, data at the back.

   The extent tree keeps track of the contiguous byte sequences of
each file, and where those sequences can be found on the FS. To read a
file, the FS looks up the file's extents in the extent tree, and then
has to go and find the data that it points to. This involves an extra
read of the disk, which is slow. However, the metadata tree leaf is
already in RAM (because the FS has just read it). So, for performance
and space efficiency reasons, it can optionally store data for small
files as part of the value component of the key/value pair for the
file's extent. This means that the file's data is available
immediately, without the extra disk read.

   Drawbacks -- metadata on btrfs is usually DUP, which means two
copies, so storing lots of medium-small files (2k-4k) will take up
more space than it would otherwise, because you're storing two copies
and not saving enough space to make it worthwhile. It also makes it
harder to calculate the used vs free values for df.
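   If you're curious whether a particular small file has been inlined,
filefrag -v should (if I remember rightly) report a single extent
flagged as inline, and the size cutoff can be capped with the
max_inline mount option -- roughly:

# filefrag -v /mnt/tiny-file
# mount -o max_inline=2048 /dev/sdX /mnt

(the second line caps inlining at 2KiB; the device and mount point are
just placeholders).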

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great films about cricket: Umpire of the Rising Sun ---   


signature.asc
Description: Digital signature


Re: Btrfs-progs-3.16: fs metadata is both single and dup?

2014-09-02 Thread Hugo Mills
On Tue, Sep 02, 2014 at 12:05:33PM +, Holger Hoffstätte wrote:
 
 I updated to progs-3.16 and noticed during testing:
 
root> losetup
 NAME   SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
 /dev/loop0 0  0 0  0 /tmp/img
 
root> mkfs.btrfs -f /dev/loop0
 Btrfs v3.16
 See http://btrfs.wiki.kernel.org for more information.
 
 Performing full device TRIM (8.00GiB) ...
 Turning ON incompat feature 'extref': increased hardlink limit per file 
 to 65536
 fs created label (null) on /dev/loop0
   nodesize 16384 leafsize 16384 sectorsize 4096 size 8.00GiB
root> mkdir /tmp/btrfs
root> mount /dev/loop0 /tmp/btrfs
 
 All fine until here..
 
root> btrfs filesystem df /tmp/btrfs
 Data, single: total=8.00MiB, used=64.00KiB
 System, DUP: total=8.00MiB, used=16.00KiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=409.56MiB, used=112.00KiB
 Metadata, single: total=8.00MiB, used=0.00

   Note that the single chunks are empty, and will remain so.

[snip]
 So where does the confusing initial display come from? I'm running this 
 against a (very patched) 3.14.17, but don't remember ever seeing this 
 with btrfs-progs-3.14.2.

   Your memory is faulty, I'm afraid. It's always done that -- at
least since I started using btrfs, several years ago.

   I believe it comes from mkfs creating a trivial basic filesystem
(with the single profiles), and then setting enough flags on it that
the kernel can bootstrap it with the desired chunks in it -- but I may
be wrong about that.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Normaliser unix c'est comme pasteuriser le Camembert ---   


signature.asc
Description: Digital signature


Re: Btrfs-progs-3.16: fs metadata is both single and dup?

2014-09-03 Thread Hugo Mills
On Wed, Sep 03, 2014 at 04:53:39AM +, Duncan wrote:
 Hugo Mills posted on Tue, 02 Sep 2014 13:13:49 +0100 as excerpted:
 
  On Tue, Sep 02, 2014 at 12:05:33PM +, Holger Hoffstätte wrote:
  So where does the confusing initial display come from? [I] don't
  remember ever seeing this with btrfs-progs-3.14.2.
  
 Your memory is faulty, I'm afraid. It's always done that -- at
  least since I started using btrfs, several years ago.
  
 I believe it comes from mkfs creating a trivial basic filesystem
  (with the single profiles), and then setting enough flags on it that the
  kernel can bootstrap it with the desired chunks in it -- but I may be
  wrong about that.
 
 Agreed.  It's an artifact of the mkfs.btrfs process and a btrfs fi df on 
 a new filesystem always seems to have those extra unused single profile 
 lines.
 
 I got so the first thing I'd do on first mount was a balance -- before 
 there was anything actually on the filesystem so it was real fast -- to 
 get rid of those null entries.

   Interesting. Last time I tried that (balance without any contents),
the balance removed *all* the chunks, and then the FS forgot about
what configuration it should have and reverted to RAID-1/single. I
usually recommend writing at least one 4k+ file to the FS first, if
it's bothering someone so much that they can't let it go.
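   So, if you really want to get rid of the empty single chunks, a
safer sequence is something like this (untested as written, but these
are the usual steps):

# mount /dev/sdX /mnt
# dd if=/dev/zero of=/mnt/placeholder bs=4096 count=2
# sync
# btrfs balance start -dusage=0 -musage=0 /mnt
# rm /mnt/placeholder

The usage=0 filters should restrict the balance to completely empty
chunks, so the profile information carried by the non-empty ones is
kept. (If usage=0 doesn't catch them on your progs version, usage=1
should.)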

   Hugo.

 Actually, I had already created a little mkfs.btrfs helper script that 
 sets options I normally want, etc, and after doing the mkfs and balance 
 drill a few times, I setup the script such that if at the appropriate 
 prompt I give it a mountpoint to point balance at, it'll mount the 
 filesystem and immediately run a balance, thus automating things and 
 making the balance part of the same scripted process that does the 
 mkfs.btrfs in the first place.
 
 IOW, those null-entry lines bother me too... enough that even tho I know 
 what they are I arranged things so they're automatically and immediately 
 eliminated and I don't have to see 'em! =:^)
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Never underestimate the bandwidth of a Volvo filled ---   
   with backup tapes.


signature.asc
Description: Digital signature


Re: ENOSPC on mostly empty file system

2014-09-09 Thread Hugo Mills
On Tue, Sep 09, 2014 at 09:49:12PM +0200, Clemens Eisserer wrote:
 Hi Arnd,
 
  Ok, one more data point:
 
 Why don't you provide the data point you were specifically asked for,
 btrfs fi df ;)

   btrfs fi show is important as well -- it's hard to work out the
state of the FS from just one of them.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You can get more with a kind word and a two-by-four than you ---   
   can with just a kind word.


signature.asc
Description: Digital signature


Re: ENOSPC on mostly empty file system

2014-09-09 Thread Hugo Mills
On Tue, Sep 09, 2014 at 11:49:10PM +0200, Arnd Bergmann wrote:
 Ok, now I'm in the bad state again (after running a 'make allmodconfig'
 kernel build:
 
 Label: none  uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47
 Total devices 1 FS bytes used 8.79GB
 devid 1 size 67.14GB used 67.14GB path /dev/sdc6

   All the space on the FS has been allocated to some purpose or other.

 Data: total=65.11GB, used=7.99GB

   Here, you have 65 GiB allocated to data, but only 8 GiB of that
used. The FS won't automatically free up any of that (yet -- it's one
of the project ideas).

 System, DUP: total=8.00MB, used=12.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=1.00GB, used=821.48MB

   Here, you're running close to full with metadata -- the FS needs
some space to write new copies of metadata block in order to modify
anything. It can't get enough space to do that, because there's
nowhere for any more metadata allocation to come from (because it's
all allocated --see my first comment).

   So... you need to free up some data chunks. You can do this with:

# btrfs balance start -dusage=5 /mountpoint

   Take a look at the output of btrfs fi df and btrfs fi show
afterwards, and see how much the Data allocation has reduced by, and
how much unallocated space you have left afterwards. You may want to
increase the number in the above balance command to some higher value,
to free up even more chunks (it limits the balance to chunks less than
n% full -- so the command above will only touch chunks with 5% actual
data or less). This is in the FAQ.
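   If that doesn't free up enough, just step the threshold up and
repeat -- something like this (mount point is a placeholder):

for n in 5 20 40 60 80; do
    btrfs balance start -dusage=$n /mountpoint
    btrfs filesystem df /mountpoint
done

Each pass only rewrites the data chunks below the given usage, so the
early passes are cheap; stop once enough space has been returned to
the unallocated pool.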

   Hugo.

 Metadata: total=8.00MB, used=0.00
 : total=200.00MB, used=0.00

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Comic Sans goes into a bar,  and the barman says, We don't ---   
 serve your type here.  


signature.asc
Description: Digital signature


Re: No space on empty, degraded raid10

2014-09-11 Thread Hugo Mills
On Thu, Sep 11, 2014 at 07:19:00AM -0400, Austin S Hemmelgarn wrote:
 On 2014-09-11 02:40, Russell Coker wrote:
  Also it would be nice if there was a N-way mirror option for system data.  
  As 
  such data is tiny (32MB on the 120G filesystem in my workstation) the space 
  used by having a copy on every disk in the array shouldn't matter.
 
 N-way mirroring is in the queue for after RAID5/6 work; ideally, once it
 is ready, mkfs should default to one copy per disk in the filesystem.

   Why change the default from 2-copies, which it's been for years?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Ceci est un travail pour l'Australien. ---  


signature.asc
Description: Digital signature


Re: No space on empty, degraded raid10

2014-09-11 Thread Hugo Mills
On Thu, Sep 11, 2014 at 08:06:21AM -0400, Austin S Hemmelgarn wrote:
 On 2014-09-11 07:38, Hugo Mills wrote:
  On Thu, Sep 11, 2014 at 07:19:00AM -0400, Austin S Hemmelgarn wrote:
  On 2014-09-11 02:40, Russell Coker wrote:
  Also it would be nice if there was a N-way mirror option for system data. 
   As 
  such data is tiny (32MB on the 120G filesystem in my workstation) the 
  space 
  used by having a copy on every disk in the array shouldn't matter.
 
  N-way mirroring is in the queue for after RAID5/6 work; ideally, once it
  is ready, mkfs should default to one copy per disk in the filesystem.
  
 Why change the default from 2-copies, which it's been for years?
 
 Sorry about the ambiguity in my statement, I meant that the default for
 system chunks should be one copy per disk in the filesystem.  If you
 don't have a copy of the system chunks, then you essentially don't have
 a filesystem, and that means that BTRFS RAID6 can't provide true
 resilience against 2 disks failing catastrophically unless there are at
 least 3 copies of the system chunks.

   Aah, OK. That makes perfect sense, then.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Some days, it's just not worth gnawing through the straps ---


signature.asc
Description: Digital signature


Re: RAID1 failure and recovery

2014-09-12 Thread Hugo Mills
On Fri, Sep 12, 2014 at 01:57:37AM -0700, shane-ker...@csy.ca wrote:
 Hi,

 I am testing BTRFS in a simple RAID1 environment. Default mount
 options and data and metadata are mirrored between sda2 and sdb2. I
 have a few questions and a potential bug report. I don't normally
 have console access to the server so when the server boots with 1 of
 2 disks, the mount will fail without -o degraded. Can I use -o
 degraded by default to force mounting with any number of disks? This
 is the default behaviour for linux-raid so I was rather surprised
 when the server didn't boot after a simulated disk failure.

   The problem with that is that at the moment, you don't get any
notification that anything's wrong when the system boots. As a result,
using -odegraded as a default option is not generally recommended.

 So I pulled sdb to simulate a disk failure. The kernel oops'd but
 did continue running. I then rebooted encountering the above mount
 problem. I re-inserted the disk and rebooted again and BTRFS mounted
 successfully. However, I am now getting warnings like: BTRFS: read
 error corrected: ino 1615 off 86016 (dev /dev/sda2 sector
 4580382824)
 
 I take it there were writes to SDA and sdb is out of sync. Btrfs is
 correcting sdb as it goes but I won't have redundancy until sdb
 resyncs completely. Is there a way to tell btrfs that I just
 re-added a failed disk and to go through and resync the array as
 mdraid would do? I know I can do a btrfs fi resync manually but can
 that be automated if the array goes out of sync for whatever reason
 (power failure)...

   I've done this before, by accident (pulled the wrong drive,
reinserted it). You can fix it by running a scrub on the device (btrfs
scrub start /dev/ice, I think).
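   For the record, the sort of thing I ran at the time was roughly
this (device name illustrative -- point it at the drive that was
pulled and re-added):

# btrfs scrub start -Bd /dev/sdb2
# btrfs scrub status -d /mnt

-B keeps it in the foreground and -d gives per-device statistics, so
you can watch the corrected errors being counted as the stale copies
are rewritten.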

 Finally for those using this sort of setup in production, is running
 btrfs on top of mdraid the way to go at this point?

   Using btrfs native RAID means that you get independent checksums on
the two copies, so that where the data differs between the copies, the
correct data can be identified.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- SCSI is usually fixed by remembering that it needs three --- 
terminations: One at each end of the chain. And the goat.
 


signature.asc
Description: Digital signature


Re: RAID1 failure and recovery

2014-09-13 Thread Hugo Mills
On Sun, Sep 14, 2014 at 05:15:08AM +0200, Piotr Pawłow wrote:
 On 12.09.2014 12:47, Hugo Mills wrote:
 I've done this before, by accident (pulled the wrong drive, reinserted
 it). You can fix it by running a scrub on the device (btrfs scrub
 start /dev/ice, I think).
 
 I'd like to remind everyone that btrfs has weak checksums. It may be good
 for correcting an occasional error, but I wouldn't trust it to correct
 larger amounts of data.

   Checksums are done for each 4k block, so the increase in
probability of a false negative is purely to do with the sheer volume
of data. Weak checksums like the CRC32 that btrfs currently uses are
indeed poor for detecting malicious targeted attacks on the data, but
for random failures, such as a disk block being unreadable and
returning zeroes or having bit errors, the odds of identifying the
failure are still excellent.

 Additionally, nocow files are not checksummed. They will not be corrected
 and may return good data or random garbage, depending on which mirror is
 accessed.

   Yes, this is a trade-off that you have to make for your own
use-case and happiness. For some things (like a browser cache), I'd be
happy with losing the checksums. For others (e.g. mail), I wouldn't be.

   Hugo.

 Below is a test I did some time ago, demonstrating the problem with nocow
 files:
 
 #!/bin/sh
 MOUNT_DIR=mnt
 DISK1=d1
 DISK2=d2
 SIZE=2G
 # create raid1 FS
 mkdir $MOUNT_DIR
 truncate --size $SIZE $DISK1
 truncate --size $SIZE $DISK2
 L1=$(losetup --show -f $DISK1)
 L2=$(losetup --show -f $DISK2)
 mkfs.btrfs -d raid1 -m raid1 $L1 $L2
 mount $L1 $MOUNT_DIR
 # enable NOCOW
 chattr +C $MOUNT_DIR
 umount $MOUNT_DIR
 # fail the second drive
 losetup -d $L2
 mount $L1 $MOUNT_DIR -odegraded
 # file must be large enough to not get embedded inside metadata
 perl -e 'print "Test OK.\n" x 4096' > $MOUNT_DIR/testfile
 umount $MOUNT_DIR
 # reattach the second drive
 L2=$(losetup --show -f $DISK2)
 mount $L1 $MOUNT_DIR
 # let's see what we get - correct data or garbage?
 cat $MOUNT_DIR/testfile
 # clean up
 umount $MOUNT_DIR
 losetup -d $L1
 losetup -d $L2
 rm $DISK1 $DISK2
 rmdir $MOUNT_DIR

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hey, Virtual Memory! Now I can have a *really big* ramdisk! ---   


signature.asc
Description: Digital signature


Re: [UI RFC][PATCH] btrfs-progs: add options to tune units for fi df output

2014-09-15 Thread Hugo Mills
On Mon, Sep 15, 2014 at 05:09:52PM +0200, David Sterba wrote:
 The size unit format is a longstanding annoyance. This patch is based on
 the work of Nils and Alexandre and enhances the options. It's possible
 to select raw bytes, SI-based or IEC-based compact units (human
 frientdly) or a fixed base from kilobytes to terabytes. The default is
 compact human readable IEC-based, no change to current version.
 
 CC: Nils Steinger n...@voidptr.de
 CC: Alexandre Oliva ol...@gnu.org
 Signed-off-by: David Sterba dste...@suse.cz

   Looks good to me. One _tiny_ nit: For the kilo-/kibi- prefix, IEC
is KiB (upper case), SI is kB (lower case). Other than that, the UI
looks pretty comfortable to me.

Reviewed-by: Hugo Mills h...@carfax.org.uk

 ---
 
 I tried to make the command line UI rich enough to address current and future
 needs, I'm open to tweaks, rewording etc.
 
 The patch is based on current snapshot of integration branch that will be the
 base of 3.17 release and contains the 'enhanced df' patches, branch dev/units.
 
  Documentation/btrfs-filesystem.txt |  25 -
  cmds-filesystem.c  | 111 
 -
  utils.c|  48 
  utils.h|  30 +++---
  4 files changed, 168 insertions(+), 46 deletions(-)
 
 diff --git a/Documentation/btrfs-filesystem.txt 
 b/Documentation/btrfs-filesystem.txt
 index c9c0b006a0b0..7ac105ff350e 100644
 --- a/Documentation/btrfs-filesystem.txt
 +++ b/Documentation/btrfs-filesystem.txt
 @@ -17,8 +17,31 @@ resizing, defragment.
  
  SUBCOMMAND
  --
 -*df* path [path...]::
 +*df* [options] path::
  Show space usage information for a mount point.
 ++
 +`Options`
 ++
 +-b|--raw
 +raw numbers in bytes, without the 'B' suffix
 +-h
 +print human friendly numbers, base 1024, this is the default
 +-H
 +print human friendly numbers, base 1000
 +--iec
 +select the 1024 base for the following options, according to the IEC standard
 +--si
 +select the 1000 base for the following options, according to the SI standard
 +-k|--kbytes
 +show sizes in KiB, or KB with --si
 +-m|--mbytes
 +show sizes in MiB, or MB with --si
 +-g|--gbytes
 +show sizes in GiB, or GB with --si
 +-t|--tbytes
 +show sizes in TiB, or TB with --si
 +
 +If conflicting options are passed, the last one takes precedence.
  
  *show* [--mounted|--all-devices|path|uuid|device|label]::
  Show the btrfs filesystem with some additional info.
 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index 89b897496256..68876957cbab 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c
 @@ -114,12 +114,21 @@ static const char * const filesystem_cmd_group_usage[] 
 = {
  };
  
  static const char * const cmd_filesystem_df_usage[] = {
 -   btrfs filesystem df path,
 +   btrfs filesystem df [options] path,
 Show space usage information for a mount point,
 + -b|--raw   raw numbers in bytes,
 + -h human friendly numbers, base 1024 (default),
 + -H human friendly numbers, base 1000,
 + --iec  use 1024 as a base (Kib, MiB, GiB, ...),
 + --si   use 1000 as a base (KB, MB, GB, ...),
 + -k|--kbytesshow sizes in KiB, or KB with --si,
 + -m|--mbytesshow sizes in MiB, or MB with --si,
 + -g|--gbytesshow sizes in GiB, or GB with --si,
 + -t|--tbytesshow sizes in TiB, or TB with --si,
 NULL
  };
  
 -static void print_df(struct btrfs_ioctl_space_args *sargs)
 +static void print_df(struct btrfs_ioctl_space_args *sargs, int unit_mode)
  {
 u64 i;
 struct btrfs_ioctl_space_info *sp = sargs-spaces;
 @@ -128,8 +137,8 @@ static void print_df(struct btrfs_ioctl_space_args *sargs)
 printf(%s, %s: total=%s, used=%s\n,
 group_type_str(sp-flags),
 group_profile_str(sp-flags),
 -   pretty_size(sp-total_bytes),
 -   pretty_size(sp-used_bytes));
 +   pretty_size_mode(sp-total_bytes, unit_mode),
 +   pretty_size_mode(sp-used_bytes, unit_mode));
 }
  }
  
 @@ -183,33 +192,83 @@ static int get_df(int fd, struct btrfs_ioctl_space_args 
 **sargs_ret)
  
  static int cmd_filesystem_df(int argc, char **argv)
  {
 -   struct btrfs_ioctl_space_args *sargs = NULL;
 -   int ret;
 -   int fd;
 -   char *path;
 -   DIR *dirstream = NULL;
 + struct btrfs_ioctl_space_args *sargs = NULL;
 + int ret;
 + int fd;
 + char *path;
 + DIR *dirstream = NULL;
 + unsigned unit_mode = UNITS_DEFAULT;
  
 -   if (check_argc_exact(argc, 2))
 -   usage(cmd_filesystem_df_usage);
 + optind = 1;
 + while (1) {
 + int long_index;
 + static const struct option long_options[] = {
 + { raw, no_argument, NULL, 'b

Re: btrfs receive: could not find parent subvolume

2014-09-18 Thread Hugo Mills
On Thu, Sep 18, 2014 at 08:27:18AM -0700, Marc MERLIN wrote:
 While debugging a btrfs send/receive slow problem, I now getting this:
 legolas:/mnt/btrfs_pool1# btrfs send -p tmp_ggm_daily_ro.20140917_06:29:58 
 tmp_ggm_daily_ro.20140918_02:48:24 | ssh gargamel btrfs receive -v 
 /mnt/btrfs_pool2/backup/debian64/legolas
 At subvol tmp_ggm_daily_ro.20140918_02:48:24
 At snapshot tmp_ggm_daily_ro.20140918_02:48:24
 receiving snapshot tmp_ggm_daily_ro.20140918_02:48:24 
 uuid=5d1f0454-1be3-b648-9ea5-dc427cd62d98, ctransid=310713 
 parent_uuid=d86e69bf-e17f-7f4c-bfb7-e571d5824687, parent_ctransid=308332
 ERROR: could not find parent subvolume
 
 The parent is there on the other side, but UUID is different:
 gargamel:/mnt/btrfs_pool2/backup/debian64/legolas# btrfs subvolume show 
 tmp_ggm_daily_ro.20140917_06:29:58
 /mnt/btrfs_pool2/backup/debian64/legolas/tmp_ggm_daily_ro.20140917_06:29:58
 Name:   tmp_ggm_daily_ro.20140917_06:29:58
 uuid:   3d424a2b-69da-244c-bfcc-c283f9cc1f34
 Parent uuid:05d3b9be-bfe2-bb4a-9f6a-64b9d44896c7
 Creation time:  2014-09-17 06:30:01
 Object ID:  7873
 Generation (Gen):   83621
 Gen at creation:83476
 Parent: 263
 Top Level:  263
 Flags:  -
 Snapshot(s):
 
 Now it seems that the UUID is different on all my snapshots created by
 btrfs send, so maybe it doesn't match UUID?
 
 Given that, what is btrfs receive using to get a match?

   There's a received UUID field on each subvolume. I posted a patch
to userspace a couple of weeks ago which adds a -R option to show it.
I don't think it's filtered through David's backlog yet.
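   With that patch applied, the matching looks something like this
(sketch, path taken from your mail):

# btrfs subvolume list -R /mnt/btrfs_pool2/backup/debian64/legolas

The Received UUID column on the receive side should line up with the
uuid of the corresponding read-only snapshot on the send side --
that's the field receive matches on, not the subvolume's own UUID.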

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I will not be pushed,  filed, stamped, indexed, briefed, ---
   debriefed or numbered.  My life is my own.   


signature.asc
Description: Digital signature


Re: lvm thin provisioning snapshots and btrfs

2014-09-24 Thread Hugo Mills
 /dev/dm-2
 [  195.495648] btrfs: device fsid 44c76cc5-5d03-4f02-af5f-2028e61e09fa devid 
 1 transid 38 /dev/mapper/vg00_th-lv_root_140924
 [ 1171.952393] btrfs: device fsid 44c76cc5-5d03-4f02-af5f-2028e61e09fa devid 
 1 transid 38 /dev/mapper/vg00_th-lv_root_140924
 
 
 
 
 
 
 
 
 
 
 
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Welcome to Rivendell,  Mr Anderson... ---  


signature.asc
Description: Digital signature


Re: BTRFS backup questions

2014-09-27 Thread Hugo Mills
 be open source, and that it will be more useful to me
 with community support. If anyone is interested in participating, or even
 just using it, please let me know.
 
 Thanks to everyone who has worked on BTRFS so far ;-)
 
 James

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- How do you become King?  You stand in the marketplace and ---
  announce you're going to tax everyone. If you get out  
   alive, you're King.   


signature.asc
Description: Digital signature


Re: BTRFS backup questions

2014-09-27 Thread Hugo Mills
On Sat, Sep 27, 2014 at 06:33:58PM +0200, James Pharaoh wrote:
 On 27/09/14 18:17, Hugo Mills wrote:
 On Sat, Sep 27, 2014 at 05:39:07PM +0200, James Pharaoh wrote:
 
 2. Duplicating NOCOW files
 
 This is obviously possible, since it takes place when you make a snapshot.
 So why can't I create a clone of a snapshot of a NOCOW file? I am hoping the
 answer to this is that it is possible but not implemented yet...
 
 Umm... you should be able to, I think.
 
 Well I've tried with the haskell btrfs library, using clone, and also using
 cp --reflink=auto. Here's an example using cp:
 
 root@host:/btrfs# btrfs subvolume snapshot -r src dest
 Create a readonly snapshot of 'src' in './dest'
 root@host:/btrfs# cp --reflink dest/test test
 cp: failed to clone 'test' from 'dest/test': Invalid argument

   Are you trying to cross a mount-point with that? It works for me:

hrm@amelia:/media/btrfs/amelia/test $ sudo btrfs sub create bar
Create subvolume './bar'
hrm@amelia:/media/btrfs/amelia/test $ sudo dd if=/dev/zero of=bar/data bs=1024 
count=500
500+0 records in
500+0 records out
512000 bytes (512 kB) copied, 0.0047491 s, 108 MB/s
hrm@amelia:/media/btrfs/amelia/test $ sudo btrfs sub snap -r bar foo
Create a readonly snapshot of 'bar' in './foo'
hrm@amelia:/media/btrfs/amelia/test $ sudo cp --reflink=always bar/data bar-data
hrm@amelia:/media/btrfs/amelia/test $ sudo cp --reflink=always foo/data foo-data
hrm@amelia:/media/btrfs/amelia/test $ ls -l
total 1000
drwxr-xr-x 1 root root  8 Sep 27 17:55 bar
-rw-r--r-- 1 root root 512000 Sep 27 17:57 bar-data
drwxr-xr-x 1 root root  8 Sep 27 17:55 foo
-rw-r--r-- 1 root root 512000 Sep 27 17:57 foo-data

[snip]
 3. Peformance penalty of fragmentation on SSD systems with lots of memory
 
 There are two performance problems with fragmentation -- seek time
 to find the fragments (which affects only rotational media), and the
 amount of time taken to manage the fragments. As the number of
 fragments increases, so does the number of extents that the FS has to
 keep track of. Ultimately, with very fragmented files, this will have
 an effect, as the metadata size will increase hugely.
 
 Ok so this sounds like the answer I wanted to hear ;-) Presumably so long as
 the load is not too great, and I run the occasional defrag, then this
 shouldn't be much to worry about then?

   Be aware that the current implementation of (manual) defrag will
separate the shared extents, so you no longer get the deduplication
effect. There was a snapshot-aware defrag implementation, but it
caused filesystem corruption, and has been removed for now until a
working version can be written. I think Josef was working on this.

 4. Generations and tree structures
 
 I am planning to use lots more clever tricks which I think should be
 available in BTRFS, but I can't see much documentation. Can anyone point out
 any good examples or documentation of how to access the tree structures
 directly. I'm particularly interested in finding changed files and portions
 of files using the generations and the tree search.
 
 You need the TREE SEARCH ioctl -- that gives you direct access to
 all the internal trees of the FS. There's some documentation on the
 wiki about how these fit together:
 
 https://btrfs.wiki.kernel.org/index.php/Data_Structures
 https://btrfs.wiki.kernel.org/index.php/Trees
 
 What tricks are you thinking of, exactly?
 
 Principally I want to be able to detect exactly what has changed, so that I
 can perform backups very quickly. I want to be able to update a small
 portion of a large file and then identify exactly which parts changed and
 only back those up, for example.

   send/receive does this.
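   i.e. something like this (names invented):

# btrfs subvolume snapshot -r /data /data/.snap-new
# btrfs send -p /data/.snap-old /data/.snap-new | btrfs receive /backup

Only the differences between the two read-only snapshots go down the
pipe, so detecting exactly what changed is handled for you.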

[snip]
 Are you aware of btrfs send/receive? It should allow you to do all
 of this. The main part of the code then comes down to managing the
 send/receive, and all the distributed error handling. Then the only
 direct access to the internal metadata you need is being able to read
 UUIDs to work out what you have on each side -- which can also be done
 by btrfs sub list.
 
 Yes, this is one of my main inspirations. The problem is that I am pretty
 sure it won't handle deduplication of the data.

   It does. That's one of the things it's explicitly designed to do.

 I'm planning to have a LOT of containers running the same stuff, on fast
 (expensive) SSD media, and deduplication is essential to make that work
 properly. I can already see huge savings from this.
 
 As far as I can tell, btrfs send/receive operates on a subvolume basis, and
 any shared data between those subvolumes is duplicated if you copy them
 separately.

   Not so.

   You can tell send that there are subvolumes with known IDs on the
receive side, using the -c option (arbitrarily many subvols). If the
subvol you are sending (on the send side) shares extents with any of
those, then the data is not sent -- just a reference to it. On the
receive side, if that happens, the shared extents are reconstructed

Re: Fwd: Deleting a Subvol from a Cancelled Btrfs-Send

2014-10-02 Thread Hugo Mills
On Thu, Oct 02, 2014 at 12:05:39AM -0500, Justin Brown wrote:
 I'm experimenting with btrfs-send. Previously (2014-09-26), I did my
 first btrfs-send on a subvol, and that worked fine. Today, I tried to
 send a new snapshot. Unfortunately, I realized part way through that I
 forgot to specify the parent to only send a delta, and killed the send
 with ^C.
 
 On the destination, I'm left with:
 
 ~$ sudo btrfs subvol list /var/media/backups/venus/home/
 ID 2820 gen 57717 top level 5 path media
 ID 2821 gen 57402 top level 5 path ovirt
 ID 4169 gen 57703 top level 2820 path media/backups/venus/home
 ID 4170 gen 57575 top level 4169 path home-2014-09-26
 ID 4243 gen 57707 top level 4169 path home-2014-10-01
 
 Home-2014-10-01 was the partial send that was cancelled. I figured
 that I could delete this partial subvol and try again.
 
 ~$ sudo btrfs subvol del home-2014-10-01
 Transaction commit: none (default)
 ERROR: error accessing 'home-2014-10-01'

   If you're not doing this from /var/media/backups/venus/home/ it
won't succeed. You need to specify (either via a relative path or an
absolute one) where the subvol is, not just what its name is.

   (Consider what happens if you have two filesystems, each with a
home-2014-09-26 subvol.)
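   So, in this case, either cd into /var/media/backups/venus/home
first, or give the full path:

# btrfs subvolume delete /var/media/backups/venus/home/home-2014-10-01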

   Hugo.

 Obviously, trying to delete the subvol directory fails too:
 
 ~$ sudo rm -rf /var/media/backups/venus/home/home-2014-10-01/
 rm: cannot remove ‘/var/media/backups/venus/home/home-2014-10-01/’:
 Operation not permitted
 
 Is there anyway to delete this partial subvol?
 
 Thanks,
 Justin

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- All hope abandon,  Ye who press Enter here. ---   


signature.asc
Description: Digital signature


Re: FIBMAP unsupported

2014-10-02 Thread Hugo Mills
On Thu, Oct 02, 2014 at 07:25:49PM +0200, David Sterba wrote:
 On Thu, Oct 02, 2014 at 05:13:22PM +0200, Marc Dietrich wrote:
  I have a large (25G) virtual disk on a btrfs fs. Yes, I know this is not 
  optimial. So I try to defrag it from time to time. However, using btrfs fi 
  defrag -c vm.vdi results in even more fragments than before (reported by 
  filefrag). So I wrote my own pseudo defragger,
 
 Unfortunatelly the default target fragment size is 256k. Try
 'btrfs filesystem defrag -t 32m ...' or higher numbers and see if it
 helps.

   Note also that a compressed file will have fragments on the scale
of about 128k reported by filefrag, because of the way that the
compression works. The file may actually be contiguous, but filefrag
won't know about it. (At least, that's historically been the case. I
don't know if filefrag has recently grown some extra knowledge of
compressed extents.)
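   You can see the effect with something like this (file name from
your mail; the flags column is the interesting bit):

# filefrag -v vm.vdi | head

If I remember rightly, compressed extents show up there with the
encoded flag, spaced about 128K apart, even when they're physically
adjacent on disk.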

   Hugo.

  which produces much better results (ok, the file must not be in use). 
  Somewhere in the 3.17 cycle the resulting image got corrupted using the 
  script 
  above. 
  
  Running filefrag on it returns FIBMAP unsupported.
 
 This message doe not mean it is a corruption, but filefrag tries to use
 the FIBMAP ioctl that is not implemented on btrfs, instead FIEMAP is
 used.
 
 filefrag on a nocow file works for me here (3.16.x kernel), I can see
 that filefrag on a directory prints the FIBMAP message.
 
  Virtualbox returns  AHCI#0P0: Read at offset 606236672 (49152 bytes left) 
  returned rc=VERR_DEV_IO_ERROR. No errors in the kernel log.
  
  Trying cp vm.vdi /dev/null returns: cp: Error reading „vm.vdi“: IO-Error
 
 This could be caused by the virtualization layer. Try to run scrub and
 fsck in the non-destru^Wchecking mode if it finds problems.
 
 As you're using compression and autodefrag, a quick skim of the 3.17
 patches points to e9512d72e8e61c750c90efacd720abe3c4569822 fix
 autodefrag with compression, but that's just keyword match.
 
 There's another report about nocow corruption and VirtualBox in a 3.16 +
 for-linus version (which is almost 3.17-rc)
 http://article.gmane.org/gmane.comp.file-systems.btrfs/38701/
 
 But according to the attached messages, the underlying device is
 unreliable and logs a lot of IO errors.
 
 For now it looks like VirutalBox is not writing the data or there is a
 bug introduced post 3.16 killing nocow files.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  


signature.asc
Description: Digital signature


Re: Identify mounted subvolume

2014-10-07 Thread Hugo Mills
On Tue, Oct 07, 2014 at 10:47:56AM +0200, Juan Orti Alcaine wrote:
 I cannot find the answer to this one. How can I determine which subvolume I
 have mounted in a certain path? I'm looking through /sys but no clue.

   Rumour has it that /proc/self/mountinfo is meant to have the
information, but I've just checked, and it doesn't seem to have the
subvol in it on my server (3.16.2).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Unix:  For controlling fungal diseases in crops --- 


signature.asc
Description: Digital signature


Re: What is the vision for btrfs fs repair?

2014-10-09 Thread Hugo Mills
On Thu, Oct 09, 2014 at 11:53:23AM +, Duncan wrote:
 Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
 excerpted:
 
  Also, you should be running btrfs scrub regularly to correct bit-rot
  and force remapping of blocks with read errors.  While BTRFS
  technically handles both transparently on reads, it only corrects thing
  on disk when you do a scrub.
 
 AFAIK that isn't quite correct.  Currently, the number of copies is 
 limited to two, meaning if one of the two is bad, there's a 50% chance of 
 btrfs reading the good one on first try.

   Scrub checks both copies, though. It's ordinary reads that don't.

   Hugo.

 If btrfs reads the good copy, it simply uses it.  If btrfs reads the bad 
 one, it checks the other one and assuming it's good, replaces the bad one 
 with the good one both for the read (which otherwise errors out), and by 
 overwriting the bad one.
 
 But here's the rub.  The chances of detecting that bad block are 
 relatively low in most cases.  First, the system must try reading it for 
 some reason, but even then, chances are 50% it'll pick the good one and 
 won't even notice the bad one.
 
 Thus, while btrfs may randomly bump into a bad block and rewrite it with 
 the good copy, scrub is the only way to systematically detect and (if 
 there's a good copy) fix these checksum errors.  It's not that btrfs 
 doesn't do it if it finds them, it's that the chances of finding them are 
 relatively low, unless you do a scrub, which systematically checks the 
 entire filesystem (well, other than files marked nocsum, or nocow, which 
 implies nocsum, or files written when mounted with nodatacow or 
 nodatasum).
 
 At least that's the way it /should/ work.  I guess it's possible that 
 btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but 
 if so, that's the first /I/ remember reading of it.
 
 Other than that detail, what you posted matches my knowledge and 
 experience, such as it may be as a non-dev list regular, as well.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great oxymorons of the world, no. 7: The Simple Truth ---  


signature.asc
Description: Digital signature


Re: What is the vision for btrfs fs repair?

2014-10-09 Thread Hugo Mills
On Thu, Oct 09, 2014 at 08:07:51AM -0400, Austin S Hemmelgarn wrote:
 On 2014-10-09 07:53, Duncan wrote:
 Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as
 excerpted:
 
 Also, you should be running btrfs scrub regularly to correct bit-rot
 and force remapping of blocks with read errors.  While BTRFS
 technically handles both transparently on reads, it only corrects thing
 on disk when you do a scrub.
 
 AFAIK that isn't quite correct.  Currently, the number of copies is
 limited to two, meaning if one of the two is bad, there's a 50% chance of
 btrfs reading the good one on first try.
 
 If btrfs reads the good copy, it simply uses it.  If btrfs reads the bad
 one, it checks the other one and assuming it's good, replaces the bad one
 with the good one both for the read (which otherwise errors out), and by
 overwriting the bad one.
 
 But here's the rub.  The chances of detecting that bad block are
 relatively low in most cases.  First, the system must try reading it for
 some reason, but even then, chances are 50% it'll pick the good one and
 won't even notice the bad one.
 
 Thus, while btrfs may randomly bump into a bad block and rewrite it with
 the good copy, scrub is the only way to systematically detect and (if
 there's a good copy) fix these checksum errors.  It's not that btrfs
 doesn't do it if it finds them, it's that the chances of finding them are
 relatively low, unless you do a scrub, which systematically checks the
 entire filesystem (well, other than files marked nocsum, or nocow, which
 implies nocsum, or files written when mounted with nodatacow or
 nodatasum).
 
 At least that's the way it /should/ work.  I guess it's possible that
 btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but
 if so, that's the first /I/ remember reading of it.
 
 I'm not 100% certain, but I believe it doesn't actually fix things on disk
 when it detects an error during a read,

   I'm fairly sure it does, as I've had it happen to me. :)

 I know it doesn't if the fs is
 mounted ro (even if the media is writable), because I did some testing to
 see how 'read-only' mounting a btrfs filesystem really is.

   If the FS is RO, then yes, it won't fix things.
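
   For reference, the per-device error counters give a quick way of
seeing whether btrfs has detected any such errors during normal
operation; the mountpoint here is just an example:

   # btrfs device stats /mnt

   A non-zero corruption_errs count there is a good hint that a scrub
is worth running.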

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great films about cricket:  Interview with the Umpire ---  




Re: unexplainable corruptions 3.17.0

2014-10-17 Thread Hugo Mills
On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
 On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
 Recently I've observed some corruptions to systemd's journal
   files which are somewhat puzzling. This is especially worrying
   as this is btrfs raid1 setup and I expected auto-healing.
   
 System details: 3.17.0-301.fc21.x86_64
   btrfs: raid1 over 2x dm-crypted 6TB HDDs.
   mount opts: rw,relatime,seclabel,compress=lzo,space_cache
 Reads with cat, hexdump fails with:
   read(4, 0x1001000, 65536)   = -1 EIO (Input/output error)
   
  Does scrub work for you?
 
   As there seem to be no way to scrub individual files, I've started
 scrub of full volume.  It will take some hours to finish.
 
   Meanwhile, could you satisfy my curiosity what would scrub do that
 wouldn't be done by just reading the whole file?

   It checks both copies. Reading the file will only read one of the
copies of any given block (so if that's good and the other copy is
bad, it won't fix anything).
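
   As a concrete example, a whole-filesystem scrub can be started and
watched with (the mountpoint is illustrative):

   # btrfs scrub start /mnt
   # btrfs scrub status /mnt

   Repaired and unrepairable errors are reported in the status output
and in the kernel log.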

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The future isn't what it used to be. ---   




Re: Uh, 1COW?... what happens when someone does this...

2014-10-22 Thread Hugo Mills
On Wed, Oct 22, 2014 at 12:41:10PM -0700, Robert White wrote:
 So I've been considering some NOCOW files (for VM disk images), but
 some questions arose. IS there a 1COW (copy on write only once)
 flag or are the following operations dangerous or undefined?
 
 (1) The page https://btrfs.wiki.kernel.org/index.php/FAQ (section
 Can copy-on-write be turned off for data blocks?) says COW may
 still happen if a snapshot is taken. Is that a may or a will,
 e.g. if I take a snapshot and then start the VM will the file in the
 snapshot still be frozen or will it update as I alter the VM? Does
 the read-only-or-not status of the snapshot matter in this outcome?
 
 e.g. what does may mean in that section?

   If you take a snapshot of something, then any write to that (the
original or the copy) will cause it to be CoWed once. Subsequent
writes to the same area of the same file will go back to nodatacow.

 (2) If you copy a file using cp --reflink and the destination is
 in a directory marked NOCOW, what happens? How about when the
 resultant file is modified in place?

   Same thing as above.

 (3) when using a watever.qcow2 virtual machine image that does
 copy-on-write in the VM (such as QEMU) is it better, worse, or a
 no-op to have the NOCOW flag set on the file? All the advice on this
 matter I can find in Google seems to be VM images bad, but will be
 addressed soon and it's old enough that I don't know if soon has 
 come to pass.
 
 It seems like there is a 1COW flag implicit somewhere.

   I wouldn't put it in those words, but yes, a single CoW operation
occurs on writes to data with nodatacow set.
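
   As a rough sketch (the path is only an example; note that +C has to
be set before a file has any data in it):

   $ mkdir images
   $ chattr +C images            # new files created here inherit nodatacow
   $ touch images/vm.img         # empty file, inherits the C attribute
   $ lsattr images/vm.img        # should show 'C'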

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold.




Re: NOCOW and Swap Files?

2014-10-22 Thread Hugo Mills
On Wed, Oct 22, 2014 at 01:08:48PM -0700, Robert White wrote:
 So the documentation is clear that you can't mount a swap file
 through BTRFS (unless you use a loop device).
 
 Why is a NOCOW file that has been fully pre-allocated -- as with
 fallocate(1) -- not suitable for swapping?
 
 I found one reference to an unimplemented feature necessary for
 swap, but wouldn't it be reasonable for that feature to exist for
 NOCOW files? (or does this relate to my previous questions about the
 COW operation that happens after a snapshot?)

   The original swap implementation worked by determining a list of
blocks (well, I guess extents) using fiemap, and passing that to the
swap code for it to use. This is fine, as long as (a) nobody else
writes to the file, and (b) the blocks comprising the file don't move
elsewhere.

   Part (a) can be done with normal permissions, so that's not a
problem.

   Part (b) is more tricky -- not because of CoW (because the writes
from the swap code go directly to the device, ignoring the FS), but
because the FS's idea of where the file lives on the device can move
-- balance will do this, for example. So you can't balance a
filesystem with any active swapfiles on it. This is the main reason
that swapfiles aren't allowed on btrfs, as far as I know.

   The new code is the swap-on-NFS infrastructure, which indirects
swapfile accesses through the filesystem code. The reason you have to
do that with NFS is because NFS doesn't expose a block device at all,
so you can't get a list of blocks on an underlying device because
there isn't one. Indirecting the accesses through the filesystem,
however, allows us to side-step btrfs's problems with part (b) above,
and in theory gives us swapfile capability.
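
   For completeness, a sketch of the loop-device workaround mentioned
above (file location and size are only examples):

   # dd if=/dev/zero of=/var/swapfile bs=1M count=2048
   # losetup /dev/loop0 /var/swapfile
   # mkswap /dev/loop0
   # swapon /dev/loop0

   The indirection through the loop device is what makes this work:
the swap I/O goes through the filesystem rather than straight to the
underlying block device.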

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great oxymorons of the world, no. 7: The Simple Truth ---  




Re: NOCOW and Swap Files?

2014-10-22 Thread Hugo Mills
On Wed, Oct 22, 2014 at 01:39:58PM -0700, Robert White wrote:
 On 10/22/2014 01:25 PM, Hugo Mills wrote:
 The new code is the swap-on-NFS infrastructure, which indirects
 swapfile accesses through the filesystem code. The reason you have to
 do that with NFS is because NFS doesn't expose a block device at all,
 so you can't get a list of blocks on an underlying device because
 there isn't one. Indirecting the accesses through the filesystem,
 however, allows us to side-step btrfs's problems with part (b) above,
 and in theory gives us swapfile capability.
 
 I was not even aware there was new code on the matter.
 
 Is there a guide or whatever to doing this? I didn't see any mention
 of it in the places Google led me.

   swap-on-NFS is still, I think, in a set of out of tree patches, and
it's not gone anywhere near btrfs yet. It's just that once it does
land in mainline, it would form the appropriate infrastructure to
develop swapfile capability for btrfs.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great oxymorons of the world, no. 7: The Simple Truth ---  




Re: suspicious number of devices: 72057594037927936

2014-10-27 Thread Hugo Mills
On Mon, Oct 27, 2014 at 11:21:13AM -0700, Christian Kujau wrote:
 On Mon, 27 Oct 2014 at 16:35, David Sterba wrote:
  Yeah sorry, I sent the v2 too late, here's an incremental that applies
  on top of current 3.18-rc
  
  https://patchwork.kernel.org/patch/5160651/
 
 Yup, that fixes it. Thank you! If it's needed:
 
   Tested-by: Christian Kujau li...@nerdbynature.de
 
 @Filipe: and thanks for warning me about 3.17 - I used 3.17.0 since it 
 came out and compiled kernels on the btrfs partition and haven't had any 
 issues. But it wasn't used very often, so whatever the serious issues 
 were, I haven't experienced any.

   If you make read-only snapshots, there's a good chance of metadata
corruption. It's fixed in 3.17.2.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Our so-called leaders speak/with words they try to jail ya/ ---   
They subjugate the meek/but it's the rhetoric of failure.
 




Re: which subvolume is mounted?

2014-11-01 Thread Hugo Mills
On Fri, Oct 31, 2014 at 09:23:27AM -0700, Rich Turner wrote:
 let’s first assume the contents of /etc/fstab are either not used or invalid 
 in mounting the subvolumes. given the following ‘df’ command, how do i know 
 which subvolume of the btrfs filesystem on /dev/sda3 is mounted at each mount 
 point (/, /var, /opt, /home)? i would have expected to see the mount option 
 used to define the subvolume (subvolid or subvol option) in /proc/mounts.

   I already answered this on IRC, but just for the record (and to
test whether my mail is working): it's in /proc/self/mountinfo. The
fourth field of each line is the location that the mountpoint came
from in the filesystem, which is what you're after.
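
   For example, something like this prints each btrfs mount point
followed by the subvolume path it was mounted from:

   $ grep ' btrfs ' /proc/self/mountinfo | awk '{print $5, $4}'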

   Hugo.

 # df
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda3        6839296 1698564   4903212  26% /
 devtmpfs          501464       0    501464   0% /dev
 tmpfs             507316       0    507316   0% /dev/shm
 tmpfs             507316    6720    500596   2% /run
 tmpfs             507316       0    507316   0% /sys/fs/cgroup
 /dev/sda3        6839296 1698564   4903212  26% /var
 /dev/sda3        6839296 1698564   4903212  26% /opt
 /dev/sda3        6839296 1698564   4903212  26% /home
 /dev/sda1         517868   93040    424828  18% /boot
 
 # btrfs subvolume list -a --sort=+rootid /
 ID 257 gen 7800 top level 5 path FS_TREE/root
 ID 258 gen 4127 top level 5 path FS_TREE/home
 ID 259 gen 7801 top level 5 path FS_TREE/var
 ID 260 gen 7795 top level 5 path FS_TREE/opt
 
 # uname -a
 Linux turner11.storix 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 
 2014 x86_64 x86_64 x86_64 GNU/Linux
 
 # btrfs --version
 Btrfs v3.12
 
 # btrfs fi show
 Label: rhel_turner11  uuid: cd3c0e50-d726-44e2-9bfa-19b11614136a
   Total devices 1 FS bytes used 1.62GiB
   devid    1 size 6.52GiB used 2.24GiB path /dev/sda3
 
 Btrfs v3.12
 
 # btrfs fi df /
 Data, single: total=1.98GiB, used=1.58GiB
 System, single: total=4.00MiB, used=16.00KiB
 Metadata, single: total=264.00MiB, used=36.03MiB
 


 
 


-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Your problem is that you've got too much taste to be ---   
 a web developer 




Re: which subvolume is mounted?

2014-11-01 Thread Hugo Mills
On Fri, Oct 31, 2014 at 09:23:27AM -0700, Rich Turner wrote:
 let’s first assume the contents of /etc/fstab are either not used or invalid 
 in mounting the subvolumes. given the following ‘df’ command, how do i know 
 which subvolume of the btrfs filesystem on /dev/sda3 is mounted at each mount 
 point (/, /var, /opt, /home)? i would have expected to see the mount option 
 used to define the subvolume (subvolid or subvol option) in /proc/mounts.

   It's in /proc/self/mountinfo -- look at the fourth field in the
table, which is the subvolume within the FS. Completely non-obvious,
I'm afraid, but at least it is there. (Apparently there were some
problems with getting the information into /proc/mounts).

   Hugo.

 # df
 Filesystem     1K-blocks    Used Available Use% Mounted on
 /dev/sda3        6839296 1698564   4903212  26% /
 devtmpfs          501464       0    501464   0% /dev
 tmpfs             507316       0    507316   0% /dev/shm
 tmpfs             507316    6720    500596   2% /run
 tmpfs             507316       0    507316   0% /sys/fs/cgroup
 /dev/sda3        6839296 1698564   4903212  26% /var
 /dev/sda3        6839296 1698564   4903212  26% /opt
 /dev/sda3        6839296 1698564   4903212  26% /home
 /dev/sda1         517868   93040    424828  18% /boot
 
 # btrfs subvolume list -a --sort=+rootid /
 ID 257 gen 7800 top level 5 path FS_TREE/root
 ID 258 gen 4127 top level 5 path FS_TREE/home
 ID 259 gen 7801 top level 5 path FS_TREE/var
 ID 260 gen 7795 top level 5 path FS_TREE/opt
 
 # uname -a
 Linux turner11.storix 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 
 2014 x86_64 x86_64 x86_64 GNU/Linux
 
 # btrfs --version
 Btrfs v3.12
 
 # btrfs fi show
 Label: rhel_turner11  uuid: cd3c0e50-d726-44e2-9bfa-19b11614136a
   Total devices 1 FS bytes used 1.62GiB
   devid    1 size 6.52GiB used 2.24GiB path /dev/sda3
 
 Btrfs v3.12
 
 # btrfs fi df /
 Data, single: total=1.98GiB, used=1.58GiB
 System, single: total=4.00MiB, used=16.00KiB
 Metadata, single: total=264.00MiB, used=36.03MiB
 


 
 


-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Great films about cricket:  Silly Point Break ---  




Re: request for info on the list of parameters to tweak for PCIe SSDs

2014-11-01 Thread Hugo Mills
On Fri, Oct 31, 2014 at 03:56:15AM -0700, lakshmi_narayanan...@dell.com wrote:
 Hi,
 
 Could you kindly help us with the list of all the btrfs  file system 
 parameters , that can be tweaked for the best performance of the PCIe based 
 SSDs ?

   It should detect the ssd option automatically, but it doesn't hurt
to specify explicitly. You can use the discard option, but I would
only recommend it if the SSD supports queued TRIM -- the older
unqueued TRIM can cause massive performance problems.

   You may want to use autodefrag, which has less of an effect than it
would on a rotational disk, but will still help to keep the number of
extents down. That will reduce the metadata overhead.

   That's about all there is for btrfs-specific options, I think.
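
   As an illustration only (device, mount point and options are
assumptions to be adapted), an fstab entry might look like:

   /dev/nvme0n1p1  /data  btrfs  ssd,autodefrag  0  0

   with discard added only if the device copes well with it.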

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- make bzImage, not war ---  




Re: Kernel 3.17.2 and RO snapshots

2014-11-02 Thread Hugo Mills
On Sun, Nov 02, 2014 at 11:42:58AM +0100, Swâmi Petaramesh wrote:
 Hi there,
 
 I'm a little lost with latest kernel issues, and would like to know if the 
 data corruption with RO snapshots is fixed in 3.17.2, or not yet ?

   Yes, it is.

 Also about this issuse, I'd like to know if :
 
 - Data corruption occurs when creating RO snapshots,

   Yes.

 - Or if it can occur with just using a FS that already has RO snapshots ?

   No.

 - Or if it can occur when deleting existing RO snapshots with 3.17 ?

   Not that I'm aware of.

 Which means: If I have mistakenly upgraded to 3.17.2 a system containing RO 
 snapshots and cannot downgrade (big distro and drivers mess...), what is the 
 safest way to go ?

   Leave it as it is. :)

 And if data corruption occurs, should I expect it to affect snapshots only,
 or the whole system?

   I _think_ it only affects the snapshots, but could bring the system
down if you access the broken data. But you're safe with 3.17.2, so
it's a moot point for you right now.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- If it's December 1941 in Casablanca,  what time is it ---  
  in New York?   




Re: Compatibility matrix kernel/tools

2014-11-05 Thread Hugo Mills
On Wed, Nov 05, 2014 at 09:57:31PM +0100, Cyril Scetbon wrote:
 Hi,
 
 Where can I find the compatibility matrix to know which btrfs-tools version 
 should work with a chosen linux kernel ?

   Any of them should work with any kernel.

   For normal operation, if the tools are too old, they may not
support newer kernel features -- but that will simply mean you can't
access the feature, not that anything will be broken.

   If you're doing recovery work (btrfs check and friends) then using
the latest released version of the tools is strongly recommended.
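
   For example, to see what you currently have:

   $ uname -r
   $ btrfs --version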

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Putting U back in Honor,  Valor, and Trth ---




Re: btrfs balance fails with no space errors (despite having plenty)

2014-11-13 Thread Hugo Mills
On Wed, Nov 12, 2014 at 06:48:47PM +, Kline, Matthew wrote:
 Yesterday I converted my ext4 root and home partitions on my home machine
 to btrfs using btrfs-convert. After confirming that everything went well,
  I followed the wiki instructions to nuke the 'ext2_saved' subvolume,
  then defragged and rebalanced. Everything went according to plan on my root
 partition, but my home partition claimed to have run out of space
 when rebalancing.
 
 I did some digging and tried the following to resolve the problem,
 but to no avail. So far I have:
 
 - Made sure I deleted the subvolume (a common cause of this problem).
   `sudo btrfs subvolume list -a /home` exits with no output.
 
 - Made sure I defragged the newly converted btrfs partition before attempting
   to rebalance it.
 
 - Made sure that I actually have space on the partition.
   It is only about 60% full - see sdb1 below:
 
   ~ % sudo btrfs fi show
   Label: none  uuid: 3a154348-9bd4-4c3f-aaf8-e9446d3797db
   Total devices 1 FS bytes used 9.49GiB
   devid    1 size 87.47GiB used 11.03GiB path /dev/sda3
 
   Label: none  uuid: cc90ee50-bbda-46d6-a7e6-fe1c8578d75b
   Total devices 1 FS bytes used 124.98GiB
   devid    1 size 200.00GiB used 127.03GiB path /dev/sdb1
 
 - Made sure I have metadata space (as suggested on the problem FAQ):
 
   ~ % sudo btrfs fi df /home
   Data, single: total=126.00GiB, used=124.58GiB
   System, single: total=32.00MiB, used=20.00KiB
   Metadata, single: total=1.00GiB, used=404.27MiB
   GlobalReserve, single: total=136.00MiB, used=0.00B
 
 - Ran partial rebalances using the `-dusage` flag
   (as suggested on the problem FAQ),
   which successfully balanced a handful of blocks.
 
 - Checked the system log - nothing interesting comes up.
    btrfs happily chugs along with "found xxx extents" and
    "relocating block group x flags 1" messages before unceremoniously
    ending with "7 enospc errors during balance".
 
 In spite of all of this, a full rebalance still fails when it's about 95% 
 done.
 I'm at a complete loss as to what could be causing it - I know that it's not
 completely necessary (especially with a single drive),
 and `btrfs scrub` finds no errors with the file system,
 but the wiki gives the impression that it's a good idea after you convert from
 ext.

   I'm fairly sure it's a bug. We've had several reports of something
like this, particularly with respect to converted filesystems. I know
that failed balances with apparently plenty of space are reasonably
high up on josef's list of things to investigate. I would recommend at
minimum putting this report on bugzilla.kernel.org as well. The other
thing you could do, which may help with debugging, is to take a copy
of the metadata:

$ btrfs-image -c9 -t4 /dev/sdb1 /path/to/image.img

and hang on to it just in case josef (or whoever else looks at it)
needs another sample to debug with.

 Is there something I'm missing?

   I don't think so. This is probably a bug -- I'd guess in the FS
(because we've seen something similar on non-converted FSes), but
maybe set up more easily by the conversion process.

   Hugo.

 Other obligatory info: I'm on Arch Linux using btrfs 3.17.1. uname -a is
 
 Linux kline-arch 3.17.2-1-ARCH #1 SMP PREEMPT Thu Oct 30 20:49:39 CET 2014
 x86_64 GNU/Linux

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Strive for apathy! ---




Two persistent problems

2014-11-14 Thread Hugo Mills
   Chris, Josef, anyone else who's interested,

   On IRC, I've been seeing reports of two persistent unsolved
problems. Neither is showing up very often, but both have turned up
often enough to indicate that there's something specific going on
worthy of investigation.

   One of them is definitely a btrfs problem. The other may be btrfs,
or something in the block layer, or just broken hardware; it's hard to
tell from where I sit.

Problem 1: ENOSPC on balance

   This has been going on since about March this year. I can
reasonably certainly recall 8-10 cases, possibly a number more. When
running a balance, the operation fails with ENOSPC when there's plenty
of space remaining unallocated. This happens on full balance, filtered
balance, and device delete. Other than the ENOSPC on balance, the FS
seems to work OK. It seems to be more prevalent on filesystems
converted from ext*. The first few or more reports of this didn't make
it to bugzilla, but a few of them since then have gone in.

Problem 2: Unexplained zeroes

   Failure to mount. Transid failure, expected xyz, have 0. Chris
looked at an early one of these (for Ke, on IRC) back in September
(the 27th -- sadly, the public IRC logs aren't there for it, but I can
supply a copy of the private log). He rapidly came to the conclusion
that it was something bad going on with TRIM, replacing some blocks
with zeroes. Since then, I've seen a bunch of these coming past on
IRC. It seems to be a 3.17 thing. I can successfully predict the
presence of an SSD and -odiscard from the have 0. I've successfully
persuaded several people to put this into bugzilla and capture
btrfs-images.  btrfs recover doesn't generally seem to be helpful in
recovering data.


   I think Josef had problem 1 in his sights, but I don't know if
additional images or reports are helpful at this point. For problem 2,
there's obviously something bad going on, but there's not much else to
go on -- and the inability to recover data isn't good.

   For each of these, what more information should I be trying to
collect from any future reporters?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Great films about cricket:  Forrest Stump ---




Re: Two persistent problems

2014-11-17 Thread Hugo Mills
On Fri, Nov 14, 2014 at 05:00:26PM -0500, Josef Bacik wrote:
 On 11/14/2014 04:51 PM, Hugo Mills wrote:
 Chris, Josef, anyone else who's interested,
 
 On IRC, I've been seeing reports of two persistent unsolved
 problems. Neither is showing up very often, but both have turned up
 often enough to indicate that there's something specific going on
 worthy of investigation.
 
 One of them is definitely a btrfs problem. The other may be btrfs,
 or something in the block layer, or just broken hardware; it's hard to
 tell from where I sit.
 
 Problem 1: ENOSPC on balance
 
 This has been going on since about March this year. I can
 reasonably certainly recall 8-10 cases, possibly a number more. When
 running a balance, the operation fails with ENOSPC when there's plenty
 of space remaining unallocated. This happens on full balance, filtered
 balance, and device delete. Other than the ENOSPC on balance, the FS
 seems to work OK. It seems to be more prevalent on filesystems
 converted from ext*. The first few or more reports of this didn't make
 it to bugzilla, but a few of them since then have gone in.
 
 Problem 2: Unexplained zeroes
 
 Failure to mount. Transid failure, expected xyz, have 0. Chris
 looked at an early one of these (for Ke, on IRC) back in September
 (the 27th -- sadly, the public IRC logs aren't there for it, but I can
 supply a copy of the private log). He rapidly came to the conclusion
 that it was something bad going on with TRIM, replacing some blocks
 with zeroes. Since then, I've seen a bunch of these coming past on
 IRC. It seems to be a 3.17 thing. I can successfully predict the
 presence of an SSD and -odiscard from the have 0. I've successfully
 persuaded several people to put this into bugzilla and capture
 btrfs-images.  btrfs recover doesn't generally seem to be helpful in
 recovering data.
 
 
 I think Josef had problem 1 in his sights, but I don't know if
 additional images or reports are helpful at this point. For problem 2,
 there's obviously something bad going on, but there's not much else to
 go on -- and the inability to recover data isn't good.
 
 For each of these, what more information should I be trying to
 collect from any future reporters?
 
 
 
 So for #2 I've been looking at that the last two weeks.  I'm always
 paranoid we're screwing up one of our data integrity sort of things,
 either not waiting on IO to complete properly or something like
 that. I've built a dm target to be as evil as possible and have been
 running it trying to make bad things happen.  I got slightly side
 tracked since my stress test exposed a bug in the tree log stuff and
 csums which I just fixed.  Now that I've fixed that I'm going back
 to try and make the expected blah, have 0 type errors happen.

   I've searched the bugzilla archive and found the two reports that I
know of (87061 and 87021); I couldn't see any others. I've requested
more information on both -- nothing obviously in common, except SSD
and (probably) discard. I tried to tag them both with trim for easy
finding, but that seems to have been lost somewhere. I'll try that
again when I get home this evening and have access to my password.

 As for the ENOSPC I keep meaning to look into it and I keep getting
 distracted with other more horrible things.  Ideally I'd like to
 reproduce it myself, so more info on that front would be good, like
 do all reports use RAID/compression/some other odd set of features?
 Thanks for taking care of this stuff Hugo, #2 is the worst one and
 I'd like to be absolutely sure it's not our bug, once I'm happy we
 aren't I'll look at the balance thing.

   OK, good to know you're on both of these. I think the easy
solution to reproduce the ENOSPC is to convert an ext4 filesystem. It
doesn't seem to be a unique characteristic, but it is a frequent
correlation. We had another one today, after an FS conversion -- I've
asked them to attach a btrfs-image dump and the enospc_debug log to
the bugzilla report.
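
   For anyone wanting to try reproducing it on scratch data, the rough
recipe (device name illustrative) would be:

   # btrfs-convert /dev/sdX1
   # mount -o enospc_debug /dev/sdX1 /mnt
   # btrfs balance start /mnt

   with dmesg capturing the enospc_debug output when the balance fails.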

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- 2 + 2 = 5,  for sufficiently large values of 2. --- 




Re: Two persistent problems

2014-11-17 Thread Hugo Mills
On Mon, Nov 17, 2014 at 11:59:48AM +0100, Konstantin wrote:
 Josef Bacik wrote on 14.11.2014 at 23:00:
  On 11/14/2014 04:51 PM, Hugo Mills wrote:
[snip]
  Problem 2: Unexplained zeroes
 
  Failure to mount. Transid failure, expected xyz, have 0. Chris
  looked at an early one of these (for Ke, on IRC) back in September
  (the 27th -- sadly, the public IRC logs aren't there for it, but I can
  supply a copy of the private log). He rapidly came to the conclusion
  that it was something bad going on with TRIM, replacing some blocks
  with zeroes. Since then, I've seen a bunch of these coming past on
  IRC. It seems to be a 3.17 thing. I can successfully predict the
  presence of an SSD and -odiscard from the have 0. I've successfully
  persuaded several people to put this into bugzilla and capture
  btrfs-images.  btrfs recover doesn't generally seem to be helpful in
  recovering data.
[snip]
  So for #2 I've been looking at that the last two weeks.  I'm always
  paranoid we're screwing up one of our data integrity sort of things,
  either not waiting on IO to complete properly or something like that.
  I've built a dm target to be as evil as possible and have been running
  it trying to make bad things happen.  I got slightly side tracked
  since my stress test exposed a bug in the tree log stuff and csums
  which I just fixed.  Now that I've fixed that I'm going back to try
  and make the expected blah, have 0 type errors happen.
[snip]
 For #2, I had a strangely damaged BTRFS I reported a week or so ago
 which may have similar background. Dmesg gives:
 
 parent transid verify failed on 586239082496 wanted 13329746340512024838
 found 588
 BTRFS: open_ctree failed
 
 The thing is that btrfsck crashes when trying to check this. As nobody
 seemed to be interested I reformatted this disk today.

   Whilst that's a genuine problem, it's not specifically the one I
was referring to here, which shows up with want=X, have=0 from btrfs
check, and seems to be related to TRIM on SSDs.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Turning,  pages turning in the widening bath, / The spine ---
cannot bear the humidity. / Books fall apart; the binding
cannot hold. / Page 129 is loosed upon the world.




Re: btrfs filesystem show _exact_ freaking size?

2014-11-18 Thread Hugo Mills
On Tue, Nov 18, 2014 at 02:39:48AM -0800, Robert White wrote:
 Howdy,
 
 How does one get the exact size (in blocks preferably, but bytes
 okay) of the filesystem inside a partition? I know how to get the
 partition size, but that's not useful when shrinking a partition...
 
 So, for example, you successfully do
 
 btrfs filesystem resize -32G /dev/sdz2
 
 now you've got some space but zero idea how many sectors can be
 trimmed off the end of the partition, you can do the math but thats
 a little iffy, especially if the file system didn't originally fill
 the partition to begin with.
 
 The current methodology for most such actions is to way over-trim
 the file system, then reallocate the space using your partition tool
 of choice, then re-grow the filesystem to fit. This has been the way
 of things forever and it blows...
 
 There needs to be an option to btrfs filesystem show that will tell
 you X blocks, not Y.ZZ terabytes.

   The 3.17 userspace tools should now support flags to select the
display units in some detail, including bytes.

   Even without that, though, for your use case, I would recommend
shrinking the FS by *more* than you wanted to shrink, resizing the
partition, and then resizing the FS back up to fit the partition
exactly (with btrfs fi resize n:max).
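
   As a sketch (sizes and devid are examples):

   # btrfs filesystem resize -40G /mnt    # shrink by more than needed
     ... shrink the partition with your partitioning tool ...
   # btrfs filesystem resize 1:max /mnt   # grow back to fill the partition

   where 1 is the devid of the device whose partition was resized.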

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Stick them with the pointy end ---  




Re: btrfs send erroring...

2014-11-20 Thread Hugo Mills
On Thu, Nov 20, 2014 at 11:57:50AM -0500, Ken D'Ambrosio wrote:
 Hi!  Trying to do a btrfs send, and failing with:
 
 root@khamul:~# btrfs send /biggie/BACKUP/ | btrfs receive /tmp/sdd1/
 At subvol /biggie/BACKUP/
 At subvol BACKUP
 ERROR: rename o2046806-17126-0 -> volumes/ccdn-ch2-01 failed. No
 such file or directory

   This looks like one of several bugs that have been fixed
recently. What kernel version and userspace tools version are you
using?

   Hugo.

 Judging by disk capacity, it hits this about 40% of the way through.
 As my disk has subvolumes on it, which are underneath
 /biggie/BACKUP/, is there a different way I should go about
 sending an entire disk?
 
 Thanks!
 
 -Ken

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- ©1973 Unclear Research Ltd ---   




Re: Fixing Btrfs Filesystem Full Problems typo?

2014-11-22 Thread Hugo Mills
On Sun, Nov 23, 2014 at 12:26:38AM +0100, Patrik Lundquist wrote:
 On 22 November 2014 at 23:26, Marc MERLIN m...@merlins.org wrote:
 
  This one hurts my brain every time I think about it :)
 
 I'm new to Btrfs so I may very well be wrong, since I haven't really
 read up on it. :-)
 
 
  So, the bigger the -dusage number, the more work btrfs has to do.
 
 Agreed.
 
 
  -dusage=0 does almost nothing
  -dusage=100 effectively rebalances everything
 
 And -dusage=0 effectively reclaims empty chunks, right?
 
 
  But saying "less than 95% full" for -dusage=95 would mean
  rebalancing everything that isn't almost full,
 
 But isn't that what rebalance does? Rewriting chunks <=95% full to
 completely full chunks and effectively defragmenting chunks and most
 likely reduce the number of chunks.
 
 A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and
 dev_item.bytes_used went from 1593466421248 to 1491460947968.
 
 
  Now, just to be sure, if I'm getting this right, if your filesystem is
  55% full, you could rebalance all blocks that have less than 55% space
  free, and use -dusage=55
 
 I realize that I interpret the usage parameter as operating on blocks
 (chunks? are they the same in this case?) that are <= 55% full while
 you interpret it as <= 55% free.
 
 Which is correct?

   Less than or equal to 55% full.

   0 gives you less than or equal to 0% full -- i.e. the empty block
groups. 100 gives you less than or equal to 100% full, i.e. all block
groups.
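
   So, for example (mount point illustrative):

   # btrfs balance start -dusage=0 /mnt    # only completely empty data block groups
   # btrfs balance start -dusage=55 /mnt   # data block groups at most 55% full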

   A chunk is the part of a block group that lives on one device, so
in RAID-1, every block group is precisely two chunks; in RAID-0, every
block group is 2 or more chunks, up to the number of devices in the
FS. A chunk is usually 1 GiB in size for data and 250 MiB for
metadata, but can be smaller under some circumstances.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- And what rough beast,  its hour come round at last / slouches ---  
 towards Bethlehem,  to be born? 




Re: Best GIT repository(s) for preparing patches?

2014-11-23 Thread Hugo Mills
On Sat, Nov 22, 2014 at 04:14:35PM -0800, Robert White wrote:
 Which is the best GIT repository to clone for each of the kernel
 support and btrfs-progs, for preparing a patch to submit to this
 email list?

   For kernel, I would suggest using a repo with Linus's latest -rc
tag (Linus's, for example :) ). For userspace, probably the latest -rc
tag from kdave's repo.

See https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
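
   At the time of writing that means something like the following
(check the wiki page above for the current locations):

   $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
   $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git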

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Are you the man who rules the Universe? Well,  I ---   
  try not to.   




Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-24 Thread Hugo Mills
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote:
 On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo bo.li@oracle.com wrote:
 This brings a strong-but-slow checksum algorithm, sha256.
 
 Actually btrfs used sha256 at the early time, but then moved to
 crc32c for
 performance purposes.
 
 As crc32c is sort of weak due to its hash collision issue, we need
 a stronger
 algorithm as an alternative.
 
 Users can choose sha256 from mkfs.btrfs via
 
 $ mkfs.btrfs -C 256 /device
 
 Agree with others about -C 256...-C sha256 is only three letters more ;)
 
 What's the target for this mode?  Are we trying to find evil people
 scribbling on the drive, or are we trying to find bad hardware?

   You're going to need a hell of a lot more infrastructure to deal
with the first of those two cases. If someone can write arbitrary data
to your storage without going through the filesystem, you've already
lost the game.

   I don't know what the stats are like for random error detection
(probably just what you'd expect in the naive case -- 1/2^n chance of
failing to detect an error for an n-bit hash). More bits likely are
better for that, but how much CPU time do you want to burn on it?

   I could see this possibly being useful for having fewer false
positives when using the inbuilt checksums for purposes of dedup.

   Hugo.

-- 
Hugo Mills | That's not rain, that's a lake with slots in it
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |




Re: Moving contents from one subvol to another

2014-11-29 Thread Hugo Mills
On Sat, Nov 29, 2014 at 07:51:07PM +0530, Shriramana Sharma wrote:
 Hello. I am now taking the first steps to making my backup external
 HDD in BtrFS. From
 http://askubuntu.com/questions/119014/btrfs-subvolumes-vs-folders I
 understand that the only difference between subvolumes and ordinary
 folders is that the former can be snapshotted and independently
 mounted.
 
 But I have a question. I have two subvols test1, test2.
 
 $ cd test1
 $ dd if=/dev/urandom of=file bs=1M count=500
 500+0 records in
 500+0 records out
 524288000 bytes (524 MB) copied, 36.2291 s, 14.5 MB/s
 $ time mv file ../test2/
 real0m2.061s
 user0m0.013s
 sys 0m0.459s
  $ time { cp --reflink ../test2/file . && rm ../test2/file ; }
 real0m0.677s
 user0m0.022s
 sys 0m0.086s
 $ mkdir foo
 $ time mv file foo/
 real0m0.096s
 user0m0.008s
 sys 0m0.013s
 
 It seems that mv is not CoW aware and hence is not able to create
 reflinks so it is actually processing the entire file because it
 thinks test2 is a different device/filesystem/partition or such. Is
 this understanding correct?

   The latest version of mv should be able to use CoW copies to make
it more efficient. It has a --reflink option, the same as cp. Note
that you can't make reflinks crossing a mount boundary, but you can do
so crossing a subvolume boundary (as you're doing here).

 So doing cp --reflink with rm is much faster. But it is still slower
 than doing mv within the same subvol. Is it because of the
 housekeeping with updating the metadata of the two subvols?

   I should think so, yes.

 Methinks --reflink option should be added to mv for the above usecase.
 Do people think this is useful? Why or why not?

   See above: it already has been. :)

 My concern is that if somebody wants to consolidate two subvols into
  one, though really only the metadata needs to be processed, ordinary
  mv isn't aware of this, and using cp --reflink with rm is
 unnecessarily complicated, especially if it will involve multiple
 files.
 
 And it's not clear to me what it would entail to cp --reflink + rm an
 entire directory tree because IIUC I'd have to handle each file
 separately. Perhaps something (unnecessarily convoluted) like:
 
 find . | while read f
 do
  [ -d $f ] && mkdir target/$f && touch target/$f -r $f
  [ -f $f ] && cp -a --reflink $f target/ && rm $f
 done
 
 Again, what would happen to files which are not regular directories or files?

   Probably just the same thing that would happen without the
--reflink=always.
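
   (For the whole-tree case, a single recursive copy is usually enough
-- a sketch, using the test1/test2 names from above:

   $ cp -a --reflink=always test1/. test2/ && rm -rf test1/*

   cp recreates directories and symlinks as needed and reflinks the
regular files.)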

 And why isn't --reflink given a single letter alias for cp?

   I don't know about that; you'll have to ask the coreutils
developers. They're probably expecting it to be largely set to a
 single value by default (e.g. through a shell alias).

   Hugo.

-- 
Hugo Mills | I will not be pushed, filed, stamped, indexed,
hugo@... carfax.org.uk | briefed, debriefed or numbered.
http://carfax.org.uk/  | My life is my own.
PGP: 65E74AC0  |Number 6, The Prisoner




Re: root subvol id is 0 or 5?

2014-11-30 Thread Hugo Mills
On Sun, Nov 30, 2014 at 09:01:37AM +0530, Shriramana Sharma wrote:
 I am confused with this: should I call it the root subvol or
 top-level subvol or default subvol or doesn't it matter? Are all
 subvols equal, or some are more equal than others [hark to Orwell's
 Animal Farm ;-)]?

   I try to use top level for subvolid=5.

   root subvol is hugely confusing, as it could be one of several
things. If you mean the subvol mounted at /, then I call that / or
the / subvol.

   default subvol is the one marked as default. This starts out as
subvolid=5, but can be set to any other subvol.

 And more importantly, is the ID of the root subvol 0 or 5?

   In the data structures on disk, it's 5. The kernel aliases 0 to
mean subvolid 5.

 The Oracle guide
 (https://docs.oracle.com/cd/E37670_01/E37355/html/ol_use_case3_btrfs.html)
 seems to say it's 0 :
 
 By default, the operating system mounts the parent btrfs volume,
 which has an ID of 0
 
 but the BtrFS wiki (and btrfs subvol manpage) reads 5:
 
 every btrfs filesystem has a default subvolume as its initially
 top-level subvolume, whose subvolume id is 5(FS_TREE).
 
 as also the Ubuntu Wiki:
 
 The default subvolume to mount is always the top of the btrfs tree
 (subvolid=5).

   As above, both are correct here.

 Now this Oracle page
 http://www.oracle.com/technetwork/articles/servers-storage-admin/advanced-btrfs-1734952.html
 says:
 
 The only clean way to destroy the default subvolume is to rerun the
 mkfs.btrfs command, which would destroy existing data.

   OK, this is actually wrong. It's not the default subvolume if
someone's run set-default on the FS. They're correct that you can't
delete the top-level subvol. You can't delete the subvol marked as
default, either. Assuming (or implying) that the two are the same is
just plain wrong.

 So from what I've (confusedly) understood so far, 0 refers to the
 superstructure (or whatchamacallit) of the entire BtrFS-based contents
 of the device(s) and hence cannot be deleted but only reset by a
 mkfs.btrfs, but 5 is only the default subvol (mounted when the FS as a
 whole is mounted without subvol spec) provided by mkfs.btrfs, and
 subvol set-default can have another subvol mounted as default instead,
 after which 5 can actually be deleted?

   You can't delete subvolid=5. It's part of the fundamental
whatchamacallit of the FS (a good name). Even if you change the
default subvol, you still can't delete it.
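
   For example (device and subvolume id are illustrative):

   # btrfs subvolume get-default /
   # btrfs subvolume set-default 257 /
   # mount -o subvolid=5 /dev/sda3 /mnt/top

   The last one always gets you the top-level subvolume, whatever the
default has been set to.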

   Hugo.

-- 
Hugo Mills | People are too unreliable to be replaced by
hugo@... carfax.org.uk | machines.
http://carfax.org.uk/  |
PGP: 65E74AC0  |  Nathan Spring, Star Cops



