Re: Will big metadata blocks fix # of hardlinks?
On Tue, May 29, 2012 at 02:09:03PM +0100, Martin wrote:
> On 26/05/12 19:22, Sami Liedes wrote:
>> Hi! I see that Linux 3.4 supports bigger metadata blocks for btrfs.
>> Will using them allow a bigger number of hardlinks on a single file
>> (i.e. the bug that has bitten at least git users on Debian[1,2], and
>> BackupPC[3])? As far as I understand, the problem has been that the
>> hard links are stored in the same metadata block as some other
>> metadata, so the size of the block is an inherent limitation? If so,
>> I think it would be worth trying Btrfs again :)
>>
>> Sami
>>
>> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/13603
>> [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642603
>> [3] https://bugzilla.kernel.org/show_bug.cgi?id=15762
>
> Thanks for noting this one. That is one very surprising and unexpected
> limit!... And a killer for some not completely rare applications...
>
> One example fail case is just 13 hard links. Even x4 that (16k blocks)
> only gives 52 links for that example fail case.
>
> The brief summary for those is:
>
> * It's a rare corner case that needs a format change to fix, so
>   won't-fix;

   Definitely not won't-fix. There have been substantially-complete
patches posted to this list which fix the problem (see the extended
inode refs patches by Mark Fasheh in the archives). I don't think
they're quite ready for inclusion yet, but work is ongoing to fix the
issue.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no. 7: The Simple Truth ---
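The block-size arithmetic behind the limit above can be sketched as a rough back-of-envelope model. Everything in it is an assumption for illustration (the overhead constants and the helper name are made up, not real btrfs on-disk sizes); the only grounded point from the thread is that all same-directory link entries must fit in one metadata block, so capacity grows with block size and shrinks with name length.

```python
# Back-of-envelope model of the pre-extended-inode-refs hard-link
# limit.  Constants are illustrative assumptions, NOT real btrfs
# on-disk sizes: each hard link from one directory adds a (name,
# index) entry to a single INODE_REF item, and that item must fit
# inside one metadata block.

ITEM_OVERHEAD = 25     # assumed fixed cost of the item itself, bytes
PER_LINK_COST = 17     # assumed per-link bookkeeping cost, bytes
BLOCK_OVERHEAD = 1000  # assumed block header + other items, bytes

def approx_max_links(block_size, name_len):
    """Roughly how many same-directory hard links fit in one block."""
    usable = block_size - BLOCK_OVERHEAD - ITEM_OVERHEAD
    return usable // (PER_LINK_COST + name_len)
```

With long file names a model like this bottoms out at a few dozen links even for 16k blocks, which is why bigger metadata blocks only postpone the problem rather than fix it; the extended inode refs patches change the format instead.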
Re: Help with data recovering
On Mon, Jun 04, 2012 at 08:01:32AM -0400, Maxim Mikheev wrote:
> [...] doesn't match, have=9066, want=9096
> Well block 3674785320960 seems great, but generation doesn't match, have=9067, want=9096
> Well block 3674788827136 seems great, but generation doesn't match, have=9069, want=9096
> Well block 3674792534016 seems great, but generation doesn't match, have=9068, want=9096
> Well block 3674808315904 seems great, but generation doesn't match, have=9071, want=9096
> Well block 3728604938240 seems great, but generation doesn't match, have=5297, want=9096
> Well block 3728635133952 seems great, but generation doesn't match, have=7598, want=9096
> Well block 3728682438656 seems great, but generation doesn't match, have=7599, want=9096
> Well block 3728770461696 seems great, but generation doesn't match, have=9074, want=9096
> Well block 3728819929088 seems great, but generation doesn't match, have=9073, want=9096
> Well block 3820340637696 seems great, but generation doesn't match, have=9075, want=9096
> Well block 3960145862656 seems great, but generation doesn't match, have=9076, want=9096
> Well block 4046161489920 seems great, but generation doesn't match, have=9077, want=9096
> Well block 4046213595136 seems great, but generation doesn't match, have=9079, want=9096
> Well block 4046217637888 seems great, but generation doesn't match, have=9081, want=9096
> Well block 4046217846784 seems great, but generation doesn't match, have=9080, want=9096
> Well block 4046252736512 seems great, but generation doesn't match, have=9083, want=9096
> Well block 4046301515776 seems great, but generation doesn't match, have=9085, want=9096
> Well block 4046302756864 seems great, but generation doesn't match, have=9084, want=9096
> Well block 4046358921216 seems great, but generation doesn't match, have=9086, want=9096
> Well block 4046409486336 seems great, but generation doesn't match, have=9087, want=9096
> Well block 4046414626816 seems great, but generation doesn't match, have=9088, want=9096
> Well block 4148447113216 seems great, but generation doesn't match, have=7618, want=9096
> Well block 4148522024960 seems great, but generation doesn't match, have=9089, want=9096
> Well block 4148539457536 seems great, but generation doesn't match, have=9090, want=9096
> Well block 4455562448896 seems great, but generation doesn't match, have=9092, want=9096
> Well block 4455568302080 seems great, but generation doesn't match, have=9091, want=9096
> Well block 4848395739136 seems great, but generation doesn't match, have=9093, want=9096
> Well block 4923796594688 seems great, but generation doesn't match, have=9094, want=9096
> Well block 4923798065152 seems great, but generation doesn't match, have=9095, want=9096
> Found tree root at 5532762525696
>
> On 06/04/2012 07:49 AM, Hugo Mills wrote:
>> On Mon, Jun 04, 2012 at 07:43:40AM -0400, Maxim Mikheev wrote:
>>> Hi Arne,
>>>
>>> Can you advice how can I recover data? I tried almost everything
>>> what I found on https://btrfs.wiki.kernel.org
>>> /btrfs-restore restored some files but it is not what was stored.
>>
>>    Can you post the complete output of find-root please?
>
> I have seen this command --
>
>   In case of a corrupted superblock, start by asking btrfsck to use an
>   alternate copy of the superblock instead of superblock #0. This is
>   achieved via the -s option followed by the number of the alternate
>   copy you wish to use. In the following example we ask for using the
>   superblock copy #2 of /dev/sda7:
>   # ./btrfsck -s 2 /dev/sd7
>
> -- but it gave me:
>
> $ sudo btrfsck -s 2 /dev/sdb
> btrfsck: invalid option -- 's'
> usage: btrfsck dev
> Btrfs Btrfs v0.19

   What exact version of the package do you have? Did you compile from
a recent git, or do you have a distribution -progs package installed?
If the latter, what date does it have in the version number?

   Hugo.
Re: Help with data recovering
[trimmed Arne Jan from cc by request]

On Mon, Jun 04, 2012 at 08:28:22AM -0400, Maxim Mikheev wrote:
> adding -v, as an example:
>
>   sudo btrfs-find-root -v -v -v -v -v /dev/sdb
>
> didn't change the output at all.

   OK, then all I can suggest is what I said below -- work through the
potential tree roots in order from largest generation id to smallest.
Given that it's not reporting any trees, though, I'm not certain that
you'll get any success with it.

   Did you have your data in a subvolume?

   Hugo.

> On 06/04/2012 08:11 AM, Hugo Mills wrote:
>> On Mon, Jun 04, 2012 at 08:01:32AM -0400, Maxim Mikheev wrote:
>>> Thank you for helping.
>>
>>    I'm not sure I can be of much help, but there were a few things
>> missing from the earlier conversation that I wanted to check the
>> details of.
>>
>>> ~$ uname -a
>>> Linux s0 3.4.0-030400-generic #201205210521 SMP Mon May 21 09:22:02
>>> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> I compiled progs from recent git (a week or two ago). I can compile
>>> it again if there are updates.
>>
>>    No, that should be recent enough. I don't think there have been
>> any major updates since then.
>>
>>> The output of btrfs-find-root is pretty long and below:
>>>
>>> max@s0:~$ sudo btrfs-find-root /dev/sdb
>>> Super think's the tree root is at 5532762525696, chunk root 20979712
>>> Well block 619435147264 seems great, but generation doesn't match,
>>> have=8746, want=9096
>>
>>    This is not long enough, unfortunately. At least some of these
>> should have a list of trees before them. At the moment, it's not
>> reporting any trees at all. (At least, it should be doing this
>> unless Chris took that line of code out). Do you get anything extra
>> from adding a few -v options to the command?
>>
>>    I would suggest, in the absence of any better ideas, sorting this
>> list by the have= value, and systematically working down from the
>> largest to the smallest, running btrfs-restore -t $n for each one
>> (where $n is the corresponding block number).
>>
>>    Hugo.
>
> [snip]
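The manual procedure suggested above -- sort the find-root candidates by their have= generation and try each in turn -- can be sketched as a small helper. This is a hypothetical script, not part of btrfs-progs; the line format it parses is taken from the output quoted in this thread.

```python
import re

# Sketch of the procedure suggested above: take btrfs-find-root
# output and list candidate tree-root blocks from the newest
# generation (have=) down to the oldest.

LINE_RE = re.compile(
    r"Well block (\d+) seems great, but generation doesn't match, "
    r"have=(\d+), want=(\d+)")

def restore_candidates(find_root_output):
    """Return candidate block numbers, largest generation id first."""
    found = []
    for line in find_root_output.splitlines():
        m = LINE_RE.search(line)
        if m:
            block, have, _want = map(int, m.groups())
            found.append((have, block))
    # Sort by generation, descending, as suggested above.
    return [block for have, block in sorted(found, reverse=True)]
```

Each returned block number would then be tried with btrfs-restore -t, stopping at the first candidate that yields sensible files.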
Re: Help with data recovering
On Mon, Jun 04, 2012 at 06:04:22PM +0100, Hugo Mills wrote:
>    I'm out of ideas.

   ... but that's not to say that someone else may have some ideas. I
wouldn't get your hopes up too much, though.

   At this point, though, you're probably looking at somebody writing
custom code to scan the FS and attempt to find and retrieve anything
that's recoverable:

 * You might try writing a tool to scan all the disks for useful
   fragments of old trees, and see if you can find some of the tree
   roots independently of the tree of tree roots (which clearly isn't
   particularly functional right now).

 * You might try simply scanning the disks looking for your lost data,
   and try to reconstruct as much of it as you can from that.

 * You could try to find a company specialising in data recovery and
   pay them to try to get your data back.

 * Or you might just have to accept that the data's gone and work on
   reconstructing it.

   Hugo.
Re: [btrfs-progs] [bug][patch] Leaking file handle in scrub_fs_info()
 	}

-	ret = scrub_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = scrub_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		ERR(!do_quiet, "ERROR: getting dev info for scrub failed: %s\n",
 		    strerror(-ret));
@@ -1586,7 +1601,6 @@ static int cmd_scrub_status(int argc, char **argv)
 		.sun_family = AF_UNIX,
 	};
 	int ret;
-	int fdmnt;
 	int i;
 	int print_raw = 0;
 	int do_stats_per_dev = 0;
@@ -1615,13 +1629,7 @@
 	path = argv[optind];

-	fdmnt = open_file_or_dir(path);
-	if (fdmnt < 0) {
-		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
-		return 12;
-	}
-
-	ret = scrub_fs_info(fdmnt, path, &fi_args, &di_args);
+	ret = scrub_fs_info(path, &fi_args, &di_args);
 	if (ret) {
 		fprintf(stderr, "ERROR: getting dev info for scrub failed: %s\n",
 			strerror(-ret));
@@ -1698,7 +1706,6 @@ static int cmd_scrub_status(int argc, char **argv)
 out:
 	free_history(past_scrubs);
 	free(di_args);
-	close(fdmnt);
 	if (fdres > -1)
 		close(fdres);
Re: delete disk proceedure
On Tue, Jun 05, 2012 at 10:38:11AM -0400, Jim wrote:
> Good morning btrfs list,
>
> I had written about 2 weeks ago about using extra btrfs space in an
> nfs file system setup. Nfs seems to export the files but the mounts
> don't work on older machines without btrfs kernels.

   The mounts don't work -- can you be more specific here? It would
seem that if we can get to the bottom of that problem, you won't have
to muck around with your current set-up at all.

   Hugo.
Re: delete disk proceedure
On Tue, Jun 05, 2012 at 01:12:17PM -0400, Jim wrote:
> [sorry for the resend, signature again]
>
> I am waiting for a window (later tonight) when I can try mounting the
> btrfs export. Am I reading you guys correctly, that you think I
> should be deleting drives from the array? Or is this a just-in-case?
> Thanks.

   Try the modified exports as I suggested in the other part of the
thread first. If that turns out to be problematic still, then we can
discuss any migration strategies.

   Hugo.

> Jim Maloney
>
> On 06/05/2012 01:04 PM, Hugo Mills wrote:
>> On Tue, Jun 05, 2012 at 06:19:00PM +0200, Helmut Hullen wrote:
>>> Hallo, Jim,
>>>
>>> Du meintest am 05.06.12:
>>>
>>>> /dev/sda 11T 4.9T 6.0T 46% /btrfs
>>>> [root@advanced ~]# btrfs fi show
>>>> failed to read /dev/sr0
>>>> Label: none  uuid: c21f1221-a224-4ba4-92e5-cdea0fa6d0f9
>>>>  Total devices 12 FS bytes used 4.76TB
>>>>  devid  6 size 930.99GB used 429.32GB path /dev/sdf
>>>>  devid  5 size 930.99GB used 429.32GB path /dev/sde
>>>>  devid  8 size 930.99GB used 429.32GB path /dev/sdh
>>>>  devid  9 size 930.99GB used 429.32GB path /dev/sdi
>>>>  devid  4 size 930.99GB used 429.32GB path /dev/sdd
>>>>  devid  3 size 930.99GB used 429.32GB path /dev/sdc
>>>>  devid 11 size 930.99GB used 429.08GB path /dev/sdk
>>>>  devid  2 size 930.99GB used 429.32GB path /dev/sdb
>>>>  devid 10 size 930.99GB used 429.32GB path /dev/sdj
>>>>  devid 12 size 930.99GB used 429.33GB path /dev/sdl
>>>>  devid  7 size 930.99GB used 429.32GB path /dev/sdg
>>>>  devid  1 size 930.99GB used 429.09GB path /dev/sda
>>>> Btrfs v0.19-35-g1b444cd
>>>>
>>>> df -h and btrfs fi show seem to be in good size agreement. Btrfs
>>>> was created as raid1 metadata and raid0 data. I would like to
>>>> delete the last 4 drives, leaving 7T of space to hold 4.9T of
>>>> data. My plan would be to remove /dev/sdi, j, k, l one at a time.
>>>> After all are deleted, run btrfs fi balance /btrfs.
>>>
>>> I'd prefer
>>>
>>>   btrfs device delete /dev/sdi
>>>   btrfs filesystem balance /btrfs
>>>   btrfs device delete /dev/sdj
>>>   btrfs filesystem balance /btrfs
>>>
>>> etc. - after every delete, its balance run.
>>
>>    That's not necessary. Delete will move the blocks from the device
>> being removed into spare space on the other devices. The balance is
>> unnecessary. (In fact, delete and balance share quite a lot of
>> code.)
>>
>>> That may take a lot of hours - I use the last lines of dmesg to
>>> extrapolate the needed time (btrfs produces a message about every
>>> minute). And you can't use the console from where you have started
>>> the balance command. Therefore I wrap this command:
>>>
>>>   echo 'btrfs filesystem balance /btrfs' | at now
>>
>>    ... or just put it into the background with btrfs bal start
>> /mountpoint. You know, like everyone else does. :)
>>
>>    Hugo.
New btrfs-progs integration branch
   I've just pushed out a new integration branch to my git repo. This
is purely bugfix patches -- there are no new features in this issue of
the integration branch. I've got a stack of about a dozen more patches
with new features in them still to go. I'll be working on those
tomorrow.

   As always, there's minimal testing involved here, but it does at
least compile on my system(*).

   The branch is fetchable with git from:

   http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ integration-20120605

   And viewable in human-readable form at:

   http://git.darksatanic.net/cgi/gitweb.cgi?p=btrfs-progs-unstable.git

   Shortlog is below.

   Hugo.

(*) I don't care about works-on-my-machine. We are not shipping your
machine!

Akira Fujita (1):
      Btrfs-progs: Fix manual of btrfs command

Chris Samuel (1):
      Fix set-dafault typo in cmds-subvolume.c

Csaba Tóth (1):
      mkfs.btrfs on ARM

Goffredo Baroncelli (1):
      scrub_fs_info( ) file handle leaking

Hubert Kario (2):
      Fix segmentation fault when opening invalid file system
      man: fix btrfs man page formatting

Jan Kara (1):
      mkfs: Handle creation of filesystem larger than the first device

Jim Meyering (5):
      btrfs_scan_one_dir: avoid use-after-free on error path
      mkfs: use strdup in place of strlen,malloc,strcpy sequence
      restore: don't corrupt stack for a zero-length command-line argument
      avoid several strncpy-induced buffer overruns
      mkfs: avoid heap-buffer-read-underrun for zero-length size arg

Josef Bacik (3):
      Btrfs-progs: make btrfsck aware of free space inodes
      Btrfs-progs: make btrfs filesystem show uuid actually work
      btrfs-progs: enforce block count on all devices in mkfs

Miao Xie (3):
      Btrfs-progs: fix btrfsck's snapshot wrong unresolved refs
      Btrfs-progs, btrfs-corrupt-block: fix the wrong usage
      Btrfs-progs, btrfs-map-logical: Fix typo in usage

Phillip Susi (2):
      btrfs-progs: removed extraneous whitespace from mkfs man page
      btrfs-progs: document --rootdir mkfs switch

Sergei Trofimovich (2):
      Makefile: use $(CC) as a compilers instead of $(CC)/gcc
      Makefile: use $(MAKE) instead of hardcoded 'make'

Shawn Bohrer (1):
      btrfs-progs: Update resize documentation

Wang Sheng-Hui (1):
      btrfs-progs: cleanup: remove the redundant BTRFS_CSUM_TYPE_CRC32 macro def
Re: New btrfs-progs integration branch
On Wed, Jun 06, 2012 at 01:48:00PM +0200, Helmut Hullen wrote:
> Hallo, Hugo,
>
> Du meintest am 05.06.12:
>
>> The branch is fetchable with git from:
>> http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ integration-20120605
>
> There seems to be a bug inside:
>
> [...]
> gcc -g -O0 -o btrfsck btrfsck.o ctree.o disk-io.o radix-tree.o
> extent-tree.o print-tree.o root-tree.o dir-item.o file-item.o
> inode-item.o inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o
> volumes.o utils.o btrfs-list.o btrfslabel.o repair.o -luuid
> gcc -g -O0 -o btrfs-convert ctree.o disk-io.o radix-tree.o
> extent-tree.o print-tree.o root-tree.o dir-item.o file-item.o
> inode-item.o inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o
> volumes.o utils.o btrfs-list.o btrfslabel.o repair.o convert.o
> -lext2fs -lcom_err -luuid
> gcc convert.o -o convert
> convert.o: In function `btrfs_item_key':
> /tmp/btrfs-progs-unstable/ctree.h:1404: undefined reference to `read_extent_buffer'
> convert.o: In function `btrfs_dir_item_key':
> /tmp/btrfs-progs-unstable/ctree.h:1437: undefined reference to `read_extent_buffer'
> convert.o: In function `btrfs_del_item':

   Odd. I've just tried this on a clean clone of my repo, and it's
building fine. It's declared in extent_io.h, and defined in
extent_io.c.

   However, it does look like there's a problem with the make process:
my Makefile says:

btrfs-convert: $(objects) convert.o
	$(CC) $(CFLAGS) -o btrfs-convert $(objects) convert.o -lext2fs -lcom_err $(LDFLAGS) $(LIBS)

... which seems to be what the second line you quoted is doing.
However, the third line with the problem looks like something out of
date. Possibly a mis-merge?

   Hugo.
Re: New btrfs-progs integration branch
On Wed, Jun 06, 2012 at 05:03:00PM +0200, Helmut Hullen wrote:
> Hallo, Hugo,
>
> Du meintest am 06.06.12:
>
>> However, the third line with the problem looks like something out of
>> date. Possibly a mis-merge?
>
> Where should I search?

   Well, the first thing would be to try a completely new clone of the
repo, then git co integration-20120605, and run make again.

   If that's OK, then take a look with gitk in the broken repo and see
what kind of history you've got in there -- it should be a single
unbroken sequence from master
(1957076ab4fefa47b6efed3da541bc974c83eed7) to integration-20120605
(d4c539067d1cb2476c7fb6003625de26e84059af).

   Also have a look in the Makefile of the broken repo -- all of the
commands (listed near the top, assigned to the progs variable) should
start with btrfs, and there should be no rule for convert in there.
Again, if that's not the case, you've managed to mis-merge or check
out the wrong branch.

   Hugo.
Re: New btrfs-progs integration branch
On Wed, Jun 06, 2012 at 05:52:00PM +0200, Helmut Hullen wrote:
> Hallo, Hugo,
>
> Du meintest am 06.06.12:
>
>>>> However, the third line with the problem looks like something out
>>>> of date. Possibly a mis-merge?
>>> Where should I search?
>> Well, the first thing would be to try a completely new clone of the
>> repo, then git co integration-20120605, and run make again.
>
> I had a brand new git clone. Produced with
>
> [...]
> git clone http://git.darksatanic.net/repo/btrfs-progs-unstable.git
> cd btrfs-progs-unstable
> git checkout integration-20120605
>
> (and btrfs-progs-unstable had been empty before checkout)
>
>> If that's OK, then take a look with gitk in the broken repo and see
>> what kind of history you've got in there -- it should be a single
>> unbroken sequence from master
>> (1957076ab4fefa47b6efed3da541bc974c83eed7) to integration-20120605
>> (d4c539067d1cb2476c7fb6003625de26e84059af).
>
> I don't know much about working with git ... but I suppose I'm not
> working with such things as a (broken) repo. It's the same way I had
> successfully compiled your version from 20111012 and from 20111030.
> Is there any change compiling the new version?

   No, just type make from the directory.

   Can you compare your Makefile with the one at [1] -- in particular
the progs variable at lines 21-23, the all target on line 37, and the
btrfs-convert target on line 97. There definitely should not be a
plain convert target in there, but that seems to be what your system
was failing on.

   Hugo.

[1] http://git.darksatanic.net/cgi/gitweb.cgi?p=btrfs-progs-unstable.git;a=blob;f=Makefile;h=9699366d506918db711245aa771d103698a7;hb=integration-20120605
Re: New btrfs-progs integration branch
On Wed, Jun 06, 2012 at 06:18:00PM +0200, Helmut Hullen wrote:
> Hallo, Hugo,
>
> Du meintest am 06.06.12:
>
>>> git checkout integration-20120605
> [...]
>> Can you compare your Makefile with the one at [1] -- in particular
>> the progs variable at lines 21-23, the all target on line 37, and
>> the btrfs-convert target on line 97. There definitely should not be
>> a plain convert target in there, but that seems to be what your
>> system was failing on.
>
> Makefile with 3888 Bytes. md5sum Makefile shows
>
> (my file)                  deef961e3ecd560ad8710cf0b58f5570  Makefile
> (the file from your link)  deef961e3ecd560ad8710cf0b58f5570  Makefile
>
> The problem is somewhere in another place ...

   OK, can you send through the complete output of:

$ make clean
$ gcc --version
$ make --version
$ make
$ for f in .*.d; do echo "== $f"; cat "$f"; done

   My guess is that the dependency generation is going wrong somewhere.

   Hugo.
Re: [PATCH v5 2/3] Btrfs-progs: make two utility functions globally available
 		      struct btrfs_ioctl_fs_info_args *fi_args,
+		      struct btrfs_ioctl_dev_info_args **di_ret);
 #endif
Re: [PATCHv2 3/4] avoid several strncpy-induced buffer overruns
> );
> +	args.name[BTRFS_PATH_NAME_MAX-1] = 0;
>  	res = ioctl(fddst, BTRFS_IOC_SUBVOL_CREATE, &args);
>  	e = errno;
> @@ -202,6 +203,7 @@ static int cmd_subvol_delete(int argc, char **argv)
>  	printf("Delete subvolume '%s/%s'\n", dname, vname);
>  	strncpy(args.name, vname, BTRFS_PATH_NAME_MAX);
> +	args.name[BTRFS_PATH_NAME_MAX-1] = 0;
>  	res = ioctl(fd, BTRFS_IOC_SNAP_DESTROY, &args);
>  	e = errno;
> @@ -378,6 +380,7 @@ static int cmd_snapshot(int argc, char **argv)
>  	args.fd = fd;
>  	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
> +	args.name[BTRFS_PATH_NAME_MAX-1] = 0;

   This, however, is wrong. args here is a struct
btrfs_ioctl_vol_args_v2, and the name field is BTRFS_SUBVOL_NAME_MAX+1
long, so it should be:

-	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
+	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX+1);
+	args.name[BTRFS_SUBVOL_NAME_MAX] = 0;

>  	res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);
>  	e = errno;
> diff --git a/restore.c b/restore.c
> index 2674832..d1ac542 100644
> --- a/restore.c
> +++ b/restore.c
> @@ -846,7 +846,8 @@ int main(int argc, char **argv)
>  	memset(path_name, 0, 4096);
> -	strncpy(dir_name, argv[optind + 1], 128);
> +	strncpy(dir_name, argv[optind + 1], sizeof dir_name);
> +	dir_name[sizeof dir_name - 1] = 0;
>  	/* Strip the trailing / on the dir name */
>  	len = strlen(dir_name);
> diff --git a/utils.c b/utils.c
> index ee7fa1b..5240c2c 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -657,9 +657,11 @@ int resolve_loop_device(const char* loop_dev, char* loop_file, int max_len)
>  	ret_ioctl = ioctl(loop_fd, LOOP_GET_STATUS, &loopinfo);
>  	close(loop_fd);
> -	if (ret_ioctl == 0)
> +	if (ret_ioctl == 0) {
>  		strncpy(loop_file, loopinfo.lo_name, max_len);
> -	else
> +		if (max_len > 0)
> +			loop_file[max_len-1] = 0;
> +	} else
>  		return -errno;
>  	return 0;
> @@ -860,8 +862,10 @@ int check_mounted_where(int fd, const char *file, char *where, int size,
>  	}
>  	/* Did we find an entry in mnt table? */
> -	if (mnt && size && where)
> +	if (mnt && size && where) {
>  		strncpy(where, mnt->mnt_dir, size);
> +		where[size-1] = 0;
> +	}
>  	if (fs_dev_ret)
>  		*fs_dev_ret = fs_devices_mnt;
> @@ -893,6 +897,8 @@ int get_mountpt(char *dev, char *mntpt, size_t size)
>  	if (strcmp(dev, mnt->mnt_fsname) == 0) {
>  		strncpy(mntpt, mnt->mnt_dir, size);
> +		if (size)
> +			mntpt[size-1] = 0;
>  		break;
>  	}
> @@ -925,6 +931,7 @@ void btrfs_register_one_device(char *fname)
>  		return;
>  	}
>  	strncpy(args.name, fname, BTRFS_PATH_NAME_MAX);
> +	args.name[BTRFS_PATH_NAME_MAX-1] = 0;

   Same comment about the length of the name field in struct
btrfs_ioctl_vol_args as the 6 or 7 places above.

>  	ret = ioctl(fd, BTRFS_IOC_SCAN_DEV, &args);
>  	e = errno;
>  	if (ret < 0) {

   Hugo.
Re: [PATCHv2 4/4] mkfs: avoid heap-buffer-read-underrun for zero-length size arg
On Fri, Apr 20, 2012 at 09:27:26PM +0200, Jim Meyering wrote:
> From: Jim Meyering meyer...@redhat.com
>
> * mkfs.c (parse_size): ./mkfs.btrfs -A '' would read and possibly
> write the byte before the beginning of a strdup'd heap buffer. All
> other size-accepting options were similarly affected.
>
> Reviewed-by: Josef Bacik jo...@redhat.com
> ---
>  cmds-subvolume.c | 2 +-
>  mkfs.c           | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/cmds-subvolume.c b/cmds-subvolume.c
> index fc749f1..a01c830 100644
> --- a/cmds-subvolume.c
> +++ b/cmds-subvolume.c
> @@ -380,7 +380,7 @@ static int cmd_snapshot(int argc, char **argv)
>  	args.fd = fd;
>  	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);

                                                         ^ +1

> -	args.name[BTRFS_PATH_NAME_MAX-1] = 0;
> +	args.name[BTRFS_SUBVOL_NAME_MAX-1] = 0;

	args.name[BTRFS_SUBVOL_NAME_MAX] = 0;

>  	res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);
>  	e = errno;
> diff --git a/mkfs.c b/mkfs.c
> index 03239fb..4aff2fd 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -63,7 +63,7 @@ static u64 parse_size(char *s)
>  	s = strdup(s);
> -	if (!isdigit(s[len - 1])) {
> +	if (len && !isdigit(s[len - 1])) {

   I think I'd prefer that len is a size_t, not an int here. (Or that
len is tested to be > 0.)

>  		c = tolower(s[len - 1]);
>  		switch (c) {
>  		case 'g':

   Hugo.
Re: [PATCHv2 3/4] avoid several strncpy-induced buffer overruns
On Wed, Jun 06, 2012 at 08:31:47PM +0100, Hugo Mills wrote:
>> @@ -378,6 +380,7 @@ static int cmd_snapshot(int argc, char **argv)
>>  	args.fd = fd;
>>  	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
>> +	args.name[BTRFS_PATH_NAME_MAX-1] = 0;
>
>    This, however, is wrong. args here is a struct
> btrfs_ioctl_vol_args_v2, and the name field is
> BTRFS_SUBVOL_NAME_MAX+1 long, so it should be:
>
> -	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX);
> +	strncpy(args.name, newname, BTRFS_SUBVOL_NAME_MAX+1);
> +	args.name[BTRFS_SUBVOL_NAME_MAX] = 0;

   Oops, just spotted the v3 with this fix in. Ignore this comment.
(I'm actually using the v3 in integration, but I reviewed the mail
from a different mailbox and got the wrong series...)

   Hugo.
Re: locating a func in btrfs-progs
On Thu, Jun 07, 2012 at 09:38:13AM +0800, Sonu wrote:
> Hi,
>
> Any clues on where I can find the function 'btrfs_header_level' in
> btrfs-progs?

   It's a getter/setter pair. See line 1555 of ctree.h.

   Hugo.
Re: Bug in btrfs-debug-tree for two or more devices.
On Tue, Jun 12, 2012 at 06:53:00AM +0000, Santosh Hosamani wrote:
> Hi btrfs folks,
>
> I am working on how the btrfs filesystem manages its free space, and
> found that btrfs maintains a chunk tree which manages the physical
> location of the chunks and stripes of the filesystem.
> Btrfs-debug-tree also gives information on the chunk tree.
>
> I created btrfs on a single device and on two devices, and have
> attached the output of btrfs-debug-tree for both. For a single
> device, the sum of all the lengths in the chunks adds up to the total
> used bytes, which is the expected behaviour. But for two devices, the
> sum of all lengths in the chunks does not add up to the total bytes.
> Am I missing something?

   Without actually seeing the details of your technique and
expectations, I shall make a guess that you're not accounting for the
double-counting of RAID-1 metadata. In other words, you will find that
all of the metadata device extents (or chunks) appear twice -- once on
each device.

   Actually, this isn't quite right either -- what you really need to
do is look at the RAID-1, RAID-10 and DUP bits in the chunk flags, add
up all of those chunks, divide by two, and then add in the remaining
(RAID-0 and single) chunks. That total should then add up to the total
value of allocated space that you get from the output of btrfs fi df.

> Also I notice that for the second device the superblock location 0x1
> is not considered as used. I would be really grateful if you folks
> can answer my query. I have run these tests on SLES11-sp2-x86, kernel
> 3.0.13.0.27-default.

   This is pretty old, but shouldn't affect the results. It will cause
reliability problems if you try running it seriously.

   Hugo.
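The accounting rule described above can be written out as a few lines of arithmetic. This is a sketch with a made-up input format (a list of (profile, length) pairs pulled out of btrfs-debug-tree by hand); only the halving of RAID-1/RAID-10/DUP chunk lengths is taken from the explanation above.

```python
# Sketch of the chunk accounting described above: RAID-1, RAID-10 and
# DUP chunks store every byte twice, so their lengths are halved
# before being added to the single/RAID-0 chunks.  The input format
# (profile string, length in bytes) is made up for illustration.

DOUBLED = {"RAID1", "RAID10", "DUP"}

def allocated_bytes(chunks):
    """chunks: iterable of (profile, length_in_bytes) pairs."""
    doubled = sum(length for profile, length in chunks
                  if profile in DOUBLED)
    single = sum(length for profile, length in chunks
                 if profile not in DOUBLED)
    return doubled // 2 + single
```

The result should then line up with the allocated totals reported by btrfs fi df, rather than with the raw per-device sums.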
Re: Computing size of snapshots approximatly
On Wed, Jun 13, 2012 at 02:15:33PM +0200, Jan-Hendrik Palic wrote: Hi, we using on a server several lvm volumes with btrfs. We want to use nightly build snapshots for some days as an alternative to backups. Now I want to get the size of the snapshots in detail. There are basically two figures you can get for each snapshot. These values may differ wildly. Which one do you want? (A) The first, larger, value is the total computed size of the files in the subvolume. This is what du returns. (B) The second, smaller, value is the amount of space that would be freed by deleting the subvolume. (Alternatively, this is the amount of data in the subvolume which is not shared with some other subvolume). It is currently a difficult process to work out this value in general, but the qgroups patch set will track this information automatically, and expose an API that will allow you to retrieve it. The qgroups patches aren't complete yet. Therefore I played with btrfs subvolume find-new $snapshot $gen-id. And I know, that this is quite complicated and not implemented. Therefore I try to go my own way: Now assume there are two snapshots of one subvolume, snap1 and snap2. Further get the find-new informations of these snapshots with $gen-id=1 and save them into different files. A diff of these files shows the changes between snap1 and snap2, right? Ok. There are three operations on a filesystem, I think, 1. copy a file on the filesystem 2. change a file on the filesystem 3. delete a file on the filesystem Am I right to assume, that operation 1 and 2 are not change much the size of a snapshot and the delete operation let increase the size of a snapshot in the size of the deleted files? It depends on which measure of the two above you're trying to use, and whether the subvolume (and file) you're modifying still has extents shared with some other subvolume. 1. Copying a file (without --reflink) will increase both the (A) and the (B) size of the snapshot. 
Copying a file with --reflink will increase (A) and leave (B) much the same. 2. Changing a file will, obviously, cause (A) to change by the difference between the old file and the new. If that file shares no extents with anything else, then (B) will also change by that amount. Otherwise, if it shares extents with anything else (another subvolume, or a reflink copy), then (B) will increase by the amount of data modified. 3. Deleting a file will reduce (A) by the size of the file. (B) will reduce by the size of non-shared extents owned by that file. Note that btrfs sub find-new will not allow you to track file deletions. If that is so, it would be enough for me to get the deletions of files between two snapshots and their size. But is there another way to get this information besides btrfs subvolume find-new? Perhaps it makes sense to use ioctl for it? What about the send/receive feature, which is upcoming? Are there any hints? Wait for qgroups to land, because that actually does it the right way, and will save you from having to track all kinds of awkward (and hard-to-find) corner cases. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Summoning his Cosmic Powers, and glowing slightly --- from his toes... signature.asc Description: Digital signature
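The find-new comparison discussed in this thread can be sketched as a small shell session. The snapshot paths are hypothetical, and the listing lines below are fabricated stand-ins (the real find-new output format differs); on a live system the two listings would come from `btrfs subvolume find-new` against a mounted filesystem, as shown in the comments.

```shell
# On a real filesystem (hypothetical mount point and snapshot names):
#   btrfs subvolume find-new /mnt/snap1 1 | sort > snap1.list
#   btrfs subvolume find-new /mnt/snap2 1 | sort > snap2.list
# Stand-in listings so the comparison below is runnable as-is:
printf '%s\n' 'inode 257 file etc/passwd' 'inode 300 file var/log/syslog' | sort > snap1.list
printf '%s\n' 'inode 257 file etc/passwd' 'inode 512 file home/user/new' | sort > snap2.list

# Entries only in the older listing vs. only in the newer one.
# Note: as the reply says, find-new does NOT report deletions,
# so this only approximates the change set between two snapshots.
comm -23 snap1.list snap2.list   # unique to snap1
comm -13 snap1.list snap2.list   # unique to snap2
```

The `sort` before `comm` matters: `comm` requires both inputs to be sorted in the same collation order.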
Re: cannot remove files: rm gives no space left on device, 3.2.0-24, ubuntu
On Sat, Jun 16, 2012 at 02:18:15PM +0300, Andrei Popa wrote: https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_I_ran_out_of_disk_space.21 Also, please note the top box on https://btrfs.wiki.kernel.org/index.php/Getting_started It may help, or it may not, but it's worth doing anyway. Hugo. On Sat, 2012-06-16 at 13:16 +0200, rupert THURNER wrote: how would i be able to delete something off this btrfs partition again? i saw the following messages in the archives which seem to be a little similar ... except a reboot and therefore a remount did not help: * http://article.gmane.org/gmane.linux.kernel/1265666/match=enospc rt@tv:~$ rm -rf /media/388gb-data/.Trash-1000/info/foto.trashinfo rm: cannot remove `/media/388gb-data/.Trash-1000/info/foto.trashinfo': No space left on device rt@tv:~$ btrfs filesystem df /media/388gb-data/ Data: total=260.59GB, used=254.56GB System: total=32.00MB, used=24.00KB Metadata: total=128.00GB, used=120.01GB rt@tv:~$ sudo btrfs filesystem show /dev/sda6 failed to read /dev/sr0 Label: '388gb-data' uuid: 19223a9e-7840-4798-8ee4-02b5bf9c2899 Total devices 1 FS bytes used 374.56GB devid1 size 388.62GB used 388.62GB path /dev/sda6 rt@tv:~$ uname -a Linux tv 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:51:22 UTC 2012 i686 i686 i386 GNU/Linux the only snapshot is the one created during converting from ext4: $ sudo btrfs subvolume list /media/388gb-data/ ID 256 top level 5 path ext2_saved open(/usr/lib/locale/locale-archive, O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=4480528, ...}) = 0 mmap2(NULL, 262144, PROT_READ, MAP_PRIVATE, 3, 0x2bd) = 0xb7553000 mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE, 3, 0x43a) = 0xb7552000 close(3)= 0 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 lstat64(/, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 fstatat64(AT_FDCWD, /media/388gb-data/.Trash-1000/info/foto.trashinfo, {st_mode=S_IFREG|0644, st_size=56, ...}, AT_SYMLINK_NOFOLLOW) = 0 unlinkat(AT_FDCWD, 
/media/388gb-data/.Trash-1000/info/foto.trashinfo, 0) = -1 ENOSPC (No space left on device) open(/usr/share/locale/locale.alias, O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7551000 read(3, # Locale name alias data base.\n#..., 4096) = 2570 read(3, , 4096) = 0 close(3)= 0 munmap(0xb7551000, 4096)= 0 open(/usr/share/locale/en/LC_MESSAGES/coreutils.mo, O_RDONLY) = -1 ENOENT (No such file or directory) open(/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo, O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=619, ...}) = 0 mmap2(NULL, 619, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7551000 close(3)= 0 write(2, rm: , 4rm: ) = 4 write(2, cannot remove `/media/388gb-data..., 65cannot remove `/media/388gb-data/.Trash-1000/info/foto.trashinfo') = 65 open(/usr/share/locale/en/LC_MESSAGES/libc.mo, O_RDONLY) = -1 ENOENT (No such file or directory) open(/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo, O_RDONLY) = -1 ENOENT (No such file or directory) write(2, : No space left on device, 25: No space left on device) = 25 write(2, \n, 1 ) = 1 _llseek(0, 0, 0xbfe0d210, SEEK_CUR) = -1 ESPIPE (Illegal seek) close(0)= 0 rupert -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- No names... I want to remain anomalous. --- signature.asc Description: Digital signature
Re: Subvolumes and /proc/self/mountinfo
On Tue, Jun 19, 2012 at 04:35:59PM -0700, H. Peter Anvin wrote: On 06/19/2012 07:22 AM, Calvin Walton wrote: All subvolumes are accessible from the volume mounted when you use -o subvolid=0. (Note that 0 is not the real ID of the root volume, it's just a shortcut for mounting it.) Could you clarify this bit? Specifically, what is the real ID of the root volume, then? I found that after having set the default subvolume to something other than the root, and then mounting it without the -o subvol= option, then the subvolume name does *not* show in /proc/self/mountinfo; the same happens if a subvolume is mounted by -o subvolid= rather than -o subvol=. Is this a bug? This would seem to give the worst of both worlds in terms of actually knowing what the underlying filesystem path would end up looking like. Yes, it's a bug, and rather an irritating one at that. I know that David Sterba looked at fixing it, but apparently it was trickier to fix than was expected. (I don't recall the reason, and probably wouldn't have understood it anyway, so I'll leave it to Dave to tell you about it in detail). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Doughnut furs ache me, Omar Dorlin. --- signature.asc Description: Digital signature
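For reference, a sketch of the two mount styles discussed above. The device and mount points are hypothetical, and the commands need root and a real btrfs device, so they are shown as comments rather than run directly:

```shell
# Mount the top-level volume regardless of the default subvolume
# (subvolid=0 is the shortcut mentioned in the thread, not a real ID):
#   mount -o subvolid=0 /dev/sdb1 /mnt/top
# Mount a named subvolume -- this shows up in /proc/self/mountinfo;
# mounting by subvolid, or via a changed default subvolume, reportedly
# does not (the bug discussed above):
#   mount -o subvol=home /dev/sdb1 /home
```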
Re: Leaving Red Hat
On Wed, Jun 20, 2012 at 08:59:15AM -0400, Josef Bacik wrote: Hello, Today is my last day at Red Hat, I will be joining Chris at Fusion IO. Blimey. It's all change round here, isn't it? Congratulations. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- We are all lying in the gutter, but some of us are looking --- at the stars. signature.asc Description: Digital signature
Re: Knowing how much space is taken by each snapshot?
On Mon, Jun 25, 2012 at 07:58:40AM -0700, Marc MERLIN wrote: Howdy, My btrfs pool looks like this: usr usr_daily_20120622_00:01:01 usr_daily_20120623_00:18:25 usr_daily_20120624_00:01:01 usr_daily_20120625_00:01:01 usr_hourly_20120625_05:00:02 usr_hourly_20120625_06:00:01 usr_hourly_20120625_07:00:01 usr_weekly_20120610_00:02:01 usr_weekly_20120617_00:02:01 usr_weekly_20120624_00:02:01 Sometimes I run low on space and I have to start dropping snapshots. I realize that due to COW blocks, it's hard to say exactly how much space each snapshot uses, but is there some way to get an idea how much each snapshot will free if I delete it? When the relevant bit of qgroups lands, yes. Until then, not really. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Python is executable pseudocode; perl --- is executable line-noise. signature.asc Description: Digital signature
Re: Feature request: true RAID-1 mode
On Mon, Jun 25, 2012 at 10:46:01AM -0700, H. Peter Anvin wrote: On 06/25/2012 08:21 AM, Chris Mason wrote: Yes and no. If you have 2 drives and you add one more, we can make it do all new chunks over 3 drives. But, turning the existing double mirror chunks into a triple mirror requires a balance. -chris So trigger one. This is the exact analogue to the resync pass that is required in classic RAID after adding new media. You'd have to cancel and restart if a second new disk was added while the first balance was ongoing. Fortunately, this isn't a problem these days. Also, it occurs to me that I should just check -- are you aware that the btrfs implementation of RAID-1 makes no guarantees about the location of any given piece of data? i.e. if I have a piece of data stored at block X on disk 1, it's not guaranteed to be stored at block X on disks 2, 3, 4, ... I'm not sure if this is important to you, but it's a significant difference between the btrfs implementation of RAID-1 and the MD implementation. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Never underestimate the bandwidth of a Volvo filled --- with backup tapes. signature.asc Description: Digital signature
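The "so trigger one" workflow can be sketched as below. The device and mount point are hypothetical, and the commands need root and a real multi-device btrfs filesystem, so they are shown as comments:

```shell
# Add a third device to an existing two-device RAID-1 filesystem:
#   btrfs device add /dev/sdc /mnt
# New chunks can now be allocated across all three devices, but the
# existing mirrored chunks stay where they are until a balance
# rewrites them (the analogue of a classic RAID resync pass):
#   btrfs balance start /mnt
```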
Re: New btrfs-progs integration branch
On Tue, Jun 26, 2012 at 11:58:41AM +0300, Alex Lyakas wrote: Hi Hugo, forgive me, but I am somewhat confused. What is the main repo of btrfs-progs, if there is such a thing? I see patches coming in, but no updates to git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git, which I thought was the one. Can you please clarify where I should pull btrfs-progs updates from? The official source for btrfs-progs is Chris's one, at the URL above. The integration repo is kind of a staging area where I pull in as many patches as I can and get them a bit more visibility. We don't really have a well-defined workflow here. It depends on what you intend doing: if you want to make packages for your distribution, use Chris's repo. If you want something reasonably stable and tested, use Chris's repo. If there's some experimental kernel feature you want to test out, use integration. If you want to be helpful and test out new patches and report problems with them, use integration. Hugo. Thanks, Alex. On Tue, Jun 5, 2012 at 10:09 PM, Hugo Mills h...@carfax.org.uk wrote: I've just pushed out a new integration branch to my git repo. This is purely bugfix patches -- there are no new features in this issue of the integration branch. I've got a stack of about a dozen more patches with new features in them still to go. I'll be working on those tomorrow. As always, there's minimal testing involved here, but it does at least compile on my system(*). The branch is fetchable with git from: http://git.darksatanic.net/repo/btrfs-progs-unstable.git/ integration-20120605 And viewable in human-readable form at: http://git.darksatanic.net/cgi/gitweb.cgi?p=btrfs-progs-unstable.git Shortlog is below. Hugo. (*) I don't care about works-on-my-machine. We are not shipping your machine! 
Akira Fujita (1):
      Btrfs-progs: Fix manual of btrfs command
Chris Samuel (1):
      Fix set-dafault typo in cmds-subvolume.c
Csaba Tóth (1):
      mkfs.btrfs on ARM
Goffredo Baroncelli (1):
      scrub_fs_info( ) file handle leaking
Hubert Kario (2):
      Fix segmentation fault when opening invalid file system
      man: fix btrfs man page formatting
Jan Kara (1):
      mkfs: Handle creation of filesystem larger than the first device
Jim Meyering (5):
      btrfs_scan_one_dir: avoid use-after-free on error path
      mkfs: use strdup in place of strlen,malloc,strcpy sequence
      restore: don't corrupt stack for a zero-length command-line argument
      avoid several strncpy-induced buffer overruns
      mkfs: avoid heap-buffer-read-underrun for zero-length size arg
Josef Bacik (3):
      Btrfs-progs: make btrfsck aware of free space inodes
      Btrfs-progs: make btrfs filesystem show uuid actually work
      btrfs-progs: enforce block count on all devices in mkfs
Miao Xie (3):
      Btrfs-progs: fix btrfsck's snapshot wrong unresolved refs
      Btrfs-progs, btrfs-corrupt-block: fix the wrong usage
      Btrfs-progs, btrfs-map-logical: Fix typo in usage
Phillip Susi (2):
      btrfs-progs: removed extraneous whitespace from mkfs man page
      btrfs-progs: document --rootdir mkfs switch
Sergei Trofimovich (2):
      Makefile: use $(CC) as a compilers instead of $(CC)/gcc
      Makefile: use $(MAKE) instead of hardcoded 'make'
Shawn Bohrer (1):
      btrfs-progs: Update resize documentation
Wang Sheng-Hui (1):
      btrfs-progs: cleanup: remove the redundant BTRFS_CSUM_TYPE_CRC32 macro def
-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- ... one ping(1) to rule them all, and in the --- darkness bind(2) them. signature.asc Description: Digital signature
Re: btrfs filesystem defragment exits with non-zero return code (20) upon success
On Wed, Jun 27, 2012 at 02:05:55PM +0200, Lenz Grimmer wrote: Hi, running btrfs filesystem defrag somehow always returns a non-zero exit code, even when it succeeds: Yes, this is a known problem, and one that's on my list of things to deal with. Thanks for the reminder, though. I'm no C programmer, but looking at the end of the do_defrag function in btrfs_cmds.c, I wonder if the last return errors + 20 is correct? If errors is greater than zero, the function would be left via the exit(1) anyway, wouldn't it? In that case, wouldn't return 0 at the end be more appropriate? Yeah, basically, it's doing something silly and unexpected with return codes. Hugo. [SNIP] if (errors) { fprintf(stderr, "total %d failures\n", errors); exit(1); } free(av); return errors + 20; [SNIP] Thanks! -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Mixing mathematics and alcohol is dangerous. Don't --- drink and derive. signature.asc Description: Digital signature
Re: Can give some help?
On Fri, Jun 29, 2012 at 09:41:47PM +0800, Zhi Yong Wu wrote: Hi, can anyone let me know where the functions are declared or defined, such as btrfs_header_nritems(), btrfs_header_level(), etc.? Thanks. ctree.h, somewhere around or after line 1550. They're all accessor functions, defined by a set of macros. Look for the *_SETGET_* macros. The actual definitions of BTRFS_SETGET_FUNCS are in struct-funcs.h Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 1: Family Holiday --- signature.asc Description: Digital signature
Re: Kernel panic from btrfs subvolume delete
On Fri, Jun 29, 2012 at 03:23:13PM +0100, Richard Cooper wrote: On 29 Jun 2012, at 11:42, Fajar A. Nugraha wrote: What should I do now? Do I need to upgrade to a more recent btrfs? Yep If so, how? https://blogs.oracle.com/linux/entry/oracle_unbreakable_enterprise_kernel_release http://elrepo.org/tiki/kernel-ml Perfect, thank you! I was looking for a mainline kernel yum repo but my google-fu was failing me. That looks like just what I need. I've installed kernel v3.4.4 from http://elrepo.org/tiki/kernel-ml and that seems to have fixed my kernel panic. I'm still using the default Cent OS 6 versions of the btrfs userspace programs (v0.19). Any reason why that might be a bad idea? You miss out on new features (like scrub and btrfsck). Note that 0.19 could actually be any version from the last 3 years or so. Most distributions these days are putting a date in their package names -- anything from 20120328 or so is good. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Charting the inexorable advance of Western syphilisation... --- signature.asc Description: Digital signature
Re: Btrfs RAID space utilization and bitrot reconstruction
On Sun, Jul 01, 2012 at 01:50:39PM +0200, Waxhead wrote: As far as I understand, btrfs stores all data in huge chunks that are striped, mirrored or raid5/6'ed throughout all the disks added to the filesystem/volume. Well, RAID-5/6 hasn't landed yet, but yes. How does btrfs deal with different sized disks? Let's say that you, for example, have 10 different disks of 100GB, 200GB, 300GB ... 1000GB, and you create a btrfs filesystem with all the disks. How will the raid5 implementation distribute chunks in such a setup? We haven't seen the code for that bit yet. I assume the stripe+stripe+parity are separate chunks that are placed on separate disks, but how does btrfs select the best disk to store a chunk on? In short, will a slow disk slow down the entire array, or parts of it, or will btrfs attempt to use the fastest disks first? Chunks are allocated by ordering the devices by the amount of free (=unallocated) space left on each, and picking the chunks from devices in that order. For RAID-1 chunks are picked in pairs. For RAID-0, as many as possible are picked, down to a minimum of 2 (I think). For RAID-10, the largest even number possible is picked, down to a minimum of 4. I _believe_ that RAID-5 and -6 will pick as many as possible, down to some minimum -- but as I said, we haven't seen the code yet. Also, since btrfs checksums both data and metadata, I am thinking that at least the raid6 implementation perhaps can (try to) reconstruct corrupt data (and try to rewrite it) before reading an alternate copy. Can someone please fill me in on the details here? Yes, it should be possible to do that with RAID-5 as well. (Read the data stripes, verify checksums, if one fails, read the parity, verify that, and reconstruct the bad block from the known-good data). Finally, how does btrfs deal with advanced format (4k sector) drives when the entire drive (and not a partition) is used to build a btrfs filesystem? Is proper alignment achieved? I don't know about that. 
However, the native block size in btrfs is 4k, so I'd imagine that it's all good. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You stay in the theatre because you're afraid of having no --- money? There's irony... signature.asc Description: Digital signature
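The chunk-allocation ordering described in that reply (sort devices by unallocated space, then pick from the top; two devices for a RAID-1 chunk) can be illustrated with a toy sort. The device names and free-space figures below are fabricated:

```shell
# Fabricated "unallocated-GiB device" table, sorted most-free-first,
# mimicking how the allocator orders candidate devices:
printf '%s\n' '100 /dev/sda' '400 /dev/sdb' '900 /dev/sdc' |
    sort -rn > by_free.txt

cat by_free.txt          # devices in allocation order
head -n 2 by_free.txt    # the pair a new RAID-1 chunk would land on
```

This also shows why, with unequal disks, the largest disks fill first at the top of the ordering until their free space drops to match the others.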
Re: BTRFS fsck apparent errors
On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote: A couple days ago, I have converted my Ubuntu Precise machine from ext4 to BTRFS using btrfs-convert. [snip] After I had shifted, I tried to defragment and compress my FS using commands such as : find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \; During execution of such commands, my kernel oopsed, so I restarted. Afterwards, I noticed that, during the execution of such a command, my FS free space was quickly dropping, where I would have expected it to increase... What you're seeing is the fact that you've still got the complete ext4 filesystem and all of its data sitting untouched on the disk as well. The defrag will have taken a complete new copy of the data but not removed the ext4 copy. If you delete the conversion recovery directory (ext2_subvol), then you'll see the space usage drop again. Of course, doing that will also mean that you won't be able to roll back to ext4 without reformatting and restoring from your backups. (You have got backups, right?) 
Once finished, I checked a couple of BTRFS FSes using btrfsck, but I interpret the results as having some errors : root@fnix:/# btrfsck /dev/VG1/DEBMINT checking extents checking fs roots root 256 inode 257 errors 800 found 7814565888 bytes used err is 1 total csum bytes: 6264636 total tree bytes: 394928128 total fs tree bytes: 365121536 btree space waste bytes: 101451531 file data blocks allocated: 20067590144 referenced 13270241280 Btrfs Btrfs v0.19 root@fnix:/# btrfsck /dev/VG1/STORAGE checking extents checking fs roots root 301 inode 10644 errors 1000 root 301 inode 10687 errors 1000 root 301 inode 10688 errors 1000 root 301 inode 10749 errors 1000 found 55683117056 bytes used err is 1 total csum bytes: 54188580 total tree bytes: 191500288 total fs tree bytes: 103596032 btree space waste bytes: 49730472 file data blocks allocated: 55640522752 referenced 56466059264 Btrfs Btrfs v0.19 It doesn't seem that btrfsck attempts to fix these errors in any way... It just displays them. Correct, by default it just checks the filesystem. Just to be sure: the filesystems in question weren't mounted, were they? I would also suggest using a 3.4 kernel. There's at least one FS corruption bug known to exist in 3.2 that's been fixed in 3.4. (Probably not what's happened in this case, but it's best to try to avoid these kinds of issues). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- emacs: Eats Memory and Crashes. --- signature.asc Description: Digital signature
Re: btrfs data dup on single device?
On Wed, Jun 25, 2014 at 09:25:57AM +0200, Daniel Landstedt wrote: Will it be possible to use DUP for data as well as for metadata on a single device? This has variously been possible and not over the last few years. I think it's finally come down on the side of not, but by all means try it (mkfs.btrfs -d dup). And if so, am I going to be able to specify more than 1 copy of the data? It'll be exactly 2 copies at the moment. Note that performance on an SSD will at least halve, and performance on a rotational device will probably suck quite badly. Neither will help you in the case of a full-device failure. You still need backups, kept on a separate machine. Storage is pretty cheap now, and to have multiple copies in btrfs is something that I think could be used a lot. I know I will use multiple copies of my data if made possible. The question is, why? If you have enough disk media errors to make it worth using multiple copies, then your storage device is basically broken and needs replacing, and it can't really be relied on for very much longer. Is it something that might be available when RAID1 gets N mirrors instead of just 1 mirror? The n-copies code will probably support n-copies DUP as well. There's no reason particularly to restrict it that way. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Do not meddle in the affairs of wizards, for they are subtle, --- and quick to anger. signature.asc Description: Digital signature
Re: RAID1 3+ drives
On Sat, Jun 28, 2014 at 09:38:00AM +0200, Martin Steigerwald wrote: On Saturday, 28 June 2014, 16:28:23, Russell Coker wrote: So look for N-way-mirroring when you go RAID shopping, and no, btrfs does not have it at this time, although it is roadmapped for implementation after completion of the raid5/6 code. FWIW, N-way-mirroring is my #1 btrfs wish-list item too, not just for device redundancy, but to take full advantage of btrfs data integrity features, allowing to scrub a checksum-mismatch copy with the content of a checksum-validated copy if available. That's currently possible, but due to the pair-mirroring-only restriction, there's only one additional copy, and if it happens to be bad as well, there's no possibility of a third copy to scrub from. As it happens my personal sweet-spot between cost/performance and reliability would be 3-way mirroring, but once they code beyond N=2, N should go unlimited, so N=3, N=4, N=50 if you have a way to hook them all up... should all be possible. What I want is the ZFS copies= feature. Something like this, even more flexible, was planned to be added. There was some discussion on how to specify complex redundancy patterns completely flexibly, with exactly how much redundancy, how many spares, and so on. I haven't read any of this in a long time. I wonder what happened to this idea. It's moving slowly in fits and starts. I haven't forgotten it. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- But people have always eaten people, / what else is there to --- eat? / If the Juju had meant us not to eat people / he wouldn't have made us of meat. signature.asc Description: Digital signature
Re: Question about debugfs on btrfs
On Wed, Jul 02, 2014 at 12:23:01PM -0400, Zhe Zhang wrote: Hi, I'm trying to use a functionality like debugfs blocks or dump_extents on a btrfs partition. The current debugfs user space program doesn't seem to support it (from e2fsprogs). I cannot find debugfs in btrfs-progs either. Any advice on how to do it? I don't know what the ext* debugfs does, but the odds of it actually working on btrfs are pretty slim, given that they're completely different filesystems. :) If you want a human-readable view of the filesystem's metadata, then btrfs-debug-tree is the tool you need. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Someone's been throwing dead sheep down my Fun Well --- signature.asc Description: Digital signature
Re: BTRFS claims that empty directory is not empty and refuses to delete it
On Tue, Jul 15, 2014 at 11:09:53AM +0200, Martin Steigerwald wrote: Hello! This is with 3.16-rc4 – stepped back to this one after having two hangs in one day with 3.16-rc5, see other thread started by me: martin@merkaba:~/Zeit/undeletable/db_data ls -lid akonadi 450598 drwx-- 1 martin martin 1232 Jun 22 14:11 akonadi martin@merkaba:~/Zeit/undeletable/db_data ls -lai akonadi insgesamt 0 450598 drwx-- 1 martin martin 1232 Jun 22 14:11 . 450595 drwxr-xr-x 1 martin martin 14 Jun 22 14:11 .. martin@merkaba:~/Zeit/undeletable/db_data LANG=C rmdir akonadi rmdir: failed to remove 'akonadi': Directory not empty martin@merkaba:~/Zeit/undeletable/db_data#1 LANG=C rm -r akonadi rm: cannot remove 'akonadi': Directory not empty martin@merkaba:~/Zeit/undeletable/db_data#1 LANG=C rm -rf akonadi rm: cannot remove 'akonadi': Directory not empty martin@merkaba:~/Zeit/undeletable/db_data#1 What's this? I had this weeks ago already and just moved it out of the way at that time; just now I stumbled upon it again. That is symptomatic of a bug from a couple of kernel versions ago (now fixed, so it won't happen again). If it is that bug, then btrfs check will report something along the lines of directory isize wrong, and the problem can be fixed by running a btrfs check --repair. If you get anything else from btrfs check (or it checks cleanly), then let us know first. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- If you see something, say nothing and drink to forget --- signature.asc Description: Digital signature
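The sequence Hugo suggests, as commented commands (the device path is hypothetical; the filesystem must be unmounted, and --repair should only be run after a read-only check confirms the error he names):

```shell
# Read-only check first -- never start with --repair:
#   umount /home
#   btrfs check /dev/sdb1
# If (and only if) the output reports something along the lines of
# "directory isize wrong", run the repair as advised in the reply:
#   btrfs check --repair /dev/sdb1
# Any other error, or a clean check: report back to the list first.
```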
Re: btrfs hanging since 3.16-rc3 or so
cc linux-btrfs list On Tue, Jul 15, 2014 at 10:40:46PM +0900, Norbert Preining wrote: Dear all (please keep Cc) Since 3.16-rc3 or so I regularly get btrfs hanging in some transactions. Usually during apt-get upgrade or some other large file operations (cowbuilder building of packages). The log files give me for loads of processes things like: [ 6236.746546] INFO: task aptitude:22775 blocked for more than 120 seconds. [ 6236.746547] Tainted: GW O 3.16.0-rc5 #27 [ 6236.746548] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 6236.746549] aptitude D 8800b21a3868 0 22775 22709 0x [ 6236.746550] 88003644fd10 0082 81a15500 88003644ffd8 [ 6236.746552] 8800b21a3430 00011c00 880147da9c30 880147da9c30 [ 6236.746553] 88003644fd58 880034b3ed48 880034b3ed38 88003644fd20 [ 6236.746555] Call Trace: [ 6236.746557] [81585b4a] schedule+0x64/0x66 [ 6236.746560] [811bb22e] btrfs_wait_logged_extents+0xa4/0xdc [ 6236.746561] [810635c1] ? finish_wait+0x5d/0x5d [ 6236.746564] [811d9489] btrfs_sync_log+0x5ef/0x8a2 [ 6236.746567] [811b43cf] btrfs_sync_file+0x21b/0x24d [ 6236.746569] [811b43cf] ? btrfs_sync_file+0x21b/0x24d [ 6236.746571] [8110db8a] vfs_fsync_range+0x1c/0x1e [ 6236.746574] [810d1681] SyS_msync+0x15d/0x1ea [ 6236.746575] [81588712] system_call_fastpath+0x16/0x1b This is aptitude, but I have all the other tasks accessing the disk hanging, too. This time, issuing a Sysrq-s for emergency syncing got the laptop out of the hang. Hardware: Sony VAIO Pro 13 Distribution: Debian/sid self compiled kernel, config on request. Please let me know if there is anything else I can provide. Thanks a lot Norbert PREINING, Norbert http://www.preining.info JAIST, Japan TeX Live Debian Developer GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13 -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Normaliser unix c'est comme pasteuriser le Camembert --- signature.asc Description: Digital signature
Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?
On Thu, Jul 17, 2014 at 12:18:37AM +0200, Sebastian Ochmann wrote: I'm sharing a btrfs-formatted drive between multiple computers and each of the machines has a separate home directory on that drive. The root of the drive is mounted at /mnt/tray and the home directory for machine {hostname} is under /mnt/tray/Homes/{hostname}. Up until now, I have mounted /mnt/tray like a normal volume and then did an additional bind-mount of /mnt/tray/Homes/{hostname} to /home. You've said you're not sharing it concurrently, which is good -- as long as you've only got one machine accessing it at the same time, you're fine there. Now I have a new drive and wanted to do things a bit more advanced by creating subvolumes for each of the machines' home directories so that I can also do independent snapshotting. I guess I could use the bind-mount method like before but my question is if it is considered safe to do an additional, regular mount of one of the subvolumes to /home instead, like mount /dev/sdxN /mnt/tray mount -o subvol=/Homes/{hostname} /dev/sdxN /home When I experimented with such additional mounts of subvolumes of already-mounted volumes, I noticed that the mount options of the additional subvolume mount might differ from the original mount. For instance, the root volume might be mounted with noatime while the subvolume mount may have relatime. 
So my questions are: Is mounting a subvolume of an already mounted volume considered safe Yes, absolutely: hrm@amelia:~$ mount | grep btrfs /dev/sda2 on /boot type btrfs (rw,noatime,space_cache) /dev/sda2 on /home type btrfs (rw,noatime,space_cache) /dev/sda2 on /media/video type btrfs (rw,noatime,space_cache) /dev/sda2 on /media/pipeline type btrfs (rw,noatime,space_cache) /dev/sda2 on /media/snarf type btrfs (rw,noatime,space_cache) /dev/sda2 on /media/audio type btrfs (rw,noatime,space_cache) /dev/sda2 on /srv/nfs/home type btrfs (rw,noatime,space_cache) /dev/sda2 on /srv/nfs/video type btrfs (rw,noatime,space_cache) /dev/sda2 on /srv/nfs/testing type btrfs (rw,noatime,space_cache) /dev/sda2 on /srv/nfs/pipeline type btrfs (rw,noatime,space_cache) /dev/sda2 on /srv/nfs/audio type btrfs (rw,noatime,space_cache) /dev/sda2 on /srv/nfs/nadja type btrfs (rw,noatime,space_cache) and are there any combinations of possibly conflicting mount options one should be aware of (compression, autodefrag, cache clearing)? Is it advisable to use the same mount options for all mounts pointing to the same physical device? If you assume that the first mount options are the ones used for everything, regardless of any different options provided in subsequent mounts, then you probably won't go far wrong. It's not quite true: some options do work on a per-mount basis, but most are per-filesystem. I'm sure there was a list of them on the wiki at some point, but I can't seem to track it down right now. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Try everything once, except incest and folk-dancing. --- signature.asc Description: Digital signature
Re: btrfs fi df shows "unknown"?
On Thu, Jul 17, 2014 at 10:02:01AM +0200, Swâmi Petaramesh wrote: Hi there, For a few days now I have noticed that btrfs fi df / displays an entry about "unknown" used space, and I can see this on several Fedora machines, so it is not an issue related to a given system... i.e.:

# btrfs fi df /
Data, single: total=106.00GiB, used=88.28GiB
System, DUP: total=32.00MiB, used=24.00KiB
Metadata, DUP: total=1.00GiB, used=520.36MiB
unknown, single: total=176.00MiB, used=0.00

# btrfs --version
Btrfs v3.14.2
# uname -r
3.15.5-200.fc20.x86_64

Does anybody know what these unknown data are? TIA, kind regards.

It's the block reserve, which used to be part of metadata, but is now split out to its own type. An updated userspace should be able to show it properly. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Try everything once, except incest and folk-dancing. --- signature.asc Description: Digital signature
Re: NFS FILE ID not unique when exporting many btrfs subvolumes
On Thu, Jul 17, 2014 at 10:40:14AM +0000, philippe.simo...@swisscom.com wrote: I have a problem using btrfs/nfs to store my vmware images. [snip] - vmware bases its NFS file locks on the nfs fileid field returned from an NFS GETATTR request for the file being locked: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007909 vmware assumes that these nfs fileids are unique per storage. - it seems that these nfs fileids are only unique 'per-subvolume', but because my nfs export contains many subvolumes, the nfs export then has files (in different subvolumes) with the same nfs fileid. - no problem when I start each machine alone, but when 2 machines are running at the same time, vmware seems to mix up its references to lock files and sometimes kills one vm. on the esx server, the following messages appear in /var/log/vmkwarning.log:

2014-07-17T06:31:46.854Z cpu2:268913)WARNING: NFSLock: 1315: Inode (Dup: 260 Orig: 260) has been recycled by server, freeing lock info for .lck-0401
2014-07-17T06:34:47.925Z cpu2:114740)WARNING: NFSLock: 2348: Unable to remove lockfile .invalid, not found
2014-07-17T10:18:50.320Z cpu0:32824)WARNING: NFSLock: 2348: Unable to remove lockfile .invalid, not found

and in the machine log: Message from sncubeesx02: The lock protecting vm-w7-sysp.vmdk has been lost, possibly due to underlying storage issues. If this virtual machine is configured to be highly available, ensure that the virtual machine is running on some other host before clicking OK. - vmware tries to do its own file locking for the following file types: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051

VMNAME.vswp
DISKNAME-flat.vmdk
DISKNAME-ITERATION-delta.vmdk
VMNAME.vmx
VMNAME.vmxf
vmware.log

Is there a way to deal with this problem? Is that a bug? Add an arbitrary and unique fsid=0x12345 value to the exports declaration.
For example, my server exports a number of subvolumes from the same FS with:

/srv/nfs/nadja -rw,async,fsid=0x1729,no_subtree_check,no_root_squash \
    10.0.0.20 fe80::20
/srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash \
    fe80::/64
/srv/nfs/video -ro,async,fsid=0x1731,no_subtree_check \
    10.0.0.0/24 fe80::/64

Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You can get more with a kind word and a two-by-four than you --- can with just a kind word. signature.asc Description: Digital signature
Re: NFS FILE ID not unique when exporting many btrfs subvolumes
On Thu, Jul 17, 2014 at 01:02:06PM +0000, philippe.simo...@swisscom.com wrote: Hi Hugo -Original Message- From: Hugo Mills [mailto:h...@carfax.org.uk] Sent: Thursday, July 17, 2014 1:13 PM To: Simonet Philippe, INI-ON-FIT-NW-IPE Cc: linux-btrfs@vger.kernel.org Subject: Re: NFS FILE ID not unique when exporting many btrfs subvolumes [snip] Add an arbitrary and unique fsid=0x12345 value to the exports declaration. [snip] first of all, thanks for your answer! on my system, I have one export, which is the root btrfs subvolume, and it contains one subvolume per vm. if I change the NFS export fsid, it does not change anything in the file IDs of the whole NFS export. (I cross-checked it just to be sure, with tshark -V -nlp -t a port 2049 | egrep 'Entry: name|File ID', and effectively, fsid has no impact on file id) Aaah, that's interesting. I suspect that you'll have to make the mounts explicit, so that for every subvolume exported from the server, there's a line in fstab to mount it to the place it's exported from. This happens as a side-effect of the recommended filesystem/subvol layout[1] anyway, since it doesn't use nested subvolumes at all, so I've never actually noticed the situation you mention. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- There's a Martian war machine outside -- they want to talk --- to you about a cure for the common cold. signature.asc Description: Digital signature
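The explicit-mount arrangement suggested above might look like this in fstab -- a sketch with an invented UUID and subvolume names, one line per exported subvolume:

```
# /etc/fstab: mount each exported subvolume at the point it is
# exported from (UUID and subvolume names are hypothetical)
UUID=0123abcd-...  /srv/nfs/vm1  btrfs  subvol=/vms/vm1,noatime  0  0
UUID=0123abcd-...  /srv/nfs/vm2  btrfs  subvol=/vms/vm2,noatime  0  0
```

Each export point is then the root of exactly one mounted subvolume, so a distinct fsid= on the matching /etc/exports line corresponds to exactly one inode-number space.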
Re: BTRFS hang with 3.16-rc5 (and also with 3.16-rc4)
[this time, to the mailing list as well] On Fri, Jul 25, 2014 at 09:02:44AM +0100, Hugo Mills wrote: On Thu, Jul 24, 2014 at 11:06:34PM -0400, Nick Krause wrote: On Thu, Jul 24, 2014 at 10:32 PM, Duncan 1i5t5.dun...@cox.net wrote: [snip] Hey Duncan and others, I have read this and this seems to need some working on. If you want my help please ask. I am new to the kernel, so I may ask a dumb question or two, but if that's fine with you I have no problem helping out here. I would like a log of printk statements leading to the hang, if that's not too much work, in order for me to trace this back. Note that btrfs is complex -- there's something around 100k lines of code in it. My first piece of kernel work in btrfs was simply documenting the way that the on-disk data structures related to each other[1]. That on its own took me two to three weeks of solid full-time effort, reading the code to find where each structure was used and how its elements related to other structures. You can't just wander up and dive in without putting in the effort of learning first. Whilst people will help you (come over to #btrfs on Freenode for more real-time interaction), they can't do the basic work of sitting down and understanding the code in detail for you. Chris, who designed and wrote the filesystem, has spent the last couple of weeks tracking down this particular problem. Do you think it's appropriate to leap into the middle of the discussion on this subtle bug as someone with absolutely no experience in the area? Your first task is to reproduce the bug on your own machine. If you can do that, _then_ you might be able to start tracking down its cause. But I wouldn't recommend doing that, as (a) it's a nasty subtle bug, and (b) Chris seems to be close to tracking it down anyway.
My recommendations for you, if you want to work on btrfs, are:

* Build and install the latest kernel from Linus's git repo
* Read and understand the user documentation [2]
* Create one or several btrfs filesystems with different configurations and learn how they work in userspace -- what are the features, what are the problems you see? Actually use at least one of the filesystems you created for real data in daily use (with backups)
* Build the userspace tools from git
* Pick up one of the userspace projects from [3] and implement it. If you pick the right one(s), you'll have to learn about some of the internal structures of the FS anyway. Compile and test your patch. If you're adding a new feature, write an automated xfstest for it as well
* Get that patch accepted. This will probably involve a sequence of revisions to it, multiple versions over a period of several weeks or more, with a review process. You should also send your test to xfstests and get that accepted
* Do the above again, until you get used to the processes involved, and have demonstrated that you can work well with the other people in the subsystem, and are generally producing useful and sane code. It's all about trust -- can you be trusted to mostly do the right thing? (So far on linux-kernel, you've rather demonstrated the opposite: your intentions are good, but your execution leaves a lot to be desired)
* Use the documentation at [4], and the output of btrfs-debug-tree, to understand the internal structure of the FS
* Pick up one of the smaller, more self-contained ideas from the projects page [5] (say, [6] or [7]) and try to implement it. Again: build, write test code, test thoroughly, submit patch for review, modify as suggested by reviewers, and repeat as often as necessary

Hugo.
[1] https://btrfs.wiki.kernel.org/index.php/Data_Structures
[2] https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information
[3] https://btrfs.wiki.kernel.org/index.php/Project_ideas#Userspace_tools_projects
[4] https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation
[5] https://btrfs.wiki.kernel.org/index.php/Project_ideas
[6] https://btrfs.wiki.kernel.org/index.php/Project_ideas#Cancellable_operations
[7] https://btrfs.wiki.kernel.org/index.php/Project_ideas#Implement_new_FALLOC_FL_.2A_modes

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- But people have always eaten people, / what else is there to --- eat? / If the Juju had meant us not to eat people / he wouldn't have made us of meat. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- ORLY? IÄ! R'LYH! --- signature.asc Description: Digital signature
Re: Help with Project on btrfs wiki
implemented. Start asking _yourself_ the questions of "if I want to achieve this effect, what does the FS need to do? What behaviour would need to be changed, and how?". When you think you have an answer to those questions, you can start having a real and useful conversation. You will probably be wrong, but that's where the process starts, and you will get better at it over time. If at some point there's something you don't understand, do ask, but make sure that you can say what you think you know, and why you can't understand the thing you are having trouble with. Think of the person responding to the question: make it easy to write the reply by ensuring that the reply can be as short as possible. Some of the time, you will actually answer your own question by trying to ask it in a sensible way. If you find that happening, you're asking sensible questions. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I must be musical: I've got *loads* of CDs --- signature.asc Description: Digital signature
Re: Multi Core Support for compression in compression.c
On Sun, Jul 27, 2014 at 11:21:53PM -0400, Nick Krause wrote: On Sun, Jul 27, 2014 at 10:56 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: On 07/27/2014 04:47 PM, Nick Krause wrote: This may be a bad idea, but compression in btrfs seems to be only using one core to compress. Depending on the CPU used and the number of cores in the CPU, we can make this much faster with multiple cores. This seems bad by my reading, at least. I would recommend that for write compression we write a function to use a certain number of cores based on the load of the system's CPU, not using more than 75% of the system's CPU resources, as my system when idle has never needed more than one core of my i5 2500k, even with the interrupts for opening Eclipse running. For read decompression, one good core seems fine to me, as testing other compression software shows reads are much less CPU-intensive. Cheers Nick We would probably get a bigger benefit from taking an approach like SquashFS has recently added, that is, allowing multi-threaded decompression for reads, and decompressing directly into the pagecache. Such an approach would likely make zlib compression much more scalable on large systems. Austin, That seems better than my idea, as you seem to be more up to date on btrfs development. If you and the other developers of btrfs are interested in adding this as a feature, please let me know, as I would like to help improve btrfs: the file system as an idea is great, it just seems like it needs a lot of work :). Yes, it probably does need a lot of work. This is (at least one reason) why it's not been done yet. If you want to work on doing this, then please do. However, don't expect anyone else to give you a detailed plan of what code to write. Don't expect anyone else to write the code for you. You will have to come up with your own ideas as to how to implement it, and actually do it yourself, including building it, and testing it. That's not to say that you are on your own, though.
People will help -- provided that you aren't asking them to do all the work. You are not an empty vessel to be filled with the wisdom of the ancients. This means that *you* have to take action. You have to take yourself as far as you can in learning how things work. When you get stuck, work out what it is that you don't know, and then ask about that one thing. This makes it easier to answer, it shows that you're putting in effort on your side, and it means that you *actually learn things*. Questions like "what function should I be modifying?", or "how do you want me to do this?" show that you haven't put in even the smallest piece of effort, and will be ignored (if you're lucky). Questions like "I'm trying to implement a crumble filter, but in the mix_breadcrumbs function, how does it take account of the prestressed_yoghurt field?" show that you've read and understood at least some of the code, and have thought about what it's doing. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Alert status mauve ocelot: Slight chance of brimstone. Be --- prepared to make a nice cup of tea. signature.asc Description: Digital signature
Re: Help with btrfs Bugs
On Mon, Jul 28, 2014 at 12:00:03AM -0400, Nick Krause wrote: Hey Josef, Seems there are a lot of btrfs bugs open on the kernel Bugzilla. I am new to the btrfs side of development, so please let me know if you want help cleaning up some of the bugs here that are actually valid and still open. Make up your mind... this is the third unrelated idea you've had about working in the area of btrfs. You're bouncing around all over the place like a hyperactive puppy. Pick *one* thing, and just do it. Put in the effort to learn about the subsystem (read my earlier emails for a good approach here). Accept that there are no easy one-liners in the kernel. The path to writing your first kernel patch is *hard*. Don't give up at the first hint that each thing isn't going to be solved in 5 minutes. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- What part of "gestalt" don't you understand? --- signature.asc Description: Digital signature
Re: Work Queue for btrfs compression writes
On Tue, Jul 29, 2014 at 11:54:20PM -0400, Nick Krause wrote: Hey Guys, I am new to reading and writing kernel code. I got interested in writing code for btrfs as it seems to need more work than other file systems, and this seems, other than drivers, a good use of time on my part. I am interested in helping improve the compression of btrfs by using a set of threads using work queues like XFS for reads, and keeping the page cache after reading compressed blocks, as these seem to be a great way to improve compression performance, mostly with large partitions of compressed data. I am not asking you to write the code for me, but as I am new, a little guidance and help would be greatly appreciated, as this seems like too much work for just a newbie.

* Documentation/workqueue.txt (in general, grep in Documentation usually throws up something useful)
* grep -r alloc_workqueue fs/ shows a lot of uses (including in btrfs), so it should be fairly easy to see how to create and manage a workqueue

I suspect that this may be a medium-sized project, rather than a small one. My gut feeling (based on limited experience) is that the fallocate extensions project would be considerably simpler. I also noticed from the public reply to the private mail (don't do this without getting permission from the other person) that you posted to LKML in this thread (don't switch mailing lists mid-thread) that you anticipated having problems testing with limited disks -- what you will find is that testing new kernel code is something that you don't do on your main development OS installation. Instead, you will need either a scratch machine that you can easily update, or one or more virtual machines. qemu/kvm is good for this, because it has a mode that bypasses the BIOS and bootloader emulation, and just directly runs a kernel from a file on the host machine. This is fast.
You can pass large sparse files to the VM to act as scratch disks, plus keep another smaller file for the guest OS (and a copy of it so that you can throw one away and make another one quickly and easily). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You've read the project plan. Forget that. We're going to Do --- Stuff and Have Fun doing it. signature.asc Description: Digital signature
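The scratch-disk setup Hugo describes can be sketched in a few shell commands. Everything here is illustrative -- the paths, sizes, and the qemu invocation (printed with echo rather than run, since it assumes a built bzImage and a guest root image) are not from the original mail:

```shell
# Sparse files: the apparent size is large, but they consume almost no
# real disk space until the guest writes to them.
mkdir -p /tmp/btrfs-vm
truncate -s 20G /tmp/btrfs-vm/scratch1.img
truncate -s 20G /tmp/btrfs-vm/scratch2.img

# Direct-kernel boot bypasses the BIOS/bootloader emulation. The echo
# keeps this sketch side-effect-free; drop it once you have a kernel
# image and a guest root filesystem to point at.
echo qemu-system-x86_64 -m 2G -nographic \
    -kernel arch/x86/boot/bzImage \
    -append "root=/dev/vda console=ttyS0" \
    -drive file=guest.img,if=virtio \
    -drive file=/tmp/btrfs-vm/scratch1.img,if=virtio \
    -drive file=/tmp/btrfs-vm/scratch2.img,if=virtio
```

Throwing away a broken scratch filesystem is then just a matter of deleting and re-creating the sparse file.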
Re: [PATCH] Remove certain calls for releasing page cache
On Wed, Jul 30, 2014 at 10:05:16PM -0400, Nick Krause wrote: On Wed, Jul 30, 2014 at 7:30 PM, Dave Airlie airl...@gmail.com wrote: This patch removes the lines for releasing the page cache in certain files, as this may aid in performance with writes in the compression routines of btrfs. Please note that this patch has not been tested on my own hardware due to no compression-based btrfs volumes of my own. For all that is sacred, STOP. [snip] But if you want to work on the kernel, this isn't the way to do it, and nobody will ever take a patch from you seriously if you continue in this fashion. Dave. Dave, Seems I need to have tested this code first. You've said this before, having made exactly the same error (not testing a patch). Yet you do it again. You seem to be ignoring all the advice you've been given -- or at least not learning from it, and not learning from your experiences. Could you please, for half an hour or so, stop thinking about the immediate goal of getting a patch into the kernel, and take a short while to think about your process of learning. Look at all the advice you've had (from me, from Ted, from others), actually understand it, and consider all the things you need to do which *aren't* hacking up a lump of C. Actually learn these things -- have them in your mind all the time. I would appreciate it if you could actually engage with someone (doesn't have to be me) about this -- why are you ignoring the advice? Is it because you don't understand it? Is it because you think you can cut corners? Is it because you're concentrating on the code so much that you're forgetting it? The main thing you're doing which is making people angry is not because you're submitting bad patches (although you are). It's because you're not listening to advice, and you're not apparently learning anything from the feedback you're given. Your behaviour is not changing over time, which makes you look like a waste of time to all those people trying to help you. Hugo.
-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- That's not rain, that's a lake with slots in it --- signature.asc Description: Digital signature
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote: This adds checks for the stated modes as if they are crap we will return error not supported. You've just enabled two options, but you haven't actually implemented the code behind them. I would tell you *NOT* to do anything else on this work until you can answer the question: What happens if you apply this patch, create a large file called foo.txt, and then a userspace program executes the following code?

int fd = open("foo.txt", O_RDWR);
fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);

Try it on a btrfs filesystem, both with and without your patch. Also try it on an ext4 filesystem. Once you've done all of that, reply to this mail and tell me what the problem is with this patch. You need to make two answers: what are the technical problems with the patch? What errors have you made in the development process? *Only* if you can answer those questions sensibly should you write any more patches, of any kind. Hugo.

Signed-off-by: Nicholas Krause xerofo...@gmail.com
---
 fs/btrfs/file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1f2b99c..599495a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2490,7 +2490,8 @@ static long btrfs_fallocate(struct file *file, int mode,
 	alloc_end = round_up(offset + len, blocksize);
 
 	/* Make sure we aren't being give some crap mode */
-	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
+		     FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
 		return -EOPNOTSUPP;
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
--
1.7.10.4

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- === Hugo Mills: hugo@...
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- The glass is neither half-full nor half-empty; it is twice as --- large as it needs to be. signature.asc Description: Digital signature
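Hugo's suggested experiment can be approximated from the shell with util-linux's fallocate(1) instead of a C program. This is a sketch, not the definitive answer to his question: FALLOC_FL_COLLAPSE_RANGE requires a block-aligned offset and length, so the unaligned 50/50 pair should be rejected with EINVAL on filesystems that support collapse (e.g. ext4), and EOPNOTSUPP where it is unsupported -- in neither case should anything crash.

```shell
# Create a 4 MiB test file, then attempt the collapse described in
# Hugo's C snippet. offset=50 length=50 is not block-aligned, so the
# call should fail cleanly and leave the file untouched.
f=/tmp/collapse-test.dat
dd if=/dev/zero of="$f" bs=1M count=4 2>/dev/null

if fallocate --collapse-range --offset 50 --length 50 "$f"; then
    echo "unaligned collapse unexpectedly succeeded"
else
    echo "unaligned collapse rejected; file left intact"
fi
```

Comparing the behaviour of this one command on btrfs and ext4 mounts answers the "with and without the patch" half of the exercise without writing any C at all.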
Re: Implement new FALLOC_FL_* modes
On Thu, Jul 31, 2014 at 02:08:15PM -0400, Nick Krause wrote: I am doing this project from the btrfs wiki. Since I am new, after reading the code using lxr I am wondering if we can base the code off that already in ext4 for these modes, as they seem to work rather well. I am wondering, though, as a newbie: some of the data structures are ext4-based and the same goes for some of the functions. I am wondering what the equivalent structures and functions are in btrfs, as I can't seem to find them after reading the code as a newbie for the last few hours in lxr. Maybe I'm just missing something? The fundamental on-disk structures for btrfs and ext4 are totally different. You will get very confused if you expect the ext4 code to work in the btrfs module, or even if you expect the structures to be similar. But first -- answer my questions in the reply I made to your patch just now. Do nothing else until you can answer all three of those questions sensibly. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great films about cricket: 200/1: A Pace Odyssey --- signature.asc Description: Digital signature
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 09:53:15PM -0400, Nick Krause wrote: On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills h...@carfax.org.uk wrote: [snip] Calls are there in btrfs, therefore it will either kernel panic or cause an oops. That's a guess. I can tell it's a guess, because I've actually read (some of) the rest of that function, so I've got a good idea of what I think it will do -- and "panic or oops" is not the answer. Try again. You can answer this question two ways: by test (see my suggestion above), or by reading and understanding the code. Either will work in this case, but doing neither is not an option for someone who wants to change the function. Need to test this patch as this is a very easy bug to catch. So why didn't you? It's your patch; testing it is your job -- *before* it gets out into the outside world. Hugo. -- === Hugo Mills: hugo@...
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- But people have always eaten people, / what else is there to --- eat? / If the Juju had meant us not to eat people / he wouldn't have made us of meat. signature.asc Description: Digital signature
Re: ENOSPC with mkdir and rename
for dealing with early ENOSPC problems, so other things should probably point at that. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You stay in the theatre because you're afraid of having no --- money? There's irony... signature.asc Description: Digital signature
Re: ENOSPC with mkdir and rename
On Mon, Aug 04, 2014 at 11:31:57AM +0100, Peter Waller wrote: Thanks Hugo, this is the most informative e-mail yet! (more inline) On 4 August 2014 11:22, Hugo Mills h...@carfax.org.uk wrote: * btrfs fi show - look at the total and used values. If used < total, you're OK. If used == total, then you could potentially hit ENOSPC. Another thing which is unclear and undocumented anywhere I can find is what the meaning of `btrfs fi show` is. I'm sure it is totally obvious if you are a developer or if you have used it for long enough. But it isn't covered in the manpage, nor in the Oracle documentation, nor anywhere on the wiki that I could find. When I looked at it in my problematic situation, it said 500 GiB / 500 GiB. That sounded fine to me because I interpreted the output as what fraction of which RAID devices BTRFS was using. In other words, I thought "Oh, BTRFS will just make use of the whole device that's available to it." I thought that `btrfs fi df` was the source of information for how much space was free inside of that. That's actually pretty much accurate. The problem is that btrfs distinguishes between space available for data and space available for metadata, and doesn't trade off one for the other once they've been allocated. The balance operation frees up some of the allocation, allowing the newly-freed space to be allocated again for something else. All of the information about the data/metadata split, and what's used out of that, is revealed by btrfs fi df. * btrfs fi df - look at metadata used vs total. If the difference between them is close to zero (on 3.15+) or close to 512 MiB (on pre-3.15), then you are in danger of ENOSPC. Hmm. It's unfortunate that this could indicate an amount of space which is free when it actually isn't. That's why the 512 MiB block reserve was split out of metadata -- so that you don't look at metadata and say "oh, I've got half a gig free, that's OK". - look at data used vs total.
If the used is much smaller than total, you can reclaim some of the allocation with a filtered balance (btrfs balance start -dusage=5), which will then give you unallocated space again (see the btrfs fi show test). So the filtered balance didn't help in my situation. I understand it's something to do with the 5 parameter. But I do not understand what the impact of changing this parameter is. It is something to do with a fraction of something, but those things are still not present in my mental model despite a large amount of reading. Is there an illustration which could clear this up? The 5 is 5%. So, it'll only look at chunks which are less than 5% full. David Sterba published a patch that would balance the (approximately N) least-used chunks, which is a considerably more usable approach, but I don't know what happened to that one. Among other things I also got the kernel stack trace I pasted at the bottom of the first e-mail to this thread when I did the rebalance. OK, I'll go back and read that. You probably shouldn't have had it, though. :) This FAQ entry is pretty horrible, I'm afraid. I actually started rewriting it here to try to make it clearer what's going on. I'll try to work on it a bit more this week and put out a better version for the wiki. This is great to hear! :) Thanks for your response Hugo, that really cleared up a lot of mental model problems. I hope the documentation can be improved so that others can learn from my mistakes. I do try to work on it every so often. Note to self: win lottery, or get cloned. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You stay in the theatre because you're afraid of having no --- money? There's irony... signature.asc Description: Digital signature
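The "used == total" test on `btrfs fi show` output that Hugo describes can be mechanised. The device line below is a made-up sample (the 500 GiB figures echo Peter's case); the field positions follow the devid lines that btrfs fi show prints:

```shell
# Hypothetical 'btrfs fi show' device line; when used reaches size,
# all raw space has been allocated to chunks and early ENOSPC becomes
# possible even though the chunks themselves may be half-empty.
line="devid 1 size 500.00GiB used 500.00GiB path /dev/xvdf"

size=$(echo "$line" | awk '{print $4}')
used=$(echo "$line" | awk '{print $6}')

if [ "$used" = "$size" ]; then
    echo "fully allocated: try 'btrfs balance start -dusage=5 <mountpoint>'"
fi
```

The filtered balance then frees up mostly-empty chunk allocations, which is exactly the recovery step discussed in this thread.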
Re: ENOSPC with mkdir and rename
On Mon, Aug 04, 2014 at 01:04:25PM +0200, Clemens Eisserer wrote: Hi Hugo, On the 3.15+ kernels, the block reserve is split out of metadata and reported separately. This helps with the following process: Thanks a lot for pointing this out, I hadn't noticed this change until now. One thing I didn't find any information about is the overhead introduced by mixed-mode. It would be great if you could explain it in a few sentences. I don't know, I'm afraid. I don't think we've got any benchmarks on the scale of the slowdown. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Reading Mein Kampf won't make you a Nazi. Reading Das Kapital --- won't make you a communist. But most trolls started out with a copy of Lord of the Rings. signature.asc Description: Digital signature
Re: ENOSPC with mkdir and rename
On Mon, Aug 04, 2014 at 02:17:02PM +0100, Peter Waller wrote: For anyone else having this problem, this article is fairly useful for understanding disk full problems and rebalance: http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html It actually covers the problem that I had, which is that a rebalance can't take place because it is full. I still am unsure what is really wrong with this whole situation. Is it that I wasn't careful to do a rebalance when I should have been? Is it that BTRFS doesn't do a rebalance automatically when it could in principle? The latter one. Well, actually two things: the FS should be capable of autonomously rebalancing at low bandwidth to prevent this problem, but nobody's got round to implementing it yet. Secondly, it should not be possible to get into a state where you can't run the balance -- Josef spent about three kernel revisions fixing the block reserve code to that end. However, since about 3.14, there have been more cases like yours showing up, so I think there's been a regression. It's not very common, though. I think we've had maybe a dozen reported instances in the last 6 months. Someone on IRC had it just now, though, and captured a metadata image, so at least we've got some (meta)data to work with now. It's pretty bad to end up in a situation (with spare space) where the only way out is to add more storage, which may be impractical, difficult or expensive. The other thing that I still don't understand is something I've seen repeated in a few places, from the above article: "because the filesystem is only 55% full, I can ask balance to rewrite all chunks that are more than 55% full". Then he uses `btrfs balance start -dusage=55 /mnt/btrfs_pool1`. I don't understand the relationship between "the FS is 55% full" and "chunks more than 55% full". What's going on here? Pigeonhole principle -- if the FS is 55% full, there must be at least one chunk <= 55% full. 
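The pigeonhole argument is easy to check numerically. This little POSIX shell sketch (chunk fill levels invented, and equal-sized chunks assumed -- real chunks vary in size) shows that the emptiest chunk can never be fuller than the average, so a `-dusage=N` with N at the overall fill level is always guaranteed at least one candidate chunk:

```shell
# Invented chunk fill percentages; overall fill is their average
# (equal-sized chunks assumed).  The emptiest chunk can never be
# above that average -- the pigeonhole argument.
chunks="10 30 55 80 100"
min=100; sum=0; n=0
for c in $chunks; do
    sum=$((sum + c))
    n=$((n + 1))
    if [ "$c" -lt "$min" ]; then min=$c; fi
done
avg=$((sum / n))
echo "average fill: ${avg}%  emptiest chunk: ${min}%"
```

Here the average comes out at 55%, and the 10%-full chunk is a valid target for `-dusage=55`.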
I conclude that now since I have added more storage, the rebalance won't fail and if I keep rebalancing from a cron job I won't hit this problem again (unless the filesystem fills up very fast! what then?). I don't know however what value to assign to `-dusage` in general for the cron rebalance. Any hints? Try with increasing values until you've moved as many chunks as you want to. This is what David's balance at least N chunks patch did. I'd suggest start with 5, and go up in increments of 5, if you're making it an automatic process. Stop when you reach some threshold (like, say, 80), or when it reports that it's actually moved some chunks. Doing it manually, I usually recommend 5, 10, 20, 50, 80. Hugo. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Well, you don't get to be a kernel hacker simply by looking --- good in Speedos. -- Rusty Russell signature.asc Description: Digital signature
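Hugo's stepped approach can be scripted. The sketch below is only a dry run -- it echoes the commands it would run rather than executing them, because a real cron job would also need to parse balance's output to see whether any chunks were actually relocated, and that part is left as a comment:

```shell
# Dry-run sketch of the escalating -dusage strategy described above.
# It prints the commands instead of executing them; swap the echo for
# a real invocation (plus output parsing) before putting it in cron.
balance_escalate() {
    mnt=$1
    for pct in 5 10 20 50 80; do
        echo "btrfs balance start -dusage=${pct} ${mnt}"
        # Real use: run the command, and break out of the loop as
        # soon as it reports having relocated one or more chunks.
    done
}

balance_escalate /mnt
```

The step values (5, 10, 20, 50, 80) are simply the manual sequence recommended above, not anything btrfs itself mandates.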
btrfs sub list output
The output options of btrfs sub list seem a bit... arbitrary? awkward? unhelpful? Here's my problem: Given a path at some arbitrary point into a mounted btrfs (sub)volume, find all subvolumes visible under that point, and identify their absolute path names. My test btrfs filesystem looks like this:

TOP_LEVEL
   root
   home
   test
      subdir   (a subdir, not a subvol)
         foo
      bar

# mount -osubvol=test /dev/sda2 /mnt

so I want to be able to go from that configuration (knowing nothing about the mountpoint), and map (both ways) between UUID and the (e.g.) /mnt/foo path. But:

# btrfs sub list -oau /mnt
# and
# btrfs sub list -au /mnt
ID 259 gen 549115 top level 5 uuid 6a50af8d-83dd-9943-b5b7-4f8b0a7f3fa7 path FS_TREE/root
ID 260 gen 548768 top level 5 uuid c73d4296-7c30-074e-b647-e6e83025a125 path FS_TREE/home
ID 11826 gen 549045 top level 272 uuid f78aed0d-db5a-a342-b422-87abfa18efe0 path test/subdir/foo
ID 11827 gen 549046 top level 272 uuid a5cea7ae-3fdd-c247-8905-40cbb7f39017 path test/bar

Here, I can easily filter out the subvols I want (they're the ones without FS_TREE), but I have to know the mountpoint (which I can find) and the subvol= parameter (which I think I can't).

# btrfs sub list -ou /mnt/subdir/
ID 11826 gen 549045 top level 272 uuid f78aed0d-db5a-a342-b422-87abfa18efe0 path test/subdir/foo
ID 11827 gen 549046 top level 272 uuid a5cea7ae-3fdd-c247-8905-40cbb7f39017 path test/bar

This filters the subvols correctly, but otherwise has the same drawbacks as above. 
# btrfs sub list -u /mnt
ID 259 gen 549114 top level 5 uuid 6a50af8d-83dd-9943-b5b7-4f8b0a7f3fa7 path root
ID 260 gen 548768 top level 5 uuid c73d4296-7c30-074e-b647-e6e83025a125 path home
ID 11826 gen 549045 top level 272 uuid f78aed0d-db5a-a342-b422-87abfa18efe0 path subdir/foo
ID 11827 gen 549046 top level 272 uuid a5cea7ae-3fdd-c247-8905-40cbb7f39017 path bar

Here, I get the paths relative to the mountpoint, which is what I want, but mixed up with paths outside the mountpoint as well, which I don't, and have no way of distinguishing the two classes without making a separate call to btrfs sub list -a and filtering out the UUIDs with FS_TREE in the name. Incidentally, if the parameter to btrfs sub list is inside another subvolume within the mount, then the relative effects are all relative to that subvol, not to the mountpoint. I'm finding it hard to work out how the variants with -o or -a (or both) are actually helpful at all, now that I come to use them in more than a vague human-readable form. Have I missed something, or is this actually an awkward furball of confusing and mostly unhelpful options? Are these options actually doing what the original author intended? If so, what was that intent? Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Summoning his Cosmic Powers, and glowing slightly --- from his toes... signature.asc Description: Digital signature
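For what it's worth, the manual filtering step can at least be automated. This awk one-liner wrapped in a shell function drops entries whose path starts with FS_TREE/, i.e. the ones outside the mount (the sample lines are abbreviated from the listing above; it doesn't solve the deeper problem of needing two separate `sub list` calls):

```shell
# Filter "btrfs sub list -au"-style output, keeping only subvols
# whose path (the last field) does not start with FS_TREE/.
filter_subvols() {
    awk '$NF !~ /^FS_TREE\//'
}

# Sample input, abbreviated from the listing above (uuid column dropped):
printf '%s\n' \
    'ID 259 gen 549115 top level 5 path FS_TREE/root' \
    'ID 11826 gen 549045 top level 272 path test/subdir/foo' |
    filter_subvols
```

Only the test/subdir/foo line survives the filter; the FS_TREE entry is discarded.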
Re: Stack dumps in use_block_rsv while rebalancing (block rsv returned -28)
On Tue, Aug 05, 2014 at 10:34:13AM +0100, Peter Waller wrote: I already posted this in the thread ENOSPC with mkdir and rename, but now I have a device with 100GB unallocated on the btrfs fi sh output, and when I run a rebalance of the form: btrfs filesystem balance start -dusage=50 -musage=10 $mount I get more than 75 of such stack traces contaminating the klog. I've put some of them up in a gist here: https://gist.github.com/pwaller/1df8a7efc2f10343f2e3 and one of them is reproduced below. Is this harmful or expected? Are there any workarounds? It's a warning, not an oops, so it's less immediately dangerous. The other key thing is block rsv returned -28, which says it's an ENOSPC. My guess would be that you've got ENOSPC debugging enabled in the kernel, and that the backtraces, while scary, are essentially harmless (if irritating). Hugo. Thanks, - Peter [376007.681938] [ cut here ] [376007.681957] WARNING: CPU: 1 PID: 27021 at /home/apw/COD/linux/fs/btrfs/ extent-tree.c:6946 use_block_rsv+0xfd/0x1a0 [btrfs]() [376007.681958] BTRFS: block rsv returned -28 [376007.681959] Modules linked in: softdog tcp_diag inet_diag dm_crypt ppdev xen_fbfront fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_piix4 serio_raw parport_pc parport mac_hid isofs xt_tcpudp iptable_filter xt_owner ip_tables x_tables btrfs xor raid6_pq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd floppy psmouse [376007.681980] CPU: 1 PID: 27021 Comm: pam_script_ses_ Tainted: G W 3.15.7-031507-generic #201407281235 [376007.681981] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014 [376007.681983] 1b22 8800acca39d8 8176f115 0007 [376007.681986] 8800acca3a28 8800acca3a18 8106ceac 8801efc37870 [376007.681989] 88017db0ff00 8801aedcd800 1000 88001c987000 [376007.681992] Call Trace: [376007.682000] [8176f115] dump_stack+0x46/0x58 [376007.682005] [8106ceac] warn_slowpath_common+0x8c/0xc0 [376007.682008] [8106cf96] 
warn_slowpath_fmt+0x46/0x50 [376007.682016] [a00d9d1d] use_block_rsv+0xfd/0x1a0 [btrfs] [376007.682024] [a00de687] btrfs_alloc_free_block+0x57/0x220 [btrfs] [376007.682027] [8178033c] ? __do_page_fault+0x28c/0x550 [376007.682031] [8119749f] ? page_add_file_rmap+0x6f/0xb0 [376007.682037] [a00c8a3c] btrfs_copy_root+0xfc/0x2b0 [btrfs] [376007.682041] [811c60b9] ? memcg_check_events+0x29/0x50 [376007.682051] [a013a583] ? create_reloc_root+0x33/0x2c0 [btrfs] [376007.682061] [a013a743] create_reloc_root+0x1f3/0x2c0 [btrfs] [376007.682064] [811dd073] ? generic_permission+0xf3/0x120 [376007.682073] [a0140eb8] btrfs_init_reloc_root+0xb8/0xd0 [btrfs] [376007.682082] [a00ee967] record_root_in_trans.part.30+0x97/0x100 [btrfs] [376007.682090] [a00ee9f4] record_root_in_trans+0x24/0x30 [btrfs] [376007.682098] [a00efeb1] btrfs_record_root_in_trans+0x51/0x80 [btrfs] [376007.682106] [a00f13d6] start_transaction.part.35+0x86/0x560 [btrfs] [376007.682109] [8132c197] ? apparmor_capable+0x27/0x80 [376007.682117] [a00f18d9] start_transaction+0x29/0x30 [btrfs] [376007.682125] [a00f19a7] btrfs_join_transaction+0x17/0x20 [btrfs] [376007.682133] [a00f7fa8] btrfs_dirty_inode+0x58/0xe0 [btrfs] [376007.682141] [a00fcaf2] btrfs_setattr+0xa2/0xf0 [btrfs] [376007.682144] [811eec74] notify_change+0x1c4/0x3b0 [376007.682146] [811dde96] ? final_putname+0x26/0x50 [376007.682149] [811d088d] chown_common+0x16d/0x1a0 [376007.682153] [811f2b08] ? __mnt_want_write+0x58/0x70 [376007.682156] [811d1a8f] SyS_fchownat+0xbf/0x100 [376007.682159] [811d1aed] SyS_chown+0x1d/0x20 [376007.682163] [817858bf] tracesys+0xe1/0xe6 [376007.682165] ---[ end trace 1853311c87a5cd94 ]--- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- UNIX: Italian pen maker --- signature.asc Description: Digital signature
Re: [PATCH] Btrfs: fix compressed write corruption on enospc
On Wed, Aug 06, 2014 at 12:21:59PM +0200, Martin Steigerwald wrote: It basically happened on about the first heavy write I/O occasion after the BTRFS trees filled the complete device: I am now balancing the trees down to lower sizes manually with btrfs balance start -dusage=10 /home btrfs balance start -musage=10 /home Note that balance has nothing to do with balancing the metadata trees. The tree structures are automatically balanced as part of their normal operation. A btrfs balance start is a much higher-level operation. It's called balance because the overall effect is to balance the data usage evenly across multiple devices. (Actually, to balance the available space evenly). Also note that the data part isn't tree-structured, so referring to balancing the trees with a -d flag is doubly misleading. :) Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You know... I'm sure this code would seem a lot better if I --- never tried running it. signature.asc Description: Digital signature
Re: File system stuck in scrub
On Mon, Aug 11, 2014 at 08:12:46AM -0700, Nikolaus Rath wrote: I started a scrub of one of my btrfs filesystems and then had to restart the system. `systemctl restart` seemed to terminate all processes, but then got stuck at the end. The disk activity LED was still flashing rapidly at that point, so I assume that the active scrub was preventing the reboot (is that a bug or a feature?). It shouldn't have stopped it. In any case, I could not wait for that so I power cycled. But now my file system seems to be stuck in a scrub that can neither be completed nor cancelled: $ sudo btrfs scrub status /home/nikratio/ scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38 scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds total bytes scrubbed: 209.97GiB with 0 errors $ date Sun Aug 10 22:00:44 PDT 2014 $ sudo btrfs scrub cancel /home/nikratio/ ERROR: scrub cancel failed on /home/nikratio/: not running $ sudo btrfs scrub start /home/nikratio/ ERROR: scrub is already running. To cancel use 'btrfs scrub cancel /home/nikratio/'. To see the status use 'btrfs scrub status [-d] /home/nikratio/'. Note that the scrub was started more than 3 hours ago, but claims to have been running for only 1562 seconds. This is a regrettably common problem -- fortunately with a simple solution. The userspace scrub monitor died in the reboot, leaving the status file present. If you delete the status file, which is in /var/lib/btrfs/, that should allow you to start a new scrub. I then figured that maybe I need to run btrfsck. 
This gave the following output: checking extents checking free space cache checking fs roots root 5 inode 3149791 errors 400, nbytes wrong root 5 inode 3150233 errors 400, nbytes wrong root 5 inode 3150238 errors 400, nbytes wrong [102 similar lines] Checking filesystem on /dev/mapper/vg0-nikratio_crypt UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38 free space inode generation (0) did not match free space cache generation (161262) [snip] found 216444746042 bytes used err is 1 total csum bytes: 383160676 total tree bytes: 875753472 total fs tree bytes: 284246016 total extent tree bytes: 69320704 btree space waste bytes: 205021777 file data blocks allocated: 3701556121600 referenced 388107321344 Btrfs v3.14.1 So nothing about the scrub, but apparently some other errors. The free space inode generation errors are harmless. The wrong nbytes is probably not horrifically damaging, but I don't know so much about that one. Can someone tell me: * Should I be able to restart while a scrub is in progress, or is that deliberately prevented by btrfs? Restart the machine? Yes. * How can I resume or cancel the scrub? It's probably simply not running -- see above. * Is it more risky to leave the above errors uncorrected, or to run btrfsck with --repair? I would, I think, leave them. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- We are all lying in the gutter, but some of us are looking --- at the stars. signature.asc Description: Digital signature
[PATCH] btrfs-progs: Add -R to list UUIDs of original received subvolume
When using send/receive, it it useful to be able to match up source subvols on the send side (as, say, for -p or -c clone sources) with their corresponding copies on the receive side. This patch adds a -R option to btrfs sub list to show the received subvolume UUID on the receive side, allowing the user to perform that matching correctly. Signed-off-by: Hugo Mills h...@carfax.org.uk --- btrfs-list.c | 32 +++- btrfs-list.h | 2 ++ cmds-subvolume.c | 6 +- 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/btrfs-list.c b/btrfs-list.c index 542dfe0..01ccca9 100644 --- a/btrfs-list.c +++ b/btrfs-list.c @@ -85,6 +85,11 @@ static struct { .need_print = 0, }, { + .name = received_uuid, + .column_name= Received UUID, + .need_print = 0, + }, + { .name = uuid, .column_name= UUID, .need_print = 0, @@ -391,7 +396,7 @@ static struct root_info *root_tree_search(struct root_lookup *root_tree, static int update_root(struct root_lookup *root_lookup, u64 root_id, u64 ref_tree, u64 root_offset, u64 flags, u64 dir_id, char *name, int name_len, u64 ogen, u64 gen, - time_t ot, void *uuid, void *puuid) + time_t ot, void *uuid, void *puuid, void *ruuid) { struct root_info *ri; @@ -429,6 +434,8 @@ static int update_root(struct root_lookup *root_lookup, memcpy(ri-uuid, uuid, BTRFS_UUID_SIZE); if (puuid) memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE); + if (ruuid) + memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE); return 0; } @@ -447,17 +454,19 @@ static int update_root(struct root_lookup *root_lookup, * ot: the original time(create time) of the root * uuid: uuid of the root * puuid: uuid of the root parent if any + * ruuid: uuid of the received subvol, if any */ static int add_root(struct root_lookup *root_lookup, u64 root_id, u64 ref_tree, u64 root_offset, u64 flags, u64 dir_id, char *name, int name_len, u64 ogen, u64 gen, - time_t ot, void *uuid, void *puuid) + time_t ot, void *uuid, void *puuid, void *ruuid) { struct root_info *ri; int ret; ret = update_root(root_lookup, root_id, ref_tree, 
root_offset, flags, - dir_id, name, name_len, ogen, gen, ot, uuid, puuid); + dir_id, name, name_len, ogen, gen, ot, + uuid, puuid, ruuid); if (!ret) return 0; @@ -501,6 +510,9 @@ static int add_root(struct root_lookup *root_lookup, if (puuid) memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE); + if (ruuid) + memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE); + ret = root_tree_insert(root_lookup, ri); if (ret) { printf(failed to insert tree %llu\n, (unsigned long long)root_id); @@ -978,6 +990,7 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup) time_t t; u8 uuid[BTRFS_UUID_SIZE]; u8 puuid[BTRFS_UUID_SIZE]; + u8 ruuid[BTRFS_UUID_SIZE]; root_lookup_init(root_lookup); memset(args, 0, sizeof(args)); @@ -1030,7 +1043,7 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup) add_root(root_lookup, sh.objectid, sh.offset, 0, 0, dir_id, name, name_len, 0, 0, 0, -NULL, NULL); +NULL, NULL, NULL); } else if (sh.type == BTRFS_ROOT_ITEM_KEY) { ri = (struct btrfs_root_item *)(args.buf + off); gen = btrfs_root_generation(ri); @@ -1041,16 +1054,18 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup) ogen = btrfs_root_otransid(ri); memcpy(uuid, ri-uuid, BTRFS_UUID_SIZE); memcpy(puuid, ri-parent_uuid, BTRFS_UUID_SIZE); + memcpy(ruuid, ri-received_uuid, BTRFS_UUID_SIZE); } else { t = 0; ogen = 0; memset(uuid, 0, BTRFS_UUID_SIZE); memset(puuid, 0, BTRFS_UUID_SIZE); + memset(ruuid, 0, BTRFS_UUID_SIZE); } add_root(root_lookup, sh.objectid, 0, sh.offset, flags, 0, NULL, 0
[PATCH] btrfs-progs: Fix spelling in btrfs sub list help
below, not bellow Signed-off-by: Hugo Mills h...@carfax.org.uk --- cmds-subvolume.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 5216e53..349d0db 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -390,7 +390,7 @@ static const char * const cmd_subvol_list_usage[] = { to the given path, -c print the ogeneration of the subvolume, -g print the generation of the subvolume, - -o print only subvolumes bellow specified path, + -o print only subvolumes below specified path, -u print the uuid of subvolumes (and snapshots), -q print the parent uuid of the snapshots, -R print the uuid of the received snapshots, -- 2.0.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs-progs: Add -R to list UUIDs of original received subvolume
When using send/receive, it it useful to be able to match up source subvols on the send side (as, say, for -p or -c clone sources) with their corresponding copies on the receive side. This patch adds a -R option to btrfs sub list to show the received subvolume UUID on the receive side, allowing the user to perform that matching correctly. Signed-off-by: Hugo Mills h...@carfax.org.uk --- v1 - v2: Update man page as well. Documentation/btrfs-subvolume.txt | 2 ++ btrfs-list.c | 32 +++- btrfs-list.h | 2 ++ cmds-subvolume.c | 6 +- 4 files changed, 36 insertions(+), 6 deletions(-) diff --git a/Documentation/btrfs-subvolume.txt b/Documentation/btrfs-subvolume.txt index a519131..789e462 100644 --- a/Documentation/btrfs-subvolume.txt +++ b/Documentation/btrfs-subvolume.txt @@ -104,6 +104,8 @@ print only subvolumes bellow specified path. print the UUID of the subvolume. -q print the parent uuid of subvolumes (and snapshots). +-R +print the UUID of the sent subvolume, where the subvolume is the result of a receive operation -t print the result as a table. 
-s diff --git a/btrfs-list.c b/btrfs-list.c index 542dfe0..01ccca9 100644 --- a/btrfs-list.c +++ b/btrfs-list.c @@ -85,6 +85,11 @@ static struct { .need_print = 0, }, { + .name = received_uuid, + .column_name= Received UUID, + .need_print = 0, + }, + { .name = uuid, .column_name= UUID, .need_print = 0, @@ -391,7 +396,7 @@ static struct root_info *root_tree_search(struct root_lookup *root_tree, static int update_root(struct root_lookup *root_lookup, u64 root_id, u64 ref_tree, u64 root_offset, u64 flags, u64 dir_id, char *name, int name_len, u64 ogen, u64 gen, - time_t ot, void *uuid, void *puuid) + time_t ot, void *uuid, void *puuid, void *ruuid) { struct root_info *ri; @@ -429,6 +434,8 @@ static int update_root(struct root_lookup *root_lookup, memcpy(ri-uuid, uuid, BTRFS_UUID_SIZE); if (puuid) memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE); + if (ruuid) + memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE); return 0; } @@ -447,17 +454,19 @@ static int update_root(struct root_lookup *root_lookup, * ot: the original time(create time) of the root * uuid: uuid of the root * puuid: uuid of the root parent if any + * ruuid: uuid of the received subvol, if any */ static int add_root(struct root_lookup *root_lookup, u64 root_id, u64 ref_tree, u64 root_offset, u64 flags, u64 dir_id, char *name, int name_len, u64 ogen, u64 gen, - time_t ot, void *uuid, void *puuid) + time_t ot, void *uuid, void *puuid, void *ruuid) { struct root_info *ri; int ret; ret = update_root(root_lookup, root_id, ref_tree, root_offset, flags, - dir_id, name, name_len, ogen, gen, ot, uuid, puuid); + dir_id, name, name_len, ogen, gen, ot, + uuid, puuid, ruuid); if (!ret) return 0; @@ -501,6 +510,9 @@ static int add_root(struct root_lookup *root_lookup, if (puuid) memcpy(ri-puuid, puuid, BTRFS_UUID_SIZE); + if (ruuid) + memcpy(ri-ruuid, ruuid, BTRFS_UUID_SIZE); + ret = root_tree_insert(root_lookup, ri); if (ret) { printf(failed to insert tree %llu\n, (unsigned long long)root_id); @@ -978,6 +990,7 @@ static int 
__list_subvol_search(int fd, struct root_lookup *root_lookup) time_t t; u8 uuid[BTRFS_UUID_SIZE]; u8 puuid[BTRFS_UUID_SIZE]; + u8 ruuid[BTRFS_UUID_SIZE]; root_lookup_init(root_lookup); memset(args, 0, sizeof(args)); @@ -1030,7 +1043,7 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup) add_root(root_lookup, sh.objectid, sh.offset, 0, 0, dir_id, name, name_len, 0, 0, 0, -NULL, NULL); +NULL, NULL, NULL); } else if (sh.type == BTRFS_ROOT_ITEM_KEY) { ri = (struct btrfs_root_item *)(args.buf + off); gen = btrfs_root_generation(ri); @@ -1041,16 +1054,18 @@ static int __list_subvol_search(int fd, struct root_lookup *root_lookup) ogen = btrfs_root_otransid(ri); memcpy(uuid, ri-uuid, BTRFS_UUID_SIZE); memcpy(puuid, ri-parent_uuid, BTRFS_UUID_SIZE
Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2
On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote: On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote: Hello list, I want to use an ARM kirkwood based NSA325v2 NAS (dubbed Receiver) for receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop running kubuntu 14.04 LTS (dubbed Source), storing them on a 3TB WD red disk (having GPT label, partitions created with parted). But all the btrfs receive commands on 'Receiver' fail soon with e.g.: ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File too large ... and that stops reception/snapshot creation. ... Increasing the verbosity with -v -v for btrfs receive shows the following differences between receive operations on 'Receiver' and 'OtherHost', both of them using the identical inputfile /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send * the chown and chmod operations are different - resulting in weird/wrong permissions and sizes on 'Receiver' side. * what's stransid, this is the first line that differs This is interesting, thanks for going to the trouble to show those diffs. That the commands and strings match up show us that the basic tlv header chaining is working. But the u64 attribute values are sometimes messed up. And messed up in a specific way. A variable number of low order bytes are magically appearing. (gdb) print/x 11709972488 $2 = 0x2b9f80008 (gdb) print/x 178680 $3 = 0x2b9f8 (gdb) print/x 588032 $6 = 0x8f900 (gdb) print/x 2297 $7 = 0x8f9 Some light googling makes me think that the Marvell Kirkwood is not friendly at all to unaligned accesses. ARM isn't in general -- it never has been, even 20 years ago in the ARM3 days when I was writing code in ARM assembler. We've been bitten by this before in btrfs (mkfs on ARM works, mounting it fails fast, because userspace has a trap to fix unaligned accesses, and the kernel doesn't). The (biting tongue) send and receive code is playing some games with casting aligned and unaligned pointers. 
Maybe that's upsetting the ARM toolchain/Kirkwood. Almost certainly the toolchain isn't identifying the unaligned accesses, and thus building code that uses them causes stuff to break. There's a workaround for userspace that you can use to verify that this is indeed the problem: echo 2 > /proc/cpu/alignment will tell the kernel to fix up unaligned accesses initiated in userspace. It's a performance killer, but it should serve to identify whether the problem is actually this. Hugo. Does this completely untested patch to btrfs-progs, to be run on the receiver, do anything? - z diff --git a/send-stream.c b/send-stream.c index 88e18e2..4f8dd83 100644 --- a/send-stream.c +++ b/send-stream.c @@ -204,7 +204,7 @@ out: int __len; \ TLV_GET(s, attr, (void**)&__tmp, &__len); \ TLV_CHECK_LEN(sizeof(*__tmp), __len); \ - *v = le##bits##_to_cpu(*__tmp); \ + *v = get_unaligned_le##bits(__tmp); \ } while (0) #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v) -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- There's a Martian war machine outside -- they want to talk --- to you about a cure for the common cold. signature.asc Description: Digital signature
Re: Putting very big and small files in one subvolume?
On Fri, Aug 29, 2014 at 09:34:54PM +0530, Shriramana Sharma wrote: On 8/17/14, Shriramana Sharma samj...@gmail.com wrote: Hello. One more Q re generic BTRFS behaviour. https://btrfs.wiki.kernel.org/index.php/Main_Page specifically advertises BTRFS's Space-efficient packing of small files. Hello. I realized that while I got lots of interesting advice on how to best layout my FS on multiple devices/FSs, I would like to specifically know how exactly the above works (in not-too-technical terms) so I'd like to decide for myself if the above feature of BTRFS would suit my particular purpose. In brief: For small files (typically under about 3.5k), the FS can put the file's data in the metadata -- specifically, the extent tree -- so that the data is directly available without a second seek to find it. The longer version: btrfs has a number of B-trees in its metadata. These are trees with a high fan-out (from memory, it's something like 30-240 children each, depending on the block size), and with the actual data being stored at the leaves of the tree. Each leaf of the tree is a fixed size, depending on the options passed to mkfs. Typically 4k-32k. The data in the trees is stored as a key and a value -- the tree indexes the keys efficiently, and stores the values (usually some data structure like an inode or file extent information) in the same leaf node as the key -- keys at the front of the leaf, data at the back. The extent tree keeps track of the contiguous byte sequences of each file, and where those sequences can be found on the FS. To read a file, the FS looks up the file's extents in the extent tree, and then has to go and find the data that it points to. This involves an extra read of the disk, which is slow. However, the metadata tree leaf is already in RAM (because the FS has just read it). So, for performance and space efficiency reasons, it can optionally store data for small files as part of the value component of the key/value pair for the file's extent. 
This means that the file's data is available immediately, without the extra disk read. Drawbacks -- metadata on btrfs is usually DUP, which means two copies, so storing lots of medium-small files (2k-4k) will take up more space than it would otherwise, because you're storing two copies and not saving enough space to make it worthwhile. It also makes it harder to calculate the used vs free values for df. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great films about cricket: Umpire of the Rising Sun --- signature.asc Description: Digital signature
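The DUP drawback is just arithmetic. A quick back-of-envelope sketch (sizes invented, and deliberately ignoring item headers, per-extent records, and node-size effects, which the real accounting includes):

```shell
# Cost of a small file stored inline in DUP metadata versus as one
# 4 KiB data block.  Illustrative only: real btrfs accounting also
# charges item headers and depends on the node size.
file_size=3072                   # a 3 KiB file
inline_cost=$((file_size * 2))   # DUP metadata keeps two copies
block_cost=4096                  # one data block if stored out of line
echo "inline=${inline_cost} bytes  data-block=${block_cost} bytes"
```

For this 3 KiB file, inline-in-DUP costs 6144 bytes against 4096 for a data block -- which is why the 2k-4k range can end up costing more inline than out of line.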
Re: Btrfs-progs-3.16: fs metadata is both single and dup?
On Tue, Sep 02, 2014 at 12:05:33PM +0000, Holger Hoffstätte wrote: I updated to progs-3.16 and noticed during testing:

root> losetup
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0 0 0 0 0 /tmp/img
root> mkfs.btrfs -f /dev/loop0
Btrfs v3.16
See http://btrfs.wiki.kernel.org for more information.
Performing full device TRIM (8.00GiB) ...
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/loop0
nodesize 16384 leafsize 16384 sectorsize 4096 size 8.00GiB
root> mkdir /tmp/btrfs
root> mount /dev/loop0 /tmp/btrfs

All fine until here..

root> btrfs filesystem df /tmp/btrfs
Data, single: total=8.00MiB, used=64.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=409.56MiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00

Note that the single chunks are empty, and will remain so. [snip] So where does the confusing initial display come from? I'm running this against a (very patched) 3.14.17, but don't remember ever seeing this with btrfs-progs-3.14.2. Your memory is faulty, I'm afraid. It's always done that -- at least since I started using btrfs, several years ago. I believe it comes from mkfs creating a trivial basic filesystem (with the single profiles), and then setting enough flags on it that the kernel can bootstrap it with the desired chunks in it -- but I may be wrong about that. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Normaliser unix c'est comme pasteuriser le Camembert --- signature.asc Description: Digital signature
Re: Btrfs-progs-3.16: fs metadata is both single and dup?
On Wed, Sep 03, 2014 at 04:53:39AM +, Duncan wrote: Hugo Mills posted on Tue, 02 Sep 2014 13:13:49 +0100 as excerpted: On Tue, Sep 02, 2014 at 12:05:33PM +, Holger Hoffstätte wrote: So where does the confusing initial display come from? [I] don't remember ever seeing this with btrfs-progs-3.14.2. Your memory is faulty, I'm afraid. It's always done that -- at least since I started using btrfs, several years ago. I believe it comes from mkfs creating a trivial basic filesystem (with the single profiles), and then setting enough flags on it that the kernel can bootstrap it with the desired chunks in it -- but I may be wrong about that. Agreed. It's an artifact of the mkfs.btrfs process and a btrfs fi df on a new filesystem always seems to have those extra unused single profile lines. I got so the first thing I'd do on first mount was a balance -- before there was anything actually on the filesystem so it was real fast -- to get rid of those null entries. Interesting. Last time I tried that (balance without any contents), the balance removed *all* the chunks, and then the FS forgot about what configuration it should have and reverted to RAID-1/single. I usually recommend writing at least one 4k+ file to the FS first, if it's bothering someone so much that they can't let it go. Hugo. Actually, I had already created a little mkfs.btrfs helper script that sets options I normally want, etc, and after doing the mkfs and balance drill a few times, I setup the script such that if at the appropriate prompt I give it a mountpoint to point balance at, it'll mount the filesystem and immediately run a balance, thus automating things and making the balance part of the same scripted process that does the mkfs.btrfs in the first place. IOW, those null-entry lines bother me too... enough that even tho I know what they are I arranged things so they're automatically and immediately eliminated and I don't have to see 'em! =:^) -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Never underestimate the bandwidth of a Volvo filled --- with backup tapes. signature.asc Description: Digital signature
Re: ENOSPC on mostly empty file system
On Tue, Sep 09, 2014 at 09:49:12PM +0200, Clemens Eisserer wrote: Hi Arnd, Ok, one more data point: Why don't you provide the data point you were specifically asked for, btrfs fi df ;) btrfs fi show is important as well -- it's hard to work out the state of the FS from just one of them. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You can get more with a kind word and a two-by-four than you --- can with just a kind word. signature.asc Description: Digital signature
Re: ENOSPC on mostly empty file system
On Tue, Sep 09, 2014 at 11:49:10PM +0200, Arnd Bergmann wrote: Ok, now I'm in the bad state again (after running a 'make allmodconfig' kernel build): Label: none uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47 Total devices 1 FS bytes used 8.79GB devid 1 size 67.14GB used 67.14GB path /dev/sdc6 All the space on the FS has been allocated to some purpose or other. Data: total=65.11GB, used=7.99GB Here, you have 65 GiB allocated to data, but only 8 GiB of that used. The FS won't automatically free up any of that (yet -- it's one of the project ideas). System, DUP: total=8.00MB, used=12.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.00GB, used=821.48MB Here, you're running close to full with metadata -- the FS needs some space to write new copies of metadata blocks in order to modify anything. It can't get enough space to do that, because there's nowhere for any more metadata allocation to come from (because it's all allocated -- see my first comment). So... you need to free up some data chunks. You can do this with: # btrfs balance start -dusage=5 /mountpoint Take a look at the output of btrfs fi df and btrfs fi show afterwards, and see how much the Data allocation has reduced by, and how much unallocated space you have left afterwards. You may want to increase the number in the above balance command to some higher value, to free up even more chunks (it limits the balance to chunks less than n% full -- so the command above will only touch chunks with 5% actual data or less). This is in the FAQ. Hugo. Metadata: total=8.00MB, used=0.00 : total=200.00MB, used=0.00 -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Comic Sans goes into a bar, and the barman says, We don't --- serve your type here. signature.asc Description: Digital signature
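The arithmetic behind the advice above can be sketched in shell. This is an illustrative script, not anything from btrfs-progs: the df line is copied from the report in this thread, and the 50% cut-off and /mountpoint are placeholder assumptions. It computes how full the allocated data chunks are overall, which is the quantity the -dusage filter keys off.

```shell
#!/bin/sh
# Data line from the 'btrfs fi df' output above
line='Data: total=65.11GB, used=7.99GB'

total=$(echo "$line" | sed 's/.*total=\([0-9.]*\)GB.*/\1/')
used=$(echo "$line" | sed 's/.*used=\([0-9.]*\)GB.*/\1/')

# integer percentage of the allocated data chunks actually in use
pct=$(awk -v t="$total" -v u="$used" 'BEGIN { printf "%d", u * 100 / t }')
echo "data chunks are ${pct}% used overall"

# chunks whose own usage is under the -dusage threshold get rewritten
# and their space returned; start low, raise the number if needed
if [ "$pct" -lt 50 ]; then
    echo "try: btrfs balance start -dusage=5 /mountpoint"
fi
```

With the numbers above this prints 12%, which is why a small -dusage value is enough to start with: most chunks are nearly empty.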
Re: No space on empty, degraded raid10
On Thu, Sep 11, 2014 at 07:19:00AM -0400, Austin S Hemmelgarn wrote: On 2014-09-11 02:40, Russell Coker wrote: Also it would be nice if there was a N-way mirror option for system data. As such data is tiny (32MB on the 120G filesystem in my workstation) the space used by having a copy on every disk in the array shouldn't matter. N-way mirroring is in the queue for after RAID5/6 work; ideally, once it is ready, mkfs should default to one copy per disk in the filesystem. Why change the default from 2-copies, which it's been for years? Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Ceci est un travail pour l'Australien. --- signature.asc Description: Digital signature
Re: No space on empty, degraded raid10
On Thu, Sep 11, 2014 at 08:06:21AM -0400, Austin S Hemmelgarn wrote: On 2014-09-11 07:38, Hugo Mills wrote: On Thu, Sep 11, 2014 at 07:19:00AM -0400, Austin S Hemmelgarn wrote: On 2014-09-11 02:40, Russell Coker wrote: Also it would be nice if there was a N-way mirror option for system data. As such data is tiny (32MB on the 120G filesystem in my workstation) the space used by having a copy on every disk in the array shouldn't matter. N-way mirroring is in the queue for after RAID5/6 work; ideally, once it is ready, mkfs should default to one copy per disk in the filesystem. Why change the default from 2-copies, which it's been for years? Sorry about the ambiguity in my statement, I meant that the default for system chunks should be one copy per disk in the filesystem. If you don't have a copy of the system chunks, then you essentially don't have a filesystem, and that means that BTRFS RAID6 can't provide true resilience against 2 disks failing catastrophically unless there are at least 3 copies of the system chunks. Aah, OK. That makes perfect sense, then. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Some days, it's just not worth gnawing through the straps --- signature.asc Description: Digital signature
Re: RAID1 failure and recovery
On Fri, Sep 12, 2014 at 01:57:37AM -0700, shane-ker...@csy.ca wrote: Hi, I am testing BTRFS in a simple RAID1 environment. Default mount options and data and metadata are mirrored between sda2 and sdb2. I have a few questions and a potential bug report. I don't normally have console access to the server so when the server boots with 1 of 2 disks, the mount will fail without -o degraded. Can I use -o degraded by default to force mounting with any number of disks? This is the default behaviour for linux-raid so I was rather surprised when the server didn't boot after a simulated disk failure. The problem with that is that at the moment, you don't get any notification that anything's wrong when the system boots. As a result, using -odegraded as a default option is not generally recommended. So I pulled sdb to simulate a disk failure. The kernel oops'd but did continue running. I then rebooted encountering the above mount problem. I re-inserted the disk and rebooted again and BTRFS mounted successfully. However, I am now getting warnings like: BTRFS: read error corrected: ino 1615 off 86016 (dev /dev/sda2 sector 4580382824) I take it there were writes to SDA and sdb is out of sync. Btrfs is correcting sdb as it goes but I won't have redundancy until sdb resyncs completely. Is there a way to tell btrfs that I just re-added a failed disk and to go through and resync the array as mdraid would do? I know I can do a btrfs fi resync manually but can that be automated if the array goes out of sync for whatever reason (power failure)... I've done this before, by accident (pulled the wrong drive, reinserted it). You can fix it by running a scrub on the device (btrfs scrub start /dev/ice, I think). Finally for those using this sort of setup in production, is running btrfs on top of mdraid the way to go at this point? 
Using btrfs native RAID means that you get independent checksums on the two copies, so that where the data differs between the copies, the correct data can be identified. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- SCSI is usually fixed by remembering that it needs three --- terminations: One at each end of the chain. And the goat. signature.asc Description: Digital signature
Re: RAID1 failure and recovery
On Sun, Sep 14, 2014 at 05:15:08AM +0200, Piotr Pawłow wrote: On 12.09.2014 12:47, Hugo Mills wrote: I've done this before, by accident (pulled the wrong drive, reinserted it). You can fix it by running a scrub on the device (btrfs scrub start /dev/ice, I think). I'd like to remind everyone that btrfs has weak checksums. It may be good for correcting an occasional error, but I wouldn't trust it to correct larger amounts of data. Checksums are done for each 4k block, so the increase in probability of a false negative is purely to do with the sheer volume of data. Weak checksums like the CRC32 that btrfs currently uses are indeed poor for detecting malicious targeted attacks on the data, but for random failures, such as a disk block being unreadable and returning zeroes or having bit errors, the odds of identifying the failure are still excellent. Additionally, nocow files are not checksummed. They will not be corrected and may return good data or random garbage, depending on which mirror is accessed. Yes, this is a trade-off that you have to make for your own use-case and happiness. For some things (like a browser cache), I'd be happy with losing the checksums. For others (e.g. mail), I wouldn't be. Hugo. Below is a test I did some time ago, demonstrating the problem with nocow files: #!/bin/sh MOUNT_DIR=mnt DISK1=d1 DISK2=d2 SIZE=2G # create raid1 FS mkdir $MOUNT_DIR truncate --size $SIZE $DISK1 truncate --size $SIZE $DISK2 L1=$(losetup --show -f $DISK1) L2=$(losetup --show -f $DISK2) mkfs.btrfs -d raid1 -m raid1 $L1 $L2 mount $L1 $MOUNT_DIR # enable NOCOW chattr +C $MOUNT_DIR umount $MOUNT_DIR # fail the second drive losetup -d $L2 mount $L1 $MOUNT_DIR -odegraded # file must be large enough to not get embedded inside metadata perl -e 'print "Test OK.\n" x 4096' > $MOUNT_DIR/testfile umount $MOUNT_DIR # reattach the second drive L2=$(losetup --show -f $DISK2) mount $L1 $MOUNT_DIR # let's see what we get - correct data or garbage?
cat $MOUNT_DIR/testfile # clean up umount $MOUNT_DIR losetup -d $L1 losetup -d $L2 rm $DISK1 $DISK2 rmdir $MOUNT_DIR -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Hey, Virtual Memory! Now I can have a *really big* ramdisk! --- signature.asc Description: Digital signature
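Hugo's point that CRC32 remains excellent against random (as opposed to adversarial) corruption is easy to demonstrate with any CRC, even away from btrfs. A minimal sketch using POSIX cksum, which computes a CRC -- not btrfs's actual crc32c, so this is illustrative only: corrupt a single byte in a 4 KiB block (the granularity btrfs checksums at) and the checksum changes.

```shell
#!/bin/sh
# build a 4 KiB block of known data (btrfs checksums 4k blocks)
head -c 4096 /dev/zero | tr '\0' 'A' > block.bin
orig=$(cksum block.bin | cut -d ' ' -f 1)

# simulate a single corrupted byte, as a flaky sector might produce
printf 'B' | dd of=block.bin bs=1 seek=1000 conv=notrunc 2>/dev/null
bad=$(cksum block.bin | cut -d ' ' -f 1)

if [ "$orig" != "$bad" ]; then
    echo "corruption detected: $orig != $bad"
fi
rm -f block.bin
```

Any single-byte change falls within a CRC-32's 32-bit burst-error detection guarantee, so detection here is certain; the weak-checksum worry is about the roughly 2^-32 false-negative odds per block when corruption is large and random, which is the "sheer volume of data" effect.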
Re: [UI RFC][PATCH] btrfs-progs: add options to tune units for fi df output
On Mon, Sep 15, 2014 at 05:09:52PM +0200, David Sterba wrote: The size unit format is a longstanding annoyance. This patch is based on the work of Nils and Alexandre and enhances the options. It's possible to select raw bytes, SI-based or IEC-based compact units (human friendly) or a fixed base from kilobytes to terabytes. The default is compact human readable IEC-based, no change to current version. CC: Nils Steinger n...@voidptr.de CC: Alexandre Oliva ol...@gnu.org Signed-off-by: David Sterba dste...@suse.cz Looks good to me. One _tiny_ nit: For the kilo-/kibi- prefix, IEC is KiB (upper case), SI is kB (lower case). Other than that, the UI looks pretty comfortable to me. Reviewed-by: Hugo Mills h...@carfax.org.uk --- I tried to make the command line UI rich enough to address current and future needs, I'm open to tweaks, rewording etc. The patch is based on current snapshot of integration branch that will be the base of 3.17 release and contains the 'enhanced df' patches, branch dev/units. Documentation/btrfs-filesystem.txt | 25 - cmds-filesystem.c | 111 - utils.c| 48 utils.h| 30 +++--- 4 files changed, 168 insertions(+), 46 deletions(-) diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt index c9c0b006a0b0..7ac105ff350e 100644 --- a/Documentation/btrfs-filesystem.txt +++ b/Documentation/btrfs-filesystem.txt @@ -17,8 +17,31 @@ resizing, defragment. SUBCOMMAND -- -*df* path [path...]:: +*df* [options] path:: Show space usage information for a mount point.
++ +`Options` ++ +-b|--raw +raw numbers in bytes, without the 'B' suffix +-h +print human friendly numbers, base 1024, this is the default +-H +print human friendly numbers, base 1000 +--iec +select the 1024 base for the following options, according to the IEC standard +--si +select the 1000 base for the following options, according to the SI standard +-k|--kbytes +show sizes in KiB, or KB with --si +-m|--mbytes +show sizes in MiB, or MB with --si +-g|--gbytes +show sizes in GiB, or GB with --si +-t|--tbytes +show sizes in TiB, or TB with --si + +If conflicting options are passed, the last one takes precedence. *show* [--mounted|--all-devices|path|uuid|device|label]:: Show the btrfs filesystem with some additional info. diff --git a/cmds-filesystem.c b/cmds-filesystem.c index 89b897496256..68876957cbab 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -114,12 +114,21 @@ static const char * const filesystem_cmd_group_usage[] = { }; static const char * const cmd_filesystem_df_usage[] = { - btrfs filesystem df path, + btrfs filesystem df [options] path, Show space usage information for a mount point, + -b|--raw raw numbers in bytes, + -h human friendly numbers, base 1024 (default), + -H human friendly numbers, base 1000, + --iec use 1024 as a base (KiB, MiB, GiB, ...), + --si use 1000 as a base (KB, MB, GB, ...), + -k|--kbytesshow sizes in KiB, or KB with --si, + -m|--mbytesshow sizes in MiB, or MB with --si, + -g|--gbytesshow sizes in GiB, or GB with --si, + -t|--tbytesshow sizes in TiB, or TB with --si, NULL }; -static void print_df(struct btrfs_ioctl_space_args *sargs) +static void print_df(struct btrfs_ioctl_space_args *sargs, int unit_mode) { u64 i; struct btrfs_ioctl_space_info *sp = sargs->spaces; @@ -128,8 +137,8 @@ static void print_df(struct btrfs_ioctl_space_args *sargs) printf("%s, %s: total=%s, used=%s\n", group_type_str(sp->flags), group_profile_str(sp->flags), - pretty_size(sp->total_bytes), - pretty_size(sp->used_bytes)); +
pretty_size_mode(sp->total_bytes, unit_mode), + pretty_size_mode(sp->used_bytes, unit_mode)); } } @@ -183,33 +192,83 @@ static int get_df(int fd, struct btrfs_ioctl_space_args **sargs_ret) static int cmd_filesystem_df(int argc, char **argv) { - struct btrfs_ioctl_space_args *sargs = NULL; - int ret; - int fd; - char *path; - DIR *dirstream = NULL; + struct btrfs_ioctl_space_args *sargs = NULL; + int ret; + int fd; + char *path; + DIR *dirstream = NULL; + unsigned unit_mode = UNITS_DEFAULT; - if (check_argc_exact(argc, 2)) - usage(cmd_filesystem_df_usage); + optind = 1; + while (1) { + int long_index; + static const struct option long_options[] = { + { "raw", no_argument, NULL, 'b
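The kB-vs-KiB distinction behind Hugo's review nit can be checked with GNU coreutils numfmt, which implements both conventions. This is a sketch assuming a system with GNU numfmt available; it is not part of btrfs-progs.

```shell
#!/bin/sh
# IEC: K means 1024; SI: K means 1000 -- same letter, different base,
# which is why unambiguous labels (KiB vs kB) matter in UI output
iec=$(numfmt --from=iec 1K)
si=$(numfmt --from=si 1K)
echo "1K is $iec bytes under IEC, $si bytes under SI"
```

The patch's --iec/--si pair selects exactly this base for the fixed-unit options, so a value printed as 1.00KiB under --iec corresponds to 1.02KB under --si.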
Re: btrfs receive: could not find parent subvolume
On Thu, Sep 18, 2014 at 08:27:18AM -0700, Marc MERLIN wrote: While debugging a btrfs send/receive slow problem, I'm now getting this: legolas:/mnt/btrfs_pool1# btrfs send -p tmp_ggm_daily_ro.20140917_06:29:58 tmp_ggm_daily_ro.20140918_02:48:24 | ssh gargamel btrfs receive -v /mnt/btrfs_pool2/backup/debian64/legolas At subvol tmp_ggm_daily_ro.20140918_02:48:24 At snapshot tmp_ggm_daily_ro.20140918_02:48:24 receiving snapshot tmp_ggm_daily_ro.20140918_02:48:24 uuid=5d1f0454-1be3-b648-9ea5-dc427cd62d98, ctransid=310713 parent_uuid=d86e69bf-e17f-7f4c-bfb7-e571d5824687, parent_ctransid=308332 ERROR: could not find parent subvolume The parent is there on the other side, but UUID is different: gargamel:/mnt/btrfs_pool2/backup/debian64/legolas# btrfs subvolume show tmp_ggm_daily_ro.20140917_06:29:58 /mnt/btrfs_pool2/backup/debian64/legolas/tmp_ggm_daily_ro.20140917_06:29:58 Name: tmp_ggm_daily_ro.20140917_06:29:58 uuid: 3d424a2b-69da-244c-bfcc-c283f9cc1f34 Parent uuid:05d3b9be-bfe2-bb4a-9f6a-64b9d44896c7 Creation time: 2014-09-17 06:30:01 Object ID: 7873 Generation (Gen): 83621 Gen at creation:83476 Parent: 263 Top Level: 263 Flags: - Snapshot(s): Now it seems that the UUID is different on all my snapshots created by btrfs send, so maybe it doesn't match UUID? Given that, what is btrfs receive using to get a match? There's a received UUID field on each subvolume. I posted a patch to userspace a couple of weeks ago which adds a -R option to show it. I don't think it's filtered through David's backlog yet. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I will not be pushed, filed, stamped, indexed, briefed, --- debriefed or numbered. My life is my own. signature.asc Description: Digital signature
Re: lvm thin provisioning snapshots and btrfs
/dev/dm-2 [ 195.495648] btrfs: device fsid 44c76cc5-5d03-4f02-af5f-2028e61e09fa devid 1 transid 38 /dev/mapper/vg00_th-lv_root_140924 [ 1171.952393] btrfs: device fsid 44c76cc5-5d03-4f02-af5f-2028e61e09fa devid 1 transid 38 /dev/mapper/vg00_th-lv_root_140924 -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Welcome to Rivendell, Mr Anderson... --- signature.asc Description: Digital signature
Re: BTRFS backup questions
be open source, and that it will be more useful to me with community support. If anyone is interested in participating, or even just using it, please let me know. Thanks to everyone who has worked on BTRFS so far ;-) James -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- How do you become King? You stand in the marketplace and --- announce you're going to tax everyone. If you get out alive, you're King. signature.asc Description: Digital signature
Re: BTRFS backup questions
On Sat, Sep 27, 2014 at 06:33:58PM +0200, James Pharaoh wrote: On 27/09/14 18:17, Hugo Mills wrote: On Sat, Sep 27, 2014 at 05:39:07PM +0200, James Pharaoh wrote: 2. Duplicating NOCOW files This is obviously possible, since it takes place when you make a snapshot. So why can't I create a clone of a snapshot of a NOCOW file? I am hoping the answer to this is that it is possible but not implemented yet... Umm... you should be able to, I think. Well I've tried with the Haskell btrfs library, using clone, and also using cp --reflink=auto. Here's an example using cp: root@host:/btrfs# btrfs subvolume snapshot -r src dest Create a readonly snapshot of 'src' in './dest' root@host:/btrfs# cp --reflink dest/test test cp: failed to clone 'test' from 'dest/test': Invalid argument Are you trying to cross a mount-point with that? It works for me: hrm@amelia:/media/btrfs/amelia/test $ sudo btrfs sub create bar Create subvolume './bar' hrm@amelia:/media/btrfs/amelia/test $ sudo dd if=/dev/zero of=bar/data bs=1024 count=500 500+0 records in 500+0 records out 512000 bytes (512 kB) copied, 0.0047491 s, 108 MB/s hrm@amelia:/media/btrfs/amelia/test $ sudo btrfs sub snap -r bar foo Create a readonly snapshot of 'bar' in './foo' hrm@amelia:/media/btrfs/amelia/test $ sudo cp --reflink=always bar/data bar-data hrm@amelia:/media/btrfs/amelia/test $ sudo cp --reflink=always foo/data foo-data hrm@amelia:/media/btrfs/amelia/test $ ls -l total 1000 drwxr-xr-x 1 root root 8 Sep 27 17:55 bar -rw-r--r-- 1 root root 512000 Sep 27 17:57 bar-data drwxr-xr-x 1 root root 8 Sep 27 17:55 foo -rw-r--r-- 1 root root 512000 Sep 27 17:57 foo-data [snip] 3. Performance penalty of fragmentation on SSD systems with lots of memory There are two performance problems with fragmentation -- seek time to find the fragments (which affects only rotational media), and the amount of time taken to manage the fragments. As the number of fragments increases, so does the number of extents that the FS has to keep track of.
Ultimately, with very fragmented files, this will have an effect, as the metadata size will increase hugely. Ok so this sounds like the answer I wanted to hear ;-) Presumably so long as the load is not too great, and I run the occasional defrag, then this shouldn't be much to worry about then? Be aware that the current implementation of (manual) defrag will separate the shared extents, so you no longer get the deduplication effect. There was a snapshot-aware defrag implementation, but it caused filesystem corruption, and has been removed for now until a working version can be written. I think Josef was working on this. 4. Generations and tree structures I am planning to use lots more clever tricks which I think should be available in BTRFS, but I can't see much documentation. Can anyone point out any good examples or documentation of how to access the tree structures directly. I'm particularly interested in finding changed files and portions of files using the generations and the tree search. You need the TREE SEARCH ioctl -- that gives you direct access to all the internal trees of the FS. There's some documentation on the wiki about how these fit together: https://btrfs.wiki.kernel.org/index.php/Data_Structures https://btrfs.wiki.kernel.org/index.php/Trees What tricks are you thinking of, exactly? Principally I want to be able to detect exactly what has changed, so that I can perform backups very quickly. I want to be able to update a small portion of a large file and then identify exactly which parts changed and only back those up, for example. send/receive does this. [snip] Are you aware of btrfs send/receive? It should allow you to do all of this. The main part of the code then comes down to managing the send/receive, and all the distributed error handling. Then the only direct access to the internal metadata you need is being able to read UUIDs to work out what you have on each side -- which can also be done by btrfs sub list. 
Yes, this is one of my main inspirations. The problem is that I am pretty sure it won't handle deduplication of the data. It does. That's one of the things it's explicitly designed to do. I'm planning to have a LOT of containers running the same stuff, on fast (expensive) SSD media, and deduplication is essential to make that work properly. I can already see huge savings from this. As far as I can tell, btrfs send/receive operates on a subvolume basis, and any shared data between those subvolumes is duplicated if you copy them separately. Not so. You can tell send that there are subvolumes with known IDs on the receive side, using the -c option (arbitrarily many subvols). If the subvol you are sending (on the send side) shares extents with any of those, then the data is not sent -- just a reference to it. On the receive side, if that happens, the shared extents are reconstructed
Re: Fwd: Deleting a Subvol from a Cancelled Btrfs-Send
On Thu, Oct 02, 2014 at 12:05:39AM -0500, Justin Brown wrote: I'm experimenting with btrfs-send. Previously (2014-09-26), I did my first btrfs-send on a subvol, and that worked fine. Today, I tried to send a new snapshot. Unfortunately, I realized part way through that I forgot to specify the parent to only send a delta, and killed the send with ^C. On the destination, I'm left with: ~$ sudo btrfs subvol list /var/media/backups/venus/home/ ID 2820 gen 57717 top level 5 path media ID 2821 gen 57402 top level 5 path ovirt ID 4169 gen 57703 top level 2820 path media/backups/venus/home ID 4170 gen 57575 top level 4169 path home-2014-09-26 ID 4243 gen 57707 top level 4169 path home-2014-10-01 Home-2014-10-01 was the partial send that was cancelled. I figured that I could delete this partial subvol and try again. ~$ sudo btrfs subvol del home-2014-10-01 Transaction commit: none (default) ERROR: error accessing 'home-2014-10-01' If you're not doing this from /var/media/backups/venus/home/ it won't succeed. You need to specify (either via a relative path or an absolute one) where the subvol is, not just what its name is. (Consider what happens if you have two filesystems, each with a home-2014-09-26 subvol.) Hugo. Obviously, trying to delete the subvol directory fails too: ~$ sudo rm -rf /var/media/backups/venus/home/home-2014-10-01/ rm: cannot remove ‘/var/media/backups/venus/home/home-2014-10-01/’: Operation not permitted Is there any way to delete this partial subvol? Thanks, Justin -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- All hope abandon, Ye who press Enter here. --- signature.asc Description: Digital signature
Re: FIBMAP unsupported
On Thu, Oct 02, 2014 at 07:25:49PM +0200, David Sterba wrote: On Thu, Oct 02, 2014 at 05:13:22PM +0200, Marc Dietrich wrote: I have a large (25G) virtual disk on a btrfs fs. Yes, I know this is not optimal. So I try to defrag it from time to time. However, using btrfs fi defrag -c vm.vdi results in even more fragments than before (reported by filefrag). So I wrote my own pseudo defragger, Unfortunately the default target fragment size is 256k. Try 'btrfs filesystem defrag -t 32m ...' or higher numbers and see if it helps. Note also that a compressed file will have fragments on the scale of about 128k reported by filefrag, because of the way that the compression works. The file may actually be contiguous, but filefrag won't know about it. (At least, that's historically been the case. I don't know if filefrag has recently grown some extra knowledge of compressed extents.) Hugo. which produces much better results (ok, the file must not be in use). Somewhere in the 3.17 cycle the resulting image got corrupted using the script above. Running filefrag on it returns FIBMAP unsupported. This message does not mean it is a corruption, but filefrag tries to use the FIBMAP ioctl that is not implemented on btrfs, instead FIEMAP is used. filefrag on a nocow file works for me here (3.16.x kernel), I can see that filefrag on a directory prints the FIBMAP message. Virtualbox returns AHCI#0P0: Read at offset 606236672 (49152 bytes left) returned rc=VERR_DEV_IO_ERROR. No errors in the kernel log. Trying cp vm.vdi /dev/null returns: cp: Error reading „vm.vdi“: IO-Error This could be caused by the virtualization layer. Try to run scrub and fsck in the non-destru^Wchecking mode if it finds problems. As you're using compression and autodefrag, a quick skim of the 3.17 patches points to e9512d72e8e61c750c90efacd720abe3c4569822 fix autodefrag with compression, but that's just keyword match.
There's another report about nocow corruption and VirtualBox in a 3.16 + for-linus version (which is almost 3.17-rc) http://article.gmane.org/gmane.comp.file-systems.btrfs/38701/ But according to the attached messages, the underlying device is unreliable and logs a lot of IO errors. For now it looks like VirutalBox is not writing the data or there is a bug introduced post 3.16 killing nocow files. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- ... one ping(1) to rule them all, and in the --- darkness bind(2) them. signature.asc Description: Digital signature
Re: Identify mounted subvolume
On Tue, Oct 07, 2014 at 10:47:56AM +0200, Juan Orti Alcaine wrote: I cannot find the answer to this one. How can I determine which subvolume I have mounted in a certain path? I'm looking through /sys but no clue. Rumour has it that /proc/self/mountinfo is meant to have the information, but I've just checked, and it doesn't seem to have the subvol in it on my server (3.16.2). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Unix: For controlling fungal diseases in crops --- signature.asc Description: Digital signature
Re: What is the vision for btrfs fs repair?
On Thu, Oct 09, 2014 at 11:53:23AM +, Duncan wrote: Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as excerpted: Also, you should be running btrfs scrub regularly to correct bit-rot and force remapping of blocks with read errors. While BTRFS technically handles both transparently on reads, it only corrects things on disk when you do a scrub. AFAIK that isn't quite correct. Currently, the number of copies is limited to two, meaning if one of the two is bad, there's a 50% chance of btrfs reading the good one on first try. Scrub checks both copies, though. It's ordinary reads that don't. Hugo. If btrfs reads the good copy, it simply uses it. If btrfs reads the bad one, it checks the other one and assuming it's good, replaces the bad one with the good one both for the read (which otherwise errors out), and by overwriting the bad one. But here's the rub. The chances of detecting that bad block are relatively low in most cases. First, the system must try reading it for some reason, but even then, chances are 50% it'll pick the good one and won't even notice the bad one. Thus, while btrfs may randomly bump into a bad block and rewrite it with the good copy, scrub is the only way to systematically detect and (if there's a good copy) fix these checksum errors. It's not that btrfs doesn't do it if it finds them, it's that the chances of finding them are relatively low, unless you do a scrub, which systematically checks the entire filesystem (well, other than files marked nocsum, or nocow, which implies nocsum, or files written when mounted with nodatacow or nodatasum). At least that's the way it /should/ work. I guess it's possible that btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but if so, that's the first /I/ remember reading of it. Other than that detail, what you posted matches my knowledge and experience, such as it may be as a non-dev list regular, as well. -- === Hugo Mills: hugo@...
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 7: The Simple Truth --- signature.asc Description: Digital signature
Re: What is the vision for btrfs fs repair?
On Thu, Oct 09, 2014 at 08:07:51AM -0400, Austin S Hemmelgarn wrote: On 2014-10-09 07:53, Duncan wrote: Austin S Hemmelgarn posted on Thu, 09 Oct 2014 07:29:23 -0400 as excerpted: Also, you should be running btrfs scrub regularly to correct bit-rot and force remapping of blocks with read errors. While BTRFS technically handles both transparently on reads, it only corrects things on disk when you do a scrub. AFAIK that isn't quite correct. Currently, the number of copies is limited to two, meaning if one of the two is bad, there's a 50% chance of btrfs reading the good one on first try. If btrfs reads the good copy, it simply uses it. If btrfs reads the bad one, it checks the other one and assuming it's good, replaces the bad one with the good one both for the read (which otherwise errors out), and by overwriting the bad one. But here's the rub. The chances of detecting that bad block are relatively low in most cases. First, the system must try reading it for some reason, but even then, chances are 50% it'll pick the good one and won't even notice the bad one. Thus, while btrfs may randomly bump into a bad block and rewrite it with the good copy, scrub is the only way to systematically detect and (if there's a good copy) fix these checksum errors. It's not that btrfs doesn't do it if it finds them, it's that the chances of finding them are relatively low, unless you do a scrub, which systematically checks the entire filesystem (well, other than files marked nocsum, or nocow, which implies nocsum, or files written when mounted with nodatacow or nodatasum). At least that's the way it /should/ work. I guess it's possible that btrfs isn't doing those routine bump-into-it-and-fix-it fixes yet, but if so, that's the first /I/ remember reading of it. I'm not 100% certain, but I believe it doesn't actually fix things on disk when it detects an error during a read, I'm fairly sure it does, as I've had it happen to me.
:) I know it doesn't if the fs is mounted ro (even if the media is writable), because I did some testing to see how 'read-only' mounting a btrfs filesystem really is. If the FS is RO, then yes, it won't fix things. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great films about cricket: Interview with the Umpire --- signature.asc Description: Digital signature
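The 50%-per-read argument in this thread composes in an obvious way: under a simple model where each ordinary read picks one of the two mirrors uniformly at random (real btrfs read scheduling is not actually uniform, so this is only a back-of-envelope sketch), the chance of the bad copy going unnoticed halves with every read, while a single scrub pass checks both copies and always finds it.

```shell
#!/bin/sh
# P(bad copy still undetected) after n independent ordinary reads = 0.5^n
for n in 1 2 4 8; do
    awk -v n="$n" \
        'BEGIN { printf "after %d reads: %.4f chance undetected\n", n, 0.5^n }'
done
# a scrub checks both mirrors, so its miss probability is 0
```

The point is not that ordinary reads never repair anything (they do, when they happen to hit the bad copy), but that only scrub gives a systematic guarantee.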
Re: unexplainable corruptions 3.17.0
On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote: On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote: Recently I've observed some corruptions to systemd's journal files which are somewhat puzzling. This is especially worrying as this is a btrfs raid1 setup and I expected auto-healing. System details: 3.17.0-301.fc21.x86_64 btrfs: raid1 over 2x dm-crypted 6TB HDDs. mount opts: rw,relatime,seclabel,compress=lzo,space_cache Reads with cat or hexdump fail with: read(4, 0x1001000, 65536) = -1 EIO (Input/output error) Does scrub work for you? As there seems to be no way to scrub individual files, I've started a scrub of the full volume. It will take some hours to finish. Meanwhile, could you satisfy my curiosity: what would scrub do that wouldn't be done by just reading the whole file? It checks both copies. Reading the file will only read one of the copies of any given block (so if that's good and the other copy is bad, it won't fix anything). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- The future isn't what it used to be. --- signature.asc Description: Digital signature
Re: Uh, 1COW?... what happens when someone does this...
On Wed, Oct 22, 2014 at 12:41:10PM -0700, Robert White wrote: So I've been considering some NOCOW files (for VM disk images), but some questions arose. IS there a 1COW (copy on write only once) flag or are the following operations dangerous or undefined? (1) The page https://btrfs.wiki.kernel.org/index.php/FAQ (section "Can copy-on-write be turned off for data blocks?") says COW may still happen if a snapshot is taken. Is that a "may" or a "will", e.g. if I take a snapshot and then start the VM will the file in the snapshot still be frozen or will it update as I alter the VM? Does the read-only-or-not status of the snapshot matter in this outcome? e.g. what does "may" mean in that section? If you take a snapshot of something, then any write to that (the original or the copy) will cause it to be CoWed once. Subsequent writes to the same area of the same file will go back to nodatacow. (2) If you copy a file using cp --reflink and the destination is in a directory marked NOCOW, what happens? How about when the resultant file is modified in place? Same thing as above. (3) when using a whatever.qcow2 virtual machine image that does copy-on-write in the VM (such as QEMU) is it better, worse, or a no-op to have the NOCOW flag set on the file? All the advice on this matter I can find in Google seems to be "VM images bad, but will be addressed soon" and it's old enough that I don't know if soon has come to pass. It seems like there is a 1COW flag implicit somewhere. I wouldn't put it in those words, but yes, a single CoW operation occurs on writes to data with nodatacow set. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- There's a Martian war machine outside -- they want to talk --- to you about a cure for the common cold. signature.asc Description: Digital signature
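A practical detail worth adding: the NOCOW attribute only takes effect on files that receive it before they contain data, so it is usually set on the parent directory and inherited. A minimal sketch (paths are illustrative; on non-btrfs filesystems `chattr +C` may fail or be a silent no-op, hence the fallbacks):

```shell
# Hedged sketch: set NOCOW on a directory so new VM images inherit it.
d=$(mktemp -d)
chattr +C "$d" 2>/dev/null || echo "no +C support here -- illustrative only"
touch "$d/disk.img"                 # on btrfs, inherits NOCOW from the dir
lsattr -d "$d" 2>/dev/null || true  # on btrfs, 'C' appears in the flags
rm -rf "$d"
```

Setting `+C` on an already-written file does not retroactively un-CoW its existing extents.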
Re: NOCOW and Swap Files?
On Wed, Oct 22, 2014 at 01:08:48PM -0700, Robert White wrote: So the documentation is clear that you can't mount a swap file through BTRFS (unless you use a loop device). Why is a NOCOW file that has been fully pre-allocated -- as with fallocate(1) -- not suitable for swapping? I found one reference to an unimplemented feature necessary for swap, but wouldn't it be reasonable for that feature to exist for NOCOW files? (or does this relate to my previous questions about the COW operation that happens after a snapshot?) The original swap implementation worked by determining a list of blocks (well, I guess extents) using fiemap, and passing that to the swap code for it to use. This is fine, as long as (a) nobody else writes to the file, and (b) the blocks comprising the file don't move elsewhere. Part (a) can be done with normal permissions, so that's not a problem. Part (b) is more tricky -- not because of CoW (because the writes from the swap code go directly to the device, ignoring the FS), but because the FS's idea of where the file lives on the device can move -- balance will do this, for example. So you can't balance a filesystem with any active swapfiles on it. This is the main reason that swapfiles aren't allowed on btrfs, as far as I know. The new code is the swap-on-NFS infrastructure, which indirects swapfile accesses through the filesystem code. The reason you have to do that with NFS is because NFS doesn't expose a block device at all, so you can't get a list of blocks on an underlying device because there isn't one. Indirecting the accesses through the filesystem, however, allows us to side-step btrfs's problems with part (b) above, and in theory gives us swapfile capability. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 7: The Simple Truth --- signature.asc Description: Digital signature
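The loop-device workaround mentioned at the top can be sketched as follows. Because swap I/O then goes to the loop device, it passes through the filesystem rather than bypassing it, which sidesteps the block-moving problem (b). The steps are printed rather than executed here, since they need root and a mounted btrfs volume; the path and loop device name are illustrative.

```shell
# Hedged sketch of swap-via-loop-device on btrfs (not executed here).
swap_steps='
fallocate -l 4G /srv/swapfile         # preallocate on the btrfs volume
chmod 600 /srv/swapfile
losetup --find --show /srv/swapfile   # prints the allocated device, e.g. /dev/loop0
mkswap /dev/loop0
swapon /dev/loop0
'
printf "%s" "$swap_steps"
```

The indirection costs some performance, which is why native (fiemap-style) swapfiles are preferred on filesystems that can support them.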
Re: NOCOW and Swap Files?
On Wed, Oct 22, 2014 at 01:39:58PM -0700, Robert White wrote: On 10/22/2014 01:25 PM, Hugo Mills wrote: The new code is the swap-on-NFS infrastructure, which indirects swapfile accesses through the filesystem code. The reason you have to do that with NFS is because NFS doesn't expose a block device at all, so you can't get a list of blocks on an underlying device because there isn't one. Indirecting the accesses through the filesystem, however, allows us to side-step btrfs's problems with part (b) above, and in theory gives us swapfile capability. I was not even aware there was new code on the matter. Is there a guide or whatever to doing this? I didn't see any mention of it in the places Google led me. swap-on-NFS is still, I think, in a set of out of tree patches, and it's not gone anywhere near btrfs yet. It's just that once it does land in mainline, it would form the appropriate infrastructure to develop swapfile capability for btrfs. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 7: The Simple Truth --- signature.asc Description: Digital signature
Re: suspicious number of devices: 72057594037927936
On Mon, Oct 27, 2014 at 11:21:13AM -0700, Christian Kujau wrote: On Mon, 27 Oct 2014 at 16:35, David Sterba wrote: Yeah sorry, I sent the v2 too late, here's an incremental that applies on top of current 3.18-rc https://patchwork.kernel.org/patch/5160651/ Yup, that fixes it. Thank you! If it's needed: Tested-by: Christian Kujau li...@nerdbynature.de @Filipe: and thanks for warning me about 3.17 - I used 3.17.0 since it came out and compiled kernels on the btrfs partition and haven't had any issues. But it wasn't used very often, so whatever the serious issues were, I haven't experienced any. If you make read-only snapshots, there's a good chance of metadata corruption. It's fixed in 3.17.2. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Our so-called leaders speak/with words they try to jail ya/ --- They subjugate the meek/but it's the rhetoric of failure. signature.asc Description: Digital signature
Re: which subvolume is mounted?
On Fri, Oct 31, 2014 at 09:23:27AM -0700, Rich Turner wrote: let’s first assume the contents of /etc/fstab are either not used or invalid in mounting the subvolumes. given the following ‘df’ command, how do i know which subvolume of the btrfs filesystem on /dev/sda3 is mounted at each mount point (/, /var, /opt, /home)? i would have expected to see the mount option used to define the subvolume (subvolid or subvol option) in /proc/mounts. I already answered this on IRC, but just for the record (and to test whether my mail is working): it's in /proc/self/mountinfo. The fourth field of each line is the location that the mountpoint came from in the filesystem, which is what you're after. Hugo. # df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda3 6839296 1698564 4903212 26% / devtmpfs 501464 0 501464 0% /dev tmpfs 507316 0 507316 0% /dev/shm tmpfs 507316 6720 500596 2% /run tmpfs 507316 0 507316 0% /sys/fs/cgroup /dev/sda3 6839296 1698564 4903212 26% /var /dev/sda3 6839296 1698564 4903212 26% /opt /dev/sda3 6839296 1698564 4903212 26% /home /dev/sda1 517868 93040 424828 18% /boot # btrfs subvolume list -a --sort=+rootid / ID 257 gen 7800 top level 5 path FS_TREE/root ID 258 gen 4127 top level 5 path FS_TREE/home ID 259 gen 7801 top level 5 path FS_TREE/var ID 260 gen 7795 top level 5 path FS_TREE/opt # uname -a Linux turner11.storix 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version Btrfs v3.12 # btrfs fi show Label: rhel_turner11 uuid: cd3c0e50-d726-44e2-9bfa-19b11614136a Total devices 1 FS bytes used 1.62GiB devid 1 size 6.52GiB used 2.24GiB path /dev/sda3 Btrfs v3.12 # btrfs fi df / Data, single: total=1.98GiB, used=1.58GiB System, single: total=4.00MiB, used=16.00KiB Metadata, single: total=264.00MiB, used=36.03MiB -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Your problem is that you've got too much taste to be --- a web developer signature.asc Description: Digital signature
Re: which subvolume is mounted?
On Fri, Oct 31, 2014 at 09:23:27AM -0700, Rich Turner wrote: let’s first assume the contents of /etc/fstab are either not used or invalid in mounting the subvolumes. given the following ‘df’ command, how do i know which subvolume of the btrfs filesystem on /dev/sda3 is mounted at each mount point (/, /var, /opt, /home)? i would have expected to see the mount option used to define the subvolume (subvolid or subvol option) in /proc/mounts. It's in /proc/self/mountinfo -- look at the fourth field in the table, which is the subvolume within the FS. Completely non-obvious, I'm afraid, but at least it is there. (Apparently there were some problems with getting the information into /proc/mounts). Hugo. # df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda3 6839296 1698564 4903212 26% / devtmpfs 501464 0 501464 0% /dev tmpfs 507316 0 507316 0% /dev/shm tmpfs 507316 6720 500596 2% /run tmpfs 507316 0 507316 0% /sys/fs/cgroup /dev/sda3 6839296 1698564 4903212 26% /var /dev/sda3 6839296 1698564 4903212 26% /opt /dev/sda3 6839296 1698564 4903212 26% /home /dev/sda1 517868 93040 424828 18% /boot # btrfs subvolume list -a --sort=+rootid / ID 257 gen 7800 top level 5 path FS_TREE/root ID 258 gen 4127 top level 5 path FS_TREE/home ID 259 gen 7801 top level 5 path FS_TREE/var ID 260 gen 7795 top level 5 path FS_TREE/opt # uname -a Linux turner11.storix 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version Btrfs v3.12 # btrfs fi show Label: rhel_turner11 uuid: cd3c0e50-d726-44e2-9bfa-19b11614136a Total devices 1 FS bytes used 1.62GiB devid 1 size 6.52GiB used 2.24GiB path /dev/sda3 Btrfs v3.12 # btrfs fi df / Data, single: total=1.98GiB, used=1.58GiB System, single: total=4.00MiB, used=16.00KiB Metadata, single: total=264.00MiB, used=36.03MiB -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great films about cricket: Silly Point Break --- signature.asc Description: Digital signature
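The fourth-field advice above can be sketched with a few lines of awk. The here-document stands in for /proc/self/mountinfo; the fields before the '-' separator are at fixed positions, so the same command works on real input (the sample lines and subvolume paths are illustrative).

```shell
# Sketch: field 4 is the source path within the filesystem (the
# subvolume for btrfs mounts), field 5 is the mountpoint.
awk '{ print $5 " is mounted from subvolume " $4 }' <<'EOF'
96 0 0:33 /root / rw,relatime shared:1 - btrfs /dev/sda3 rw,subvol=/root
129 96 0:33 /home /home rw,relatime shared:70 - btrfs /dev/sda3 rw,subvol=/home
EOF
```

On a live system, the same one-liner over /proc/self/mountinfo answers the original question directly.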
Re: request for info on the list of parameters to tweak for PCIe SSDs
On Fri, Oct 31, 2014 at 03:56:15AM -0700, lakshmi_narayanan...@dell.com wrote: Hi, Could you kindly help us with the list of all the btrfs filesystem parameters that can be tweaked for the best performance of PCIe-based SSDs? It should detect the ssd option automatically, but it doesn't hurt to specify it explicitly. You can use the discard option, but I would only recommend it if the SSD supports queued TRIM -- the older unqueued TRIM can cause massive performance problems. You may want to use autodefrag, which has less of an effect than it would on a rotational disk, but will still help to keep the number of extents down. That will reduce the metadata overhead. That's about all there is for btrfs-specific options, I think. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- make bzImage, not war --- signature.asc Description: Digital signature
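Put together, the options discussed above might look like the following fstab fragment. Everything here is illustrative (device, mountpoint, and option choice), not a recommendation from the original reply.

```shell
# Hedged example /etc/fstab entry for an SSD-backed btrfs filesystem:
#
#   /dev/nvme0n1p2  /data  btrfs  noatime,ssd,autodefrag  0 0
#
# Add 'discard' only if the device handles queued TRIM well; otherwise
# a periodic 'fstrim /data' from cron or a timer is the safer choice.
```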
Re: Kernel 3.17.2 and RO snapshots
On Sun, Nov 02, 2014 at 11:42:58AM +0100, Swâmi Petaramesh wrote: Hi there, I'm a little lost with latest kernel issues, and would like to know if the data corruption with RO snapshots is fixed in 3.17.2, or not yet? Yes, it is. Also about this issue, I'd like to know if: - Data corruption occurs when creating RO snapshots, Yes. - Or if it can occur with just using a FS that already has RO snapshots? No. - Or if it can occur when deleting existing RO snapshots with 3.17? Not that I'm aware of. Which means: If I have mistakenly upgraded to 3.17.2 a system containing RO snapshots and cannot downgrade (big distro and drivers mess...), what is the safest way to go? Leave it as it is. :) And if data corruption occurs, should I expect it to affect snapshots only, or the whole system? I _think_ it only affects the snapshots, but could bring the system down if you access the broken data. But you're safe with 3.17.2, so it's a moot point for you right now. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- If it's December 1941 in Casablanca, what time is it --- in New York? signature.asc Description: Digital signature
Re: Compatibility matrix kernel/tools
On Wed, Nov 05, 2014 at 09:57:31PM +0100, Cyril Scetbon wrote: Hi, Where can I find the compatibility matrix to know which btrfs-tools version should work with a chosen linux kernel ? Any of them should work with any kernel. For normal operation, if the tools are too old, they may not support newer kernel features -- but that will simply mean you can't access the feature, not that anything will be broken. If you're doing recovery work (btrfs check and friends) then using the latest released version of the tools is strongly recommended. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Putting U back in Honor, Valor, and Trth --- signature.asc Description: Digital signature
Re: btrfs balance fails with no space errors (despite having plenty)
On Wed, Nov 12, 2014 at 06:48:47PM +, Kline, Matthew wrote: Yesterday I converted my ext4 root and home partitions on my home machine to btrfs using btrfs-convert. After confirming that everything went well, I followed the wiki instructions to nuke the 'ext2_saved' subvolume, then defragged and rebalanced. Everything went according to plan on my root partition, but my home partition claimed to have run out of space when rebalancing. I did some digging and tried the following to resolve the problem, but to no avail. So far I have: - Made sure I deleted the subvolume (a common cause of this problem). `sudo btrfs subvolume list -a /home` exits with no output. - Made sure I defragged the newly converted btrfs partition before attempting to rebalance it. - Made sure that I actually have space on the partition. It is only about 60% full - see sdb1 below: ~ % sudo btrfs fi show Label: none uuid: 3a154348-9bd4-4c3f-aaf8-e9446d3797db Total devices 1 FS bytes used 9.49GiB devid 1 size 87.47GiB used 11.03GiB path /dev/sda3 Label: none uuid: cc90ee50-bbda-46d6-a7e6-fe1c8578d75b Total devices 1 FS bytes used 124.98GiB devid 1 size 200.00GiB used 127.03GiB path /dev/sdb1 - Made sure I have metadata space (as suggested on the problem FAQ): ~ % sudo btrfs fi df /home Data, single: total=126.00GiB, used=124.58GiB System, single: total=32.00MiB, used=20.00KiB Metadata, single: total=1.00GiB, used=404.27MiB GlobalReserve, single: total=136.00MiB, used=0.00B - Ran partial rebalances using the `-dusage` flag (as suggested on the problem FAQ), which successfully balanced a handful of blocks. - Checked the system log - nothing interesting comes up. btrfs happily chugs along with "found xxx extents" and "relocating block group x flags 1" messages before unceremoniously ending with "7 enospc errors during balance". In spite of all of this, a full rebalance still fails when it's about 95% done. 
I'm at a complete loss as to what could be causing it - I know that it's not completely necessary (especially with a single drive), and `btrfs scrub` finds no errors with the file system, but the wiki gives the impression that it's a good idea after you convert from ext. I'm fairly sure it's a bug. We've had several reports of something like this, particularly with respect to converted filesystems. I know that failed balances with apparently plenty of space are reasonably high up on josef's list of things to investigate. I would recommend at minimum putting this report on bugzilla.kernel.org as well. The other thing you could do, which may help with debugging, is to take a copy of the metadata: $ btrfs-image -c9 -t4 /dev/sdb1 /path/to/image.img and hang on to it just in case josef (or whoever else looks at it) needs another sample to debug with. Is there something I'm missing? I don't think so. This is probably a bug -- I'd guess in the FS (because we've seen something similar on non-converted FSes), but maybe set up more easily by the conversion process. Hugo. Other obligatory info: I'm on Arch Linux using btrfs 3.17.1. uname -a is Linux kline-arch 3.17.2-1-ARCH #1 SMP PREEMPT Thu Oct 30 20:49:39 CET 2014 x86_64 GNU/Linux -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Strive for apathy! --- signature.asc Description: Digital signature
Two persistent problems
Chris, Josef, anyone else who's interested, On IRC, I've been seeing reports of two persistent unsolved problems. Neither is showing up very often, but both have turned up often enough to indicate that there's something specific going on worthy of investigation. One of them is definitely a btrfs problem. The other may be btrfs, or something in the block layer, or just broken hardware; it's hard to tell from where I sit. Problem 1: ENOSPC on balance This has been going on since about March this year. I can reasonably certainly recall 8-10 cases, possibly a number more. When running a balance, the operation fails with ENOSPC when there's plenty of space remaining unallocated. This happens on full balance, filtered balance, and device delete. Other than the ENOSPC on balance, the FS seems to work OK. It seems to be more prevalent on filesystems converted from ext*. The first few or more reports of this didn't make it to bugzilla, but a few of them since then have gone in. Problem 2: Unexplained zeroes Failure to mount. Transid failure, expected xyz, have 0. Chris looked at an early one of these (for Ke, on IRC) back in September (the 27th -- sadly, the public IRC logs aren't there for it, but I can supply a copy of the private log). He rapidly came to the conclusion that it was something bad going on with TRIM, replacing some blocks with zeroes. Since then, I've seen a bunch of these coming past on IRC. It seems to be a 3.17 thing. I can successfully predict the presence of an SSD and -odiscard from the have 0. I've successfully persuaded several people to put this into bugzilla and capture btrfs-images. btrfs recover doesn't generally seem to be helpful in recovering data. I think Josef had problem 1 in his sights, but I don't know if additional images or reports are helpful at this point. For problem 2, there's obviously something bad going on, but there's not much else to go on -- and the inability to recover data isn't good. 
For each of these, what more information should I be trying to collect from any future reporters? Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great films about cricket: Forrest Stump --- signature.asc Description: Digital signature
Re: Two persistent problems
On Fri, Nov 14, 2014 at 05:00:26PM -0500, Josef Bacik wrote: On 11/14/2014 04:51 PM, Hugo Mills wrote: Chris, Josef, anyone else who's interested, On IRC, I've been seeing reports of two persistent unsolved problems. Neither is showing up very often, but both have turned up often enough to indicate that there's something specific going on worthy of investigation. One of them is definitely a btrfs problem. The other may be btrfs, or something in the block layer, or just broken hardware; it's hard to tell from where I sit. Problem 1: ENOSPC on balance This has been going on since about March this year. I can reasonably certainly recall 8-10 cases, possibly a number more. When running a balance, the operation fails with ENOSPC when there's plenty of space remaining unallocated. This happens on full balance, filtered balance, and device delete. Other than the ENOSPC on balance, the FS seems to work OK. It seems to be more prevalent on filesystems converted from ext*. The first few or more reports of this didn't make it to bugzilla, but a few of them since then have gone in. Problem 2: Unexplained zeroes Failure to mount. Transid failure, expected xyz, have 0. Chris looked at an early one of these (for Ke, on IRC) back in September (the 27th -- sadly, the public IRC logs aren't there for it, but I can supply a copy of the private log). He rapidly came to the conclusion that it was something bad going on with TRIM, replacing some blocks with zeroes. Since then, I've seen a bunch of these coming past on IRC. It seems to be a 3.17 thing. I can successfully predict the presence of an SSD and -odiscard from the have 0. I've successfully persuaded several people to put this into bugzilla and capture btrfs-images. btrfs recover doesn't generally seem to be helpful in recovering data. I think Josef had problem 1 in his sights, but I don't know if additional images or reports are helpful at this point. 
For problem 2, there's obviously something bad going on, but there's not much else to go on -- and the inability to recover data isn't good. For each of these, what more information should I be trying to collect from any future reporters? So for #2 I've been looking at that the last two weeks. I'm always paranoid we're screwing up one of our data integrity sort of things, either not waiting on IO to complete properly or something like that. I've built a dm target to be as evil as possible and have been running it trying to make bad things happen. I got slightly sidetracked since my stress test exposed a bug in the tree log stuff and csums which I just fixed. Now that I've fixed that I'm going back to try and make the expected blah, have 0 type errors happen. I've searched the bugzilla archive and found the two reports that I know of (87061 and 87021); I couldn't see any others. I've requested more information on both -- nothing obviously in common, except SSD and (probably) discard. I tried to tag them both with trim for easy finding, but that seems to have been lost somewhere. I'll try that again when I get home this evening and have access to my password. As for the ENOSPC I keep meaning to look into it and I keep getting distracted with other more horrible things. Ideally I'd like to reproduce it myself, so more info on that front would be good, like do all reports use RAID/compression/some other odd set of features? Thanks for taking care of this stuff Hugo, #2 is the worst one and I'd like to be absolutely sure it's not our bug, once I'm happy we aren't I'll look at the balance thing. OK, good to know you're on both of these. I think the easy solution to reproduce the ENOSPC is to convert an ext4 filesystem. It doesn't seem to be a unique characteristic, but it is a frequent correlation. We had another one today, after an FS conversion -- I've asked them to attach a btrfs-image dump and the enospc_debug log to the bugzilla report. Hugo. 
-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- 2 + 2 = 5, for sufficiently large values of 2. --- signature.asc Description: Digital signature
Re: Two persistent problems
On Mon, Nov 17, 2014 at 11:59:48AM +0100, Konstantin wrote: Josef Bacik wrote on 14.11.2014 at 23:00: On 11/14/2014 04:51 PM, Hugo Mills wrote: [snip] Problem 2: Unexplained zeroes Failure to mount. Transid failure, expected xyz, have 0. Chris looked at an early one of these (for Ke, on IRC) back in September (the 27th -- sadly, the public IRC logs aren't there for it, but I can supply a copy of the private log). He rapidly came to the conclusion that it was something bad going on with TRIM, replacing some blocks with zeroes. Since then, I've seen a bunch of these coming past on IRC. It seems to be a 3.17 thing. I can successfully predict the presence of an SSD and -odiscard from the have 0. I've successfully persuaded several people to put this into bugzilla and capture btrfs-images. btrfs recover doesn't generally seem to be helpful in recovering data. [snip] So for #2 I've been looking at that the last two weeks. I'm always paranoid we're screwing up one of our data integrity sort of things, either not waiting on IO to complete properly or something like that. I've built a dm target to be as evil as possible and have been running it trying to make bad things happen. I got slightly side tracked since my stress test exposed a bug in the tree log stuff an csums which I just fixed. Now that I've fixed that I'm going back to try and make the expected blah, have 0 type errors happen. [snip] For #2, I had a strangely damaged BTRFS I reported a week or so ago which may have similar background. Dmesg gives: parent transid verify failed on 586239082496 wanted 13329746340512024838 found 588 BTRFS: open_ctree failed The thing is that btrfsck crashes when trying to check this. As nobody seemed to be interested I reformatted this disk today. Whilst that's a genuine problem, it's not specifically the one I was referring to here, which shows up with want=X, have=0 from btrfs check, and seems to be related to TRIM on SSDs. Hugo. -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Turning, pages turning in the widening bath, / The spine --- cannot bear the humidity. / Books fall apart; the binding cannot hold. / Page 129 is loosed upon the world. signature.asc Description: Digital signature
Re: btrfs filesystem show _exact_ freaking size?
On Tue, Nov 18, 2014 at 02:39:48AM -0800, Robert White wrote: Howdy, How does one get the exact size (in blocks preferably, but bytes okay) of the filesystem inside a partition? I know how to get the partition size, but that's not useful when shrinking a partition... So, for example, you successfully do btrfs filesystem resize -32G /dev/sdz2 now you've got some space but zero idea how many sectors can be trimmed off the end of the partition; you can do the math but that's a little iffy, especially if the file system didn't originally fill the partition to begin with. The current methodology for most such actions is to way over-trim the file system, then reallocate the space using your partition tool of choice, then re-grow the filesystem to fit. This has been the way of things forever and it blows... There needs to be an option to btrfs filesystem show that will tell you X blocks, not Y.ZZ terabytes. The 3.17 userspace tools should now support flags to select the display units in some detail, including bytes. Even without that, though, for your use case, I would recommend shrinking the FS by *more* than you wanted to shrink, resizing the partition, and then resizing the FS back up to fit the partition exactly (with btrfs fi resize n:max). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Stick them with the pointy end --- signature.asc Description: Digital signature
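The over-shrink / regrow approach suggested above can be sketched as follows. The steps are printed rather than executed here (they need root and a mounted btrfs filesystem); the sizes, devid, and mountpoint are illustrative.

```shell
# Hedged sketch of shrink-past-target, repartition, then regrow.
resize_steps='
btrfs filesystem resize -40G /mnt    # shrink by more than the 32G target
# ...shrink the partition by 32G with fdisk/parted/gdisk...
btrfs filesystem resize 1:max /mnt   # grow devid 1 back to fill the partition
'
printf "%s" "$resize_steps"
```

The final `1:max` step is what removes the need to know the exact post-shrink filesystem size.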
Re: btrfs send erroring...
On Thu, Nov 20, 2014 at 11:57:50AM -0500, Ken D'Ambrosio wrote: Hi! Trying to do a btrfs send, and failing with: root@khamul:~# btrfs send /biggie/BACKUP/ | btrfs receive /tmp/sdd1/ At subvol /biggie/BACKUP/ At subvol BACKUP ERROR: rename o2046806-17126-0 - volumes/ccdn-ch2-01 failed. No such file or directory This looks like one of several bugs that have been fixed recently. What kernel version and userspace tools version are you using? Hugo. Judging by disk capacity, it hits this about 40% of the way through. As my disk has subvolumes on it, which are underneath /biggie/BACKUP/, is there a different way I should go about sending an entire disk? Thanks! -Ken -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- ©1973 Unclear Research Ltd --- signature.asc Description: Digital signature
Re: Fixing Btrfs Filesystem Full Problems typo?
On Sun, Nov 23, 2014 at 12:26:38AM +0100, Patrik Lundquist wrote: On 22 November 2014 at 23:26, Marc MERLIN m...@merlins.org wrote: This one hurts my brain every time I think about it :) I'm new to Btrfs so I may very well be wrong, since I haven't really read up on it. :-) So, the bigger the -dusage number, the more work btrfs has to do. Agreed. -dusage=0 does almost nothing -dusage=100 effectively rebalances everything And -dusage=0 effectively reclaims empty chunks, right? But saying "less than 95% full" for -dusage=95 would mean rebalancing everything that isn't almost full. But isn't that what rebalance does? Rewriting chunks <=95% full to completely full chunks and effectively defragmenting chunks and most likely reducing the number of chunks. A -dusage=0 rebalance reduced my number of chunks from 1173 to 998 and dev_item.bytes_used went from 1593466421248 to 1491460947968. Now, just to be sure, if I'm getting this right, if your filesystem is 55% full, you could rebalance all blocks that have less than 55% space free, and use -dusage=55 I realize that I interpret the usage parameter as operating on blocks (chunks? are they the same in this case?) that are <=55% full while you interpret it as <=55% free. Which is correct? Less than or equal to 55% full. 0 gives you less than or equal to 0% full -- i.e. the empty block groups. 100 gives you less than or equal to 100% full, i.e. all block groups. A chunk is the part of a block group that lives on one device, so in RAID-1, every block group is precisely two chunks; in RAID-0, every block group is 2 or more chunks, up to the number of devices in the FS. A chunk is usually 1 GiB in size for data and 250 MiB for metadata, but can be smaller under some circumstances. Hugo. -- === Hugo Mills: hugo@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- And what rough beast, its hour come round at last / slouches --- towards Bethlehem, to be born? signature.asc Description: Digital signature
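The selection rule Hugo describes ("less than or equal to N% full") can be made concrete with a toy loop. The usage figures below stand in for real block groups; nothing here touches a filesystem.

```shell
# Sketch: -dusage=N rewrites block groups at most N% full.
N=55
for usage in 0 30 55 56 90 100; do
  if [ "$usage" -le "$N" ]; then
    echo "block group at ${usage}% full: rewritten by -dusage=$N"
  else
    echo "block group at ${usage}% full: left alone"
  fi
done
```

So `-dusage=0` collects only the completely empty block groups, and `-dusage=100` is a full data balance; the real invocation would be e.g. `btrfs balance start -dusage=55 /mnt` (illustrative mountpoint).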
Re: Best GIT repository(s) for preparing patches?
On Sat, Nov 22, 2014 at 04:14:35PM -0800, Robert White wrote: Which is the best GIT repository to clone for each of the kernel support and btrfs-progs, for preparing a patch to submit to this email list? For kernel, I would suggest using a repo with Linus's latest -rc tag (Linus's, for example :) ). For userspace, probably the latest -rc tag from kdave's repo. See https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Are you the man who rules the Universe? Well, I --- try not to. signature.asc Description: Digital signature
Re: [RFC PATCH] Btrfs: add sha256 checksum option
On Mon, Nov 24, 2014 at 03:07:45PM -0500, Chris Mason wrote: On Mon, Nov 24, 2014 at 12:23 AM, Liu Bo bo.li@oracle.com wrote: This brings a strong-but-slow checksum algorithm, sha256. Btrfs actually used sha256 in its early days, but then moved to crc32c for performance reasons. As crc32c is relatively weak due to hash collisions, we need a stronger algorithm as an alternative. Users can choose sha256 from mkfs.btrfs via $ mkfs.btrfs -C 256 /device Agree with others about -C 256...-C sha256 is only three letters more ;) What's the target for this mode? Are we trying to find evil people scribbling on the drive, or are we trying to find bad hardware? You're going to need a hell of a lot more infrastructure to deal with the first of those two cases. If someone can write arbitrary data to your storage without going through the filesystem, you've already lost the game. I don't know what the stats are like for random error detection (probably just what you'd expect in the naive case -- 1/2^n chance of failing to detect an error for an n-bit hash). More bits likely are better for that, but how much CPU time do you want to burn on it? I could see this possibly being useful for having fewer false positives when using the inbuilt checksums for purposes of dedup. Hugo. -- Hugo Mills | That's not rain, that's a lake with slots in it hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: 65E74AC0 | signature.asc Description: Digital signature
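To make the size trade-off concrete: crc32c yields a 32-bit digest, sha256 a 256-bit one. The userspace sha256sum tool serves here as a stand-in for the in-kernel implementation the patch would use:

```shell
#!/bin/sh
# A 256-bit digest is 64 hex characters; crc32c's 32-bit digest is
# only 8. sha256sum is just a userspace stand-in for the checksum
# that would be stored per checksummed block.
digest=$(printf 'some block contents' | sha256sum | cut -d' ' -f1)
echo "sha256 digest: $digest (${#digest} hex chars)"
```

Storing 32 bytes instead of 4 per checksummed block is the on-disk cost that goes with the lower (roughly 1/2^256 versus 1/2^32) naive chance of an undetected random error.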
Re: Moving contents from one subvol to another
On Sat, Nov 29, 2014 at 07:51:07PM +0530, Shriramana Sharma wrote: Hello. I am now taking the first steps toward making my backup external HDD in BtrFS. From http://askubuntu.com/questions/119014/btrfs-subvolumes-vs-folders I understand that the only difference between subvolumes and ordinary folders is that the former can be snapshotted and independently mounted. But I have a question. I have two subvols test1, test2. $ cd test1 $ dd if=/dev/urandom of=file bs=1M count=500 500+0 records in 500+0 records out 524288000 bytes (524 MB) copied, 36.2291 s, 14.5 MB/s $ time mv file ../test2/ real 0m2.061s user 0m0.013s sys 0m0.459s $ time { cp --reflink ../test2/file . ; rm ../test2/file ; } real 0m0.677s user 0m0.022s sys 0m0.086s $ mkdir foo $ time mv file foo/ real 0m0.096s user 0m0.008s sys 0m0.013s It seems that mv is not CoW aware and hence is not able to create reflinks so it is actually processing the entire file because it thinks test2 is a different device/filesystem/partition or such. Is this understanding correct? The latest version of mv should be able to use CoW copies to make it more efficient. It has a --reflink option, the same as cp. Note that you can't make reflinks crossing a mount boundary, but you can do so crossing a subvolume boundary (as you're doing here). So doing cp --reflink with rm is much faster. But it is still slower than doing mv within the same subvol. Is it because of the housekeeping with updating the metadata of the two subvols? I should think so, yes. Methinks --reflink option should be added to mv for the above usecase. Do people think this is useful? Why or why not? See above: it already has been. :) My concern is that if somebody wants to consolidate two subvols into one, though really only the metadata needs to be processed using ordinary mv isn't aware of this and using cp --reflink with rm is unnecessarily complicated, especially if it will involve multiple files. 
And it's not clear to me what it would entail to cp --reflink + rm an entire directory tree because IIUC I'd have to handle each file separately. Perhaps something (unnecessarily convoluted) like:

find . | while read f
do
    [ -d "$f" ] && mkdir "target/$f" && touch "target/$f" -r "$f"
    [ -f "$f" ] && cp -a --reflink "$f" target/ && rm "$f"
done

Again, what would happen to files which are not regular directories or files? Probably just the same thing that would happen without the --reflink=always. And why isn't --reflink given a single letter alias for cp? I don't know about that; you'll have to ask the coreutils developers. They're probably expecting it to be largely set to a single value by default (e.g. through a shell alias). Hugo. -- Hugo Mills | I will not be pushed, filed, stamped, indexed, hugo@... carfax.org.uk | briefed, debriefed or numbered. http://carfax.org.uk/ | My life is my own. PGP: 65E74AC0 | Number 6, The Prisoner signature.asc Description: Digital signature
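For what it's worth, the per-file loop may be avoidable: GNU cp can reflink a whole tree in one invocation, and with --reflink=auto it falls back to an ordinary copy on filesystems without reflink support. A sketch, assuming GNU coreutils (the /tmp/reflink-demo paths are made up for illustration):

```shell
#!/bin/sh
# Reflink-copy an entire directory tree, then remove the source.
# -a preserves ownership, timestamps and symlinks; --reflink=auto
# makes CoW clones where the filesystem supports them.
set -e
mkdir -p /tmp/reflink-demo/src
echo 'example data' > /tmp/reflink-demo/src/file
cp -a --reflink=auto /tmp/reflink-demo/src /tmp/reflink-demo/dst
rm -rf /tmp/reflink-demo/src
cat /tmp/reflink-demo/dst/file   # prints: example data
```

On btrfs, with --reflink=always instead of auto, this only touches metadata and fails cleanly rather than silently copying data if a reflink is impossible.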
Re: root subvol id is 0 or 5?
On Sun, Nov 30, 2014 at 09:01:37AM +0530, Shriramana Sharma wrote: I am confused with this: should I call it the root subvol or top-level subvol or default subvol or doesn't it matter? Are all subvols equal, or are some more equal than others [hark to Orwell's Animal Farm ;-)]? I try to use top level for subvolid=5. root subvol is hugely confusing, as it could be one of several things. If you mean the subvol mounted at /, then I call that / or the / subvol. default subvol is the one marked as default. This starts out as subvolid=5, but can be set to any other subvol. And more importantly, is the ID of the root subvol 0 or 5? In the data structures on disk, it's 5. The kernel aliases 0 to mean subvolid 5. The Oracle guide (https://docs.oracle.com/cd/E37670_01/E37355/html/ol_use_case3_btrfs.html) seems to say it's 0: By default, the operating system mounts the parent btrfs volume, which has an ID of 0 but the BtrFS wiki (and btrfs subvol manpage) reads 5: every btrfs filesystem has a default subvolume as its initially top-level subvolume, whose subvolume id is 5 (FS_TREE). As does the Ubuntu Wiki: The default subvolume to mount is always the top of the btrfs tree (subvolid=5). As above, both are correct here. Now this Oracle page http://www.oracle.com/technetwork/articles/servers-storage-admin/advanced-btrfs-1734952.html says: The only clean way to destroy the default subvolume is to rerun the mkfs.btrfs command, which would destroy existing data. OK, this is actually wrong. It's not the default subvolume if someone's run set-default on the FS. They're correct that you can't delete the top-level subvol. You can't delete the subvol marked as default, either. Assuming (or implying) that the two are the same is just plain wrong. 
So from what I've (confusedly) understood so far, 0 refers to the superstructure (or whatchamacallit) of the entire BtrFS-based contents of the device(s) and hence cannot be deleted but only reset by a mkfs.btrfs, but 5 is only the default subvol (mounted when the FS as a whole is mounted without subvol spec) provided by mkfs.btrfs, and subvol set-default can have another subvol mounted as default instead, after which 5 can actually be deleted? You can't delete subvolid=5. It's part of the fundamental whatchamacallit of the FS (a good name). Even if you change the default subvol, you still can't delete it. Hugo. -- Hugo Mills | People are too unreliable to be replaced by hugo@... carfax.org.uk | machines. http://carfax.org.uk/ | PGP: 65E74AC0 | Nathan Spring, Star Cops signature.asc Description: Digital signature
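The distinctions above can be seen from the command line. A sketch only, not runnable as-is: these commands assume an existing btrfs filesystem on a hypothetical /dev/sdx mounted at /mnt, and need root:

```shell
# List subvolumes; explicitly created subvols get IDs from 256
# upward. The top level (ID 5) is implicit and is not listed as an
# entry of itself.
btrfs subvolume list /mnt

# Mount the top level explicitly, regardless of what set-default
# has been pointed at. subvolid=0 is accepted as an alias for 5,
# matching Hugo's description of the kernel's aliasing.
mount -o subvolid=5 /dev/sdx /mnt

# Mark a subvolume (ID 256 here, hypothetical) as the default, so
# that a plain mount without subvol options lands there instead:
btrfs subvolume set-default 256 /mnt
```

After set-default, subvolid=5 is still there and still undeletable; only what a bare mount resolves to has changed.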