Re: Question of stability
On Sun, Sep 19, 2010 at 01:55:34AM +0200, Roy Sigurd Karlsbakk wrote:
> ----- Original Message -----
> > On Sat, Sep 18, 2010 at 11:37 PM, Roy Sigurd Karlsbakk <r...@karlsbakk.net> wrote:
> > > Hi all
> > >
> > > I've been on this list for a year or so, and I have been following
> > > progress for some more. Are there any chances of btrfs stabilizing,
> > > as in terms of usability in production? If so, how far are we from
> > > this?
> >
> > Hi,
> >
> > I am using btrfs as my root filesystem on my Debian squeeze machine
> > for a few months now and so far I haven't experienced any problems.
> > It seems quite stable for me. I am not using raid functions, but am
> > also very interested in the progress in raid5/6.
>
> I was more interested in large setups than a general install.
> Question remains, when is btrfs supposed to be stable, as in usable
> for large server setups?

   As has been pointed out by Anthony, there's no means of determining
when something is stable -- not just for filesystems, but for any
piece of software. All you can do is take a Bayesian approach: sum up
the number (and type) of failures, and compare it to the number of
user-hours that the software has been in use for, across all
installations.

   When that failure rate (and recovery rate) reaches the point at
which you're happy to use it in your situation -- whether that's on
your bleeding-edge desktop test box, or for running your robotic heart
surgeon -- you can call it stable. However, that point has to be your
decision for your particular use case.

   If you're now thinking, "but where do I get that information
from?", congratulations -- you now know nearly as much about the user
base as the btrfs developers. :)

   Your best bet is to keep an eye on this mailing list, and take a
look at the number and type of reported failures. When that drops to
the point that you feel safe, go ahead and use it.

   An alternative approach is to install a btrfs set-up on your
internal development or test machines (you *do* have a test
infrastructure for your mission critical systems, right?), and hammer
it with the closest you can get to a real workload, and see what
happens.

   Again, this is a statistical approach. It's the best we've got. At
some point, we(*) hope, btrfs will have millions upon millions of
users, doing all kinds of bad things to it, and tiny fractions of them
will have problems. When that happens, someone will probably start
calling it stable, and the name will stick. Until then, many people
are happy with it for their uses, but nobody can (or will) magically
stick a label on a piece of code of this complexity and say "it's
stable now!"

   Hugo.

(*) Speaking as an interested nobody, rather than a developer.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
              --- Be pure. Be vigilant. Behave. ---
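   The failures-per-user-hour idea above can be written down as a small
calculation. What follows is only a sketch under stated assumptions: it
models failures as a Poisson process with a Gamma prior, and every name
and number in it (posterior_failure_rate, the prior weights, the
threshold) is made up for illustration. Nothing here comes from btrfs or
from real failure data.

```c
#include <assert.h>

/*
 * Posterior mean of a failure rate (failures per user-hour), assuming a
 * Gamma(prior_failures, prior_hours) prior and Poisson-distributed
 * failures. The prior acts like "pretend" observations made before any
 * real reports came in.
 */
double posterior_failure_rate(double failures, double user_hours,
			      double prior_failures, double prior_hours)
{
	/* A prior with some weight keeps the denominator non-zero. */
	assert(prior_hours > 0.0 && prior_failures >= 0.0);
	assert(failures >= 0.0 && user_hours >= 0.0);

	return (prior_failures + failures) / (prior_hours + user_hours);
}

/* Non-zero if the estimated rate is within the caller's own tolerance. */
int stable_enough(double rate, double acceptable_rate)
{
	return rate <= acceptable_rate;
}
```

   The acceptable rate is the part only the reader can supply: a desktop
test box and a robotic heart surgeon would call stable_enough() with
very different thresholds on the same estimate.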
Some devices missing behaviour
   I've just encountered some odd behaviour with regard to removed
devices. Brief summary:

 - It's hard (in some sense) to tell a btrfs filesystem that a device
   has been removed permanently, and seems to require an
   unmount/remount, or resize to do so.
 - Removed devices break btrfs dev scan

   Details follow:

# mkfs.btrfs -d raid10 -m raid10 -L btest /dev/primary/btrtest{1,2,3,4,5}
# mount /dev/primary/btrtest1 /mnt
# sudo btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
	Total devices 5 FS bytes used 28.00KB
	devid    1 size 1.00GB used 276.00MB path /dev/dm-21
	devid    2 size 1.00GB used 136.00MB path /dev/dm-22
	devid    3 size 1.00GB used 136.00MB path /dev/dm-23
	devid    4 size 1.00GB used 264.00MB path /dev/dm-24
	devid    5 size 1.00GB used 264.00MB path /dev/dm-25

Btrfs v0.19-35-g1b444cd

   All well and good so far.

# btrfs dev del /dev/primary/btrtest4 /mnt
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
	Total devices 5 FS bytes used 100.13MB
	devid    1 size 1.00GB used 190.38MB path /dev/dm-21
	devid    2 size 1.00GB used 170.38MB path /dev/dm-22
	devid    3 size 1.00GB used 170.38MB path /dev/dm-23
	devid    5 size 1.00GB used 170.38MB path /dev/dm-25
	*** Some devices missing

Btrfs v0.19-35-g1b444cd

   Now, it's claiming that some devices are missing, but what if I
wanted to make this a permanent change? Say, the additional device was
one added temporarily to the array as part of a migration to new
hardware?

   On IRC, it was suggested that a rescan would fix it:

# btrfs dev scan
Scanning for Btrfs filesystems
failed to read /dev/sr0
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
	Total devices 5 FS bytes used 100.13MB
	devid    1 size 1.00GB used 190.38MB path /dev/dm-21
	devid    2 size 1.00GB used 170.38MB path /dev/dm-22
	devid    3 size 1.00GB used 170.38MB path /dev/dm-23
	devid    5 size 1.00GB used 170.38MB path /dev/dm-25
	*** Some devices missing

Btrfs v0.19-35-g1b444cd

   Nope. What about explicitly scanning the devices?

# btrfs dev scan /dev/primary/btrtest*
Scanning for Btrfs filesystems in '/dev/primary/btrtest1'
Scanning for Btrfs filesystems in '/dev/primary/btrtest2'
Scanning for Btrfs filesystems in '/dev/primary/btrtest3'
Scanning for Btrfs filesystems in '/dev/primary/btrtest4'
ERROR: unable to scan the device '/dev/primary/btrtest4'

   Note that it's stopped the scan immediately on encountering the
removed device, so btrtest5 hasn't been picked up. Maybe it's
something left in the device?

# dd if=/dev/zero of=/dev/primary/btrtest4
# btrfs dev scan /dev/primary/btrtest*
Scanning for Btrfs filesystems in '/dev/primary/btrtest1'
Scanning for Btrfs filesystems in '/dev/primary/btrtest2'
Scanning for Btrfs filesystems in '/dev/primary/btrtest3'
Scanning for Btrfs filesystems in '/dev/primary/btrtest4'
ERROR: unable to scan the device '/dev/primary/btrtest4'
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
	Total devices 5 FS bytes used 100.13MB
	devid    1 size 1.00GB used 190.38MB path /dev/dm-21
	devid    2 size 1.00GB used 170.38MB path /dev/dm-22
	devid    3 size 1.00GB used 170.38MB path /dev/dm-23
	devid    5 size 1.00GB used 170.38MB path /dev/dm-25
	*** Some devices missing

Btrfs v0.19-35-g1b444cd

   Zeroing the device has no effect.

   However, unmounting it does work, partially:

# umount /mnt
# btrfs dev scan
Scanning for Btrfs filesystems
failed to read /dev/sr0
# btrfs dev scan /dev/primary/btrtest*
Scanning for Btrfs filesystems in '/dev/primary/btrtest1'
Scanning for Btrfs filesystems in '/dev/primary/btrtest2'
Scanning for Btrfs filesystems in '/dev/primary/btrtest3'
Scanning for Btrfs filesystems in '/dev/primary/btrtest4'
ERROR: unable to scan the device '/dev/primary/btrtest4'
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
	Total devices 4 FS bytes used 100.13MB
	devid    1 size 1.00GB used 190.38MB path /dev/dm-21
	devid    2 size 1.00GB used 170.38MB path /dev/dm-22
	devid    3 size 1.00GB used 170.38MB path /dev/dm-23
	devid    5 size 1.00GB used 170.38MB path /dev/dm-25

Btrfs v0.19-35-g1b444cd

   So, you need to unmount/remount the FS to make it believe that the
device removal is permanent. (As an aside, I also found that resizing
down by 1M then back to max has the same effect, if you don't want to
unmount).

   However, the explicit scan of the block devices is still broken by
the removed device, even after all data on it has been zeroed.

   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from
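   The scan stopping at btrtest4 and never reaching btrtest5 is the
usual early-exit-on-error pattern. A minimal sketch of the alternative,
a loop that records the failure and carries on, might look like the
following. This is not the btrfs-progs code: scan_all_devices, the
scan_fn callback type and fake_scan are all hypothetical names invented
for the illustration, and fake_scan only simulates a failing device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device scan callback: 0 on success, non-zero on error. */
typedef int (*scan_fn)(const char *path);

/*
 * Scan every device in the list, continuing past failures.
 * Returns the number of devices that could not be scanned.
 */
int scan_all_devices(const char *const *paths, size_t count, scan_fn scan)
{
	int failures = 0;
	size_t i;

	assert(paths != NULL && scan != NULL);
	for (i = 0; i < count; i++) {
		if (scan(paths[i]) != 0)
			failures++;	/* note the error, but keep going */
	}
	return failures;
}

/* Demonstration stand-in for a real scan: counts calls, fails on btrtest4. */
static int fake_scan_calls;

int fake_scan(const char *path)
{
	fake_scan_calls++;
	return strcmp(path, "/dev/primary/btrtest4") == 0 ? -1 : 0;
}
```

   With the five test devices above, such a loop would report one
failure but still visit btrtest5; whether the real tool should behave
this way is a design question for the developers.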
Re: converting one-disk btrfs into RAID-1?
On Tue, Oct 12, 2010 at 11:34:31AM +0200, Tomasz Torcz wrote:
> On Tue, Oct 12, 2010 at 11:32:07AM +0200, David Brown wrote:
> > Is it possible to view the raid levels of data and meta data for an
> > existing btrfs filesystem? It's easy to pick them when creating the
> > system, but I couldn't find any way to view them afterwards.
>
>   btrfs f df will show them, except for few kernel releases when the
> ioctl() was broken.

   Umm...

h...@vlad:~ $ sudo btrfs fi df /mnt/
[sudo] password for hrm:
Data: total=303.01GB, used=302.16GB
Metadata: total=3.01GB, used=476.77MB
System: total=11.88MB, used=36.00KB

   This is the latest btrfs git kernel and tools. What should I be
seeing here?

   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
           --- emacs: Emacs Makes A Computer Slow. ---
Metadata size
   I'm a little concerned about the size of my metadata. I'm doing
raid10 on both data and metadata, and:

h...@vlad:mnt $ sudo btrfs fi df /mnt
Data: total=488.01GB, used=487.23GB
Metadata: total=3.01GB, used=677.73MB
System: total=11.88MB, used=52.00KB
h...@vlad:mnt $ find /mnt | wc -l
20137

   By my calculations, that's something on the order of 17.5K per
filesystem object. This is mostly media files, plus some small
metadata files. 17.5K on average seems very large to me. I have quite
a bit of space on this system, so I'm not too concerned, but I wasn't
sure if this kind of figure was representative or not.

   Overall file count by size:

0-10            21
10-100         153
100-1K         778
1K-10K         279
10K-100K        96
100K-1M        238
1M-10M       12556
10M-100M      3452
100M-1G        332
1G-10G         171

0-1K           952
1K-1M          613
1M-1G        16340
1G+            171

   Interestingly, the metadata value was closer to 15K/object until my
last batch of writing, which was the 171 1G+ files (and a few in the
100M-1G range), plus an equal number of small (2K) files.

   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Trouble rather the tiger in his lair than the sage amongst ---
        his books for to you kingdoms and their armies are mighty and
        enduring, but to him they are but toys of the moment to be
              overturned by the flicking of a finger.
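   The message does not say how the 17.5K figure was reached. One
reading that lands close to it is to divide the reported metadata usage
by the object count and then by two, on the assumption that the figure
covers two mirrored copies. That reading is a guess, not something the
message confirms, and the sketch below also assumes the "MB" label
means 2^20 bytes. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average metadata bytes per filesystem object.
 * copies: how many copies of the metadata the usage figure is assumed
 *         to include (1 = use the figure as reported).
 */
uint64_t metadata_bytes_per_object(uint64_t metadata_used_bytes,
				   uint64_t objects, unsigned copies)
{
	assert(objects > 0 && copies > 0);
	return metadata_used_bytes / (objects * copies);
}
```

   For example, 677.73 * 2^20 is roughly 710651412 bytes; calling
metadata_bytes_per_object(710651412, 20137, 2) gives a value in the
17-18 thousand range, while copies = 1 gives about twice that.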
Apologies
   I'm sorry about those last mails of mine. Clearly, nobody actually
uses quilt mail to send mails. Or at least has never documented clearly
how they do it.

   I shall test some more and try again.

   Irritated and embarrassed,
   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- emacs: Eighty Megabytes And Constantly Swapping. ---
Re: [patch 2/4] Add an option to show ISO, binary or raw bytes counts using df.
On Mon, Oct 18, 2010 at 09:21:56AM +0100, Frank Kingswood wrote:
> On 17/10/10 19:26, hugo-l...@carfax.org.uk wrote:
> > Change btrfs filesystem df to allow the user to control the scales
> > used for sizes in the output.
> >
> > Index: btrfs-progs-unstable/btrfs.c
> > ===================================================================
> > --- btrfs-progs-unstable.orig/btrfs.c	2010-10-17 18:43:57.0 +0100
> > +++ btrfs-progs-unstable/btrfs.c	2010-10-17 18:47:36.0 +0100
> > @@ -87,9 +87,10 @@
> >  		"Show the info of a btrfs filesystem. If no uuid or label\n"
> >  		"is passed, info of all the btrfs filesystem are shown."
> >  	},
> > -	{ do_df_filesystem, 1,
> > -	  "filesystem df", "path\n"
> > +	{ do_df_filesystem, -1,
> > +	  "filesystem df", "[-r|-b|-i] path\n"
> >  		"Show space usage information for a mount point\n."
> > +		"-r, -b, -i for raw (bytes), binary or ISO sizes."
> >  	},
>
> This seems to eat up the short option namespace a bit quickly.
> Fileutils uses different names as well, it may be convenient for
> users to match its names:
>   -h --human-readable   powers of 2**10
>   -H --si               powers of 1000

   Matching fileutils is probably a good idea. I'm happy to use -h and
-H.

> >  	{ do_balance, 1,
> >  	  "filesystem balance", "path\n"
> > Index: btrfs-progs-unstable/btrfs_cmds.c
> > ===================================================================
> > --- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-17 18:43:57.0 +0100
> > +++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-17 18:47:36.0 +0100
> > @@ -841,7 +841,36 @@
> >  	u64 count = 0, i;
> >  	int ret;
> >  	int fd;
> > -	char *path = argv[1];
> > +	char *path;
> > +	int format = PRETTY_SIZE_BINARY;
>
> Should the default not be to show sizes in bytes (RAW)?

   I was trying not to change the default behaviour at all, but with
-h/-H (and no switch for --raw), that would make sense.

   I'll re-roll the patches. (And update the man pages, as Goffredo
asked).

   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I don't like the look of it, I tell you. Well, stop ---
                         looking at it, then.
csum errors, and resizing...
   Just encountered an interesting issue. Rapid summary: when a resize
encounters a file with broken checksums, it stops, and will not
(apparently) proceed any further. Un/remount seems to clear the error
condition.

   I've got a filesystem with some (lots of) checksum errors on it. It
lives on a single partition. In trying to move all the data off this,
onto a btrfs raid10 filesystem, I've been moving data, and shrinking
the filesystem. The shrink process has now hit some of those csum
errors:

h...@vlad:~ $ sudo btrfs fi show -h
failed to read /dev/sr0
Label: none  uuid: fad2f415-979d-405e-9aa2-0c1011389273
	Total devices 1 FS bytes used 660.75GiB
	devid    1 size 675.40GiB used 1019.00GiB path /dev/dm-14
[...]

h...@vlad:~ $ sudo strace btrfs fi resize 708209608k /media/vlad/video
[...]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcd26974000
write(1, "Resize '/media/vlad/video' of '70"..., 43Resize '/media/vlad/video' of '708209608k'
) = 43
ioctl(3, 0x50009403, 0x7fffe9a8d140)    = -1 EIO (Input/output error)
close(3)                                = 0
write(2, "ERROR: unable to resize '/media/v"..., 44ERROR: unable to resize '/media/vlad/video'
) = 44
exit_group(30)                          = ?

   In syslog, I get a bunch of csum errors:

Oct 21 19:40:01 vlad kernel: new size for /dev/mapper/media-video is 725206638592
Oct 21 19:40:03 vlad kernel: btrfs: relocating block group 1090913304576 flags 1
Oct 21 19:40:05 vlad kernel: btrfs_readpage_end_io_hook: 4088 callbacks suppressed
Oct 21 19:40:05 vlad kernel: btrfs csum failed ino 257 off 131072 csum 752820288 private 2880127001
Oct 21 19:40:05 vlad kernel: btrfs csum failed ino 257 off 135168 csum 2112861244 private 3414608960
[and more]

   This is, I suppose, expected. However, it seems to put the
filesystem into a state where a resize cannot be attempted again:

h...@vlad:~ $ sudo strace btrfs fi resize 708209608k /media/vlad/video
[...]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f559f8af000
write(1, "Resize '/media/vlad/video' of '70"..., 43Resize '/media/vlad/video' of '708209608k'
) = 43
ioctl(3, 0x50009403, 0x7fff8dfda770)    = -1 EINVAL (Invalid argument)
close(3)                                = 0
write(2, "ERROR: unable to resize '/media/v"..., 44ERROR: unable to resize '/media/vlad/video'
) = 44
exit_group(30)                          = ?

   Unmounting and remounting it resets the resize state, and I end up
back in the first state again. Is this toggling of state intended?

   I'm on the git unstable kernel. Should I go up to 2.6.36 and try
again? The other thing I can think of to do is to delete some of the
files with bad checksums (I have backups) and see if I can get any
further with the resize.

   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- We teach people management skills by examining characters in ---
        Shakespeare. You could look at Claudius's crisis management
                       techniques, for example.
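   One detail can be checked from the logs alone: the requested size
708209608k and the kernel's "new size ... is 725206638592" agree if the
k suffix means 1024 bytes, since 708209608 * 1024 = 725206638592. The
sketch below reproduces that arithmetic with a small suffix parser. It
is a self-contained illustration, not the parser used by btrfs-progs or
the kernel; parse_size_arg is a hypothetical name, and only the k, m
and g suffixes are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a size such as "708209608k" into bytes, treating k/m/g as
 * binary multiples (2^10, 2^20, 2^30).
 * Returns 0 on success, -1 on malformed input or overflow.
 */
int parse_size_arg(const char *s, uint64_t *bytes)
{
	char *end;
	unsigned long long n;
	uint64_t mult = 1;

	assert(s != NULL && bytes != NULL);
	if (!isdigit((unsigned char)*s))
		return -1;	/* rejects empty strings, signs and spaces */

	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno != 0)
		return -1;

	switch (tolower((unsigned char)*end)) {
	case 'k': mult = 1ULL << 10; end++; break;
	case 'm': mult = 1ULL << 20; end++; break;
	case 'g': mult = 1ULL << 30; end++; break;
	case '\0': break;
	default: return -1;
	}
	if (*end != '\0')
		return -1;	/* trailing junk after the suffix */
	if (n > UINT64_MAX / mult)
		return -1;	/* multiplication would overflow */

	*bytes = (uint64_t)n * mult;
	return 0;
}
```

   Running this over the argument from the strace output gives the same
byte count the kernel logged, so the request itself was interpreted as
intended; the EIO and EINVAL results come from somewhere after that
point.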
[patch v2 0/4] Size reporting in userspace tools
   While playing around with resizing volumes recently, I realised
that I didn't know whether btrfs fi show and btrfs fi df reported
sizes in ISO (i.e. powers of 10^3) units, as they appear to from the
labels they use, or in binary (powers of 2^10) units. Also, a mere
three significant figures is somewhat less than I'm comfortable with
if I'm about to resize the containing block device downwards.

   This patch series adds the ability to pick which scale is used for
show and df, and labels the amounts properly (e.g. MB for ISO, MiB for
binary units).

   I've incorporated Frank's suggestion of defaulting to raw, and
matching coreutils' use of -h and -H. I've also updated the man pages
as requested by Goffredo.

   Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---
                      rest I wasted. -- James Hunt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
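   To make the difference between the three scales concrete, here is a
small standalone sketch of a size formatter. It is not the pretty_sizes
function from the patches: the name format_size, the enum values and
the caller-supplied buffer are all assumptions made for the
illustration. The two decimal places are just a choice made here too.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum size_format { SIZE_RAW, SIZE_BINARY, SIZE_SI };

/*
 * Format a byte count into buf as a raw number, in binary multiples
 * (KiB, MiB, ...) or in decimal multiples (kB, MB, ...).
 * Returns the value returned by snprintf.
 */
int format_size(uint64_t bytes, enum size_format fmt, char *buf, size_t len)
{
	static const char *const bin_units[] =
		{ "", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
	static const char *const si_units[] =
		{ "", "kB", "MB", "GB", "TB", "PB", "EB" };
	const char *const *units;
	double base, value;
	int unit = 0;

	assert(buf != NULL && len > 0);
	if (fmt == SIZE_RAW)
		return snprintf(buf, len, "%llu", (unsigned long long)bytes);

	base = (fmt == SIZE_BINARY) ? 1024.0 : 1000.0;
	units = (fmt == SIZE_BINARY) ? bin_units : si_units;
	value = (double)bytes;

	/* A 64-bit count never needs more than the exa- prefix (index 6). */
	while (value >= base && unit < 6) {
		value /= base;
		unit++;
	}
	return snprintf(buf, len, "%.2f%s", value, units[unit]);
}
```

   The same 1536 bytes prints as 1536 in raw mode, 1.50KiB in binary
mode and 1.54kB in decimal mode, which is the ambiguity the cover
letter describes when only an "MB" label is shown.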
[patch v2 2/4] Add an option to show ISO, binary or raw bytes counts using df.
Change btrfs filesystem df to allow the user to control the scales
used for sizes in the output.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>

---
 btrfs.c        |    6 +++---
 btrfs_cmds.c   |   42 --
 man/btrfs.8.in |    8
 3 files changed, 47 insertions(+), 9 deletions(-)

Index: btrfs-progs-unstable/btrfs.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs.c	2010-10-20 19:12:18.0 +0100
+++ btrfs-progs-unstable/btrfs.c	2010-10-20 19:48:34.0 +0100
@@ -87,9 +87,9 @@
 		"Show the info of a btrfs filesystem. If no uuid or label\n"
 		"is passed, info of all the btrfs filesystem are shown."
 	},
-	{ do_df_filesystem, 1,
-	  "filesystem df", "path\n"
-		"Show space usage information for a mount point\n."
+	{ do_df_filesystem, -1,
+	  "filesystem df", "[options] path\n"
+		"Show space usage information for a mount point."
 	},
 	{ do_balance, 1,
 	  "filesystem balance", "path\n"
Index: btrfs-progs-unstable/btrfs_cmds.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-20 19:19:20.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-20 19:58:48.0 +0100
@@ -14,7 +14,6 @@
  * Boston, MA 021110-1307, USA.
  */
 
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -28,6 +27,7 @@
 #include <limits.h>
 #include <uuid/uuid.h>
 #include <ctype.h>
+#include <getopt.h>
 
 #undef ULONG_MAX
 
@@ -835,13 +835,45 @@
 	return 0;
 }
 
+const struct option df_options[] = {
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_df_filesystem(int nargs, char **argv)
 {
 	struct btrfs_ioctl_space_args *sargs;
 	u64 count = 0, i;
 	int ret;
 	int fd;
-	char *path = argv[1];
+	char *path;
+	int format = PRETTY_SIZE_RAW;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(nargs, argv, "hH", df_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for df\n");
+			free(argv);
+			return 1;
+		}
+	}
+	if (nargs - optind != 1) {
+		fprintf(stderr, "No path given for df\n");
+		free(argv);
+		return 1;
+	}
+	path = argv[optind];
 
 	fd = open_file_or_dir(path);
 	if (fd < 0) {
@@ -914,10 +946,8 @@
 			written += 8;
 		}
 
-		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes,
-					   PRETTY_SIZE_RAW);
-		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes,
-					  PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes, format);
+		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes, format);
 		printf("%s: total=%s, used=%s\n", description, total_bytes,
 		       used_bytes);
 	}
Index: btrfs-progs-unstable/man/btrfs.8.in
===================================================================
--- btrfs-progs-unstable.orig/man/btrfs.8.in	2010-10-20 19:23:36.0 +0100
+++ btrfs-progs-unstable/man/btrfs.8.in	2010-10-20 19:28:14.0 +0100
@@ -21,6 +21,8 @@
 .PP
 \fBbtrfs\fP \fBfilesystem resize\fP\fI [+/\-]size[gkm]|max filesystem\fP
 .PP
+\fBbtrfs\fP \fBfilesystem df\fP\fI [options] path\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP
@@ -143,6 +145,12 @@
 passed, \fBbtrfs\fR show info of all the btrfs filesystem.
 
 .TP
+\fBfilesystem df\fR [options] path\fR
+Show the amount of space used on this filesystem, in bytes. Options:
+-h, --human-readable Use powers of 2^10 (1024) to report sizes.
+-H, --si Use powers of 10^3 (1000) to report sizes, in SI multiples.
+.TP
+
 \fBdevice balance\fR \fIpath\fR
 Balance the chunks of the filesystem identified by \fIpath\fR
 across the devices.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---
                      rest I wasted. -- James Hunt
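   The option handling in this patch follows a common getopt_long
pattern: reset optind, loop until getopt_long returns a negative value,
then treat whatever is left at argv[optind] as the positional argument.
Below is a standalone sketch of that pattern for readers who want to
try it outside btrfs-progs. The function name parse_format_options and
the FMT_* constants are hypothetical stand-ins, not names from the
patch, and the sketch returns an error code rather than printing a
message itself.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

enum { FMT_RAW, FMT_BINARY, FMT_SI };

static const struct option fmt_options[] = {
	{ "human-readable", no_argument, NULL, 'h' },
	{ "si",             no_argument, NULL, 'H' },
	{ NULL, 0, NULL, 0 }
};

/*
 * Parse -h/--human-readable and -H/--si from argv.
 * Stores the chosen format in *format (raw by default) and returns the
 * index of the first non-option argument, or -1 on an unknown option.
 */
int parse_format_options(int argc, char **argv, int *format)
{
	int c;

	assert(argv != NULL && format != NULL);
	*format = FMT_RAW;

	/* Restart scanning at argv[1], as the patch does for each subcommand. */
	optind = 1;
	while ((c = getopt_long(argc, argv, "hH", fmt_options, NULL)) != -1) {
		switch (c) {
		case 'h':
			*format = FMT_BINARY;
			break;
		case 'H':
			*format = FMT_SI;
			break;
		default:
			return -1;
		}
	}
	return optind;
}
```

   A caller in the style of df would then check that exactly one
argument remains (argc minus the returned index equals 1) before using
it as the path.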
[patch v2 4/4] Add an option to show ISO, binary or raw bytes counts using btrfs-show.
Change btrfs-show to allow the user to control the scales used for
sizes in the output.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>

---
 btrfs-show.c        |   27 +++
 man/btrfs-show.8.in |   10 --
 2 files changed, 27 insertions(+), 10 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs-show.c	2010-10-20 19:48:33.0 +0100
+++ btrfs-progs-unstable/btrfs-show.c	2010-10-20 20:18:11.0 +0100
@@ -52,7 +52,7 @@
 	return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
 	char uuidbuf[37];
 	struct list_head *cur;
@@ -69,8 +69,7 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used,
-					PRETTY_SIZE_RAW);
+	super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -82,8 +81,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(device->total_bytes, format);
+		bytes_used = pretty_sizes(device->bytes_used, format);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -99,13 +98,18 @@
 
 static void print_usage(void)
 {
-	fprintf(stderr, "usage: btrfs-show [search label or device]\n");
+	fprintf(stderr, "usage: btrfs-show [options] [search label or device]\n");
+	fprintf(stderr, "Options:\n");
+	fprintf(stderr, "\t-h, --human-readable\tShow sizes in powers of 2^10.\n");
+	fprintf(stderr, "\t-s, --si\t\tShow sizes in powers of 10^3 (SI multiples).\n");
 	fprintf(stderr, "%s\n", BTRFS_BUILD_VERSION);
 	exit(1);
 }
 
 static struct option long_options[] = {
 	/* { "byte-count", 1, NULL, 'b' }, */
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
 	{ 0, 0, 0, 0}
 };
 
@@ -117,14 +121,21 @@
 	char *search = NULL;
 	int ret;
 	int option_index = 0;
+	int format = PRETTY_SIZE_RAW;
 
 	while(1) {
 		int c;
-		c = getopt_long(ac, av, "", long_options,
+		c = getopt_long(ac, av, "hH", long_options,
 				&option_index);
 		if (c < 0)
 			break;
 		switch(c) {
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
 		default:
 			print_usage();
 		}
@@ -144,7 +155,7 @@
 					list);
 		if (search && uuid_search(fs_devices, search) == 0)
 			continue;
-		print_one_uuid(fs_devices);
+		print_one_uuid(fs_devices, format);
 	}
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	return 0;
Index: btrfs-progs-unstable/man/btrfs-show.8.in
===================================================================
--- btrfs-progs-unstable.orig/man/btrfs-show.8.in	2010-10-20 20:15:29.0 +0100
+++ btrfs-progs-unstable/man/btrfs-show.8.in	2010-10-20 20:17:30.0 +0100
@@ -2,13 +2,19 @@
 .SH NAME
 btrfs-show \- scan the /dev directory for btrfs partitions and print results.
 .SH SYNOPSIS
-.B btrfs-show
+.B btrfs-show [options]
 .SH DESCRIPTION
 .B btrfs-show
 is used to scan the /dev directory for btrfs partitions and display brief
 information such as lable, uuid, etc of each btrfs partition.
 .SH OPTIONS
-none
+.TP
+\fB\-h\fR, \fB\-\-human\-readable\fR
+Show values in multiples of 2^10.
+.TP
+\fB\-H\fR, \fB\-\-si\fR
+Show values in multiples of 10^3 (SI multiples).
+
 .SH AVAILABILITY
 .B btrfs-show
 is part of btrfs-progs. Btrfs is currently under heavy development,

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---
                      rest I wasted. -- James Hunt
[patch v2 3/4] Add an option to show ISO, binary or raw bytes counts using show.
Change btrfs filesystem show to allow the user to control the scales
used for sizes in the output.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>

---
 btrfs.c        |    2 +-
 btrfs_cmds.c   |   45 ++---
 man/btrfs.8.in |   10 ++
 3 files changed, 49 insertions(+), 8 deletions(-)

Index: btrfs-progs-unstable/btrfs.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs.c	2010-10-20 20:03:37.0 +0100
+++ btrfs-progs-unstable/btrfs.c	2010-10-20 20:11:03.0 +0100
@@ -83,7 +83,7 @@
 		"will occupe all available space on the device."
 	},
 	{ do_show_filesystem, 999,
-	  "filesystem show", "[uuid|label]\n"
+	  "filesystem show", "[options] [uuid|label]\n"
 		"Show the info of a btrfs filesystem. If no uuid or label\n"
 		"is passed, info of all the btrfs filesystem are shown."
 	},
Index: btrfs-progs-unstable/btrfs_cmds.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-20 20:03:37.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-20 20:08:00.0 +0100
@@ -617,7 +617,7 @@
 	return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
 	char uuidbuf[37];
 	struct list_head *cur;
@@ -634,8 +634,7 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used,
-					PRETTY_SIZE_RAW);
+	super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -647,8 +646,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(device->total_bytes, format);
+		bytes_used = pretty_sizes(device->bytes_used, format);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -662,13 +661,45 @@
 	printf("\n");
 }
 
+const struct option show_options[] = {
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_show_filesystem(int argc, char **argv)
 {
 	struct list_head *all_uuids;
 	struct btrfs_fs_devices *fs_devices;
 	struct list_head *cur_uuid;
-	char *search = argv[1];
+	char *search;
 	int ret;
+	int format = PRETTY_SIZE_RAW;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(argc, argv, "hH", show_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for show\n");
+			free(argv);
+			return 1;
+		}
+	}
+	if (argc - optind > 1) {
+		fprintf(stderr, "Too many arguments for show\n");
+		free(argv);
+		return 1;
+	}
+	search = argv[optind];
 
 	ret = btrfs_scan_one_dir("/dev", 0);
 	if (ret){
@@ -682,7 +713,7 @@
 					list);
 		if (search && uuid_search(fs_devices, search) == 0)
 			continue;
-		print_one_uuid(fs_devices);
+		print_one_uuid(fs_devices, format);
 	}
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	return 0;
Index: btrfs-progs-unstable/man/btrfs.8.in
===================================================================
--- btrfs-progs-unstable.orig/man/btrfs.8.in	2010-10-20 20:03:53.0 +0100
+++ btrfs-progs-unstable/man/btrfs.8.in	2010-10-20 20:08:15.0 +0100
@@ -23,6 +23,8 @@
 .PP
 \fBbtrfs\fP \fBfilesystem df\fP\fI [options] path\fP
 .PP
+\fBbtrfs\fP \fBfilesystem show\fP\fI [options] [uuid|label]\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP
@@ -151,6 +153,14 @@
 -H, --si Use powers of 10^3 (1000) to report sizes, in SI multiples.
 .TP
 
+\fBfilesystem show\fR [options] [uuid|label]\fR
+Show the usage of each device in the btrfs filesystem with the given
+uuid or label, or all
[patch v2 1/4] Update pretty-printer for different systems of counting multiples.
Make the pretty-printer for data sizes capable of printing in ISO
(powers of 10^3), binary (powers of 2^10) or raw (a simple byte count).

Signed-off-by: Hugo Mills <h...@carfax.org.uk>

---
 btrfs-show.c |    7 ---
 btrfs_cmds.c |   13 -
 mkfs.c       |    3 ++-
 utils.c      |   48 +---
 utils.h      |    7 ++-
 5 files changed, 53 insertions(+), 25 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs-show.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/btrfs-show.c	2010-10-20 19:20:02.0 +0100
@@ -69,7 +69,8 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used);
+	super_bytes_used = pretty_sizes(device->super_bytes_used,
+					PRETTY_SIZE_RAW);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -81,8 +82,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes);
-		bytes_used = pretty_sizes(device->bytes_used);
+		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
+		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
Index: btrfs-progs-unstable/btrfs_cmds.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-20 19:19:20.0 +0100
@@ -634,7 +634,8 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used);
+	super_bytes_used = pretty_sizes(device->super_bytes_used,
+					PRETTY_SIZE_RAW);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -646,8 +647,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes);
-		bytes_used = pretty_sizes(device->bytes_used);
+		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
+		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -913,8 +914,10 @@
 			written += 8;
 		}
 
-		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes);
-		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes);
+		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes,
+					   PRETTY_SIZE_RAW);
+		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes,
+					  PRETTY_SIZE_RAW);
 		printf("%s: total=%s, used=%s\n", description, total_bytes,
 		       used_bytes);
 	}
Index: btrfs-progs-unstable/mkfs.c
===================================================================
--- btrfs-progs-unstable.orig/mkfs.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/mkfs.c	2010-10-17 19:35:08.0 +0100
@@ -524,7 +524,8 @@
 	printf("fs created label %s on %s\n\tnodesize %u leafsize %u "
 	    "sectorsize %u size %s\n",
 	    label, first_file, nodesize, leafsize, sectorsize,
-	    pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+	    pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy),
+			 PRETTY_SIZE_BINARY));
 
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	btrfs_commit_transaction(trans, root);
Index: btrfs-progs-unstable/utils.c
===================================================================
--- btrfs-progs-unstable.orig/utils.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/utils.c	2010-10-17 19:35:08.0 +0100
@@ -966,30 +966,48 @@
 	return ret;
 }
 
-static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
+static char *bin_size_strs[] = { "", "KiB", "MiB", "GiB", "TiB",
+				    "PiB", "EiB", "ZiB", "YiB"};
+static char *iso_size_strs[] = { "", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
[patch v3 3/4] Add an option to show ISO, binary or raw bytes counts using show.
Change btrfs filesystem show to allow the user to control the scales
used for sizes in the output.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>

---
 btrfs.c        |    2 +-
 btrfs_cmds.c   |   45 ++---
 man/btrfs.8.in |   15 ++-
 3 files changed, 49 insertions(+), 13 deletions(-)

Index: btrfs-progs-unstable/btrfs.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs.c	2010-10-26 13:01:43.0 +0100
+++ btrfs-progs-unstable/btrfs.c	2010-10-26 13:02:40.814489740 +0100
@@ -83,7 +83,7 @@
 		"will occupe all available space on the device."
 	},
 	{ do_show_filesystem, 999,
-	  "filesystem show", "[uuid|label]\n"
+	  "filesystem show", "[-h|--human-readable|-H|--si] [uuid|label]\n"
 		"Show the info of a btrfs filesystem. If no uuid or label\n"
 		"is passed, info of all the btrfs filesystem are shown."
 	},
Index: btrfs-progs-unstable/btrfs_cmds.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-26 13:00:39.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-26 13:02:40.834488902 +0100
@@ -617,7 +617,7 @@
 	return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
 	char uuidbuf[37];
 	struct list_head *cur;
@@ -634,8 +634,7 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used,
-					PRETTY_SIZE_RAW);
+	super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -647,8 +646,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(device->total_bytes, format);
+		bytes_used = pretty_sizes(device->bytes_used, format);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -662,13 +661,45 @@
 	printf("\n");
 }
 
+const struct option show_options[] = {
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_show_filesystem(int argc, char **argv)
 {
 	struct list_head *all_uuids;
 	struct btrfs_fs_devices *fs_devices;
 	struct list_head *cur_uuid;
-	char *search = argv[1];
+	char *search;
 	int ret;
+	int format = PRETTY_SIZE_RAW;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(argc, argv, "hH", show_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for show\n");
+			free(argv);
+			return 1;
+		}
+	}
+	if (argc - optind > 1) {
+		fprintf(stderr, "Too many arguments for show\n");
+		free(argv);
+		return 1;
+	}
+	search = argv[optind];
 
 	ret = btrfs_scan_one_dir("/dev", 0);
 	if (ret){
@@ -682,7 +713,7 @@
 					list);
 		if (search && uuid_search(fs_devices, search) == 0)
 			continue;
-		print_one_uuid(fs_devices);
+		print_one_uuid(fs_devices, format);
 	}
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	return 0;
Index: btrfs-progs-unstable/man/btrfs.8.in
===================================================================
--- btrfs-progs-unstable.orig/man/btrfs.8.in	2010-10-26 13:01:27.0 +0100
+++ btrfs-progs-unstable/man/btrfs.8.in	2010-10-26 13:03:43.941854637 +0100
@@ -23,6 +23,8 @@
 .PP
 \fBbtrfs\fP \fBfilesystem df\fP\fI [-h|-H|--human-readable|--si] path\fP
 .PP
+\fBbtrfs\fP \fBfilesystem show\fP\fI [-h|--human-readable|-H|--si] [uuid|label]\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP
@@ -140,16 +142,19 @@
 partition after reducing the size of the filesystem.
 
 .TP
-\fBfilesystem show\fR [uuid|label]\fR
-Show the btrfs filesystem with some additional
[patch v3 1/4] Update pretty-printer for different systems of counting multiples.
Make the pretty-printer for data sizes capable of printing in ISO (powers of 10^3), binary (powers of 2^10) or raw (a simple byte count). Signed-off-by: Hugo Mills h...@carfax.org.uk --- btrfs-show.c |7 --- btrfs_cmds.c | 13 - mkfs.c |3 ++- utils.c | 48 +--- utils.h |7 ++- 5 files changed, 53 insertions(+), 25 deletions(-) Index: btrfs-progs-unstable/btrfs-show.c === --- btrfs-progs-unstable.orig/btrfs-show.c 2010-10-09 15:39:09.0 +0100 +++ btrfs-progs-unstable/btrfs-show.c 2010-10-20 19:20:02.0 +0100 @@ -69,7 +69,8 @@ else printf(Label: none ); - super_bytes_used = pretty_sizes(device-super_bytes_used); + super_bytes_used = pretty_sizes(device-super_bytes_used, + PRETTY_SIZE_RAW); total = device-total_devs; printf( uuid: %s\n\tTotal devices %llu FS bytes used %s\n, uuidbuf, @@ -81,8 +82,8 @@ char *total_bytes; char *bytes_used; device = list_entry(cur, struct btrfs_device, dev_list); - total_bytes = pretty_sizes(device-total_bytes); - bytes_used = pretty_sizes(device-bytes_used); + total_bytes = pretty_sizes(device-total_bytes, PRETTY_SIZE_RAW); + bytes_used = pretty_sizes(device-bytes_used, PRETTY_SIZE_RAW); printf(\tdevid %4llu size %s used %s path %s\n, (unsigned long long)device-devid, total_bytes, bytes_used, device-name); Index: btrfs-progs-unstable/btrfs_cmds.c === --- btrfs-progs-unstable.orig/btrfs_cmds.c 2010-10-09 15:39:09.0 +0100 +++ btrfs-progs-unstable/btrfs_cmds.c 2010-10-20 19:19:20.0 +0100 @@ -634,7 +634,8 @@ else printf(Label: none ); - super_bytes_used = pretty_sizes(device-super_bytes_used); + super_bytes_used = pretty_sizes(device-super_bytes_used, + PRETTY_SIZE_RAW); total = device-total_devs; printf( uuid: %s\n\tTotal devices %llu FS bytes used %s\n, uuidbuf, @@ -646,8 +647,8 @@ char *total_bytes; char *bytes_used; device = list_entry(cur, struct btrfs_device, dev_list); - total_bytes = pretty_sizes(device-total_bytes); - bytes_used = pretty_sizes(device-bytes_used); + total_bytes = pretty_sizes(device-total_bytes, PRETTY_SIZE_RAW); + 
bytes_used = pretty_sizes(device-bytes_used, PRETTY_SIZE_RAW); printf(\tdevid %4llu size %s used %s path %s\n, (unsigned long long)device-devid, total_bytes, bytes_used, device-name); @@ -913,8 +914,10 @@ written += 8; } - total_bytes = pretty_sizes(sargs-spaces[i].total_bytes); - used_bytes = pretty_sizes(sargs-spaces[i].used_bytes); + total_bytes = pretty_sizes(sargs-spaces[i].total_bytes, + PRETTY_SIZE_RAW); + used_bytes = pretty_sizes(sargs-spaces[i].used_bytes, + PRETTY_SIZE_RAW); printf(%s: total=%s, used=%s\n, description, total_bytes, used_bytes); } Index: btrfs-progs-unstable/mkfs.c === --- btrfs-progs-unstable.orig/mkfs.c2010-10-09 15:39:09.0 +0100 +++ btrfs-progs-unstable/mkfs.c 2010-10-17 19:35:08.0 +0100 @@ -524,7 +524,8 @@ printf(fs created label %s on %s\n\tnodesize %u leafsize %u sectorsize %u size %s\n, label, first_file, nodesize, leafsize, sectorsize, - pretty_sizes(btrfs_super_total_bytes(root-fs_info-super_copy))); + pretty_sizes(btrfs_super_total_bytes(root-fs_info-super_copy), + PRETTY_SIZE_BINARY)); printf(%s\n, BTRFS_BUILD_VERSION); btrfs_commit_transaction(trans, root); Index: btrfs-progs-unstable/utils.c === --- btrfs-progs-unstable.orig/utils.c 2010-10-09 15:39:09.0 +0100 +++ btrfs-progs-unstable/utils.c2010-10-17 19:35:08.0 +0100 @@ -966,30 +966,48 @@ return ret; } -static char *size_strs[] = { , KB, MB, GB, TB, +static char *bin_size_strs[] = { , KiB, MiB, GiB, TiB, + PiB, EiB, ZiB, YiB}; +static char *iso_size_strs[] = { , kB, MB, GB, TB, PB, EB, ZB, YB
[patch v3 0/4] Size reporting of btrfs tool
While playing around with resizing volumes recently, I realised that I didn't know whether "btrfs fi show" and "btrfs fi df" reported sizes in ISO (i.e. powers of 10^3) units, as they appear to from the labels they use, or in binary (powers of 2^10) units. Also, a mere three significant figures is somewhat less than I'm comfortable with if I'm about to resize the containing block device downwards.

This patch series adds the ability to pick which scale is used for show and df, and labels the amounts properly (e.g. MB for ISO, MiB for binary units). I've incorporated Frank's suggestion of defaulting to raw, and matching coreutils' use of -h and -H. I've also updated the man pages and command help as requested by Goffredo.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Jazz is the sort of music where no-one plays anything the same way once. ---
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
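To make the three scales concrete, here is a minimal sketch in C of a formatter that prints raw, binary or ISO sizes. It is not the pretty_sizes() from the patches: the names format_size and size_mode, the caller-supplied buffer and the two-decimal output are all assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Hypothetical names; the patches use PRETTY_SIZE_RAW/BINARY/ISO. */
enum size_mode { SIZE_RAW, SIZE_BINARY, SIZE_ISO };

static const char *const bin_suffix[] = { "", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
static const char *const iso_suffix[] = { "", "kB", "MB", "GB", "TB", "PB", "EB" };

/* Format `size` bytes into buf (of length len) and return buf. */
static char *format_size(uint64_t size, enum size_mode mode, char *buf, size_t len)
{
	const char *const *suffix = (mode == SIZE_ISO) ? iso_suffix : bin_suffix;
	uint64_t base = (mode == SIZE_ISO) ? 1000 : 1024;
	uint64_t unit = 1;
	int idx = 0;

	/* Raw mode, or anything below one multiple: plain byte count. */
	if (mode == SIZE_RAW || size < base) {
		snprintf(buf, len, "%" PRIu64, size);
		return buf;
	}
	/* Pick the largest unit not exceeding size; a u64 stops at EiB/EB. */
	while (idx < 6 && size / unit >= base) {
		unit *= base;
		idx++;
	}
	snprintf(buf, len, "%.2f%s", (double)size / (double)unit, suffix[idx]);
	return buf;
}
```

Under these assumptions, 1500000 bytes prints as 1500000 (raw), 1.43MiB (binary) or 1.50MB (ISO), which is the kind of gap that matters when shrinking a block device underneath a filesystem.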
[patch v3 4/4] Add an option to show ISO, binary or raw bytes counts using btrfs-show.
Change btrfs-show to allow the user to control the scales used for
sizes in the output.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs-show.c        |   27 +++++++++++++++++++--------
 man/btrfs-show.8.in |   10 ++++++++--
 2 files changed, 27 insertions(+), 10 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs-show.c	2010-10-26 12:56:54.179226836 +0100
+++ btrfs-progs-unstable/btrfs-show.c	2010-10-26 13:05:48.626702902 +0100
@@ -52,7 +52,7 @@
 	return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
 	char uuidbuf[37];
 	struct list_head *cur;
@@ -69,8 +69,7 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used,
-					PRETTY_SIZE_RAW);
+	super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -82,8 +81,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(device->total_bytes, format);
+		bytes_used = pretty_sizes(device->bytes_used, format);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -99,13 +98,18 @@
 
 static void print_usage(void)
 {
-	fprintf(stderr, "usage: btrfs-show [search label or device]\n");
+	fprintf(stderr, "usage: btrfs-show [options] [search label or device]\n");
+	fprintf(stderr, "Options:\n");
+	fprintf(stderr, "\t-h, --human-readable\tShow sizes in powers of 2^10.\n");
+	fprintf(stderr, "\t-H, --si\t\tShow sizes in powers of 10^3 (SI multiples).\n");
 	fprintf(stderr, "%s\n", BTRFS_BUILD_VERSION);
 	exit(1);
 }
 
 static struct option long_options[] = {
 	/* { "byte-count", 1, NULL, 'b' }, */
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
 	{ 0, 0, 0, 0}
 };
 
@@ -117,14 +121,21 @@
 	char *search = NULL;
 	int ret;
 	int option_index = 0;
+	int format = PRETTY_SIZE_RAW;
 
 	while(1) {
 		int c;
-		c = getopt_long(ac, av, "", long_options,
+		c = getopt_long(ac, av, "hH", long_options,
 				&option_index);
 		if (c < 0)
 			break;
 		switch(c) {
+			case 'H':
+				format = PRETTY_SIZE_ISO;
+				break;
+			case 'h':
+				format = PRETTY_SIZE_BINARY;
+				break;
 			default:
 				print_usage();
 		}
@@ -144,7 +155,7 @@
 					list);
 		if (search && uuid_search(fs_devices, search) == 0)
 			continue;
-		print_one_uuid(fs_devices);
+		print_one_uuid(fs_devices, format);
 	}
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	return 0;
Index: btrfs-progs-unstable/man/btrfs-show.8.in
===================================================================
--- btrfs-progs-unstable.orig/man/btrfs-show.8.in	2010-10-26 12:56:54.189226427 +0100
+++ btrfs-progs-unstable/man/btrfs-show.8.in	2010-10-26 13:06:51.074147050 +0100
@@ -2,13 +2,19 @@
 .SH NAME
 btrfs-show \- scan the /dev directory for btrfs partitions and print results.
 .SH SYNOPSIS
-.B btrfs-show
+.B btrfs-show [-h|-H|--human-readable|--si]
 .SH DESCRIPTION
 .B btrfs-show
 is used to scan the /dev directory for btrfs partitions and display brief
 information such as lable, uuid, etc of each btrfs partition.
 .SH OPTIONS
-none
+.TP
+\fB\-h\fR, \fB\-\-human\-readable\fR
+Show values in multiples of 2^10.
+.TP
+\fB\-H\fR, \fB\-\-si\fR
+Show values in multiples of 10^3 (SI multiples).
+
 .SH AVAILABILITY
 .B btrfs-show
 is part of btrfs-progs. Btrfs is currently under heavy development,
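The option handling in this patch follows the usual getopt_long() pattern. As a standalone sketch (the names parse_scale, scale and scale_options are assumptions for illustration, not identifiers from btrfs-progs), it could look roughly like this:

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-ins for PRETTY_SIZE_RAW/BINARY/ISO. */
enum scale { SCALE_RAW, SCALE_BINARY, SCALE_ISO };

static const struct option scale_options[] = {
	{ "human-readable", no_argument, NULL, 'h' },
	{ "si",             no_argument, NULL, 'H' },
	{ NULL, 0, NULL, 0 }
};

/*
 * Parse -h/--human-readable and -H/--si.
 * Returns 0 and fills *scale and *first_arg (index of the first
 * non-option argument), or -1 on an unknown option.
 */
static int parse_scale(int argc, char **argv, enum scale *scale, int *first_arg)
{
	int c;

	*scale = SCALE_RAW;	/* raw byte counts are the default */
	optind = 1;		/* restart scanning at the first argument */
	opterr = 0;		/* keep getopt quiet; the caller reports errors */

	while ((c = getopt_long(argc, argv, "hH", scale_options, NULL)) != -1) {
		switch (c) {
		case 'h':
			*scale = SCALE_BINARY;
			break;
		case 'H':
			*scale = SCALE_ISO;
			break;
		default:
			return -1;
		}
	}
	*first_arg = optind;
	return 0;
}
```

Returning an error code instead of calling exit() from inside the parser is a design choice of this sketch only; it keeps the function easy to call from a larger command dispatcher.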
[patch 0/2] Control filesystem balances (kernel side)
These two patches give a degree of control over balance operations. The first makes it possible to get an idea of how much work remains to do, by tracking the number of block groups (chunks) that need to be moved/rewritten. The second patch allows a running balance operation to be cancelled when the current block group has been moved.

One fundamental question, though -- is the progress monitor function best implemented as an ioctl, as I've done here, or should it be two or three sysfs files? I'm thinking of /proc/mdstat... Obviously, /proc/mdstat would never get into /sys, but exposing the expected and remaining values as files has an attractive simplicity to it.

The user-space side of things is in a separate patch series, to follow.

Please be gentle with me, this is my first (serious, non-trivial) kernel patch. :)

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- No! My collection of rare, incurable diseases! Violated! ---
[patch 1/2] Balance progress monitoring.
This patch introduces a basic form of progress monitoring for balance operations, by counting the number of block groups remaining. The information is exposed to userspace by an ioctl. Signed-off-by: Hugo Mills h...@carfax.org.uk --- fs/btrfs/ctree.h |9 fs/btrfs/disk-io.c |2 + fs/btrfs/ioctl.c | 34 fs/btrfs/ioctl.h |7 ++ fs/btrfs/volumes.c | 55 +++-- 5 files changed, 105 insertions(+), 2 deletions(-) Index: linux-mainline/fs/btrfs/ctree.h === --- linux-mainline.orig/fs/btrfs/ctree.h2010-10-26 18:03:38.0 +0100 +++ linux-mainline/fs/btrfs/ctree.h 2010-10-29 17:20:43.860460761 +0100 @@ -803,6 +803,11 @@ struct list_head cluster_list; }; +struct btrfs_balance_info { + u64 expected; + u64 completed; +}; + struct reloc_control; struct btrfs_device; struct btrfs_fs_devices; @@ -1010,6 +1015,10 @@ unsigned metadata_ratio; void *bdev_holder; + + /* Keep track of any rebalance operations on this FS */ + spinlock_t balance_info_lock; + struct btrfs_balance_info *balance_info; }; /* Index: linux-mainline/fs/btrfs/ioctl.c === --- linux-mainline.orig/fs/btrfs/ioctl.c2010-10-26 18:03:38.0 +0100 +++ linux-mainline/fs/btrfs/ioctl.c 2010-10-29 17:21:26.128742389 +0100 @@ -1984,6 +1984,38 @@ return 0; } +/* + * Return the current status of any balance operation + */ +long btrfs_ioctl_balance_progress( + struct btrfs_fs_info *fs_info, + struct btrfs_ioctl_balance_progress __user *user_dest) +{ + int ret = 0; + struct btrfs_ioctl_balance_progress dest; + + spin_lock(fs_info-balance_info_lock); + if (!fs_info-balance_info) { + ret = -EINVAL; + goto error; + } + + dest.expected = fs_info-balance_info-expected; + dest.completed = fs_info-balance_info-completed; + + spin_unlock(fs_info-balance_info_lock); + + if (copy_to_user(user_dest, dest, +sizeof(struct btrfs_ioctl_balance_progress))) + return -EFAULT; + + return 0; + +error: + spin_unlock(fs_info-balance_info_lock); + return ret; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -2017,6 +2049,8 @@ 
return btrfs_ioctl_rm_dev(root, argp); case BTRFS_IOC_BALANCE: return btrfs_balance(root-fs_info-dev_root); + case BTRFS_IOC_BALANCE_PROGRESS: + return btrfs_ioctl_balance_progress(root-fs_info, argp); case BTRFS_IOC_CLONE: return btrfs_ioctl_clone(file, arg, 0, 0, 0); case BTRFS_IOC_CLONE_RANGE: Index: linux-mainline/fs/btrfs/ioctl.h === --- linux-mainline.orig/fs/btrfs/ioctl.h2010-10-26 18:03:38.0 +0100 +++ linux-mainline/fs/btrfs/ioctl.h 2010-10-29 17:05:44.447028825 +0100 @@ -138,6 +138,11 @@ struct btrfs_ioctl_space_info spaces[0]; }; +struct btrfs_ioctl_balance_progress { + __u64 expected; + __u64 completed; +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -178,4 +183,6 @@ #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64) #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \ struct btrfs_ioctl_space_args) +#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \ + struct btrfs_ioctl_balance_progress) #endif Index: linux-mainline/fs/btrfs/volumes.c === --- linux-mainline.orig/fs/btrfs/volumes.c 2010-10-26 18:03:38.0 +0100 +++ linux-mainline/fs/btrfs/volumes.c 2010-10-29 17:23:40.463279287 +0100 @@ -1902,6 +1902,7 @@ struct btrfs_root *chunk_root = dev_root-fs_info-chunk_root; struct btrfs_trans_handle *trans; struct btrfs_key found_key; + struct btrfs_balance_status *bal_info; if (dev_root-fs_info-sb-s_flags MS_RDONLY) return -EROFS; @@ -1909,6 +1910,18 @@ mutex_lock(dev_root-fs_info-volume_mutex); dev_root = dev_root-fs_info-dev_root; + dev_root-fs_info-balance_info = kmalloc( + sizeof(struct btrfs_balance_info), + GFP_NOFS); + if (!dev_root-fs_info-balance_info) { + ret = -ENOSPC; + goto error_no_status; + } + bal_info = dev_root-fs_info-balance_info; + bal_info-expected = -1; /* One less than actually counted
[patch 2/2] Cancel filesystem balance.
This patch adds an ioctl for cancelling a btrfs balance operation
mid-flight. The ioctl simply sets a flag, and the operation terminates
after the current block group move has completed.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/ioctl.c   |   25 +++++++++++++++++++++++++
 fs/btrfs/ioctl.h   |    1 +
 fs/btrfs/volumes.c |    7 ++++++-
 4 files changed, 33 insertions(+), 1 deletion(-)

Index: linux-mainline/fs/btrfs/ctree.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ctree.h	2010-10-29 17:20:43.860460761 +0100
+++ linux-mainline/fs/btrfs/ctree.h	2010-10-29 17:24:06.622214467 +0100
@@ -806,6 +806,7 @@
 struct btrfs_balance_info {
 	u64 expected;
 	u64 completed;
+	int cancel_pending;
 };
 
 struct reloc_control;
Index: linux-mainline/fs/btrfs/ioctl.c
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.c	2010-10-29 17:21:26.128742389 +0100
+++ linux-mainline/fs/btrfs/ioctl.c	2010-10-29 17:27:51.933043374 +0100
@@ -2016,6 +2016,29 @@
 	return ret;
 }
 
+/*
+ * Cancel a running balance operation
+ */
+long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
+{
+	int err = 0;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if(!fs_info->balance_info) {
+		err = -EINVAL;
+		goto error;
+	}
+	if(fs_info->balance_info->cancel_pending) {
+		err = -ECANCELED;
+		goto error;
+	}
+	fs_info->balance_info->cancel_pending = 1;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return err;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2051,6 +2074,8 @@
 		return btrfs_balance(root->fs_info->dev_root);
 	case BTRFS_IOC_BALANCE_PROGRESS:
 		return btrfs_ioctl_balance_progress(root->fs_info, argp);
+	case BTRFS_IOC_BALANCE_CANCEL:
+		return btrfs_ioctl_balance_cancel(root->fs_info);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
Index: linux-mainline/fs/btrfs/ioctl.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.h	2010-10-29 17:05:44.447028825 +0100
+++ linux-mainline/fs/btrfs/ioctl.h	2010-10-29 17:24:06.642213653 +0100
@@ -185,4 +185,5 @@
 				    struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
 				    struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 22)
 #endif
Index: linux-mainline/fs/btrfs/volumes.c
===================================================================
--- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-29 17:23:40.463279287 +0100
+++ linux-mainline/fs/btrfs/volumes.c	2010-10-29 17:24:06.652213246 +0100
@@ -1921,6 +1921,7 @@
 	bal_info->expected = -1; /* One less than actually counted,
 				    because chunk 0 is special */
 	bal_info->completed = 0;
+	bal_info->cancel_pending = 0;
 
 	/* step one make some room on all the devices */
 	list_for_each_entry(device, devices, dev_list) {
@@ -1983,7 +1984,7 @@
 	key.offset = (u64)-1;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
-	while (1) {
+	while (!bal_info->cancel_pending) {
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0)
 			goto error;
@@ -2024,6 +2025,10 @@
 		       bal_info->completed, bal_info->expected);
 	}
 	ret = 0;
+	if(bal_info->cancel_pending) {
+		printk(KERN_INFO "btrfs: balance cancelled\n");
+		ret = -EINTR;
+	}
 error:
 	btrfs_free_path(path);
 	spin_lock(&dev_root->fs_info->balance_info_lock);
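The cancellation scheme in this patch is cooperative: the ioctl only sets a flag, and the balance loop checks it between block groups. A user-space analogue of that pattern, as a rough sketch, is shown below. It uses C11 atomics in place of the kernel spinlock, and balance_state, balance_run, balance_cancel and demo_step are hypothetical names invented for the illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the patch's expected/completed/cancel_pending. */
struct balance_state {
	uint64_t expected;
	uint64_t completed;
	atomic_int cancel_pending;
};

/* Request cancellation. Returns 0, or -1 if it was already requested. */
static int balance_cancel(struct balance_state *st)
{
	/* atomic_exchange returns the previous value of the flag. */
	return atomic_exchange(&st->cancel_pending, 1) ? -1 : 0;
}

/*
 * Run `step` once per unit of work. The flag is only checked between
 * units, so a unit that has started always finishes.
 * Returns 0 when all units are done, -1 if the run was cancelled.
 */
static int balance_run(struct balance_state *st,
		       void (*step)(struct balance_state *))
{
	while (st->completed < st->expected) {
		if (atomic_load(&st->cancel_pending))
			return -1;
		step(st);
		st->completed++;
	}
	return 0;
}

/* Demo worker: asks for cancellation while handling the third unit. */
static void demo_step(struct balance_state *st)
{
	if (st->completed + 1 == 3)
		balance_cancel(st);
}
```

With ten units expected and demo_step as the worker, balance_run() stops with three units completed: the third unit still finishes even though the cancel request arrived while it was in progress.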
[patch 0/2] Control filesystem balances (userspace)
These two patches complement the previous two kernel-side patches. The first implements a way of displaying the current progress of any running balance process. The second patch allows a running balance to be cancelled.

I'm a bit uncertain about the best name for these commands. Several options:

1) # btrfs filesystem progress path
   # btrfs filesystem cancel path

   Way too vague (cancel *what*?)

2) # btrfs filesystem balance-progress path
   # btrfs filesystem balance-cancel path

   Clashes horribly with filesystem balance -- no abbreviations possible.

3) # btrfs filesystem balance -p path
   # btrfs filesystem balance -c path

   Changes behaviour significantly on a switch, in contrast to the behaviour of the rest of the btrfs tool.

4) # btrfs balance progress path
   # btrfs balance cancel path

   My current favourite, although we introduce a new namespace (balance) for commands. We could add "btrfs balance start path" as a synonym for "btrfs filesystem balance path", for some degree of consistency.

At some point, I'll add a monitor function, which will poll at 1s intervals for progress updates, and print out progress when it changes.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- No! My collection of rare, incurable diseases! Violated! ---
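A monitor of the kind described (poll at an interval, print only when the progress changes) could be sketched as follows. This is an illustration under stated assumptions, not code from the series: monitor, query_fn, struct progress and replay_query are hypothetical names, and the query is a callback so that the loop does not depend on any particular ioctl. A real query function would presumably wrap the progress ioctl from the kernel-side patches.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <unistd.h>

struct progress {
	uint64_t expected;	/* block groups to move in total */
	uint64_t completed;	/* block groups moved so far */
};

/* Hypothetical callback: fills *p and returns 0 while a balance runs,
 * nonzero once there is nothing (more) to report. */
typedef int (*query_fn)(void *ctx, struct progress *p);

/* Poll `query` every interval_s seconds and print a line whenever the
 * reported progress differs from the last printed one.
 * Returns the number of lines printed. */
static int monitor(query_fn query, void *ctx, unsigned int interval_s, FILE *out)
{
	struct progress last = { 0, 0 }, cur;
	int printed = 0;

	while (query(ctx, &cur) == 0) {
		if (printed == 0 || cur.completed != last.completed ||
		    cur.expected != last.expected) {
			fprintf(out, "%llu/%llu block groups moved\n",
				(unsigned long long)cur.completed,
				(unsigned long long)cur.expected);
			last = cur;
			printed++;
		}
		if (interval_s)
			sleep(interval_s);
	}
	return printed;
}

/* Demo query that replays canned samples instead of asking a kernel. */
struct replay {
	const struct progress *samples;
	size_t count;
	size_t pos;
};

static int replay_query(void *ctx, struct progress *p)
{
	struct replay *r = ctx;

	if (r->pos >= r->count)
		return -1;
	*p = r->samples[r->pos++];
	return 0;
}
```

Fed the samples 0/10, 0/10, 1/10, 1/10, 2/10 through replay_query, the loop prints three lines, one per distinct value, which is the "print out progress when it changes" behaviour.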
[patch 2/2] User-space tool for cancelling balance operations.
Add an option to the btrfs tool to use the ioctl for cancelling
balance operations.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c      |    4 ++++
 btrfs_cmds.c |   41 +++++++++++++++++++++++++++++++++++++++++
 btrfs_cmds.h |    1 +
 ioctl.h      |    1 +
 4 files changed, 47 insertions(+)

Index: btrfs-progs-unstable/btrfs.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs.c	2010-10-30 00:19:59.968416575 +0100
+++ btrfs-progs-unstable/btrfs.c	2010-10-30 00:20:38.446849736 +0100
@@ -99,6 +99,10 @@
 	  "balance progress", "<path>\n"
 		"Show progress of the balance operation running on <path>."
 	},
+	{ do_balance_cancel, 1,
+	  "balance cancel", "<path>\n"
+		"Cancel the balance operation running on <path>."
+	},
 	{ do_scan, 999,
 	  "device scan", "[<device> [<device>..]\n"
 		"Scan all device for or the passed device for a btrfs\n"
Index: btrfs-progs-unstable/btrfs_cmds.c
===================================================================
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-30 00:04:48.335524683 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-30 00:20:22.267508562 +0100
@@ -848,6 +848,47 @@
 	return 0;
 }
 
+int do_balance_cancel(int nargs, char **argv)
+{
+	char *path = argv[1];
+	int fdmnt;
+	int ret = 0;
+	int err = 0;
+
+	fdmnt = open_file_or_dir(path);
+	if(fdmnt < 0) {
+		fprintf(stderr, "ERROR: can't access '%s'\n", path);
+		return 12;
+	}
+
+	ret = ioctl(fdmnt, BTRFS_IOC_BALANCE_CANCEL, NULL);
+	err = ret ? errno : 0;	/* errno is only meaningful on failure */
+
+	if(ret) {
+		switch(err) {
+		case 0:
+			break;
+		case EINVAL:
+			fprintf(stderr, "ERROR: no balance in progress.\n");
+			err = 20;
+			break;
+		case ECANCELED:
+			fprintf(stderr, "ERROR: operation already cancelled.\n");
+			err = 21;
+			break;
+		default:
+			fprintf(stderr, "ERROR: ioctl returned error '%d'.\n",
+				err);
+			err = 22;
+			break;
+		}
+	}
+
+	close(fdmnt);
+
+	return err;
+}
+
 int do_remove_volume(int nargs, char **args)
 {
 
Index: btrfs-progs-unstable/btrfs_cmds.h
===================================================================
--- btrfs-progs-unstable.orig/btrfs_cmds.h	2010-10-30 00:04:48.335524683 +0100
+++ btrfs-progs-unstable/btrfs_cmds.h	2010-10-30 00:20:22.307506934 +0100
@@ -24,6 +24,7 @@
 int do_add_volume(int nargs, char **args);
 int do_balance(int nargs, char **argv);
 int do_balance_progress(int nargs, char **argv);
+int do_balance_cancel(int nargs, char **argv);
 int do_remove_volume(int nargs, char **args);
 int do_scan(int nargs, char **argv);
 int do_resize(int nargs, char **argv);
Index: btrfs-progs-unstable/ioctl.h
===================================================================
--- btrfs-progs-unstable.orig/ioctl.h	2010-10-30 00:04:48.325525089 +0100
+++ btrfs-progs-unstable/ioctl.h	2010-10-30 00:20:22.357504895 +0100
@@ -176,4 +176,5 @@
 				    struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
 				    struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 22)
 #endif
Re: [patch 1/2] Balance progress monitoring.
On Sat, Oct 30, 2010 at 01:07:27AM +0100, Hugo Mills wrote:
> This patch introduces a basic form of progress monitoring for balance
> operations, by counting the number of block groups remaining. The
> information is exposed to userspace by an ioctl.

   Dammit. An unrefreshed quilt patch let an error get through (see below). Updated patch in a few moments.

   Hugo.

> Index: linux-mainline/fs/btrfs/volumes.c
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-26 18:03:38.0 +0100
> +++ linux-mainline/fs/btrfs/volumes.c	2010-10-29 17:23:40.463279287 +0100
> @@ -1902,6 +1902,7 @@
>  	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_key found_key;
> +	struct btrfs_balance_status *bal_info;

+	struct btrfs_balance_info *bal_info;

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- <dragon> A linked list is still a binary tree. Just a very unbalanced one. ---
Re: Horrible btrfs performance due to fragmentation
On Mon, Nov 01, 2010 at 12:36:58AM +0200, Felipe Contreras wrote:
> On Mon, Nov 1, 2010 at 12:25 AM, cwillu cwi...@cwillu.com wrote:
> > btrfs fi defrag isn't recursive. "btrfs filesystem defrag /home" will
> > defragment the space used to store the folder, without touching the
> > space used to store files in that folder.
>
> Yes, that came up on the IRC, but:
>
> 1) It doesn't make sense: "btrfs filesystem" doesn't allow a filesystem
> as argument? Why would anyone want it to be _non_ recursive?

   You missed the subsequent discussion on IRC about the interaction of COW with defrag. Essentially, if you've got two files that are COW copies of each other, and one has had something written to it since, it's *impossible* for both files to be defragmented, without making a full copy of both:

   Start with a file (A, etc are data blocks on the disk):

file1 = ABCDEF

   COW copy it:

file1 = ABCDEF
file2 = ABCDEF

   Now write to one of them:

file1 = ABCDEF
file2 = ABCDxF

   So, either file1 is contiguous, and file2 is fragmented (with the block x somewhere else on disk), or file2 is contiguous, and file1 is fragmented (with E somewhere else on disk).

   In fact, we've determined by experiment that when you defrag a file that's sharing blocks with another one, the file gets copied in its entirety, thus separating the blocks of the file and its COW duplicate.

> 2) The filesystem should not degrade performance so horribly no matter
> how long it has been used. Even git has automatic garbage collection.

   Since, I believe, btrfs uses COW very heavily internally for ensuring consistency, you can end up with fragmenting files and directories very easily. You probably need some kind of scrubber that goes looking for non-COW files that are fragmented, and defrags them in the background.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- No! My collection of rare, incurable diseases! Violated! ---
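The ABCDEF example can be checked with a few lines of C. This is a toy model only: files are arrays of disk block numbers, the block numbers are made up, and count_extents and shared_blocks are hypothetical helpers, not anything from btrfs.

```c
#include <stddef.h>

/* Count contiguous runs (extents) in a file's block map. A new extent
 * starts wherever a block does not directly follow its predecessor. */
static size_t count_extents(const unsigned long *blocks, size_t n)
{
	size_t extents = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || blocks[i] != blocks[i - 1] + 1)
			extents++;
	return extents;
}

/* Count how many blocks of file a also appear in file b. */
static size_t shared_blocks(const unsigned long *a, size_t na,
			    const unsigned long *b, size_t nb)
{
	size_t shared = 0;

	for (size_t i = 0; i < na; i++)
		for (size_t j = 0; j < nb; j++)
			if (a[i] == b[j]) {
				shared++;
				break;
			}
	return shared;
}
```

Taking file1 = {100, 101, 102, 103, 104, 105} and file2 = {100, 101, 102, 103, 500, 105} (block x landed at 500), file1 is one extent, file2 is three, and they share five blocks. Rewriting file2 as {600, ..., 605} makes it one extent again, but it then shares nothing with file1, which matches the full copy observed in the experiment.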
Re: [patch 1/2] Balance progress monitoring (updated)
On Mon, Nov 01, 2010 at 04:06:53PM +0800, liubo wrote:
> On 10/30/2010 09:39 PM, Hugo Mills wrote:
> > This patch introduces a basic form of progress monitoring for balance
> > operations, by counting the number of block groups remaining. The
> > information is exposed to userspace by an ioctl.
>
> IMO, tracking the information of blocks which are balancing also makes
> sense. For example, the block information's blocknr. It can help us
> monitor better.

   I don't see how that will help. The block group IDs (which is all that we get at this level) are effectively arbitrary 64-bit numbers, and are what appear in the kernel logs. How could that information be used to improve monitoring?

   I'm not ruling out the idea completely -- I just can't see at the moment how it would be used.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Is a diversity twice as good as a university? ---
Re: [patch 0/2] Control filesystem balances (kernel side)
On Sat, Oct 30, 2010 at 07:44:35PM +0200, Goffredo Baroncelli wrote:
> On Saturday, 30 October, 2010, Hugo Mills wrote:
> > One fundamental question, though -- is the progress monitor function
> > best implemented as an ioctl, as I've done here, or should it be two
> > or three sysfs files? I'm thinking of /proc/mdstat... Obviously,
> > /proc/mdstat would never get into /sys, but exposing the expected and
> > remaining values as files has an attractive simplicity to it.
>
> I like the idea that this info should be put under sysfs. Something like
>
> /sys/btrfs/filesystem-uuid/

   /sys/fs/btrfs/uuid I think. Also: /sys/fs/btrfs/label as a symlink to the uuid directory.

>	balance - info on balancing

   For the one-value-per-file rule of sysfs, this should probably be balance_expected and balance_completed, each holding a count of block groups.

>	devices - list of devices (a directory of links or a file which
>	          contains the list of devices)
>	subvolumes/ - info on subvolume(s)
>	label - label of the filesystem
>	other btrfs filesystem related knobs

   The other one that struck me earlier today as being useful was tracking the progress of a dev delete operation. But that'll come later.

> Obviously we need another btrfs command to extract a uuid from a btrfs
> filesystem like:
>
> # btrfs filesystem get-uuid /path/to/a/btrfs/filesystem
> f9b9c413-0dc8-4e3f-94f2-86faa702f519

   Possibly a slightly more general "fi metadata" with switches for UUID and label?

# btrfs fi metadata [-u|--uuid] /path
# btrfs fi metadata [-l|--label] /path

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Is a diversity twice as good as a university? ---
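Part of the appeal of one value per file is how little a user-space reader has to do. As a sketch, assuming files such as balance_expected and balance_completed existed somewhere under /sys/fs/btrfs (they are only a proposal in this thread, and read_u64_attr is a hypothetical helper), reading one value could be as small as:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Read a single unsigned decimal value from a one-value-per-file
 * attribute. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with a number. */
static int read_u64_attr(const char *path, uint64_t *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%" SCNu64, value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A progress display would call this once for each of the two proposed files and print completed/expected; no struct layout or ioctl number has to be shared between kernel and user space.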
Re: [PATCH v2] btrfs-progs: btrfs: implement 'start-sync' and 'wait-sync' commands
On Tue, Nov 02, 2010 at 07:58:27PM +0100, Goffredo Baroncelli wrote:
> On Monday, 01 November, 2010, Sage Weil wrote:
> > The 'start-sync' command initiates a sync, but does not wait for it to
> > complete. A transaction is printed that can be fed to 'wait-sync', which
> > will wait for it to commit.
> >
> > 'wait-sync' can also be used in combination with 'async-snapshot' to wait
> > for an async snapshot creation to commit.
> >
> > Updates the man page too.
> >
> > Signed-off-by: Sage Weil s...@newdream.net
> > ---
> >  btrfs.c        |    9 +++++++++
> >  btrfs_cmds.c   |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  btrfs_cmds.h   |    2 ++
> >  man/btrfs.8.in |   14 ++++++++++++++
> >  4 files changed, 74 insertions(+), 0 deletions(-)
> >
> > diff --git a/btrfs.c b/btrfs.c
> > index 46314cf..c871f4a 100644
> > --- a/btrfs.c
> > +++ b/btrfs.c
> > @@ -77,6 +77,15 @@ static struct Command commands[] = {
> >  	  "filesystem sync", "<path>\n"
> >  		"Force a sync on the filesystem <path>."
> >  	},
> > +	{ do_start_sync, 1,
> > +	  "filesystem start-sync", "<path>\n"
> > +		"Start a sync on the filesystem <path>, and print the resulting\n"
> > +		"transaction id."
> > +	},
>
> Like the command "btrfs subvol snapshot", I think that it is better to
> add a modifier instead of a new command.
>
> btrfs filesystem sync [--async]
>
> Sorry if I noticed this too late. But I don't see a valid reason to add
> another command. From a UI point of view the meaning of the command is
> the same; only the behaviour changes slightly.
>
> Even though I have to admit that "sync --async" sounds strange. Maybe
> "flush" is better?

   How about "btrfs filesystem sync --background"?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You're never alone with a rubber duck... ---
Re: RFC: exporting info via sysfs [was Re: [patch 0/2] Control filesystem balances (kernel side)]
Hi, Goffredo, On Thu, Nov 04, 2010 at 11:55:24PM +0100, Goffredo Baroncelli wrote: I make a prototype for exporting info from btrfs via sysfs. Good stuff. I was going to take a look at doing that this weekend. :) Under /sys/btrfs were created two directories, named fs and devices. /sys/btrfs/fs/fs-uuid/ I'm pretty sure that /sys/btrfs won't get through any discussion on LKML. I'd suggest /sys/fs/btrfs as the base, since that's where the other filesystems seem to put their sysfs information. label- filesystem label num_devices- total number of devices open_devices - number of opened devices [...] /sys/btrfs/devices/dev-uuid/ devid - btrfs device number fsid - filesystem uuid (fs-uuid) major, minor - major minor I think the major, minor should instead be be a symlink to the relevant entry in /sys/devices/... (as done in /sys/block/*) or /sys/block (as done in /sys/block/md*/slaves). Call it device. name - device name Unnecessary -- and also, I think, unlikely to get through LKML review. Putting a device name here implies that the kernel knows better than userspace what the name of the device is (i.e. which device node you should be using). Having the link to /sys/block/* or /sys/devices/... as above is, I think, all that's needed here. Userspace should be able to convert the major/minor pair kept in /sys/fs/btrfs/devices/uuid/device/dev appropriately. writeable - is the device writeable where fs-uuid is the filesystem uuid, and dev-uuid is the device uuid. The link between devices and filesystem is the fsid parameter of a device. Could that be made a symlink instead? That seems to be the usual approach in sysfs. I create these structure because we should handle the case were the devices are present (like after a btrfs device scan) but the filesystem aren't mounted. ... ah, I see it can't. (Re: my previous comment) In this case the devices/ subdirectory is populated. Instead the fs/ subdirectory is empty. I don't attach a patch because the code is very ugly. 
Comments ? Thoughts ? Is it ugly because there are significant difficulties in making btrfs or sysfs do this, or just because you hacked something together as quickly as possible for a demo? Hugo. -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- There's a Martian war machine outside -- they want to talk --- to you about a cure for the common cold. signature.asc Description: Digital signature
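To illustrate the remark about userspace converting a major/minor pair: a sysfs dev attribute holds text of the form major:minor, and a minimal sketch of parsing it on Linux/glibc might look like the following. The helper name parse_sysfs_dev is made up for this example, and the /sys/fs/btrfs/... layout is only what is being proposed in this thread, not an existing interface.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Hypothetical helper (not part of btrfs-progs): parse the "major:minor"
 * text found in a sysfs "dev" attribute, e.g. "8:16\n", into a dev_t.
 * Returns 0 on success, -1 if the text is malformed. */
int parse_sysfs_dev(const char *text, dev_t *out)
{
	unsigned int maj, min;

	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return -1;
	*out = makedev(maj, min);
	return 0;
}
```

The resulting dev_t can then be compared against st_rdev from stat(2) on candidate device nodes, which keeps the choice of node name in userspace.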
Re: time for balance
On Tue, Nov 09, 2010 at 04:09:00PM +0100, Helmut Hullen wrote: btrfs device add /dev/sdc1 /srv/MM btrfs filesystem balance /srv/MM adds /dev/sdc1 with about 1,5 TByte (df tells so), and the system works the second line (balance) since about 12 hours. How much time needs this balance command? Enough time to rewrite every piece of data in the filesystem. There are patches [1,2] for the kernel and userspace tools to allow you to monitor the progress of a balance. I'll be putting out a new revision of them either tonight or tomorrow (depending on how awkward git is feeling). [1] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg06558.html [2] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg06561.html If the machine hangs somewhere and I have to restart it: how can/must I repair the btrfs system? No need. Even while the balance is running, the filesystem should remain in a consistent state (assuming that you have working barriers). Note that if you restart the balance process, it will effectively start from the beginning again. Hugo. -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Anyone using a computer to generate random numbers is, of --- course, in a state of sin. signature.asc Description: Digital signature
Re: labelling
On Wed, Nov 10, 2010 at 08:40:00AM +0100, Helmut Hullen wrote: Hallo, linux-btrfs, I have problems with btrfs labels. My way: 2-TByte-disk: mkfs.btrs LABEL=MM2 /dev/sdd2 worked. Mounting mount LABEL=MM2 /srv/MM worked. Additional 1.5-TByte-Disk: btrfs add device /dev/sdc3 /srv/MM ... balance ... worked. findfs LABEL=MM2 shows /dev/sdd2 (the first partition) file -s /dev/sdd2 file -s /dev/sdc3 shows LABEL=MM2 for both partitions (that's not good). No, this is both good and correct. You've got a single filesystem spanning multiple block devices. The *filesystem* possesses the label, and with btrfs you can mount the filesystem using *any* of the block devices that compose it, so both block devices should indeed show the FS label, which is what's happening here. Unmounting /srv/MM and mount LABEL=MM2 /srv/MM doesn't work now, it tries to mount /dev/sdd2 and mourns. mount /dev/sdd2 /srv/MM shows the same error message, What's the error message? What do you get in your kernel logs when you do this? This should work, so there's something wrong, but it's (probably) not to do with disk labels. mount /dev/sdc3 /srv/MM (mounting the added partition) works fine, the whole space is available. But what can I do with the 2 identical labels? How can I delete (or change) the label of the first btrfs partition? You can't, as I explained above. By the way: df shows about 3.4 TByte usable space (2 TByte and 1.5 TByte), but btrfs filesystem df /srv/MM tells Data: total=2.70TB, used=1.64TB I'm missing about 0.7 TByte! In btrfs filesystem df, the total field is the space that has been allocated to block groups. As more space is needed on the filesystem, the total field will increase to use up the additional raw storage (if you're using RAID1 or RAID10, this will be at a ratio of 2:1; with RAID0 or simple allocation, the ratio is 1:1). Hugo. -- === Hugo Mills: h...@... 
carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- In event of Last Trump, please form an orderly queue --- and await judgement. signature.asc Description: Digital signature
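The ratio described above (allocated block-group space versus raw storage consumed) can be written down as a small sketch. The enum and function names here are invented for illustration and are not btrfs identifiers; only the 2:1 and 1:1 ratios come from the message.

```c
#include <stdint.h>

/* Illustrative profile names only; these are not btrfs constants. */
enum alloc_profile {
	PROFILE_SINGLE,
	PROFILE_RAID0,
	PROFILE_RAID1,
	PROFILE_RAID10
};

/* Raw device bytes consumed by `allocated` bytes of block groups:
 * 2:1 for RAID1/RAID10, 1:1 for RAID0 or simple allocation. */
uint64_t raw_bytes_for(enum alloc_profile profile, uint64_t allocated)
{
	switch (profile) {
	case PROFILE_RAID1:
	case PROFILE_RAID10:
		return allocated * 2;
	default:
		return allocated;
	}
}
```

Under this reading, a "total" that is smaller than the raw size reported by df just means that part of the raw storage has not been allocated to block groups yet.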
Re: Unhelpful error message from btrfs tool
On Thu, Nov 11, 2010 at 09:32:06PM +0100, Goffredo Baroncelli wrote: On Thursday, 11 November, 2010, Josh Berry wrote: Hi, I have a cron script that runs periodically, taking new snapshots and cleaning up old ones when space gets low on my filesystem. This morning, the script suddenly stopped being able to remove snapshots. When I tried to remove one manually, I got the following: # btrfs subvol del 2010-11-07-01:17:01 Delete subvolume '/btrfs/snapshot/2010-11-07-01:17:01' ERROR: cannot delete '/btrfs/snapshot/2010-11-07-01:17:01' There is nothing in dmesg or in the above output to tell me what the problem is, how to fix it, etc. I'm running kernel 2.6.36, and I updated btrfs-progs-unstable to the lastest Git revision (1b444cd2e6...), with the same result. How do I diagnose this issue? I'm not even sure where to start. This error is due to a failure during the ioctl. Could you strace btrfs subvolume delete ? # strace btrfs subvolume delete /btrfs/snapshot/2010-11-07-01:17:01 Definetely we need a more verbose error handling, about the return of the ioctl. I've made a good start on doing that. I'll try to finish it off over the weekend. Hugo. -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- The future isn't what it used to be. --- signature.asc Description: Digital signature
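As a rough idea of what more verbose error handling could look like, the sketch below formats a message that carries the errno text alongside the path. It is only an assumption about the general shape; format_ioctl_error is a hypothetical helper and is not taken from btrfs-progs or from the work mentioned above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build an error message that includes the reason
 * reported by the kernel. `err` is the errno value captured right after
 * the failed ioctl. Returns the snprintf() result. */
int format_ioctl_error(char *buf, size_t len, const char *path, int err)
{
	return snprintf(buf, len, "ERROR: cannot delete '%s': %s",
			path, strerror(err));
}
```

With something like this, a failure such as ENOTEMPTY or EPERM would be visible directly in the tool's output, without needing strace.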
[PATCH v2 3/3] User-space tool for cancelling balance operations.
Add an option to the btrfs tool to use the ioctl for cancelling
balance operations.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>
---
 btrfs.c      |    4
 btrfs_cmds.c |   41 +
 btrfs_cmds.h |    1 +
 ioctl.h      |    1 +
 4 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 0b6186c..93f7886 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -103,6 +103,10 @@ static struct Command commands[] = {
 	  "balance progress", "[-m|--monitor] <path>\n"
 		"Show progress of the balance operation running on <path>."
 	},
+	{ do_balance_cancel, 1,
+	  "balance cancel", "<path>\n"
+		"Cancel the balance operation running on <path>."
+	},
 	{ do_scan, 999,
 	  "device scan", "[<device> [<device>..]\n"
 		"Scan all device for or the passed device for a btrfs\n"
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index c681b5a..d246a8b 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -922,6 +922,47 @@ int do_balance_progress(int argc, char **argv)
 	return 0;
 }
 
+int do_balance_cancel(int nargs, char **argv)
+{
+	char *path = argv[1];
+	int fdmnt;
+	int ret = 0;
+	int err = 0;
+
+	fdmnt = open_file_or_dir(path);
+	if(fdmnt < 0) {
+		fprintf(stderr, "ERROR: can't access '%s'\n", path);
+		return 12;
+	}
+
+	ret = ioctl(fdmnt, BTRFS_IOC_BALANCE_CANCEL, NULL);
+	err = errno;
+
+	if(ret) {
+		switch(err) {
+		case 0:
+			break;
+		case EINVAL:
+			fprintf(stderr, "ERROR: no balance in progress.\n");
+			err = 20;
+			break;
+		case ECANCELED:
+			fprintf(stderr, "ERROR: operation already cancelled.\n");
+			err = 21;
+			break;
+		default:
+			fprintf(stderr, "ERROR: ioctl returned error '%d'.\n",
+				err);
+			err = 22;
+			break;
+		}
+	}
+
+	close(fdmnt);
+
+	return err;
+}
+
 int do_remove_volume(int nargs, char **args)
 {
diff --git a/btrfs_cmds.h b/btrfs_cmds.h
index 47b0a27..5cb0d9c 100644
--- a/btrfs_cmds.h
+++ b/btrfs_cmds.h
@@ -24,6 +24,7 @@ int do_show_filesystem(int nargs, char **argv);
 int do_add_volume(int nargs, char **args);
 int do_balance(int nargs, char **argv);
 int do_balance_progress(int nargs, char **argv);
+int do_balance_cancel(int nargs, char **argv);
 int do_remove_volume(int nargs, char **args);
 int do_scan(int nargs, char **argv);
 int do_resize(int nargs, char **argv);
diff --git a/ioctl.h b/ioctl.h
index 888ceb9..1fc665b 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -176,4 +176,5 @@ struct btrfs_ioctl_balance_progress {
 				    struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \
 				    struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 26)
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
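One detail of the error mapping in this patch may be worth isolating: errno is only meaningful when the ioctl has failed, so a successful call should yield exit status 0 whatever errno happens to hold from earlier calls. The sketch below shows one way to express that. cancel_exit_code is a hypothetical helper and not part of the posted patch; only the exit codes 20, 21 and 22 are taken from it.

```c
#include <errno.h>

/* Hypothetical helper: map the result of the cancel ioctl onto the exit
 * codes used in the patch. `ret` is the ioctl return value and `err` is
 * errno captured immediately after the call. A successful ioctl always
 * gives 0, so a stale errno cannot leak into the exit status. */
int cancel_exit_code(int ret, int err)
{
	if (ret == 0)
		return 0;

	switch (err) {
	case EINVAL:		/* no balance in progress */
		return 20;
	case ECANCELED:		/* cancel already requested */
		return 21;
	default:		/* any other ioctl failure */
		return 22;
	}
}
```

Keeping the mapping in a pure function like this also makes it straightforward to check without a btrfs filesystem to hand.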
[PATCH v2 1/3] Balance progress monitoring.
This patch introduces a basic form of progress monitoring for balance operations, by counting the number of block groups remaining. The information is exposed to userspace by an ioctl. We also add btrfs balance start as an alias for btrfs filesystem balance, so that all balance-related functions are available under one prefix. Signed-off-by: Hugo Mills h...@carfax.org.uk --- btrfs.c|8 +++ btrfs_cmds.c | 60 btrfs_cmds.h |1 + ioctl.h|7 ++ man/btrfs.8.in |7 ++ 5 files changed, 83 insertions(+), 0 deletions(-) diff --git a/btrfs.c b/btrfs.c index 46314cf..0b6186c 100644 --- a/btrfs.c +++ b/btrfs.c @@ -95,6 +95,14 @@ static struct Command commands[] = { filesystem balance, path\n Balance the chunks across the device. }, + { do_balance, 1, + balance start, path\n + Synonym for \btrfs filesystem balance\. + }, + { do_balance_progress, -1, + balance progress, [-m|--monitor] path\n + Show progress of the balance operation running on path. + }, { do_scan, 999, device scan, [device [device..]\n Scan all device for or the passed device for a btrfs\n diff --git a/btrfs_cmds.c b/btrfs_cmds.c index 8031c58..2745d64 100644 --- a/btrfs_cmds.c +++ b/btrfs_cmds.c @@ -28,6 +28,7 @@ #include limits.h #include uuid/uuid.h #include ctype.h +#include getopt.h #undef ULONG_MAX @@ -776,6 +777,65 @@ int do_balance(int argc, char **argv) } return 0; } + +int get_balance_progress(char *path, struct btrfs_ioctl_balance_progress *bal) +{ + int fdmnt; + int ret = 0; + int err = 0; + + fdmnt = open_file_or_dir(path); + if(fdmnt 0) { + return -1; + } + + ret = ioctl(fdmnt, BTRFS_IOC_BALANCE_PROGRESS, bal); + if(ret) + err = errno; + close(fdmnt); + + return err; +} + +int do_balance_progress(int argc, char **argv) +{ + char *path; + int ret = 0; + int err = 0; + struct btrfs_ioctl_balance_progress bal; + + path = argv[1]; + + ret = get_balance_progress(path, bal); + if (!ret) + printf(\r%llu/%llu block groups moved, + %0.2f%% complete.\n, + bal.completed, + bal.expected, + 
(float)bal.completed/bal.expected*100.0); + + switch(ret) { + case 0: + break; + case -1: + fprintf(stderr, ERROR: can't access '%s'\n, path); + return 13; + case EINVAL: + if (!monitor) { + fprintf(stderr, + No balance operation running on '%s'.\n, + path); + return 20; + } + break; + default: + fprintf(stderr, ERROR: ioctl returned error %d., err); + return 21; + } + + return 0; +} + int do_remove_volume(int nargs, char **args) { diff --git a/btrfs_cmds.h b/btrfs_cmds.h index 7bde191..47b0a27 100644 --- a/btrfs_cmds.h +++ b/btrfs_cmds.h @@ -23,6 +23,7 @@ int do_defrag(int argc, char **argv); int do_show_filesystem(int nargs, char **argv); int do_add_volume(int nargs, char **args); int do_balance(int nargs, char **argv); +int do_balance_progress(int nargs, char **argv); int do_remove_volume(int nargs, char **args); int do_scan(int nargs, char **argv); int do_resize(int nargs, char **argv); diff --git a/ioctl.h b/ioctl.h index 776d7a9..888ceb9 100644 --- a/ioctl.h +++ b/ioctl.h @@ -132,6 +132,11 @@ struct btrfs_ioctl_space_args { struct btrfs_ioctl_space_info spaces[0]; }; +struct btrfs_ioctl_balance_progress { + __u64 expected; + __u64 completed; +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -169,4 +174,6 @@ struct btrfs_ioctl_space_args { #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64) #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \ struct btrfs_ioctl_space_args) +#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \ + struct btrfs_ioctl_balance_progress) #endif diff --git a/man/btrfs.8.in b/man/btrfs.8.in index 26ef982..69d8613 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -21,6 +21,8 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBfilesystem resize\fP\fI [+/\-]size[gkm]|max filesystem\fP .PP +\fBbtrfs\fP \fBbalance progress\fP \fIpath\fP +.PP \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP .PP 
\fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP @@ -148,6 +150,11 @@ Balance the chunks
[PATCH v2 0/3] Balance management, userspace side
These three patches complement the previous two kernel-side patches. The first implements a way of displaying the current progress of any running balance process. The second adds a monitor mode, which watches the progress and makes an estimate of the completion time. The third and final patch allows a running balance to be cancelled. Hugo Mills (3): Balance progress monitoring. Add --monitor option to btrfs balance progress. User-space tool for cancelling balance operations. btrfs.c| 12 btrfs_cmds.c | 187 btrfs_cmds.h |2 + ioctl.h|8 +++ man/btrfs.8.in |7 ++ 5 files changed, 216 insertions(+), 0 deletions(-)
[PATCH v2 1/2] Balance progress monitoring.
This patch introduces a basic form of progress monitoring for balance operations, by counting the number of block groups remaining. The information is exposed to userspace by an ioctl. Signed-off-by: Hugo Mills h...@carfax.org.uk --- fs/btrfs/ctree.h |9 +++ fs/btrfs/disk-io.c |2 + fs/btrfs/ioctl.c | 34 + fs/btrfs/ioctl.h |7 ++ fs/btrfs/volumes.c | 61 ++- 5 files changed, 111 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8db9234..67fb603 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -841,6 +841,11 @@ struct btrfs_block_group_cache { struct list_head cluster_list; }; +struct btrfs_balance_info { + u64 expected; + u64 completed; +}; + struct reloc_control; struct btrfs_device; struct btrfs_fs_devices; @@ -1050,6 +1055,10 @@ struct btrfs_fs_info { unsigned metadata_ratio; void *bdev_holder; + + /* Keep track of any rebalance operations on this FS */ + spinlock_t balance_info_lock; + struct btrfs_balance_info *balance_info; }; /* diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b40dfe4..87d9315 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1590,6 +1590,7 @@ struct btrfs_root *open_ctree(struct super_block *sb, spin_lock_init(fs_info-ref_cache_lock); spin_lock_init(fs_info-fs_roots_radix_lock); spin_lock_init(fs_info-delayed_iput_lock); + spin_lock_init(fs_info-balance_info_lock); init_completion(fs_info-kobj_unregister); fs_info-tree_root = tree_root; @@ -1615,6 +1616,7 @@ struct btrfs_root *open_ctree(struct super_block *sb, fs_info-sb = sb; fs_info-max_inline = 8192 * 1024; fs_info-metadata_ratio = 0; + fs_info-balance_info = NULL; fs_info-thread_pool_size = min_t(unsigned long, num_online_cpus() + 2, 8); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 463d91b..c247985 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2220,6 +2220,38 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp) return btrfs_wait_for_commit(root, transid); } +/* + * Return the 
current status of any balance operation + */ +long btrfs_ioctl_balance_progress( + struct btrfs_fs_info *fs_info, + struct btrfs_ioctl_balance_progress __user *user_dest) +{ + int ret = 0; + struct btrfs_ioctl_balance_progress dest; + + spin_lock(fs_info-balance_info_lock); + if (!fs_info-balance_info) { + ret = -EINVAL; + goto error; + } + + dest.expected = fs_info-balance_info-expected; + dest.completed = fs_info-balance_info-completed; + + spin_unlock(fs_info-balance_info_lock); + + if (copy_to_user(user_dest, dest, +sizeof(struct btrfs_ioctl_balance_progress))) + return -EFAULT; + + return 0; + +error: + spin_unlock(fs_info-balance_info_lock); + return ret; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -2255,6 +2287,8 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_rm_dev(root, argp); case BTRFS_IOC_BALANCE: return btrfs_balance(root-fs_info-dev_root); + case BTRFS_IOC_BALANCE_PROGRESS: + return btrfs_ioctl_balance_progress(root-fs_info, argp); case BTRFS_IOC_CLONE: return btrfs_ioctl_clone(file, arg, 0, 0, 0); case BTRFS_IOC_CLONE_RANGE: diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 17c99eb..b2103b2 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -145,6 +145,11 @@ struct btrfs_ioctl_space_args { struct btrfs_ioctl_space_info spaces[0]; }; +struct btrfs_ioctl_balance_progress { + __u64 expected; + __u64 completed; +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -189,4 +194,6 @@ struct btrfs_ioctl_space_args { #define BTRFS_IOC_WAIT_SYNC _IOW(BTRFS_IOCTL_MAGIC, 22, __u64) #define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \ struct btrfs_ioctl_async_vol_args) +#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \ + struct btrfs_ioctl_balance_progress) #endif diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 91851b5..f00edc1 100644 --- 
a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1904,6 +1904,7 @@ int btrfs_balance(struct btrfs_root *dev_root) struct btrfs_root *chunk_root = dev_root-fs_info-chunk_root; struct btrfs_trans_handle *trans; struct btrfs_key found_key; + struct btrfs_balance_info *bal_info
[PATCH v2 0/2] Balance management, kernel side
These two patches give a degree of control over balance operations. The first makes it possible to get an idea of how much work remains to do, by tracking the number of block groups (chunks) that need to be moved/rewritten. The second patch allows a running balance operation to be cancelled when the current block group has been moved. Since the last version, I've added some more locking (assigning to a u64 isn't atomic on non-64-bit architectures). I've not added the sysfs bits, as I haven't had a chance to try out Goffredo's sysfs code yet. I've also not implemented liubo's suggestion of tracking the current block group ID (I'll take that discussion up with him separately -- basically it's not a good fit with the polling method required by this ioctl). Hugo Mills (2): Balance progress monitoring. Cancel filesystem balance. fs/btrfs/ctree.h | 10 fs/btrfs/disk-io.c |2 + fs/btrfs/ioctl.c | 62 fs/btrfs/ioctl.h |8 ++ fs/btrfs/volumes.c | 66 ++- 5 files changed, 146 insertions(+), 2 deletions(-)
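The locking point can be illustrated outside the kernel. A 64-bit store may be split into two 32-bit stores on a 32-bit machine, so a reader polling the counters could see a half-updated value unless both sides take a lock. The following is a userspace analogue using a pthread mutex in place of the kernel spinlock; all names are invented for the example and it is not the kernel code.

```c
#include <pthread.h>
#include <stdint.h>

struct balance_progress {
	uint64_t expected;
	uint64_t completed;
};

struct balance_state {
	pthread_mutex_t lock;		/* stands in for the kernel spinlock */
	struct balance_progress progress;
};

void balance_state_init(struct balance_state *s, uint64_t expected)
{
	pthread_mutex_init(&s->lock, NULL);
	s->progress.expected = expected;
	s->progress.completed = 0;
}

/* Writer side: called after each block group has been moved. */
void balance_state_advance(struct balance_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->progress.completed++;
	pthread_mutex_unlock(&s->lock);
}

/* Reader side: copy both counters under the lock so that the pair is
 * consistent, then hand the copy out with the lock released. */
struct balance_progress balance_state_snapshot(struct balance_state *s)
{
	struct balance_progress copy;

	pthread_mutex_lock(&s->lock);
	copy = s->progress;
	pthread_mutex_unlock(&s->lock);
	return copy;
}
```

Copying into a local structure before releasing the lock is the same shape as the progress ioctl, which fills a local copy under the lock and only then calls copy_to_user().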
[PATCH v2 2/3] Add --monitor option to btrfs balance progress.
For the impatient, this patch introduces the pot-watching --monitor option, which checks the balance progress at regular intervals, and updates a single status line with the current progress and an estimated completion time. Signed-off-by: Hugo Mills h...@carfax.org.uk --- btrfs_cmds.c | 102 +++ man/btrfs.8.in |4 +- 2 files changed, 96 insertions(+), 10 deletions(-) diff --git a/btrfs_cmds.c b/btrfs_cmds.c index 2745d64..c681b5a 100644 --- a/btrfs_cmds.c +++ b/btrfs_cmds.c @@ -797,22 +797,108 @@ int get_balance_progress(char *path, struct btrfs_ioctl_balance_progress *bal) return err; } +const struct option progress_options[] = { + { monitor, 0, NULL, 'm' }, + { NULL, 0, NULL, 0 } +}; + int do_balance_progress(int argc, char **argv) { char *path; int ret = 0; int err = 0; struct btrfs_ioctl_balance_progress bal; + __u64 last_completed = -1; + __u64 initial_completed = -1; + struct timeval now; + struct timeval started; + int monitor = 0; + + optind = 1; + while(1) { + int c = getopt_long(argc, argv, m, progress_options, NULL); + if (c 0) + break; + switch(c) { + case 'm': + monitor = 1; + break; + default: + fprintf(stderr, Invalid arguments for balance progress\n); + free(argv); + return 1; + } + } + + if(optind = argc) { + fprintf(stderr, No filesystem path given for progress\n); + return 1; + } - path = argv[1]; + path = argv[optind]; + do { + int prs = 0; - ret = get_balance_progress(path, bal); - if (!ret) - printf(\r%llu/%llu block groups moved, - %0.2f%% complete.\n, - bal.completed, - bal.expected, - (float)bal.completed/bal.expected*100.0); + ret = get_balance_progress(path, bal); + if (ret) + break; + + if (last_completed != bal.completed) { + printf(\r%llu/%llu block groups moved, + %0.2f%% complete., + bal.completed, + bal.expected, + (float)bal.completed/bal.expected*100.0); + } + + if (initial_completed != -1 +initial_completed != bal.completed) { + ret = gettimeofday(now, NULL); + if (ret) { + fprintf(stderr, Can't read current time\n); + return 22; 
+ } + /* Seconds per block */ + float rate = (float)(now.tv_sec - started.tv_sec) + / (bal.completed - initial_completed); + int secs_remaining = rate + * (bal.expected - bal.completed); + printf( Time remaining); + if (secs_remaining = 60*60*24) { + printf( %dd, secs_remaining / (60*60*24)); + secs_remaining %= 60*60*24; + prs = 1; + } + if (prs || secs_remaining = 60*60) { + printf( %dh, secs_remaining / (60*60)); + secs_remaining %= 60*60; + prs = 1; + } + if (prs || secs_remaining 60) { + printf( %dm, secs_remaining / 60); + secs_remaining %= 60; + } + printf( %ds\x1b[K, secs_remaining); + } + + if (last_completed != -1 last_completed != bal.completed) { + initial_completed = bal.completed; + ret = gettimeofday(started, NULL); + if (ret) { + fprintf(stderr, Can't read current time\n); + return 22; + } + } + + last_completed = bal.completed; + + if (monitor) { + fflush(stdout); + sleep(1); + } else { + printf(\n); + } + } while(monitor); switch(ret) { case 0: diff --git a/man/btrfs.8.in b/man/btrfs.8.in index 69d8613..3f7642e 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -21,7 +21,7 @@ btrfs \- control a btrfs filesystem
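The estimate made by the monitor loop boils down to "seconds per block group so far, times block groups left". A standalone sketch of that arithmetic, with a d/h/m/s formatter similar in spirit to the one in the patch, is given below. The function names are made up, and the behaviour when no progress has been observed yet (returning -1) is an assumption of this sketch rather than something the patch does.

```c
#include <stdint.h>
#include <stdio.h>

/* Estimate the seconds remaining, given the block-group count when timing
 * started, the current count, the expected total and the elapsed seconds.
 * Returns -1 when no progress has been observed yet. */
long estimate_secs_remaining(uint64_t initial_completed, uint64_t completed,
			     uint64_t expected, long elapsed_secs)
{
	double rate;	/* seconds per block group */

	if (completed <= initial_completed || expected < completed)
		return -1;
	rate = (double)elapsed_secs / (double)(completed - initial_completed);
	return (long)(rate * (double)(expected - completed));
}

/* Format a duration as e.g. "1d 2h 3m 4s". Once a larger unit has been
 * printed, the smaller ones are printed too, even when zero.
 * `buf` is assumed to hold at least 32 bytes. */
int format_duration(char *buf, size_t len, long secs)
{
	int n = 0;
	int printed = 0;

	if (secs >= 60L * 60 * 24) {
		n += snprintf(buf + n, len - (size_t)n, "%ldd ",
			      secs / (60L * 60 * 24));
		secs %= 60L * 60 * 24;
		printed = 1;
	}
	if (printed || secs >= 60 * 60) {
		n += snprintf(buf + n, len - (size_t)n, "%ldh ",
			      secs / (60 * 60));
		secs %= 60 * 60;
		printed = 1;
	}
	if (printed || secs >= 60) {
		n += snprintf(buf + n, len - (size_t)n, "%ldm ", secs / 60);
		secs %= 60;
	}
	n += snprintf(buf + n, len - (size_t)n, "%lds", secs);
	return n;
}
```

For example, 10 block groups moved in 50 seconds with 100 still to go gives an estimate of 500 seconds. The estimate is only as good as the assumption that block groups take roughly equal time to move.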
[PATCH v2 2/2] Cancel filesystem balance.
This patch adds an ioctl for cancelling a btrfs balance operation
mid-flight. The ioctl simply sets a flag, and the operation terminates
after the current block group move has completed.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/ioctl.c   |   28
 fs/btrfs/ioctl.h   |    3 ++-
 fs/btrfs/volumes.c |    7 ++-
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 67fb603..5fa7163 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -844,6 +844,7 @@ struct btrfs_block_group_cache {
 struct btrfs_balance_info {
 	u64 expected;
 	u64 completed;
+	int cancel_pending;
 };
 
 struct reloc_control;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c247985..7e38856 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2252,6 +2252,32 @@ error:
 	return ret;
 }
 
+/*
+ * Cancel a running balance operation
+ */
+long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
+{
+	int err = 0;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if(!fs_info->balance_info) {
+		err = -EINVAL;
+		goto error;
+	}
+	if(fs_info->balance_info->cancel_pending) {
+		err = -ECANCELED;
+		goto error;
+	}
+	fs_info->balance_info->cancel_pending = 1;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return err;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2289,6 +2315,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_balance(root->fs_info->dev_root);
 	case BTRFS_IOC_BALANCE_PROGRESS:
 		return btrfs_ioctl_balance_progress(root->fs_info, argp);
+	case BTRFS_IOC_BALANCE_CANCEL:
+		return btrfs_ioctl_balance_cancel(root->fs_info);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index b2103b2..76ae121 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -195,5 +195,6 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \
 				   struct btrfs_ioctl_async_vol_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \
-				   struct btrfs_ioctl_balance_progress)
+				   struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 26)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f00edc1..64b2f04 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1924,6 +1924,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
 	bal_info->expected = -1; /* One less than actually counted, because chunk 0 is special */
 	bal_info->completed = 0;
+	bal_info->cancel_pending = 0;
 	spin_unlock(&dev_root->fs_info->balance_info_lock);
 	/* step one make some room on all the devices */
@@ -1989,7 +1990,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
 	key.offset = (u64)-1;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
-	while (1) {
+	while (!bal_info->cancel_pending) {
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0)
 			goto error;
@@ -2029,6 +2030,10 @@ int btrfs_balance(struct btrfs_root *dev_root)
 		       bal_info->completed, bal_info->expected);
 	}
 	ret = 0;
+	if(bal_info->cancel_pending) {
+		printk(KERN_INFO "btrfs: balance cancelled\n");
+		ret = -EINTR;
+	}
 error:
 	btrfs_free_path(path);
 	spin_lock(&dev_root->fs_info->balance_info_lock);
-- 
1.7.1
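The cancellation scheme in this patch is cooperative: the ioctl only sets a flag, and the long-running loop checks that flag between units of work. A userspace sketch of the same pattern, using C11 atomics instead of a spinlock, could look like this. The struct, the function names and the callback are all invented for the illustration and do not mirror the kernel code.

```c
#include <errno.h>
#include <stdatomic.h>

struct job {
	atomic_int cancel_pending;
	unsigned long completed;
};

typedef void (*step_fn)(struct job *j, unsigned long unit);

/* Request cancellation. Returns -ECANCELED if a cancel was already
 * pending, mirroring the second error case of the ioctl. */
int job_cancel(struct job *j)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&j->cancel_pending, &expected, 1))
		return -ECANCELED;
	return 0;
}

/* Run `total` units of work, checking the flag before each unit. A unit
 * that has been started always finishes, so a cancel request takes effect
 * only at the next unit boundary. */
int run_job(struct job *j, unsigned long total, step_fn step)
{
	j->completed = 0;
	while (j->completed < total) {
		if (atomic_load(&j->cancel_pending))
			return -EINTR;
		step(j, j->completed);
		j->completed++;
	}
	return 0;
}

/* Example step: requests cancellation while the third unit is running,
 * standing in for another thread calling job_cancel(). */
void step_cancel_during_third(struct job *j, unsigned long unit)
{
	if (unit == 2)
		job_cancel(j);
}

/* Example step that never cancels. */
void step_noop(struct job *j, unsigned long unit)
{
	(void)j;
	(void)unit;
}
```

The trade-off is latency: a cancel request is honoured only once the current unit (here, one block group move) has finished, which keeps the on-disk state consistent at the cost of a possibly long wait.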
Re: [PATCH v2 2/2] Cancel filesystem balance.
On Fri, Nov 12, 2010 at 03:28:08PM +1100, Chris Samuel wrote: On 12/11/10 12:33, Li Zefan wrote: Is there any blocker that prevents us from canceling balance by just Ctrl+C ? Given that there's been at least 1 report of it taking 12 hours to balance a non-trivial amount of data I suspect putting this operation into the background by default and having the cancel option might be a better plan. Only 12 hours? Last time I tried it, it took 19. :) It would certainly be easy enough to fork a copy of the userspace tool to run the ioctl in the background. Probably a little more work to make the balance a kernel thread. I'd prefer the former, for ease of implementation. Hugo. -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 3: Military Intelligence --- signature.asc Description: Digital signature
Re: [PATCH v2 2/2] Cancel filesystem balance.
On Fri, Nov 12, 2010 at 11:36:55AM +, Hugo Mills wrote: On Fri, Nov 12, 2010 at 03:28:08PM +1100, Chris Samuel wrote: On 12/11/10 12:33, Li Zefan wrote: Is there any blocker that prevents us from canceling balance by just Ctrl+C ? Given that there's been at least 1 report of it taking 12 hours to balance a non-trivial amount of data I suspect putting this operation into the background by default and having the cancel option might be a better plan. Only 12 hours? Last time I tried it, it took 19. :) It would certainly be easy enough to fork a copy of the userspace tool to run the ioctl in the background. Probably a little more work to make the balance a kernel thread. I'd prefer the former, for ease of implementation. How's this? This patch makes a balance operation fork and detach from the current terminal, to run the userspace side of the balance in the background. Introduce a --wait switch so that a synchronous balance can be done if the user requires. Signed-off-by: Hugo Mills h...@carfax.org.uk --- btrfs.c|8 btrfs_cmds.c | 56 +--- man/btrfs.8.in |2 +- 3 files changed, 58 insertions(+), 8 deletions(-) diff --git a/btrfs.c b/btrfs.c index 93f7886..7b42658 100644 --- a/btrfs.c +++ b/btrfs.c @@ -91,12 +91,12 @@ static struct Command commands[] = { filesystem df, path\n Show space usage information for a mount point\n. }, - { do_balance, 1, - filesystem balance, path\n + { do_balance, -1, + filesystem balance, [-w|--wait] path\n Balance the chunks across the device. }, - { do_balance, 1, - balance start, path\n + { do_balance, -1, + balance start, [-w|--wait] path\n Synonym for \btrfs filesystem balance\. 
}, { do_balance_progress, -1, diff --git a/btrfs_cmds.c b/btrfs_cmds.c index d246a8b..13be603 100644 --- a/btrfs_cmds.c +++ b/btrfs_cmds.c @@ -754,12 +754,41 @@ int do_add_volume(int nargs, char **args) } +const struct option balance_options[] = { + { wait, 0, NULL, 'w' }, + { NULL, 0, NULL, 0 } +}; + int do_balance(int argc, char **argv) { - int fdmnt, ret=0; + int background = 1; struct btrfs_ioctl_vol_args args; - char*path = argv[1]; + char *path; + int ttyfd; + + optind = 1; + while(1) { + int c = getopt_long(argc, argv, w, balance_options, NULL); + if (c 0) + break; + switch(c) { + case 'w': + background = 0; + break; + default: + fprintf(stderr, Invalid arguments for balance\n); + free(argv); + return 1; + } + } + + if(optind = argc) { + fprintf(stderr, No filesystem path given for balance\n); + return 1; + } + + path = argv[optind]; fdmnt = open_file_or_dir(path); if (fdmnt 0) { @@ -767,8 +796,29 @@ int do_balance(int argc, char **argv) return 12; } + if (background) { + int pid = fork(); + if (pid == 0) { + /* We're in the child, and can run in the background */ + ttyfd = open(/dev/tty, O_RDWR); + if (ttyfd 0) + ioctl(ttyfd, TIOCNOTTY, 0); + /* Fall through to the BTRFS_IOC_BALANCE ioctl */ + } else if (pid 0) { + /* We're in the parent, and the fork succeeded */ + printf(Background balance started\n); + return 0; + } else { + /* We're in the parent, and the fork failed */ + fprintf(stderr, ERROR: can't start background process -- %s\n, + strerror(errno)); + } + } + memset(args, 0, sizeof(args)); - ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, args); + printf(ioctl\n); + sleep(60); + /* ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, args); */ close(fdmnt); if(ret0){ fprintf(stderr, ERROR: balancing '%s'\n, path); diff --git a/man/btrfs.8.in b/man/btrfs.8.in index 3f7642e..1410aaa 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -27,7 +27,7 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP .PP -\fBbtrfs\fP \fBdevice 
balance\fP\fI path \fP +\fBbtrfs\fP \fBdevice balance\fP [\fB-w\fP|\fB--wait\fP] \fIpath\fP .PP \fBbtrfs\fP \fBdevice add\fP\fI dev [dev..] path \fP .PP -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from
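For comparison, a common way to push work into the background from a userspace tool is fork() followed by setsid() in the child. Note that this is a different detaching technique from the TIOCNOTTY ioctl used in the patch above; it is shown only as a sketch of the general fork-and-return pattern, with invented names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `work` in a child process that has its own
 * session, so it is detached from the controlling terminal.
 * In the parent: returns the child's pid, or -1 if fork() failed.
 * The child never returns from this function. */
pid_t run_in_background(int (*work)(void))
{
	pid_t pid = fork();

	if (pid == 0) {
		/* Child: start a new session, do the work, then exit
		 * with the work's status. */
		setsid();
		_exit(work());
	}
	return pid;
}

/* Stand-in for the long-running ioctl. */
int example_work(void)
{
	return 7;
}
```

A caller that wants the equivalent of --wait would call waitpid() on the returned pid; otherwise it can print a message and return immediately, leaving the child to finish on its own.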
Re: my mail
On Fri, Nov 12, 2010 at 07:33:57PM +, h...@carfax.org.uk wrote: From 2de353ddda78ef5cbc84e1d3267606bc44e48faa Mon Sep 17 00:00:00 2001 Gaah. This worked last night. Sorry. :( -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- You got very nice eyes, Deedee. Never noticed them --- before. They real? signature.asc Description: Digital signature
Re: Update to Project_ideas wiki page
On Tue, Nov 16, 2010 at 10:19:45PM -0500, Chris Ball wrote: Hi, Chris Mason has posted a bunch of interesting updates to the Project_ideas wiki page. If you're interested in working on any of these, feel free to speak up and ask for more information if you need it. Here are the new sections, for the curious: == Block group reclaim == The split between data and metadata block groups means that we sometimes have mostly empty block groups dedicated to only data or metadata. As files are deleted, we should be able to reclaim these and put the space back into the free space pool. We also need rebalancing ioctls that focus only on specific raid levels. == Changing RAID levels == We need ioctls to change between different raid levels. Some of these are quite easy -- e.g. for RAID0 to RAID1, we just halve the available bytes on the fs, then queue a rebalance. I would be interested in the rebalancing ioctls, and in RAID level management. I'm still very much trying to learn the basics, though, so I may go very slowly at first... Hugo. -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- We demand rigidly defined areas of doubt and uncertainty! --- signature.asc Description: Digital signature
Re: Update to Project_ideas wiki page
On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote:
> Can I suggest we combine this new RAID level management with a modernisation of the terminology for storage redundancy, as has been discussed previously in the "Raid1 with 3 drives" thread of March this year? I.e. abandon the burdened raid* terminology in favour of something that makes more sense for a filesystem.

Well, our current RAID modes are:

 * 1 Copy (SINGLE)
 * 2 Copies (DUP)
 * 2 Copies, different spindles (RAID1)
 * 1 Copy, 2 Stripes (RAID0)
 * 2 Copies, 2 Stripes [each] (RAID10)

The forthcoming RAID5/6 code will expand on that, with

 * 1 Copy, n Stripes + 1 Parity (RAID5)
 * 1 Copy, n Stripes + 2 Parity (RAID6)

(I'm not certain how n will be selected -- it could be a config option, or simply selected on the basis of the number of spindles/devices currently in the FS).

We could further postulate a RAID50/RAID60 mode, which would be

 * 2 Copies, n Stripes + 1 Parity
 * 2 Copies, n Stripes + 2 Parity

For brevity, we could collapse these names down to: 1C, 2C, 2CR, 1C2S, 2C2S, 1CnS1P, 1CnS2P, 2CnS1P, 2CnS2P. However, that's probably a bit too condensed for useful readability. I'd support some set of terms based on this taxonomy, though, as it's fairly extensible, and tells you the details of the duplication strategy in question.

> Mostly this would involve a discussion about what terms would make most sense, though some changes in the behaviour of btrfs redundancy modes may be warranted if they make things more intuitive.

Consider the above a first suggestion. :)

> I could help you make these changes in your patches, or write my own patches against yours, though I'm also completely new to kernel development.

Probably best to keep the kernel internals unchanged for this particular issue, as they don't make much difference to the naming, but patches to the userspace side of things (mkfs.btrfs and btrfs fi df specifically) should be fairly straightforward.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- gdb The enemy have elected for Death by Powerpoint. That's what they shall get. ---
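
[Editor's sketch] The condensed names proposed in this message can be generated mechanically from the (copies, stripes, parity) description of each mode. The following Python is only an illustration of that taxonomy, not btrfs code: the `Profile` class, its field names, and the reading of the "R" suffix as "copies on different spindles" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Profile:
    """Hypothetical description of a redundancy mode (not a btrfs structure)."""
    copies: int
    stripes: Optional[Union[int, str]] = None  # None = unstriped, "n" = variable
    parity: int = 0
    different_spindles: bool = False  # assumed meaning of the "R" in "2CR"

    def short_name(self) -> str:
        name = f"{self.copies}C"
        if self.stripes is not None:
            name += f"{self.stripes}S"
        elif self.different_spindles:
            name += "R"
        if self.parity:
            name += f"{self.parity}P"
        return name


# Modes in the order they are listed in the message above.
PROFILES = {
    "SINGLE": Profile(copies=1),
    "DUP": Profile(copies=2),
    "RAID1": Profile(copies=2, different_spindles=True),
    "RAID0": Profile(copies=1, stripes=2),
    "RAID10": Profile(copies=2, stripes=2),
    "RAID5": Profile(copies=1, stripes="n", parity=1),
    "RAID6": Profile(copies=1, stripes="n", parity=2),
    "2 copies + 1 parity (postulated)": Profile(copies=2, stripes="n", parity=1),
    "2 copies + 2 parity (postulated)": Profile(copies=2, stripes="n", parity=2),
}

SHORT_NAMES = [profile.short_name() for profile in PROFILES.values()]

if __name__ == "__main__":
    for label, profile in PROFILES.items():
        print(f"{profile.short_name():8} {label}")
```

The generated list reproduces the nine names given in the message, which suggests the scheme is regular enough to be derived rather than hard-coded.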
Re: Update to Project_ideas wiki page
On Wed, Nov 17, 2010 at 07:14:47PM +0100, Andreas Philipp wrote:
> On 17.11.2010 18:56, Hugo Mills wrote:
>> On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote:
>>> Can I suggest we combine this new RAID level management with a modernisation of the terminology for storage redundancy, as has been discussed previously in the "Raid1 with 3 drives" thread of March this year? I.e. abandon the burdened raid* terminology in favour of something that makes more sense for a filesystem.
>>
>> Well, our current RAID modes are:
>>
>>  * 1 Copy (SINGLE)
>>  * 2 Copies (DUP)
>>  * 2 Copies, different spindles (RAID1)
>>  * 1 Copy, 2 Stripes (RAID0)
>>  * 2 Copies, 2 Stripes [each] (RAID10)
>>
>> The forthcoming RAID5/6 code will expand on that, with
>>
>>  * 1 Copy, n Stripes + 1 Parity (RAID5)
>>  * 1 Copy, n Stripes + 2 Parity (RAID6)
>>
>> (I'm not certain how n will be selected -- it could be a config option, or simply selected on the basis of the number of spindles/devices currently in the FS).
>
> Just one question on small n: If one has N = 3*k = 6 spindles, then RAID5 with n = N/2-1 results in something like RAID50? So having an option for small n might realize RAID50 given the right choice for n.

I see what you're getting at, but actually, that would just be RAID-5 with small n. It merely happens to spread chunks out over more spindles than the minimum n+1 required to give you what you asked for. (See the explanation below for why).

>> We could further postulate a RAID50/RAID60 mode, which would be
>>
>>  * 2 Copies, n Stripes + 1 Parity
>>  * 2 Copies, n Stripes + 2 Parity
>
> Isn't this RAID51/RAID61 (or 15/16 unsure on how to put) and would RAID50/RAID60 correspond to

Errr... yes, you're right. My mistake. Although... again, see the conclusion below. :)

>  * 2 Stripes, n Stripes + 1 Parity
>  * 2 Stripes, n Stripes + 2 Parity

I'm not sure talking about RAID50-like things (as you state above) makes much sense, given the internal data structures that btrfs uses:

As far as I know(*), data is firstly allocated in chunks of about 1GiB per device. Chunks are grouped together to give you replication. So, for a RAID-0 or RAID-1 arrangement, chunks are allocated in pairs, picked from different devices. For RAID-10, they're allocated in quartets, again on different devices. For RAID-5, they'd be allocated in groups of n+1. For RAID-61, we'd use 2n+4 chunks in an allocation.

For replication strategies where it matters (anything other than DUP, SINGLE, RAID-1 so far), the chunks are then subdivided into stripes of a fixed width. Data written to the disk is spread across the stripes in an appropriate manner.

From this point of view, RAID50 and RAID51 look much the same, unless the stripe size for the 5 is different to the stripe size for the 0 or 1. I'm not sure that's the case. If the stripe sizes are the same, you'll basically get the same layout of data across the 2n+2 chunks -- it's just that (possibly) the internal labels of the chunks which indicate which bit of data they're holding in the pattern will be different.

Hugo.

(*) I could be wrong, hopefully someone will correct me if so.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- A cross? Oy vey, have you picked the wrong vampire! ---
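
[Editor's sketch] The chunk counts quoted in this message (pairs, quartets, n+1, 2n+4, 2n+2) all follow from one product: copies times (data stripes plus parity stripes). The Python below only checks that arithmetic; the function name and parameters are hypothetical and do not correspond to anything in the btrfs sources.

```python
def chunks_per_allocation(copies: int, data_stripes: int, parity: int = 0) -> int:
    """Number of per-device chunks grouped into one allocation.

    Illustrative formula only: each copy needs `data_stripes + parity`
    chunks, and every copy is laid out in full.
    """
    return copies * (data_stripes + parity)


n = 3  # example value for the variable stripe count "n"

EXAMPLES = {
    "RAID0": chunks_per_allocation(copies=1, data_stripes=2),             # pair
    "RAID1": chunks_per_allocation(copies=2, data_stripes=1),             # pair
    "RAID10": chunks_per_allocation(copies=2, data_stripes=2),            # quartet
    "RAID5": chunks_per_allocation(copies=1, data_stripes=n, parity=1),   # n + 1
    "RAID51": chunks_per_allocation(copies=2, data_stripes=n, parity=1),  # 2n + 2
    "RAID61": chunks_per_allocation(copies=2, data_stripes=n, parity=2),  # 2n + 4
}

if __name__ == "__main__":
    for name, count in EXAMPLES.items():
        print(f"{name}: {count} chunks per allocation")
```

With n = 3 this gives 4 chunks for RAID5, 8 for RAID51 and 10 for RAID61, matching the n+1, 2n+2 and 2n+4 figures in the text.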
Re: A little confused about what remains to make a stable release
On Wed, Nov 17, 2010 at 02:27:39PM -0800, Daniel Farina wrote:
> I have been tracking the development of btrfs for some time, as the built-in support for snapshotting would be of great convenience for relational database use cases. I have been crawling the wiki (especially the FAQ), but I still don't have a clear sense of what's left *besides* the need for a 'fsck' utility that can be called absolutely vital.

That, and testing and bug reports.

This question (and answer) should probably go in the FAQ...

Nobody is going to magically stick a label on btrfs and say "it's stable now!" Software -- particularly something as complex as this -- just doesn't work that way. It's stable *for you* when it functions with the workloads *you* expect of it, with a failure rate that is acceptable *to you*. For what I'm using it for right now, it's already stable by that definition, *for me*.

> From the materials I've been able to find, it's hard for me to get a sense of how one could assist the project towards being recommended for general use; do the denizens of this list have a sense of what those things might be? (Or a link?)

The primary things you can do: Use it, test it, file bug reports. Do this with, as close as you can, the use-cases (IOPS, feature uses, data sizes) that you want to use it for.

Beyond that: Fix bugs. Add the features that you think are important. Add features that other people think are important (see the Project Ideas page on the wiki for the latter).

Hugo.

/2-penn'orth

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- The trouble with you, Ibid, is you think you know everything. ---
Re: A little confused about what remains to make a stable release
On Wed, Nov 17, 2010 at 05:46:30PM -0700, Anthony Roberts wrote:
>> It's stable *for you* when it functions with the workloads *you* expect of it, with a failure rate that is acceptable *to you*.
>
> I think there's a few ancillary things like a working fsck needed before it can even be recommended for widespread use, even to users willing to risk any residual bugs. IIRC at this point the utilities don't even aspire to provide basic recovery functionality (though Chris has posted that fsck is coming).
>
> Beyond that, the management capabilities at this point don't look ready for long term use in a production environment. By this I mean adding/removing disks,

That much is already there and working.

> reshaping arrays, etc. Without that I might use BTRFS on top of LVM/RAID just like any other filesystem, and there's features I'm looking forward to even if that's all I can do, but without robust management features there's certain environments where it just doesn't make sense yet.

What do you think is missing? Could you create and maintain a wishlist page on the wiki[1], and populate it with all the things that people need for production use? (This is an ongoing task -- track what's actually finished and remove it; track what's currently being worked on and mark it as such; keep an eye on discussions on the mailing list for things that people need...)

> There's one or two other things I'm keeping an eye on. That limitation on the number of hardlinks you can have in a directory is kinda irksome. Also, dedup needs a way to verify/dedup safely before people can start doing stuff like deduping live VM images.

Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/Main_Page

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Someone's been throwing dead sheep down my Fun Well ---
Re: SI units
On Thu, Nov 18, 2010 at 02:53:00PM +0100, Helmut Hullen wrote:
> You wrote on 18.11.10:
>
>>> when I invoke btrfs filesystem show then it shows the size of my Terabyte disks in TiByte but tells TB. It's a difference of about 10% - either there should be a switch like in df (option -H or --si), or TB should be changed to TiB (the same with GiB, MiB etc.)
>>
>> I posted patches[1] to do just that, a few weeks ago.
>
> I've just compiled btrfs from git (20101117) - at least this patch isn't included.
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
>
> as recommended in https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
>
> Do I use an antique version?

No, that's the latest version, as far as I know. The patches haven't been picked up and integrated by Chris yet. (In fact, I should probably send them again). In the meantime, I'm afraid you'll have to apply the patches manually.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Questions are a burden, and answers a prison for oneself. ---
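
[Editor's sketch] The "about 10%" difference mentioned in this thread is the ratio between a binary terabyte (TiB, 2^40 bytes) and a decimal terabyte (TB, 10^12 bytes). The Python below is a generic size formatter written for this note; it is not the code from the patches discussed, and the `si` parameter is merely a hypothetical analogue of the df switch named in the message.

```python
def format_size(num_bytes: int, si: bool = False) -> str:
    """Format a byte count with binary (KiB, MiB, ...) or SI (kB, MB, ...) units."""
    base = 1000 if si else 1024
    units = (["B", "kB", "MB", "GB", "TB", "PB"] if si
             else ["B", "KiB", "MiB", "GiB", "TiB", "PiB"])
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below one "base",
        # or at the largest unit we know about.
        if value < base or unit == units[-1]:
            return f"{value:.2f}{unit}"
        value /= base
    raise AssertionError("unreachable")


TIB_TO_TB_RATIO = 2**40 / 10**12  # about 1.0995, i.e. roughly 10% apart

if __name__ == "__main__":
    disk = 2 * 10**12  # a disk sold as "2 TB"
    print(format_size(disk))           # binary units
    print(format_size(disk, si=True))  # SI units
    print(f"1 TiB is {TIB_TO_TB_RATIO:.4f} TB")
```

A disk sold as 2 TB therefore shows up as roughly 1.82 when counted in binary units, which is why labelling that figure "TB" is misleading.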
Re: btrfs problems and fedora 14
Hi,

On Tue, Nov 23, 2010 at 10:19:43AM +1100, david grant wrote:
> I thought I would try btrfs on a new installation of f14. yes, I know it's experimental but stable so it seemed to be a good time to try it.
> I am not sure if I have missed something out of all my searching but am I correct in thinking that currently:
>
> I. it is not possible to boot from a snapshot of the operating system and, in particular, the yum snapshots cannot be used for that purpose

You can use btrfs subvolume set-default to set the default subvolume that is mounted if no subvol= or subvolid= parameter is given to mount. (And you can then subsequently access the original root of the filesystem using mount -o subvolid=0).

> II. it is so easy to create raid arrays of btrfs partitions but they cannot be read by f13 or f14

There's no particular reason that this should be the case. How do you come to this conclusion? What did you try, what did you expect to happen, and what actually happened?

> III. it is not possible to copy btrfs partitions with snapshots except possibly by the use of dd.

Again, I can't see a reason that this shouldn't work. What are you trying to do, exactly, and how is it failing?

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I believe that it's closely correlated with the aeroswine coefficient. ---
Re: Errors during defragmentation
On Mon, Nov 29, 2010 at 10:02:56PM +0100, Andrej Podzimek wrote:
> Hello,
>
> I decided to test the 'defragment' feature on my system (after a huge number of system updates and prelinking):
>
> find /bin /sbin /lib /usr/lib /usr/bin /usr/sbin -type d -exec btrfs filesystem defragment '{}' '+'
>
> I have already defragmented a couple of (very large) directories with no errors at all, so this was expected to work somehow. Surprisingly, this time there were thousands of messages like this:
>
> ioctl failed on directory name ret -1 errno 28

errno 28 is ENOSPC. You've run out of disk space. (Or at least, btrfs thinks so).

> Most of the reported directories had zero files/subdirectories. However, *most* of them were *not* empty...
>
> What does this error message mean? Could someone shed more light on this, please? Should I get ready for a bad crash? ;-) (There seems to be no data loss so far. No new messages in dmesg, no unexpected system behavior.)

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I believe that it's closely correlated with the aeroswine coefficient. ---
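
[Editor's sketch] A numeric errno such as the 28 in the message above can be turned into its symbolic name and description with Python's standard `errno` and `os` modules. The helper name `describe_errno` is made up for this note, and the mapping of 28 to ENOSPC is assumed to be the Linux one, as in the thread.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: description' for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux, errno 28 is ENOSPC ("No space left on device").
    print(describe_errno(28))
```

This only decodes the number; it says nothing about why btrfs believed the filesystem was full.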
Re: What to do about subvolumes?
On Wed, Dec 01, 2010 at 09:21:36AM -0500, Josef Bacik wrote:
> === Quotas ===
>
> This is a huge topic in and of itself, but Christoph mentioned wanting to have an idea of what we wanted to do with it, so I'm putting it here. There are really 2 things here
>
> 1) Limiting the size of subvolumes. This is really easy for us, just create a subvolume and at creation time set a maximum size it can grow to and not let it go farther than that. Nice, simple and straightforward.
>
> 2) Normal quotas, via the quota tools. This just comes down to how do we want to charge users, do we want to do it per subvolume, or per filesystem. My vote is per filesystem. Obviously this will make it tricky with snapshots, but I think if we're just charging the diffs between the original volume and the snapshot to the user then that will be the easiest for people to understand, rather than making a snapshot all of a sudden count the user's currently used quota * 2.

This is going to be tricky to get the semantics right, I suspect.

Say you've created a subvolume, A, containing 10G of Useful Stuff (say, a base image for VMs). This counts 10G against your quota. Now, I come along and snapshot that subvolume (as a writable subvolume) -- call it B. This is essentially free for me, because I've got a COW copy of your subvolume (and the original counts against your quota).

If I now modify a file in subvolume B, the full modified section goes onto my quota. This is all well and good. But what happens if you delete your subvolume, A? Suddenly, I get lumbered with 10G of extra files. Worse, what happens if someone else had made a snapshot of A, too? Who gets the 10G added to their quota, me or them? What if I'd filled up my quota? Would that stop you from deleting your copy, because my copy can't be charged against my quota? Would I just end up unexpectedly 10G over quota?

This is a whole gigantic can of worms, as far as I can see, and I don't think it's going to be possible to implement quotas, even on a filesystem level, until there's some good and functional model for dealing with all the implications of COW copies. :(

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I believe that it's closely correlated with the aeroswine coefficient. ---
Re: What to do about subvolumes?
On Wed, Dec 01, 2010 at 12:38:30PM -0500, Josef Bacik wrote:
> On Wed, Dec 01, 2010 at 04:38:00PM +, Hugo Mills wrote:
>> On Wed, Dec 01, 2010 at 09:21:36AM -0500, Josef Bacik wrote:
>>> === Quotas ===
>>>
>>> This is a huge topic in and of itself, but Christoph mentioned wanting to have an idea of what we wanted to do with it, so I'm putting it here. There are really 2 things here
>>>
>>> 1) Limiting the size of subvolumes. This is really easy for us, just create a subvolume and at creation time set a maximum size it can grow to and not let it go farther than that. Nice, simple and straightforward.
>>>
>>> 2) Normal quotas, via the quota tools. This just comes down to how do we want to charge users, do we want to do it per subvolume, or per filesystem. My vote is per filesystem. Obviously this will make it tricky with snapshots, but I think if we're just charging the diff's between the original volume and the snapshot to the user then that will be the easiest for people to understand, rather than making a snapshot all of a sudden count the users currently used quota * 2.
>>
>> This is going to be tricky to get the semantics right, I suspect.
>>
>> Say you've created a subvolume, A, containing 10G of Useful Stuff (say, a base image for VMs). This counts 10G against your quota. Now, I come along and snapshot that subvolume (as a writable subvolume) -- call it B. This is essentially free for me, because I've got a COW copy of your subvolume (and the original counts against your quota).
>>
>> If I now modify a file in subvolume B, the full modified section goes onto my quota. This is all well and good. But what happens if you delete your subvolume, A? Suddenly, I get lumbered with 10G of extra files. Worse, what happens if someone else had made a snapshot of A, too? Who gets the 10G added to their quota, me or them? What if I'd filled up my quota? Would that stop you from deleting your copy, because my copy can't be charged against my quota? Would I just end up unexpectedly 10G over quota?
>
> If you delete your subvolume A, like use the btrfs tool to delete it, you will only be stuck with what you changed in snapshot B. So if you only changed 5gig worth of information, and you deleted the original subvolume, you would have 5gig charged to your quota.

This doesn't work, though, if the owners of the original and new subvolume are different:

Case 1:

 * Porthos creates 10G data.
 * Athos makes a snapshot of Porthos's data.
 * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of Porthos's data to Athos.
 * Porthos deletes his copy of the data.

Case 2:

 * Porthos creates 10G of data.
 * Athos makes a snapshot of Porthos's data.
 * Porthos deletes his copy of the data.
 * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of Porthos's data to Athos.

Case 3:

 * Porthos creates 10G data.
 * Athos makes a snapshot of Porthos's data.
 * Aramis makes a snapshot of Porthos's data.
 * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of Porthos's data to Athos.
 * Porthos deletes his copy of the data.

Case 4:

 * Porthos creates 10G data.
 * Athos makes a snapshot of Porthos's data.
 * Aramis makes a snapshot of Athos's data.
 * Porthos deletes his copy of the data.

[Consider also Richelieu changing ownerships of Athos's and Aramis's data at alternative points in this sequence]

In each of these, who gets charged (and how much) for their copy of the data?

> The idea is you are only charged for what blocks you have on the disk. Thanks,

My point was that it's perfectly possible to have blocks on the disk that are effectively owned by two people, and that the person to charge for those blocks is, to me, far from clear. You either end up charging twice for a single set of blocks on the disk, or you end up in a situation where one person's actions can cause another person's quota to fill up. Neither of these is particularly obvious behaviour.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I believe that it's closely correlated with the aeroswine coefficient. ---
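
[Editor's sketch] The dilemma in this message can be made concrete with a toy model of shared extents. The Python below is not a proposal for btrfs and not anything discussed by the developers: the two charging policies, the function names, and the representation of an extent as (size, ordered list of referencing subvolumes) are all assumptions chosen to reproduce the two outcomes described, namely double charging, or one user's delete filling another user's quota. It follows Case 3, with every snapshot owned by the user who made it.

```python
from collections import defaultdict

# An extent is (size_in_GB, [subvolumes referencing it, oldest first]).


def charge_every_owner(extents, owner_of):
    """Policy 1: every user who references an extent pays for all of it."""
    charges = defaultdict(int)
    for size, refs in extents:
        for user in {owner_of[subvol] for subvol in refs}:
            charges[user] += size
    return dict(charges)


def charge_oldest_owner(extents, owner_of):
    """Policy 2: only the owner of the oldest surviving reference pays."""
    charges = defaultdict(int)
    for size, refs in extents:
        charges[owner_of[refs[0]]] += size
    return dict(charges)


def delete_subvolume(extents, name):
    """Drop one subvolume's references; extents with no references vanish."""
    remaining = [(size, [s for s in refs if s != name]) for size, refs in extents]
    return [(size, refs) for size, refs in remaining if refs]


# Case 3: Porthos creates A (10G); Athos snapshots it as B, Aramis as C.
OWNER_OF = {"A": "porthos", "B": "athos", "C": "aramis"}
BEFORE = [(10, ["A", "B", "C"])]
AFTER = delete_subvolume(BEFORE, "A")  # Porthos deletes his copy

if __name__ == "__main__":
    for label, extents in (("before delete", BEFORE), ("after delete", AFTER)):
        print(label)
        print("  every owner: ", charge_every_owner(extents, OWNER_OF))
        print("  oldest owner:", charge_oldest_owner(extents, OWNER_OF))
```

Under the first policy 30G is charged for 10G on disk; under the second, Athos goes from 0G to 10G purely because Porthos deleted his subvolume, while Aramis pays nothing for identical data.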
Re: What to do about subvolumes?
On Wed, Dec 01, 2010 at 12:24:28PM -0800, Freddie Cash wrote:
> On Wed, Dec 1, 2010 at 11:35 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
>>> The idea is you are only charged for what blocks you have on the disk. Thanks,
>>
>> My point was that it's perfectly possible to have blocks on the disk that are effectively owned by two people, and that the person to charge for those blocks is, to me, far from clear. You either end up charging twice for a single set of blocks on the disk, or you end up in a situation where one person's actions can cause another person's quota to fill up. Neither of these is particularly obvious behaviour.
>
> As a sysadmin and as a user, quotas shouldn't be about physical blocks of storage used but should be about logical storage used.
>
> IOW, if the filesystem is compressed, using 1 GB of physical space to store 10 GB of data, my quota used should be 10 GB.
>
> Similar for deduplication. The quota is based on the storage *before* the file is deduped. Not after.
>
> Similar for snapshots. If UserA has 10 GB of quota used, I snapshot their filesystem, then my quota used would be 10 GB as well. As data in my snapshot changes, my quota used is updated to reflect that (change 1 GB of data compared to snapshot, use 1 GB of quota).

So if I've got 10G of data, and I snapshot it, I've just used another 10G of quota?

> You have to (or at least should) keep two sets of stats for storage usage:
>  - logical amount used (real file size, before compression, before de-dupe, before snapshots, etc)
>  - physical amount used (what's actually written to disk)
>
> User-level quotas are based on the logical storage used. Admin-level quotas (if you want to implement them) would be based on physical storage used.
>
> Thus, the output of things like df, du, ls would show the logical storage used and file sizes. And you would have an additional option to those apps (--real or something) to show the actual storage used and file sizes as stored on disk.
>
> Trying to make quotas and disk usage utilities work based on what's physically on disk is just backwards, imo. And prone to a lot of confusion.

Trying to make quotas work based on what's physically on the disk appears to have serious issues on the semantics of using up space, so I agree with you on this point (and, indeed, it was the point I was trying to make).

However, doing it that way also effectively penalises users and prevents (or severely discourages) them from using the advanced functions of the filesystem. There's no benefit (in disk usage terms) to the user in using a snapshot -- they might as well use plain cp.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I believe that it's closely correlated with the aeroswine coefficient. ---
Re: 800 GByte free, but no space left
On Sun, Dec 05, 2010 at 04:08:26AM -0700, Evert Vorster wrote:
> On Sun, Dec 5, 2010 at 12:48 AM, Helmut Hullen hul...@t-online.de wrote:
>> Hello, Evert,
>>
>> You wrote on 04.12.10 on the subject "Re: 800 GByte free, but no space left":
>>
>>> I am not an expert on this by a long shot, but it looks like you added these two disks in raid0.

Nope -- btrfs will spread out its allocations across both disks.

>>> This means that the total space cannot exceed the space of the smallest disk.
>>
>> [...]
>>
>> Especially: no RAID definition. If the smallest device defines the capacity then I should use 2*1.35 TiByte, but my system tells no space left at about 2.4 TiByte - where are (at least) 300 GiByte hidden?
>>
>> devid 2 size 1.35TB used 1.35TB path /dev/sdc3
>> devid 1 size 1.81TB used 1.35TB path /dev/sdf2
>
> Here devid 2 is at 100%, and hence you are getting the no more space left errors. So, the 300 GB is on the bigger disk, and not usable for you right now.

I _think_ that a balance is all that's needed at this point. It can't hurt, anyway (other than taking quite a long time).

> I know of the disk mode you speak.. an old raid card of mine called it "Just a bunch of disks" and it literally filled up the first disk before carrying on to the second one until that was full under windows... under UNIX it had the effect of just adding all the sectors to each other, and stretching the file system over the disks in a linear fashion. Most UNIX file systems write files in the middle of the largest contiguous free space, which meant that some files got written on the first disk, and some on the second.
>
> As far as I know, btrfs does not support this raid mode.

It does support it: that's what the single RAID profile in mkfs.btrfs is. It attempts to use the disk space marginally more intelligently than traditional linear mode, though, as it allocates block groups (in chunks of about 1G) to each disk in turn. This isn't the same as RAID-0, which stripes within block groups with a much smaller stripe size.

> Another thing to keep in mind is that as far as I know you cannot remove devid 1 from a btrfs volume. This is due to be fixed, but I have no idea on the status of that.

I've done it (I have a filesystem with IDs 7, 8, 9, 12, 13, 14). Looks like that particular problem has been fixed.

> You could, if you really wanted to use all of two differently sized disks in a btrfs, subdivide the disks in equal sized partitions, and just put all of those partitions in a btrfs raid0...

[...]

That would be a really bad idea, as your disks would thrash horribly, reading stripes from different locations on the disk.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Nostalgia isn't what it used to be. ---
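
[Editor's sketch] The difference between the two layouts discussed here can be illustrated with a toy allocator. The Python below is a simplification written for this note, not the btrfs allocator: it assumes 1-unit chunks, assumes the devices with the most free space are always chosen, and rounds the two devices in the thread to 1350 and 1810 units. A two-device striped layout needs a chunk on both devices for every allocation, while the single layout needs only one.

```python
def usable_space(device_sizes, chunks_per_group, chunk=1):
    """Allocate chunk groups until no group fits.

    Each group takes one chunk from each of the `chunks_per_group`
    devices with the most free space (an assumed policy). Returns
    (total allocated, free space left on each device).
    """
    free = list(device_sizes)
    allocated = 0
    while True:
        # Devices ordered by remaining free space, largest first.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        chosen = order[:chunks_per_group]
        if len(chosen) < chunks_per_group or any(free[i] < chunk for i in chosen):
            break
        for i in chosen:
            free[i] -= chunk
        allocated += chunks_per_group * chunk
    return allocated, free


DEVICES = [1350, 1810]  # rough sizes of the two devices, in arbitrary units

STRIPED = usable_space(DEVICES, chunks_per_group=2)  # chunk on both devices
SINGLE = usable_space(DEVICES, chunks_per_group=1)   # chunk on one device

if __name__ == "__main__":
    print("striped over both devices:", STRIPED)
    print("single:", SINGLE)
```

In this model the striped layout stops once the smaller device is full and strands the remainder of the larger one, whereas the single layout can consume both devices completely. It does not model the mixed-profile bug diagnosed later in the thread.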
Re: 800 GByte free, but no space left
On Mon, Dec 06, 2010 at 02:13:00PM +0100, Helmut Hullen wrote:
> Hello, Hugo,
>
> You wrote on 06.12.10:
>
>>>>> But after copying about 300 MByte (part of a 1.5-GByte *.mpg) I got "no space left on device". Looks like balancing has stolen about 300 GByte.
>>>>
>>>> This sounds exactly like a problem I've had. What output do you get from btrfs fi df /srv/MM?
>>>
>>> I've just written a script for gathering the (perhaps) interesting data ...
>>>
>>> # btrfs filesystem show
>>> Label: 'MM2' uuid: ad7c0668-316c-4a79-ba00-3b505b9d99b4
>>> Total devices 2 FS bytes used 2.37TB
>>> devid 2 size 1.35TB used 1.20TB path /dev/sdc3
>>> devid 1 size 1.81TB used 1.20TB path /dev/sdf2
>>>
>>> Btrfs Btrfs v0.19
>>>
>>> # btrfs filesystem df /srv/MM
>>> Data: total=2.39TB, used=2.37TB
>>> Metadata: total=5.25GB, used=3.51GB
>>> System: total=12.00MB, used=188.00KB
>>
>> Can you try that again with either the latest 2.6.37-rc, or with the btrfs-unstable kernel? There's a bug in earlier versions that breaks the reporting of RAID types, which is what I wanted to see here.
>
> Do you mean
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git
>
> as btrfs-unstable kernel?

Yes. It's 2.6.36, plus the patches that Chris has sent to Linus for inclusion into 2.6.37.

> Compiling 2.6.37-rc is no big problem, it only needs some time.
>
> Just now I'm using
>
> Kernel 2.6.35.8
> btrfs-git from 20101117
>
>>> I've moved about 50 GByte away from /srv/MM in the meantime, before running the script with this output. And I don't dare running balance again - maybe it reduces the available space again and again.
>>
>> If you've hit the bug I think you have, then yes, it will.
>
> Hmm - it can't get worse ...
> If the error is related to the kernel or to the btrfs version and I try a newer one: can that lead to more free space?

Not yet. I've taken the whole of December off work (using up my leave allocation for last year), and my plan is to get myself to the point where I can understand enough of the code to fix this particular problem.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- What are we going to do tonight? The same thing we do every night, Pinky. Try to take over the world! ---
Re: 800 GByte free, but no space left
Helmut -

On Mon, Dec 06, 2010 at 03:45:00PM +0100, Helmut Hullen wrote:
> If/when I install 2.6.37-rc4: should I update btrfs (from the 20101117 version)?

I think that's the latest version.

> How can I see that changing the kernel makes things better? It's more and more difficult to externalize (?) btrfs directories to other disks ...

Updating the kernel won't fix the problem I'm thinking of (sorry). It will, however, fix the bug that stops the btrfs tool from reporting what RAID levels you've got.

The problem I suspect you may have (because your symptoms seem to be the same as mine) is that there are some circumstances where the filesystem can change RAID levels pretty much arbitrarily. Running btrfs fi df with a kernel that reports RAID levels will show whether that's the case, as you'll have more than one RAID level listed.

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- But people have always eaten people, / what else is there to eat? / If the Juju had meant us not to eat people / he wouldn't have made us of meat. ---
Re: 800 GByte free, but no space left
On Mon, Dec 06, 2010 at 06:13:00PM +0100, Helmut Hullen wrote:
> Hello, Hugo,
>
> You wrote on 06.12.10:
>
>>> How can I see that changing the kernel makes things better? It's more and more difficult to externalize (?) btrfs directories to other disks ...
>>
>> Updating the kernel won't fix the problem I'm thinking of (sorry). It will, however, fix the bug that stops the btrfs tool from reporting what RAID levels you've got.
>>
>> The problem I suspect you may have (because your symptoms seem to be the same as mine) is that there are some circumstances where the filesystem can change RAID levels pretty much arbitrarily. Running btrfs fi df with a kernel that reports RAID levels will show whether that's the case, as you'll have more than one RAID level listed.
>
> Kernel 2.6.37-rc4:
>
> # btrfs filesystem df /srv/MM
> Data, RAID0: total=2.39TB, used=2.37TB
> System, RAID1: total=8.00MB, used=188.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=4.25GB, used=3.51GB
> Metadata, DUP: total=1.00GB, used=2.33MB
>
> Hope it helps!

Yup. You've got what I've got(*). You have two different RAID types for metadata, which shouldn't happen (but does, due to a bug).

Hugo.

(*) My .sig fairy is clearly working overtime for appropriate quotations. :)

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Charting the inexorable advance of Western syphilisation... ---
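
[Editor's sketch] The check described in this exchange, looking for a block group type listed with more than one RAID level, can be automated by parsing the text that btrfs fi df prints. The Python below is a rough illustration based only on the output format quoted in this thread; the regular expression, the function names, and the "(unlabelled)" placeholder for lines without a profile are assumptions, and the format may differ in other versions.

```python
import re
from collections import defaultdict

# Matches lines such as "Metadata, RAID1: total=4.25GB, used=3.51GB"
# and "System: total=4.00MB, used=0.00" (profile part optional).
LINE = re.compile(
    r"^(?P<kind>\w+)(?:, (?P<profile>\w+))?: total=(?P<total>[^,]+), used=(?P<used>\S+)$"
)


def profiles_by_type(text):
    """Map each block group type (Data, Metadata, System) to its profiles."""
    result = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            result[match["kind"]].append(match["profile"] or "(unlabelled)")
    return dict(result)


def mixed_types(text):
    """Return only the types that are listed with more than one profile."""
    return {kind: profs for kind, profs in profiles_by_type(text).items()
            if len(profs) > 1}


SAMPLE = """\
Data, RAID0: total=2.39TB, used=2.37TB
System, RAID1: total=8.00MB, used=188.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=4.25GB, used=3.51GB
Metadata, DUP: total=1.00GB, used=2.33MB
"""

if __name__ == "__main__":
    for kind, profs in mixed_types(SAMPLE).items():
        print(f"{kind}: {', '.join(profs)}")
```

On the sample output above this flags Metadata (RAID1 and DUP), the case singled out in the reply. It also flags System because of the empty unlabelled entry; whether that entry matters is not addressed in the thread.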
Re: 800 GByte free, but no space left
Helmut -

On Tue, Dec 07, 2010 at 06:05:00PM +0100, Helmut Hullen wrote:
> You wrote on 06.12.10:
>
>>> Kernel 2.6.37-rc4:
>>>
>>> # btrfs filesystem df /srv/MM
>>> Data, RAID0: total=2.39TB, used=2.37TB
>>> System, RAID1: total=8.00MB, used=188.00KB
>>> System: total=4.00MB, used=0.00
>>> Metadata, RAID1: total=4.25GB, used=3.51GB
>>> Metadata, DUP: total=1.00GB, used=2.33MB
>>>
>>> Hope it helps!
>>
>> Yup. You've got what I've got(*). You have two different RAID types for metadata, which shouldn't happen (but does, due to a bug).
>
> Am I right to fear that balancing tries to reduce the system to something like RAID1?

It _should_ move all the data on the disk to somewhere else on the disk, whilst honouring the RAID settings for the filesystem. However, since it's got buggered up RAID settings now and has been using some of the space for the wrong RAID type, the balance can't find space with the right RAID parameters to write to, so it runs out of space.

(I think I got that right, anyway. I'm working off a conversation with Chris on IRC some weeks ago, about what happened to my filesystem).

Hugo.

--
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Always be sincere, whether you mean it or not. ---
Re: btrfs-progs branch updated
On Thu, Jul 05, 2012 at 04:01:22PM -0400, Chris Mason wrote:
> Hi everyone,
>
> I've updated the master branch with the pending stable btrfs-progs
> commit that should make the 0.20 release.
> Thanks to Hugh for helping to queue up a few of them. We'll have more
               ^ that's an o, not an h. :)
> frequent releases from here as we pull in the major new features going
> into progs (raid5/6, send/receive, quotas, fsck improvements).

I've still got a stack of new feature patches sitting here that mostly apply OK to integration. I'll try to filter out the ones already applied and put together another stack in approximately kernel-feature order.

Hugo.

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Nothing right in my left brain. Nothing left in --- my right brain. signature.asc Description: Digital signature
Re: btrfs fi df won't show total=
On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote: On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be observed: after a change of the profiles, total=,used= is no longer shown: 20:49 mmsrv1:~ # btrfs fi df /top.srv/ Data, RAID10: total=152.00GiB, used=121.07GiB System, RAID1: total=40.00MiB, used=44.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID1: total=112.00GiB, used=1.30GiB Metadata: total=8.00MiB, used=0.00 20:50 mmsrv1:~ # btrfs fi bal start -mconvert=raid10 -sconvert=raid10 /top.srv/ Refusing to explicitly operate on system chunks. Pass --force if you really want to do that. 20:52 mmsrv1:~ # btrfs fi bal start -mconvert=raid10 -sconvert=raid10 --force /top.srv/ ... 21:10 mmsrv1:~ # btrfs fi df /top.srv/ Data, RAID10: total=156.00GiB, used=124.35GiB System, RAID10: total=128.00MiB, used=48.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID10: total=112.00GiB, used=1.38GiB What's the problem here? You no longer have any RAID1 chunks, so it's not showing them. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- IMPROVE YOUR ORGANISMS!! -- Subject line of spam email --- signature.asc Description: Digital signature
Re: btrfs fi df won't show total=
On Mon, Jul 09, 2012 at 10:06:24PM +0200, Jan Engelhardt wrote: On Monday 2012-07-09 21:25, Hugo Mills wrote: On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote: On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be observed: after a change of the profiles, total=,used= is no longer shown: 20:49 mmsrv1:~ # btrfs fi df /top.srv/ Data, RAID10: total=152.00GiB, used=121.07GiB System, RAID1: total=40.00MiB, used=44.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID1: total=112.00GiB, used=1.30GiB Metadata: total=8.00MiB, used=0.00 [...] 21:10 mmsrv1:~ # btrfs fi df /top.srv/ Data, RAID10: total=156.00GiB, used=124.35GiB System, RAID10: total=128.00MiB, used=48.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID10: total=112.00GiB, used=1.38GiB What's the problem here? You no longer have any RAID1 chunks, so it's not showing them. Rather than a 4-line output, I would have expected this 6-line output that I would also get when mkfs'ing a new fresh btrfs volume with raid10 from the start: Data, RAID10: total=156.00GiB, used=124.35GiB Data: total=foo, used=bar System, RAID10: total=128.00MiB, used=48.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID10: total=112.00GiB, used=1.38GiB Metadata: total=foo, used=bar The lines without the RAID marker are there as a result of the way that mkfs works -- it creates stub chunks which are never used, and then upgrades to the required RAID level immediately afterwards. The balance (any balance, not just a conversion) processes these chunks as well as all the other chunks in the FS, and rewrites all of the data in them (all 0 bytes of it) somewhere else, removing the originals. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Well, you don't get to be a kernel hacker simply by looking --- good in Speedos. -- Rusty Russell signature.asc Description: Digital signature
Re: Can't mount, power failure - recoverable?
On Fri, Jul 13, 2012 at 02:23:53PM +0200, Martin Steigerwald wrote: On Monday, 26 March 2012, Skylar Burtenshaw wrote: Fajar A. Nugraha list at fajar.net writes: Didn't Chris' last response basically say use kernel 3.2 or newer, mount the fs (possibly with -o ro), and copy the data elsewhere? Why yes, yes it did actually. I appreciate your spotlighting it, just in case I somehow managed to miss it, though. Have you done that? I have. In fact, in my first message, I stated that in all kernels up to present 3.2 kernels, I get several minutes of disk churning, then a stack trace. Also present in my messages is the fact that the filesystem will not mount, as well as data output from the recovery program etc which fail to recognize things in the filesystem that they require in order to fix it. Did you have something you wished to suggest, in order to help me? If so, I'd gladly listen to any proposed ideas. Since I didn't find any explicit mention of it: Did you try btrfs-zero-log on the partition prior to mounting it? All of my cases of BTRFS not mounting after a sudden write interruption have been solved by it, except one with a BTRFS RAID 0 with lots of 2 TB drives, at a time when I didn't know about btrfs-zero-log. Maybe it would have helped there, too. Of course I could be completely off track and this could be a completely different issue. I'm afraid you probably are -- there's nothing I can see in the stack trace that would indicate that it's falling over in the log tree replay, which is the only thing that btrfs-zero-log would help with. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- A diverse working environment: Di longer you vork here, di --- verse it gets. signature.asc Description: Digital signature
Re: Can't mount, power failure - recoverable?
On Sat, Jul 14, 2012 at 01:01:04AM +, Skylar Burtenshaw wrote: I noticed there've been some recent (since I last looked at least) updates including fsck and such, however I haven't run anything git-based since the last time I pulled the btrfs tools, and I had to dig for ages to find info on how to get the RECENT stuff from the CORRECT source. I can find a dozen Google results that seem relevant, but can someone give me a definitive answer on which tree to pull down (and how) to test the new tools on my mess? This is the definitive source on where to get things: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories You will need the official -progs repository, as that's most up to date right now. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Great oxymorons of the world, no. 4: Future Perfect --- signature.asc Description: Digital signature
Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)
On Tue, Jul 24, 2012 at 08:39:36PM -0400, Marios Titas wrote: When I create a btrfs volume of size strictly less than 256 MiB then if I do mount /dev/sdb1 /mnt/test the kernel tries unsuccessfully to do the mount with many other file systems before successfully trying with btrfs. For volumes of size larger than or equal to 256 MiB it just mounts the volume without doing that. Why is there this discrepancy? Are you using the --mixed option when creating the filesystem? If not, you should do with something that small. Hugo. Another possibly related symptom is that the volume does not appear in /dev/disk/by-label and /dev/disk/by-uuid at all. This means that it is impossible to mount the volume by uuid or label. To make sure that this isn't a udev bug, I booted my system with init=/bin/bash in the kernel command line, and then I tried again to mount the volume. This time it would not mount it at all unless I explicitly specified the fs type. On the other hand, it could mount larger volumes without any issues. All the experiments were done on an initially zeroed out disk. I am using 3.4.6 kernel with btrfs from 3.5 and the latest btrfs-progs from git. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- In theory, theory and practice are the same. In --- practice, they're different. signature.asc Description: Digital signature
Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)
On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote: On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen sensi...@gmx.net wrote: On 04.07.2012 15:39, Alexander Block wrote: Hello all, This is the user space side of btrfs send/receive. You can apply them manually or use my git repo: git://github.com/ablock84/btrfs-progs.git (branch send) The branch is based on Hugo's integration-20120605 branch. I had to add a temporary commit to fix a bug introduced in one of the strncpy/overflow patches that got into btrfs-progs. This fix is not part of the btrfs send/receive patchset, but you'll probably need it if you want to base on the integration branch. I hope this is not required in the future when a new integration branch comes out. Example usage: Multiple snapshots at once: btrfs send /mnt/snap[123] snap123.btrfs a) Do we really want a single token command here, not btrfs filesystem send or subvol send? In my opinion the single token is easier to type and remember. But if there are enough arguments for normal subcommands this can be changed (but by someone else as I'm running out of time). Since everything else is two commands, yes, I think we need it for consistency. (And, since it's a publicly visible interface, for acceptance of the patches -- we don't want to be changing the way the commands work after the fact). b) zfs makes sure stdout is not a tty, to prevent flooding your console. This kinda makes sense. This makes sense. But again, this has to be done by someone else. Can you keep a brief list of such cleanups/features and dump it on the wiki as a proposed project when your time does run out, please? That way the details don't get lost, and they can be found by other people and dealt with independently. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Turning, pages turning in the widening bath, / The spine --- cannot bear the humidity. 
/ Books fall apart; the binding cannot hold. / Page 129 is loosed upon the world. signature.asc Description: Digital signature
Re: fail to mount after first reboot
On Sun, Aug 19, 2012 at 02:08:17PM +, Daniel Pocock wrote: I created a 1TB RAID1. So far it is just for testing, no important data on there. After a reboot, I tried to mount it again # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0 mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg00-btrfsvol0_0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so With multi-volume btrfs filesystems, you have to run btrfs dev scan before trying to mount it. Usually, the distribution will do this in the initrd (if you've installed its btrfs-progs package). Then I ran btrfsck - it reported no errors, but afterwards the volume mounted OK: # btrfsck /dev/mapper/vg00-btrfsvol0_0 [...] The first thing that btrfsck does is to do a device scan. [...] Can anyone comment on this? See above. Also, df is reporting double the actual RAID1 volume size, and double the amount of data stored in this filesystem: # df -lh . Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg00-btrfsvol0_0 1.9T 51G 1.8T 3% /mnt/btrfs0 I would expect to see Size=1T, Used=25G # strace -v -e trace=statfs df -lh /mnt/btrfs0 statfs(/mnt/btrfs0, {f_type=0x9123683e, f_bsize=4096, f_blocks=488374272, f_bfree=475264720, f_bavail=474749786, f_files=0, f_ffree=0, f_fsid={2083217090, -1714407264}, f_namelen=255, f_frsize=4096}) = 0 Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg00-btrfsvol0_0 1.9T 51G 1.8T 3% /mnt/btrfs0 This is an FAQ: https://btrfs.wiki.kernel.org/index.php/FAQ#Why_is_free_space_so_complicated.3F tl;dr: It's reporting the total number of raw storage bytes, because it's impossible to compute actual usable space in the general case. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- In one respect at least, the Martians are a happy people: --- they have no lawyers. signature.asc Description: Digital signature
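The "raw storage bytes" point can be checked against the statfs numbers quoted above. The following is a small worked sketch, not anything df itself does: the helper names are invented here, and the division by two only holds under the assumption that every chunk on the filesystem is RAID1, which is exactly what cannot be assumed in the general case.

```python
# Values copied from the statfs() call in the strace output above.
F_BSIZE = 4096
F_BLOCKS = 488374272
F_BFREE = 475264720

GIB = 1024 ** 3


def raw_usage(bsize, blocks, bfree):
    """Return (total_bytes, used_bytes) exactly as statfs reports them."""
    total = blocks * bsize
    used = (blocks - bfree) * bsize
    return total, used


def per_copy(raw_bytes, copies):
    """Rough logical size, assuming all data is stored `copies` times."""
    return raw_bytes / copies


if __name__ == "__main__":
    total, used = raw_usage(F_BSIZE, F_BLOCKS, F_BFREE)
    print(f"raw total: {total / GIB:.2f} GiB, raw used: {used / GIB:.2f} GiB")
    # Assumption: pure RAID1, i.e. two copies of everything.
    print(f"RAID1 estimate: {per_copy(total, 2) / GIB:.2f} GiB total, "
          f"{per_copy(used, 2) / GIB:.2f} GiB used")
```

The raw figures correspond to the 1.9T and 51G that df prints (df rounds up), and halving them gives roughly the 1T and 25G the poster expected.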
Re: fail to mount after first reboot
On Sun, Aug 19, 2012 at 02:33:14PM +, Daniel Pocock wrote: On 19/08/12 14:15, Hugo Mills wrote: On Sun, Aug 19, 2012 at 02:08:17PM +, Daniel Pocock wrote: I created a 1TB RAID1. So far it is just for testing, no important data on there. After a reboot, I tried to mount it again # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0 mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg00-btrfsvol0_0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so With multi-volume btrfs filesystems, you have to run btrfs dev scan before trying to mount it. Usually, the distribution will do this in the initrd (if you've installed its btrfs-progs package). I'm running Debian, I've just updated the system from squeeze to wheezy (with 3.2 kernel) so I could try btrfs and do other QA testing on wheezy (as it is in the beta phase now) I already had the btrfs-tools package installed, before creating the filesystem. So it appears Debian doesn't have an init script It does have /lib/udev/rules.d/60-btrfs.rules: SUBSYSTEM!=block, GOTO=btrfs_end ACTION!=add|change, GOTO=btrfs_end ENV{ID_FS_TYPE}!=btrfs, GOTO=btrfs_end RUN+=/sbin/modprobe btrfs RUN+=/sbin/btrfs device scan $env{DEVNAME} LABEL=btrfs_end but I'm guessing that isn't any use to my logical volumes that are activated early in the boot sequence? Could I be having this problem because I put my btrfs on logical volumes? Possibly. You may need the Device mapper uevents option in the kernel (CONFIG_DM_UEVENT) to trigger that udev rule when you enable your VG(s). Not sure if it's available/enabled in your kernel. Here is the package version I have: # dpkg --list | grep btrfs ii btrfs-tools 0.19+20120328-7 Checksumming Copy on Write Filesystem utilities That should be fine. Here is a more thorough dmesg, since boot, does this suggest the scan was invoked? 
I remember seeing some message about checking for btrfs filesystems just after selecting the kernel in grub (root is ext3) That message was probably grub checking the FS. # dmesg | grep btrfs [ 40.677505] btrfs: setting nodatacow [ 40.677514] btrfs: turning off barriers [17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0 [17216.145639] btrfs: disk space caching is enabled [17216.146987] btrfs: failed to read the system array on dm-100 [17216.147556] btrfs: open_ctree failed [17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0 [17310.993882] btrfs: disk space caching is enabled [17598.736657] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 37 /dev/mapper/vg00-btrfsvol0_0 [17598.750849] btrfs: disk space caching is enabled No, doesn't look like there were any scan results coming in before 17216. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- In one respect at least, the Martians are a happy people: --- they have no lawyers. signature.asc Description: Digital signature
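The reasoning about the dmesg timestamps can be reproduced with a short script. This is a sketch that assumes the log line formats seen in this one sample ("device fsid ... devid ..." for a scan result and "open_ctree failed" for a failed mount); other kernels may word these messages differently, and the function names are made up for the example.

```python
import re

# dmesg lines copied from the message above (output of `dmesg | grep btrfs`).
DMESG = """\
[ 40.677505] btrfs: setting nodatacow
[ 40.677514] btrfs: turning off barriers
[17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0
[17216.145639] btrfs: disk space caching is enabled
[17216.146987] btrfs: failed to read the system array on dm-100
[17216.147556] btrfs: open_ctree failed
[17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0
[17310.993882] btrfs: disk space caching is enabled
"""

SCAN_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\] device fsid (?P<fsid>\S+) devid (?P<devid>\d+) "
)
FAIL_RE = re.compile(r"^\[\s*(?P<time>\d+\.\d+)\] btrfs: open_ctree failed")


def scan_events(log):
    """Return (time, fsid, devid) for every device registration line."""
    events = []
    for line in log.splitlines():
        match = SCAN_RE.match(line)
        if match:
            events.append(
                (float(match["time"]), match["fsid"], int(match["devid"]))
            )
    return events


def first_failure(log):
    """Return the timestamp of the first failed mount, or None."""
    for line in log.splitlines():
        match = FAIL_RE.match(line)
        if match:
            return float(match["time"])
    return None


if __name__ == "__main__":
    events = scan_events(DMESG)
    print("first scan result at", events[0][0], "seconds after boot")
    print("first failed mount at", first_failure(DMESG))
```

In this sample the first device registration appears only at about 17216 seconds, at the time of the manual mount attempt, which matches the observation that no scan results came in during boot.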
Re: How to get Btrfs on 2nd partition of USB HDD to automount as read/write
On Sun, Aug 19, 2012 at 03:51:47PM -0400, dg1727 wrote: Hello, The question below is based on https://lists.ubuntu.com/archives/xubuntu-users/2012-August/004509.html Thanks in advance for any help with the following question, including pointing me to some other info resource if needed. I have a user with an Xubuntu 12.04.1 laptop, 32-bit. He needs to use a USB hard disk drive which has 2 partitions: the 1st partition is NTFS and the 2nd partition is Btrfs. When he plugs in the hard drive, both partitions auto-mount OK, except that the Btrfs partition automounts read-only. That is, the permissions of the directories in /dev are drwx-- for the NTFS and dr-xr-xr-x for the Btrfs. How can the OS be set up so that the Btrfs will automount read/write? As Anthony points out, this is a property of the filesystem, not the OS or the mount options. Just use chmod. (It's only filesystems like FAT, which have no concept of permissions, which have mount options to set permissions) Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Two things came out of Berkeley in the 1960s: LSD and Unix. --- This is not a coincidence. signature.asc Description: Digital signature
Re: “Bug”-report: inconsistency kernel - tools
On Thu, Aug 30, 2012 at 08:24:53PM +0200, Goffredo Baroncelli wrote: On 08/28/2012 09:52 PM, M G Berberich wrote: (7) reinserted disk (and rebooted) At some point before reboot the first 10 sectors of one disk were zeroed to test if the disk gets removed from the btrfs. IIRC the superblock is not placed at the beginning of the disk. On the basis of [1] it should be near the 64KB (around the sector #128) Just for the record, the first is at 64KiB; each mirror copy is at 16KiB shifted left by 12 bits per copy (64MiB, 256GiB, 1PiB). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- This chap Anon is writing some perfectly lovely stuff --- at the moment. signature.asc Description: Digital signature
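The superblock positions can be computed with a few shifts. The sketch below assumes the layout as I understand it from the kernel sources: the primary copy at 64 KiB and mirror copy n at 16 KiB shifted left by 12 bits per copy. The constant and function names are made up for the example. The value for copy 2 agrees with the "using SB copy 2, bytenr 274877906944" line that btrfsck prints in another message in this archive.

```python
# Assumed layout: primary superblock at 64 KiB, mirror n at 16 KiB << (12 * n).
SUPER_INFO_OFFSET = 64 * 1024
MIRROR_BASE = 16 * 1024
MIRROR_SHIFT = 12

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def superblock_offset(mirror):
    """Byte offset of superblock copy `mirror` (0 is the primary copy)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    return MIRROR_BASE << (MIRROR_SHIFT * mirror)


def human(size):
    """Format an exact power-of-1024 multiple with a binary unit."""
    index = 0
    while size >= 1024 and size % 1024 == 0 and index < len(UNITS) - 1:
        size //= 1024
        index += 1
    return f"{size} {UNITS[index]}"


if __name__ == "__main__":
    for copy in range(3):
        offset = superblock_offset(copy)
        print(f"copy {copy}: byte {offset} ({human(offset)})")
```

Zeroing only the first 10 sectors of a disk therefore leaves every copy intact, since even the primary one starts at sector 128 with 512-byte sectors.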
Re: Btrfs-Progs integration branch question
On Mon, Sep 03, 2012 at 09:31:16AM -0700, Suman C wrote: Hi, I would like to get the latest btrfs-progs code. To me, Chris Mason's repo at git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git seems latest but obviously it's missing the last several patches I see in the mailing list. I also tried Hugo Mills' integration repo at http://git.darksatanic.net/repo/btrfs-progs-unstable.git and unless I am looking at it wrong, it seems behind. It is. I'm out of date. Can someone please point me to the latest process that is followed for testing/developing recent btrfs-progs? Chris's repo, right now. I am trying to integrate the quota patch from August 10th by Jan Schmidt and getting conflicts when I git apply. Good luck. :) Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- What do you give the man who has everything? -- Penicillin is --- a good start... signature.asc Description: Digital signature
Re: Is my btrfs full?
On Tue, Sep 04, 2012 at 06:07:50PM +0200, Petr Tichý wrote: I have a 130 GB btrfs with rsnapshot-like backups mounted with compression=zlib and now I'm getting ENOSPC, while df shows only some 60 % used. I'm running Linux version 3.2.0-3-amd64 (Debian 3.2.23-1). Is my btrfs really full? Will a more recent kernel solve this? We strongly recommend using the latest available kernel (currently 3.5 or 3.6-rc4) if you're running btrfs. The code is still moving very quickly, and the main devs are still finding and fixing fairly serious bugs. root@roura:~# btrfs filesystem show --all-devices Label: none uuid: 12880174-8337-47bc-be05-f485a0b7503f Total devices 1 FS bytes used 72.67GB devid 1 size 130.00GB used 130.00GB path /dev/sdd You may want to read about df[1], and then about ENOSPC errors[2]. If you've still got questions after that, please do come back and ask them. Hugo. [1] https://btrfs.wiki.kernel.org/index.php/FAQ#Why_does_df_show_incorrect_free_space_for_my_RAID_volume.3F [2] https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Geek, n.: Circus sideshow performer specialising in the --- eating of live animals. signature.asc Description: Digital signature
Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable
On Mon, Sep 17, 2012 at 12:48:10PM +0800, Anand Jain wrote: btrfs send introduced a part of code to read kernel-data from user-end using pipe. We need this part of code to be useable outside of send sub-cmd, so that developing service sub-cmd can use it. What's 'service sub-cmd' please? at the moment 'btrfs service history mnt|dev' to show logs of maintenance. comments/suggestions welcome. As I said in our private email exchange some months ago, I don't think this is the right way to be doing this. For example, if you use an alternative tool (such as btrfs-gui) which uses the ioctls directly, you've lost that logging information. Keeping a log of what's been done to the FS is much better done by extending the available logging in the kernel (and making it a compile-time option for those who don't want or need it). You can then write a simple shell script to chomp through the normal kernel logs to extract this information. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I'll take your bet, but make it ten thousand francs. I'm only --- a _poor_ corrupt official. signature.asc Description: Digital signature
Re: Rebuilding chunk root?
On Mon, Sep 24, 2012 at 04:28:08PM +0300, Sami Haahtinen wrote: Due to certain unfortunate chain of events, I managed to overwrite a small portion of my btrfs array which had only single redundancy for metadata. The data itself is present and only a small portion (2.5%) of the array was overwritten. After quite a bit of debugging and tinkering, I realized that my chunk root was in the portion that was overwritten. After reading through the documentation I was able to pull together it's still unclear to me whether chunk root is something that can be rebuilt. Chris had some experimental code for doing it in btrfsck which never saw the light of day (because it was too unreliable). He may be able to offer you something to help, though. A transcript of btrfsck trying to recover with superblock 2 which is uncorrupted by itself: root@sysresccd /root/btrfs-progs % ./btrfsck --super 2 /dev/patience/home using SB copy 2, bytenr 274877906944 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 read block failed check_tree_block Couldn't read chunk root If I'm interpreting the output correctly, it's trying to read bytes from address 139264, which would fall into the corrupted area. No, I believe the want=, have= text is referring to a generation ID, not a block number. That's not to say that your chunk tree isn't damaged, though -- I'm just clarifying your interpretation of the numbers. Out of interest, does mounting with -o recovery help at all? (I'm not expecting it to do much if your chunk tree's gone, but it might do something). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal) signature.asc Description: Digital signature
Re: BTRF - Storage Usage
On Thu, Sep 27, 2012 at 12:44:27PM +0200, Sébastien Maury wrote: I've installed a new server using btrfs for my root partition (/). It uses snapper for snapshot management and all seems to work pretty well. My problem is knowing the REAL remaining free space in my partition. This is in the FAQ: https://btrfs.wiki.kernel.org/index.php/FAQ#Why_are_there_so_many_ways_to_check_the_amount_of_free_space.3F Short answer: you can't know in general. Longer answer -- see below. Using different commands, I get different results, and I don't know how to interpret them correctly: poivron:~ # btrfs filesystem show /dev/sda3 Label: none uuid: 9e68b667-f9f9-490f-9da1-ae4e91558212 Total devices 1 FS bytes used 2.58GB devid 1 size 131.64GB used 10.04GB path /dev/sda3 You have 131.64 GiB of raw storage in your filesystem. Of that, 10.04 GiB is currently allocated for use by the FS (and it will take more as it needs it). poivron:~ # btrfs filesystem df / Data: total=4.01GB, used=2.16GB 4.01 GiB of the 10.04 GiB allocation is assigned for use by data, and 2.16 GiB of that allocation actually contains data. System, DUP: total=8.00MB, used=4.00KB 16 MiB (=2*8.00 MiB) of the 10.04 GiB allocation is assigned for use as two copies of the system data. There is 4 KiB of system data actually used. System: total=4.00MB, used=0.00 Metadata, DUP: total=3.00GB, used=429.16MB 6 GiB (=2*3.00 GiB) of your 10.04 GiB allocation is assigned for use as metadata, with two copies (DUP) being kept. 429.16 MiB of the 3.00 GiB is currently in use. Metadata: total=8.00MB, used=0.00 poivron:~ # df -hP / Filesystem Size Used Avail Use% Mounted on /dev/sda3 132G 3.0G 124G 3% / Plain old df can't handle the truth, so this is at best only a hint at what's actually happening. When Avail reaches zero, your FS is probably full. Other than that, you can't necessarily say very much. 
=== Please help me understand and interpret this information so that I know as accurately as possible what my real remaining space is, and what space is used by what. Also, I don't really understand the output of the command btrfs filesystem df / : what exactly are Data, System DUP, System total, Metadata DUP and Metadata total? This should all be covered in the glossary on the website: https://btrfs.wiki.kernel.org/index.php/Glossary Data is the contents of your files. Metadata is all the other stuff that the FS needs in order to store your files -- directory structures, permissions, locations of the file data, that kind of thing. System is a particular bit of the metadata (the chunk tree) which governs an internal physical/virtual mapping, and which needs to be read before anything else can make any kind of sense. DUP is a bit like RAID-1: anything stored in a DUP chunk is actually written to two different places on the disk, and can help recovery in the case of physical disk corruption (e.g. bad blocks, head crash). == Here is some additional information: poivron:~ # uname -a Linux poivron 3.0.26-0.7-default #1 SMP Tue Apr 17 10:27:57 UTC 2012 (3829766) x86_64 x86_64 x86_64 GNU/Linux You [probably(*)] need to upgrade your kernel as soon as possible. btrfs code moves very fast, and 3.0 has significant bugs in it. You should be running the latest released kernel -- right now, that's 3.5, or 3.6-rc7. Next week, it will probably change to 3.6 when Linus makes the next release. Most distributions have a repository somewhere which will give you access to new kernels without too much trouble. Hugo. (*) Some of the enterprise distributions do have backported btrfs fixes in their apparently older kernels. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- __(_' Squeak! --- signature.asc Description: Digital signature
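The link between the two commands is simple arithmetic: each line of btrfs filesystem df counts logical space, and a DUP chunk takes twice that on disk. As a worked check, using only the figures quoted in this message and assuming two copies for DUP and one for lines without a marker (the names below are invented for the example):

```python
MIB_PER_GIB = 1024

# (chunk type, profile, logical total in MiB), from `btrfs filesystem df /` above.
CHUNKS = [
    ("Data", "single", 4.01 * MIB_PER_GIB),
    ("System", "DUP", 8.00),
    ("System", "single", 4.00),
    ("Metadata", "DUP", 3.00 * MIB_PER_GIB),
    ("Metadata", "single", 8.00),
]

# Number of on-disk copies per profile (only the profiles seen in this example).
COPIES = {"single": 1, "DUP": 2}


def raw_allocated_gib(chunks):
    """Sum the raw disk space behind each line, in GiB."""
    raw_mib = sum(total * COPIES[profile] for _, profile, total in chunks)
    return raw_mib / MIB_PER_GIB


if __name__ == "__main__":
    print(f"raw allocation: {raw_allocated_gib(CHUNKS):.2f} GiB")
```

Rounded to two decimals, this gives the 10.04 GiB that btrfs filesystem show reports as "used" for the device.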
Re: BTRF - Storage Usage
On Thu, Sep 27, 2012 at 01:25:58PM +0200, Sébastien Maury wrote: Hi, Thanks for the quick reply, this clarifies a lot of things for me. I had read the articles you mentioned, but I must admit that your explanations based on my examples make things even clearer. Also, if I understand things properly, snapshot sizes aren't included in the btrfs filesystem show command output? So is using, for example, du -sh /.snapshots correct to determine the disk usage of my snapshots? Disk usage of a snapshot has two different answers: 1) The total size of the files listed in the snapshot, which you can get from du. 2) The amount of space that would be freed up by deleting the snapshot, which isn't currently available, but probably will be soon. (The additional bookkeeping code was part of the qgroups patches, which are in 3.6). I will check with the people in my company in charge of maintaining distributions to get us a more recent kernel. PS: I use the SLES 11 SP2 distribution. OK, that one's actually one of the few that does keep proper backports: https://btrfs.wiki.kernel.org/index.php/Getting_started#Distro_support That said, I don't know how good they are at keeping up -- probably pretty good, but other people here may be able to answer that better. Hugo.
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote: On Thu, 27 Sep 2012 23:02:35 +0200 Goffredo Baroncelli kreij...@libero.it wrote: Sorry for the space error: Below a more correct example $ btrfs filesystem disk-free / Summary: Total: 135.00GB Allocated: 10.51GB Unallocated:124.49GB Free_(Estimated) 86.56GB Average_disk_efficiency: 62 % How do you estimate Free here? Sorry I didn't check the source code in git, but from the Details below nothing leads me to believe that this FS is doomed to only be able to usefully utilize only ~86GB of the partition, and not more. Are you ready to answer the flood of questions from people why their disk is only 62% efficient, and how to tune it to 100%? :-) Data_to_disk_ratio, maybe? Why use underscores instead of spaces? So that you can use, say, read in the shell to extract data from each line. To that end, there should be a space between the value and the unit throughout. Details: Chunk-typeMode AllocatedUsedFree -- - - Minor thing: The underlines are largely superfluous. Few basic CL tools I can think of use them. Data Single4.01GB 2.16GB 1.87GB SystemDUP 16.00MB 4.00KB 7.99MB SystemSingle4.00MB0.00 4.00MB Metadata DUP 6.00GB429.16MB 2.57GB Metadata Single8.00MB0.00 8.00MB I think we need another column here, to indicate how much *actual* disk space is used by each row, so adding up that column will give you the Allocated value in the first clause. I think that's probably the biggest cause of confusion. Raw alloc., maybe, and use the term raw somewhere in the first clause to hammer the point home. My only concern here is that we're a bit too close to the existing solution (albeit merging the two sets of output), which has proven itself over time to be somewhat confusing. I think the Alloc_Raw column is the minimum necessary to link the two in some easily determinable way. Adding totals to Alloc_Raw, and Used (but not Free or Alloc) would help, I think. 
I don't think it's useful to add them to the Free or Alloc columns, because those figures change as the FS allocates chunks, and we'll end up with people querying the fact that the total of Free doesn't add up to any of the figures in the summary. Say, something like this: Summary_(Raw): Total:135.00 GiB Allocated: 10.51 GiB Unallocated: 124.49 GiB Free_(Estimated): 86.56 GiB Average_disk_efficiency: 62 % Details: Chunk_type ModeAlloc_Raw Alloc UsedFree DataSingle 4.01 GiB 4.01 GiB2.16 GiB 1.87 GiB System DUP 32.00 MiB 16.00 MiB4.00 KiB 7.99 MiB System Single 4.00 MiB 4.00 MiB0.00 B4.00 MiB MetadataDUP 12.00 GiB 6.00 GiB 429.16 MiB 2.57 GiB MetadataSingle 8.00 MiB 8.00 MiB0.00 B8.00 MiB Total 16.04 GiB 2.59 GiB The other thing is that there should be a switch (or possibly two) to give highly machine-readable versions of the output -- no units (units as bytes by default, with other units settable by a switch), tab-separated, possibly a different option for each of the above output clauses. Ultimately, I think the bikeshed should be turquoise. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Python is executable pseudocode; perl --- is executable line-noise. signature.asc Description: Digital signature
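To show why underscores in the field names and a space before the unit make the output easy to consume, here is a rough Python counterpart of using read in the shell. The function name is hypothetical; the sample lines are taken from the proposed output above.

```python
def parse_summary_line(line):
    """Split a 'Key: value unit' line into (key, value, unit).

    This relies on the key containing no whitespace and on the value and
    unit being separated by a space, as suggested in the message above.
    """
    key, value, unit = line.split()
    return key.rstrip(":"), float(value), unit


for line in ("Free_(Estimated): 86.56 GiB", "Average_disk_efficiency: 62 %"):
    print(parse_summary_line(line))
```

A line such as "Free (Estimated): 86.56GiB" would not parse with this function (it raises ValueError), which is the difficulty the underscores and the space are meant to avoid.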
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
Hi, Goffredo, On Fri, Sep 28, 2012 at 07:27:16PM +0200, Goffredo Baroncelli wrote: On 09/28/2012 10:58 AM, Hugo Mills wrote: On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote: On Thu, 27 Sep 2012 23:02:35 +0200 Goffredo Baroncellikreij...@libero.it wrote: [...] [...] Details: Chunk-typeMode AllocatedUsedFree -- - - [...] Data Single4.01GB 2.16GB 1.87GB SystemDUP 16.00MB 4.00KB 7.99MB SystemSingle4.00MB0.00 4.00MB Metadata DUP 6.00GB429.16MB 2.57GB Metadata Single8.00MB0.00 8.00MB I think we need another column here, to indicate how much *actual* disk space is used by each row, so adding up that column will give you the Allocated value in the first clause. I think that's probably the biggest cause of confusion. Raw alloc., maybe, and use the term raw somewhere in the first clause to hammer the point home. I think that there is a little misunderstanding. We are saying the same thing. Only I call allocated what you call raw alloc OK, I think we need both. We need to indicate somewhere (in the Details section in my version) both the total number of bits of rust used and the amount of data stored. It's not good to ask the user to know that they need to multiply/divide by two for certain storage modes (or even more complicated for RAID-5/6). Somewhere, they will find that values change twice as fast as they expect (or at half the speed), and that causes problems. We need to find some way of connecting the two in a way that makes it reasonably obvious where the figures come from.. My only concern here is that we're a bit too close to the existing solution (albeit merging the two sets of output), which has proven itself over time to be somewhat confusing. I think the Alloc_Raw column is the minimum necessary to link the two in some easily determinable way. Adding totals to Alloc_Raw, and Used (but not Free or Alloc) would help, I think. 
I don't think it's useful to add them to the Free or Alloc columns, because those figures change as the FS allocates chunks, and we'll end up with people querying the fact that the total of Free doesn't add up to any of the figures in the summary. Say, something like this: Summary_(Raw): Total:135.00 GiB Allocated: 10.51 GiB Unallocated: 124.49 GiB Free_(Estimated): 86.56 GiB Average_disk_efficiency: 62 % Details: Chunk_type ModeAlloc_Raw Alloc UsedFree DataSingle 4.01 GiB 4.01 GiB2.16 GiB 1.87 GiB System DUP 32.00 MiB 16.00 MiB4.00 KiB 7.99 MiB System Single 4.00 MiB 4.00 MiB0.00 B4.00 MiB MetadataDUP 12.00 GiB 6.00 GiB 429.16 MiB 2.57 GiB MetadataSingle 8.00 MiB 8.00 MiB0.00 B8.00 MiB Total 16.04 GiB 2.59 GiB The other thing is that there should be a switch (or possibly two) to give highly machine-readable versions of the output -- no units (units as bytes by default, with other units settable by a switch), tab-separated, possibly a different option for each of the above output clauses. I fully Agree. But my first concern was about the wording (if fact even though we are saying the same thing you didn't understood me). Let me propose the following: Summary: Disk_size: 135.00 GiB Disk_allocated: 10.51 GiB Disk_unallocated: 124.49 GiB Used: 2.59 GiB Free_(Estimated):91.93 GiB Average_disk_efficiency: 70 % Details: Chunk-typeMode Disk-allocated Used Available Data Single4.01GB 2.16GB 1.87GB SystemDUP 16.00MB 4.00KB 7.99MB SystemSingle4.00MB0.00 4.00MB Metadata DUP 6.00GB429.16MB 2.57GB Metadata Single8.00MB0.00 8.00MB Where: Disk-allocated - space used on the disk by the chunk Disk-size - size of the disk Disk-unallocated- disk not used in any chunk Used- space used by the files/metadata The problem here is that if you're using raw storage, the Used value in the second stanza grows twice as fast as the user expects. I think this second stanza should at minimum include the cooked values used in btrfs fi df, because those reflect the user's experience. 
Then adding [some of?] the raw values you've got here to help connect the values to the raw data in the first stanza of output. As I said above, it's the connection between I wrote a 1GiB file to my
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Sat, Sep 29, 2012 at 12:02:23AM +0600, Roman Mamedov wrote: On Fri, 28 Sep 2012 18:44:07 +0200 Goffredo Baroncelli kreij...@inwind.it wrote: This means that the ration of space physically allocated on the disk and the space available is 7GB/10GB = 0.7 . So on 135GB of disk, only 94GB are available. You assume metadata allocation will always grow linearly with data, which is not true. So in my opinion it is not a good estimate. No, but it's the best model we have right now. (And probably about the best model we will have, without knowledge of the future intentions of the user). Without inlining file data, the metadata is dominated by checksums, which is a linear relationship (approx 1000:1). With inlining file data, metadata is probably dominated by inline data; assuming the ratio of small-to-large files on the FS remains unchanged in future, a linear relationship also applies. For general usage, I'm happy to assume that the current ratio of data to metadata will remain largely unchanged over the lifetime of the FS. Why use underscores instead of spaces? Simplify the parsing in scripts I think it looks awkward and is not warranted since this is a primarily user-facing utility. Also none of the other similar tools shy from having spaces anywhere they need to, e.g. 
# mdadm --detail /dev/md0 /dev/md0: Version : 1.2 Creation Time : Wed May 25 00:07:38 2011 Raid Level : raid5 Array Size : 3907003136 (3726.01 GiB 4000.77 GB) Used Dev Size : 976750784 (931.50 GiB 1000.19 GB) Raid Devices : 5 Total Devices : 5 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Fri Sep 28 21:20:51 2012 State : active Active Devices : 5 Working Devices : 5 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K Name : avdeb:0 (local to host avdeb) UUID : b99961fb:ed1f76c8:ec2dad31:6db45332 Events : 14254 Number Major Minor RaidDevice State 7 8 170 active sync /dev/sdb1 6 8 331 active sync /dev/sdc1 3 8 652 active sync /dev/sde1 4 8 493 active sync /dev/sdd1 5 8 814 active sync /dev/sdf1 # lvdisplay --- Logical volume --- LV Path/dev/alpha/lv1 LV Namelv1 VG Namealpha LV UUIDHP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2 LV Write Accessread/write LV Creation host, time , LV Status available # open 0 LV Size3.52 TiB Current LE 115431 Segments 3 Allocation inherit Read ahead sectors auto - currently set to 4096 Block device 252:0 ... and I've always found those hard to deal with in scripts. :) (But they do have plumbing options, to use the git terminology, so I'd be happy with having a parsable output option). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Hey, Virtual Memory! Now I can have a *really big* ramdisk! --- signature.asc Description: Digital signature
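The linear model discussed in this message can be written down in a few lines. This is a sketch only: the function name is invented, the 7 GB / 10 GB / 135 GB figures are the ones quoted above, and the 4-byte checksum per 4096-byte block is an assumption used to illustrate the "approx 1000:1" remark.

```python
def estimate_usable(disk_size, logical_allocated, raw_allocated):
    """Scale the whole disk by the current logical-to-raw ratio.

    Assumes the mix of data and metadata (and their profiles) stays the
    same for the lifetime of the filesystem, as argued above.
    """
    ratio = logical_allocated / raw_allocated
    return disk_size * ratio


# Figures from the quoted message: 7 GB of logical space in 10 GB of raw
# allocation, on a 135 GB disk.
print(f"estimated usable: {estimate_usable(135, 7, 10):.1f} GB")

# Assumed sizes: one 4-byte crc32c checksum per 4096-byte data block.
CSUM_BYTES = 4
BLOCK_BYTES = 4096
print(f"data to checksum ratio: {BLOCK_BYTES // CSUM_BYTES}:1")
```

The estimate is only as good as the assumption that the current ratio holds; a filesystem that later fills up with many small inlined files would drift away from it.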
Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df
Looks good. Only a few comments, inline.

On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote:
> $ ./btrfs filesystem df --help
> usage: btrfs filesystem disk-usage [-d][-s][-k] path [path..]
>
> Show space usage information for a mount point(s).
>
> -k  Set KB (1024 bytes) as unit
> -s  Don't show the summary section
> -d  Don't show the detail section

These are kind of logical, but I think they would be hard to remember the right way round. I would suggest swapping the actions of the switches, and rewording the help:

  -s  Show only summary section
  -d  Show only detail section

> $ ./btrfs filesystem df /
> Path: /
> Summary:
>   Disk_size:            72.57GB
                             ^ space between the value and the unit (as ISO says), throughout.

This also makes it easier to parse, if anyone wants to.

Also, use kB, MB, GB, TB for powers-of-ten based units, and KiB, MiB, GiB, TiB for powers-of-two based units, please. I don't care which you report in, but please do make the distinction. (And note that it's kB with a lower case k, but KiB with an upper case K). This brings us in line with the relevant ISO and IEEE standards.

>   Disk_allocated:       25.10GB
>   Disk_unallocated:     47.48GB
>   Logical_size:         23.06GB
>   Used:                 11.01GB
>   Free_(Estimated):     55.66GB(Max: 59.52GB, Min: 35.78GB)
>   Data_to_disk_ratio:   92 %
>
> Details:
>   Chunk-type  Mode    Chunk-size  Logical-size  Used
>   Data        Single  21.01GB     21.01GB       10.34GB
>   System      DUP     80.00MB     40.00MB       4.00KB
>   System      Single  4.00MB      4.00MB        0.00
>   Metadata    DUP     4.00GB      2.00GB        686.93MB
>   Metadata    Single  8.00MB      8.00MB        0.00

Why are the field headings here using "-" where the field headings in the first section used "_"? Should you be using "_" in both places?

Hugo.
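As an illustration of the unit convention asked for above (a space before the unit, kB/MB/GB for powers of ten, KiB/MiB/GiB for powers of two), here is a small Python sketch. It is not the btrfs-progs formatting code; the function name and its binary parameter are invented for the example.

```python
def format_size(nbytes, binary=True):
    """Format a byte count with a space before the unit.

    binary=True  -> powers of two (KiB, MiB, GiB, TiB)
    binary=False -> powers of ten (kB, MB, GB, TB)
    """
    if binary:
        base, units = 1024, ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        base, units = 1000, ["B", "kB", "MB", "GB", "TB"]
    value = float(nbytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest one we know.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


print(format_size(3 * 1024 ** 3))                # binary units
print(format_size(3 * 1024 ** 3, binary=False))  # decimal units
```

The same byte count prints as 3.00 GiB in one convention and 3.22 GB in the other, which is why the distinction in the labels matters.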
Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df
On Wed, Oct 03, 2012 at 06:17:53PM +0200, Goffredo Baroncelli wrote:
> On 10/03/2012 01:56 PM, Hugo Mills wrote:
>> Looks good. Only a few comments, inline.
>>
>> On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote:
> [snip]
>> Also, use kB, MB, GB, TB for powers-of-ten based units, and KiB, MiB, GiB, TiB for powers-of-two based units, please. I don't care which you report in, but please do make the distinction. (And note that it's kB with a lower case k, but KiB with an upper case K). This brings us in line with the relevant ISO and IEEE standards.
>
> I forgot to reply to you when you raised this question the first time. Even though I am inclined to accept your suggestions, this change is not related to my patches. My code uses the function print_sizes(), which is quite old (about 2008). This function is used in a lot of places, which suggests addressing this issue in another patch.

OK.

> [snip]
>> Why are the field headings here using "-" where the field headings in the first section used "_"? Should you be using "_" in both places?
>
> 2 persons highlighted that :-( ... I will update the code

It's just a niggle, really, but it's an obvious one.

Hugo.
Re: [PATCH 1/2] Update btrfs filesystem df command
On Wed, Oct 03, 2012 at 11:34:00PM +0300, Ilya Dryomov wrote:
> On Wed, Oct 03, 2012 at 07:22:31PM +0200, Goffredo Baroncelli wrote:
> [snip]
>> +static const char * const cmd_disk_free_usage[] = {
>> +    "btrfs filesystem df [-d|-s][-k] path [path..]",
>> +    "Show space usage information for a mount point(s).",
>> +    "",
>> +    "-k\tSet KB (1024 bytes) as unit",
>> +    "-s\tShow the summary section only",
>> +    "-d\tShow the detail section only",
>> +    NULL
>> +};
>> +
>> +static int cmd_disk_free(int argc, char **argv)
>> +{
>> +
>> +    int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
>> +    int i, more_than_one=0;
>> +
>> +    optind = 1;
>> +    while(1){
>> +        char c = getopt(argc, argv, "dsk");
>> +        if(c<0)
>> +            break;
>> +        switch(c){
>> +        case 'd':
>> +            flags &= ~DF_SHOW_SUMMARY;
>> +            break;
>> +        case 's':
>> +            flags &= ~DF_SHOW_DETAIL;
>> +            break;
>> +        case 'k':
>> +            flags &= ~DF_HUMAN_UNIT;
>> +            break;
>> +        default:
>> +            usage(cmd_disk_free_usage);
>> +        }
>> +    }
>> +
>> +    if( !(flags & (DF_SHOW_SUMMARY|DF_SHOW_DETAIL)) ){
>> +        fprintf(stderr, "btrfs filesystem df: it is not possible to specify -s AND -d\n");
>
> This doesn't look right at all. You are adding two switches and specifying both of them is an error? A little too much for a command whose job is to do some basic math and pretty-print the result. How about displaying just the summary by default and then adding a *single* switch (-v or whatever) for summary+details?

I'd prefer to see both sections by default. The reason for this is that without both sections, people tend to get confused because they don't know they're looking at half the story (e.g. some numbers change twice as fast as they think they should).

I think supplying both options should probably show both sections again, and make it not an error to do so, but I'm happy either way.

Hugo.
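The behaviour suggested in the reply (both sections by default, and both again if -s and -d are given together) could be sketched as follows. This is Python rather than the C of the patch, and the constant and function names are hypothetical; it only illustrates the decision logic, not the actual btrfs-progs code.

```python
SHOW_SUMMARY = 1
SHOW_DETAIL = 2


def sections_to_show(summary_only=False, detail_only=False):
    """Return a bitmask of the sections to print.

    Neither switch: show both sections (the preferred default).
    Both switches:  show both sections instead of reporting an error.
    One switch:     show only the requested section.
    """
    if summary_only == detail_only:
        return SHOW_SUMMARY | SHOW_DETAIL
    return SHOW_SUMMARY if summary_only else SHOW_DETAIL


print(sections_to_show())                    # no switches
print(sections_to_show(summary_only=True))   # -s
print(sections_to_show(True, True))          # -s -d
```

Treating "-s -d" the same as no switches removes the error path entirely, at the cost of making the combination redundant.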
Re: Will RAID have issues with disks that spin down?
On Thu, Oct 04, 2012 at 10:36:43AM -0400, Ken D'Ambrosio wrote:
> Hi. I know that several hardware RAID solutions have issues with disks that spin down when idle; the time to spin back up -- usually on the order of five seconds -- causes unhappy timeouts, etc. I was wondering if that would be an issue with RAID a-la btrfs?

I have (some of(*)) the disks in my 8-drive RAID-1 btrfs array set to spin down after 10 minutes of no use. I've not had a problem with it so far. So I'd say it's not an issue from my limited testing.

Hugo.

(*) Damn you, Samsung!
Re: [PATCH] Fits: tool to parse stream
On Sat, Oct 13, 2012 at 09:02:27AM +0200, Arne Jansen wrote:
> On 10/12/12 15:32, Arne Jansen wrote:
>> The idea of the btrfs send stream format was to generate it in a way that it is easy to receive on different platforms. Thus the proposed name FITS, for Filesystem Incremental Backup Stream. We should also build the tools to receive the stream on different platforms.
>
> I meant to write 'Filesystem Incremental Transport Stream', but, as Andrey Kuzmin pointed out, the name FITS is already taken. As the 'Backup' slipped in somehow, FIBS might be an alternative. Any thoughts?

"Fibs" is a slang term for lies. Probably not ideal.

Hugo.
Re: Can not Mount btrfs No Space left
On Mon, Oct 15, 2012 at 03:52:15PM -0400, Shawn Dakin wrote:
> I have a btrfs volume that will not mount due to "No space on device". I would gladly free up some space if I could only mount the volume. Does anyone have a trick to getting this volume back up and running? Any help would be great!!

Start with a 3.6 kernel (which has lots of ENOSPC fixes in it).

Try mounting with -o ro, which won't allow you to modify anything but may show you if the FS is at least mountable in that state (and will give you the capability to copy the data elsewhere in extremis). If you're lucky, that mount may then allow you to mount it again without the -o ro.

Hugo.
Re: device delete, error removing device
On Mon, Oct 22, 2012 at 12:02:08AM -0600, Chris Murphy wrote:
> On Oct 21, 2012, at 10:32 PM, Chris Murphy li...@colorremedies.com wrote:
>> This is stock Fedora 18 beta kernel, 3.6.1-1.fc18.x86_64 #1 SMP Mon Oct 8 17:19:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Probably not a good idea to omit that this is a beta *test candidate*, not a beta.
>
> Two things that make this possibly not realistic:
>
> 1. The virtual disks are obviously very small, 3GB each with the 4th one only 12GB.
> 2. The original 3 device volume was ~97% full with a single large file prior to adding the 4th device. Approximately 313MB free space remained on the volume.

I'm not entirely sure what's going on here(*), but it looks like an awkward interaction between the unequal sizes of the devices, the fact that three of them are very small, and the RAID-0/RAID-1 on data/metadata respectively.

You can't relocate any of the data chunks, because RAID-0 requires at least two chunks, and all your data chunks are more than 50% full, so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB chunk on the remaining space on the small disk, which is the only way it could proceed.

You _may_ be able to get some more success by changing the data to single:

# btrfs balance start -dconvert=single /mountpoint

You may also possibly be able to reclaim some metadata space with:

# btrfs balance start -m /mountpoint

but I think that's unlikely.

Hugo.

(*) It may be an as-yet-undiscovered reservation problem, in which case you get to see Josef scream loudly and hide under his desk, gibbering.
Re: RAID 5/6
On Mon, Oct 22, 2012 at 10:58:07AM -0500, Michael wrote:
> Does anyone know when RAID 5/6 are planned to be included in the Kernel?

This is in the FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Can_I_use_RAID.5B56.5D_on_my_Btrfs_filesystem.3F

Short answer: Not yet, probably soon.

> I am starting to buy parts for my next computer and would very much like to use BTRFS because I want a FS that can grow and also recover from undetected read errors - it will be large enough that these are possible. I'm hoping that it will be available for use in the coming months.

You can switch storage types on the fly, so you could at least start with RAID-1, and then restripe to RAID-5 (or -6) when it's stable enough for you. This assumes that you can manage to use RAID-1 in the first place and expand later.

Hugo.
Re: device delete, error removing device
On Mon, Oct 22, 2012 at 10:42:18AM -0600, Chris Murphy wrote: Thanks for the response Hugo, On Oct 22, 2012, at 3:19 AM, Hugo Mills h...@carfax.org.uk wrote: I'm not entirely sure what's going on here(*), but it looks like an awkward interaction between the unequal sizes of the devices, the fact that three of them are very small, and the RAID-0/RAID-1 on data/metadata respectively. I'm fine accepting the devices are very small and the original file system was packed completely full: to the point this is effectively sabotage. The idea was merely to test how a full (I was aiming more for 90%, not 97%, oops) volume handles being migrated to a replacement disk, which I think for a typical user would be larger not the same, knowing in advance that not all of the space on the new disk is usable. And I was doing it at a one order magnitude reduced scale for space consideration. You can't relocate any of the data chunks, because RAID-0 requires at least two chunks, and all your data chunks are more than 50% full, so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB chunk on the remaining space on the small disk, which is the only way it could proceed. Interesting. So the way device delete moves extents is not at all similar to how LVM pvmove moves extents, which is unidirectional (away from the device being demoted). My, seemingly flawed, expectation was that device delete would cause extents on the deleted device to be moved to the newly added disk. It's more like a balance which moves everything that has some (part of its) existence on a device. So when you have RAID-0 or RAID-1 data, all of the related chunks on other disks get moved too (so in RAID-1, it's the mirror chunk as well as the chunk on the removed disk that gets rewritten). If I add yet another 12GB virtual disk, sdf, and then attempt a delete, it works, no errors. 
Result: [root@f18v ~]# btrfs device delete /dev/sdb /mnt [root@f18v ~]# btrfs fi show failed to read /dev/sr0 Label: none uuid: 6e96a96e-3357-4f23-b064-0f0713366d45 Total devices 5 FS bytes used 7.52GB devid5 size 12.00GB used 4.17GB path /dev/sdf devid4 size 12.00GB used 4.62GB path /dev/sde devid3 size 3.00GB used 2.68GB path /dev/sdd devid2 size 3.00GB used 2.68GB path /dev/sdc *** Some devices missing However, I think that last line is a bug. When I [root@f18v ~]# btrfs device delete missing /mnt I get [ 2152.257163] btrfs: no missing devices found to remove So they're missing but not missing? If you run sync, or wait for 30 seconds, you'll find that fi show shows the correct information again -- btrfs fi show reads the superblocks directly, and if you run it immediately after the dev del, they've not been flushed back to disk yet. btrfs balance start -dconvert=single /mountpoint Yeah that's perhaps a better starting point for many regular Joe users setting up a multiple device btrfs volume, in particular where different sized disks can be anticipated. I think we should probably default to single on multi-device filesystems, not RAID-0, as this kind of problem bites a lot of people, particularly when trying to drop the second disk in a pair. In similar vein, I'd suggest that an automatic downgrade from RAID-1 to DUP metadata on removing one device from a 2-device array should also be done, but I suspect there's some good reasons for not doing that, that I've not thought of. This has also bitten a lot of people in the past. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- There's more than one way to do it is not a commandment. It --- is a dire warning. signature.asc Description: Digital signature
Re: device delete, error removing device
On Mon, Oct 22, 2012 at 01:36:31PM -0600, Chris Murphy wrote: On Oct 22, 2012, at 11:18 AM, Hugo Mills h...@carfax.org.uk wrote: It's more like a balance which moves everything that has some (part of its) existence on a device. So when you have RAID-0 or RAID-1 data, all of the related chunks on other disks get moved too (so in RAID-1, it's the mirror chunk as well as the chunk on the removed disk that gets rewritten). Does this mean device delete depends on an ability to make writes to the device being removed? I immediately think of SSD failures, which seem to fail writing, while still being able to reliably read. Would that behavior inhibit the ability to remove the device from the volume? No, the device being removed isn't modified at all. (Which causes its own set of weird problemettes, but I think most of those have gone away). [ 2152.257163] btrfs: no missing devices found to remove So they're missing but not missing? If you run sync, or wait for 30 seconds, you'll find that fi show shows the correct information again -- btrfs fi show reads the superblocks directly, and if you run it immediately after the dev del, they've not been flushed back to disk yet. Even after an hour, btrfs fi show says there are missing devices. After mkfs.btrfs on that missing device, 'btrfs fi show' no longer shows the missing device message. Hmm. Someone had this on IRC yesterday. It sounds like something's not properly destroying the superblock(s) on the removed device. I think we should probably default to single on multi-device filesystems, not RAID-0, as this kind of problem bites a lot of people, particularly when trying to drop the second disk in a pair. I'm not thinking of an obvious advantage raid0 has over single other than performance. It seems the more common general purpose use case is better served by single, especially the likelihood of volumes being grown with arbitrary drive capacities. Indeed. 
I found this [1] thread discussing a case where a -d single volume is upgraded to the raid0 profile. I'm not finding this to be the case when trying it today. mkfs.btrfs on 1 drive, then adding a 2nd drive, produces: Data: total=8.00MB, used=128.00KB System, DUP: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=409.56MB, used=24.00KB Metadata: total=8.00MB, used=0.00 This appears to retain the single profile. This is expected at this point? What I find a bit problematic is that metadata is still DUP rather than being automatically upgraded to raid1. Yes, the automatic single - RAID-0 upgrade was fixed. If you haven't run a balance on (at least) the metadata after adding the new device, then you won't get the DUP - RAID-1 upgrade on metadata. (I can tell you haven't run the balance, because you still have the empty single metadata chunk). What is the likelihood of a mkfs.btrfs 2+ device change in the default data profile from raid0 to single? Non-zero. I think it mostly just wants someone to write the patch, and then beat off any resulting bikeshedding. :) Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I spent most of my money on drink, women and fast cars. The --- rest I wasted. -- James Hunt signature.asc Description: Digital signature
Re: Naming of subvolumes
On Thu, Oct 25, 2012 at 01:30:20PM +0100, Richard Hughes wrote:
> I'm planning to use btrfs subvolume snapshot -r name in the system upgrade functionality[1] if the user is using btrfs for their root file system. We've got most of the bits in place already for Fedora 18.
>
> One thing that confuses me is the convention for the naming of snapshots. Are there any conventions or prior art there? Can I add metadata to the snapshot so that I don't have to encode everything in the snapshot name itself?

How about user xattrs? IIRC, that's the user.* namespace.

The only convention I'm aware of is Ubuntu's use of an @ substitution, where the subvolume to be mounted at / is called @, and the subvolume to be mounted at /home becomes @home. Both of those subvolumes are stored in the (otherwise empty) top-level of the filesystem, which is not mounted in normal operation.

Hugo.
Re: How does btrfs behave on checksum mismatch?
On Sat, Oct 27, 2012 at 09:56:45PM +, Michael Kjörling wrote:
> I came across the tidbit that ZFS has a contract guarantee that the data read back will either be correct (the checksum computed over the data read from the disk matches the checksum stored on disk), or you get an I/O error. Obviously, this greatly reduces the probability that the data is invalid. (Particularly when taken in combination with the disk firmware's own ECC and checksumming.)
>
> With the default options, does btrfs make any similar guarantees? If not, then are there any options to force it to make such guarantees?

It does indeed do the same thing: if the checksum doesn't match the block, then the alternative block is read (if one exists, e.g. RAID-1, RAID-10). If that does not exist, or also has a checksum failure, then EIO is returned.

Hugo.

> I'm interested in this both from a specification and an implementation point of view. The last thing anyone wants is probably undetected bit rot, and with today's large drives, even with the quite low bit rot numbers it can be a real concern. If even the act of simply successfully reading a file guarantees, to the extent of the checksumming algorithm's ability to detect changes, that the data read is the same as was once written, that would be a major selling point for btrfs for me personally.
>
> The closest I was able to find was that btrfs uses crc32c currently for data and metadata checksumming and that this can be turned off if so desired (using the nodatasum mount option), but nothing about what the file system code does or is supposed to do in the face of a checksum mismatch.
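The read behaviour described in the reply can be modelled in a few lines of Python. This is a toy model, not kernel code: the function name is invented, and zlib.crc32 (plain CRC-32) is used as a stand-in because the standard library has no crc32c.

```python
import errno
import zlib


def read_block(copies, stored_csum):
    """Return the first copy whose checksum matches, else fail with EIO.

    copies      -- the available copies of the block (one for single,
                   two for RAID-1/RAID-10/DUP)
    stored_csum -- the checksum recorded when the block was written
    """
    for data in copies:
        if zlib.crc32(data) == stored_csum:
            return data
    raise OSError(errno.EIO, "checksum mismatch on every copy")


good = b"file contents"
csum = zlib.crc32(good)

# The first copy is corrupt, so the second one is returned instead.
print(read_block([b"file c0ntents", good], csum))

# No valid copy at all: the caller gets an I/O error, never bad data.
try:
    read_block([b"file c0ntents"], csum)
except OSError as err:
    print("read failed:", errno.errorcode[err.errno])
```

The point of the model is the guarantee asked about in the question: a successful read never hands back data that fails its checksum.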
Re: [RFC] New attempt to a better btrfs fi df
On Sun, Oct 28, 2012 at 12:30:44AM +0200, Martin Steigerwald wrote:
> On Saturday, 27 October 2012, Michael Kjörling wrote:
>> On 27 Oct 2012 18:43 +0200, from mar...@lichtvoll.de (Martin Steigerwald):
>>> Possibly this could be done tabular as well, like:
>>>
>>>                vdb        vdc        vdd
>>> Data, RAID 0   307,25MB   307,25MB   307,25MB
>>> …
>>> System,RAID1   -          8MB        8MB
>>> …
>>> Unused         2,23GB     2,69GB     2,24GB
>>
>> I like this. But what if the filesystem has 100 disks?
>>
>> Maybe I'm just not familiar enough with btrfs yet to punch an immediate hole in the idea, but how about pivoting that table? Columns for data values (data, raid 0, system, raid 1, unused, ...) and rows for the underlying devices? Something like this, copying the numbers from your example. And I'm using colon here rather than comma, because I believe that it better captures the intent.
>>
>>            Data: RAID 0   System: RAID 1   Unused
>> /dev/vdb   307.25 MB      -                2.23 GB
>> /dev/vdc   307.25 MB      8 MB             2.69 GB
>> /dev/vdd   307.25 MB      8 MB             2.24 GB
>>            ============   ==============   =======
>> TOTAL      921.75 MB      16 MB            7.16 GB
>
> Hmmm, good idea. I like it this way around. It would scale better with the number of drives and there is a good way to place the totals.
>
> I wonder about how to possibly include the used part of each tree. With mostly 5 columns it might be doable.

Note that this could get arbitrarily wide in the presence of the (planned) per-object replication config. Otherwise, it works. The width is probably likely to grow more slowly than the length, though, so this way round is probably the better option. IMO. Eggshell blue is good enough. :)

Hugo.
Re: How does btrfs behave on checksum mismatch?
On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:

On Sunday, 28 October 2012, Ronnie Collinson wrote:

In a raid1 situation, it will also rewrite the affected data on the drive that failed the checksum.

Will it do so without an explicit scrub?

If a failed checksum is detected, yes. If there's a bad block, and the FS happens to read the good copy first, it won't fix it, because it hasn't tried reading the bad copy yet.

Hugo.
Re: How does btrfs behave on checksum mismatch?
On Sun, Oct 28, 2012 at 02:36:24PM +0100, Martin Steigerwald wrote:

On Sunday, 28 October 2012, Hugo Mills wrote:

On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:

On Sunday, 28 October 2012, Ronnie Collinson wrote:

In a raid1 situation, it will also rewrite the affected data on the drive that failed the checksum.

Will it do so without an explicit scrub?

If a failed checksum is detected, yes. If there's a bad block, and the FS happens to read the good copy first, it won't fix it, because it hasn't tried reading the bad copy yet.

Ah, okay. I think I read some while ago that when a bad checksum is detected it won't repair automatically. Has this been changed?

It was changed some time ago -- the kernel release after scrub went in, IIRC.

Anyway, a regular scrub still makes sense, as BTRFS only reads files that applications demand and BTRFS may read from a good copy as you pointed out.

Indeed. I have a cron job in /etc/cron.monthly for my main FS to do just that.

Hugo.
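The difference between repair-on-read and a scrub, as discussed in this thread, can be sketched with a toy model in Python. This is an illustration under stated assumptions, not btrfs code: read_block and scrub_block are hypothetical names, a block's replicas are modelled as a plain list, and zlib.crc32 stands in for crc32c.

```python
import errno
import zlib


def _matches(data, checksum):
    return zlib.crc32(data) == checksum


def read_block(copies, checksum):
    """Normal read: stop at the first good copy.

    Only copies that were actually read and found bad get rewritten,
    so a bad copy that is never reached stays bad.
    """
    for i, data in enumerate(copies):
        if _matches(data, checksum):
            for j in range(i):  # copies seen before this one failed
                copies[j] = data
            return data
    raise OSError(errno.EIO, "no copy matches the stored checksum")


def scrub_block(copies, checksum):
    """Scrub: verify every copy and rewrite each bad one from a good one.

    Returns the number of copies repaired.
    """
    good = next((d for d in copies if _matches(d, checksum)), None)
    if good is None:
        raise OSError(errno.EIO, "no copy matches the stored checksum")
    repaired = 0
    for i, data in enumerate(copies):
        if not _matches(data, checksum):
            copies[i] = good
            repaired += 1
    return repaired


good = b"data"
checksum = zlib.crc32(good)
copies = [good, b"rot!"]            # the good copy happens to be read first
read_block(copies, checksum)
assert copies[1] == b"rot!"         # the bad copy was never looked at
assert scrub_block(copies, checksum) == 1
assert copies == [good, good]
```

In this model a plain read leaves the second copy damaged whenever the first one verifies, which is the case a periodic scrub is meant to catch.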
Re: Old (almost 2 years) btrfs failed fs. Parent transid failure. Can it be fixed ?
On Mon, Oct 29, 2012 at 01:33:13PM +0100, Tomasz Torcz wrote:

On Mon, Oct 29, 2012 at 01:22:59PM +0100, Tommy Jonsson wrote:

Hi, I have an old btrfs file system that crashed on a power failure about 2 years ago. I have cloned git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git (at 2012-10-29) and compiled the tools.

I think you need to get branch dangerdonoteveruse to get real fsck code.

That's a very outdated piece of advice. That code is now in mainline btrfs-progs, and has been since March. Tommy has the correct and up-to-date version of the progs.

sudo mount -t btrfs /dev/sda /mnt/disk/

Could you try with -o recovery?

That's worth a try as a first step.

Hugo.
Re: How to find (out if) files sharing content?
On Tue, Oct 30, 2012 at 04:20:05PM +0100, Gábor Nyers wrote:

Hi, how could one find out if 2 files share any extents on a btrfs file system? A more generic variation of the above: how to list files on the same file system/subvolume sharing content?

You have direct (read-only) access to the metadata trees through the TREE_SEARCH ioctl. It should be possible to walk through the extents of a given file, and (I think) follow back-refs from the extent back to the other files that share it. There's no simple code to do that right now, though.

Hugo.
Re: Why btrfs inline small file by default?
On Wed, Oct 31, 2012 at 05:40:25AM +0800, ching wrote:

On 10/30/2012 08:17 PM, cwillu wrote:

If there are a lot of small files, then the size of metadata will be undesirable due to duplication.

Yes, that is a fact, but whether that really matters depends on the use case (e.g., the small files to large files ratio, ...). But as btrfs is designed explicitly as a general purpose file system, you usually want the good performance instead of the better disk usage (especially as disk space isn't expensive anymore).

As I understand it, in basically all cases the total storage used by inlining will be _smaller_, as the allocation doesn't need to be aligned to the sector size.

If I have 10G of small files in total, then it will consume 20G by default.

If those small files are each 128 bytes in size, then you have approximately 80 million of them, and they'd take up 80 million pages, or 320 GiB of total disk space.

Hugo.
Re: Why btrfs inline small file by default?
On Tue, Oct 30, 2012 at 10:14:12PM +, Hugo Mills wrote:

On Wed, Oct 31, 2012 at 05:40:25AM +0800, ching wrote:

On 10/30/2012 08:17 PM, cwillu wrote:

If there are a lot of small files, then the size of metadata will be undesirable due to duplication.

Yes, that is a fact, but whether that really matters depends on the use case (e.g., the small files to large files ratio, ...). But as btrfs is designed explicitly as a general purpose file system, you usually want the good performance instead of the better disk usage (especially as disk space isn't expensive anymore).

As I understand it, in basically all cases the total storage used by inlining will be _smaller_, as the allocation doesn't need to be aligned to the sector size.

If I have 10G of small files in total, then it will consume 20G by default.

If those small files are each 128 bytes in size, then you have approximately 80 million of them, and they'd take up 80 million pages, or 320 GiB of total disk space.

Sorry, to make that clear -- I meant if they were stored in Data. If they're inlined in metadata, then they'll take approximately 20 GiB as you claim, which is a lot less than the 320 GiB they'd take if they were not.

Hugo.
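The arithmetic behind the 80 million files and 320 GiB figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 4 KiB page size is an assumption (it is the value that makes the numbers in the thread work out), and the variable names are illustrative.

```python
GIB = 1024 ** 3

total_payload = 10 * GIB   # "10G of small files in total"
file_size = 128            # bytes per file, as in the example
page_size = 4096           # assumed page size: one page per non-inlined file

n_files = total_payload // file_size
space_as_data = n_files * page_size

print(f"{n_files:,} files")                       # 83,886,080 files
print(f"{space_as_data // GIB} GiB as data")      # 320 GiB as data
```

So "approximately 80 million" is about 84 million files, and storing each one in its own page costs 32 times the payload, against the roughly 2 times quoted for inlining in duplicated metadata.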
Re: [Request for review] [RFC] Add label support for snapshots and subvols
On Fri, Nov 02, 2012 at 05:28:01AM +0700, Fajar A. Nugraha wrote:

On Fri, Nov 2, 2012 at 5:16 AM, cwillu cwi...@cwillu.com wrote:

btrfs fi label -t /btrfs/snap1-sv1 Prod-DB-sand-box-testing

Why is this better than:

  # btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
  # mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
  # ls /btrfs/
  Prod-DB  Prod-DB-production-test

... because it would mean the possibility to decouple the subvol name from whatever-data-you-need (in this case, a label).

My request, though, is to just implement properties, and USER properties, like what we have in zfs. This seems to be a cleaner, saner approach. For example, this is on Ubuntu + zfsonlinux:

  # zfs create rpool/u
  # zfs set user:label="Some test filesystem" rpool/u
  # zfs get creation,user:label rpool/u
  NAME     PROPERTY    VALUE                  SOURCE
  rpool/u  creation    Fri Nov  2  5:24 2012  -
  rpool/u  user:label  Some test filesystem   local

Don't we already have an equivalent to that with user xattrs?

Hugo.
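To make the user-xattr suggestion concrete, here is a small Python sketch of storing a free-form label on a directory such as a subvolume. It is an illustration only: the attribute name user.label is a made-up choice that mirrors the zfs example and is not something btrfs defines, the helper names are hypothetical, and os.setxattr / os.getxattr are Linux-only calls that need a file system mounted with user xattr support.

```python
import os

# Hypothetical attribute name, chosen to mirror the zfs user:label example.
LABEL_ATTR = "user.label"


def set_label(path, label):
    """Attach a free-form label to a file or directory via a user xattr."""
    os.setxattr(path, LABEL_ATTR, label.encode("utf-8"))


def get_label(path):
    """Read the label back; raises OSError if the attribute is not set."""
    return os.getxattr(path, LABEL_ATTR).decode("utf-8")
```

With these helpers, set_label("/btrfs/snap1-sv1", "Prod-DB-sand-box-testing") would keep the label separate from the subvolume name, which is the decoupling asked for above. The command-line tools setfattr and getfattr offer the same operations.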
Re: no space left on device.
On Fri, Nov 02, 2012 at 10:54:47AM -0500, Kyle Gates wrote:

So I have ended up in a state where I can't delete files with rm. The error I get is "no space left on device"; however, I'm not even close to full.

  /dev/sdb1 38G 27G 9.5G 75%

There are about 800k files/dirs in this filesystem. Extra strange is that I can create and delete files in another directory. So I tried pretty much all I could google my way to, but the problem persisted. So I decided to do a backup and a format. But when the backup was done I tried one more time, and now it was possible to delete the directory and all its content?

I'm using the 3.5 kernel in Ubuntu 12.10. Is this a known issue? Is it fixed in later kernels? fsck, btrfs scrub and the kernel log: nothing indicates any problem of any kind.

First let's see the output of:

  btrfs fi df /mountpoint

You're probably way over-allocated in metadata, so a balance should help:

  btrfs bal start -m /mountpoint

or omit the -m option to run a full balance.

Or, better, -musage=5 (or 1), which will do even less work.

... but let's see the btrfs fi df output first. Could you also add the output of btrfs fi show (no parameters), please?

Hugo.
Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
On Fri, Nov 02, 2012 at 07:05:37PM +, Gabriel wrote:

On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:

On 2012-11-02 12:18, Martin Steigerwald wrote:

Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB in total. I understand the logic behind this, but this could be a bit confusing. But it makes sense: showing real allocation on device level makes sense, because that's what is really allocated on disk. Total makes some sense, because that's what is being used from the tree by BTRFS.

Yes, me too. At first I was confused when you noticed this discrepancy. So I have to admit that it is not so obvious to understand. However, we didn't find any way to make it more clear...

It still looks confusing at first…

We could use "Chunk(s) capacity" instead of total/size? I would like an opinion from a native English speaker's point of view.

This is easy to fix, here's a mockup:

  Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
     /dev/dm-0    3.50GB

I've not considered the full semantics of all this yet -- I'll try to do that tomorrow. However, I note that the ×2 here could become non-integer with the RAID-5/6 code (which is due Real Soon Now). In the first RAID-5/6 code drop, it won't even be simple to calculate where there are different-sized devices in the filesystem. Putting an exact figure on that number is potentially going to be awkward. I think we're going to need kernel help for working out what that number should be, in the general case.

Again, I'm raising minor points based on future capabilities, but I feel it's worth considering them at this stage, even if the correct answer is "yes, we'll do this now, and deal with any other problems later".

Hugo.
               Data     Metadata  Metadata     System   System
               Single   Single    DUP          Single   DUP           Unallocated
  /dev/dm-16   1.31TB   8.00MB    56.00GB      4.00MB   16.00MB       0.00
               ======   ========  ===========  =======  ============  ===========
  Total        1.31TB   8.00MB    28.00GB ×2   4.00MB   8.00MB ×2     0.00
  Used         1.31TB   0.00      5.65GB ×2    0.00     152.00KB ×2

Also, I don't know if you could use libblkid, but it finds more descriptive names than dm-NN (thanks to some smart sorting logic).
Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
On Fri, Nov 02, 2012 at 11:23:14PM +, Gabriel wrote:

On Fri, 02 Nov 2012 22:06:04 +, Hugo Mills wrote:

On Fri, Nov 02, 2012 at 07:05:37PM +, Gabriel wrote:

On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:

On 2012-11-02 12:18, Martin Steigerwald wrote:

Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB in total. I understand the logic behind this, but this could be a bit confusing. But it makes sense: showing real allocation on device level makes sense, because that's what is really allocated on disk. Total makes some sense, because that's what is being used from the tree by BTRFS.

Yes, me too. At first I was confused when you noticed this discrepancy. So I have to admit that it is not so obvious to understand. However, we didn't find any way to make it more clear...

It still looks confusing at first…

We could use "Chunk(s) capacity" instead of total/size? I would like an opinion from a native English speaker's point of view.

This is easy to fix, here's a mockup:

  Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
     /dev/dm-0    3.50GB

I've not considered the full semantics of all this yet -- I'll try to do that tomorrow. However, I note that the ×2 here could become non-integer with the RAID-5/6 code (which is due Real Soon Now). In the first RAID-5/6 code drop, it won't even be simple to calculate where there are different-sized devices in the filesystem. Putting an exact figure on that number is potentially going to be awkward. I think we're going to need kernel help for working out what that number should be, in the general case.

DUP can be nested below a device because it represents same-device redundancy (purpose: survive smudges but not device failure). On the other hand, raid levels should occupy the same space on all linked devices (a necessary consequence of the guarantee that RAID5 can survive the loss of any device and RAID6 any two devices).

No, the multiplier here is variable.
Consider:

1 MiB stored in RAID-5 across 3 devices takes up 1.5 MiB -- multiplier ×1.5 (1 MiB over 2 devices is 512 KiB each, plus an additional 512 KiB for parity)

1 MiB stored in RAID-5 across 6 devices takes up 1.2 MiB -- multiplier ×1.2 (1 MiB over 5 devices is 204.8 KiB each, plus an additional 204.8 KiB for parity)

With the (initial) proposed implementation of RAID-5, the stripe-width (i.e. the number of devices used for any given chunk allocation) will be *as many as can be allocated*. Chris confirmed this today on IRC. So if I have a disk array of 2T, 2T, 2T, 1T, 1T, 1T, then the first 1T of allocation on each device will stripe across 6 devices, giving me 5 data+1 parity, or a multiplier of ×1.2. As soon as the smaller devices are full, the stripe width will drop to 3 devices, and we'll be using 2 data+1 parity allocation, or a multiplier of ×1.5 for any subsequent chunks. So, as more data beyond the first 5T is stored, the overall multiplier steadily increases, until we fill the FS, and we get a multiplier of about ×1.29 overall (9T of raw space for 7T of data).

This gets more complicated if you have devices of many different sizes. (Imagine 6 disks with sizes 500G, 1T, 1.5T, 2T, 3T, 3T).

We probably can work out the current RAID overhead and feed it back sensibly, but it's (a) not constant as the allocation of the chunks increases, and (b) not trivial to compute.

The two probably won't need to be represented at the same time except during a reshape, because I imagine DUP gets converted to RAID (1 or 5) as soon as the second device is added. A 1→2 reshape would look a bit like this (doing only the data column and skipping totals):

  InitialDevice
    Reserved 1.21TB
    Used     1.21TB
  RAID1(InitialDevice, SecondDevice)
    Reserved 1.31TB + 100GB
    Used     2× 100GB

RAID5, RAID6: same with fractions, n+1⁄n and n+2⁄n.

Except that n isn't guaranteed to be constant. That was pretty much my only point. Don't assume that it will be (or at the very least, be aware that you are assuming it is, and be prepared for inconsistencies).

Hugo.
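The variable RAID-5 multiplier described above can be explored with a short Python sketch. It is a simplified model, not the btrfs allocator: it assumes every chunk stripes across all devices that still have free space, that each stripe carries exactly one device's worth of parity, that at least two devices are needed, and it ignores chunk granularity. The function names are hypothetical.

```python
def raid5_phases(device_sizes, min_devices=2):
    """Split allocation into phases of constant stripe width.

    Returns a list of (width, raw_space, data_space) tuples in the same
    units as device_sizes.  Each phase ends when the smallest remaining
    device fills up and the stripe width drops.
    """
    free = sorted(size for size in device_sizes if size > 0)
    phases = []
    while len(free) >= min_devices:
        width = len(free)
        step = free[0]  # space taken from every device in this phase
        phases.append((width, width * step, (width - 1) * step))
        free = [f - step for f in free if f > step]
    return phases


def overall_multiplier(phases):
    """Raw space consumed per unit of data over all phases."""
    raw = sum(phase[1] for phase in phases)
    data = sum(phase[2] for phase in phases)
    return raw / data


sizes_gb = [2000, 2000, 2000, 1000, 1000, 1000]
phases = raid5_phases(sizes_gb)
for width, raw, data in phases:
    print(f"width {width}: x{raw / data:.2f} for {data} GB of data")
print(f"overall: x{overall_multiplier(phases):.3f}")
```

For the 2T/2T/2T/1T/1T/1T example this model gives ×1.20 for the first 5000 GB of data, ×1.50 for the remaining 2000 GB, and 9000 GB of raw space for 7000 GB of data overall. Feeding in the 500G to 3T example from the thread produces five phases, which shows why the figure is awkward to report as a single constant.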
Re: How does btrfs handle sudden shutdowns?
On Tue, Nov 06, 2012 at 12:33:08PM +, Michael Kjörling wrote:

Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm mainly thinking of power outages which lead to logical structure damage but not physical media damage.)

In theory (i.e. by the design of the FS), you should be able to pull the plug on btrfs at any point, and the FS will always be consistent. This makes some assumptions:

 - That writing a single page to the FS is atomic.
 - That the hardware reports barriers to the OS reliably, i.e. if the hardware says it's fully stored data without losing it, then it actually has.

There are also some caveats: while the FS should always be consistent, the latest transaction write may not have been completed, so you could potentially lose up to 30 seconds of writes to the FS from immediately before the crash.

If the FS does corrupt over a power failure, and the hardware can be demonstrated to be good, then we have a bug that needs to be tracked down. (There have been a number of these over the development of the FS so far, but they do get fixed).

What would be the risk points, file-system-wise? Can for example a rotating snapshot schedule mitigate some or all issues relating to sudden shutdowns, if any? (_For example_, take a snapshot every minute, keeping the last five; if the main file system fails to mount, then could the most recent usable snapshot be used as a fallback, or is it likely to be equally damaged or inconsistent?)

No, snapshots give you no additional guarantees -- if the FS corrupts and is unmountable, a snapshot is part of the same FS and will also be unmountable.

Obviously a UPS or other form of fallback power is preferable to no UPS if power outages are a concern, so as to allow a controlled system shutdown (or fail-over to a more long-term backup power supply) in the event of a prolonged power outage, but I'm wondering about situations where such don't exist or even fail.
As I said above, the FS structures _should_ be completely reliable in the face of power loss; that they haven't been in the past is definitely a bug, and those bugs have been / are being fixed as they're found. We've had very few transid match failures recently, which used to be the main failure mode for these bugs. I don't know whether that's because people aren't reporting them, or because they're not happening nearly so often these days. I suspect the latter.

I guess the question for you is: are you after the _expected_ behaviour of the FS (should always be consistent on good hardware, but you may lose up to 30 seconds of writes), or are you after mitigation strategies in the face of FS bugs (keep off-site backups and be prepared to use them)?

Hugo.
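The "up to 30 seconds of writes" caveat concerns data that has been written but not yet committed. An application that cannot afford to lose a particular write usually asks for it to be flushed explicitly. The following Python sketch shows the common write, fsync, rename pattern; it is a general POSIX technique rather than anything specific to btrfs or taken from this thread, the function name durable_write is hypothetical, and it still relies on the hardware honouring flush requests, as per the assumptions listed earlier.

```python
import os


def durable_write(path, data):
    """Replace `path` with `data` so that a crash leaves old or new content.

    The data goes to a temporary file first and is flushed with fsync
    before being renamed over the target; the directory is then flushed
    so that the rename itself is recorded.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        handle.write(data)
        handle.flush()              # push Python's buffer to the kernel
        os.fsync(handle.fileno())   # ask the kernel to push it to the device
    os.replace(tmp_path, path)      # atomic rename on POSIX file systems
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)            # persist the directory entry change
    finally:
        os.close(dir_fd)
```

Once durable_write returns, the new content should not be subject to the commit interval; writes made without an explicit flush remain exposed to the window described above.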