Re: Question of stability

2010-09-19 Thread Hugo Mills
On Sun, Sep 19, 2010 at 01:55:34AM +0200, Roy Sigurd Karlsbakk wrote:
 - Original Message -
  On Sat, Sep 18, 2010 at 11:37 PM, Roy Sigurd Karlsbakk
  r...@karlsbakk.net wrote:
   Hi all
  
   I've been on this list for a year or so, and I have been following
   progress for some more. Are there any chances of btrfs stabilizing,
   as in terms of usability in production? If so, how far are we from
   this?
  Hi,
  
  I am using btrfs as my root filesystem on my Debian squeeze machine
  for a few months now and so far I haven't experienced any problems.
  It seems quite stable for me. I am not using raid functions, but am
  also very interested in the progress in raid5/6.
 
 I was more interested in large setups than a general install.
 
 Question remains, when is btrfs supposed to be stable, as in usable for large 
 server setups?

   As has been pointed out by Anthony, there's no means of determining
when something is stable -- not just for filesystems, but for any
piece of software. All you can do is take a Bayesian approach: sum up
the number (and type) of failures, and compare it to the number of
user-hours that the software has been in use for, across all
installations. When that failure rate (and recovery rate) reaches the
point at which you're happy to use it in your situation -- whether
that's on your bleeding-edge desktop test box, or for running your
robotic heart surgeon -- you can call it stable. However, that point
has to be your decision for your particular use case.

   If you're now thinking, "but where do I get that information
from?", congratulations -- you now know nearly as much about the user
base as the btrfs developers. :) Your best bet is to keep an eye on
this mailing list, and take a look at the number and type of reported
failures. When that drops to the point that you feel safe, go ahead
and use it.

   An alternative approach is to install a btrfs set-up on your
internal development or test machines (you *do* have a test
infrastructure for your mission critical systems, right?), and hammer
it with the closest you can get to a real workload, and see what
happens. Again, this is a statistical approach. It's the best we've
got.

   At some point, we(*) hope, btrfs will have millions upon millions
of users, doing all kinds of bad things to it, and tiny fractions of
them will have problems. When that happens, someone will probably
start calling it stable, and the name will stick. Until then, many
people are happy with it for their uses, but nobody can (or will)
magically stick a label on a piece of code of this complexity and say
"it's stable now!"

   Hugo.

(*) Speaking as an interested nobody, rather than a developer.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Be pure. Be vigilant. Behave. ---  


signature.asc
Description: Digital signature


Some devices missing behaviour

2010-10-09 Thread Hugo Mills
   I've just encountered some odd behaviour with regard to removed
devices. Brief summary:

 - It's hard (in some sense) to tell a btrfs filesystem that a device
   has been removed permanently, and seems to require an
   unmount/remount, or resize to do so.

 - Removed devices break btrfs dev scan

   Details follow:

# mkfs.btrfs -d raid10 -m raid10 -L btest /dev/primary/btrtest{1,2,3,4,5}
# mount /dev/primary/btrtest1 /mnt
# sudo btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
Total devices 5 FS bytes used 28.00KB
devid    1 size 1.00GB used 276.00MB path /dev/dm-21
devid    2 size 1.00GB used 136.00MB path /dev/dm-22
devid    3 size 1.00GB used 136.00MB path /dev/dm-23
devid    4 size 1.00GB used 264.00MB path /dev/dm-24
devid    5 size 1.00GB used 264.00MB path /dev/dm-25

Btrfs v0.19-35-g1b444cd

   All well and good so far.

# btrfs dev del /dev/primary/btrtest4 /mnt
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
Total devices 5 FS bytes used 100.13MB
devid    1 size 1.00GB used 190.38MB path /dev/dm-21
devid    2 size 1.00GB used 170.38MB path /dev/dm-22
devid    3 size 1.00GB used 170.38MB path /dev/dm-23
devid    5 size 1.00GB used 170.38MB path /dev/dm-25
*** Some devices missing

Btrfs v0.19-35-g1b444cd

   Now, it's claiming that some devices are missing, but what if I
wanted to make this a permanent change? Say, the additional device was
one added temporarily to the array as part of a migration to new
hardware?

   On IRC, it was suggested that a rescan would fix it:

# btrfs dev scan
Scanning for Btrfs filesystems
failed to read /dev/sr0
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
Total devices 5 FS bytes used 100.13MB
devid    1 size 1.00GB used 190.38MB path /dev/dm-21
devid    2 size 1.00GB used 170.38MB path /dev/dm-22
devid    3 size 1.00GB used 170.38MB path /dev/dm-23
devid    5 size 1.00GB used 170.38MB path /dev/dm-25
*** Some devices missing

Btrfs v0.19-35-g1b444cd

   Nope. What about explicitly scanning the devices?

# btrfs dev scan /dev/primary/btrtest*
Scanning for Btrfs filesystems in '/dev/primary/btrtest1'
Scanning for Btrfs filesystems in '/dev/primary/btrtest2'
Scanning for Btrfs filesystems in '/dev/primary/btrtest3'
Scanning for Btrfs filesystems in '/dev/primary/btrtest4'
ERROR: unable to scan the device '/dev/primary/btrtest4'

   Note that it's stopped the scan immediately on encountering the
removed device, so btrtest5 hasn't been picked up. Maybe it's
something left in the device?

# dd if=/dev/zero of=/dev/primary/btrtest4
# btrfs dev scan /dev/primary/btrtest*
Scanning for Btrfs filesystems in '/dev/primary/btrtest1'
Scanning for Btrfs filesystems in '/dev/primary/btrtest2'
Scanning for Btrfs filesystems in '/dev/primary/btrtest3'
Scanning for Btrfs filesystems in '/dev/primary/btrtest4'
ERROR: unable to scan the device '/dev/primary/btrtest4'
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
Total devices 5 FS bytes used 100.13MB
devid    1 size 1.00GB used 190.38MB path /dev/dm-21
devid    2 size 1.00GB used 170.38MB path /dev/dm-22
devid    3 size 1.00GB used 170.38MB path /dev/dm-23
devid    5 size 1.00GB used 170.38MB path /dev/dm-25
*** Some devices missing

Btrfs v0.19-35-g1b444cd

   Zeroing the device has no effect. However, unmounting it does work,
partially:

# umount /mnt
# btrfs dev scan
Scanning for Btrfs filesystems
failed to read /dev/sr0
# btrfs dev scan /dev/primary/btrtest*
Scanning for Btrfs filesystems in '/dev/primary/btrtest1'
Scanning for Btrfs filesystems in '/dev/primary/btrtest2'
Scanning for Btrfs filesystems in '/dev/primary/btrtest3'
Scanning for Btrfs filesystems in '/dev/primary/btrtest4'
ERROR: unable to scan the device '/dev/primary/btrtest4'
# btrfs fi show btest
failed to read /dev/sr0
Label: 'btest'  uuid: 908f87d7-23d8-453e-ab04-ae1426306e0f
Total devices 4 FS bytes used 100.13MB
devid    1 size 1.00GB used 190.38MB path /dev/dm-21
devid    2 size 1.00GB used 170.38MB path /dev/dm-22
devid    3 size 1.00GB used 170.38MB path /dev/dm-23
devid    5 size 1.00GB used 170.38MB path /dev/dm-25

Btrfs v0.19-35-g1b444cd

   So, you need to unmount/remount the FS to make it believe that the
device removal is permanent. (As an aside, I also found that resizing
down by 1M then back to max has the same effect, if you don't want to
unmount). However, the explicit scan of the block devices is still
broken by the removed device, even after all data on it has been
zeroed.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from

Re: converting one-disk btrfs into RAID-1?

2010-10-12 Thread Hugo Mills
On Tue, Oct 12, 2010 at 11:34:31AM +0200, Tomasz Torcz wrote:
 On Tue, Oct 12, 2010 at 11:32:07AM +0200, David Brown wrote:
  Is it possible to view the raid levels of data and meta data for an
  existing btrfs filesystem?  It's easy to pick them when creating the
  system, but I couldn't find any way to view them afterwards.
 
   btrfs f df will show them, except for a few kernel releases when the ioctl()
 was broken.

   Umm...

h...@vlad:~ $ sudo btrfs fi df /mnt/
[sudo] password for hrm: 
Data: total=303.01GB, used=302.16GB
Metadata: total=3.01GB, used=476.77MB
System: total=11.88MB, used=36.00KB

   This is the latest btrfs git kernel and tools. What should I be
seeing here?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- emacs: Emacs Makes A Computer Slow. ---   




Metadata size

2010-10-14 Thread Hugo Mills
   I'm a little concerned about the size of my metadata. I'm doing
raid10 on both data and metadata, and:

h...@vlad:mnt $ sudo btrfs fi df /mnt
Data: total=488.01GB, used=487.23GB
Metadata: total=3.01GB, used=677.73MB
System: total=11.88MB, used=52.00KB

h...@vlad:mnt $ find /mnt | wc -l
20137

   By my calculations, that's something on the order of 17.5K per
filesystem object. This is mostly media files, plus some small
metadata files. 17.5K on average seems very large to me. I have quite
a bit of space on this system, so I'm not too concerned, but I wasn't
sure if this kind of figure was representative or not.
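
   For anyone wanting to redo that arithmetic, here is a minimal
sketch in C. The helper names are made up for illustration, and it
assumes that "MB" in the df output means 2^20 bytes. Dividing the
677.73MB "used" figure by the 20137 objects gives roughly 35,291 bytes
per object; half of that is about 17,645 bytes, which is close to the
17.5K quoted above. The halving is only a guess at how the figure was
reached (perhaps to discount the second raid10 copy); the mail does
not say.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average metadata bytes per filesystem object. */
static double bytes_per_object(double metadata_bytes, uint64_t objects)
{
	assert(objects > 0);
	return metadata_bytes / (double)objects;
}

/* Convert a value reported in MB to bytes, assuming 1MB = 2^20 bytes. */
static double mb_to_bytes(double mb)
{
	return mb * 1024.0 * 1024.0;
}
```

   Calling bytes_per_object(mb_to_bytes(677.73), 20137) reproduces the
first figure.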

   Overall file count by size:

0-10          21
10-100       153
100-1K       778
1K-10K       279
10K-100K      96
100K-1M      238
1M-10M     12556
10M-100M    3452
100M-1G      332
1G-10G       171

0-1K         952
1K-1M        613
1M-1G      16340
1G+          171

   Interestingly, the metadata value was closer to 15K/object until my
last batch of writing, which was the 171 1G+ files (and a few in the
100M-1G range), plus an equal number of small (2K) files.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Trouble rather the tiger in his lair than the sage amongst ---
his books for to you kingdoms and their armies are mighty
and enduring,  but to him they are but toys of the moment
  to be overturned by the flicking of a finger.  




Apologies

2010-10-17 Thread Hugo Mills
   I'm sorry about those last mails of mine. Clearly, nobody actually
uses quilt mail to send mails. Or at least has never documented
clearly how they do it.

   I shall test some more and try again.

   Irritated and embarrassed,
   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---




Re: [patch 2/4] Add an option to show ISO, binary or raw bytes counts using df.

2010-10-18 Thread Hugo Mills
On Mon, Oct 18, 2010 at 09:21:56AM +0100, Frank Kingswood wrote:
 On 17/10/10 19:26, hugo-l...@carfax.org.uk wrote:
 Change btrfs filesystem df to allow the user to control the scales
 used for sizes in the output.

 Index: btrfs-progs-unstable/btrfs.c
 ===
 --- btrfs-progs-unstable.orig/btrfs.c	2010-10-17 18:43:57.0 +0100
 +++ btrfs-progs-unstable/btrfs.c	2010-10-17 18:47:36.0 +0100
 @@ -87,9 +87,10 @@
  		"Show the info of a btrfs filesystem. If no <uuid> or <label>\n"
  		"is passed, info of all the btrfs filesystem are shown."
  	},
 -	{ do_df_filesystem, 1,
 -	  "filesystem df", "<path>\n"
 +	{ do_df_filesystem, -1,
 +	  "filesystem df", "[-r|-b|-i]<path>\n"
  		"Show space usage information for a mount point\n."
 +		"-r, -b, -i for raw (bytes), binary or ISO sizes."
  	},

 This seems to eat up the short option namespace a bit quickly.
 Fileutils uses different names as well, it may be convenient for users to 
 match its names:
   -h --human-readable   powers of 2**10
   -H --si               powers of 1000

   Matching fileutils is probably a good idea. I'm happy to use -h and
-H.

  	{ do_balance, 1,
  	  "filesystem balance", "<path>\n"
 Index: btrfs-progs-unstable/btrfs_cmds.c
 ===
 --- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-17 18:43:57.0 +0100
 +++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-17 18:47:36.0 +0100
 @@ -841,7 +841,36 @@
  	u64 count = 0, i;
  	int ret;
  	int fd;
 -	char *path = argv[1];
 +	char *path;
 +	int format = PRETTY_SIZE_BINARY;

 Should the default not be to show sizes in bytes (RAW)?

   I was trying not to change the default behaviour at all, but with
-h/-H (and no switch for --raw), that would make sense. I'll re-roll
the patches. (And update the man pages, as Goffredo asked).

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I don't like the look of it,  I tell you. Well, stop --- 
  looking at it, then.  




csum errors, and resizing...

2010-10-21 Thread Hugo Mills
   Just encountered an interesting issue.

   Rapid summary: when a resize encounters a file with broken
checksums, it stops, and will not (apparently) proceed any further.
Un/remount seems to clear the error condition.

   I've got a filesystem with some (lots of) checksum errors on it.
It lives on a single partition. In trying to move all the data off
this, onto a btrfs raid10 filesystem, I've been moving data, and
shrinking the filesystem. The shrink process has now hit some of those
csum errors:

h...@vlad:~ $ sudo btrfs fi show -h
failed to read /dev/sr0
Label: none  uuid: fad2f415-979d-405e-9aa2-0c1011389273
Total devices 1 FS bytes used 660.75GiB
devid    1 size 675.40GiB used 1019.00GiB path /dev/dm-14
[...]

h...@vlad:~ $ sudo strace btrfs fi resize 708209608k /media/vlad/video
[...]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcd26974000
write(1, "Resize '/media/vlad/video' of '70"..., 43Resize '/media/vlad/video' of '708209608k'
) = 43
ioctl(3, 0x50009403, 0x7fffe9a8d140)    = -1 EIO (Input/output error)
close(3)                                = 0
write(2, "ERROR: unable to resize '/media/v"..., 44ERROR: unable to resize '/media/vlad/video'
) = 44
exit_group(30)                          = ?
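
   The request number in that ioctl() line can be unpacked by hand.
The sketch below assumes the generic Linux ioctl encoding used on x86
(8-bit command number, 8-bit type, 14-bit size, 2-bit direction); the
struct and function names are invented for illustration. Decoded this
way, 0x50009403 is a write-direction request of type 0x94, command 3,
with a 4096-byte argument. As far as I know 0x94 is the btrfs ioctl
magic number, so this is consistent with the resize call the tool had
just announced.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an ioctl request number in the generic Linux layout.
 * Some architectures split the bits differently; x86 is assumed here. */
struct ioctl_fields {
	unsigned dir;   /* 0 = none, 1 = write, 2 = read, 3 = read/write */
	unsigned size;  /* size of the argument structure in bytes */
	unsigned type;  /* driver "magic" byte */
	unsigned nr;    /* command number within that driver */
};

/* Split a request number into its four fields. */
static struct ioctl_fields decode_ioctl(uint32_t req)
{
	struct ioctl_fields f;

	f.nr   = req & 0xff;
	f.type = (req >> 8) & 0xff;
	f.size = (req >> 16) & 0x3fff;
	f.dir  = (req >> 30) & 0x3;
	return f;
}

/* Inverse of decode_ioctl(), useful as a round-trip check. */
static uint32_t encode_ioctl(unsigned dir, unsigned type, unsigned nr,
			     unsigned size)
{
	assert(dir <= 3 && type <= 0xff && nr <= 0xff && size < (1u << 14));
	return ((uint32_t)dir << 30) | ((uint32_t)size << 16) |
	       ((uint32_t)type << 8) | nr;
}
```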

   In syslog, I get a bunch of csum errors:

Oct 21 19:40:01 vlad kernel: new size for /dev/mapper/media-video is 725206638592
Oct 21 19:40:03 vlad kernel: btrfs: relocating block group 1090913304576 flags 1
Oct 21 19:40:05 vlad kernel: btrfs_readpage_end_io_hook: 4088 callbacks suppressed
Oct 21 19:40:05 vlad kernel: btrfs csum failed ino 257 off 131072 csum 752820288 private 2880127001
Oct 21 19:40:05 vlad kernel: btrfs csum failed ino 257 off 135168 csum 2112861244 private 3414608960
[and more]

   This is, I suppose, expected.
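
   One cross-check on the numbers above: the size passed to resize
(708209608k) and the "new size" the kernel logged (725206638592) agree
if the k suffix means 1024 bytes. A minimal sketch of that conversion,
assuming k, m and g are all powers of 1024; parse_size here is an
illustrative stand-in, not the function btrfs-progs uses:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a size such as "708209608k" into bytes, treating k, m and g as
 * powers of 1024. Returns 0 on success, -1 on malformed input.
 * Overflow is not checked in this sketch. */
static int parse_size(const char *s, uint64_t *bytes)
{
	char *end;
	unsigned long long n;

	assert(s != NULL && bytes != NULL);
	if (!isdigit((unsigned char)s[0]))
		return -1;
	n = strtoull(s, &end, 10);
	switch (tolower((unsigned char)*end)) {
	case 'g':
		n *= 1024;
		/* fall through */
	case 'm':
		n *= 1024;
		/* fall through */
	case 'k':
		n *= 1024;
		end++;
		break;
	case '\0':
		break;
	default:
		return -1;
	}
	if (*end != '\0')
		return -1;	/* trailing junk after the suffix */
	*bytes = n;
	return 0;
}
```

   parse_size("708209608k", &b) leaves 725206638592 in b, which matches
the syslog line.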

   However, it seems to put the filesystem into a state where a resize
cannot be attempted again:

h...@vlad:~ $ sudo strace btrfs fi resize 708209608k /media/vlad/video
[...]
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f559f8af000
write(1, "Resize '/media/vlad/video' of '70"..., 43Resize '/media/vlad/video' of '708209608k'
) = 43
ioctl(3, 0x50009403, 0x7fff8dfda770)    = -1 EINVAL (Invalid argument)
close(3)                                = 0
write(2, "ERROR: unable to resize '/media/v"..., 44ERROR: unable to resize '/media/vlad/video'
) = 44
exit_group(30)                          = ?

   Unmounting and remounting it resets the resize state, and I end up
back in the first state again. Is this toggling of state intended?

   I'm on the git unstable kernel. Should I go up to 2.6.36 and try
again? The other thing I can think of to do is to delete some of the
files with bad checksums (I have backups) and see if I can get any
further with the resize.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- We teach people management skills by examining characters in ---   
Shakespeare.  You could look at Claudius's crisis
   management techniques, for example.   




[patch v2 0/4] Size reporting in userspace tools

2010-10-21 Thread Hugo Mills
   While playing around with resizing volumes recently, I realised
that I didn't know whether btrfs fi show and btrfs fi df reported
sizes in ISO (i.e. powers of 10^3) units, as they appear to from the
labels they use, or in binary (powers of 2^10) units. Also, a mere
three significant figures is somewhat less than I'm comfortable with
if I'm about to resize the containing block device downwards.
   
   This patch series adds the ability to pick which scale is used for
show and df, and labels the amounts properly (e.g. MB for ISO, MiB for
binary units).

   I've incorporated Frank's suggestion of defaulting to raw, and
matching coreutils' use of -h and -H. I've also updated the man pages
as requested by Goffredo.
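
   To illustrate the three modes, here is a rough, self-contained
sketch of such a pretty-printer. It is not the code from the patches:
the enum, the function name format_size and the two-decimal format are
assumptions made for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins for the three reporting modes. */
enum size_mode { SIZE_RAW, SIZE_BINARY, SIZE_ISO };

/* Write a human-oriented rendering of size into buf. */
static void format_size(uint64_t size, enum size_mode mode,
			char *buf, size_t len)
{
	static const char *bin[] = { "", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
	static const char *iso[] = { "", "kB", "MB", "GB", "TB", "PB", "EB" };
	const char **units = (mode == SIZE_BINARY) ? bin : iso;
	const double base = (mode == SIZE_BINARY) ? 1024.0 : 1000.0;
	double scaled = (double)size;
	int unit = 0;

	assert(buf != NULL && len > 0);
	if (mode == SIZE_RAW) {
		/* Raw mode: the exact byte count, no scaling or suffix. */
		snprintf(buf, len, "%" PRIu64, size);
		return;
	}
	/* Divide down until the value fits under one multiple of base. */
	while (scaled >= base && unit < 6) {
		scaled /= base;
		unit++;
	}
	snprintf(buf, len, "%.2f%s", scaled, units[unit]);
}
```

   For a size of 725206638592 bytes this yields 725206638592 in raw
mode, 675.40GiB in binary mode and 725.21GB in ISO mode, which shows
why the label matters before shrinking a block device.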

   Hugo.
   
-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch v2 2/4] Add an option to show ISO, binary or raw bytes counts using df.

2010-10-21 Thread Hugo Mills
Change btrfs filesystem df to allow the user to control the scales
used for sizes in the output.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c        |    6 +++---
 btrfs_cmds.c   |   42 ++++++++++++++++++++++++++++++++++++------
 man/btrfs.8.in |    8 ++++++++
 3 files changed, 47 insertions(+), 9 deletions(-)

Index: btrfs-progs-unstable/btrfs.c
===
--- btrfs-progs-unstable.orig/btrfs.c	2010-10-20 19:12:18.0 +0100
+++ btrfs-progs-unstable/btrfs.c	2010-10-20 19:48:34.0 +0100
@@ -87,9 +87,9 @@
 		"Show the info of a btrfs filesystem. If no <uuid> or <label>\n"
 		"is passed, info of all the btrfs filesystem are shown."
 	},
-	{ do_df_filesystem, 1,
-	  "filesystem df", "<path>\n"
-		"Show space usage information for a mount point\n."
+	{ do_df_filesystem, -1,
+	  "filesystem df", "[options] <path>\n"
+		"Show space usage information for a mount point."
 	},
 	{ do_balance, 1,
 	  "filesystem balance", "<path>\n"
Index: btrfs-progs-unstable/btrfs_cmds.c
===
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-20 19:19:20.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-20 19:58:48.0 +0100
@@ -14,7 +14,6 @@
  * Boston, MA 021110-1307, USA.
  */
 
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -28,6 +27,7 @@
 #include <limits.h>
 #include <uuid/uuid.h>
 #include <ctype.h>
+#include <getopt.h>
 
 #undef ULONG_MAX
 
@@ -835,13 +835,45 @@
 	return 0;
 }
 
+const struct option df_options[] = {
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_df_filesystem(int nargs, char **argv)
 {
 	struct btrfs_ioctl_space_args *sargs;
 	u64 count = 0, i;
 	int ret;
 	int fd;
-	char *path = argv[1];
+	char *path;
+	int format = PRETTY_SIZE_RAW;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(nargs, argv, "hH", df_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for df\n");
+			free(argv);
+			return 1;
+		}
+	}
+	if (nargs - optind != 1) {
+		fprintf(stderr, "No path given for df\n");
+		free(argv);
+		return 1;
+	}
+	path = argv[optind];
 
 	fd = open_file_or_dir(path);
 	if (fd < 0) {
@@ -914,10 +946,8 @@
 			written += 8;
 		}
 
-		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes,
-									PRETTY_SIZE_RAW);
-		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes,
-									PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes, format);
+		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes, format);
 		printf("%s: total=%s, used=%s\n", description, total_bytes,
 		       used_bytes);
 	}
Index: btrfs-progs-unstable/man/btrfs.8.in
===
--- btrfs-progs-unstable.orig/man/btrfs.8.in	2010-10-20 19:23:36.0 +0100
+++ btrfs-progs-unstable/man/btrfs.8.in	2010-10-20 19:28:14.0 +0100
@@ -21,6 +21,8 @@
 .PP
 \fBbtrfs\fP \fBfilesystem resize\fP\fI [+/\-]size[gkm]|max filesystem\fP
 .PP
+\fBbtrfs\fP \fBfilesystem df\fP\fI [options] path\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP
@@ -143,6 +145,12 @@
 passed, \fBbtrfs\fR show info of all the btrfs filesystem.
 .TP
 
+\fBfilesystem df\fR [options] path\fR
+Show the amount of space used on this filesystem, in bytes. Options:
+-h, --human-readable Use powers of 2^10 (1024) to report sizes.
+-H, --si Use powers of 10^3 (1000) to report sizes, in SI multiples.
+.TP
+
 \fBdevice balance\fR \fIpath\fR
 Balance the chunks of the filesystem identified by \fIpath\fR
 across the devices.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  


[patch v2 4/4] Add an option to show ISO, binary or raw bytes counts using btrfs-show.

2010-10-21 Thread Hugo Mills
Change btrfs-show to allow the user to control the scales used for
sizes in the output.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs-show.c        |   27 +++++++++++++++++++--------
 man/btrfs-show.8.in |   10 ++++++++--
 2 files changed, 27 insertions(+), 10 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===
--- btrfs-progs-unstable.orig/btrfs-show.c	2010-10-20 19:48:33.0 +0100
+++ btrfs-progs-unstable/btrfs-show.c	2010-10-20 20:18:11.0 +0100
@@ -52,7 +52,7 @@
 	return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
 	char uuidbuf[37];
 	struct list_head *cur;
@@ -69,8 +69,7 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used,
-									PRETTY_SIZE_RAW);
+	super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -82,8 +81,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(device->total_bytes, format);
+		bytes_used = pretty_sizes(device->bytes_used, format);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -99,13 +98,18 @@
 
 static void print_usage(void)
 {
-	fprintf(stderr, "usage: btrfs-show [search label or device]\n");
+	fprintf(stderr, "usage: btrfs-show [options] [search label or device]\n");
+	fprintf(stderr, "Options:\n");
+	fprintf(stderr, "\t-h, --human-readable\tShow sizes in powers of 2^10.\n");
+	fprintf(stderr, "\t-H, --si\t\tShow sizes in powers of 10^3 (SI multiples).\n");
 	fprintf(stderr, "%s\n", BTRFS_BUILD_VERSION);
 	exit(1);
 }
 
 static struct option long_options[] = {
 	/* { "byte-count", 1, NULL, 'b' }, */
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
 	{ 0, 0, 0, 0}
 };
 
@@ -117,14 +121,21 @@
 	char *search = NULL;
 	int ret;
 	int option_index = 0;
+	int format = PRETTY_SIZE_RAW;
 
 	while(1) {
 		int c;
-		c = getopt_long(ac, av, "", long_options,
+		c = getopt_long(ac, av, "hH", long_options,
 				&option_index);
 		if (c < 0)
 			break;
 		switch(c) {
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
 		default:
 			print_usage();
 		}
@@ -144,7 +155,7 @@
 					list);
 		if (search && uuid_search(fs_devices, search) == 0)
 			continue;
-		print_one_uuid(fs_devices);
+		print_one_uuid(fs_devices, format);
 	}
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	return 0;
Index: btrfs-progs-unstable/man/btrfs-show.8.in
===
--- btrfs-progs-unstable.orig/man/btrfs-show.8.in	2010-10-20 20:15:29.0 +0100
+++ btrfs-progs-unstable/man/btrfs-show.8.in	2010-10-20 20:17:30.0 +0100
@@ -2,13 +2,19 @@
 .SH NAME
 btrfs-show \- scan the /dev directory for btrfs partitions and print results.
 .SH SYNOPSIS
-.B btrfs-show
+.B btrfs-show [options]
 .SH DESCRIPTION
 .B btrfs-show
 is used to scan the /dev directory for btrfs partitions and display brief
 information such as lable, uuid, etc of each btrfs partition.
 .SH OPTIONS
-none
+.TP
+\fB\-h\fR, \fB\-\-human\-readable\fR
+Show values in multiples of 2^10.
+.TP
+\fB\-H\fR, \fB\-\-si\fR
+Show values in multiples of 10^3 (SI multiples).
+
 .SH AVAILABILITY
 .B btrfs-show
 is part of btrfs-progs. Btrfs is currently under heavy development,

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  


[patch v2 3/4] Add an option to show ISO, binary or raw bytes counts using show.

2010-10-21 Thread Hugo Mills
Change btrfs filesystem show to allow the user to control the scales
used for sizes in the output.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c        |    2 +-
 btrfs_cmds.c   |   45 ++++++++++++++++++++++++++++++++++++++-------
 man/btrfs.8.in |   10 ++++++++++
 3 files changed, 49 insertions(+), 8 deletions(-)

Index: btrfs-progs-unstable/btrfs.c
===
--- btrfs-progs-unstable.orig/btrfs.c	2010-10-20 20:03:37.0 +0100
+++ btrfs-progs-unstable/btrfs.c	2010-10-20 20:11:03.0 +0100
@@ -83,7 +83,7 @@
 		"will occupe all available space on the device."
 	},
 	{ do_show_filesystem, 999,
-	  "filesystem show", "[<uuid>|<label>]\n"
+	  "filesystem show", "[options] [<uuid>|<label>]\n"
 		"Show the info of a btrfs filesystem. If no <uuid> or <label>\n"
 		"is passed, info of all the btrfs filesystem are shown."
 	},
Index: btrfs-progs-unstable/btrfs_cmds.c
===
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-20 20:03:37.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-20 20:08:00.0 +0100
@@ -617,7 +617,7 @@
 	return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
 	char uuidbuf[37];
 	struct list_head *cur;
@@ -634,8 +634,7 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used,
-									PRETTY_SIZE_RAW);
+	super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -647,8 +646,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+		total_bytes = pretty_sizes(device->total_bytes, format);
+		bytes_used = pretty_sizes(device->bytes_used, format);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -662,13 +661,45 @@
 	printf("\n");
 }
 
+const struct option show_options[] = {
+	{ "human-readable", 0, NULL, 'h' },
+	{ "si", 0, NULL, 'H' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_show_filesystem(int argc, char **argv)
 {
 	struct list_head *all_uuids;
 	struct btrfs_fs_devices *fs_devices;
 	struct list_head *cur_uuid;
-	char *search = argv[1];
+	char *search;
 	int ret;
+	int format = PRETTY_SIZE_RAW;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(argc, argv, "hH", show_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'h':
+			format = PRETTY_SIZE_BINARY;
+			break;
+		case 'H':
+			format = PRETTY_SIZE_ISO;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for show\n");
+			free(argv);
+			return 1;
+		}
+	}
+	if (argc - optind > 1) {
+		fprintf(stderr, "Too many arguments for show\n");
+		free(argv);
+		return 1;
+	}
+	search = argv[optind];
 
 	ret = btrfs_scan_one_dir("/dev", 0);
 	if (ret){
@@ -682,7 +713,7 @@
 					list);
 		if (search && uuid_search(fs_devices, search) == 0)
 			continue;
-		print_one_uuid(fs_devices);
+		print_one_uuid(fs_devices, format);
 	}
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	return 0;
Index: btrfs-progs-unstable/man/btrfs.8.in
===
--- btrfs-progs-unstable.orig/man/btrfs.8.in	2010-10-20 20:03:53.0 +0100
+++ btrfs-progs-unstable/man/btrfs.8.in	2010-10-20 20:08:15.0 +0100
@@ -23,6 +23,8 @@
 .PP
 \fBbtrfs\fP \fBfilesystem df\fP\fI [options] path\fP
 .PP
+\fBbtrfs\fP \fBfilesystem show\fP\fI [options] [uuid|label]\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP
@@ -151,6 +153,14 @@
 -H, --si Use powers of 10^3 (1000) to report sizes, in SI multiples.
 .TP
 
+\fBfilesystem show\fR [options] [uuid|label]\fR
+Show the usage of each device in the btrfs filesystem with the given
+uuid or label, or all

[patch v2 1/4] Update pretty-printer for different systems of counting multiples.

2010-10-21 Thread Hugo Mills
Make the pretty-printer for data sizes capable of printing in ISO
(powers of 10^3), binary (powers of 2^10) or raw (a simple byte
count).

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs-show.c |7 ---
 btrfs_cmds.c |   13 -
 mkfs.c   |3 ++-
 utils.c  |   48 +---
 utils.h  |7 ++-
 5 files changed, 53 insertions(+), 25 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===
--- btrfs-progs-unstable.orig/btrfs-show.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/btrfs-show.c	2010-10-20 19:20:02.0 +0100
@@ -69,7 +69,8 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used);
+	super_bytes_used = pretty_sizes(device->super_bytes_used,
+									PRETTY_SIZE_RAW);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -81,8 +82,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes);
-		bytes_used = pretty_sizes(device->bytes_used);
+		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
+		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
Index: btrfs-progs-unstable/btrfs_cmds.c
===
--- btrfs-progs-unstable.orig/btrfs_cmds.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/btrfs_cmds.c	2010-10-20 19:19:20.0 +0100
@@ -634,7 +634,8 @@
 	else
 		printf("Label: none ");
 
-	super_bytes_used = pretty_sizes(device->super_bytes_used);
+	super_bytes_used = pretty_sizes(device->super_bytes_used,
+									PRETTY_SIZE_RAW);
 
 	total = device->total_devs;
 	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -646,8 +647,8 @@
 		char *total_bytes;
 		char *bytes_used;
 		device = list_entry(cur, struct btrfs_device, dev_list);
-		total_bytes = pretty_sizes(device->total_bytes);
-		bytes_used = pretty_sizes(device->bytes_used);
+		total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
+		bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 		       (unsigned long long)device->devid,
 		       total_bytes, bytes_used, device->name);
@@ -913,8 +914,10 @@
 			written += 8;
 		}
 
-		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes);
-		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes);
+		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes,
+									PRETTY_SIZE_RAW);
+		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes,
+									PRETTY_SIZE_RAW);
 		printf("%s: total=%s, used=%s\n", description, total_bytes,
 		       used_bytes);
 	}
Index: btrfs-progs-unstable/mkfs.c
===
--- btrfs-progs-unstable.orig/mkfs.c	2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/mkfs.c	2010-10-17 19:35:08.0 +0100
@@ -524,7 +524,8 @@
 	printf("fs created label %s on %s\n\tnodesize %u leafsize %u "
 	    "sectorsize %u size %s\n",
 	    label, first_file, nodesize, leafsize, sectorsize,
-	    pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+	    pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy),
+					 PRETTY_SIZE_BINARY));
 
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	btrfs_commit_transaction(trans, root);
Index: btrfs-progs-unstable/utils.c
===
--- btrfs-progs-unstable.orig/utils.c   2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/utils.c2010-10-17 19:35:08.0 +0100
@@ -966,30 +966,48 @@
return ret;
 }
 
-static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
+static char *bin_size_strs[] = { "", "KiB", "MiB", "GiB", "TiB",
+   "PiB", "EiB", "ZiB", "YiB"};
+static char *iso_size_strs[] = { "", "kB", "MB", "GB", "TB",
"PB", "EB", "ZB", "YB"

[patch v3 3/4] Add an option to show ISO, binary or raw bytes counts using show.

2010-10-26 Thread Hugo Mills
Change btrfs filesystem show to allow the user to control the scales
used for sizes in the output.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c|2 +-
 btrfs_cmds.c   |   45 ++---
 man/btrfs.8.in |   15 ++-
 3 files changed, 49 insertions(+), 13 deletions(-)

Index: btrfs-progs-unstable/btrfs.c
===
--- btrfs-progs-unstable.orig/btrfs.c   2010-10-26 13:01:43.0 +0100
+++ btrfs-progs-unstable/btrfs.c2010-10-26 13:02:40.814489740 +0100
@@ -83,7 +83,7 @@
will occupe all available space on the device.
},
{ do_show_filesystem, 999,
- filesystem show, [uuid|label]\n
+ filesystem show, [-h|--human-readable|-H|--si] [uuid|label]\n
Show the info of a btrfs filesystem. If no uuid or label\n
is passed, info of all the btrfs filesystem are shown.
},
Index: btrfs-progs-unstable/btrfs_cmds.c
===
--- btrfs-progs-unstable.orig/btrfs_cmds.c  2010-10-26 13:00:39.0 
+0100
+++ btrfs-progs-unstable/btrfs_cmds.c   2010-10-26 13:02:40.834488902 +0100
@@ -617,7 +617,7 @@
return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
char uuidbuf[37];
struct list_head *cur;
@@ -634,8 +634,7 @@
else
printf("Label: none ");
 
-   super_bytes_used = pretty_sizes(device->super_bytes_used,
-   PRETTY_SIZE_RAW);
+   super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
total = device->total_devs;
printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -647,8 +646,8 @@
char *total_bytes;
char *bytes_used;
device = list_entry(cur, struct btrfs_device, dev_list);
-   total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-   bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+   total_bytes = pretty_sizes(device->total_bytes, format);
+   bytes_used = pretty_sizes(device->bytes_used, format);
printf("\tdevid %4llu size %s used %s path %s\n",
   (unsigned long long)device->devid,
   total_bytes, bytes_used, device->name);
@@ -662,13 +661,45 @@
printf(\n);
 }
 
+const struct option show_options[] = {
+   { "human-readable", 0, NULL, 'h' },
+   { "si", 0, NULL, 'H' },
+   { NULL, 0, NULL, 0 }
+};
+
 int do_show_filesystem(int argc, char **argv)
 {
struct list_head *all_uuids;
struct btrfs_fs_devices *fs_devices;
struct list_head *cur_uuid;
-   char *search = argv[1];
+   char *search;
int ret;
+   int format = PRETTY_SIZE_RAW;
+
+   optind = 1;
+   while(1) {
+   int c = getopt_long(argc, argv, "hH", show_options, NULL);
+   if (c < 0)
+   break;
+   switch(c) {
+   case 'h':
+   format = PRETTY_SIZE_BINARY;
+   break;
+   case 'H':
+   format = PRETTY_SIZE_ISO;
+   break;
+   default:
+   fprintf(stderr, "Invalid arguments for show\n");
+   free(argv);
+   return 1;
+   }
+   }
+   if (argc - optind > 1) {
+   fprintf(stderr, "Too many arguments for show\n");
+   free(argv);
+   return 1;
+   }
+   search = argv[optind];
 
ret = btrfs_scan_one_dir("/dev", 0);
if (ret){
@@ -682,7 +713,7 @@
list);
if (search && uuid_search(fs_devices, search) == 0)
continue;
-   print_one_uuid(fs_devices);
+   print_one_uuid(fs_devices, format);
}
printf(%s\n, BTRFS_BUILD_VERSION);
return 0;
Index: btrfs-progs-unstable/man/btrfs.8.in
===
--- btrfs-progs-unstable.orig/man/btrfs.8.in2010-10-26 13:01:27.0 
+0100
+++ btrfs-progs-unstable/man/btrfs.8.in 2010-10-26 13:03:43.941854637 +0100
@@ -23,6 +23,8 @@
 .PP
 \fBbtrfs\fP \fBfilesystem df\fP\fI [-h|-H|--human-readable|--si] path\fP
 .PP
+\fBbtrfs\fP \fBfilesystem show\fP\fI [-h|--human-readable|-H|--si] 
[uuid|label]\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [device [device..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI dev|label [dev|label...]\fP
@@ -140,16 +142,19 @@
 partition after reducing the size of the filesystem.
 .TP
 
-\fBfilesystem show\fR [uuid|label]\fR
-Show the btrfs filesystem with some additional

[patch v3 1/4] Update pretty-printer for different systems of counting multiples.

2010-10-26 Thread Hugo Mills
Make the pretty-printer for data sizes capable of printing in ISO
(powers of 10^3), binary (powers of 2^10) or raw (a simple byte
count).

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs-show.c |7 ---
 btrfs_cmds.c |   13 -
 mkfs.c   |3 ++-
 utils.c  |   48 +---
 utils.h  |7 ++-
 5 files changed, 53 insertions(+), 25 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===
--- btrfs-progs-unstable.orig/btrfs-show.c  2010-10-09 15:39:09.0 
+0100
+++ btrfs-progs-unstable/btrfs-show.c   2010-10-20 19:20:02.0 +0100
@@ -69,7 +69,8 @@
else
printf(Label: none );
 
-   super_bytes_used = pretty_sizes(device-super_bytes_used);
+   super_bytes_used = pretty_sizes(device-super_bytes_used,
+   
PRETTY_SIZE_RAW);
 
total = device-total_devs;
printf( uuid: %s\n\tTotal devices %llu FS bytes used %s\n, uuidbuf,
@@ -81,8 +82,8 @@
char *total_bytes;
char *bytes_used;
device = list_entry(cur, struct btrfs_device, dev_list);
-   total_bytes = pretty_sizes(device-total_bytes);
-   bytes_used = pretty_sizes(device-bytes_used);
+   total_bytes = pretty_sizes(device-total_bytes, 
PRETTY_SIZE_RAW);
+   bytes_used = pretty_sizes(device-bytes_used, PRETTY_SIZE_RAW);
printf(\tdevid %4llu size %s used %s path %s\n,
   (unsigned long long)device-devid,
   total_bytes, bytes_used, device-name);
Index: btrfs-progs-unstable/btrfs_cmds.c
===
--- btrfs-progs-unstable.orig/btrfs_cmds.c  2010-10-09 15:39:09.0 
+0100
+++ btrfs-progs-unstable/btrfs_cmds.c   2010-10-20 19:19:20.0 +0100
@@ -634,7 +634,8 @@
else
printf(Label: none );
 
-   super_bytes_used = pretty_sizes(device-super_bytes_used);
+   super_bytes_used = pretty_sizes(device-super_bytes_used,
+   
PRETTY_SIZE_RAW);
 
total = device-total_devs;
printf( uuid: %s\n\tTotal devices %llu FS bytes used %s\n, uuidbuf,
@@ -646,8 +647,8 @@
char *total_bytes;
char *bytes_used;
device = list_entry(cur, struct btrfs_device, dev_list);
-   total_bytes = pretty_sizes(device-total_bytes);
-   bytes_used = pretty_sizes(device-bytes_used);
+   total_bytes = pretty_sizes(device-total_bytes, 
PRETTY_SIZE_RAW);
+   bytes_used = pretty_sizes(device-bytes_used, PRETTY_SIZE_RAW);
printf(\tdevid %4llu size %s used %s path %s\n,
   (unsigned long long)device-devid,
   total_bytes, bytes_used, device-name);
@@ -913,8 +914,10 @@
written += 8;
}
 
-   total_bytes = pretty_sizes(sargs-spaces[i].total_bytes);
-   used_bytes = pretty_sizes(sargs-spaces[i].used_bytes);
+   total_bytes = pretty_sizes(sargs-spaces[i].total_bytes,
+   
PRETTY_SIZE_RAW);
+   used_bytes = pretty_sizes(sargs-spaces[i].used_bytes,
+   
PRETTY_SIZE_RAW);
printf(%s: total=%s, used=%s\n, description, total_bytes,
   used_bytes);
}
Index: btrfs-progs-unstable/mkfs.c
===
--- btrfs-progs-unstable.orig/mkfs.c2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/mkfs.c 2010-10-17 19:35:08.0 +0100
@@ -524,7 +524,8 @@
printf(fs created label %s on %s\n\tnodesize %u leafsize %u 
sectorsize %u size %s\n,
label, first_file, nodesize, leafsize, sectorsize,
-   pretty_sizes(btrfs_super_total_bytes(root-fs_info-super_copy)));
+   
pretty_sizes(btrfs_super_total_bytes(root-fs_info-super_copy),
+   PRETTY_SIZE_BINARY));
 
printf(%s\n, BTRFS_BUILD_VERSION);
btrfs_commit_transaction(trans, root);
Index: btrfs-progs-unstable/utils.c
===
--- btrfs-progs-unstable.orig/utils.c   2010-10-09 15:39:09.0 +0100
+++ btrfs-progs-unstable/utils.c2010-10-17 19:35:08.0 +0100
@@ -966,30 +966,48 @@
return ret;
 }
 
-static char *size_strs[] = { "", "KB", "MB", "GB", "TB",
+static char *bin_size_strs[] = { "", "KiB", "MiB", "GiB", "TiB",
+   "PiB", "EiB", "ZiB", "YiB"};
+static char *iso_size_strs[] = { "", "kB", "MB", "GB", "TB",
"PB", "EB", "ZB", "YB"

[patch v3 0/4] Size reporting of btrfs tool

2010-10-26 Thread Hugo Mills
   While playing around with resizing volumes recently, I realised
that I didn't know whether btrfs fi show and btrfs fi df reported
sizes in ISO (e.g. powers of 10^3) units, as they appear to from the
labels they use, or in binary (powers of 2^10) units. Also, a mere
three significant figures is somewhat less than I'm comfortable with
if I'm about to resize the containing block device downwards.
   
   This patch series adds the ability to pick which scale is used for
show and df, and labels the amounts properly (e.g. MB for ISO, MiB for
binary units).
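
To make the three scales concrete, here is a minimal sketch of a size formatter. It is not the code from the patch: the names format_size and enum size_mode are made up for illustration (the patch uses pretty_sizes with PRETTY_SIZE_RAW, PRETTY_SIZE_BINARY and PRETTY_SIZE_ISO), and the "B" suffix and two decimal places are assumptions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for this sketch only. */
enum size_mode { SIZE_RAW, SIZE_BINARY, SIZE_ISO };

static const char *const bin_suffix[] = { "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
static const char *const iso_suffix[] = { "B", "kB", "MB", "GB", "TB", "PB", "EB" };

/*
 * Format `size` bytes into buf according to mode.
 * Returns what snprintf returns: the length of the formatted string.
 */
int format_size(uint64_t size, enum size_mode mode, char *buf, size_t len)
{
	const char *const *suffix;
	double base;
	double value = (double)size;
	int idx = 0;

	/* Raw mode: the exact byte count, no scaling and no suffix. */
	if (mode == SIZE_RAW)
		return snprintf(buf, len, "%" PRIu64, size);

	if (mode == SIZE_BINARY) {
		base = 1024.0;          /* powers of 2^10 */
		suffix = bin_suffix;
	} else {
		base = 1000.0;          /* powers of 10^3 */
		suffix = iso_suffix;
	}

	/* A 64-bit count never needs more than the seventh suffix (index 6). */
	while (value >= base && idx < 6) {
		value /= base;
		idx++;
	}
	return snprintf(buf, len, "%.2f%s", value, suffix[idx]);
}
```

With this sketch, 1048576 bytes formats as 1048576 in raw mode, 1.00MiB in binary mode and 1.05MB in ISO mode, which is the ambiguity the series is trying to remove.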

   I've incorporated Frank's suggestion of defaulting to raw, and
matching coreutils' use of -h and -H. I've also updated the man pages
and command help as requested by Goffredo.
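
The option handling described here can be sketched with getopt_long roughly as follows. This is an illustration under assumptions, not the patch's code: parse_size_options and enum size_mode are hypothetical names, and only the flag letters (-h, -H) and long names (--human-readable, --si) come from the series.

```c
#include <getopt.h>
#include <stdio.h>

/* Hypothetical names for this sketch only. */
enum size_mode { SIZE_RAW, SIZE_BINARY, SIZE_ISO };

/*
 * Parse -h/--human-readable and -H/--si, defaulting to raw byte counts.
 * On success returns 0, stores the chosen mode in *mode and the index of
 * the first non-option argument in *first_arg. Returns -1 on an unknown
 * option.
 */
int parse_size_options(int argc, char **argv, enum size_mode *mode,
		       int *first_arg)
{
	static const struct option long_opts[] = {
		{ "human-readable", no_argument, NULL, 'h' },
		{ "si",             no_argument, NULL, 'H' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	*mode = SIZE_RAW;       /* raw is the default, as in the series */
	optind = 1;             /* restart scanning at the first argument */

	while ((c = getopt_long(argc, argv, "hH", long_opts, NULL)) != -1) {
		switch (c) {
		case 'h':
			*mode = SIZE_BINARY;
			break;
		case 'H':
			*mode = SIZE_ISO;
			break;
		default:
			return -1;
		}
	}
	*first_arg = optind;
	return 0;
}
```

Anything left at argv[*first_arg] onwards would be the uuid, label or path argument of the subcommand.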

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Jazz is the sort of music where no-one plays anything the ---
 same way once.  

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch v3 4/4] Add an option to show ISO, binary or raw bytes counts using btrfs-show.

2010-10-26 Thread Hugo Mills
Change btrfs-show to allow the user to control the scales used for
sizes in the output.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs-show.c|   27 +++
 man/btrfs-show.8.in |   10 --
 2 files changed, 27 insertions(+), 10 deletions(-)

Index: btrfs-progs-unstable/btrfs-show.c
===
--- btrfs-progs-unstable.orig/btrfs-show.c  2010-10-26 12:56:54.179226836 
+0100
+++ btrfs-progs-unstable/btrfs-show.c   2010-10-26 13:05:48.626702902 +0100
@@ -52,7 +52,7 @@
return 0;
 }
 
-static void print_one_uuid(struct btrfs_fs_devices *fs_devices)
+static void print_one_uuid(struct btrfs_fs_devices *fs_devices, int format)
 {
char uuidbuf[37];
struct list_head *cur;
@@ -69,8 +69,7 @@
else
printf("Label: none ");
 
-   super_bytes_used = pretty_sizes(device->super_bytes_used,
-   PRETTY_SIZE_RAW);
+   super_bytes_used = pretty_sizes(device->super_bytes_used, format);
 
total = device->total_devs;
printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
@@ -82,8 +81,8 @@
char *total_bytes;
char *bytes_used;
device = list_entry(cur, struct btrfs_device, dev_list);
-   total_bytes = pretty_sizes(device->total_bytes, PRETTY_SIZE_RAW);
-   bytes_used = pretty_sizes(device->bytes_used, PRETTY_SIZE_RAW);
+   total_bytes = pretty_sizes(device->total_bytes, format);
+   bytes_used = pretty_sizes(device->bytes_used, format);
printf("\tdevid %4llu size %s used %s path %s\n",
   (unsigned long long)device->devid,
   total_bytes, bytes_used, device->name);
@@ -99,13 +98,18 @@
 
 static void print_usage(void)
 {
-   fprintf(stderr, "usage: btrfs-show [search label or device]\n");
+   fprintf(stderr, "usage: btrfs-show [options] [search label or device]\n");
+   fprintf(stderr, "Options:\n");
+   fprintf(stderr, "\t-h, --human-readable\tShow sizes in powers of 2^10.\n");
+   fprintf(stderr, "\t-H, --si\t\tShow sizes in powers of 10^3 (SI multiples).\n");
fprintf(stderr, "%s\n", BTRFS_BUILD_VERSION);
exit(1);
 }
 
 static struct option long_options[] = {
/* { byte-count, 1, NULL, 'b' }, */
+   { "human-readable", 0, NULL, 'h' },
+   { "si", 0, NULL, 'H' },
{ 0, 0, 0, 0}
 };
 
@@ -117,14 +121,21 @@
char *search = NULL;
int ret;
int option_index = 0;
+   int format = PRETTY_SIZE_RAW;
 
while(1) {
int c;
-   c = getopt_long(ac, av, "", long_options,
+   c = getopt_long(ac, av, "hH", long_options,
&option_index);
if (c < 0)
break;
switch(c) {
+   case 'H':
+   format = PRETTY_SIZE_ISO;
+   break;
+   case 'h':
+   format = PRETTY_SIZE_BINARY;
+   break;
default:
print_usage();
}
@@ -144,7 +155,7 @@
list);
if (search && uuid_search(fs_devices, search) == 0)
continue;
-   print_one_uuid(fs_devices);
+   print_one_uuid(fs_devices, format);
}
printf(%s\n, BTRFS_BUILD_VERSION);
return 0;
Index: btrfs-progs-unstable/man/btrfs-show.8.in
===
--- btrfs-progs-unstable.orig/man/btrfs-show.8.in   2010-10-26 
12:56:54.189226427 +0100
+++ btrfs-progs-unstable/man/btrfs-show.8.in2010-10-26 13:06:51.074147050 
+0100
@@ -2,13 +2,19 @@
 .SH NAME
 btrfs-show \- scan the /dev directory for btrfs partitions and print results.
 .SH SYNOPSIS
-.B btrfs-show
+.B btrfs-show [-h|-H|--human-readable|--si]
 .SH DESCRIPTION
 .B btrfs-show
 is used to scan the /dev directory for btrfs partitions and display brief
 information such as lable, uuid, etc of each btrfs partition.
 .SH OPTIONS
-none
+.TP
+\fB\-h\fR, \fB\-\-human\-readable\fR
+Show values in multiples of 2^10.
+.TP
+\fB\-H\fR, \fB\-\-si\fR
+Show values in multiples of 10^3 (SI multiples).
+
 .SH AVAILABILITY
 .B btrfs-show
 is part of btrfs-progs. Btrfs is currently under heavy development,




[patch 0/2] Control filesystem balances (kernel side)

2010-10-29 Thread Hugo Mills
   These two patches give a degree of control over balance operations.
The first makes it possible to get an idea of how much work remains to
do, by tracking the number of block groups (chunks) that need to be
moved/rewritten. The second patch allows a running balance operation
to be cancelled when the current block group has been moved.
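
The two mechanisms can be sketched in user-space C as below. This is a single-threaded illustration, not kernel code: the struct fields mirror struct btrfs_balance_info from the patches, but run_balance, request_cancel, the step_hook callback and cancel_after_three are hypothetical names, and the spinlock that the patches take around these fields is deliberately left out.

```c
#include <errno.h>
#include <stdint.h>

/* Same fields as struct btrfs_balance_info in the patches. */
struct balance_info {
	uint64_t expected;      /* block groups to process */
	uint64_t completed;     /* block groups processed so far */
	int cancel_pending;     /* set by the cancel request */
};

/* Hypothetical hook, called after each block group has been handled. */
typedef void (*step_hook)(struct balance_info *info);

/* Ask for cancellation; a second request reports -ECANCELED. */
int request_cancel(struct balance_info *info)
{
	if (info->cancel_pending)
		return -ECANCELED;
	info->cancel_pending = 1;
	return 0;
}

/*
 * Process nr_groups block groups, checking the cancel flag only between
 * groups, so the group being moved always finishes first.
 * Returns 0 on completion or -EINTR if a cancel was requested.
 */
int run_balance(struct balance_info *info, uint64_t nr_groups, step_hook hook)
{
	info->expected = nr_groups;
	info->completed = 0;
	info->cancel_pending = 0;

	while (!info->cancel_pending && info->completed < info->expected) {
		/* ... relocate one block group here ... */
		info->completed++;
		if (hook)
			hook(info);
	}
	return info->cancel_pending ? -EINTR : 0;
}

/* Example hook: request a cancel once three block groups are done. */
void cancel_after_three(struct balance_info *info)
{
	if (info->completed == 3)
		request_cancel(info);
}
```

Calling run_balance(&info, 10, cancel_after_three) stops with -EINTR after the third block group, leaving completed at 3 and expected at 10, which is exactly the pair of numbers a progress query would report.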

   One fundamental question, though -- is the progress monitor
function best implemented as an ioctl, as I've done here, or should it
be two or three sysfs files? I'm thinking of /proc/mdstat...
Obviously, /proc/mdstat would never get into /sys, but exposing the
expected and remaining values as files has an attractive
simplicity to it.

   The user-space side of things is in a separate patch series, to
follow.

   Please be gentle with me, this is my first (serious, non-trivial)
kernel patch. :)

   Hugo.


-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- No!  My collection of rare, incurable diseases! Violated! ---   



[patch 1/2] Balance progress monitoring.

2010-10-29 Thread Hugo Mills
This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

Signed-off-by: Hugo Mills h...@carfax.org.uk

---
 fs/btrfs/ctree.h   |9 
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/ioctl.c   |   34 
 fs/btrfs/ioctl.h   |7 ++
 fs/btrfs/volumes.c |   55 +++--
 5 files changed, 105 insertions(+), 2 deletions(-)

Index: linux-mainline/fs/btrfs/ctree.h
===
--- linux-mainline.orig/fs/btrfs/ctree.h2010-10-26 18:03:38.0 
+0100
+++ linux-mainline/fs/btrfs/ctree.h 2010-10-29 17:20:43.860460761 +0100
@@ -803,6 +803,11 @@
struct list_head cluster_list;
 };
 
+struct btrfs_balance_info {
+   u64 expected;
+   u64 completed;
+};
+
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
@@ -1010,6 +1015,10 @@
unsigned metadata_ratio;
 
void *bdev_holder;
+
+   /* Keep track of any rebalance operations on this FS */
+   spinlock_t balance_info_lock;
+   struct btrfs_balance_info *balance_info;
 };
 
 /*
Index: linux-mainline/fs/btrfs/ioctl.c
===
--- linux-mainline.orig/fs/btrfs/ioctl.c2010-10-26 18:03:38.0 
+0100
+++ linux-mainline/fs/btrfs/ioctl.c 2010-10-29 17:21:26.128742389 +0100
@@ -1984,6 +1984,38 @@
return 0;
 }
 
+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+   struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+   int ret = 0;
+   struct btrfs_ioctl_balance_progress dest;
+
+   spin_lock(&fs_info->balance_info_lock);
+   if (!fs_info->balance_info) {
+   ret = -EINVAL;
+   goto error;
+   }
+
+   dest.expected = fs_info->balance_info->expected;
+   dest.completed = fs_info->balance_info->completed;
+
+   spin_unlock(&fs_info->balance_info_lock);
+
+   if (copy_to_user(user_dest, &dest,
+sizeof(struct btrfs_ioctl_balance_progress)))
+   return -EFAULT;
+
+   return 0;
+
+error:
+   spin_unlock(&fs_info->balance_info_lock);
+   return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2017,6 +2049,8 @@
return btrfs_ioctl_rm_dev(root, argp);
case BTRFS_IOC_BALANCE:
return btrfs_balance(root->fs_info->dev_root);
+   case BTRFS_IOC_BALANCE_PROGRESS:
+   return btrfs_ioctl_balance_progress(root->fs_info, argp);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
Index: linux-mainline/fs/btrfs/ioctl.h
===
--- linux-mainline.orig/fs/btrfs/ioctl.h2010-10-26 18:03:38.0 
+0100
+++ linux-mainline/fs/btrfs/ioctl.h 2010-10-29 17:05:44.447028825 +0100
@@ -138,6 +138,11 @@
struct btrfs_ioctl_space_info spaces[0];
 };
 
+struct btrfs_ioctl_balance_progress {
+   __u64 expected;
+   __u64 completed;
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -178,4 +183,6 @@
 #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64)
 #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \
struct btrfs_ioctl_space_args)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
+   struct btrfs_ioctl_balance_progress)
 #endif
Index: linux-mainline/fs/btrfs/volumes.c
===
--- linux-mainline.orig/fs/btrfs/volumes.c  2010-10-26 18:03:38.0 
+0100
+++ linux-mainline/fs/btrfs/volumes.c   2010-10-29 17:23:40.463279287 +0100
@@ -1902,6 +1902,7 @@
struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
struct btrfs_trans_handle *trans;
struct btrfs_key found_key;
+   struct btrfs_balance_status *bal_info;
 
if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
return -EROFS;
@@ -1909,6 +1910,18 @@
mutex_lock(&dev_root->fs_info->volume_mutex);
dev_root = dev_root->fs_info->dev_root;
 
+   dev_root->fs_info->balance_info = kmalloc(
+   sizeof(struct btrfs_balance_info),
+   GFP_NOFS);
+   if (!dev_root->fs_info->balance_info) {
+   ret = -ENOSPC;
+   goto error_no_status;
+   }
+   bal_info = dev_root->fs_info->balance_info;
+   bal_info->expected = -1; /* One less than actually counted

[patch 2/2] Cancel filesystem balance.

2010-10-29 Thread Hugo Mills
This patch adds an ioctl for cancelling a btrfs balance operation
mid-flight. The ioctl simply sets a flag, and the operation terminates
after the current block group move has completed.

Signed-off-by: Hugo Mills h...@carfax.org.uk

---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/ioctl.c   |   25 +
 fs/btrfs/ioctl.h   |1 +
 fs/btrfs/volumes.c |7 ++-
 4 files changed, 33 insertions(+), 1 deletion(-)

Index: linux-mainline/fs/btrfs/ctree.h
===
--- linux-mainline.orig/fs/btrfs/ctree.h2010-10-29 17:20:43.860460761 
+0100
+++ linux-mainline/fs/btrfs/ctree.h 2010-10-29 17:24:06.622214467 +0100
@@ -806,6 +806,7 @@
 struct btrfs_balance_info {
u64 expected;
u64 completed;
+   int cancel_pending;
 };
 
 struct reloc_control;
Index: linux-mainline/fs/btrfs/ioctl.c
===
--- linux-mainline.orig/fs/btrfs/ioctl.c2010-10-29 17:21:26.128742389 
+0100
+++ linux-mainline/fs/btrfs/ioctl.c 2010-10-29 17:27:51.933043374 +0100
@@ -2016,6 +2016,29 @@
return ret;
 }
 
+/*
+ * Cancel a running balance operation
+ */
+long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
+{
+   int err = 0;
+
+   spin_lock(&fs_info->balance_info_lock);
+   if(!fs_info->balance_info) {
+   err = -EINVAL;
+   goto error;
+   }
+   if(fs_info->balance_info->cancel_pending) {
+   err = -ECANCELED;
+   goto error;
+   }
+   fs_info->balance_info->cancel_pending = 1;
+
+error:
+   spin_unlock(&fs_info->balance_info_lock);
+   return err;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2051,6 +2074,8 @@
return btrfs_balance(root->fs_info->dev_root);
case BTRFS_IOC_BALANCE_PROGRESS:
return btrfs_ioctl_balance_progress(root->fs_info, argp);
+   case BTRFS_IOC_BALANCE_CANCEL:
+   return btrfs_ioctl_balance_cancel(root->fs_info);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
Index: linux-mainline/fs/btrfs/ioctl.h
===
--- linux-mainline.orig/fs/btrfs/ioctl.h2010-10-29 17:05:44.447028825 
+0100
+++ linux-mainline/fs/btrfs/ioctl.h 2010-10-29 17:24:06.642213653 +0100
@@ -185,4 +185,5 @@
struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 22)
 #endif
Index: linux-mainline/fs/btrfs/volumes.c
===
--- linux-mainline.orig/fs/btrfs/volumes.c  2010-10-29 17:23:40.463279287 
+0100
+++ linux-mainline/fs/btrfs/volumes.c   2010-10-29 17:24:06.652213246 +0100
@@ -1921,6 +1921,7 @@
bal_info->expected = -1; /* One less than actually counted,
because chunk 0 is special */
bal_info->completed = 0;
+   bal_info->cancel_pending = 0;
 
/* step one make some room on all the devices */
list_for_each_entry(device, devices, dev_list) {
@@ -1983,7 +1984,7 @@
key.offset = (u64)-1;
key.type = BTRFS_CHUNK_ITEM_KEY;
 
-   while (1) {
+   while (!bal_info->cancel_pending) {
ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
if (ret < 0)
goto error;
@@ -2024,6 +2025,10 @@
   bal_info->completed, bal_info->expected);
}
ret = 0;
+   if(bal_info->cancel_pending) {
+   printk(KERN_INFO "btrfs: balance cancelled\n");
+   ret = -EINTR;
+   }
error:
btrfs_free_path(path);
spin_lock(&dev_root->fs_info->balance_info_lock);




[patch 0/2] Control filesystem balances (userspace)

2010-10-29 Thread Hugo Mills
   These two patches complement the previous two kernel-side
patches. The first implements a way of displaying the current progress
of any running balance process. The second patch allows a running
balance to be cancelled.

   I'm a bit uncertain about the best name for these commands. Several
options:

1)
# btrfs filesystem progress path
# btrfs filesystem cancel path

   Way too vague (cancel *what*?)


2)
# btrfs filesystem balance-progress path
# btrfs filesystem balance-cancel path

   Clashes horribly with filesystem balance -- no abbreviations
possible.


3)
btrfs filesystem balance -p path
btrfs filesystem balance -c path

   Changes behaviour significantly on a switch, in contrast to the
behaviour of the rest of the btrfs tool.


4)
btrfs balance progress path
btrfs balance cancel path

   My current favourite, although we introduce a new namespace
(balance) for commands. We could add btrfs balance start path as
a synonym for btrfs filesystem balance path, for some degree of
consistency.

   At some point, I'll add a monitor function, which will poll at 1s
intervals for progress updates, and print out progress when it changes.
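
A monitor along those lines might look like the sketch below. Everything here is hypothetical: monitor_balance, the progress_fn callback and the replay_poll stand-in are illustrative names, and a real version would fill the callback by calling the progress ioctl from the kernel-side patches rather than replaying canned values.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct progress {
	uint64_t expected;
	uint64_t completed;
};

/*
 * Hypothetical callback: fills *out and returns 0 while a balance is
 * running, non-zero once there is nothing left to report.
 */
typedef int (*progress_fn)(void *ctx, struct progress *out);

/*
 * Poll every interval_sec seconds and print a line only when the numbers
 * change. Returns the number of lines printed.
 */
int monitor_balance(progress_fn poll, void *ctx, unsigned interval_sec,
		    FILE *out)
{
	struct progress last = { 0, 0 };
	struct progress cur;
	int have_last = 0;
	int printed = 0;

	while (poll(ctx, &cur) == 0) {
		if (!have_last || cur.completed != last.completed ||
		    cur.expected != last.expected) {
			fprintf(out, "%llu/%llu block groups\n",
				(unsigned long long)cur.completed,
				(unsigned long long)cur.expected);
			last = cur;
			have_last = 1;
			printed++;
		}
		if (interval_sec)
			sleep(interval_sec);
	}
	return printed;
}

/* Stand-in source for trying the loop out: replays fixed values. */
struct replay {
	const uint64_t *values;
	size_t count;
	size_t next;
	uint64_t expected;
};

int replay_poll(void *ctx, struct progress *out)
{
	struct replay *r = ctx;

	if (r->next >= r->count)
		return -1;
	out->expected = r->expected;
	out->completed = r->values[r->next++];
	return 0;
}
```

Passing an interval of 1 gives the 1s polling described above; an interval of 0 skips the sleep, which is convenient when feeding the loop from replay_poll.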

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- No!  My collection of rare, incurable diseases! Violated! ---   



[patch 2/2] User-space tool for cancelling balance operations.

2010-10-29 Thread Hugo Mills
Add an option to the btrfs tool to use the ioctl for cancelling
balance operations.

Signed-off-by: Hugo Mills h...@carfax.org.uk

---
 btrfs.c  |4 
 btrfs_cmds.c |   41 +
 btrfs_cmds.h |1 +
 ioctl.h  |1 +
 4 files changed, 47 insertions(+)

Index: btrfs-progs-unstable/btrfs.c
===
--- btrfs-progs-unstable.orig/btrfs.c   2010-10-30 00:19:59.968416575 +0100
+++ btrfs-progs-unstable/btrfs.c2010-10-30 00:20:38.446849736 +0100
@@ -99,6 +99,10 @@
  balance progress, path\n
Show progress of the balance operation running on path.
},
+   { do_balance_cancel, 1,
+ balance cancel, path\n
+   Cancel the balance operation running on path.
+   },
{ do_scan,
  999, device scan, [device [device..]\n
Scan all device for or the passed device for a btrfs\n
Index: btrfs-progs-unstable/btrfs_cmds.c
===
--- btrfs-progs-unstable.orig/btrfs_cmds.c  2010-10-30 00:04:48.335524683 
+0100
+++ btrfs-progs-unstable/btrfs_cmds.c   2010-10-30 00:20:22.267508562 +0100
@@ -848,6 +848,47 @@
return 0;
 }
 
+int do_balance_cancel(int nargs, char **argv)
+{
+   char *path = argv[1];
+   int fdmnt;
+   int ret = 0;
+   int err = 0;
+
+   fdmnt = open_file_or_dir(path);
+   if(fdmnt < 0) {
+   fprintf(stderr, "ERROR: can't access '%s'\n", path);
+   return 12;
+   }
+
+   ret = ioctl(fdmnt, BTRFS_IOC_BALANCE_CANCEL, NULL);
+   err = ret ? errno : 0; /* errno is only meaningful when ioctl() fails */
+
+   if(ret) {
+   switch(err) {
+   case 0:
+   break;
+   case EINVAL:
+   fprintf(stderr, "ERROR: no balance in progress.\n");
+   err = 20;
+   break;
+   case ECANCELED:
+   fprintf(stderr, "ERROR: operation already cancelled.\n");
+   err = 21;
+   break;
+   default:
+   fprintf(stderr, "ERROR: ioctl returned error '%d'.\n",
+   err);
+   err = 22;
+   break;
+   }
+   }
+
+   close(fdmnt);
+
+   return err;
+}
+
 int do_remove_volume(int nargs, char **args)
 {
 
Index: btrfs-progs-unstable/btrfs_cmds.h
===
--- btrfs-progs-unstable.orig/btrfs_cmds.h  2010-10-30 00:04:48.335524683 
+0100
+++ btrfs-progs-unstable/btrfs_cmds.h   2010-10-30 00:20:22.307506934 +0100
@@ -24,6 +24,7 @@
 int do_add_volume(int nargs, char **args);
 int do_balance(int nargs, char **argv);
 int do_balance_progress(int nargs, char **argv);
+int do_balance_cancel(int nargs, char **argv);
 int do_remove_volume(int nargs, char **args);
 int do_scan(int nargs, char **argv);
 int do_resize(int nargs, char **argv);
Index: btrfs-progs-unstable/ioctl.h
===
--- btrfs-progs-unstable.orig/ioctl.h   2010-10-30 00:04:48.325525089 +0100
+++ btrfs-progs-unstable/ioctl.h2010-10-30 00:20:22.357504895 +0100
@@ -176,4 +176,5 @@
struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 22)
 #endif




Re: [patch 1/2] Balance progress monitoring.

2010-10-30 Thread Hugo Mills
On Sat, Oct 30, 2010 at 01:07:27AM +0100, Hugo Mills wrote:
 This patch introduces a basic form of progress monitoring for balance
 operations, by counting the number of block groups remaining. The
 information is exposed to userspace by an ioctl.

   Dammit. An unrefreshed quilt patch let an error get through (see
below). Updated patch in a few moments.

   Hugo.

 Index: linux-mainline/fs/btrfs/volumes.c
 ===
 --- linux-mainline.orig/fs/btrfs/volumes.c2010-10-26 18:03:38.0 
 +0100
 +++ linux-mainline/fs/btrfs/volumes.c 2010-10-29 17:23:40.463279287 +0100
 @@ -1902,6 +1902,7 @@
   struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
   struct btrfs_trans_handle *trans;
   struct btrfs_key found_key;
 + struct btrfs_balance_status *bal_info;

+   struct btrfs_balance_info *bal_info;


-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- dragon A linked list is still a binary tree. Just a ---  
  very unbalanced one.   


signature.asc
Description: Digital signature


Re: Horrible btrfs performance due to fragmentation

2010-10-31 Thread Hugo Mills
On Mon, Nov 01, 2010 at 12:36:58AM +0200, Felipe Contreras wrote:
 On Mon, Nov 1, 2010 at 12:25 AM, cwillu cwi...@cwillu.com wrote:
  btrfs fi defrag isn't recursive.  btrfs filesystem defrag /home will
  defragment the space used to store the folder, without touching the
  space used to store files in that folder.
 
 Yes, that came up on the IRC, but:
 
 1) It doesn't make sense: btrfs filesystem doesn't allow a filesystem
 as argument? Why would anyone want it to be _non_ recursive?

   You missed the subsequent discussion on IRC about the interaction
of COW with defrag. Essentially, if you've got two files that are COW
copies of each other, and one has had something written to it since,
it's *impossible* for both files to be defragmented, without making a
full copy of both:

Start with a file (A, etc are data blocks on the disk):

file1 = ABCDEF

Cow copy it:

file1 = ABCDEF
file2 = ABCDEF

Now write to one of them:

file1 = ABCDEF
file2 = ABCDxF

   So, either file1 is contiguous, and file2 is fragmented (with the
block x somewhere else on disk), or file2 is contiguous, and file1 is
fragmented (with E somewhere else on disk). In fact, we've determined
by experiment that when you defrag a file that's sharing blocks with
another one, the file gets copied in its entirety, thus separating the
blocks of the file and its COW duplicate.
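
The argument can be checked with a toy model in which a file is just a list of disk block addresses. The addresses are invented for illustration (A..F at 100..105, x at 200), and is_contiguous and shared_blocks are hypothetical helpers, not anything from btrfs.

```c
#include <stddef.h>

/* The example above: A..F live at addresses 100..105, x at 200. */
static const unsigned file1[] = { 100, 101, 102, 103, 104, 105 }; /* ABCDEF */
static const unsigned file2[] = { 100, 101, 102, 103, 200, 105 }; /* ABCDxF */

/* What a full defrag copy of file2 would look like somewhere else. */
static const unsigned file2_defragged[] = { 300, 301, 302, 303, 304, 305 };

/* A file is contiguous if each block directly follows the previous one. */
int is_contiguous(const unsigned *blocks, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (blocks[i] != blocks[i - 1] + 1)
			return 0;
	return 1;
}

/* Count positions where two files point at the same disk block. */
size_t shared_blocks(const unsigned *a, const unsigned *b, size_t n)
{
	size_t shared = 0;

	for (size_t i = 0; i < n; i++)
		if (a[i] == b[i])
			shared++;
	return shared;
}
```

In this model file1 is contiguous and file2 is not, while the two still share five blocks. The defragmented copy of file2 is contiguous but shares nothing with file1, matching the observation that defrag separates a file from its COW duplicate.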

 2) The filesystem should not degrade performance so horribly no matter
 how long it has been used. Even git has automatic garbage
 collection.

   Since, I believe, btrfs uses COW very heavily internally for
ensuring consistency, you can end up with fragmenting files and
directories very easily. You probably need some kind of scrubber that
goes looking for non-COW files that are fragmented, and defrags them
in the background.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- No!  My collection of rare, incurable diseases! Violated! ---   




Re: [patch 1/2] Balance progress monitoring (updated)

2010-11-01 Thread Hugo Mills
On Mon, Nov 01, 2010 at 04:06:53PM +0800, liubo wrote:
 On 10/30/2010 09:39 PM, Hugo Mills wrote:
  This patch introduces a basic form of progress monitoring for balance
  operations, by counting the number of block groups remaining. The
  information is exposed to userspace by an ioctl.
  
 
 IMO, tracking the information of blocks which are balancing also makes sense. 
 For example, the block information's blocknr. 
 It can help us monitor better.

   I don't see how that will help. The block group IDs (which is all
that we get at this level) are effectively arbitrary 64-bit numbers,
and are what appear in the kernel logs. How could that information be
used to improve monitoring?

   I'm not ruling out the idea completely -- I just can't see at the
moment how it would be used.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Is a diversity twice as good as a university? ---  




Re: [patch 0/2] Control filesystem balances (kernel side)

2010-11-01 Thread Hugo Mills
On Sat, Oct 30, 2010 at 07:44:35PM +0200, Goffredo Baroncelli wrote:
 On Saturday, 30 October, 2010, Hugo Mills wrote:
 One fundamental question, though -- is the progress monitor
  function best implemented as an ioctl, as I've done here, or should it
  be two or three sysfs files? I'm thinking of /proc/mdstat...
  Obviously, /proc/mdstat would never get into /sys, but exposing the
  expected and remaining values as files has an attractive
  simplicity to it.
 
 I like the idea that these info should be put under sysfs. Something like
 
 /sys/btrfs/filesystem-uuid/

/sys/fs/btrfs/<uuid>, I think. Also:
/sys/fs/btrfs/<label> as a symlink to the <uuid> directory.

  balance  - info on balancing

For the one-value-per-file rule of sysfs, this should probably be
balance_expected and balance_completed, each holding a count of block
groups.

  devices  - list of devices (a directory of links, or a file
             which contains the list of devices)
  subvolumes/ - info on subvolume(s)
  label   - label of the filesystem
  other btrfs filesystem related knobs

   The other one that struck me earlier today as being useful was
tracking the progress of a dev delete operation. But that'll come
later.
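
   As a rough illustration of the one-value-per-file idea, a userspace
reader could look something like the sketch below. The attribute names
balance_expected and balance_completed are the ones proposed above; the
helper names (parse_count, read_count, percent_done) are hypothetical,
and no such sysfs files exist yet.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a one-value-per-file attribute, e.g. "1234\n". */
int parse_count(const char *text, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno || end == text)
		return -1;	/* overflow, or no digits at all */
	*out = v;
	return 0;
}

/* Read one attribute file; the path depends on the final sysfs layout. */
int read_count(const char *path, unsigned long long *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_count(buf, out);
}

/* Percentage of block groups done; 0 if nothing is expected. */
double percent_done(unsigned long long expected,
		    unsigned long long completed)
{
	if (expected == 0)
		return 0.0;
	return 100.0 * (double)completed / (double)expected;
}
```

   A monitoring tool would call read_count() once for each of the two
files and pass the results to percent_done().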

 Obviously we need another btrfs command to extract an uuid from a btrfs 
 filesystem like:
 
 # btrfs filesystem get-uuid /path/to/a/btrfs/filesystem
 f9b9c413-0dc8-4e3f-94f2-86faa702f519

   Possibly a slightly more general "fi metadata" with switches for
UUID and label?

# btrfs fi metadata [-u|--uuid] /path
# btrfs fi metadata [-l|--label] /path

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Is a diversity twice as good as a university? ---  




Re: [PATCH v2] btrfs-progs: btrfs: implement 'start-sync' and 'wait-sync' commands

2010-11-03 Thread Hugo Mills
On Tue, Nov 02, 2010 at 07:58:27PM +0100, Goffredo Baroncelli wrote:
 On Monday, 01 November, 2010, Sage Weil wrote:
  The 'start-sync' command initiates a sync, but does not wait for it to
  complete.  A transaction is printed that can be fed to 'wait-sync', which
  will wait for it to commit.
  
  'wait-sync' can also be used in combination with 'async-snapshot' to wait
  for an async snapshot creation to commit.
  
  Updates the man page too.
  
  Signed-off-by: Sage Weil s...@newdream.net
  ---
   btrfs.c|9 +
   btrfs_cmds.c   |   49 +
   btrfs_cmds.h   |2 ++
   man/btrfs.8.in |   14 ++
   4 files changed, 74 insertions(+), 0 deletions(-)
  
  diff --git a/btrfs.c b/btrfs.c
  index 46314cf..c871f4a 100644
  --- a/btrfs.c
  +++ b/btrfs.c
  @@ -77,6 +77,15 @@ static struct Command commands[] = {
        "filesystem sync", "<path>\n"
            "Force a sync on the filesystem <path>."
      },
  +   { do_start_sync, 1,
  +     "filesystem start-sync", "<path>\n"
  +         "Start a sync on the filesystem <path>, and print the resulting\n"
  +         "transaction id."
  +   },
 
 Like the command "btrfs subvol snapshot", I think that it is better to add a
 modifier instead of a new command.
 
  btrfs filesystem sync [--async]
 
 Sorry if I noticed this too late. But I don't see a valid reason to add
 another command. From a UI point of view the meaning of the command is the
 same; only the behavior changes slightly.
 
 Even though I have to admit that "sync --async" sounds strange. Maybe "flush"
 is better?

   How about "btrfs filesystem sync --background"?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You're never alone with a rubber duck... --- 




Re: RFC: exporting info via sysfs [was Re: [patch 0/2] Control filesystem balances (kernel side)]

2010-11-05 Thread Hugo Mills
   Hi, Goffredo,

On Thu, Nov 04, 2010 at 11:55:24PM +0100, Goffredo Baroncelli wrote:
 I made a prototype for exporting info from btrfs via sysfs.

   Good stuff. I was going to take a look at doing that this
weekend. :)

 Under /sys/btrfs, two directories were created, named fs and devices.

 /sys/btrfs/fs/<fs-uuid>/

   I'm pretty sure that /sys/btrfs won't get through any discussion on
LKML. I'd suggest /sys/fs/btrfs as the base, since that's where the
other filesystems seem to put their sysfs information.

  label- filesystem label
  num_devices- total number of devices
  open_devices   - number of opened devices
  [...]
 /sys/btrfs/devices/<dev-uuid>/
  devid  - btrfs device number
  fsid   - filesystem uuid (<fs-uuid>)
  major, minor   - major minor

   I think the major, minor should instead be a symlink to the
relevant entry in /sys/devices/...  (as done in /sys/block/*) or
/sys/block (as done in /sys/block/md*/slaves). Call it "device".

  name   - device name

   Unnecessary -- and also, I think, unlikely to get through LKML
review. Putting a device name here implies that the kernel knows
better than userspace what the name of the device is (i.e. which
device node you should be using). Having the link to /sys/block/* or
/sys/devices/... as above is, I think, all that's needed here.
Userspace should be able to convert the major/minor pair kept in
/sys/fs/btrfs/devices/<uuid>/device/dev appropriately.
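
   For illustration, the userspace side of that conversion might be
sketched as follows. A sysfs "dev" attribute holds "MAJOR:MINOR"
followed by a newline; the function name parse_sysfs_dev is
hypothetical, and the btrfs sysfs path above is still only a proposal.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Parse "MAJOR:MINOR\n", as found in a sysfs "dev" attribute, into a dev_t. */
int parse_sysfs_dev(const char *text, dev_t *out)
{
	unsigned int maj, min;

	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return -1;
	*out = makedev(maj, min);
	return 0;
}
```

   The resulting dev_t can then be compared against the st_rdev field
that stat() returns for each candidate node under /dev, which leaves the
choice of device name entirely to userspace.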

  writeable  - is the device writeable

 where <fs-uuid> is the filesystem uuid, and <dev-uuid> is the device uuid.
 The link between devices and filesystem is the fsid parameter of a device.

   Could that be made a symlink instead? That seems to be the usual
approach in sysfs.

 I created this structure because we should handle the case where the devices
 are present (like after a btrfs device scan) but the filesystems aren't
 mounted.

   ... ah, I see it can't. (Re: my previous comment)

 In this case the devices/ subdirectory is populated. Instead the fs/ 
 subdirectory is empty.
 
 I don't attach a patch because the code is very ugly.
 Comments ? Thoughts ?

   Is it ugly because there are significant difficulties in making
btrfs or sysfs do this, or just because you hacked something together
as quickly as possible for a demo?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold.




Re: time for balance

2010-11-09 Thread Hugo Mills
On Tue, Nov 09, 2010 at 04:09:00PM +0100, Helmut Hullen wrote:
 btrfs device add /dev/sdc1 /srv/MM
 btrfs filesystem balance /srv/MM
 
 adds /dev/sdc1 with about 1,5 TByte (df tells so), and the system  
 works the second line (balance) since about 12 hours. How much time  
 needs this balance command?

   Enough time to rewrite every piece of data in the filesystem.

   There are patches [1,2] for the kernel and userspace tools to allow
you to monitor the progress of a balance. I'll be putting out a new
revision of them either tonight or tomorrow (depending on how awkward
git is feeling).

[1] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg06558.html
[2] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg06561.html

 If the machine hangs somewhere and I have to restart it: how can/must I  
 repair the btrfs system?

   No need. Even while the balance is running, the filesystem should
remain in a consistent state (assuming that you have working
barriers). Note that if you restart the balance process, it will
effectively start from the beginning again.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Anyone using a computer to generate random numbers is, of ---
   course,  in a state of sin.   




Re: labelling

2010-11-10 Thread Hugo Mills
On Wed, Nov 10, 2010 at 08:40:00AM +0100, Helmut Hullen wrote:
 Hallo, linux-btrfs,
 
 I have problems with btrfs labels.
 
 My way:
 
 2-TByte-disk:
 
 mkfs.btrs LABEL=MM2 /dev/sdd2
 
 worked.
 Mounting mount LABEL=MM2 /srv/MM worked.
 
 Additional 1.5-TByte-Disk:
 
 btrfs add device /dev/sdc3 /srv/MM
 ... balance ...
 
 worked.
 
 findfs LABEL=MM2
 
 shows /dev/sdd2 (the first partition)
 
 file -s /dev/sdd2
 file -s /dev/sdc3
 
 shows LABEL=MM2 for both partitions (that's not good).

   No, this is both good and correct. You've got a single filesystem
spanning multiple block devices. The *filesystem* possesses the label,
and with btrfs you can mount the filesystem using *any* of the block
devices that compose it, so both block devices should indeed show the
FS label, which is what's happening here.

 Unmounting /srv/MM and
 
 mount LABEL=MM2 /srv/MM
 
 doesn't work now, it tries to mount /dev/sdd2 and mourns.
 
 mount /dev/sdd2 /srv/MM
 
 shows the same error message,

   What's the error message? What do you get in your kernel logs when
you do this? This should work, so there's something wrong, but it's
(probably) not to do with disk labels.

 mount /dev/sdc3 /srv/MM
 
 (mounting the added partition) works fine, the whole space is available.
 
 But what can I do with the 2 identical labels? How can I delete (or  
 change) the label of the first btrfs partition?

   You can't, as I explained above.

 
 
 By the way:
 
 df shows about 3.4 TByte usable space (2 TByte and 1.5 TByte), but
 
 btrfs filesystem df /srv/MM
 
 tells
 
 Data: total=2.70TB, used=1.64TB
 
 I'm missing about 0.7 TByte!

   In "btrfs filesystem df", the "total" field is the space that has
been allocated to block groups. As more space is needed on the
filesystem, the "total" field will increase to use up the additional
raw storage (if you're using RAID1 or RAID10, this will be at a ratio
of 2:1; with RAID0 or simple allocation, the ratio is 1:1).
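
   The arithmetic can be sketched roughly as below. This is a
simplification: it only considers data block groups (metadata and system
block groups also take raw space), and the enum and function names are
made up for the example.

```c
#include <stdint.h>

enum profile { PROFILE_SINGLE, PROFILE_RAID0, PROFILE_RAID1, PROFILE_RAID10 };

/* Raw units consumed per unit of block-group space, per the ratios above. */
unsigned int raw_ratio(enum profile p)
{
	return (p == PROFILE_RAID1 || p == PROFILE_RAID10) ? 2 : 1;
}

/* Raw storage not yet allocated to any block group (0 if over-committed). */
uint64_t unallocated(uint64_t raw_total, uint64_t bg_total, enum profile p)
{
	uint64_t used_raw = bg_total * raw_ratio(p);

	return used_raw > raw_total ? 0 : raw_total - used_raw;
}
```

   With the figures from the report (roughly 3400 GB of raw space, 2700
GB in data block groups, and assuming simple allocation),
unallocated(3400, 2700, PROFILE_SINGLE) gives 700: the "missing" 0.7
TByte is raw space that has not been handed to a block group yet.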

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- In event of Last Trump,  please form an orderly queue ---  
  and await judgement.   




Re: Unhelpful error message from btrfs tool

2010-11-11 Thread Hugo Mills
On Thu, Nov 11, 2010 at 09:32:06PM +0100, Goffredo Baroncelli wrote:
 On Thursday, 11 November, 2010, Josh Berry wrote:
  Hi,
  
  I have a cron script that runs periodically, taking new snapshots and
  cleaning up old ones when space gets low on my filesystem.  This
  morning, the script suddenly stopped being able to remove snapshots.
  When I tried to remove one manually, I got the following:
  
  # btrfs subvol del 2010-11-07-01:17:01
  Delete subvolume '/btrfs/snapshot/2010-11-07-01:17:01'
  ERROR: cannot delete '/btrfs/snapshot/2010-11-07-01:17:01'
  
  There is nothing in dmesg or in the above output to tell me what the
  problem is, how to fix it, etc.  I'm running kernel 2.6.36, and I
  updated btrfs-progs-unstable to the latest Git revision
  (1b444cd2e6...), with the same result.
  
  How do I diagnose this issue?  I'm not even sure where to start.
 
 This error is due to a failure during the ioctl. Could you strace btrfs 
 subvolume delete ?
 
 # strace btrfs subvolume delete /btrfs/snapshot/2010-11-07-01:17:01
 
 Definitely we need more verbose error handling around the return value of
 the ioctl.

   I've made a good start on doing that. I'll try to finish it off
over the weekend.
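
   As a sketch of the general idea only (this is not the code I'm
working on, and format_ioctl_error is a made-up name), the tool could
save errno straight after the failed ioctl and include strerror() in
the message:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "ERROR: cannot <action> '<path>': <reason> (errno N)" from an
 * errno value saved immediately after the failing call.
 */
int format_ioctl_error(char *buf, size_t len, const char *action,
		       const char *path, int saved_errno)
{
	return snprintf(buf, len, "ERROR: cannot %s '%s': %s (errno %d)",
			action, path, strerror(saved_errno), saved_errno);
}
```

   The caller would copy errno into a local variable before doing
anything else (fprintf and close can both overwrite it), then print the
formatted buffer to stderr.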

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The future isn't what it used to be. ---   




[PATCH v2 3/3] User-space tool for cancelling balance operations.

2010-11-11 Thread Hugo Mills
Add an option to the btrfs tool to use the ioctl for cancelling
balance operations.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c  |4 
 btrfs_cmds.c |   41 +
 btrfs_cmds.h |1 +
 ioctl.h  |1 +
 4 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 0b6186c..93f7886 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -103,6 +103,10 @@ static struct Command commands[] = {
 	  "balance progress", "[-m|--monitor] <path>\n"
 		"Show progress of the balance operation running on <path>."
 	},
+	{ do_balance_cancel, 1,
+	  "balance cancel", "<path>\n"
+		"Cancel the balance operation running on <path>."
+	},
 	{ do_scan,
 	  999, "device scan", "[<device> [<device>..]\n"
 		"Scan all device for or the passed device for a btrfs\n"
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index c681b5a..d246a8b 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -922,6 +922,47 @@ int do_balance_progress(int argc, char **argv)
 	return 0;
 }
 
+int do_balance_cancel(int nargs, char **argv)
+{
+	char *path = argv[1];
+	int fdmnt;
+	int ret = 0;
+	int err = 0;
+
+	fdmnt = open_file_or_dir(path);
+	if(fdmnt < 0) {
+		fprintf(stderr, "ERROR: can't access '%s'\n", path);
+		return 12;
+	}
+
+	ret = ioctl(fdmnt, BTRFS_IOC_BALANCE_CANCEL, NULL);
+	err = errno;
+
+	if(ret) {
+		switch(err) {
+		case 0:
+			break;
+		case EINVAL:
+			fprintf(stderr, "ERROR: no balance in progress.\n");
+			err = 20;
+			break;
+		case ECANCELED:
+			fprintf(stderr, "ERROR: operation already cancelled.\n");
+			err = 21;
+			break;
+		default:
+			fprintf(stderr, "ERROR: ioctl returned error '%d'.\n",
+				err);
+			err = 22;
+			break;
+		}
+	}
+
+	close(fdmnt);
+
+	return err;
+}
+
 int do_remove_volume(int nargs, char **args)
 {
 
diff --git a/btrfs_cmds.h b/btrfs_cmds.h
index 47b0a27..5cb0d9c 100644
--- a/btrfs_cmds.h
+++ b/btrfs_cmds.h
@@ -24,6 +24,7 @@ int do_show_filesystem(int nargs, char **argv);
 int do_add_volume(int nargs, char **args);
 int do_balance(int nargs, char **argv);
 int do_balance_progress(int nargs, char **argv);
+int do_balance_cancel(int nargs, char **argv);
 int do_remove_volume(int nargs, char **args);
 int do_scan(int nargs, char **argv);
 int do_resize(int nargs, char **argv);
diff --git a/ioctl.h b/ioctl.h
index 888ceb9..1fc665b 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -176,4 +176,5 @@ struct btrfs_ioctl_balance_progress {
struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \
struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 26)
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] Balance progress monitoring.

2010-11-11 Thread Hugo Mills
This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

We also add "btrfs balance start" as an alias for "btrfs filesystem
balance", so that all balance-related functions are available under
one prefix.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs.c|8 +++
 btrfs_cmds.c   |   60 
 btrfs_cmds.h   |1 +
 ioctl.h|7 ++
 man/btrfs.8.in |7 ++
 5 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 46314cf..0b6186c 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -95,6 +95,14 @@ static struct Command commands[] = {
 	  "filesystem balance", "<path>\n"
 		"Balance the chunks across the device."
 	},
+	{ do_balance, 1,
+	  "balance start", "<path>\n"
+		"Synonym for \"btrfs filesystem balance\"."
+	},
+	{ do_balance_progress, -1,
+	  "balance progress", "[-m|--monitor] <path>\n"
+		"Show progress of the balance operation running on <path>."
+	},
 	{ do_scan,
 	  999, "device scan", "[<device> [<device>..]\n"
 		"Scan all device for or the passed device for a btrfs\n"
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 8031c58..2745d64 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -28,6 +28,7 @@
 #include <limits.h>
 #include <uuid/uuid.h>
 #include <ctype.h>
+#include <getopt.h>
 
 #undef ULONG_MAX
 
@@ -776,6 +777,65 @@ int do_balance(int argc, char **argv)
 	}
 	return 0;
 }
+
+int get_balance_progress(char *path, struct btrfs_ioctl_balance_progress *bal)
+{
+	int fdmnt;
+	int ret = 0;
+	int err = 0;
+
+	fdmnt = open_file_or_dir(path);
+	if(fdmnt < 0) {
+		return -1;
+	}
+
+	ret = ioctl(fdmnt, BTRFS_IOC_BALANCE_PROGRESS, bal);
+	if(ret)
+		err = errno;
+	close(fdmnt);
+
+	return err;
+}
+
+int do_balance_progress(int argc, char **argv)
+{
+	char *path;
+	int ret = 0;
+	int err = 0;
+	struct btrfs_ioctl_balance_progress bal;
+
+	path = argv[1];
+
+	ret = get_balance_progress(path, &bal);
+	if (!ret)
+		printf("\r%llu/%llu block groups moved, "
+		       "%0.2f%% complete.\n",
+		       bal.completed,
+		       bal.expected,
+		       (float)bal.completed/bal.expected*100.0);
+
+	switch(ret) {
+	case 0:
+		break;
+	case -1:
+		fprintf(stderr, "ERROR: can't access '%s'\n", path);
+		return 13;
+	case EINVAL:
+		if (!monitor) {
+			fprintf(stderr,
+				"No balance operation running on '%s'.\n",
+				path);
+			return 20;
+		}
+		break;
+	default:
+		fprintf(stderr, "ERROR: ioctl returned error %d.", err);
+		return 21;
+	}
+
+	return 0;
+}
+
 int do_remove_volume(int nargs, char **args)
 {
 
diff --git a/btrfs_cmds.h b/btrfs_cmds.h
index 7bde191..47b0a27 100644
--- a/btrfs_cmds.h
+++ b/btrfs_cmds.h
@@ -23,6 +23,7 @@ int do_defrag(int argc, char **argv);
 int do_show_filesystem(int nargs, char **argv);
 int do_add_volume(int nargs, char **args);
 int do_balance(int nargs, char **argv);
+int do_balance_progress(int nargs, char **argv);
 int do_remove_volume(int nargs, char **args);
 int do_scan(int nargs, char **argv);
 int do_resize(int nargs, char **argv);
diff --git a/ioctl.h b/ioctl.h
index 776d7a9..888ceb9 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -132,6 +132,11 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_space_info spaces[0];
 };
 
+struct btrfs_ioctl_balance_progress {
+   __u64 expected;
+   __u64 completed;
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -169,4 +174,6 @@ struct btrfs_ioctl_space_args {
 #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64)
 #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \
struct btrfs_ioctl_space_args)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \
+   struct btrfs_ioctl_balance_progress)
 #endif
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 26ef982..69d8613 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -21,6 +21,8 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBfilesystem resize\fP\fI [+/\-]<size>[gkm]|max <filesystem>\fP
 .PP
+\fBbtrfs\fP \fBbalance progress\fP \fI<path>\fP
+.PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [<device> [<device>..]]\fP
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI <dev>|<label> [<dev>|<label>...]\fP
@@ -148,6 +150,11 @@ Balance the chunks

[PATCH v2 0/3] Balance management, userspace side

2010-11-11 Thread Hugo Mills
   These three patches complement the previous two kernel-side
patches. The first implements a way of displaying the current progress
of any running balance process. The second adds a monitor mode,
which watches the progress and makes an estimate of the completion
time. The third and final patch allows a running balance to be
cancelled.
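
   The completion-time estimate is a simple linear extrapolation. A
standalone sketch of the idea follows; the function names eta_seconds
and format_duration are invented for this example, and the code in the
patch itself differs in detail.

```c
#include <stdio.h>

/*
 * Estimated seconds left, from the block groups completed since sampling
 * began. Returns -1 if no progress has been seen yet or the counts are
 * inconsistent.
 */
long eta_seconds(long elapsed_secs, unsigned long long start_completed,
		 unsigned long long completed, unsigned long long expected)
{
	double secs_per_group;

	if (completed <= start_completed || expected < completed)
		return -1;
	secs_per_group = (double)elapsed_secs
		/ (double)(completed - start_completed);
	return (long)(secs_per_group * (double)(expected - completed));
}

/*
 * Render a duration as e.g. "1d 1h 1m 1s", omitting leading zero units.
 * buf must hold at least 32 bytes.
 */
void format_duration(long secs, char *buf, size_t len)
{
	long d = secs / 86400, h = (secs % 86400) / 3600;
	long m = (secs % 3600) / 60, s = secs % 60;
	size_t n = 0;

	buf[0] = '\0';
	if (d)
		n += (size_t)snprintf(buf + n, len - n, "%ldd ", d);
	if (d || h)
		n += (size_t)snprintf(buf + n, len - n, "%ldh ", h);
	if (d || h || m)
		n += (size_t)snprintf(buf + n, len - n, "%ldm ", m);
	snprintf(buf + n, len - n, "%lds", s);
}
```

   For example, if 10 block groups were moved in the last 100 seconds
and 30 remain, the estimate is 300 seconds. The estimate assumes that
all block groups take about the same time to move, which need not hold
in practice.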

Hugo Mills (3):
  Balance progress monitoring.
  Add --monitor option to btrfs balance progress.
  User-space tool for cancelling balance operations.

 btrfs.c|   12 
 btrfs_cmds.c   |  187 
 btrfs_cmds.h   |2 +
 ioctl.h|8 +++
 man/btrfs.8.in |7 ++
 5 files changed, 216 insertions(+), 0 deletions(-)



[PATCH v2 1/2] Balance progress monitoring.

2010-11-11 Thread Hugo Mills
This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 fs/btrfs/ctree.h   |9 +++
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/ioctl.c   |   34 +
 fs/btrfs/ioctl.h   |7 ++
 fs/btrfs/volumes.c |   61 ++-
 5 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8db9234..67fb603 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -841,6 +841,11 @@ struct btrfs_block_group_cache {
struct list_head cluster_list;
 };
 
+struct btrfs_balance_info {
+   u64 expected;
+   u64 completed;
+};
+
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
@@ -1050,6 +1055,10 @@ struct btrfs_fs_info {
unsigned metadata_ratio;
 
void *bdev_holder;
+
+   /* Keep track of any rebalance operations on this FS */
+   spinlock_t balance_info_lock;
+   struct btrfs_balance_info *balance_info;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b40dfe4..87d9315 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1590,6 +1590,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	spin_lock_init(&fs_info->ref_cache_lock);
 	spin_lock_init(&fs_info->fs_roots_radix_lock);
 	spin_lock_init(&fs_info->delayed_iput_lock);
+	spin_lock_init(&fs_info->balance_info_lock);
 
 	init_completion(&fs_info->kobj_unregister);
 	fs_info->tree_root = tree_root;
@@ -1615,6 +1616,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	fs_info->sb = sb;
 	fs_info->max_inline = 8192 * 1024;
 	fs_info->metadata_ratio = 0;
+	fs_info->balance_info = NULL;
 
 	fs_info->thread_pool_size = min_t(unsigned long,
 					  num_online_cpus() + 2, 8);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 463d91b..c247985 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2220,6 +2220,38 @@ static noinline long btrfs_ioctl_wait_sync(struct file *file, void __user *argp)
 	return btrfs_wait_for_commit(root, transid);
 }
 
+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+	struct btrfs_fs_info *fs_info,
+	struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+	int ret = 0;
+	struct btrfs_ioctl_balance_progress dest;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if (!fs_info->balance_info) {
+		ret = -EINVAL;
+		goto error;
+	}
+
+	dest.expected = fs_info->balance_info->expected;
+	dest.completed = fs_info->balance_info->completed;
+
+	spin_unlock(&fs_info->balance_info_lock);
+
+	if (copy_to_user(user_dest, &dest,
+			 sizeof(struct btrfs_ioctl_balance_progress)))
+		return -EFAULT;
+
+	return 0;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2255,6 +2287,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_rm_dev(root, argp);
 	case BTRFS_IOC_BALANCE:
 		return btrfs_balance(root->fs_info->dev_root);
+	case BTRFS_IOC_BALANCE_PROGRESS:
+		return btrfs_ioctl_balance_progress(root->fs_info, argp);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 17c99eb..b2103b2 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -145,6 +145,11 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_space_info spaces[0];
 };
 
+struct btrfs_ioctl_balance_progress {
+   __u64 expected;
+   __u64 completed;
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -189,4 +194,6 @@ struct btrfs_ioctl_space_args {
 #define BTRFS_IOC_WAIT_SYNC  _IOW(BTRFS_IOCTL_MAGIC, 22, __u64)
 #define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \
   struct btrfs_ioctl_async_vol_args)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \
+ struct btrfs_ioctl_balance_progress)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 91851b5..f00edc1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1904,6 +1904,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
 	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_key found_key;
+   struct btrfs_balance_info *bal_info

[PATCH v2 0/2] Balance management, kernel side

2010-11-11 Thread Hugo Mills
   These two patches give a degree of control over balance operations.
The first makes it possible to get an idea of how much work remains to
do, by tracking the number of block groups (chunks) that need to be
moved/rewritten. The second patch allows a running balance operation
to be cancelled when the current block group has been moved.

   Since the last version, I've added some more locking (assigning to
a u64 isn't atomic on non-64-bit architectures). I've not added the
sysfs bits, as I haven't had a chance to try out Goffredo's sysfs code
yet. I've also not implemented liubo's suggestion of tracking the
current block group ID (I'll take that discussion up with him
separately -- basically it's not a good fit with the polling method
required by this ioctl).
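
   To illustrate why the lock is there, here is a userspace analogue
of the pattern, using a pthread mutex where the kernel patch uses a
spinlock. The struct and function names are invented for the example.
Without the lock, a reader on a 32-bit machine could see a half-updated
64-bit counter, or an "expected" and "completed" pair taken from two
different updates.

```c
#include <pthread.h>
#include <stdint.h>

struct balance_info {
	pthread_mutex_t lock;
	uint64_t expected;
	uint64_t completed;
};

/* Writer: update both counters under the lock. */
void balance_update(struct balance_info *bi, uint64_t expected,
		    uint64_t completed)
{
	pthread_mutex_lock(&bi->lock);
	bi->expected = expected;
	bi->completed = completed;
	pthread_mutex_unlock(&bi->lock);
}

/* Reader: take a consistent snapshot of both counters. */
void balance_snapshot(struct balance_info *bi, uint64_t *expected,
		      uint64_t *completed)
{
	pthread_mutex_lock(&bi->lock);
	*expected = bi->expected;
	*completed = bi->completed;
	pthread_mutex_unlock(&bi->lock);
}
```

   The progress ioctl follows the same shape: copy both values out while
holding the lock, drop the lock, and only then copy the result to
userspace.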

Hugo Mills (2):
  Balance progress monitoring.
  Cancel filesystem balance.

 fs/btrfs/ctree.h   |   10 
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/ioctl.c   |   62 
 fs/btrfs/ioctl.h   |8 ++
 fs/btrfs/volumes.c |   66 ++-
 5 files changed, 146 insertions(+), 2 deletions(-)



[PATCH v2 2/3] Add --monitor option to btrfs balance progress.

2010-11-11 Thread Hugo Mills
For the impatient, this patch introduces the pot-watching --monitor
option, which checks the balance progress at regular intervals, and
updates a single status line with the current progress and an
estimated completion time.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 btrfs_cmds.c   |  102 +++
 man/btrfs.8.in |4 +-
 2 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 2745d64..c681b5a 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -797,22 +797,108 @@ int get_balance_progress(char *path, struct btrfs_ioctl_balance_progress *bal)
 	return err;
 }
 
+const struct option progress_options[] = {
+	{ "monitor", 0, NULL, 'm' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_balance_progress(int argc, char **argv)
 {
 	char *path;
 	int ret = 0;
 	int err = 0;
 	struct btrfs_ioctl_balance_progress bal;
+	__u64 last_completed = -1;
+	__u64 initial_completed = -1;
+	struct timeval now;
+	struct timeval started;
+	int monitor = 0;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(argc, argv, "m", progress_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'm':
+			monitor = 1;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for balance progress\n");
+			free(argv);
+			return 1;
+		}
+	}
+
+	if(optind >= argc) {
+		fprintf(stderr, "No filesystem path given for progress\n");
+		return 1;
+	}
 
-	path = argv[1];
+	path = argv[optind];
+	do {
+		int prs = 0;
 
-	ret = get_balance_progress(path, &bal);
-	if (!ret)
-		printf("\r%llu/%llu block groups moved, "
-		       "%0.2f%% complete.\n",
-		       bal.completed,
-		       bal.expected,
-		       (float)bal.completed/bal.expected*100.0);
+		ret = get_balance_progress(path, &bal);
+		if (ret)
+			break;
+
+		if (last_completed != bal.completed) {
+			printf("\r%llu/%llu block groups moved, "
+			       "%0.2f%% complete.",
+			       bal.completed,
+			       bal.expected,
+			       (float)bal.completed/bal.expected*100.0);
+		}
+
+		if (initial_completed != -1
+		    && initial_completed != bal.completed) {
+			ret = gettimeofday(&now, NULL);
+			if (ret) {
+				fprintf(stderr, "Can't read current time\n");
+				return 22;
+			}
+			/* Seconds per block */
+			float rate = (float)(now.tv_sec - started.tv_sec)
+				/ (bal.completed - initial_completed);
+			int secs_remaining = rate
+				* (bal.expected - bal.completed);
+			printf(" Time remaining");
+			if (secs_remaining >= 60*60*24) {
+				printf(" %dd", secs_remaining / (60*60*24));
+				secs_remaining %= 60*60*24;
+				prs = 1;
+			}
+			if (prs || secs_remaining >= 60*60) {
+				printf(" %dh", secs_remaining / (60*60));
+				secs_remaining %= 60*60;
+				prs = 1;
+			}
+			if (prs || secs_remaining > 60) {
+				printf(" %dm", secs_remaining / 60);
+				secs_remaining %= 60;
+			}
+			printf(" %ds\x1b[K", secs_remaining);
+		}
+
+		if (last_completed != -1 && last_completed != bal.completed) {
+			initial_completed = bal.completed;
+			ret = gettimeofday(&started, NULL);
+			if (ret) {
+				fprintf(stderr, "Can't read current time\n");
+				return 22;
+			}
+		}
+
+		last_completed = bal.completed;
+
+		if (monitor) {
+			fflush(stdout);
+			sleep(1);
+		} else {
+			printf("\n");
+		}
+	} while(monitor);
 
 	switch(ret) {
 	case 0:
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 69d8613..3f7642e 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -21,7 +21,7 @@ btrfs \- control a btrfs filesystem

[PATCH v2 2/2] Cancel filesystem balance.

2010-11-11 Thread Hugo Mills
This patch adds an ioctl for cancelling a btrfs balance operation
mid-flight. The ioctl simply sets a flag, and the operation terminates
after the current block group move has completed.

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/ioctl.c   |   28 
 fs/btrfs/ioctl.h   |3 ++-
 fs/btrfs/volumes.c |7 ++-
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 67fb603..5fa7163 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -844,6 +844,7 @@ struct btrfs_block_group_cache {
 struct btrfs_balance_info {
u64 expected;
u64 completed;
+   int cancel_pending;
 };
 
 struct reloc_control;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c247985..7e38856 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2252,6 +2252,32 @@ error:
return ret;
 }
 
+/*
+ * Cancel a running balance operation
+ */
+long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
+{
+	int err = 0;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if(!fs_info->balance_info) {
+		err = -EINVAL;
+		goto error;
+	}
+	if(fs_info->balance_info->cancel_pending) {
+		err = -ECANCELED;
+		goto error;
+	}
+	fs_info->balance_info->cancel_pending = 1;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return err;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
cmd, unsigned long arg)
 {
@@ -2289,6 +2315,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_balance(root->fs_info->dev_root);
 	case BTRFS_IOC_BALANCE_PROGRESS:
 		return btrfs_ioctl_balance_progress(root->fs_info, argp);
+	case BTRFS_IOC_BALANCE_CANCEL:
+		return btrfs_ioctl_balance_cancel(root->fs_info);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index b2103b2..76ae121 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -195,5 +195,6 @@ struct btrfs_ioctl_balance_progress {
 #define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \
   struct btrfs_ioctl_async_vol_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 25, \
- struct btrfs_ioctl_balance_progress)
+   struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 26)
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f00edc1..64b2f04 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1924,6 +1924,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
 	bal_info->expected = -1; /* One less than actually counted,
 				    because chunk 0 is special */
 	bal_info->completed = 0;
+	bal_info->cancel_pending = 0;
 	spin_unlock(&dev_root->fs_info->balance_info_lock);
 
 	/* step one make some room on all the devices */
@@ -1989,7 +1990,7 @@ int btrfs_balance(struct btrfs_root *dev_root)
 	key.offset = (u64)-1;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
-	while (1) {
+	while (!bal_info->cancel_pending) {
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0)
 			goto error;
@@ -2029,6 +2030,10 @@ int btrfs_balance(struct btrfs_root *dev_root)
 		       bal_info->completed, bal_info->expected);
 	}
 	ret = 0;
+	if(bal_info->cancel_pending) {
+		printk(KERN_INFO "btrfs: balance cancelled\n");
+		ret = -EINTR;
+	}
 error:
 	btrfs_free_path(path);
 	spin_lock(&dev_root->fs_info->balance_info_lock);
-- 
1.7.1
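
The cancellation scheme in the patch above (a flag cleared when the
balance starts, checked once per loop iteration, and turned into -EINTR
at the end) can be sketched in plain userspace C. This is a minimal
illustration, not the kernel code: struct balance_state, run_balance()
and the cancel_hook callback are hypothetical names, the hook merely
stands in for the ioctl that sets the flag while the balance runs, and
locking is left out entirely.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's balance info; not the kernel struct. */
struct balance_state {
	int cancel_pending;      /* set when a cancel has been requested */
	unsigned long completed; /* work items finished so far */
};

/* Called after each work item; a nonzero return requests a cancel. */
typedef int (*cancel_hook)(const struct balance_state *st);

/* Process up to `total` items, stopping early once a cancel is pending. */
static int run_balance(struct balance_state *st, unsigned long total,
		       cancel_hook hook)
{
	st->cancel_pending = 0;
	st->completed = 0;

	/* The flag is only examined between items, as in the patched loop. */
	while (!st->cancel_pending && st->completed < total) {
		st->completed++;                /* "relocate" one chunk */
		if (hook && hook(st))
			st->cancel_pending = 1; /* the patch sets this from an ioctl */
	}

	return st->cancel_pending ? -EINTR : 0;
}

/* Example hook: ask for a cancel once three items are done. */
static int cancel_after_three(const struct balance_state *st)
{
	return st->completed == 3;
}
```

Calling run_balance(&st, 10, cancel_after_three) stops after three items
and returns -EINTR. Because the flag is only tested between items, a
cancel never interrupts an item that is already in progress.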



Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Hugo Mills
On Fri, Nov 12, 2010 at 03:28:08PM +1100, Chris Samuel wrote:
> On 12/11/10 12:33, Li Zefan wrote:
>
> > Is there any blocker that prevents us from canceling balance
> > by just Ctrl+C ?
>
> Given that there's been at least 1 report of it taking 12 hours
> to balance a non-trivial amount of data I suspect putting this
> operation into the background by default and having the cancel
> option might be a better plan.

   Only 12 hours? Last time I tried it, it took 19. :)

   It would certainly be easy enough to fork a copy of the userspace
tool to run the ioctl in the background. Probably a little more work
to make the balance a kernel thread. I'd prefer the former, for
ease of implementation.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no.  3: Military Intelligence ---   




Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Hugo Mills
On Fri, Nov 12, 2010 at 11:36:55AM +0000, Hugo Mills wrote:
> On Fri, Nov 12, 2010 at 03:28:08PM +1100, Chris Samuel wrote:
> > On 12/11/10 12:33, Li Zefan wrote:
> >
> > > Is there any blocker that prevents us from canceling balance
> > > by just Ctrl+C ?
> >
> > Given that there's been at least 1 report of it taking 12 hours
> > to balance a non-trivial amount of data I suspect putting this
> > operation into the background by default and having the cancel
> > option might be a better plan.
>
>    Only 12 hours? Last time I tried it, it took 19. :)
>
>    It would certainly be easy enough to fork a copy of the userspace
> tool to run the ioctl in the background. Probably a little more work
> to make the balance a kernel thread. I'd prefer the former, for
> ease of implementation.

   How's this?


This patch makes a balance operation fork and detach from the current
terminal, to run the userspace side of the balance in the background.

Introduce a --wait switch so that a synchronous balance can be done if
the user requires.

Signed-off-by: Hugo Mills <h...@carfax.org.uk>
---
 btrfs.c|8 
 btrfs_cmds.c   |   56 +---
 man/btrfs.8.in |2 +-
 3 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 93f7886..7b42658 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -91,12 +91,12 @@ static struct Command commands[] = {
 	  "filesystem df", "<path>\n"
 		"Show space usage information for a mount point\n."
 	},
-	{ do_balance, 1,
-	  "filesystem balance", "<path>\n"
+	{ do_balance, -1,
+	  "filesystem balance", "[-w|--wait] <path>\n"
 		"Balance the chunks across the device."
 	},
-	{ do_balance, 1,
-	  "balance start", "<path>\n"
+	{ do_balance, -1,
+	  "balance start", "[-w|--wait] <path>\n"
 		"Synonym for \"btrfs filesystem balance\"."
 	},
 	{ do_balance_progress, -1,
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index d246a8b..13be603 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -754,12 +754,41 @@ int do_add_volume(int nargs, char **args)
 
 }
 
+const struct option balance_options[] = {
+	{ "wait", 0, NULL, 'w' },
+	{ NULL, 0, NULL, 0 }
+};
+
 int do_balance(int argc, char **argv)
 {
-
 	int	fdmnt, ret=0;
+	int background = 1;
 	struct btrfs_ioctl_vol_args args;
-	char	*path = argv[1];
+	char *path;
+	int ttyfd;
+
+	optind = 1;
+	while(1) {
+		int c = getopt_long(argc, argv, "w", balance_options, NULL);
+		if (c < 0)
+			break;
+		switch(c) {
+		case 'w':
+			background = 0;
+			break;
+		default:
+			fprintf(stderr, "Invalid arguments for balance\n");
+			free(argv);
+			return 1;
+		}
+	}
+
+	if(optind >= argc) {
+		fprintf(stderr, "No filesystem path given for balance\n");
+		return 1;
+	}
+
+	path = argv[optind];
 
 	fdmnt = open_file_or_dir(path);
 	if (fdmnt < 0) {
@@ -767,8 +796,29 @@ int do_balance(int argc, char **argv)
 		return 12;
 	}
 
+	if (background) {
+		int pid = fork();
+		if (pid == 0) {
+			/* We're in the child, and can run in the background */
+			ttyfd = open("/dev/tty", O_RDWR);
+			if (ttyfd > 0)
+				ioctl(ttyfd, TIOCNOTTY, 0);
+			/* Fall through to the BTRFS_IOC_BALANCE ioctl */
+		} else if (pid > 0) {
+			/* We're in the parent, and the fork succeeded */
+			printf("Background balance started\n");
+			return 0;
+		} else {
+			/* We're in the parent, and the fork failed */
+			fprintf(stderr, "ERROR: can't start background process -- %s\n",
+				strerror(errno));
+		}
+	}
+
 	memset(&args, 0, sizeof(args));
-	ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, &args);
+	printf("ioctl\n");
+	sleep(60);
+	/* ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, &args); */
 	close(fdmnt);
 	if(ret<0){
 		fprintf(stderr, "ERROR: balancing '%s'\n", path);
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 3f7642e..1410aaa 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -27,7 +27,7 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI <dev>|<label> [<dev>|<label>...]\fP
 .PP
-\fBbtrfs\fP \fBdevice balance\fP\fI <path> \fP
+\fBbtrfs\fP \fBdevice balance\fP [\fB-w\fP|\fB--wait\fP] \fI<path>\fP
 .PP
 \fBbtrfs\fP \fBdevice add\fP\fI <dev> [<dev>..] <path> \fP
 .PP

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from

Re: my mail

2010-11-12 Thread Hugo Mills
On Fri, Nov 12, 2010 at 07:33:57PM +0000, h...@carfax.org.uk wrote:
> From 2de353ddda78ef5cbc84e1d3267606bc44e48faa Mon Sep 17 00:00:00 2001

   Gaah. This worked last night. Sorry. :(

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You got very nice eyes, Deedee. Never noticed them ---   
   before. They real?   




Re: Update to Project_ideas wiki page

2010-11-17 Thread Hugo Mills
On Tue, Nov 16, 2010 at 10:19:45PM -0500, Chris Ball wrote:
> Hi,
>
> Chris Mason has posted a bunch of interesting updates to the
> Project_ideas wiki page.  If you're interested in working on any
> of these, feel free to speak up and ask for more information if
> you need it.  Here are the new sections, for the curious:
>
> == Block group reclaim ==
>
> The split between data and metadata block groups means that we
> sometimes have mostly empty block groups dedicated to only data or
> metadata.  As files are deleted, we should be able to reclaim these
> and put the space back into the free space pool.
>
> We also need rebalancing ioctls that focus only on specific raid
> levels.
>
> == Changing RAID levels ==
>
> We need ioctls to change between different raid levels.  Some of these
> are quite easy -- e.g. for RAID0 to RAID1, we just halve the available
> bytes on the fs, then queue a rebalance.

   I would be interested in the rebalancing ioctls, and in RAID level
management. I'm still very much trying to learn the basics, though, so
I may go very slowly at first...

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- We demand rigidly defined areas of doubt and uncertainty! ---




Re: Update to Project_ideas wiki page

2010-11-17 Thread Hugo Mills
On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote:
> Can I suggest we combine this new RAID level management with a
> modernisation of the terminology for storage redundancy, as has been
> discussed previously in the "Raid1 with 3 drives" thread of March this
> year? I.e. abandon the burdened raid* terminology in favour of
> something that makes more sense for a filesystem.

   Well, our current RAID modes are:

 * 1 Copy (SINGLE)
 * 2 Copies (DUP)
 * 2 Copies, different spindles (RAID1)
 * 1 Copy, 2 Stripes (RAID0)
 * 2 Copies, 2 Stripes [each] (RAID10)

   The forthcoming RAID5/6 code will expand on that, with

 * 1 Copy, n Stripes + 1 Parity (RAID5)
 * 1 Copy, n Stripes + 2 Parity (RAID6)

   (I'm not certain how n will be selected -- it could be a config
option, or simply selected on the basis of the number of
spindles/devices currently in the FS).

   We could further postulate a RAID50/RAID60 mode, which would be

 * 2 Copies, n Stripes + 1 Parity
 * 2 Copies, n Stripes + 2 Parity

   For brevity, we could collapse these names down to: 1C, 2C, 2CR,
1C2S, 2C2S, 1CnS1P, 1CnS2P, 2CnS1P, 2CnS2P. However, that's probably a
bit too condensed for useful readability. I'd support some set of
terms based on this taxonomy, though, as it's fairly extensible, and
tells you the details of the duplication strategy in question.
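
   As a rough illustration of the shorthand above, the names can be
generated mechanically from a (copies, stripes, parity) description.
The sketch below is only that: struct profile and profile_name() are
hypothetical and not part of btrfs or its tools, and the "R" suffix for
copies placed on different devices is taken from the 2CR example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of a replication strategy; not a btrfs struct. */
struct profile {
	int copies;       /* number of copies of the data */
	int stripes;      /* data stripes: 0 = unstriped, -1 = "n" (chosen later) */
	int parity;       /* number of parity stripes */
	int diff_devices; /* unstriped copies must sit on different devices */
};

/* Write the short name (e.g. "1CnS2P") into buf; returns 0, or -1 if it does not fit. */
static int profile_name(const struct profile *p, char *buf, size_t len)
{
	char stripes[16] = "";
	char parity[16] = "";
	int n;

	if (p->stripes > 0)
		snprintf(stripes, sizeof(stripes), "%dS", p->stripes);
	else if (p->stripes < 0)
		snprintf(stripes, sizeof(stripes), "nS");
	else if (p->diff_devices)
		snprintf(stripes, sizeof(stripes), "R");

	if (p->parity > 0)
		snprintf(parity, sizeof(parity), "%dP", p->parity);

	n = snprintf(buf, len, "%dC%s%s", p->copies, stripes, parity);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

   With this, { 2, 0, 0, 1 } (RAID1) comes out as 2CR and { 1, -1, 2, 0 }
(RAID6) as 1CnS2P, matching the list above.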

> Mostly this would involve a discussion about what terms would make
> most sense, though some changes in the behaviour of btrfs redundancy
> modes may be warranted if they make things more intuitive.

   Consider the above a first suggestion. :)

> I could help you make these changes in your patches, or write my own
> patches against yours, though I'm also completely new to kernel
> development.

   Probably best to keep the kernel internals unchanged for this
particular issue, as they don't make much difference to the naming,
but patches to the userspace side of things (mkfs.btrfs and btrfs fi
df specifically) should be fairly straightforward.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- gdb The enemy have elected for Death by Powerpoint.  That's ---  
  what they shall get.   




Re: Update to Project_ideas wiki page

2010-11-17 Thread Hugo Mills
On Wed, Nov 17, 2010 at 07:14:47PM +0100, Andreas Philipp wrote:
> On 17.11.2010 18:56, Hugo Mills wrote:
> > On Wed, Nov 17, 2010 at 04:12:29PM +0100, Bart Noordervliet wrote:
> > > Can I suggest we combine this new RAID level management with a
> > > modernisation of the terminology for storage redundancy, as has been
> > > discussed previously in the "Raid1 with 3 drives" thread of March this
> > > year? I.e. abandon the burdened raid* terminology in favour of
> > > something that makes more sense for a filesystem.
> >
> >    Well, our current RAID modes are:
> >
> >  * 1 Copy (SINGLE)
> >  * 2 Copies (DUP)
> >  * 2 Copies, different spindles (RAID1)
> >  * 1 Copy, 2 Stripes (RAID0)
> >  * 2 Copies, 2 Stripes [each] (RAID10)
> >
> >    The forthcoming RAID5/6 code will expand on that, with
> >
> >  * 1 Copy, n Stripes + 1 Parity (RAID5)
> >  * 1 Copy, n Stripes + 2 Parity (RAID6)
> >
> >    (I'm not certain how n will be selected -- it could be a config
> > option, or simply selected on the basis of the number of
> > spindles/devices currently in the FS).
> Just one question on small n: If one has N = 3*k = 6 spindles, then
> RAID5 with n = N/2-1 results in something like RAID50? So having an
> option for small n might realize RAID50 given the right choice for n.

   I see what you're getting at, but actually, that would just be
RAID-5 with small n. It merely happens to spread chunks out over more
spindles than the minimum n+1 required to give you what you asked for.
(See the explanation below for why).

> >    We could further postulate a RAID50/RAID60 mode, which would be
> >
> >  * 2 Copies, n Stripes + 1 Parity
> >  * 2 Copies, n Stripes + 2 Parity
> Isn't this RAID51/RAID61 (or 15/16 unsure on how to put) and would
> RAID50/RAID60 correspond to

   Errr... yes, you're right. My mistake. Although... again, see the
conclusion below. :)

> * 2 Stripes, n Stripes + 1 Parity
> * 2 Stripes, n Stripes + 2 Parity

   I'm not sure talking about RAID50-like things (as you state above)
makes much sense, given the internal data structures that btrfs uses:

   As far as I know(*), data is firstly allocated in chunks of about
1GiB per device. Chunks are grouped together to give you replication.
So, for a RAID-0 or RAID-1 arrangement, chunks are allocated in pairs,
picked from different devices. For RAID-10, they're allocated in
quartets, again on different devices. For RAID-5, they'd be allocated
in groups of n+1. For RAID-61, we'd use 2n+4 chunks in an allocation.
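
   The chunk counts above all follow one pattern: copies times (data
stripes plus parity stripes), with an unstriped profile counted as a
single stripe. A small sketch under that assumption follows;
chunks_per_allocation() is a hypothetical helper, not a btrfs function.

```c
/*
 * Number of chunks grouped into one allocation, assuming
 * copies * (data stripes + parity stripes).
 * Pass stripes = 1 for unstriped profiles such as RAID-1.
 */
static int chunks_per_allocation(int copies, int stripes, int parity)
{
	return copies * (stripes + parity);
}
```

   For example, RAID-1 is chunks_per_allocation(2, 1, 0) = 2, RAID-10
is chunks_per_allocation(2, 2, 0) = 4, and RAID-61 with n = 3 is
chunks_per_allocation(2, 3, 2) = 10, i.e. 2n+4.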

   For replication strategies where it matters (anything other than
DUP, SINGLE, RAID-1 so far), the chunks are then subdivided into
stripes of a fixed width. Data written to the disk is spread across
the stripes in an appropriate manner.

   From this point of view, RAID50 and RAID51 look much the same,
unless the stripe size for the 5 is different to the stripe size for
the 0 or 1. I'm not sure that's the case. If the stripe sizes are
the same, you'll basically get the same layout of data across the 2n+2
chunks -- it's just that (possibly) the internal labels of the chunks
which indicate which bit of data they're holding in the pattern will
be different.

   Hugo.

(*) I could be wrong, hopefully someone will correct me if so.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- A cross? Oy vey, have you picked the wrong vampire! ---   




Re: A little confused about what remains to make a stable release

2010-11-17 Thread Hugo Mills
On Wed, Nov 17, 2010 at 02:27:39PM -0800, Daniel Farina wrote:
> I have been tracking the development of btrfs for some time, as the
> built-in support for snapshotting would be of great convenience for
> relational database use cases. I have been crawling the wiki
> (especially the FAQ), but I still don't have a clear sense of what's
> left *besides* the need for a 'fsck' utility that can be called
> absolutely vital.
>
> That, and testing and bug reports.

   This question (and answer) should probably go in the FAQ...

   Nobody is going to magically stick a label on btrfs and say "it's
stable now!" Software -- particularly something as complex as this --
just doesn't work that way.

   It's stable *for you* when it functions with the workloads *you*
expect of it, with a failure rate that is acceptable *to you*.

   For what I'm using it for right now, it's already stable by that
definition, *for me*.

> From the materials I've been able to find, it's hard for me to get a
> sense of how one could assist the project towards being recommended
> for general use; do the denizens of this list has a sense of what
> those things might be? (Or a link?)

   The primary things you can do: Use it, test it, file bug reports.
Do this with, as close as you can, the use-cases (IOPs/s, feature
uses, data sizes) that you want to use it for.

   Beyond that: Fix bugs. Add the features that you think are
important. Add features that other people think are important (see the
Project Ideas page on the wiki for the latter).

   Hugo.

</2-penn'orth>

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- The trouble with you, Ibid, is you think you know everything. ---  
 




Re: A little confused about what remains to make a stable release

2010-11-18 Thread Hugo Mills
On Wed, Nov 17, 2010 at 05:46:30PM -0700, Anthony Roberts wrote:
> >    It's stable *for you* when it functions with the workloads *you*
> > expect of it, with a failure rate that is acceptable *to you*.
>
> I think there's a few ancillary things like a working fsck needed
> before it can even be recommended for widespread use, even to users
> willing to risk any residual bugs. IIRC at this point the utilities
> don't even aspire to provide basic recovery functionality (though
> Chris has posted that fsck is coming).
>
> Beyond that, the management capabilities at this point don't look
> ready for long term use in a production environment. By this I
> mean adding/removing disks,

   That much is already there and working.

> reshaping arrays, etc. Without that I
> might use BTRFS on top of LVM/RAID just like any other filesystem,
> and there's features I'm looking forward to even if I that's all
> I can do, but without robust management features there's certain
> environments where it just doesn't make sense yet.

   What do you think is missing? Could you create and maintain a
wishlist page on the wiki[1], and populate it with all the things that
people need for production use? (This is an ongoing task -- track
what's actually finished and remove it; track what's currently being
worked on and mark it as such; keep an eye on discussions on the
mailing list for things that people need...)

> There's one or two other things I'm keeping an eye on. That
> limitation on the number of hardlinks you can have in a directory
> is kinda irksome. Also, dedup needs a way to verify/dedup safely
> before people can start doing stuff like deduping live VM images.

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/Main_Page

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Someone's been throwing dead sheep down my Fun Well ---   




Re: SI units

2010-11-18 Thread Hugo Mills
On Thu, Nov 18, 2010 at 02:53:00PM +0100, Helmut Hullen wrote:
> You wrote on 18.11.10:
>
> > > when I invoke
> > >
> > >     btrfs filesystem show
> > >
> > > then it shows the size of my Terabyte disks in TiByte but tells
> > > TB. It's a difference of about 10% - either there should be a
> > > switch like in df (option -H or --si), or TB should be changed
> > > to TiB (the same with GiB, MiB etc.)
> >
> > I posted patches[1] to do just that, a few weeks ago.
>
> I've just compiled btrfs from git (20101117) - at least this patch isn't
> included.
>
>   git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
>
> as recommended in
>
>   https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
>
> Do I use an antique version?

   No, that's the latest version, as far as I know. The patches
haven't been picked up and integrated by Chris yet. (In fact, I should
probably send them again).

   In the meantime, I'm afraid you'll have to apply the patches
manually.
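
   For reference, the TB/TiB gap discussed here is easy to reproduce.
The sketch below formats a byte count with either binary or SI units;
pretty_size() and its si parameter are hypothetical and are not taken
from the patches mentioned above, which are not shown in this message.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a byte count as e.g. "1.00TiB" (si == 0, powers of 1024)
 * or "1.10TB" (si != 0, powers of 1000).
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int pretty_size(unsigned long long bytes, int si, char *buf, size_t len)
{
	static const char *bin_units[] = { "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB" };
	static const char *si_units[] = { "B", "kB", "MB", "GB", "TB", "PB", "EB" };
	const char **units = si ? si_units : bin_units;
	const double base = si ? 1000.0 : 1024.0;
	double value = (double)bytes;
	int i = 0;
	int n;

	/* Scale down until the value fits the largest suitable unit. */
	while (value >= base && i < 6) {
		value /= base;
		i++;
	}

	n = snprintf(buf, len, "%.2f%s", value, units[i]);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

   The same 2^40 bytes are 1.00TiB in binary units and about 1.10TB in
SI units, which is the roughly 10% difference mentioned in the report.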

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Questions are a burden, and answers a prison for oneself. ---




Re: btrfs problems and fedora 14

2010-11-22 Thread Hugo Mills
   Hi,

On Tue, Nov 23, 2010 at 10:19:43AM +1100, david grant wrote:
> I thought I would try btrfs on a new installation of f14. yes, I know
> its experimental but stable so it seemed to be a good time to try it.
> I am not sure if I have missed something out of all my searching but am
> I correct in thinking that currently:
>  I. it is not possible to boot from a snapshot of the operating
> system and, in particular, the yum snapshots cannot be used for
> that purpose

   You can use btrfs subvolume set-default to set the default
subvolume that is mounted if no subvol= or subvolid= parameter is
given to mount. (And you can then subsequently access the original
root of the filesystem using mount -o subvolid=0).

> II. it is so easy to create raid arrays of btrfs partitions but they
> cannot be read by f13 or f14

   There's no particular reason that this should be the case. How do
you come to this conclusion? What did you try, what did you expect to
happen, and what actually happened?

> III. it is not possible to copy btrfs partitions with snapshots
> except possibly by the use of dd.

   Again, I can't see a reason that this shouldn't work. What are you
trying to do, exactly, and how is it failing?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I believe that it's closely correlated with ---   
   the aeroswine coefficient.




Re: Errors during defragmentation

2010-11-29 Thread Hugo Mills

On Mon, Nov 29, 2010 at 10:02:56PM +0100, Andrej Podzimek wrote:
> Hello,
>
> I decided to test the 'defragment' feature on my system (after a huge number
> of system updates and prelinking):
>
>   find /bin /sbin /lib /usr/lib /usr/bin /usr/sbin -type d -exec btrfs filesystem defragment '{}' '+'
>
> I have already defragmented a couple of (very large) directories with no
> errors at all, so this was expected to work somehow. Surprisingly, this time
> there were thousands of messages like this:
>
>   ioctl failed on directory name ret -1 errno 28

   errno 28 is ENOSPC

   You've run out of disk space. (Or at least, btrfs thinks so).
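
   When a tool only prints the raw errno number, as in the message
quoted above, the value can be decoded with the standard strerror()
function. A minimal sketch, assuming Linux errno numbering (where
ENOSPC is 28); describe_errno() and is_out_of_space() are hypothetical
helper names.

```c
#include <errno.h>
#include <string.h>

/* Return the C library's description of an errno value. */
static const char *describe_errno(int err)
{
	return strerror(err);
}

/* True if the error is "no space left on device". */
static int is_out_of_space(int err)
{
	return err == ENOSPC;
}
```

   On a glibc system, describe_errno(28) typically yields "No space
left on device".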

> Most of the reported directories had zero files/subdirectories. However,
> *most* of them were *not* empty...
>
> What does this error message mean? Could someone shed more light on
> this, please? Should I get ready for a bad crash? ;-) (There seems
> to be no data loss so far. No new messages in dmesg, no unexpected
> system behavior.)

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I believe that it's closely correlated with ---   
   the aeroswine coefficient.




Re: What to do about subvolumes?

2010-12-01 Thread Hugo Mills
On Wed, Dec 01, 2010 at 09:21:36AM -0500, Josef Bacik wrote:
> === Quotas ===
>
> This is a huge topic in and of itself, but Christoph mentioned wanting to have
> an idea of what we wanted to do with it, so I'm putting it here.  There are
> really 2 things here
>
> 1) Limiting the size of subvolumes.  This is really easy for us, just create a
> subvolume and at creation time set a maximum size it can grow to and not let it
> go farther than that.  Nice, simple and straightforward.
>
> 2) Normal quotas, via the quota tools.  This just comes down to how do we want
> to charge users, do we want to do it per subvolume, or per filesystem.  My vote
> is per filesystem.  Obviously this will make it tricky with snapshots, but I
> think if we're just charging the diff's between the original volume and the
> snapshot to the user then that will be the easiest for people to understand,
> rather than making a snapshot all of a sudden count the users currently used
> quota * 2.

   This is going to be tricky to get the semantics right, I suspect.

   Say you've created a subvolume, A, containing 10G of Useful Stuff
(say, a base image for VMs). This counts 10G against your quota. Now,
I come along and snapshot that subvolume (as a writable subvolume) --
call it B. This is essentially free for me, because I've got a COW
copy of your subvolume (and the original counts against your quota).

   If I now modify a file in subvolume B, the full modified section
goes onto my quota. This is all well and good. But what happens if you
delete your subvolume, A? Suddenly, I get lumbered with 10G of extra
files.  Worse, what happens if someone else had made a snapshot of A,
too? Who gets the 10G added to their quota, me or them? What if I'd
filled up my quota? Would that stop you from deleting your copy,
because my copy can't be charged against my quota? Would I just end up
unexpectedly 10G over quota?

   This is a whole gigantic can of worms, as far as I can see, and I
don't think it's going to be possible to implement quotas, even on a
filesystem level, until there's some good and functional model for
dealing with all the implications of COW copies. :(

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I believe that it's closely correlated with ---   
   the aeroswine coefficient.




Re: What to do about subvolumes?

2010-12-01 Thread Hugo Mills
On Wed, Dec 01, 2010 at 12:38:30PM -0500, Josef Bacik wrote:
> On Wed, Dec 01, 2010 at 04:38:00PM +0000, Hugo Mills wrote:
> > On Wed, Dec 01, 2010 at 09:21:36AM -0500, Josef Bacik wrote:
> > > === Quotas ===
> > >
> > > This is a huge topic in and of itself, but Christoph mentioned wanting to have
> > > an idea of what we wanted to do with it, so I'm putting it here.  There are
> > > really 2 things here
> > >
> > > 1) Limiting the size of subvolumes.  This is really easy for us, just create a
> > > subvolume and at creation time set a maximum size it can grow to and not let it
> > > go farther than that.  Nice, simple and straightforward.
> > >
> > > 2) Normal quotas, via the quota tools.  This just comes down to how do we want
> > > to charge users, do we want to do it per subvolume, or per filesystem.  My vote
> > > is per filesystem.  Obviously this will make it tricky with snapshots, but I
> > > think if we're just charging the diff's between the original volume and the
> > > snapshot to the user then that will be the easiest for people to understand,
> > > rather than making a snapshot all of a sudden count the users currently used
> > > quota * 2.
> >
> >    This is going to be tricky to get the semantics right, I suspect.
> >
> >    Say you've created a subvolume, A, containing 10G of Useful Stuff
> > (say, a base image for VMs). This counts 10G against your quota. Now,
> > I come along and snapshot that subvolume (as a writable subvolume) --
> > call it B. This is essentially free for me, because I've got a COW
> > copy of your subvolume (and the original counts against your quota).
> >
> >    If I now modify a file in subvolume B, the full modified section
> > goes onto my quota. This is all well and good. But what happens if you
> > delete your subvolume, A? Suddenly, I get lumbered with 10G of extra
> > files.  Worse, what happens if someone else had made a snapshot of A,
> > too? Who gets the 10G added to their quota, me or them? What if I'd
> > filled up my quota? Would that stop you from deleting your copy,
> > because my copy can't be charged against my quota? Would I just end up
> > unexpectedly 10G over quota?
> >
>
> If you delete your subvolume A, like use the btrfs tool to delete it, you will
> only be stuck with what you changed in snapshot B.  So if you only changed 5gig
> worth of information, and you deleted the original subvolume, you would have
> 5gig charged to your quota.

   This doesn't work, though, if the owners of the original and
new subvolume are different:

Case 1:

 * Porthos creates 10G data.
 * Athos makes a snapshot of Porthos's data.
 * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of
   Porthos's data to Athos.
 * Porthos deletes his copy of the data.

Case 2:

 * Porthos creates 10G of data.
 * Athos makes a snapshot of Porthos's data.
 * Porthos deletes his copy of the data.
 * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of
   Porthos's data to Athos.

Case 3:

 * Porthos creates 10G data.
 * Athos makes a snapshot of Porthos's data.
 * Aramis makes a snapshot of Porthos's data.
 * A sysadmin (Richelieu) changes the ownership on Athos's snapshot of
   Porthos's data to Athos.
 * Porthos deletes his copy of the data.

Case 4:

 * Porthos creates 10G data.
 * Athos makes a snapshot of Porthos's data.
 * Aramis makes a snapshot of Athos's data.
 * Porthos deletes his copy of the data.
   [Consider also Richelieu changing ownerships of Athos's and Aramis's
   data at alternative points in this sequence]

   In each of these, who gets charged (and how much) for their copy of
the data?

> The idea is you are only charged for what blocks
> you have on the disk.  Thanks,

   My point was that it's perfectly possible to have blocks on the
disk that are effectively owned by two people, and that the person to
charge for those blocks is, to me, far from clear. You either end up
charging twice for a single set of blocks on the disk, or you end up
in a situation where one person's actions can cause another person's
quota to fill up. Neither of these is particularly obvious behaviour.
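
   The dilemma can be made concrete with a toy model of one shared
extent. The sketch below is purely illustrative: struct extent,
charge_for() and the two policies are hypothetical and do not describe
any existing or planned btrfs quota code.

```c
#include <stddef.h>

/* Two hypothetical ways of charging a shared extent to its holders. */
enum policy {
	CHARGE_EVERY_HOLDER, /* every user referencing the extent pays in full */
	CHARGE_FIRST_HOLDER  /* only the earliest remaining reference pays */
};

/* Toy model: one extent and the users whose subvolumes reference it. */
struct extent {
	unsigned long long bytes;
	const int *holders; /* user ids, in the order the references were made */
	size_t nholders;
};

/* Bytes charged to `user` for this extent under the given policy. */
static unsigned long long charge_for(const struct extent *e, int user,
				     enum policy p)
{
	size_t i;

	for (i = 0; i < e->nholders; i++) {
		if (e->holders[i] == user)
			return (p == CHARGE_EVERY_HOLDER || i == 0) ? e->bytes : 0;
	}
	return 0; /* the user holds no reference to this extent */
}
```

   With holders { Porthos, Athos } and a 10G extent, CHARGE_EVERY_HOLDER
bills 20G in total for 10G on disk. CHARGE_FIRST_HOLDER bills only
Porthos, but once Porthos drops his reference the holder list becomes
{ Athos } and Athos is charged 10G without having done anything.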

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I believe that it's closely correlated with ---   
   the aeroswine coefficient.




Re: What to do about subvolumes?

2010-12-01 Thread Hugo Mills
On Wed, Dec 01, 2010 at 12:24:28PM -0800, Freddie Cash wrote:
> On Wed, Dec 1, 2010 at 11:35 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
> > > The idea is you are only charged for what blocks
> > > you have on the disk.  Thanks,
> >
> >    My point was that it's perfectly possible to have blocks on the
> > disk that are effectively owned by two people, and that the person to
> > charge for those blocks is, to me, far from clear. You either end up
> > charging twice for a single set of blocks on the disk, or you end up
> > in a situation where one person's actions can cause another person's
> > quota to fill up. Neither of these is particularly obvious behaviour.
>
> As a sysadmin and as a user, quotas shouldn't be about physical
> blocks of storage used but should be about logical storage used.
>
> IOW, if the filesystem is compressed, using 1 GB of physical space to
> store 10 GB of data, my quota used should be 10 GB.
>
> Similar for deduplication.  The quota is based on the storage *before*
> the file is deduped.  Not after.
>
> Similar for snapshots.  If UserA has 10 GB of quota used, I snapshot
> their filesystem, then my quota used would be 10 GB as well.  As
> data in my snapshot changes, my quota used is updated to reflect
> that (change 1 GB of data compared to snapshot, use 1 GB of quota).

   So if I've got 10G of data, and I snapshot it, I've just used
another 10G of quota?

> You have to (or at least should) keep two sets of stats for storage usage:
>   - logical amount used (real file size, before compression, before
> de-dupe, before snapshots, etc)
>   - physical amount used (what's actually written to disk)
>
> User-level quotas are based on the logical storage used.
> Admin-level quotas (if you want to implement them) would be based on
> physical storage used.
>
> Thus, the output of things like df, du, ls would show the logical
> storage used and file sizes.  And you would either have an additional
> option to those apps (--real or something) to show the actual
> storage used and file sizes as stored on disk.
>
> Trying to make quotas and disk usage utilities to work based on what's
> physically on disk is just backwards, imo.  And prone to a lot of
> confusion.

   Trying to make quotas work based on what's physically on the disk
appears to have serious issues on the semantics of using up space,
so I agree with you on this point (and, indeed, it was the point I was
trying to make).

   However, doing it that way also effectively penalises users and
prevents (or severely discourages) them from using the advanced
functions of the filesystem. There's no benefit (in disk usage terms)
to the user in using a snapshot -- they might as well use plain cp.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I believe that it's closely correlated with ---   
   the aeroswine coefficient.




Re: 800 GByte free, but no space left

2010-12-05 Thread Hugo Mills
On Sun, Dec 05, 2010 at 04:08:26AM -0700, Evert Vorster wrote:
> On Sun, Dec 5, 2010 at 12:48 AM, Helmut Hullen hul...@t-online.de wrote:
> > Hello, Evert,
> >
> > You wrote on 04.12.10 on the subject "Re: 800 GByte free, but no space left":
> >
> > > I am not an expert on this by a long shot, but it looks like you
> > > added these two disks in raid0.

   Nope -- btrfs will spread out its allocations across both disks.

  This means that the total space cannot exceed the space of the
  smallest disk.
[...]
  Especially: no RAID definition.
 
  If the smallest device defines the capacity then I should use 2*1.35
  TiByte, but my system tells no space left at about 2.4 TiByte - where
  are (at least) 300 GiByte hidden?
 
devid    2 size 1.35TB used 1.35TB path /dev/sdc3
devid    1 size 1.81TB used 1.35TB path /dev/sdf2
 
 Here devid 2 is at 100%, and hence you are getting the "no space
 left" errors. So, the 300 GB is on the bigger disk, and not usable for
 you right now.

   I _think_ that a balance is all that's needed at this point. It
can't hurt, anyway (other than taking quite a long time).

 I know of the disk mode you speak of. An old RAID card of mine called it
 "Just a bunch of disks", and under Windows it literally filled up the
 first disk before carrying on to the second one until that was full.
 Under UNIX it had the effect of just adding all the sectors
 to each other, and stretching the file system over the disks in a
 linear fashion. Most UNIX file systems write files in the middle of
 the largest contiguous free space, which meant that some files got
 written on the first disk, and some on the second. As far as I know,
 btrfs does not support this RAID mode.

   It does support it: that's what the single RAID profile in
mkfs.btrfs is. It attempts to use the disk space marginally more
intelligently than traditional linear mode, though, as it allocates
block groups (in chunks of about 1G) to each disk in turn. This isn't
the same as RAID-0, which stripes within block groups with a much
smaller stripe size.
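
   The difference matters for unequal disks. The following is a toy
model only, not the kernel's allocator: it assumes 1-unit chunks, a
"single" allocator that always picks the device with the most free
space, and a RAID-0 allocator that needs a stripe on every device for
each chunk. The function names and the sizes (1350 and 1810, standing
in for the 1.35TB and 1.81TB devices in this thread) are illustrative.

```python
def usable_single(sizes, chunk=1):
    """Allocate whole chunks one at a time to the emptiest device."""
    free = list(sizes)
    total = 0
    while True:
        i = max(range(len(free)), key=lambda k: free[k])
        if free[i] < chunk:
            return total
        free[i] -= chunk
        total += chunk


def usable_raid0(sizes, chunk=1):
    """Each allocation takes one stripe of `chunk` from every device."""
    free = list(sizes)
    total = 0
    while all(f >= chunk for f in free):
        for k in range(len(free)):
            free[k] -= chunk
        total += chunk * len(free)
    return total


sizes = [1350, 1810]
print("single:", usable_single(sizes))
print("raid0: ", usable_raid0(sizes))
```

   In this model RAID-0 stops as soon as the smaller device is full,
leaving the remainder of the larger one unusable, whereas the single
profile can fill both.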

 Another thing to keep in mind is that as far as I know you cannot
 remove devid 1 from a btrfs volume. This is due to be fixed, but I
 have no idea on the status of that.

   I've done it (I have a filesystem with IDs 7, 8, 9, 12, 13, 14).
Looks like that particular problem has been fixed.

 You could, if you really wanted to use all of two differently sized
 disks in a btrfs, subdivide the disks in equal sized partitions, and
 just put all of those partitions in a btrfs raid0...
[...]

   That would be a really bad idea, as your disks would thrash
horribly, reading stripes from different locations on the disk.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Nostalgia isn't what it used to be. ---   




Re: 800 GByte free, but no space left

2010-12-06 Thread Hugo Mills
On Mon, Dec 06, 2010 at 02:13:00PM +0100, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 06.12.10:
 
  But after copying about 300 MByte (part of a 1.5-GByte *.mpg) I
  got no space left on device. Looks like balancing has stolen
  about 300 GByte.
 
 This sounds exactly like a problem I've had. What output do you
  get from btrfs fi df /srv/MM?
 
  I've just written a script for gathering the (perhaps) interesting
  data ...
 
  # btrfs filesystem show
  Label: 'MM2'  uuid: ad7c0668-316c-4a79-ba00-3b505b9d99b4
 Total devices 2 FS bytes used 2.37TB
 devid    2 size 1.35TB used 1.20TB path /dev/sdc3
 devid    1 size 1.81TB used 1.20TB path /dev/sdf2
 
  Btrfs v0.19
 
  # btrfs filesystem df /srv/MM
  Data: total=2.39TB, used=2.37TB
  Metadata: total=5.25GB, used=3.51GB
  System: total=12.00MB, used=188.00KB
 
 Can you try that again with either the latest 2.6.37-rc, or with
  the btrfs-unstable kernel? There's a bug in earlier versions that
  breaks the reporting of RAID types, which is what I wanted to see
  here.
 
 Do you mean
 
   git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git
 
 as btrfs-unstable kernel?

   Yes. It's 2.6.36, plus the patches that Chris has sent to Linus for
inclusion into 2.6.37.

 Compiling 2.6.37-rc is no big problem, it only needs some time.
 
 Just now I'm using
 
 Kernel 2.6.35.8
 btrfs-git from 20101117
 
  I've moved about 50 Gbyte away from srv/MM in the meantime, before
  running the script with this output.
 
  And I don't dare running balance again - maybe it reduces the
  available space again and again.
 
 If you've hit the bug I think you have, then yes, it will.
 
 Hmm - it can't get worse ...
 If the error is related to the kernel or to the btrfs version and I try  
 a newer one: can that lead to more free space?

   Not yet. I've taken the whole of December off work (using up my
leave allocation for last year), and my plan is to get myself to the
point where I can understand enough of the code to fix this particular
problem.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- What are we going to do tonight? The same thing we do --- 
every night, Pinky.  Try to take over the world!




Re: 800 GByte free, but no space left

2010-12-06 Thread Hugo Mills
   Helmut - 

On Mon, Dec 06, 2010 at 03:45:00PM +0100, Helmut Hullen wrote:
 If/when I install 2.6.37-rc4: should I update btrfs (from the 20101117  
 version)?

   I think that's the latest version.

 How can I see that changing the kernel makes things better? It's more  
 and more difficult to externalize (?) btrfs directories to other disks  
 ...

   Updating the kernel won't fix the problem I'm thinking of (sorry).
It will, however, fix the bug that stops the btrfs tool from reporting
what RAID levels you've got.

   The problem I suspect you may have (because your symptoms seem to
be the same as mine) is that there are some circumstances where the
filesystem can change RAID levels pretty much arbitrarily. Running
btrfs fi df with a kernel that reports RAID levels will show whether
that's the case, as you'll have more than one RAID level listed.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- But people have always eaten people,  / what else is there to ---  
 eat?  / If the Juju had meant us not to eat people / he 
 wouldn't have made us of meat.  




Re: 800 GByte free, but no space left

2010-12-06 Thread Hugo Mills
On Mon, Dec 06, 2010 at 06:13:00PM +0100, Helmut Hullen wrote:
 Hello, Hugo,
 
 You wrote on 06.12.10:
 
  How can I see that changing the kernel makes things better? It's
  more and more difficult to externalize (?) btrfs directories to
  other disks ...
 
 Updating the kernel won't fix the problem I'm thinking of (sorry).
  It will, however, fix the bug that stops the btrfs tool from
  reporting what RAID levels you've got.
 
 The problem I suspect you may have (because your symptoms seem to
  be the same as mine) is that there are some circumstances where the
  filesystem can change RAID levels pretty much arbitrarily. Running
  btrfs fi df with a kernel that reports RAID levels will show
  whether that's the case, as you'll have more than one RAID level
  listed.

 Kernel 2.6.37-rc4:
 
 # btrfs filesystem df /srv/MM
 Data, RAID0: total=2.39TB, used=2.37TB
 System, RAID1: total=8.00MB, used=188.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=4.25GB, used=3.51GB
 Metadata, DUP: total=1.00GB, used=2.33MB
 
 Hope it helps!

   Yup. You've got what I've got(*). You have two different RAID types
for metadata, which shouldn't happen (but does, due to a bug).
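
   For anyone wanting to check their own filesystem, the test can be
sketched as a small script. This is only an illustration: it parses the
text format shown in the output quoted above, the function name is made
up, and it deliberately ignores the lines without a RAID marker rather
than treating them as a separate profile.

```python
import re
from collections import defaultdict

# Output quoted earlier in this message (kernel 2.6.37-rc4).
SAMPLE = """\
Data, RAID0: total=2.39TB, used=2.37TB
System, RAID1: total=8.00MB, used=188.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=4.25GB, used=3.51GB
Metadata, DUP: total=1.00GB, used=2.33MB
"""

# "<type>[, <profile>]: total=..."
LINE = re.compile(r"^\s*(\w+)(?:,\s*(\w+))?:\s*total=")


def mixed_profiles(df_output):
    """Return chunk types that are listed with more than one RAID profile."""
    profiles = defaultdict(set)
    for line in df_output.splitlines():
        m = LINE.match(line)
        if m and m.group(2):  # skip lines with no RAID marker
            profiles[m.group(1)].add(m.group(2))
    return {kind: sorted(p) for kind, p in profiles.items() if len(p) > 1}


print(mixed_profiles(SAMPLE))
```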

   Hugo.

(*) My .sig fairy is clearly working overtime for appropriate
quotations. :)

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Charting the inexorable advance of Western syphilisation... ---   




Re: 800 GByte free, but no space left

2010-12-07 Thread Hugo Mills
   Helmut -

On Tue, Dec 07, 2010 at 06:05:00PM +0100, Helmut Hullen wrote:
 You wrote on 06.12.10:
 
  Kernel 2.6.37-rc4:
 
  # btrfs filesystem df /srv/MM
  Data, RAID0: total=2.39TB, used=2.37TB
  System, RAID1: total=8.00MB, used=188.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=4.25GB, used=3.51GB
  Metadata, DUP: total=1.00GB, used=2.33MB
 
  Hope it helps!
 
 Yup. You've got what I've got(*). You have two different RAID
  types for metadata, which shouldn't happen (but does, due to a bug).
 
 Am I right to fear that balancing tries to reduce the system to
 something like RAID1?

   It _should_ move all the data on the disk to somewhere else on the
disk, whilst honouring the RAID settings for the filesystem. However,
since it's got buggered up RAID settings now and has been using some
of the space for the wrong RAID type, the balance can't find space
with the right RAID parameters to write to, so it runs out of space.

   (I think I got that right, anyway. I'm working off a conversation
with Chris on IRC some weeks ago, about what happened to my
filesystem).

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Always be sincere,  whether you mean it or not. --- 




Re: btrfs-progs branch updated

2012-07-05 Thread Hugo Mills
On Thu, Jul 05, 2012 at 04:01:22PM -0400, Chris Mason wrote:
 Hi everyone,
 
 I've updated the master branch with the pending stable btrfs-progs
 commit that should make the 0.20 release.
 
 Thanks to Hugh for helping to queue up a few of them.  We'll have more
   ^ that's an o, not an h. :)

 frequent releases from here as we pull in the major new features going
 into progs (raid5/6, send/receive, quotas, fsck improvements).

   I've still got a stack of new feature patches sitting here that
mostly apply OK to integration. I'll try to filter out the ones
already applied and put together another stack in approximately
kernel-feature order.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Nothing right in my left brain. Nothing left in --- 
 my right brain. 




Re: btrfs fi df won't show total=

2012-07-09 Thread Hugo Mills
On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote:
 
 On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be
 observed: after a change of the profiles, total=,used= is no
 longer shown:
 
 
 20:49 mmsrv1:~ # btrfs fi df /top.srv/
 Data, RAID10: total=152.00GiB, used=121.07GiB
 System, RAID1: total=40.00MiB, used=44.00KiB
 System: total=4.00MiB, used=0.00
 Metadata, RAID1: total=112.00GiB, used=1.30GiB
 Metadata: total=8.00MiB, used=0.00
 20:50 mmsrv1:~ # btrfs fi bal start -mconvert=raid10 -sconvert=raid10 /top.srv/
 Refusing to explicitly operate on system chunks.
 Pass --force if you really want to do that.
 20:52 mmsrv1:~ # btrfs fi bal start -mconvert=raid10 -sconvert=raid10 --force /top.srv/
 ...
 21:10 mmsrv1:~ # btrfs fi df /top.srv/
 Data, RAID10: total=156.00GiB, used=124.35GiB
 System, RAID10: total=128.00MiB, used=48.00KiB
 System: total=4.00MiB, used=0.00
 Metadata, RAID10: total=112.00GiB, used=1.38GiB

   What's the problem here? You no longer have any RAID1 chunks, so
it's not showing them.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- IMPROVE YOUR ORGANISMS!!  -- Subject line of spam email --- 




Re: btrfs fi df won't show total=

2012-07-09 Thread Hugo Mills
On Mon, Jul 09, 2012 at 10:06:24PM +0200, Jan Engelhardt wrote:
 
 On Monday 2012-07-09 21:25, Hugo Mills wrote:
 On Mon, Jul 09, 2012 at 09:14:03PM +0200, Jan Engelhardt wrote:
  
  On openSUSE_12.1 with Btrfs v0.19+20120406, the following can be
  observed: after a change of the profiles, total=,used= is no
  longer shown:
  
  20:49 mmsrv1:~ # btrfs fi df /top.srv/
  Data, RAID10: total=152.00GiB, used=121.07GiB
  System, RAID1: total=40.00MiB, used=44.00KiB
  System: total=4.00MiB, used=0.00
  Metadata, RAID1: total=112.00GiB, used=1.30GiB
  Metadata: total=8.00MiB, used=0.00
 [...]
  21:10 mmsrv1:~ # btrfs fi df /top.srv/
  Data, RAID10: total=156.00GiB, used=124.35GiB
  System, RAID10: total=128.00MiB, used=48.00KiB
  System: total=4.00MiB, used=0.00
  Metadata, RAID10: total=112.00GiB, used=1.38GiB
 
What's the problem here? You no longer have any RAID1 chunks, so
 it's not showing them.
 
 Rather than a 4-line output, I would have expected this 6-line output
 that I would also get when mkfs'ing a new fresh btrfs volume with
 raid10 from the start:
 
 Data, RAID10: total=156.00GiB, used=124.35GiB
 Data: total=foo, used=bar
 System, RAID10: total=128.00MiB, used=48.00KiB
 System: total=4.00MiB, used=0.00
 Metadata, RAID10: total=112.00GiB, used=1.38GiB
 Metadata: total=foo, used=bar

   The lines without the RAID marker are there as a result of the way
that mkfs works -- it creates stub chunks which are never used, and
then upgrades to the required RAID level immediately afterwards.

   The balance (any balance, not just a conversion) processes these
chunks as well as all the other chunks in the FS, and rewrites all of
the data in them (all 0 bytes of it) somewhere else, removing the
originals.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Well, you don't get to be a kernel hacker simply by looking ---   
good in Speedos. -- Rusty Russell




Re: Can't mount, power failure - recoverable?

2012-07-13 Thread Hugo Mills
On Fri, Jul 13, 2012 at 02:23:53PM +0200, Martin Steigerwald wrote:
 On Monday, 26 March 2012, Skylar Burtenshaw wrote:
  Fajar A. Nugraha list at fajar.net writes:
   Didn't Chris' last response basically say use kernel 3.2 or newer,
   mount the fs (possibly with -o ro), and copy the data elsewhere?
  
  Why yes, yes it did actually. I appreciate your spotlighting it, just
  in case I somehow managed to miss it, though.
  
   Have you done that?
  
  I have. In fact, in my first message, I stated that in all kernels up
  to present 3.2 kernels, I get several minutes of disk churning, then a
  stack trace. Also present in my messages is the fact that the
  filesystem will not mount, as well as data output from the recovery
  program etc which fail to recognize things in the filesystem that they
  require in order to fix it. Did you have something you wished to
  suggest, in order to help me? If so, I'd gladly listen to any proposed
  ideas.
 
 Since I didn't find any explicit mention of it:
 
 Did you try btrfs-zero-log on the partition prior to mounting it?
 
 All of my "BTRFS will not mount after a sudden write interruption" cases
 have been solved by it, except one with a BTRFS RAID 0 across lots of 2 TB
 drives, at a time when I didn't know about btrfs-zero-log. Maybe it would
 have helped there, too.
 
 Of course I could be completely off track and this could be a completely 
 different issue.

   I'm afraid you probably are -- there's nothing I can see in the
stack trace that would indicate that it's falling over in the log tree
replay, which is the only thing that btrfs-zero-log would help with.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- A diverse working environment:  Di longer you vork here, di ---   
 verse it gets.  




Re: Can't mount, power failure - recoverable?

2012-07-15 Thread Hugo Mills
On Sat, Jul 14, 2012 at 01:01:04AM +0000, Skylar Burtenshaw wrote:
 I noticed there've been some recent (since I last looked at least) updates
 including fsck and such, however I haven't run anything git-based since the
 last time I pulled the btrfs tools, and I had to dig for ages to find info
 on how to get the RECENT stuff from the CORRECT source. I can find a dozen
 Google results that seem relevant, but can someone give me a definitive 
 answer on which tree to pull down (and how) to test the new tools on my mess?

   This is the definitive source on where to get things:

   https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

   You will need the official -progs repository, as that's most up to
date right now.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no. 4: Future Perfect ---   




Re: No/bad auto-detection of fs type for small volumes (related to mixed metadata/data?)

2012-07-25 Thread Hugo Mills
On Tue, Jul 24, 2012 at 08:39:36PM -0400, Marios Titas wrote:
 When I create a btrfs volume of size strictly less than 256 MiB and then do
 mount /dev/sdb1 /mnt/test
 the kernel tries unsuccessfully to do the mount with many other file systems
 before successfully trying with btrfs. For volumes of size larger than or
 equal to 256 MiB it just mounts the volume without doing that. Why the
 discrepancy?

   Are you using the --mixed option when creating the filesystem? If
not, you should do with something that small.

   Hugo.

 Another possibly related symptom is that the volume does not appear in
 /dev/disk/by-label and /dev/disk/by-uuid at all. This means that it is
 impossible to mount the volume by uuid or label.
 
 To make sure that this isn't a udev bug, I booted my system with
 init=/bin/bash in the kernel command line, and then I tried again to mount
 the volume. This time it would not mount it at all unless I explicitly
 specified the fs type. On the other hand, it could mount larger volumes
 without any issues.
 
 All the experiments were done on an initially zeroed-out disk. I am using
 a 3.4.6 kernel with btrfs from 3.5 and the latest btrfs-progs from git.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- In theory, theory and practice are the same. In --- 
  practice,  they're different.  




Re: [RFC PATCH 0/6] Experimental btrfs send/receive (btrfs-progs)

2012-07-25 Thread Hugo Mills
On Wed, Jul 25, 2012 at 12:41:56PM +0200, Alexander Block wrote:
 On Mon, Jul 23, 2012 at 2:29 PM, Arne Jansen sensi...@gmx.net wrote:
  On 04.07.2012 15:39, Alexander Block wrote:
  Hello all,
 
  This is the user space side of btrfs send/receive.
 
  You can apply them manually or use my git repo:
 
  git://github.com/ablock84/btrfs-progs.git (branch send)
 
  The branch is based on Hugo's integration-20120605 branch. I had to add a
  temporary commit to fix a bug introduced in one of the strncpy/overflow
  patches that got into btrfs-progs. This fix is not part of the btrfs
  send/receive patchset, but you'll probably need it if you want to base on
  the integration branch. I hope this is not required in the future when a
  new integration branch comes out.
 
  Example usage:
 
  Multiple snapshots at once:
  btrfs send /mnt/snap[123] > snap123.btrfs
 
  a) Do we really want a single token command here, not
  btrfs filesystem send or subvol send?
 In my opinion the single token is easier to type and remember. But if
 there are enough arguments in favour of normal subcommands, this can be
 changed (but by someone else, as I'm running out of time).

   Since everything else is two commands, yes, I think we need it for
consistency. (And, since it's a publicly visible interface, for
acceptance of the patches -- we don't want to be changing the way the
commands work after the fact).

  b) zfs makes sure stdout is not a tty, to prevent flooding
  your console. This kinda makes sense.
 This makes sense. But again, this has to be done by someone else.

   Can you keep a brief list of such cleanups/features and dump it on
the wiki as a proposed project when your time does run out, please?
That way the details don't get lost, and they can be found by other
people and dealt with independently.
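
   As an example of how small such a cleanup can be, the tty check
suggested in (b) might look roughly like this. It is a sketch in Python
rather than the C of btrfs-progs, and the function name and error
message are invented for illustration.

```python
import io


def ensure_not_tty(stream):
    """Return the stream, or raise if it is an interactive terminal."""
    if stream.isatty():
        raise RuntimeError("refusing to write a send stream to a terminal")
    return stream


# A pipe or file is accepted; io.BytesIO stands in for one here.
out = ensure_not_tty(io.BytesIO())
out.write(b"stream data would go here")
```

   In a real tool the check would be applied to standard output before
any stream data is generated.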

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Turning,  pages turning in the widening bath, / The spine ---
cannot bear the humidity. / Books fall apart; the binding
cannot hold. / Page 129 is loosed upon the world.




Re: fail to mount after first reboot

2012-08-19 Thread Hugo Mills
On Sun, Aug 19, 2012 at 02:08:17PM +0000, Daniel Pocock wrote:
 
 
 I created a 1TB RAID1.  So far it is just for testing, no important data
 on there.
 
 
 After a reboot, I tried to mount it again
 
 # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0
 mount: wrong fs type, bad option, bad superblock on
 /dev/mapper/vg00-btrfsvol0_0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

   With multi-volume btrfs filesystems, you have to run btrfs dev
scan before trying to mount it. Usually, the distribution will do
this in the initrd (if you've installed its btrfs-progs package).

 Then I ran btrfsck - it reported no errors, and afterwards the filesystem mounted OK:
 
 # btrfsck /dev/mapper/vg00-btrfsvol0_0
[...]

   The first thing that btrfsck does is to do a device scan.

[...]
 Can anyone comment on this?

   See above.

 Also, df is reporting double the actual RAID1 volume size, and double
 the amount of data stored in this filesystem:
 
 # df -lh .
 Filesystem                    Size  Used Avail Use% Mounted on
 /dev/mapper/vg00-btrfsvol0_0  1.9T   51G  1.8T   3% /mnt/btrfs0
 
 I would expect to see Size=1T, Used=25G
 
 # strace -v -e trace=statfs df -lh /mnt/btrfs0
 statfs("/mnt/btrfs0", {f_type=0x9123683e, f_bsize=4096,
 f_blocks=488374272, f_bfree=475264720, f_bavail=474749786, f_files=0,
 f_ffree=0, f_fsid={2083217090, -1714407264}, f_namelen=255,
 f_frsize=4096}) = 0
 Filesystem                    Size  Used Avail Use% Mounted on
 /dev/mapper/vg00-btrfsvol0_0  1.9T   51G  1.8T   3% /mnt/btrfs0

   This is an FAQ:

   
https://btrfs.wiki.kernel.org/index.php/FAQ#Why_is_free_space_so_complicated.3F

   tl;dr: It's reporting the total number of raw storage bytes,
because it's impossible to compute actual usable space in the general
case.
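
   The arithmetic behind those df figures can be checked from the
statfs() values quoted above. This is just a worked calculation; the
final line assumes a plain two-copy RAID1 with nothing else on the
filesystem, which is exactly the kind of assumption df cannot make in
general.

```python
# Values copied from the statfs() call quoted above.
f_bsize = 4096
f_blocks = 488374272
f_bfree = 475264720

raw_total = f_blocks * f_bsize              # raw bytes across all devices
raw_used = (f_blocks - f_bfree) * f_bsize   # raw bytes allocated

TIB = 2 ** 40
GIB = 2 ** 30
print(f"raw size: {raw_total / TIB:.2f} TiB")
print(f"raw used: {raw_used / GIB:.2f} GiB")

# Assumption: everything is stored twice (RAID1), so halve the raw total.
print(f"approx. usable size under RAID1: {raw_total / 2 / TIB:.2f} TiB")
```

   df -h rounds up when it prints, which is why these raw values appear
as 1.9T and 51G in the output above.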

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- In one respect at least, the Martians are a happy people: ---
  they have no lawyers.  




Re: fail to mount after first reboot

2012-08-19 Thread Hugo Mills
On Sun, Aug 19, 2012 at 02:33:14PM +0000, Daniel Pocock wrote:
 On 19/08/12 14:15, Hugo Mills wrote:
  On Sun, Aug 19, 2012 at 02:08:17PM +0000, Daniel Pocock wrote:
  I created a 1TB RAID1.  So far it is just for testing, no important data
  on there.
 
  After a reboot, I tried to mount it again
 
  # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0
  mount: wrong fs type, bad option, bad superblock on
  /dev/mapper/vg00-btrfsvol0_0,
 missing codepage or helper program, or other error
 In some cases useful info is found in syslog - try
 dmesg | tail  or so
  
 With multi-volume btrfs filesystems, you have to run btrfs dev
  scan before trying to mount it. Usually, the distribution will do
  this in the initrd (if you've installed its btrfs-progs package).
 
 I'm running Debian, I've just updated the system from squeeze to wheezy
 (with 3.2 kernel) so I could try btrfs and do other QA testing on wheezy
 (as it is in the beta phase now)
 
 I already had the btrfs-tools package installed, before creating the
 filesystem.  So it appears Debian doesn't have an init script
 
 It does have /lib/udev/rules.d/60-btrfs.rules:
 SUBSYSTEM!="block", GOTO="btrfs_end"
 ACTION!="add|change", GOTO="btrfs_end"
 ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
 RUN+="/sbin/modprobe btrfs"
 RUN+="/sbin/btrfs device scan $env{DEVNAME}"
 
 LABEL="btrfs_end"
 
 but I'm guessing that isn't any use to my logical volumes that are
 activated early in the boot sequence?
 
 Could I be having this problem because I put my btrfs on logical volumes?

   Possibly. You may need the Device mapper uevents option in the
kernel (CONFIG_DM_UEVENT) to trigger that udev rule when you enable
your VG(s). Not sure if it's available/enabled in your kernel.

 Here is the package version I have:
 
 # dpkg --list | grep btrfs
 ii  btrfs-tools   0.19+20120328-7
Checksumming Copy on Write Filesystem utilities

   That should be fine.

 Here is a more thorough dmesg, since boot, does this suggest the scan
 was invoked?  I remember seeing some message about checking for btrfs
 filesystems just after selecting the kernel in grub (root is ext3)

   That message was probably grub checking the FS.

 # dmesg | grep btrfs
 [   40.677505] btrfs: setting nodatacow
 [   40.677514] btrfs: turning off barriers
 [17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
 transid 34 /dev/mapper/vg00-btrfsvol0_0
 [17216.145639] btrfs: disk space caching is enabled
 [17216.146987] btrfs: failed to read the system array on dm-100
 [17216.147556] btrfs: open_ctree failed
 [17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
 transid 34 /dev/mapper/vg00-btrfsvol0_0
 [17310.993882] btrfs: disk space caching is enabled
 [17598.736657] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
 transid 37 /dev/mapper/vg00-btrfsvol0_0
 [17598.750849] btrfs: disk space caching is enabled

   No, doesn't look like there were any scan results coming in before
17216.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- In one respect at least, the Martians are a happy people: ---
  they have no lawyers.  




Re: How to get Btrfs on 2nd partition of USB HDD to automount as read/write

2012-08-19 Thread Hugo Mills
On Sun, Aug 19, 2012 at 03:51:47PM -0400, dg1727 wrote:
 Hello, 
 
 The question below is based on 
 https://lists.ubuntu.com/archives/xubuntu-users/2012-August/004509.html
 
 Thanks in advance for any help with the following question, 
 including pointing me to some other info resource if needed.  
 
 I have a user with an Xubuntu 12.04.1 laptop, 32-bit.  He needs to 
 use a USB hard disk drive which has 2 partitions:  the 1st 
 partition is NTFS and the 2nd partition is Btrfs.  
 
 When he plugs in the hard drive, both partitions auto-mount OK, 
 except that the Btrfs partition automounts read-only.  That is, the 
 permissions of the directories in /dev are drwx-- for the NTFS 
 and dr-xr-xr-x for the Btrfs.  
 
 How can the OS be set up so that the Btrfs will automount 
 read/write?  

   As Anthony points out, this is a property of the filesystem, not
the OS or the mount options. Just use chmod.

   (It's only filesystems like FAT, which have no concept of
permissions, which have mount options to set permissions)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Two things came out of Berkeley in the 1960s: LSD and Unix. ---   
   This is not a coincidence.


signature.asc
Description: Digital signature


Re: “Bug”-report: inconsistency kernel - tools

2012-08-30 Thread Hugo Mills
On Thu, Aug 30, 2012 at 08:24:53PM +0200, Goffredo Baroncelli wrote:
 On 08/28/2012 09:52 PM, M G Berberich wrote:
 (7) reinserted disk (and rebooted)
  At some point before reboot the first 10 sectors of one disk
  were zeroed to test if the disk gets removed from the btrfs.
 
 IIRC the superblock is not placed at the beginning of the disk. On
 the basis of [1] it should be near the 64KB (around the sector #128)

   Just for the record, the first is at 64KiB; each subsequent mirror
is at 16KiB shifted left by a further 12 bits (64MiB, 256GiB, 1PiB).
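
   A minimal sketch of that offset arithmetic follows. The constants
are my understanding of the kernel's convention (primary copy at 64KiB,
mirrors at 16KiB shifted left by 12 bits per mirror index) and should
be treated as assumptions; the function and constant names are only
illustrative.

```python
SUPER_INFO_OFFSET = 64 * 1024  # primary superblock (assumed)
MIRROR_BASE = 16 * 1024        # base for the mirror copies (assumed)
MIRROR_SHIFT = 12              # extra bits of shift per mirror index


def sb_offset(mirror):
    """Byte offset of superblock copy `mirror` (0 is the primary)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    return MIRROR_BASE << (MIRROR_SHIFT * mirror)


for m in range(3):
    print(m, sb_offset(m))
```

   The value for copy 2 agrees with the "using SB copy 2, bytenr
274877906944" line that btrfsck --super 2 prints in another thread in
this archive.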

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- This chap Anon is writing some perfectly lovely stuff ---  
 at the moment.  




Re: Btrfs-Progs integration branch question

2012-09-03 Thread Hugo Mills
On Mon, Sep 03, 2012 at 09:31:16AM -0700, Suman C wrote:
 Hi,
 
 I would like to get the latest btrfs-progs code. To me, Chris Mason's
 repo at git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
 seems latest but obviously its missing the last several patches I see
 in the mailing list. I also tried Hugo Mills' integration repo at
 http://git.darksatanic.net/repo/btrfs-progs-unstable.git and unless I
 am looking at it wrong, it seems behind.

   It is. I'm out of date.

 Can someone please point me to the latest process that is followed for
 testing/developing recent btrfs-progs?

   Chris's repo, right now.

 I am trying to integrate the quota patch from August 10th by Jan
 Schmidt and getting conflicts when I git apply.

   Good luck. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- What do you give the man who has everything? -- Penicillin is ---  
 a good start... 




Re: Is my btrfs full?

2012-09-04 Thread Hugo Mills
On Tue, Sep 04, 2012 at 06:07:50PM +0200, Petr Tichý wrote:
 I have a 130 GB btrfs with rsnapshot-like backups mounted with
 compress=zlib and now I'm getting ENOSPC, while df shows only
 some 60 % used. I'm running Linux version 3.2.0-3-amd64 (Debian
 3.2.23-1). Is my btrfs really full? Will a more recent kernel solve
 this?

   We strongly recommend using the latest available kernel (currently
3.5 or 3.6-rc4) if you're running btrfs. The code is still moving very
quickly, and the main devs are still finding and fixing fairly serious
bugs.

 root@roura:~# btrfs filesystem show --all-devices
 Label: none  uuid: 12880174-8337-47bc-be05-f485a0b7503f
   Total devices 1 FS bytes used 72.67GB
   devid    1 size 130.00GB used 130.00GB path /dev/sdd

   You may want to read about df[1], and then about ENOSPC errors[2].

   If you've still got questions after that, please do come back and
ask them.

   Hugo.

[1] 
https://btrfs.wiki.kernel.org/index.php/FAQ#Why_does_df_show_incorrect_free_space_for_my_RAID_volume.3F
[2] 
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Geek, n.: Circus sideshow performer specialising in the --- 
 eating of live animals. 




Re: [PATCH] Btrfs + Btrfs-progs: make pipe functions re-usable

2012-09-17 Thread Hugo Mills
On Mon, Sep 17, 2012 at 12:48:10PM +0800, Anand Jain wrote:
 
   btrfs send introduced code that reads kernel data from
   user space through a pipe. We need this code to be
   usable outside the send sub-cmd, so that the service
   sub-cmd under development can use it.
 
 What's 'service sub-cmd' please?
 
   at the moment 'btrfs service history mnt|dev'
   to show logs of maintenance.
   comments/suggestions welcome.

   As I said in our private email exchange some months ago, I don't
think this is the right way to be doing this. For example, if you use
an alternative tool (such as btrfs-gui) which uses the ioctls
directly, you've lost that logging information.

   Keeping a log of what's been done to the FS is much better done by
extending the available logging in the kernel (and making it a
compile-time option for those who don't want or need it). You can then
write a simple shell script to chomp through the normal kernel logs to
extract this information.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I'll take your bet, but make it ten thousand francs. I'm only ---  
   a _poor_ corrupt official.




Re: Rebuilding chunk root?

2012-09-24 Thread Hugo Mills
On Mon, Sep 24, 2012 at 04:28:08PM +0300, Sami Haahtinen wrote:
 Due to a certain unfortunate chain of events, I managed to overwrite a
 small portion of my btrfs array which had only single redundancy for
 metadata. The data itself is present and only a small portion (2.5%)
 of the array was overwritten.
 
 After quite a bit of debugging and tinkering, I realized that my chunk
 root was in the portion that was overwritten. After reading through
 the documentation I was able to pull together, it's still unclear to me
 whether the chunk root is something that can be rebuilt.

   Chris had some experimental code for doing it in btrfsck which
never saw the light of day (because it was too unreliable). He may
be able to offer you something to help, though.

 A transcript of btrfsck trying to recover with superblock 2 which is
 uncorrupted by itself:
 
 root@sysresccd /root/btrfs-progs % ./btrfsck --super 2 /dev/patience/home
 using SB copy 2, bytenr 274877906944
 Check tree block failed, want=139264, have=0
 Check tree block failed, want=139264, have=0
 Check tree block failed, want=139264, have=0
 read block failed check_tree_block
 Couldn't read chunk root
 
 If I'm interpreting the output correctly, it's trying to read bytes
 from address 139264, which would fall into the corrupted area.

   No, I believe the want=, have= text is referring to a generation
ID, not a block number. That's not to say that your chunk tree isn't
damaged, though -- I'm just clarifying your interpretation of the
numbers.

   Out of interest, does mounting with -o recovery help at all? (I'm
not expecting it to do much if your chunk tree's gone, but it might do
something).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Eighth Army Push Bottles Up Germans -- WWII newspaper ---  
 headline (possibly apocryphal)  




Re: BTRF - Storage Usage

2012-09-27 Thread Hugo Mills
On Thu, Sep 27, 2012 at 12:44:27PM +0200, Sébastien Maury wrote:
 I've installed a new server using btrfs for my root partition (/).
 
 It uses snapper for snapshots management and all seems to work pretty fine.
 
 My problem is to be able to know the remaining REAL free space in my  
 partition.

   This is in the FAQ: 
https://btrfs.wiki.kernel.org/index.php/FAQ#Why_are_there_so_many_ways_to_check_the_amount_of_free_space.3F

   Short answer: you can't know in general.

   Longer answer -- see below.

 Using different commands, i have different results, and i don't know  
 how to interpret them correctly :

 poivron:~ # btrfs filesystem show /dev/sda3
 Label: none  uuid: 9e68b667-f9f9-490f-9da1-ae4e91558212
  Total devices 1 FS bytes used 2.58GB
  devid    1 size 131.64GB used 10.04GB path /dev/sda3

   You have 131.64 GiB of raw storage in your filesystem. Of that,
10.04 GiB is currently allocated for use by the FS (and it will take
more as it needs it).

 poivron:~ # btrfs filesystem df /
 Data: total=4.01GB, used=2.16GB

   4.01 GiB of the 10.04 GiB allocation is assigned for use by data,
and 2.16 GiB of that allocation actually contains data.

 System, DUP: total=8.00MB, used=4.00KB

   16 MiB (=2*8.00 MiB) of the 10.04 GiB allocation is assigned for
use as two copies of the system data. There is 4 KiB of system data
actually used.

 System: total=4.00MB, used=0.00
 Metadata, DUP: total=3.00GB, used=429.16MB

   6 GiB (=2*3.00 GiB) of your 10.04 GiB allocation is assigned for
use as metadata, with two copies (DUP) being kept. 429.16 MiB of the
3.00 GiB is currently in use.

 Metadata: total=8.00MB, used=0.00

 poivron:~ #  df -hP /
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/sda3   132G  3.0G  124G   3% /

   Plain old df can't handle the truth, so this is at best only a hint
at what's actually happening. When Avail reaches zero, your FS is
probably full. Other than that, you can't necessarily say very much.
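
   To make the arithmetic above concrete, here is a minimal Python
sketch that adds up the raw allocation from the btrfs fi df figures.
The numbers are copied from the output quoted above; the list layout
and the names are made up for illustration.

```python
# Values from the `btrfs filesystem df /` output above, converted to GiB.
MIB = 1.0 / 1024  # one MiB expressed in GiB

chunks = [
    # (type, profile, "total" as reported by btrfs fi df, in GiB)
    ("Data",     "single", 4.01),
    ("System",   "DUP",    8.00 * MIB),
    ("System",   "single", 4.00 * MIB),
    ("Metadata", "DUP",    3.00),
    ("Metadata", "single", 8.00 * MIB),
]

# Number of copies written to disk for each profile.
COPIES = {"single": 1, "DUP": 2}

def raw_allocated(chunks):
    """Sum the on-disk space taken by all chunks, counting DUP twice."""
    return sum(total * COPIES[profile] for _type, profile, total in chunks)

print("raw allocation: %.2f GiB" % raw_allocated(chunks))
```

   The result rounds to the 10.04 GiB that btrfs fi show reports as
"used" for the device, which is the link between the two commands.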

 ===
 
 Please help me understand and interpret this information so that I can
 know as accurately as possible what my real remaining space is, and what
 space is used by what.
 
 Also, I don't really understand the output of the command btrfs
 filesystem df / : what exactly are Data, System DUP, System
 total, Metadata DUP and Metadata total ?

   This should all be covered in the glossary on the website:
https://btrfs.wiki.kernel.org/index.php/Glossary

   Data is the contents of your files. Metadata is all the other stuff
that the FS needs in order to store your files -- directory
structures, permissions, locations of the file data, that kind of
thing. System is a particular bit of the metadata (the chunk tree)
which governs an internal physical/virtual mapping, and which needs to
be read before anything else can make any kind of sense.

   DUP is a bit like RAID-1: anything stored in a DUP chunk is
actually written to two different places on the disk, and can help
recovery in the case of physical disk corruption (e.g. bad blocks,
head crash).

 ==
 
 Here is some additional information:
 poivron:~ # uname -a
 Linux poivron 3.0.26-0.7-default #1 SMP Tue Apr 17 10:27:57 UTC 2012  
 (3829766) x86_64 x86_64 x86_64 GNU/Linux

   You [probably(*)] need to upgrade your kernel as soon as possible.
btrfs code moves very fast, and 3.0 has significant bugs in it. You
should be running the latest released kernel -- right now, that's 3.5,
or 3.6-rc7. Next week, it will probably change to 3.6 when Linus makes
the next release. Most distributions have a repository somewhere which
will give you access to new kernels without too much trouble.

   Hugo.

(*) Some of the enterprise distributions do have backported btrfs
fixes in their apparently older kernels.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   ---   __(_'  Squeak!   ---   




Re: BTRF - Storage Usage

2012-09-27 Thread Hugo Mills
On Thu, Sep 27, 2012 at 01:25:58PM +0200, Sébastien Maury wrote:
 Hi,
 
 Thanks for the quick reply, this clarifies lots of things for me.
 I had read the articles you mentioned, but I must admit that your
 explanations based on my examples make things even clearer.
 
 Also, if I understand things properly, snapshot sizes aren't included
 in the btrfs filesystem show command output ?
 So, using, for example, du -sh /.snapshots is a correct way to
 determine the disk usage of my snapshots ?

   Disk usage of a snapshot has two different answers:

1) The total size of the files listed in the snapshot, which you can
   get from du.

2) The amount of space that would be freed up by deleting the
   snapshot, which isn't currently available, but probably will be
   soon. (The additional bookkeeping code was part of the qgroups
   patches, which are in 3.6).
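
   The difference between those two answers can be shown with a toy
model. This is only a sketch, not how btrfs or qgroups do their
accounting: each snapshot is represented as a made-up mapping from
extent id to size, and an extent id that appears in several snapshots
is treated as shared.

```python
def snapshot_usage(snapshots, name):
    """Return (apparent, exclusive) sizes for one snapshot.

    `snapshots` maps a snapshot name to {extent_id: size}; an extent id
    that appears under several names is shared between them.
    """
    own = snapshots[name]
    others = set()
    for other_name, extents in snapshots.items():
        if other_name != name:
            others.update(extents)
    # Answer 1: everything the snapshot references (roughly what du sees).
    apparent = sum(own.values())
    # Answer 2: only what would be freed by deleting this snapshot.
    exclusive = sum(size for ext, size in own.items() if ext not in others)
    return apparent, exclusive

# Hypothetical data: extent "a" is shared by both snapshots.
snapshots = {
    "snap1": {"a": 100, "b": 50},
    "snap2": {"a": 100, "c": 30},
}
print(snapshot_usage(snapshots, "snap1"))
```

   Adding up the first number across snapshots counts the shared
extent once per snapshot, which is why du totals can exceed the space
really in use.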

 I will see with the people of my company in charge of maintaining  
 distributions to provide us a more recent kernel.
 
 PS : I use SLES 11 SP2 distribution.

   OK, that one's actually one of the few that does keep proper
backports: 
https://btrfs.wiki.kernel.org/index.php/Getting_started#Distro_support

   That said, I don't know how good they are at keeping up -- probably
pretty good, but other people here may be able to answer that better.

   Hugo.


Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Hugo Mills
On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
 On Thu, 27 Sep 2012 23:02:35 +0200
 Goffredo Baroncelli kreij...@libero.it wrote:
 
  Sorry for the space error:
  Below a more correct example
  
  $ btrfs filesystem disk-free /
  Summary:
  Total:                    135.00GB
  Allocated:                 10.51GB
  Unallocated:              124.49GB
  Free_(Estimated)           86.56GB
  Average_disk_efficiency:      62 %
 
 How do you estimate Free here? Sorry I didn't check the source code in git,
 but from the Details below nothing leads me to believe that this FS is
 doomed to only be able to usefully utilize only ~86GB of the partition, and
 not more.
 
 Are you ready to answer the flood of questions from people why their disk is
 only 62% efficient, and how to tune it to 100%? :-)

   Data_to_disk_ratio, maybe?

 Why use underscores instead of spaces?

   So that you can use, say, read in the shell to extract data from
each line. To that end, there should be a space between the value and
the unit throughout.
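
   As a sketch of the parsing argument (in Python rather than shell
read, purely for illustration; the function name is made up): with
underscores in the label and a space before the unit, every line
splits into exactly three whitespace-separated fields.

```python
def parse_summary_line(line):
    """Split 'Label: value unit' into its three whitespace-separated fields."""
    label, value, unit = line.split()
    return label.rstrip(":"), float(value), unit

print(parse_summary_line("Free_(Estimated):  86.56 GiB"))

# The same approach breaks when the label contains a space, or when the
# unit is glued to the value: the field count is no longer three.
for bad in ("Free (Estimated):  86.56 GiB", "Free_(Estimated):  86.56GiB"):
    try:
        parse_summary_line(bad)
    except ValueError as exc:
        print("cannot parse %r: %s" % (bad, exc))
```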

  Details:
  Chunk-type  Mode     Allocated       Used       Free
  ----------  ------   ---------   --------   --------

   Minor thing: The underlines are largely superfluous. Few basic CL
tools I can think of use them.

  Data        Single      4.01GB     2.16GB     1.87GB
  System      DUP        16.00MB     4.00KB     7.99MB
  System      Single      4.00MB       0.00     4.00MB
  Metadata    DUP         6.00GB   429.16MB     2.57GB
  Metadata    Single      8.00MB       0.00     8.00MB

   I think we need another column here, to indicate how much *actual*
disk space is used by each row, so adding up that column will give you
the Allocated value in the first clause. I think that's probably the
biggest cause of confusion. Raw alloc., maybe, and use the term
raw somewhere in the first clause to hammer the point home.

   My only concern here is that we're a bit too close to the existing
solution (albeit merging the two sets of output), which has proven
itself over time to be somewhat confusing. I think the Alloc_Raw
column is the minimum necessary to link the two in some easily
determinable way. Adding totals to Alloc_Raw, and Used (but not Free
or Alloc) would help, I think. I don't think it's useful to add them
to the Free or Alloc columns, because those figures change as the FS
allocates chunks, and we'll end up with people querying the fact that
the total of Free doesn't add up to any of the figures in the
summary.

   Say, something like this:

Summary_(Raw):
  Total:                    135.00 GiB
  Allocated:                 10.51 GiB
  Unallocated:              124.49 GiB
  Free_(Estimated):          86.56 GiB
  Average_disk_efficiency:      62 %

Details:
  Chunk_type  Mode     Alloc_Raw      Alloc        Used       Free
  Data        Single    4.01 GiB   4.01 GiB    2.16 GiB   1.87 GiB
  System      DUP      32.00 MiB  16.00 MiB    4.00 KiB   7.99 MiB
  System      Single    4.00 MiB   4.00 MiB    0.00 B     4.00 MiB
  Metadata    DUP      12.00 GiB   6.00 GiB  429.16 MiB   2.57 GiB
  Metadata    Single    8.00 MiB   8.00 MiB    0.00 B     8.00 MiB
  Total                16.04 GiB               2.59 GiB

   The other thing is that there should be a switch (or possibly two)
to give highly machine-readable versions of the output -- no units
(units as bytes by default, with other units settable by a switch),
tab-separated, possibly a different option for each of the above
output clauses.
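
   A minimal sketch of what such machine-readable output could look
like, in Python for brevity. Nothing here is an existing btrfs-progs
option or function: the row layout, the function name and the unit
parameter are all assumptions made for illustration.

```python
KIB = 1024
MIB = 1024 ** 2

# Hypothetical rows: chunk type, mode, then Alloc_Raw, Alloc and Used in bytes.
rows = [
    ("System",   "DUP",    32 * MIB, 16 * MIB, 4 * KIB),
    ("System",   "Single",  4 * MIB,  4 * MIB, 0),
    ("Metadata", "Single",  8 * MIB,  8 * MIB, 0),
]

def machine_readable(rows, unit=1):
    """Render rows as tab-separated fields with bare numbers and no unit suffix.

    `unit` is the divisor applied to every size: 1 for bytes (the default),
    1024 for KiB, and so on.
    """
    lines = []
    for chunk_type, mode, *sizes in rows:
        fields = [chunk_type, mode] + [str(size // unit) for size in sizes]
        lines.append("\t".join(fields))
    return "\n".join(lines)

print(machine_readable(rows))
print(machine_readable(rows, unit=KIB))
```

   Output in this shape can be fed straight to cut or awk without any
knowledge of the unit suffixes.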

   Ultimately, I think the bikeshed should be turquoise.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Python is executable pseudocode; perl ---  
is executable line-noise.




Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Hugo Mills
   Hi, Goffredo,

On Fri, Sep 28, 2012 at 07:27:16PM +0200, Goffredo Baroncelli wrote:
 On 09/28/2012 10:58 AM, Hugo Mills wrote:
 On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
 On Thu, 27 Sep 2012 23:02:35 +0200
 Goffredo Baroncellikreij...@libero.it  wrote:
 
 [...]
[...]
 Details:
  Chunk-type  Mode     Allocated       Used       Free
  ----------  ------   ---------   --------   --------
[...]
  Data        Single      4.01GB     2.16GB     1.87GB
  System      DUP        16.00MB     4.00KB     7.99MB
  System      Single      4.00MB       0.00     4.00MB
  Metadata    DUP         6.00GB   429.16MB     2.57GB
  Metadata    Single      8.00MB       0.00     8.00MB
 
 I think we need another column here, to indicate how much *actual*
 disk space is used by each row, so adding up that column will give you
 the Allocated value in the first clause. I think that's probably the
 biggest cause of confusion. Raw alloc., maybe, and use the term
 raw somewhere in the first clause to hammer the point home.
 
 I think that there is a little misunderstanding. We are saying the
 same thing. Only I call allocated what you call raw alloc

   OK, I think we need both. We need to indicate somewhere (in the
Details section in my version) both the total number of bits of rust
used and the amount of data stored. It's not good to ask the user to
know that they need to multiply/divide by two for certain storage
modes (or even more complicated for RAID-5/6). Somewhere, they will
find that values change twice as fast as they expect (or at half the
speed), and that causes problems. We need to find some way of
connecting the two in a way that makes it reasonably obvious where the
figures come from.

 My only concern here is that we're a bit too close to the existing
 solution (albeit merging the two sets of output), which has proven
 itself over time to be somewhat confusing. I think the Alloc_Raw
 column is the minimum necessary to link the two in some easily
 determinable way. Adding totals to Alloc_Raw, and Used (but not Free
 or Alloc) would help, I think. I don't think it's useful to add them
 to the Free or Alloc columns, because those figures change as the FS
 allocates chunks, and we'll end up with people querying the fact that
 the total of Free doesn't add up to any of the figures in the
 summary.
 
 Say, something like this:
 
 Summary_(Raw):
   Total:                    135.00 GiB
   Allocated:                 10.51 GiB
   Unallocated:              124.49 GiB
   Free_(Estimated):          86.56 GiB
   Average_disk_efficiency:      62 %
 
 Details:
   Chunk_type  Mode     Alloc_Raw      Alloc        Used       Free
   Data        Single    4.01 GiB   4.01 GiB    2.16 GiB   1.87 GiB
   System      DUP      32.00 MiB  16.00 MiB    4.00 KiB   7.99 MiB
   System      Single    4.00 MiB   4.00 MiB    0.00 B     4.00 MiB
   Metadata    DUP      12.00 GiB   6.00 GiB  429.16 MiB   2.57 GiB
   Metadata    Single    8.00 MiB   8.00 MiB    0.00 B     8.00 MiB
   Total                16.04 GiB               2.59 GiB
 
 The other thing is that there should be a switch (or possibly two)
 to give highly machine-readable versions of the output -- no units
 (units as bytes by default, with other units settable by a switch),
 tab-separated, possibly a different option for each of the above
 output clauses.
 I fully agree. But my first concern was about the wording (in fact,
 even though we are saying the same thing, you didn't understand me).
 
 Let me propose the following:
 
 Summary:
   Disk_size:                135.00 GiB
   Disk_allocated:            10.51 GiB
   Disk_unallocated:         124.49 GiB
   Used:                       2.59 GiB
   Free_(Estimated):          91.93 GiB
   Average_disk_efficiency:      70 %
 
 Details:
   Chunk-type  Mode     Disk-allocated       Used  Available
   Data        Single           4.01GB     2.16GB     1.87GB
   System      DUP             16.00MB     4.00KB     7.99MB
   System      Single           4.00MB       0.00     4.00MB
   Metadata    DUP              6.00GB   429.16MB     2.57GB
   Metadata    Single           8.00MB       0.00     8.00MB
 
 
 
 Where:
   Disk-allocated    - space used on the disk by the chunk
   Disk-size         - size of the disk
   Disk-unallocated  - disk not used in any chunk
   Used              - space used by the files/metadata

   The problem here is that if you're using raw storage, the Used
value in the second stanza grows twice as fast as the user expects. I
think this second stanza should at minimum include the cooked values
used in btrfs fi df, because those reflect the user's experience. Then
adding [some of?] the raw values you've got here to help connect the
values to the raw data in the first stanza of output.

   As I said above, it's the connection between I wrote a 1GiB file
to my

Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Hugo Mills
On Sat, Sep 29, 2012 at 12:02:23AM +0600, Roman Mamedov wrote:
 On Fri, 28 Sep 2012 18:44:07 +0200
 Goffredo Baroncelli kreij...@inwind.it wrote:
 
  This means that the ratio of space physically allocated on the disk and
  the space available is 7GB/10GB = 0.7 . So on 135GB of disk, only 94GB
  are available.
 
 You assume metadata allocation will always grow linearly with data, which is
 not true. So in my opinion it is not a good estimate.

   No, but it's the best model we have right now. (And probably about
the best model we will have, without knowledge of the future
intentions of the user). Without inlining file data, the metadata is
dominated by checksums, which is a linear relationship (approx
1000:1). With inlining file data, metadata is probably dominated by
inline data; assuming the ratio of small-to-large files on the FS
remains unchanged in future, a linear relationship also applies. For
general usage, I'm happy to assume that the current ratio of data to
metadata will remain largely unchanged over the lifetime of the FS.
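
   The linear model discussed here fits in a few lines. This is a
minimal sketch using the 7GB/10GB example quoted above, not the code
from the patch; the function name is made up. The checksum figure is
an assumption (4-byte checksums on 4 KiB blocks) used only to show
where a ratio of roughly 1000:1 can come from.

```python
def estimate_usable(total_raw, allocated_raw, allocated_logical):
    """Scale the whole device by the ratio seen in the allocated part.

    Assumes the current logical/raw ratio holds for the rest of the disk,
    which is exactly the linear assumption being debated.
    """
    ratio = allocated_logical / allocated_raw
    return total_raw * ratio

# 7 GB of usable space in 10 GB of raw allocation, on a 135 GB disk.
print("estimated usable: %.1f GB" % estimate_usable(135, 10, 7))

# Assumption: one 4-byte checksum per 4 KiB data block.
CSUM_BYTES = 4
BLOCK_BYTES = 4096
print("data bytes per checksum byte:", BLOCK_BYTES // CSUM_BYTES)
```

   The estimate is only as good as the assumption that the future
data looks like the existing data, which is the caveat Roman raises.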

   Why use underscores instead of spaces?
  
  Simplify the parsing in scripts
 
 I think it looks awkward and is not warranted since this is a primarily
 user-facing utility. Also none of the other similar tools shy from having
 spaces anywhere they need to, e.g.
 
 # mdadm --detail /dev/md0
 /dev/md0:
         Version : 1.2
   Creation Time : Wed May 25 00:07:38 2011
      Raid Level : raid5
      Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
   Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
    Raid Devices : 5
   Total Devices : 5
     Persistence : Superblock is persistent
 
   Intent Bitmap : Internal
 
     Update Time : Fri Sep 28 21:20:51 2012
           State : active
  Active Devices : 5
 Working Devices : 5
  Failed Devices : 0
   Spare Devices : 0
 
          Layout : left-symmetric
      Chunk Size : 64K
 
            Name : avdeb:0  (local to host avdeb)
            UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
          Events : 14254
 
     Number   Major   Minor   RaidDevice State
        7       8       17        0      active sync   /dev/sdb1
        6       8       33        1      active sync   /dev/sdc1
        3       8       65        2      active sync   /dev/sde1
        4       8       49        3      active sync   /dev/sdd1
        5       8       81        4      active sync   /dev/sdf1
 
 # lvdisplay 
   --- Logical volume ---
   LV Path                /dev/alpha/lv1
   LV Name                lv1
   VG Name                alpha
   LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
   LV Write Access        read/write
   LV Creation host, time , 
   LV Status              available
   # open                 0
   LV Size                3.52 TiB
   Current LE             115431
   Segments               3
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     4096
   Block device           252:0

   ... and I've always found those hard to deal with in scripts. :)

   (But they do have plumbing options, to use the git terminology,
so I'd be happy with having a parsable output option).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hey, Virtual Memory! Now I can have a *really big* ramdisk! ---   




Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Hugo Mills
   Looks good. Only a few comments, inline.

On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote:
 $ ./btrfs filesystem df --help
 usage: btrfs filesystem disk-usage [-d][-s][-k] path [path..]
 
 Show space usage information for a mount point(s).
 
 -k  Set KB (1024 bytes) as unit
 -s  Don't show the summary section
 -d  Don't show the detail section

   These are kind of logical, but I think would be hard to remember
the right way round. I would suggest swapping the actions of the
switches, and rewording the help:

-s Show only summary section
-d Show only detail section

 $ ./btrfs filesystem df /
 Path: /
 Summary:
   Disk_size:             72.57GB

                               ^ space between the value and the
                                 unit (as ISO says), throughout.
                                 This also makes it easier to
                                 parse, if anyone wants to.

   Also, use kB, MB, GB, TB for powers-of-ten based units, and KiB,
MiB, GiB, TiB for powers-of-two based units, please. I don't care
which you report in, but please do make the distinction. (And note
that it's kB with a lower case k, but KiB with an upper case K). This
brings us in line with the relevant ISO and IEEE standards.
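
   For illustration only, a formatter that follows both conventions
could look like the Python sketch below. It is not the print_sizes()
code in btrfs-progs; the function name and its binary parameter are
made up here.

```python
def format_size(nbytes, binary=True):
    """Format a byte count with a space between the value and the unit.

    binary=True uses powers of two (KiB, MiB, ...); binary=False uses
    powers of ten (kB, MB, ...), with a lower-case k.
    """
    if binary:
        base, units = 1024, ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        base, units = 1000, ["B", "kB", "MB", "GB", "TB"]
    value = float(nbytes)
    # Step up through the units until the value fits below the base.
    for unit in units[:-1]:
        if value < base:
            return "%.2f %s" % (value, unit)
        value /= base
    return "%.2f %s" % (value, units[-1])

print(format_size(1024 ** 3))
print(format_size(10 ** 9, binary=False))
```

   The same byte count gives different numbers in the two systems, so
the unit has to say which one was used.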

   Disk_allocated:        25.10GB
   Disk_unallocated:      47.48GB
   Logical_size:          23.06GB
   Used:                  11.01GB
   Free_(Estimated):      55.66GB(Max: 59.52GB, Min: 35.78GB)
   Data_to_disk_ratio:       92 %
 
 Details:
   Chunk-type  Mode    Chunk-size Logical-size      Used
   Data        Single     21.01GB      21.01GB   10.34GB
   System      DUP        80.00MB      40.00MB    4.00KB
   System      Single      4.00MB       4.00MB      0.00
   Metadata    DUP         4.00GB       2.00GB  686.93MB
   Metadata    Single      8.00MB       8.00MB      0.00

   Why are the field headings here using - where the field headings in
the first section used _? Should you be using _ in both places?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- The most exciting phrase to hear in science, the one that ---
   heralds new discoveries,  is not Eureka!,   
  but That's funny...  




Re: [PATCH][BTRFS-PROGS][V1] btrfs filesystem df

2012-10-03 Thread Hugo Mills
On Wed, Oct 03, 2012 at 06:17:53PM +0200, Goffredo Baroncelli wrote:
 On 10/03/2012 01:56 PM, Hugo Mills wrote:
 Looks good. Only a few comments, inline.
 
 On Wed, Oct 03, 2012 at 01:43:14PM +0200, Goffredo Baroncelli wrote:
[snip]
 Also, use kB, MB, GB, TB for powers-of-ten based units, and KiB,
 MiB, GiB, TiB for powers-of-two based units, please. I don't care
 which you report in, but please do make the distinction. (And note
 that it's kB with a lower case k, but KiB with an upper case K). This
 brings us in line with the relevant ISO and IEEE standards.
 
 I forgot to reply to you when you raised this question the first time.
 Even though I am inclined to accept your suggestions, this change is
 not related to my patches. My code uses the function print_sizes(),
 which is quite old (about 2008). This function is used in a lot of
 places, which suggests addressing this issue with a separate patch.

   OK.

[snip]
 Why are the field headings here using - where the field headings in
 the first section used _? Should you be using _ in both places?
 
 2 persons highlighted that :-( ... I will update the code

   It's just a niggle, really, but it's an obvious one.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- There is no dark side to the Moon, really. As a matter of ---
  fact,  it's all dark.  




Re: [PATCH 1/2] Update btrfs filesystem df command

2012-10-03 Thread Hugo Mills
On Wed, Oct 03, 2012 at 11:34:00PM +0300, Ilya Dryomov wrote:
 On Wed, Oct 03, 2012 at 07:22:31PM +0200, Goffredo Baroncelli wrote:
[snip]
  +static const char * const cmd_disk_free_usage[] = {
  +   "btrfs filesystem df [-d|-s][-k] path [path..]",
  +   "Show space usage information for a mount point(s).",
  +   "",
  +   "-k\tSet KB (1024 bytes) as unit",
  +   "-s\tShow the summary section only",
  +   "-d\tShow the detail section only",
  +   NULL
  +};
  +
  +static int cmd_disk_free(int argc, char **argv)
  +{
  +
  +   int flags=DF_SHOW_SUMMARY|DF_SHOW_DETAIL|DF_HUMAN_UNIT;
  +   int i, more_than_one=0;
  +
  +   optind = 1;
  +   while(1){
  +       char c = getopt(argc, argv, "dsk");
  +       if(c<0)
  +           break;
  +       switch(c){
  +       case 'd':
  +           flags &= ~DF_SHOW_SUMMARY;
  +           break;
  +       case 's':
  +           flags &= ~DF_SHOW_DETAIL;
  +           break;
  +       case 'k':
  +           flags &= ~DF_HUMAN_UNIT;
  +           break;
  +       default:
  +           usage(cmd_disk_free_usage);
  +       }
  +   }
  +
  +   if( !(flags & (DF_SHOW_SUMMARY|DF_SHOW_DETAIL)) ){
  +       fprintf(stderr, "btrfs filesystem df: it is not possible to specify -s AND -d\n");
 
 This doesn't look right at all.  You are adding two switches and
 specifying both of them is an error?  A little too much for a command
 whose job is to do some basic math and pretty-print the result.
 
 How about displaying just the summary by default and then adding a
 *single* switch (-v or whatever) for summary+details?

   I'd prefer to see both sections by default. The reason for this is
that without both sections, people tend to get confused because they
don't know they're looking at half the story (e.g. some numbers change
twice as fast as they think they should).

   I think supplying both options should probably show both sections
again, and make it not an error to do so, but I'm happy either way.
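
   The behaviour I have in mind can be sketched as follows. This is
Python with argparse rather than the C getopt loop in the patch, and
the function name and program name are made up; it only shows the
intended mapping from switches to sections.

```python
import argparse

def sections_to_show(argv):
    """Return the set of output sections selected by the given switches."""
    parser = argparse.ArgumentParser(prog="df-sketch")
    parser.add_argument("-s", action="store_true", help="show the summary section")
    parser.add_argument("-d", action="store_true", help="show the detail section")
    args = parser.parse_args(argv)
    # No switch at all: show everything, so nobody sees half the story.
    if not args.s and not args.d:
        return {"summary", "detail"}
    shown = set()
    if args.s:
        shown.add("summary")
    if args.d:
        shown.add("detail")
    # Both switches together simply select both sections; it is not an error.
    return shown

print(sorted(sections_to_show([])))
print(sorted(sections_to_show(["-s"])))
print(sorted(sections_to_show(["-s", "-d"])))
```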

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- He's a nutcase, you know. There's no getting away from it -- ---  
 he'll end up with a knighthood 




Re: Will RAID have issues with disks that spin down?

2012-10-04 Thread Hugo Mills
On Thu, Oct 04, 2012 at 10:36:43AM -0400, Ken D'Ambrosio wrote:
 Hi.  I know that several hardware RAID solutions have issues with
 disks that spin down when idle; the time to spin back up -- usually
 on the order of five seconds -- causes unhappy timeouts, etc.  I was
 wondering if that would be an issue with RAID a-la btrfs?

   I have (some of(*)) the disks in my 8-drive RAID-1 btrfs array set
to spin down after 10 minutes of no use. I've not had a problem with
it so far. So I'd say it's not an issue from my limited testing.

   Hugo.

(*) Damn you, Samsung!

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no.  3: Military Intelligence ---   




Re: [PATCH] Fits: tool to parse stream

2012-10-13 Thread Hugo Mills
On Sat, Oct 13, 2012 at 09:02:27AM +0200, Arne Jansen wrote:
 On 10/12/12 15:32, Arne Jansen wrote:
 
  The idea of the btrfs send stream format was to generate it in a way that
  it is easy to receive on different platforms. Thus the proposed name FITS,
  for Filesystem Incremental Backup Stream. We should also build the tools to
  receive the stream on different platforms.
 
 I meant to write 'Filesystem Incremental Transport Stream', but, as
 Andrey Kuzmin pointed out, the name FITS is already taken. As the
 'Backup' slipped in somehow, FIBS might be an alternative. Any
 thoughts?

   Fibs is a slang term for lies. Probably not ideal.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Strive for apathy! ---




Re: Can not Mount btrfs No Space left

2012-10-15 Thread Hugo Mills
On Mon, Oct 15, 2012 at 03:52:15PM -0400, Shawn Dakin wrote:
 I have a btrfs volume that will not mount due to No space on device
 I would gladly free up some space if I could only mount the volume.
 Does anyone have a trick to getting this volume back up and running?
 Any help would be great!!

   Start with a 3.6 kernel (which has lots of ENOSPC fixes in it). Try
mounting with -o ro, which won't allow you to modify anything but may
show you if the FS is at least mountable in that state (and will give
you the capability to copy the data elsewhere in extremis). If you're
lucky, that mount may then allow you to mount it again without the -o
ro.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no. 6: Mature Student ---   




Re: device delete, error removing device

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 12:02:08AM -0600, Chris Murphy wrote:
 
 On Oct 21, 2012, at 10:32 PM, Chris Murphy li...@colorremedies.com wrote:
 
  This is stock Fedora 18 beta kernel, 3.6.1-1.fc18.x86_64 #1 SMP Mon Oct 8 
  17:19:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
 
 Probably not a good idea to omit this is a beta *test candidate* not a beta. 
 
 Two things that make this possibly not realistic:
 
 1. The virtual disks are obviously very small, 3GB each with the 4th one only 
 12GB.
 
 2. The original 3 device volume was ~97% full with a single large file prior 
 to adding the 4th device. Approximately 313MB free space remained on the 
 volume.

   I'm not entirely sure what's going on here(*), but it looks like an
awkward interaction between the unequal sizes of the devices, the fact
that three of them are very small, and the RAID-0/RAID-1 on
data/metadata respectively.

   You can't relocate any of the data chunks, because RAID-0 requires
at least two chunks, and all your data chunks are more than 50% full,
so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB
chunk on the remaining space on the small disk, which is the only way
it could proceed.

   You _may_ be able to get some more success by changing the data to
single:

# btrfs balance start -dconvert=single /mountpoint

   You may also possibly be able to reclaim some metadata space with:

# btrfs balance start -m /mountpoint

but I think that's unlikely.

   Hugo.

(*) It may be an as-yet-undiscovered reservation problem, in which
case you get to see Josef scream loudly and hide under his desk,
gibbering.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If it ain't broke,  hit it again. ---



Re: RAID 5/6

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 10:58:07AM -0500, Michael wrote:
 Does anyone know when RAID 5/6 are planned to be included in the
 Kernel?

   This is in the FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Can_I_use_RAID.5B56.5D_on_my_Btrfs_filesystem.3F

   Short answer: Not yet, probably soon.

 I am starting to buy parts for my next computer and would very
 much like to use BTRFS because I want a FS that can grow and also
 recover from undetected read errors - it will be large enough that
 these are possible. I'm hoping that it will be available for use in
 the coming months.

   You can switch storage types on the fly, so you could at least
start with RAID-1, and then restripe to RAID-5 (or -6) when it's
stable enough for you. This assumes that you can manage to use RAID-1
in the first place and expand later.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- There's more than one way to do it is not a commandment. It ---  
   is a dire warning.



Re: device delete, error removing device

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 10:42:18AM -0600, Chris Murphy wrote:
 Thanks for the response Hugo,
 
 On Oct 22, 2012, at 3:19 AM, Hugo Mills h...@carfax.org.uk wrote:
 
  I'm not entirely sure what's going on here(*), but it looks like an
  awkward interaction between the unequal sizes of the devices, the fact
  that three of them are very small, and the RAID-0/RAID-1 on
  data/metadata respectively.
 
 I'm fine accepting the devices are very small and the original file system 
 was packed completely full: to the point this is effectively sabotage. 
 
 The idea was merely to test how a full (I was aiming more for 90%, not 97%, 
 oops) volume handles being migrated to a replacement disk, which I think for 
 a typical user would be larger not the same, knowing in advance that not all 
 of the space on the new disk is usable. And I was doing it at a scale
 reduced by one order of magnitude, to save space.
 
 
  You can't relocate any of the data chunks, because RAID-0 requires
  at least two chunks, and all your data chunks are more than 50% full,
  so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB
  chunk on the remaining space on the small disk, which is the only way
  it could proceed.
 
 Interesting. So the way device delete moves extents is not at all similar 
 to how LVM pvmove moves extents, which is unidirectional (away from the 
 device being demoted). My, seemingly flawed, expectation was that device 
 delete would cause extents on the deleted device to be moved to the newly 
 added disk.

   It's more like a balance which moves everything that has some (part
of its) existence on a device. So when you have RAID-0 or RAID-1 data,
all of the related chunks on other disks get moved too (so in RAID-1,
it's the mirror chunk as well as the chunk on the removed disk that
gets rewritten).

 If I add yet another 12GB virtual disk, sdf, and then attempt a delete, it 
 works, no errors. Result:
 [root@f18v ~]# btrfs device delete /dev/sdb /mnt
 [root@f18v ~]# btrfs fi show
 failed to read /dev/sr0
 Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
   Total devices 5 FS bytes used 7.52GB
   devid    5 size 12.00GB used 4.17GB path /dev/sdf
   devid    4 size 12.00GB used 4.62GB path /dev/sde
   devid    3 size 3.00GB used 2.68GB path /dev/sdd
   devid    2 size 3.00GB used 2.68GB path /dev/sdc
   *** Some devices missing
 
 However, I think that last line is a bug. When I
 
 [root@f18v ~]# btrfs device delete missing /mnt
 
 I get
 
 [ 2152.257163] btrfs: no missing devices found to remove
 
 So they're missing but not missing?

   If you run sync, or wait for 30 seconds, you'll find that fi show
shows the correct information again -- btrfs fi show reads the
superblocks directly, and if you run it immediately after the dev del,
they've not been flushed back to disk yet.

  btrfs balance start -dconvert=single /mountpoint

 Yeah that's perhaps a better starting point for many regular Joe
 users setting up a multiple device btrfs volume, in particular where
 different sized disks can be anticipated.

   I think we should probably default to single on multi-device
filesystems, not RAID-0, as this kind of problem bites a lot of
people, particularly when trying to drop the second disk in a pair.

   In a similar vein, I'd suggest that an automatic downgrade from
RAID-1 to DUP metadata on removing one device from a 2-device array
should also be done, but I suspect there are some good reasons for not
doing that which I've not thought of. This has also bitten a lot of
people in the past.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- There's more than one way to do it is not a commandment. It ---  
   is a dire warning.



Re: device delete, error removing device

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 01:36:31PM -0600, Chris Murphy wrote:
 On Oct 22, 2012, at 11:18 AM, Hugo Mills h...@carfax.org.uk wrote:
  
  It's more like a balance which moves everything that has some (part
  of its) existence on a device. So when you have RAID-0 or RAID-1 data,
  all of the related chunks on other disks get moved too (so in RAID-1,
  it's the mirror chunk as well as the chunk on the removed disk that
  gets rewritten).
 
 Does this mean device delete depends on an ability to make writes
 to the device being removed? I immediately think of SSD failures,
 which seem to fail writing, while still being able to reliably read.
 Would that behavior inhibit the ability to remove the device from
 the volume?

   No, the device being removed isn't modified at all. (Which causes
its own set of weird problemettes, but I think most of those have gone
away).

  [ 2152.257163] btrfs: no missing devices found to remove
  
  So they're missing but not missing?
  
  If you run sync, or wait for 30 seconds, you'll find that fi show
  shows the correct information again -- btrfs fi show reads the
  superblocks directly, and if you run it immediately after the dev del,
  they've not been flushed back to disk yet.

 Even after an hour, btrfs fi show says there are missing devices.
 After mkfs.btrfs on that missing device, 'btrfs fi show' no longer
 shows the missing device message.

   Hmm. Someone had this on IRC yesterday. It sounds like something's
not properly destroying the superblock(s) on the removed device.

  I think we should probably default to single on multi-device
  filesystems, not RAID-0, as this kind of problem bites a lot of
  people, particularly when trying to drop the second disk in a pair.
 
 I'm not thinking of an obvious advantage raid0 has over single other
 than performance. It seems the more common general purpose use case
 is better served by single, especially the likelihood of volumes
 being grown with arbitrary drive capacities.

   Indeed.

 I found this [1] thread discussing a case where a -d single volume
 is upgraded to the raid0 profile. I'm not finding this to be the
 case when trying it today. mkfs.btrfs on 1 drive, then adding a 2nd
 drive, produces:
 Data: total=8.00MB, used=128.00KB
 System, DUP: total=8.00MB, used=4.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=409.56MB, used=24.00KB
 Metadata: total=8.00MB, used=0.00

 This appears to retain the single profile. This is expected at this
 point? What I find a bit problematic is that metadata is still DUP
 rather than being automatically upgraded to raid1.

   Yes, the automatic single -> RAID-0 upgrade was fixed. If you
haven't run a balance on (at least) the metadata after adding the new
device, then you won't get the DUP -> RAID-1 upgrade on metadata. (I
can tell you haven't run the balance, because you still have the empty
single metadata chunk).
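
   The "empty single chunk next to a DUP one" tell-tale can also be
spotted mechanically. What follows is only a rough sketch in Python,
assuming the btrfs fi df output format quoted above; the names
parse_fi_df and mixed_profiles are made up for illustration and are not
part of btrfs-progs.

```python
import re

# Matches lines such as "Metadata, DUP: total=409.56MB, used=24.00KB"
# or "Data: total=8.00MB, used=128.00KB" (no profile shown means single).
LINE_RE = re.compile(
    r"^\s*(?P<type>\w+)(?:,\s*(?P<profile>[\w-]+))?:\s*"
    r"total=(?P<total>[\d.]+\w*),\s*used=(?P<used>[\d.]+\w*)\s*$"
)


def parse_fi_df(text):
    """Return a list of (type, profile, total, used) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((m["type"], m["profile"] or "single",
                         m["total"], m["used"]))
    return rows


def mixed_profiles(rows):
    """Map each block-group type to its profiles, keeping only types
    that appear with more than one profile."""
    seen = {}
    for btype, profile, _total, _used in rows:
        seen.setdefault(btype, []).append(profile)
    return {t: p for t, p in seen.items() if len(p) > 1}


# The output quoted in the message above.
sample = """\
Data: total=8.00MB, used=128.00KB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=409.56MB, used=24.00KB
Metadata: total=8.00MB, used=0.00
"""
print(mixed_profiles(parse_fi_df(sample)))
```

   For the quoted output this flags both System and Metadata as having
a DUP and a single chunk side by side, which is a hint (no more than
that) that no balance has been run since the device was added.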

 What is the likelihood of a mkfs.btrfs 2+ device change in the
 default data profile from raid0 to single?

   Non-zero. I think it mostly just wants someone to write the patch,
and then beat off any resulting bikeshedding. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  



Re: Naming of subvolumes

2012-10-25 Thread Hugo Mills
On Thu, Oct 25, 2012 at 01:30:20PM +0100, Richard Hughes wrote:
 I'm planning to use btrfs subvolume snapshot -r name in the system
 upgrade functionality[1] if the user is using btrfs for their root file
 system. We've got most of the bits in place already for Fedora 18.
 
 One thing that confuses me is the convention for the naming of
 snapshots. Are there any conventions or prior art there? Can I add
 metadata to the snapshot so that I don't have to encode everything in the
 snapshot name itself?

   How about user xattrs? IIRC, that's the user.* namespace.
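
   A minimal sketch of that idea in Python, assuming a Linux system
and a filesystem that accepts user xattrs. The attribute name
user.snapshot.label and both helper functions are made up for
illustration; nothing here is an established convention.

```python
import os

LABEL_ATTR = "user.snapshot.label"  # hypothetical attribute name


def set_label(path, label):
    """Store a free-form label on a file or directory as a user xattr."""
    os.setxattr(path, LABEL_ATTR, label.encode("utf-8"))


def get_label(path):
    """Return the stored label, or None if it cannot be read
    (attribute not set, or xattrs not supported on this filesystem)."""
    try:
        return os.getxattr(path, LABEL_ATTR).decode("utf-8")
    except OSError:
        return None
```

   Something like set_label("/path/to/snapshot", "before system
upgrade") would then attach the description to the snapshot's top
directory. Note that a read-only snapshot presumably won't accept the
write, so the attribute would have to be set before the snapshot is
made read-only, or be kept somewhere else.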

   The only convention I'm aware of is Ubuntu's use of an @
substitution, where the subvolume to be mounted at / is called @, and
the subvolume to be mounted at /home becomes @home. Both of those
subvolumes are stored in the (otherwise empty) top-level of the
filesystem, which is not mounted in normal operation.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- This chap Anon is writing some perfectly lovely stuff ---  
 at the moment.  



Re: How does btrfs behave on checksum mismatch?

2012-10-27 Thread Hugo Mills
On Sat, Oct 27, 2012 at 09:56:45PM +0000, Michael Kjörling wrote:
 I came across the tidbit that ZFS has a contract guarantee that the
 data read back will either be correct (the checksum computed over the
 data read from the disk matches the checksum stored on disk), or you
 get an I/O error. Obviously, this greatly reduces the probability that
 the data is invalid. (Particularly when taken in combination with the
 disk firmware's own ECC and checksumming.)
 
 With the default options, does btrfs make any similar guarantees? If
 not, then are there any options to force it to make such guarantees?

   It does indeed do the same thing: if the checksum doesn't match the
block, then the alternative block is read (if one exists, e.g. RAID-1,
RAID-10). If that does not exist, or also has a checksum failure, then
EIO is returned.
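
   The read path described above can be modelled in a few lines. This
is a toy sketch in Python, not the kernel code: zlib.crc32 stands in
for crc32c, and the function name and the list-of-copies argument are
assumptions made for the example.

```python
import errno
import zlib


def read_block(copies, stored_csum):
    """Return the first copy whose checksum matches, else raise EIO.

    `copies` holds one bytes object per replica: one entry for single,
    two for RAID-1. zlib.crc32 is a stand-in for btrfs's crc32c.
    """
    for data in copies:
        if zlib.crc32(data) == stored_csum:
            return data
    raise OSError(errno.EIO, "checksum mismatch on all copies")


good = b"hello btrfs"
bad = b"hellp btrfs"          # same block with one corrupted byte
csum = zlib.crc32(good)

print(read_block([bad, good], csum))   # falls back to the second copy
```

   With only the bad copy available, read_block([bad], csum) raises
OSError with errno set to EIO, so the caller never sees the corrupted
data.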

   Hugo.

 I'm interested in this both from a specification and an implementation
 point of view.
 
 The last thing anyone wants is probably undetected bit rot, and with
 today's large drives, even with the quite low bit rot numbers it can
 be a real concern. If even the act of simply successfully reading a
 file guarantees, to the extent of the checksumming algorithm's ability
 to detect changes, that the data read is the same as was once written,
 that would be a major selling point for btrfs for me personally.
 
 The closest I was able to find was that btrfs uses crc32c currently
 for data and metadata checksumming and that this can be turned off if
 so desired (using the nodatasum mount option), but nothing about
 what the file system code does or is supposed to do in the face of a
 checksum mismatch.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- It used to take a lot of talent and a certain type of ---  
upbringing to be perfectly polite and have filthy manners
at the same time. Now all it needs is a computer.



Re: [RFC] New attempt to a better btrfs fi df

2012-10-27 Thread Hugo Mills
On Sun, Oct 28, 2012 at 12:30:44AM +0200, Martin Steigerwald wrote:
 Am Samstag, 27. Oktober 2012 schrieb Michael Kjörling:
  On 27 Oct 2012 18:43 +0200, from mar...@lichtvoll.de (Martin 
 Steigerwald):
   Possibly this could be done tabular as well, like:
   
                    vdb        vdc        vdd
   
   Data, RAID 0     307,25MB   307,25MB   307,25MB
   …
   System, RAID1    -          8MB        8MB
   …
   Unused           2,23GB     2,69GB     2,24GB
  
   
  
   I like this. But what if the filesystem has 100 disks?
  
  Maybe I'm just not familiar enough with btrfs yet to punch an
  immediate hole in the idea, but how about pivoting that table? Columns
  for data values (data, raid 0, system, raid 1, unused, ...) and
  rows for the underlying devices? Something like this, copying the
  numbers from your example. And I'm using colon here rather than comma,
  because I believe that it better captures the intent.
  
             Data: RAID 0   System: RAID 1   Unused
  /dev/vdb   307.25 MB      -                2.23 GB
  /dev/vdc   307.25 MB      8 MB             2.69 GB
  /dev/vdd   307.25 MB      8 MB             2.24 GB
             ============   ==============   =======
  TOTAL      921.75 MB      16 MB            7.16 GB
 
 Hmmm, good idea. I like it this way around.
 
 It would scale better with the number of drives and there is a good way to 
 place the totals.
 
 I wonder about how to possibly include the used part of each tree. With 
 mostly 5 columns it might be doable.

   Note that this could get arbitrarily wide in the presence of the
(planned) per-object replication config. Otherwise, it works. The
width is likely to grow more slowly than the length, though,
so this way round is probably the better option. IMO. Eggshell blue is
good enough. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Some days,  it's just not worth gnawing through the straps. ---   



Re: How does btrfs behave on checksum mismatch?

2012-10-28 Thread Hugo Mills
On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:
 Am Sonntag, 28. Oktober 2012 schrieb Ronnie Collinson:
  In a raid1 situation, it will also rewrite the affected data, on the
  drive that failed the checksum
 
 Will it do so without an explicit scrub?

   If a failed checksum is detected, yes.

   If there's a bad block, and the FS happens to read the good copy
first, it won't fix it, because it hasn't tried reading the bad copy
yet.
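
   The difference between those two cases can be shown with a toy
model. Again this is only a sketch in Python under stated assumptions:
zlib.crc32 stands in for crc32c, the `first` argument models "which
copy the FS happens to read first", and none of the names correspond
to real btrfs code.

```python
import errno
import zlib


def read_and_repair(copies, stored_csum, first=0):
    """Read starting at copy `first`; rewrite any bad copy that was tried.

    Returns (data, repaired_indexes). Copies that were never read are
    left untouched even if they are bad; only a full scan (a scrub)
    would look at those.
    """
    order = list(range(first, len(copies))) + list(range(first))
    bad_seen = []
    for i in order:
        if zlib.crc32(copies[i]) == stored_csum:
            for j in bad_seen:
                copies[j] = copies[i]   # overwrite bad copy with good data
            return copies[i], bad_seen
        bad_seen.append(i)
    raise OSError(errno.EIO, "checksum mismatch on all copies")


good = b"hello btrfs"
bad = b"hellp btrfs"
csum = zlib.crc32(good)

mirror = [bad, good]
print(read_and_repair(mirror, csum, first=0), mirror)  # bad copy read, then fixed
mirror = [bad, good]
print(read_and_repair(mirror, csum, first=1), mirror)  # good copy read, bad one stays
```

   In the second call the read succeeds straight away, so the bad copy
is never looked at and stays on disk until something reads it.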

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  



Re: How does btrfs behave on checksum mismatch?

2012-10-28 Thread Hugo Mills
On Sun, Oct 28, 2012 at 02:36:24PM +0100, Martin Steigerwald wrote:
 Am Sonntag, 28. Oktober 2012 schrieb Hugo Mills:
  On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:
   Am Sonntag, 28. Oktober 2012 schrieb Ronnie Collinson:
    In a raid1 situation, it will also rewrite the affected data, on
    the drive that failed the checksum
   
   Will it do so without an explicit scrub?
  
  If a failed checksum is detected, yes.
  
  If there's a bad block, and the FS happens to read the good copy
  first, it won't fix it, because it hasn't tried reading the bad copy
  yet.
 
 Ah, okay. I think I read some while ago that when a bad checksum is detected
 it won't repair automatically. Has this been changed?

   It was changed some time ago -- the kernel release after scrub went
in, IIRC.

 Anyway, a regular scrub still makes sense, as BTRFS only reads files that 
 applications demand and BTRFS may read from a good copy as you pointed 
 out.

   Indeed. I have a cron job in /etc/cron.monthly for my main FS to do
just that.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  



Re: Old (almost 2 years) btrfs failed fs. Parent transid failure. Can it be fixed ?

2012-10-29 Thread Hugo Mills
On Mon, Oct 29, 2012 at 01:33:13PM +0100, Tomasz Torcz wrote:
 On Mon, Oct 29, 2012 at 01:22:59PM +0100, Tommy Jonsson wrote:
  Hi, I have an old btrfs file system that crashed after a power failure
  about 2 years ago.
  
  I have cloned
  git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
  (at 2012-10-29) and compiled the tools.
 
   I think you need to get branch dangerdonoteveruse to get real fsck code.

   That's a very outdated piece of advice. That code is now in
mainline btrfs-progs, and has been since March. Tommy has the correct
and up-to-date version of the progs.

  sudo mount -t btrfs /dev/sda /mnt/disk/
 
  Could you try with -o recovery?

   That's worth a try as a first step.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- The last man on Earth sat in a room.  Suddenly, there was a ---   
   knock at the door.



Re: How to find (out if) files sharing content?

2012-10-30 Thread Hugo Mills
On Tue, Oct 30, 2012 at 04:20:05PM +0100, Gábor Nyers wrote:
 Hi,
 
 How could one find out if 2 files share any extents on a btrfs file system?
 
 A more generic variation of the above: How to list files on the same
 file system/subvolume sharing content?

   You have direct (read-only) access to the metadata trees through
the TREE_SEARCH ioctl. It should be possible to walk through the
extents of a given file, and (I think) follow back-refs from the
extent back to the other files that share it.

   There's no simple code to do that right now, though.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- And what rough beast,  its hour come round at last / slouches ---  
 towards Bethlehem,  to be born? 



Re: Why btrfs inline small file by default?

2012-10-30 Thread Hugo Mills
On Wed, Oct 31, 2012 at 05:40:25AM +0800, ching wrote:
 On 10/30/2012 08:17 PM, cwillu wrote:
  If there are a lot of small files, then the size of metadata will be
  undesirable due to duplication
 
  Yes, that is a fact, but if that really matters depends on the use-case
  (e.g., the small files to large files ratio, ...). But as btrfs is designed
  explicitly as a general purpose file system, you usually want the good
  performance instead of the better disk-usage (especially as disk space
  isn't expensive anymore).
  As I understand it, in basically all cases the total storage used by
  inlining will be _smaller_, as the allocation doesn't need to be
  aligned to the sector size.
 
 
 if i have 10G small files in total, then it will consume 20G by default.

   If those small files are each 128 bytes in size, then you have
approximately 80 million of them, and they'd take up 80 million pages,
or 320 GiB of total disk space.
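
   For anyone who wants to check the arithmetic, here it is spelled
out in Python. The 4 KiB page size is an assumption (it is the usual
value on x86), and the variable names are just for this example.

```python
FILE_SIZE = 128            # bytes per file, as in the example above
TOTAL_DATA = 10 * 2**30    # 10 GiB of file contents in total
PAGE_SIZE = 4096           # assumed 4 KiB page

n_files = TOTAL_DATA // FILE_SIZE
on_disk = n_files * PAGE_SIZE   # one whole page per file when not inlined

print(n_files)              # number of files
print(on_disk / 2**30)      # GiB of disk space used
```

   That comes to 83,886,080 files (so "80 million" is a slight
underestimate) and exactly 320 GiB, a factor of 32 more than the data
itself, because 4096 / 128 = 32.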

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I always felt that as a C programmer, I --- 
 was becoming typecast.  



Re: Why btrfs inline small file by default?

2012-10-30 Thread Hugo Mills
On Tue, Oct 30, 2012 at 10:14:12PM +0000, Hugo Mills wrote:
 On Wed, Oct 31, 2012 at 05:40:25AM +0800, ching wrote:
  On 10/30/2012 08:17 PM, cwillu wrote:
   If there are a lot of small files, then the size of metadata will be
   undesirable due to duplication
  
   Yes, that is a fact, but if that really matters depends on the use-case
   (e.g., the small files to large files ratio, ...). But as btrfs is designed
   explicitly as a general purpose file system, you usually want the good
   performance instead of the better disk-usage (especially as disk space
   isn't expensive anymore).
   As I understand it, in basically all cases the total storage used by
   inlining will be _smaller_, as the allocation doesn't need to be
   aligned to the sector size.
  
  
  if i have 10G small files in total, then it will consume 20G by default.
 
 If those small files are each 128 bytes in size, then you have
 approximately 80 million of them, and they'd take up 80 million pages,
 or 320 GiB of total disk space.

   Sorry, to make that clear -- I meant if they were stored in Data.
If they're inlined in metadata, then they'll take approximately 20 GiB
as you claim, which is a lot less than the 320 GiB they'd be if
they're not.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I always felt that as a C programmer, I --- 
 was becoming typecast.  



Re: [Request for review] [RFC] Add label support for snapshots and subvols

2012-11-01 Thread Hugo Mills
On Fri, Nov 02, 2012 at 05:28:01AM +0700, Fajar A. Nugraha wrote:
 On Fri, Nov 2, 2012 at 5:16 AM, cwillu cwi...@cwillu.com wrote:
   btrfs fi label -t /btrfs/snap1-sv1
  Prod-DB-sand-box-testing
 
  Why is this better than:
 
  # btrfs su snap /btrfs/Prod-DB /btrfs/Prod-DB-sand-box-testing
  # mv /btrfs/Prod-DB-sand-box-testing /btrfs/Prod-DB-production-test
  # ls /btrfs/
  Prod-DB  Prod-DB-production-test
 
 
 ... because it would make it possible to decouple the subvol name from
 whatever-data-you-need (in this case, a label).
 
 My request, though, is to just implement properties, and USER
 properties, like what we have in zfs. This seems to be a cleaner,
 saner approach. For example, this is on Ubuntu + zfsonlinux:
 
 # zfs create rpool/u
 # zfs set user:label="Some test filesystem" rpool/u
 # zfs get creation,user:label rpool/u
 NAME     PROPERTY    VALUE                  SOURCE
 rpool/u  creation    Fri Nov  2  5:24 2012  -
 rpool/u  user:label  Some test filesystem   local

   Don't we already have an equivalent to that with user xattrs?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  



Re: no space left on device.

2012-11-02 Thread Hugo Mills
On Fri, Nov 02, 2012 at 10:54:47AM -0500, Kyle Gates wrote:
  So I have ended up in a state where I can't delete files with rm.
 
  the error I get is no space on device. however I'm not even close to full.
  /dev/sdb1 38G 27G 9.5G 75%
  there is about 800k files/dirs in this filesystem
 
  extra strange is that I can in another directory create and delete files.
 
  So I tried pretty much all I could google my way to but problem
  persisted. So I decided to do a backup and a format. But when the backup
  was done I tried one more time and now it was possible to delete the
  directory and all content?
 
  using the 3.5 kernel in ubuntu 12.10. Is this a known issue ? is it
  fixed in later kernels?
 
  fsck /btrfs scrub and kernel log. nothing indicate any problem of any kind.
 
 
 First let's see the output of:
 btrfs fi df /mountpoint
 
 You're probably way over allocated in metadata so a balance should help:
 btrfs bal start -m /mountpoint
 or omit the -m option to run a full balance.

   Or, better, -musage=5 (or 1), which will do even less work.

... but let's see the btrfs fi df output first. Could you also add the
output of btrfs fi show (no parameters), please?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You're never alone with a rubber duck... --- 



Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df

2012-11-02 Thread Hugo Mills
On Fri, Nov 02, 2012 at 07:05:37PM +0000, Gabriel wrote:
 On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:
  On 2012-11-02 12:18, Martin Steigerwald wrote:
  Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB
  in total. I understand the logic behind this, but this could be a bit
  confusing.
  
  But it makes sense: Showing real allocation on device level makes
  sense, because that's what is really allocated on disk. Total makes some
  sense, because that's what is being used from the tree by BTRFS.
  
  Yes, me too. At first I was confused when you noticed this
  discrepancy. So I have to admit that it is not so obvious to understand.
  However, we didn't find any way to make it clearer...
  
  It still looks confusing at first…
  We could use Chunk(s) capacity instead of total/size? I would like an
  opinion from a native English speaker's point of view.
 
 This is easy to fix, here's a mockup:
 
 Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
    /dev/dm-0   3.50GB

   I've not considered the full semantics of all this yet -- I'll try
to do that tomorrow. However, I note that the ×2 here could become
non-integer with the RAID-5/6 code (which is due Real Soon Now). In
the first RAID-5/6 code drop, it won't even be simple to calculate
where there are different-sized devices in the filesystem. Putting an
exact figure on that number is potentially going to be awkward. I
think we're going to need kernel help for working out what that number
should be, in the general case.

   Again, I'm raising minor points based on future capabilities, but I
feel it's worth considering them at this stage, even if the correct
answer is "yes, we'll do this now, and deal with any other problems
later".

   Hugo.

             Data    Metadata  Metadata    System  System
             Single  Single    DUP         Single  DUP          Unallocated

 /dev/dm-16  1.31TB  8.00MB    56.00GB     4.00MB  16.00MB      0.00
             ======  ========  ==========  ======  ===========  ===========
 Total       1.31TB  8.00MB    28.00GB ×2  4.00MB  8.00MB ×2    0.00
 Used        1.31TB  0.00      5.65GB ×2   0.00    152.00KB ×2
 
 Also, I don't know if you could use libblkid, but it finds more 
 descriptive names than dm-NN (thanks to some smart sorting logic).
 
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- My doctor tells me that I have a malformed public-duty gland, ---  
and a natural deficiency in moral fibre. 



Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df

2012-11-02 Thread Hugo Mills
On Fri, Nov 02, 2012 at 11:23:14PM +0000, Gabriel wrote:
 On Fri, 02 Nov 2012 22:06:04 +0000, Hugo Mills wrote:
 
  On Fri, Nov 02, 2012 at 07:05:37PM +0000, Gabriel wrote:
  On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:
   On 2012-11-02 12:18, Martin Steigerwald wrote:
   Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB
   in total. I understand the logic behind this, but this could be a bit
   confusing.
   
   But it makes sense: Showing real allocation on device level makes
   sense, because that's what is really allocated on disk. Total makes some
   sense, because that's what is being used from the tree by BTRFS.
   
   Yes, me too. At first I was confused when you noticed this
   discrepancy. So I have to admit that it is not so obvious to understand.
   However, we didn't find any way to make it clearer...
   
   It still looks confusing at first…
   We could use Chunk(s) capacity instead of total/size? I would like an
   opinion from a native English speaker's point of view.
  
  This is easy to fix, here's a mockup:
  
  Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
     /dev/dm-0   3.50GB
  
  I've not considered the full semantics of all this yet -- I'll try
  to do that tomorrow. However, I note that the ×2 here could become
  non-integer with the RAID-5/6 code (which is due Real Soon Now). In
  the first RAID-5/6 code drop, it won't even be simple to calculate
  where there are different-sized devices in the filesystem. Putting an
  exact figure on that number is potentially going to be awkward. I
  think we're going to need kernel help for working out what that number
  should be, in the general case.
 
 DUP can be nested below a device because it represents same-device
 redundancy (purpose: survive smudges but not device failure).
 
 On the other hand raid levels should occupy the same space on all
 linked devices (a necessary consequence of the guarantee that RAID5
 can survive the loss of any device and RAID6 any two devices).

   No, the multiplier here is variable. Consider:

1 MiB stored in RAID-5 across 3 devices takes up 1.5 MiB -- multiplier ×1.5
   (1 MiB over 2 devices is 512 KiB, plus an additional 512 KiB for parity)
1 MiB stored in RAID-5 across 6 devices takes up 1.2 MiB -- multiplier ×1.2
   (1 MiB over 5 devices is 204.8 KiB, plus an additional 204.8 KiB for parity)
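
   The two cases above follow the same rule: a stripe across n devices
holds n-1 devices' worth of data plus one of parity, so the multiplier
is n/(n-1). A small sketch in Python (the function name is made up for
this example, and treating two devices as the minimum is an
assumption):

```python
from fractions import Fraction


def raid5_multiplier(n_devices):
    """Raw space consumed per unit of data for an n-device RAID-5 stripe."""
    if n_devices < 2:
        raise ValueError("RAID-5 needs at least two devices")
    return Fraction(n_devices, n_devices - 1)


for n in (3, 6):
    print(n, float(raid5_multiplier(n)))
```

   Fractions are used so that the 6/5 and 3/2 ratios stay exact until
they are printed.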

   With the (initial) proposed implementation of RAID-5, the
stripe-width (i.e. the number of devices used for any given chunk
allocation) will be *as many as can be allocated*. Chris confirmed
this today on IRC. So if I have a disk array of 2T, 2T, 2T, 1T, 1T,
1T, then the first 1T on each device will stripe across 6 devices,
giving me 5 data+1 parity, or a multiplier of ×1.2. As soon as the
smaller devices are full, the stripe width will drop to 3 devices, and
we'll be using 2 data+1 parity allocation, or a multiplier of ×1.5 for
any subsequent chunks. So, as more data over the first 5T is stored,
the multiplier steadily increases, until we fill the FS, and we get a
multiplier of about ×1.29 overall (9T of raw space holding 7T of
data). This gets more complicated if you have devices of many
different sizes. (Imagine 6 disks with sizes 500G, 1T, 1.5T, 2T, 3T,
3T).
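
   To get a feel for how the overall figure comes out on a mixed
array, the "stripe across as many devices as can be allocated" rule
can be simulated crudely. This Python sketch is a simplification, not
the real allocator: it ignores chunk granularity and metadata, treats
one device's worth of every stripe as parity, and assumes two devices
is the minimum stripe width. The function name raid5_fill is made up.

```python
from fractions import Fraction


def raid5_fill(sizes, min_devices=2):
    """Fill devices greedily, always striping across every device that
    still has free space.

    Returns (raw_used, data_stored) in the same unit as `sizes`.
    """
    free = sorted(sizes)
    raw = data = 0
    while True:
        free = [f for f in free if f > 0]
        n = len(free)
        if n < min_devices:
            break
        step = min(free)          # the smallest device limits this phase
        raw += n * step
        data += (n - 1) * step    # one device's worth per stripe is parity
        free = [f - step for f in free]
    return raw, data


# The 2T, 2T, 2T, 1T, 1T, 1T array from the text, sizes in GB.
raw, data = raid5_fill([2000, 2000, 2000, 1000, 1000, 1000])
print(raw, data, float(Fraction(raw, data)))

# The six differently sized disks mentioned at the end.
raw, data = raid5_fill([500, 1000, 1500, 2000, 3000, 3000])
print(raw, data, float(Fraction(raw, data)))
```

   Under these assumptions the first array stores 7T of data in 9T of
raw space (9/7, about ×1.29), and the second stores 8T in 11T (×1.375)
after passing through five different stripe widths along the way.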

   We probably can work out the current RAID overhead and feed it back
sensibly, but it's (a) not constant as the allocation of the chunks
increases, and (b) not trivial to compute.

 The two probably won't need to be represented at the same time
 except during a reshape, because I imagine DUP gets converted to
 RAID (1 or 5) as soon as the second device is added.
 
 A 1→2 reshape would look a bit like this (doing only the data column
 and skipping totals):
 
 InitialDevice
   Reserved   1.21TB
   Used   1.21TB
 RAID1(InitialDevice, SecondDevice)
   Reserved   1.31TB + 100GB
   Used 2× 100GB
 
 RAID5, RAID6: same with fractions, n+1⁄n and n+2⁄n.

   Except that n isn't guaranteed to be constant. That was pretty much
my only point. Don't assume that it will be (or at the very least, be
aware that you are assuming it is, and be prepared for inconsistencies).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, sir, the floor is yours.  But remember, the ---
  roof is ours!  




Re: How does btrfs handle sudden shutdowns?

2012-11-06 Thread Hugo Mills
On Tue, Nov 06, 2012 at 12:33:08PM +, Michael Kjörling wrote:
 Can btrfs deal reasonably gracefully with sudden shutdowns? (I'm
 mainly thinking of power outages which lead to logical structure
 damage but not physical media damage.)

   In theory (i.e. by the design of the FS), you should be able to
pull the plug on btrfs at any point, and the FS will always be
consistent.

   This makes some assumptions: that writing a single page to the FS
is atomic, and that the hardware reports barriers to the OS reliably,
i.e. if the hardware says it has fully stored the data, then it
actually has.

   There are also some caveats: while the FS should always be
consistent, the latest transaction write may not have been completed,
so you could potentially lose up to 30 seconds of writes to the FS
from immediately before the crash.
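
   For data that can't be allowed to sit in that window, the usual application-level answer is to fsync explicitly rather than wait for the next transaction commit. A minimal Python sketch, assuming POSIX semantics and hardware that honours flushes as described above; the helper name durable_write is hypothetical:

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, returning only once it is on disk."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the file contents out
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "state.txt")
        durable_write(target, b"checkpoint 1\n")
        with open(target, "rb") as f:
            print(f.read())
```

   After a crash, a reader of the target path should see either the old contents or the new ones, never a half-written file. Anything written without an fsync is still subject to the window described above.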

   If the FS does corrupt over a power failure, and the hardware can
be demonstrated to be good, then we have a bug that needs to be
tracked down. (There have been a number of these over the development
of the FS so far, but they do get fixed).

 What would be the risk points, file-system-wise?
 
 Can for example a rotating snapshot schedule mitigate some or all
 issues relating to sudden shutdowns, if any? (_For example_, take a
 snapshot every minute, keeping the last five; if the main file system
 fails to mount, then could the most recent usable snapshot be used as
 a fallback, or is it likely to be equally damaged or inconsistent?)

   No, snapshots give you no additional guarantees -- if the FS
corrupts and is unmountable, a snapshot is part of the same FS and
will also be unmountable.

 Obviously a UPS or other form of fallback power is preferable to no
 UPS if power outages are a concern, so as to allow a controlled system
 shutdown (or fail-over to a more long-term backup power supply) in the
 event of a prolonged power outage, but I'm wondering about situations
 where such don't exist or even fail.

   As I said above, the FS structures _should_ be completely reliable
in the face of power loss; that they haven't been in the past is
definitely a bug, and those bugs have been / are being fixed as
they're found. We've had very few transid match failures recently,
which used to be the main failure mode for these bugs. I don't know
whether that's because people aren't reporting them, or because
they're not happening nearly so often these days. I suspect the
latter.

   I guess the question for you is: are you after the _expected_
behaviour of the FS (should always be consistent on good hardware, but
you may lose up to 30 seconds of writes), or are you after mitigation
strategies in the face of FS bugs (keep off-site backups and be
prepared to use them)?

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs:  Eighty Megabytes And Constantly Swapping. ---



