Re: snapshot deletion / unmount slowness

2013-03-10 Thread Liu Bo
On Mon, Mar 11, 2013 at 12:11:43PM +0600, Roman Mamedov wrote:
> On Sun, 10 Mar 2013 22:31:08 -0700
> Michael Johnson - MJ  wrote:
> 
> > What I now suspect is going on is that while deleting the snapshots
> > was quick, that probably kicks off a background thread which actually
> > does the heavy lifting.
> 
> Exactly that. Snapshot deletion only "syncs" on unmount; there is no
> other way to ensure it is complete.
> 
> If you have some patience and let it unmount properly and then remount it, you
> may find that you have gained much more free space, due to all the snapshots
> being actually deleted and the space they were occupying freed only just now.

A recent commit (fa6ac8765c48a06dfed914e8c8c3a903f9d313a0,
"Btrfs: fix cleaner thread not working with inode cache option")
may improve the situation.

You may want to try it.

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: snapshot deletion / unmount slowness

2013-03-10 Thread Roman Mamedov
On Sun, 10 Mar 2013 22:31:08 -0700
Michael Johnson - MJ  wrote:

> What I now suspect is going on is that while deleting the snapshots
> was quick, that probably kicks off a background thread which actually
> does the heavy lifting.

Exactly that. Snapshot deletion only "syncs" on unmount; there is no
other way to ensure it is complete.

If you have some patience and let it unmount properly and then remount it, you
may find that you have gained much more free space, due to all the snapshots
being actually deleted and the space they were occupying freed only just now.
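
To make the behaviour described above concrete, here is a toy model in
Python. It is only a sketch of the idea (deleting a snapshot is cheap and
just queues work, the space is freed later, and unmount has to wait for the
queue to drain). The class, its method names and the sizes are all made up
for illustration and do not correspond to anything in the kernel.

```python
from collections import deque


class ToyFilesystem:
    """Toy model only: snapshot deletion queues work; space is freed later."""

    def __init__(self, used_gb, snapshots):
        self.used_gb = used_gb              # space currently reported as used
        self.snapshots = dict(snapshots)    # name -> GB held only by that snapshot
        self.pending = deque()              # deleted, but not yet cleaned up

    def delete_snapshot(self, name):
        # Fast path: drop the snapshot and queue its space for later cleanup.
        self.pending.append(self.snapshots.pop(name))

    def cleaner_step(self):
        # Background work: actually free the space of one queued snapshot.
        if self.pending:
            self.used_gb -= self.pending.popleft()

    def unmount(self):
        # In this model, unmount cannot return until the queue is empty.
        steps = 0
        while self.pending:
            self.cleaner_step()
            steps += 1
        return steps


fs = ToyFilesystem(used_gb=1800, snapshots={"day1": 800, "day2": 835})
fs.delete_snapshot("day1")
fs.delete_snapshot("day2")
print(fs.used_gb)    # unchanged so far, nothing has been freed yet
print(fs.unmount())  # number of cleanup steps run before unmount returns
print(fs.used_gb)    # usage after the queued work has been done
```

The point of the model is only that the reported usage does not change at
delete time; it changes when the queued cleanup has run.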

-- 
With respect,
Roman


snapshot deletion / unmount slowness

2013-03-10 Thread Michael Johnson - MJ
I currently have a btrfs filesystem that I am unmounting, and it has
been "unmounting" for the last 20 minutes.

I'm pretty sure I know exactly what is going on, and in my current
situation it's not a huge issue, but it would be a problem if this
were a production system and I was trying to do maintenance.

Here is how I got into this situation:

I am migrating my data from one pair of disks (mirrored with btrfs) to
another pair of disks.  I rsync'd my data from the original btrfs file
system to the other.  When it completed, my new filesystem showed
165GB used. The original showed 1.8TB used.  I came to the conclusion
that it must be the daily snapshots I have that were using the
majority of the space and because I was going to destroy the
filesystem, I decided, what the heck, let me destroy the snapshots and
see what it looks like.

To my surprise, removing all the snapshots resulted in the usage
dropping only from 1.8TB to 1.7TB.  I re-ran my rsync, and it completed
without transferring any new data.  I then did a du -s in the mountpoint for
the original filesystem and it reported back 165GB, which agrees with
what rsync and df on the new filesystem report.

My first thought was that I must have some sort of bizarre corruption
on the original filesystem.  And then I went to unmount it and it
still has not returned.

What I now suspect is going on is that while deleting the snapshots
was quick, that probably kicks off a background thread which actually
does the heavy lifting.  I noticed a btrfs-cleaner process that was in
an io wait state, which I presumed was the process in question.
However, now 40 minutes later, my unmount is still hung and the
btrfs-cleaner process is sleeping, so perhaps I am wrong.

At this point I am going to powercycle my system, but I figured I
would check and see if anyone else knew for certain if this was the
type of behavior one would expect to see when removing large snapshots
and then immediately trying to unmount the filesystem.  If so, it
seems like this is something that would need to change before someone
would want to seriously consider using btrfs w/ snapshots in a
production environment.  I know btrfs is not considered production
ready yet (well, at least not by the developers, regardless of what
Oracle and Suse say).  At the same time, I've not been able to find
any mention of similar problems, so I figured it was worth mentioning.

--
Michael Johnson - MJ


Re: xfstests: 301: sparse copy between different filesystems/mountpoints on btrfs

2013-03-10 Thread Eric Sandeen
On 3/10/13 6:03 PM, Dave Chinner wrote:
> On Sat, Mar 09, 2013 at 06:24:47PM -0600, Eric Sandeen wrote:
>> On 1/18/13 3:48 PM, Koen De Wit wrote:
>>> +}
>>> +
>>> +_scratch_mount
>>> +_create_reflinks_to $TESTDIR2
>>> +_scratch_unmount
>>> +
>>> +mount $TEST_DEV $SCRATCH_MNT
>>> +_create_reflinks_to $TESTDIR3
>>> +umount $SCRATCH_MNT
>>
>> TBH this confuses me, not that it's necessarily wrong (?)
>> You mount TEST_DEV on $SCRATCH_MNT which makes my brain hurt a little.
>> Then _create_reflinks_to $TESTDIR3 and at that point, um, what's going on,
>> what's linking what to where?
> 
> Mounting the TEST_DEV on SCRATCH_MNT is almost always a bad thing to
> do. The test harness expects TEST_DEV to be mounted on TEST_DIR, not
> anywhere else.
> 
> If you need multiple scratch filesystems to test cross-device
> linkage errors, use loopback devices or make use of the btrfs
> scratch device pool...

Actually, looking at it again - does this wind up with TEST_DEV mounted
on both TEST_DIR and SCRATCH_MNT?  Maybe what the test wants is more
mountpoints, not more devices?
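
One way to answer that question on a live system is to look at which
devices show up on more than one mount point. A minimal sketch in Python,
assuming text in the usual /proc/mounts layout (device, mount point, type,
options, ...); the device names and paths in the sample are hypothetical:

```python
from collections import defaultdict


def mountpoints_by_device(mounts_text):
    """Group mount points by device from /proc/mounts-style text."""
    table = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mountpoint = fields[0], fields[1]
            table[device].append(mountpoint)
    return dict(table)


def multiply_mounted(mounts_text):
    """Return only the devices that appear on more than one mount point."""
    return {dev: mps
            for dev, mps in mountpoints_by_device(mounts_text).items()
            if len(mps) > 1}


# Hypothetical sample: the same device mounted on two directories.
SAMPLE = """\
/dev/sdb1 /mnt/test btrfs rw,relatime 0 0
/dev/sdb1 /mnt/scratch btrfs rw,relatime 0 0
/dev/sdc1 /mnt/other btrfs rw,relatime 0 0
"""
print(multiply_mounted(SAMPLE))
```

On a real machine the input could be the contents of /proc/mounts read
while the test is running.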

-Eric

> Cheers,
> 
> Dave.
> 



Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Roger Binns
On 10/03/13 15:04, Hugo Mills wrote:
> On Sat, Mar 09, 2013 at 09:41:50PM -0800, Roger Binns wrote:
>> The only constraints that matter are surviving N device failures, and
>> data not lost if at least N devices are still present.  Under the
>> hood the best way of meeting those can be heuristically determined,
>> and I'd expect things like overhead to dynamically adjust as storage
>> fills up or empties.
> 
> That's really not going to work happily -- you'd have to run the 
> restriper in the background automatically as the device fills up.

Which is the better approach: the administrator sitting there adjusting
various parameters after some difficult calculations, and redoing them as
data and devices increase or decrease, or a computer with billions of bytes
of memory and billions of cpu cycles per second just figuring it out based
on experience? :-)

> Given that this is going to end up rewriting *all* of the data on the 
> FS,

Why does all data have to be rewritten?  Why does every piece of data have
to have exactly the same storage parameters in terms of
non-redundancy/performance/striping options?

I can easily imagine the final implementation being informed by hot data
tracking.  There is absolutely no need for data that is rarely read to be
using the maximum striping/performance/overhead options.

There is no need to rewrite everything anyway - if a filesystem with 1GB
of data is heading towards 2GB of data then only enough readjusts need to
be made to release that additional 1GB of overhead.

I also assume that the probability of all devices being exactly the same
size and exactly the same performance characteristics is going to
decrease.  Many will expect that they can add an SSD to the soup, and over
time add/update devices.  ie the homogenous case that regular RAID
implicitly assumes will become increasingly rare.

> If you want maximum storage (with some given redundancy), regardless of
> performance, then you might as well start with the parity-based levels
> and just leave it at that.

In the short term it would certainly make sense to have an online
calculator or mkfs helper where you specify the device sizes and
redundancy requirements together with how much data you have, and it then
spits out the string of numbers and letters to use for mkfs/balance.

> Thinking about it, specifying a (redundancy, acceptable_wastage) pair
> is fairly pointless in controlling the performance levels,

I don't think there is merit in specifying acceptable wastage - the answer
is obvious in that any unused space is acceptable for use.  That also
means it changes over time as storage is used/freed.

> There's not much else a heuristic can do, without effectively exposing
> all the config options to the admin, in some obfuscated form.

There is a lot heuristics can do.  At the simplest level btrfs can monitor
device performance characteristics and use that as a first pass.  One
database that I use has an interesting approach for queries - rather than
trying to work out the single best perfect execution strategy (eg which
indices in which order) it actually tries them all out concurrently and
picks the quickest.  That is then used for future similar queries with the
performance being monitored.  Once response times no longer match the
strategy it tries them all again to pick a new winner.

There is no reason btrfs can't try a similar approach.  When presented
with a pile of heterogenous storage with different sizes and performance
characteristics, use all reasonable approaches and monitor resulting
read/write performance.  Then start biasing towards what works best.  Use
hot data tracking to determine which data would most benefit from its
approach being changed to more optimal values.
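
As a rough illustration of the "try them all, keep the winner, re-run the
trial when performance drifts" idea, here is a small Python sketch. It is
not how any database or btrfs does it: the class name, the tolerance factor
and the cost numbers are invented, and the trial measures the strategies
one after another rather than concurrently.

```python
class StrategyPicker:
    """Pick the cheapest strategy by trial, and re-trial when it degrades."""

    def __init__(self, strategies, measure, tolerance=1.5):
        self.strategies = list(strategies)
        self.measure = measure        # callable: strategy -> observed cost
        self.tolerance = tolerance    # allowed slowdown before a new trial
        self.winner = None
        self.baseline = None

    def _trial(self):
        # Measure every strategy once and remember the cheapest one.
        costs = {s: self.measure(s) for s in self.strategies}
        self.winner = min(costs, key=costs.get)
        self.baseline = costs[self.winner]

    def run(self):
        if self.winner is None:
            self._trial()
            return self.winner
        # Keep using the winner, but watch its cost against the baseline.
        cost = self.measure(self.winner)
        if cost > self.baseline * self.tolerance:
            self._trial()
        return self.winner


# Hypothetical costs per layout; lower is better.
costs = {"mirror": 5.0, "stripe": 2.0, "parity": 8.0}
picker = StrategyPicker(costs, costs.get)
print(picker.run())     # first call runs the trial
costs["stripe"] = 9.0   # the device characteristics change
print(picker.run())     # the old winner is too slow now, so a new trial runs
```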

Roger


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread sam tygier
On 09/03/13 20:31, Hugo Mills wrote:
>Some time ago, and occasionally since, we've discussed altering the
> "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
> number of copies, m is the number of (data) devices in a stripe per copy,
> and p is the number of parity devices in a stripe.
> 
>The current kernel implementation uses as many devices as it can in the
> striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
> written as "mS" (with a literal "m"). The mS and pP sections are omitted
> if the value is 1S or 0P.
> 
>The magic look-up table for old-style / new-style is:
> 
> single   1C (or omitted, in btrfs fi df output)
> RAID-0   1CmS
> RAID-1   2C
> DUP      2CD
> RAID-10  2CmS
> RAID-5   1CmS1P
> RAID-6   1CmS2P

Are these the only valid options? Are 'sensible' new levels (e.g. 3C, mirrored
to 3 disks, or 1CmS3P, like RAID-6 but with 3 parity blocks) allowed? Are any
arbitrary levels allowed (some other comments in the thread suggest not)? Will
there be a recommended (or supported) set?
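
For reference while reading the rest of the thread, the quoted look-up
table can be written down as a plain mapping. This is only a transcription
of the table in Python; the function name is made up, and a None result just
means "not in the table", not that the level is invalid.

```python
# Old-style name -> new-style spec, transcribed from the quoted table.
OLD_TO_NEW = {
    "single": "1C",
    "RAID-0": "1CmS",
    "RAID-1": "2C",
    "DUP": "2CD",
    "RAID-10": "2CmS",
    "RAID-5": "1CmS1P",
    "RAID-6": "1CmS2P",
}
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}


def old_style_name(spec):
    """Return the old-style name for a new-style spec, or None if unlisted."""
    return NEW_TO_OLD.get(spec)


print(old_style_name("1CmS2P"))
print(old_style_name("3C"))      # not in the table, so None
print(old_style_name("1CmS3P"))  # not in the table, so None
```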





Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Hugo Mills
On Sun, Mar 10, 2013 at 11:40:27PM +, sam tygier wrote:
> On 10/03/13 15:43, Goffredo Baroncelli wrote:
> > - DUP               -> dD   (to allow more than 2 copies per disk)
> > 
> > - RAID1             -> nC or *C
> > 
> > - RAID0             -> mS or *S
> > 
> > - RAID10            -> nCmS or *CmS or nC*s
> > 
> > - RAID with parity  -> mSpP or *SpP or mS*p (is it possible?)
> > 
> > - single            -> 1C or 1D or 1S or "single"
> > 
> > 
> > where d,n,m,p are integers; '*' is the literal '*' and means "how many
> > possible".
> 
> Using an asterisk '*' in something that will be used as a command line
> argument risks having the shell expand it. Sticking to pure alphanumeric
> names would be better.

   Yeah, David's just pointed this out on IRC. After a bit of fiddling
around with various options, I like using X.

   I'm also going to use lowercase c,s,p, because it seems to be
easier to read with the different-height characters. So we end up
with, e.g.

1c      (single)
2cXs    (RAID-10)
1cXs2p  (RAID-6)
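
A parser for this lowercase form is short. The sketch below is Python and
rests on a few assumptions that are not settled in this message: clauses
come in the fixed order c, s, p; only the stripe count may be "X"; and a
missing clause means 1 copy, 1 stripe or 0 parity. The names Profile and
parse_profile are hypothetical.

```python
import re
from typing import NamedTuple, Union


class Profile(NamedTuple):
    copies: int
    stripes: Union[int, str]   # an integer, or "X" for "as many as possible"
    parity: int


# Clauses in fixed order; each one is optional.
_SPEC = re.compile(r"(?:(\d+)c)?(?:(\d+|X)s)?(?:(\d+)p)?")


def parse_profile(spec):
    """Parse strings such as '1c', '2cXs' or '1cXs2p' into a Profile."""
    m = _SPEC.fullmatch(spec)
    if not spec or m is None:
        raise ValueError(f"not a valid profile: {spec!r}")
    copies, stripes, parity = m.groups()
    if stripes is None:
        stripe_value = 1
    elif stripes == "X":
        stripe_value = "X"
    else:
        stripe_value = int(stripes)
    return Profile(
        copies=int(copies) if copies else 1,
        stripes=stripe_value,
        parity=int(parity) if parity else 0,
    )


for text in ("1c", "2cXs", "1cXs2p"):
    print(text, parse_profile(text))
```

Anything out of order (for example "Xs2c") or containing other characters
is rejected with a ValueError.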

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- It's against my programming to impersonate a deity! ---   


[PATCH] fs: btrfs: Replaced calls to kmalloc and memcpy with kmemdup

2013-03-10 Thread Alexandru Gheorghiu
Replaced calls to kmalloc followed by memcpy with single call to kmemdup.
This patch was found using coccicheck.

Signed-off-by: Alexandru Gheorghiu 
---
 fs/btrfs/send.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f7a8b86..f1e1e34 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3429,10 +3429,9 @@ static int __find_xattr(int num, struct btrfs_key *di_key,
 	    strncmp(name, ctx->name, name_len) == 0) {
 		ctx->found_idx = num;
 		ctx->found_data_len = data_len;
-		ctx->found_data = kmalloc(data_len, GFP_NOFS);
+		ctx->found_data = kmemdup(data, data_len, GFP_NOFS);
 		if (!ctx->found_data)
 			return -ENOMEM;
-		memcpy(ctx->found_data, data, data_len);
 		return 1;
 	}
 	return 0;
-- 
1.7.9.5



Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread sam tygier
On 10/03/13 15:43, Goffredo Baroncelli wrote:
> - DUP               -> dD   (to allow more than 2 copies per disk)
> 
> - RAID1             -> nC or *C
> 
> - RAID0             -> mS or *S
> 
> - RAID10            -> nCmS or *CmS or nC*s
> 
> - RAID with parity  -> mSpP or *SpP or mS*p (is it possible?)
> 
> - single            -> 1C or 1D or 1S or "single"
> 
> 
> where d,n,m,p are integers; '*' is the literal '*' and means "how many
> possible".

Using an asterisk '*' in something that will be used as a command line
argument risks having the shell expand it. Sticking to pure alphanumeric
names would be better.
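
The risk can be shown without a shell. Python's glob module uses
shell-style wildcard rules, so the sketch below stands in for what an
unquoted argument would go through; the file names are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Two files that happen to sit in the current directory.
    for name in ("backupC2S", "notes.txt"):
        open(os.path.join(d, name), "w").close()

    # "*C2S" is a wildcard pattern, so it matches an existing file name.
    star = [os.path.basename(p) for p in glob.glob(os.path.join(d, "*C2S"))]
    # A purely alphanumeric spelling matches nothing and is left alone.
    plain = [os.path.basename(p) for p in glob.glob(os.path.join(d, "XC2S"))]

print(star)
print(plain)
```

With a shell, the first case means the command would receive the matching
file name instead of the profile string unless the argument is quoted.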



Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Diego Calleja
El Domingo, 10 de marzo de 2013 12:23:56 Martin Steigerwald escribió:
> Any other idea to make it less cryptic?

I would vote for optionally allowing to expand the codes into
something more verbose and self-documented, ie:

1CmS1P <-> 1Copy-manyStripes-1Parity
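
A sketch of that two-way mapping in Python, covering only the C, S and P
clauses with a number or a literal "m" (the D flag and other spellings are
not handled). The function names are made up.

```python
import re

_WORDS = {"C": "Copy", "S": "Stripes", "P": "Parity"}
_LETTERS = {word: letter for letter, word in _WORDS.items()}


def expand(code):
    """Short form to verbose form, e.g. 1CmS1P -> 1Copy-manyStripes-1Parity."""
    parts = re.findall(r"(\d+|m)([CSP])", code)
    return "-".join(("many" if n == "m" else n) + _WORDS[k] for n, k in parts)


def contract(verbose):
    """Verbose form back to the short form."""
    out = []
    for part in verbose.split("-"):
        n, word = re.fullmatch(r"(\d+|many)([A-Za-z]+)", part).groups()
        out.append(("m" if n == "many" else n) + _LETTERS[word])
    return "".join(out)


print(expand("1CmS1P"))
print(contract("1Copy-manyStripes-1Parity"))
```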


Re: xfstests: 301: sparse copy between different filesystems/mountpoints on btrfs

2013-03-10 Thread Dave Chinner
On Sat, Mar 09, 2013 at 06:24:47PM -0600, Eric Sandeen wrote:
> On 1/18/13 3:48 PM, Koen De Wit wrote:
> > +}
> > +
> > +_scratch_mount
> > +_create_reflinks_to $TESTDIR2
> > +_scratch_unmount
> > +
> > +mount $TEST_DEV $SCRATCH_MNT
> > +_create_reflinks_to $TESTDIR3
> > +umount $SCRATCH_MNT
> 
> TBH this confuses me, not that it's necessarily wrong (?)
> You mount TEST_DEV on $SCRATCH_MNT which makes my brain hurt a little.
> Then _create_reflinks_to $TESTDIR3 and at that point, um, what's going on,
> what's linking what to where?

Mounting the TEST_DEV on SCRATCH_MNT is almost always a bad thing to
do. The test harness expects TEST_DEV to be mounted on TEST_DIR, not
anywhere else.

If you need multiple scratch filesystems to test cross-device
linkage errors, use loopback devices or make use of the btrfs
scratch device pool...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Harald Glatt
On Sun, Mar 10, 2013 at 11:24 PM, Hugo Mills  wrote:
>> On 03/09/2013 09:31 PM, Hugo Mills wrote:
>> >Some time ago, and occasionally since, we've discussed altering the
>> > "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
>> > number of copies, m is the number of (data) devices in a stripe per copy,
>> > and p is the number of parity devices in a stripe.
>> >
>
>Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>--- Great oxymorons of the world, no. 6: Mature Student ---

It's important that the userland tools output things in both a
'human readable' format and the short form. I would say that when
giving btrfs a parameter, it should only accept the short forms. But
in order to allow users to figure out what they actually mean when
doing something like 'btrfs fi show', it should tell you something
along the lines of '1 copy, 3 stripes, 1 parity (1C3S1P)' instead of just
the short form.
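
Something like the following Python sketch is what I have in mind for the
output side. The function name and its defaults are only for illustration;
it always prints every clause and puts the short form in parentheses.

```python
def describe(copies=1, stripes=1, parity=0):
    """Spell out every clause, then append the short form in parentheses."""
    copy_word = "copy" if copies == 1 else "copies"
    stripe_word = "stripe" if stripes == 1 else "stripes"
    short = f"{copies}C{stripes}S{parity}P"
    return (f"{copies} {copy_word}, {stripes} {stripe_word}, "
            f"{parity} parity ({short})")


print(describe(1, 3, 1))
print(describe(2))
```

The first call prints exactly the example above, '1 copy, 3 stripes, 1
parity (1C3S1P)'.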

I would also make the case that leaving out 'defaults' from the output
is bad for the learning curve as well. Even when it's just 1C I
wouldn't remove that from the output, but the input parameter when you
work with it should know what you mean when you leave 1C out, of
course, and not require it.


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Hugo Mills
On Sun, Mar 10, 2013 at 04:43:33PM +0100, Goffredo Baroncelli wrote:
> Hi Hugo,
> 
> On 03/09/2013 09:31 PM, Hugo Mills wrote:
> >Some time ago, and occasionally since, we've discussed altering the
> > "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
> > number of copies, m is the number of (data) devices in a stripe per copy,
> > and p is the number of parity devices in a stripe.
> > 
> >The current kernel implementation uses as many devices as it can in the
> > striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
> > written as "mS" (with a literal "m"). The mS and pP sections are omitted
> > if the value is 1S or 0P.
> > 
> >The magic look-up table for old-style / new-style is:
> > 
> > single   1C (or omitted, in btrfs fi df output)
> > RAID-0   1CmS
> > RAID-1   2C
> > DUP  2CD
> > RAID-10  2CmS
> > RAID-5   1CmS1P
> > RAID-6   1CmS2P
> 
> 
> Even though I find the "nCmSpP" format a bit more rational, I think that
> this is a bit too complex.
> 
> As you said:
> >Chris is definitely planning fixed values for mS (so you can ask
> > for, say, exactly 4 stripes and one parity), and values for nC greater
> > than 2. As far as I know, there aren't any plans for nC > 1 and pP > 0
> > together. I haven't got far enough into the kernel code to work out
> > whether that's simple or not to implement.
> 
> On the basis of that we should handle fewer cases than the full "nCDmSpP"
> format allows. So I suggest allowing the following "short forms":
> 
> - DUP               -> dD   (to allow more than 2 copies per disk)
> 
> - RAID1             -> nC or *C
> 
> - RAID0             -> mS or *S

   So we can drop the C clause where it's 1C, in the same way we drop
the S clause when it's 1S, and P when it's 0P. I'm happy with that
much. It makes the parser a little more complicated, but it's no big
problem.

> - RAID10            -> nCmS or *CmS or nC*s
> 
> - RAID with parity  -> mSpP or *SpP or mS*p (is it possible?)
> 
> - single            -> 1C or 1D or 1S or "single"
> 
> 
> where d,n,m,p are integers; '*' is the literal '*' and means "how many
> possible".

   I don't particularly like this as a generic thing. The * is
ambiguous: does it mean "use as many as possible at all times" (like
the current RAID-0 implementation, for example), or does it mean "use
as many as we can right now, and don't reduce the value"?

   The former is the default for stripes on current RAID-0, -10, -5
and -6. However, it would be problematic for copies on RAID-1, and
-10, because you'd have variable redundancy. For specifying copies,
you'd want the second meaning above, so the * actually means subtly
different things depending on where it appears.

> For example if I have 6 disks:
> *C2S  -> means: 3 copies, 2 stripes ( m*n = 3*2 == num disk == 6)
> 2C*S  -> means: 2 copies, 3 stripes ( m*n = 3*2 == num disk == 6)
> 
> *S2P  -> means: 2 parity, 4 stripes ( p+m = 4+2 == num disk == 6)
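
The arithmetic in the quoted 6-disk example can be checked with a few lines
of Python. The sketch assumes the general relation
num_disks = copies * (stripes + parity), which reduces to the two formulas
quoted above; that generalisation and the function name are assumptions,
and the result is a one-off calculation that says nothing about which of
the two meanings of '*' is intended.

```python
def resolve(num_disks, copies=1, stripes=1, parity=0):
    """Replace the single '*' clause with the largest value that fits."""
    values = {"copies": copies, "stripes": stripes, "parity": parity}
    wild = [name for name, value in values.items() if value == "*"]
    if len(wild) != 1:
        raise ValueError("exactly one clause must be '*'")
    name = wild[0]
    if name == "copies":
        values[name] = num_disks // (stripes + parity)
    elif name == "stripes":
        values[name] = num_disks // copies - parity
    else:
        values[name] = num_disks // copies - stripes
    return values


print(resolve(6, copies="*", stripes=2))   # *C2S
print(resolve(6, copies=2, stripes="*"))   # 2C*S
print(resolve(6, stripes="*", parity=2))   # *S2P
```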

> We could also have some defaults, like: d=2, n=2, m=*, p=1, so some
> common forms become:
> D -> means DUP (d==2)
> S -> means RAID0 (s=*)
> C -> means RAID1 (n=2)
> CS-> means RAID10 (n=2, m=*)
> SP-> means RAID5 (m=*, p=1)
> SP2   -> means RAID6 (m=*, p=2)

   Too cryptic; special cases -- we'd only be replacing one set of
semantic-free symbols (RAID-18-OMGBBQ) with another set, which is what
I'm trying to get away from with this patch.

   I definitely want to see number-plus-letter clauses only, in a
fixed order and with as little special-casing as possible. We can drop
"unused" clauses, but trying to remove just the numbers for specific
cases, or just to produce new special names and aliases for things
seems counterproductive.

> It would also allow more complex forms like
>   5C3D4S2P
> but I don't think it will ever be supported.

   Probably not. :)

   It's unclear to me what the 5C3D would actually do. (5 copies on
different drives, and three other copies placed somewhere on the same
devices? And as you say, unlikely to be implemented) I think the "D"
concept should stay as a modifier flag, not a separate clause, as
we've got either n copies all on different drives, or n copies without
the guarantee, which is what the D shows.

   (As an aside, when Chris gives us 3C, it'll be interesting to see
what can be done with 3CD on two drives... TRIPE rather than DUP?)

> I have a request: could we consider to swap letters with the numbers ?
> For example 3S2P could become S3P2. For my *subjective* point of view,
> this seems more clear. What do you think ?

   I read it as "three stripes, two parity", so to me it makes more
sense in number-then-letter order.

> However I am open also to the following forms
> 
> dD -> DUP(d), as backward compatibility DUP means d==2
> nC -> RAID1(n), as backward compatibility RAID1 means n==2
> mS -> RAID0(n), as ba

Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Harald Glatt
I created a btrfs volume on a 4GB drive using the entire drive
(VirtualBox VM). Of this drive btrfs immediately used 400 MB. I then
filled it up with random data, left around 300 MB free and made an
md5sum of said data. Then I unmounted the volume and wrote random data
onto the drive with dd at 1GB, 2GB and 3GB offsets. I increased the
amount of data written each time between my tests. Btrfs kept the
entire filesystem intact (I did have to use btrfs-zero-log though)
and the checksum stayed correct the entire time. I wrote 120 MB of
corrupt data onto the drive and that worked; on my next test, writing
150 MB of corrupt data resulted in btrfs not being mountable anymore.
150 MB was around 5% of the 3.3 GB that was the size of my test file,
which is how I got to the 5%. The output of btrfs when I couldn't mount
anymore said something about seeing a corruption larger than 145 MB which
it cannot recover from (heavily paraphrased), so I knew that my 150 MB
test was very close to the limit.
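
For anyone who wants to redo the percentage, here is the calculation as a
small Python snippet. It takes 3.3 GB as 3300 MB, which is an assumption
about the units; the helper name is made up.

```python
def percent_of(part_mb, whole_mb):
    """Return part_mb as a percentage of whole_mb."""
    return 100 * part_mb / whole_mb


TEST_FILE_MB = 3300  # 3.3 GB test file, taking 1 GB as 1000 MB (assumption)

for corrupt_mb in (120, 145, 150):
    share = percent_of(corrupt_mb, TEST_FILE_MB)
    print(f"{corrupt_mb} MB is {share:.1f}% of the test file")
```

So "5%" is a rounded figure; the failing 150 MB run works out to roughly
4.5% of the test file.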

On Sun, Mar 10, 2013 at 11:59 PM, Goffredo Baroncelli
 wrote:
> On 03/10/2013 10:45 PM, Harald Glatt wrote:
>
>> I've noticed through my own tests that on a single device I can
>> corrupt around 5% of the data completely before btrfs fails. Up to
>> that point both filesystem as well as data integrity stays at 100%.
>> However the default layout for one disk seems to be having the data
>> once, the metadata DUP and the system DUP too.
>
> How did you create the corruption? Does btrfs return wrong data? How is
> the 5% calculated?
>
>
>> Having these 5% isn't
>> mentioned anywhere... Is this a value that could maybe be manipulated
>> and could it be introduced into a naming scheme like this? Also where
>> do the 5% redundancy come from?
>
> On a single device, the metadata are DUPlicated but the data have only 1
> copy.
>
> This means that if you corrupt one copy of the metadata, btrfs
> survives using the other copy. If you corrupt the data instead, btrfs
> returns an error.
>
>
>
>
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Hugo Mills
On Sat, Mar 09, 2013 at 09:41:50PM -0800, Roger Binns wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 09/03/13 17:44, Hugo Mills wrote:
> > You've got at least three independent parameters to the system in order
> > to make that choice, though, and it's a fairly fuzzy decision problem.
> > You've got:
> > 
> > - Device redundancy - Storage overhead - Performance
> 
> Overhead and performance aren't separate goals.  More accurately the goal
> is best performance given the devices available and constrained by redundancy.
> 
> If I have 1GB of unique data and 10GB of underlying space available then
> feel free to make 9 additional copies of each piece of data if that helps
> performance.  As I increase the unique data the overhead available will
> decrease, but I doubt anyone has a goal of micromanaging overhead usage.
> Why can't the filesystem just figure it out and do the best job available
> given minimal constraints?
> 
> > I definitely want to report the results in nCmSpP form, which tells you
> > what it's actually done. The internal implementation, while not 
> > expressing the full gamut of possibilities, maps directly from the 
> > internal configuration to that form, and so it should at least be an 
> > allowable input for configuration (e.g. mkfs.btrfs and the restriper).
> 
> Agreed on that for the micromanagers :-)
> 
> > If you'd like to suggest a usable set of configuration axes [say, 
> > (redundancy, overhead) ], and a set of rules for converting those
> > requirements to the internal representation, then there's no reason we
> > can't add them as well in a later set of patches.
> 
> The only constraints that matter are surviving N device failures, and data
> not lost if at least N devices are still present.  Under the hood the best
> way of meeting those can be heuristically determined, and I'd expect
> things like overhead to dynamically adjust as storage fills up or empties.

   That's really not going to work happily -- you'd have to run the
restriper in the background automatically as the device fills up.
Given that this is going to end up rewriting *all* of the data on the
FS, taking up storage bandwidth as it does it (and taking a hell of a
long time to complete), I think this is a bit of a non-starter. Oh,
and your performance will drop steadily over the months as it cranks
down through the various options.

   If you want maximum storage (with some given redundancy),
regardless of performance, then you might as well start with the
parity-based levels and just leave it at that.

   Thinking about it, specifying a (redundancy, acceptable_wastage)
pair is fairly pointless in controlling the performance levels,
because it doesn't make the space/speed trade-off explicit. In fact,
without actually doing long-term benchmarks on your hardware and
workloads, it's going to be next-to-impossible to give any concrete
figures, other than "X will be faster than Y".

   So, if you (as $sysadmin) actually care about performance, you'll
have to benchmark your own system and test the various options anyway.
If you don't care about performance and want as much storage as
possible, you'll be using mS1P or mS2P (or possibly mS3P in the
future). There's not much else a heuristic can do, without effectively
exposing all the config options to the admin, in some obfuscated form.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no. 6: Mature Student ---   


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Goffredo Baroncelli
On 03/10/2013 10:45 PM, Harald Glatt wrote:

> I've noticed through my own tests that on a single device I can
> corrupt around 5% of the data completely before btrfs fails. Up to
> that point both filesystem as well as data integrity stays at 100%.
> However the default layout for one disk seems to be having the data
> once, the metadata DUP and the system DUP too. 

How did you create the corruption? Does btrfs return wrong data? How is
the 5% calculated?


> Having these 5% isn't
> mentioned anywhere... Is this a value that could maybe be manipulated
> and could it be introduced into a naming scheme like this? Also where
> do the 5% redundancy come from?

On a single device, the metadata are DUPlicated but the data have only 1
copy.

This means that if you corrupt one copy of the metadata, btrfs
survives using the other copy. If you corrupt the data instead, btrfs
returns an error.
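
As a toy illustration of the difference, in Python: a read tries each
stored copy and accepts the first one whose checksum matches. Here
zlib.crc32 merely stands in for the filesystem checksum, and the function
and exception names are invented for the example.

```python
import zlib


class ChecksumError(Exception):
    pass


def read_block(copies, expected_crc):
    """Return the first copy whose checksum matches, else raise."""
    for data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
    raise ChecksumError("no intact copy left")


good = b"tree node"
crc = zlib.crc32(good)

# Metadata (DUP): one damaged copy and one intact copy, so the read succeeds.
print(read_block([b"tree nodX", good], crc))

# Data (single): the only copy is damaged, so the read fails with an error.
try:
    read_block([b"file datX"], zlib.crc32(b"file data"))
except ChecksumError as err:
    print("error:", err)
```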




-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Harald Glatt
On Sun, Mar 10, 2013 at 10:36 PM, Hugo Mills  wrote:
>
>Oh, sorry. It's "reduced redundancy", aka DUP -- i.e. you get that
> number of copies, but no guarantee that the copies all live on
> different devices. I'm not devoted to showing it this way. Other
> suggestions for making this distinction are welcomed. :)
>
>
>Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
>   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
>--- There's many a slip 'twixt wicket-keeper and gully. ---

I've noticed through my own tests that on a single device I can
corrupt around 5% of the data completely before btrfs fails. Up to
that point both filesystem as well as data integrity stays at 100%.
However the default layout for one disk seems to be having the data
once, the metadata DUP and the system DUP too. Having these 5% isn't
mentioned anywhere... Is this a value that could maybe be manipulated
and could it be introduced into a naming scheme like this? Also where
do the 5% redundancy come from?


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Hugo Mills
On Sun, Mar 10, 2013 at 12:23:56PM +0100, Martin Steigerwald wrote:
> Hi Hugo,
> 
> Am Samstag, 9. März 2013 schrieb Hugo Mills:
> >Some time ago, and occasionally since, we've discussed altering the
> > "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
> > number of copies, m is the number of (data) devices in a stripe per copy,
> > and p is the number of parity devices in a stripe.
> > 
> >The current kernel implementation uses as many devices as it can in
> > the striped modes (RAID-0, -10, -5, -6), and in this implementation,
> > that is written as "mS" (with a literal "m"). The mS and pP sections are
> > omitted if the value is 1S or 0P.
> > 
> >The magic look-up table for old-style / new-style is:
> > 
> > single   1C (or omitted, in btrfs fi df output)
> > RAID-0   1CmS
> > RAID-1   2C
> > DUP  2CD
> 
> What does the "D" in "2CD" mean? It's not explained above, unless I'm
> missing something.

   Oh, sorry. It's "reduced redundancy", aka DUP -- i.e. you get that
number of copies, but no guarantee that the copies all live on
different devices. I'm not devoted to showing it this way. Other
suggestions for making this distinction are welcome. :)

> > RAID-10  2CmS
> > RAID-5   1CmS1P
> > RAID-6   1CmS2P
> 
> I think it's great to clarify the RAID-level terminology, but I find the new 
> notation a bit, hmmm, cryptic.
> 
> Maybe for displaying it would be nice to show a more verbose format like
> 
> 2 copies, many stripes, 1 parity (1CmS1P)
> 
> by default and the abbreviated one in parentheses?

   The only place it gets output right now is btrfs fi df, and
something that verbose would probably get in the way.

> Any other idea to make it less cryptic?

   Not necessarily less cryptic, but using lower-case c/s/p would
probably improve readability: 1c2s1p. Possibly also using * instead of
m for the "s" setting: 2c*s. That will change the heights of the
characters, so you only really need to look at the tall ones.
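For reference, the old-style / new-style look-up table quoted earlier in this message can be written as a small C table. This is only a sketch: the names profile_map and new_style_name() are made up for illustration and are not taken from btrfs-progs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table; the pairs are copied from the look-up table in this thread. */
struct profile_name {
	const char *old_style;
	const char *new_style;
};

static const struct profile_name profile_map[] = {
	{ "single", "1C" },
	{ "raid0",  "1CmS" },
	{ "raid1",  "2C" },
	{ "dup",    "2CD" },
	{ "raid10", "2CmS" },
	{ "raid5",  "1CmS1P" },
	{ "raid6",  "1CmS2P" },
};

/* Return the new-style name for an old-style one, or NULL if it is unknown. */
static const char *new_style_name(const char *old_style)
{
	size_t i;

	for (i = 0; i < sizeof(profile_map) / sizeof(profile_map[0]); i++)
		if (strcmp(profile_map[i].old_style, old_style) == 0)
			return profile_map[i].new_style;
	return NULL;
}
```

Switching to the lower-case or "*" spelling discussed above would only mean changing the strings in the right-hand column.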

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's many a slip 'twixt wicket-keeper and gully. ---   




[PATCH] btrfs-progs: output raid[56] options in mkfs.btrfs

2013-03-10 Thread Matias Bjørling
This patch adds the raid[56] options to the output of mkfs.btrfs help.
---
 mkfs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mkfs.c b/mkfs.c
index 5ece186..f9f26a5 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -326,7 +326,7 @@ static void print_usage(void)
 	fprintf(stderr, "options:\n");
 	fprintf(stderr, "\t -A --alloc-start the offset to start the FS\n");
 	fprintf(stderr, "\t -b --byte-count total number of bytes in the FS\n");
-	fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid10, dup or single\n");
+	fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
 	fprintf(stderr, "\t -l --leafsize size of btree leaves\n");
 	fprintf(stderr, "\t -L --label set a label\n");
 	fprintf(stderr, "\t -m --metadata metadata profile, values like data profile\n");
-- 
1.7.10.4



Re: Balance terminating

2013-03-10 Thread David Sterba
On Sat, Mar 09, 2013 at 12:03:07PM +, Swasher wrote:
> After that, I started watching the progress via 'balance status', and I
> could see it advancing, but only until 83% was left.
> 
> I waited a few hours and then tried to cancel it with 'balance cancel'.
> More than 12 hours have passed, but the balance has stalled in this state:
> 
> $ btrfs fi balance status /mnt/raid/  
> Balance on '/mnt/raid/' is running, cancel requested
> 386 out of about 2231 chunks balanced (387 considered),  83% left

What's the kernel version?

> Question: Can I safely reboot the server? Or should I do something else?

Yes it's safe. Please note that an interrupted balance will continue on
next mount unless 'skip_balance' is specified (and then balance cancel
will clear the state from the fs).

david


Re: [PATCH 5/5] Add man page description for nCmSpP replication levels

2013-03-10 Thread Goffredo Baroncelli
On 03/10/2013 06:20 PM, Hugo Mills wrote:
> On Sun, Mar 10, 2013 at 03:01:12PM +0100, Goffredo Baroncelli wrote:
>> Hi Hugo,
>>
>> could you please also add a section to the btrfs man page that
>> describes the nCmSpP levels?
>>
>> Thanks.
>> GB
>>
>>
>>
>> On 03/09/2013 09:31 PM, Hugo Mills wrote:
>>> Signed-off-by: Hugo Mills 
>>> ---
> [snip]
>>> diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
>>> index 41163e0..2e71e65 100644
>>> --- a/man/mkfs.btrfs.8.in
>>> +++ b/man/mkfs.btrfs.8.in
>>> @@ -37,7 +37,29 @@ mkfs.btrfs uses all the available storage for the 
>>> filesystem.
>>>  .TP
>>>  \fB\-d\fR, \fB\-\-data \fItype\fR
>>>  Specify how the data must be spanned across the devices specified. Valid
>>> -values are raid0, raid1, raid10 or single.
> 
>Like here?

Yes, I was thinking of a section that lists the allowable RAID levels.

> 
>>> +values are of the form <n>C[D][<m>S[<p>P]], where <n> is the number of copies
>>> +of data, <m> is the number of stripes per copy, and <p> is the number of parity
>>> +stripes. The <m> parameter must (currently) be a literal "m", indicating that
>>> +as many stripes as possible will be used. The letter D may be added to the
>>> +number of copies, to indicate non-redundant copies (e.g. on the same device).
>>> +
>>> +The following deprecated values may also be used:
>>> +.RS 16
>>> +.P
>>> +single 1C
>>> +.P
>>> +raid0  1CmS
>>> +.P
>>> +raid1  2C
>>> +.P
>>> +dup    2CD
>>> +.P
>>> +raid10 2CmS
>>> +.P
>>> +raid5  1CmS1P
>>> +.P
>>> +raid6  1CmS2P
>>> +.RS -16
>>>  .TP
>>>  \fB\-f\fR
>>>  Force overwrite when an existing filesystem is detected on the device.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 5/5] Add man page description for nCmSpP replication levels

2013-03-10 Thread Hugo Mills
On Sun, Mar 10, 2013 at 03:01:12PM +0100, Goffredo Baroncelli wrote:
> Hi Hugo,
> 
> could you please also add a section to the btrfs man page that
> describes the nCmSpP levels?
> 
> Thanks.
> GB
> 
> 
> 
> On 03/09/2013 09:31 PM, Hugo Mills wrote:
> > Signed-off-by: Hugo Mills 
> > ---
[snip]
> > diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
> > index 41163e0..2e71e65 100644
> > --- a/man/mkfs.btrfs.8.in
> > +++ b/man/mkfs.btrfs.8.in
> > @@ -37,7 +37,29 @@ mkfs.btrfs uses all the available storage for the 
> > filesystem.
> >  .TP
> >  \fB\-d\fR, \fB\-\-data \fItype\fR
> >  Specify how the data must be spanned across the devices specified. Valid
> > -values are raid0, raid1, raid10 or single.

   Like here?

> > +values are of the form <n>C[D][<m>S[<p>P]], where <n> is the number of copies
> > +of data, <m> is the number of stripes per copy, and <p> is the number of parity
> > +stripes. The <m> parameter must (currently) be a literal "m", indicating that
> > +as many stripes as possible will be used. The letter D may be added to the
> > +number of copies, to indicate non-redundant copies (e.g. on the same device).
> > +
> > +The following deprecated values may also be used:
> > +.RS 16
> > +.P
> > +single 1C
> > +.P
> > +raid0  1CmS
> > +.P
> > +raid1  2C
> > +.P
> > +dup    2CD
> > +.P
> > +raid10 2CmS
> > +.P
> > +raid5  1CmS1P
> > +.P
> > +raid6  1CmS2P
> > +.RS -16
> >  .TP
> >  \fB\-f\fR
> >  Force overwrite when an existing filesystem is detected on the device.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I can resist everything except temptation ---




Re: [PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support

2013-03-10 Thread Goffredo Baroncelli
On 03/10/2013 02:19 PM, Martin Steigerwald wrote:
> Am Sonntag, 10. März 2013 schrieb Martin Steigerwald:
>> Am Sonntag, 10. März 2013 schrieb Goffredo Baroncelli:
>>> Hi all,
>>
>> Hi Goffredo,
>>
>>> This is the third attempt of my patches related to show how the data
>>> are stored in a btrfs filesystem. I rebased all the patches on the
>>> latest mason git. I tried to address Zach's concern about the use
>>> of string_list_add() in df_pretty_sizes(): string_list_add()
>>> is removed from the df_pretty_sizes() and I created the new function
>>> sla_pretty_sizes() which calls df_pretty_sizes() and
>>> string_list_add().
>>
>> Thanks for your new round of patches.
> 
> My MTA returned on my first answer to you:
> 
> : host mta5.am0.yahoodns.net[74.6.136.244] 
> said:
> 554 delivery error: dd This user doesn't have a yahoo.com account
> (goffredo.baronce...@yahoo.com) [-5] - mta1233.mail.sk1.yahoo.com (in 
> reply
> to end of DATA command)
> 
> Replying to list cause obviously I cannot reach your personal mail address 
> right now.

It seems that Yahoo doesn't recognise my address
"goffredo.baronce...@yahoo.com"; the strange thing is that I use this
very account for authentication!

> 
> Thanks,

GB


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support

2013-03-10 Thread Goffredo Baroncelli
Hi Martin,

On 03/10/2013 02:16 PM, Martin Steigerwald wrote:
> Am Sonntag, 10. März 2013 schrieb Goffredo Baroncelli:
>> Hi all, 
> 
> Hi Goffredo,
> 
>> This is the third attempt of my patches related to show how the data
>> are stored in a btrfs filesystem. I rebased all the patches on the latest
>> mason git. I tried to address Zach's concern about the use of
>> string_list_add() in df_pretty_sizes(): string_list_add() is
>> removed from the df_pretty_sizes() and I created the new function 
>> sla_pretty_sizes() which calls df_pretty_sizes() and string_list_add().
> 
> Thanks for your new round of patches.
> 
>> Unfortunately I noticed a regression which passed all the reviews until
>> now: the command btrfs fi df previously didn't require the root
>> capability; now with my patches it is required, because I need to know
>> some info about the chunks, so I need to use the "BTRFS_IOC_TREE_SEARCH".
>>
>> I think that there are the following possibilities:
>> 1) accept this regression
>> 2) remove the command "btrfs fi df" and leave only "btrfs fi disk-usage"
>> and "btrfs dev disk-usage"
>> 3) adding a new ioctl which could be used without root capability. Of
>> course this ioctl would return only a subset of the
>> BTRFS_IOC_TREE_SEARCH info
>>
>> I think that the 3) would be the "long term" solution. I am not happy
>> about the 1), so as "short term solution" I think that we should go with
>> the 2). What do you think ?
> 
> Uhm, but exactly the new btrfs fi df contains a good overview:
> 
>> Below the description of the patches.
>>
>> --
>>
>> These patches update the btrfs fi df command and add two new commands:
>> - btrfs filesystem disk-usage 
>> - btrfs device disk-usage 
>>
>> The command "btrfs filesystem df" now shows only the disk
>> usage/available.
>>
>> $ sudo btrfs filesystem df /mnt/btrfs1/
>> Disk size:   400.00GB
>> Disk allocated:8.04GB
>> Disk unallocated:391.97GB
>> Used: 11.29MB
>> Free (Estimated):250.45GB   (Max: 396.99GB, min: 201.00GB)
>> Data to disk ratio:  63 %
>>
>> The "Free (Estimated)" tries to give an estimation of the free space
>> on the basis of the chunks usage. Max and min are the maximum allowable
>> space (if the next chunk are allocated as SINGLE) or the minimum one (
>> if the next chunks are allocated as DUP/RAID1/RAID10).
> 
> What information can't fi df display without root permissions? Maybe it's
> okay to just omit it for now if being run as user or display a "run as root" 
> hint instead?

I need root permission to find out how many stripes the RAID5/6 chunks
were allocated with.
For RAID1/RAID10/RAID0, computing the disk usage from the
stripes is easy: I multiply the field "total_bytes" of the
struct btrfs_ioctl_space_args by a factor that depends on the kind of RAID
(for example, for RAID1 this factor is two: the disk space used is two
times the space available).
For RAID5/6, instead, this value depends on the number of disks at "the
time of the chunk creation". If the chunk was created in a RAID5 with 4
disks, the ratio of space available to disk space used is 3/4. If I add
another disk and don't perform a balance, the ratio stays 3/4 for the
old chunks and becomes 4/5 for new chunks. To get this information I
need to retrieve the chunk info using the BTRFS_IOC_TREE_SEARCH ioctl.
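The arithmetic above can be sketched in C as follows. This is an illustration only, not the code from the patch: the enum and the function sketch_disk_bytes() are hypothetical names, and the profile tags are not the kernel's BTRFS_BLOCK_GROUP_* flags.

```c
#include <stdint.h>

/* Hypothetical profile tags for this sketch only. */
enum sketch_profile {
	SKETCH_SINGLE,
	SKETCH_RAID0,
	SKETCH_DUP,
	SKETCH_RAID1,
	SKETCH_RAID10,
	SKETCH_RAID5,
	SKETCH_RAID6,
};

/*
 * Raw disk space consumed by a chunk that holds `logical` bytes.
 * For RAID5/6 the result depends on num_stripes, i.e. on how many
 * devices the chunk was striped over when it was created.
 * Returns 0 if num_stripes is too small for the profile.
 * (Overflow of logical * num_stripes is ignored in this sketch.)
 */
static uint64_t sketch_disk_bytes(enum sketch_profile p, uint64_t logical,
				  unsigned int num_stripes)
{
	switch (p) {
	case SKETCH_DUP:
	case SKETCH_RAID1:
	case SKETCH_RAID10:
		return logical * 2;	/* two copies of everything */
	case SKETCH_RAID5:
		if (num_stripes < 2)
			return 0;
		/* one stripe out of num_stripes holds parity */
		return logical * num_stripes / (num_stripes - 1);
	case SKETCH_RAID6:
		if (num_stripes < 3)
			return 0;
		/* two stripes out of num_stripes hold parity */
		return logical * num_stripes / (num_stripes - 2);
	default:
		return logical;		/* single, RAID0: no redundancy */
	}
}
```

With these assumptions, a RAID5 chunk created on 4 disks that holds 3000 bytes occupies 4000 bytes on disk (ratio 3/4), while one created on 5 disks that holds 4000 bytes occupies 5000 (ratio 4/5). The num_stripes argument is exactly the per-chunk value that has to be read through the tree search.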

GB




-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Goffredo Baroncelli
Hi Hugo,

On 03/09/2013 09:31 PM, Hugo Mills wrote:
>Some time ago, and occasionally since, we've discussed altering the
> "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
> number of copies, m is the number of (data) devices in a stripe per copy,
> and p is the number of parity devices in a stripe.
> 
>The current kernel implementation uses as many devices as it can in the
> striped modes (RAID-0, -10, -5, -6), and in this implementation, that is
> written as "mS" (with a literal "m"). The mS and pP sections are omitted
> if the value is 1S or 0P.
> 
>The magic look-up table for old-style / new-style is:
> 
> single   1C (or omitted, in btrfs fi df output)
> RAID-0   1CmS
> RAID-1   2C
> DUP      2CD
> RAID-10  2CmS
> RAID-5   1CmS1P
> RAID-6   1CmS2P


Even though I find the "nCmSpP" format a bit more rational, I think that it
is a bit too complex.

As you said:
>Chris is definitely planning fixed values for mS (so you can ask
> for, say, exactly 4 stripes and one parity), and values for nC greater
> than 2. As far as I know, there aren't any plans for nC > 1 and pP > 0
> together. I haven't got far enough into the kernel code to work out
> whether that's simple or not to implement.

On that basis, we need to handle fewer cases than the full "nCDmSpP"
format allows. So I suggest allowing the following "short forms":

- DUP               -> dD   (to allow more than 2 copies per disk)

- RAID1             -> nC or *C

- RAID0             -> mS or *S

- RAID10            -> nCmS or *CmS or nC*S

- RAID with parity  -> mSpP or *SpP or mS*P (is it possible?)

- single            -> 1C or 1D or 1S or "single"


where d, n, m, p are integers; '*' is the literal '*' and means "as many
as possible".

For example, if I have 6 disks:
*C2S  -> means: 3 copies, 2 stripes (n*m = 3*2 == num disks == 6)
2C*S  -> means: 2 copies, 3 stripes (n*m = 2*3 == num disks == 6)
*S2P  -> means: 2 parity, 4 stripes (m+p = 4+2 == num disks == 6)
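A rough sketch of how the '*' could be resolved from the number of disks is given below. It assumes that 0 stands for '*', that an omitted copy count means 1, and that the two rules in the examples combine into n * (m + p) == number of disks; that combined rule and all the names are assumptions of this sketch, not something defined in the patches.

```c
/* 0 stands for the '*' wildcard in this sketch. */
struct sketch_layout {
	unsigned int copies;	/* n */
	unsigned int stripes;	/* m */
	unsigned int parity;	/* p */
};

/*
 * Replace a single '*' (0) in copies or stripes so that the layout
 * uses all the devices: n * (m + p) == num_disks.
 * Returns 0 on success, -1 if there is no exact fit.
 */
static int sketch_resolve(struct sketch_layout *l, unsigned int num_disks)
{
	if (num_disks == 0)
		return -1;
	if (l->copies == 0 && l->stripes == 0)
		return -1;		/* two wildcards are ambiguous */

	if (l->copies == 0) {
		/* stripes is non-zero here, so width is at least 1 */
		unsigned int width = l->stripes + l->parity;

		if (num_disks % width != 0)
			return -1;
		l->copies = num_disks / width;
		return 0;
	}

	if (l->stripes == 0) {
		if (num_disks % l->copies != 0)
			return -1;
		if (num_disks / l->copies <= l->parity)
			return -1;	/* no room left for data stripes */
		l->stripes = num_disks / l->copies - l->parity;
		return 0;
	}

	/* nothing to resolve; only check that the layout fits */
	return l->copies * (l->stripes + l->parity) <= num_disks ? 0 : -1;
}
```

On 6 disks this gives 3 copies for *C2S, 3 stripes for 2C*S and 4 stripes for *S2P, matching the three examples.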

We could also have some defaults, like: d=2, n=2, m=*, p=1, so some
common forms become:
D    -> means DUP (d=2)
S    -> means RAID0 (m=*)
C    -> means RAID1 (n=2)
CS   -> means RAID10 (n=2, m=*)
SP   -> means RAID5 (m=*, p=1)
SP2  -> means RAID6 (m=*, p=2)

More complex forms would also be allowable, like
5C3D4S2P
but I don't think they will ever be supported.

I have a request: could we consider swapping the letters and the numbers?
For example, 3S2P could become S3P2. From my *subjective* point of view,
this seems clearer. What do you think?

However, I am also open to the following forms:

dD -> DUP(d); for backward compatibility, DUP means d=2
nC -> RAID1(n); for backward compatibility, RAID1 means n=2
mS -> RAID0(m); for backward compatibility, RAID0 means m=*
nCmS -> RAID10(n,m); for backward compatibility, RAID10 means n=2, m=*
mSpP -> RAIDP(m,p); for backward compatibility, RAID6 means p=2, m=*
and RAID5 means p=1, m=*

These are more verbose but also more familiar to the administrator.

What do you think?

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 00/14] btrfs-progs: more Coverity cleanups

2013-03-10 Thread David Sterba
On Mon, Mar 04, 2013 at 04:39:50PM -0600, Eric Sandeen wrote:
> This gets the coverity issue count down to 33.  Before Zach started this
> process, we were over 150, IIRC.  So it's almost to the point where the
> scans will be manageable going forward.
> 
> Not a lot of real bugfixes here, but a bit better error handling in
> places.  I sent out 2 dumb patches  (elsewhere) on Friday, though,
> so feel free to take a close & skeptical look at these ;)

Thanks, pulled them in (minus 04 and V2 for 12 and 14).

david


Re: [PATCH 2/8] Enhance the command btrfs filesystem df.

2013-03-10 Thread Wang Shilong
Hello,

> From: Goffredo Baroncelli 
> 
> Enhance the command "btrfs filesystem df" to show space usage information
> for a mount point(s). It shows also an estimation of the space available,
> on the basis of the current one used.
> 
> Signed-off-by: Goffredo Baroncelli 
> ---
> Makefile |2 +-
> cmds-fi-disk_usage.c |  530 ++
> cmds-fi-disk_usage.h |   25 +++
> cmds-filesystem.c|  125 +---
> ctree.h  |   17 +-
> utils.c  |   14 ++
> utils.h  |2 +
> 7 files changed, 589 insertions(+), 126 deletions(-)
> create mode 100644 cmds-fi-disk_usage.c
> create mode 100644 cmds-fi-disk_usage.h
> 
> diff --git a/Makefile b/Makefile
> index 0d6c43a..bd792b6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -9,7 +9,7 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
> string_list.o
> cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>  cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
> -cmds-quota.o cmds-qgroup.o cmds-replace.o
> +cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-fi-disk_usage.o
> 
> CHECKFLAGS= -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise \
>   -Wuninitialized -Wshadow -Wundef
> diff --git a/cmds-fi-disk_usage.c b/cmds-fi-disk_usage.c
> new file mode 100644
> index 000..50b2fae
> --- /dev/null
> +++ b/cmds-fi-disk_usage.c
> @@ -0,0 +1,530 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "utils.h"
> +#include "kerncompat.h"
> +#include "ctree.h"
> +#include "string_list.h"
> +
> +#include "commands.h"
> +
> +#include "version.h"
> +
> +#define DF_HUMAN_UNIT	(1<<0)
> +
> +/* 
> + * To store the size information about the chunks:
> + * the chunks info are grouped by the tuple (type, devid, num_stripes),
> + * i.e. if two chunks are of the same type (RAID1, DUP...), are on the
> + * same disk, have the same stripes then their sizes are grouped
> + */
> +struct chunk_info {
> + u64 type;
> + u64 size;
> + u64 devid;
> + u64 num_stripes;
> +};
> +
> +/*
> + * Pretty print the size
> + */
> +static char *df_pretty_sizes(u64 size, int mode)
> +{
> + char *s;
> +
> + if (mode & DF_HUMAN_UNIT) {
> + s = pretty_sizes(size);
> + if (!s)
> + return NULL;
> + } else {
> + s = malloc(21);

  I don't really like '21' here...
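One way to avoid the bare 21 is sketched below: give the length a name and let snprintf() enforce it. The macro and function names are hypothetical and not part of the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* 20 decimal digits for UINT64_MAX (18446744073709551615) plus the NUL. */
#define SKETCH_U64_STR_LEN 21

/* Return a malloc()ed decimal string for size, or NULL; the caller frees it. */
static char *sketch_u64_to_str(uint64_t size)
{
	char *s = malloc(SKETCH_U64_STR_LEN);

	if (!s)
		return NULL;
	/* snprintf never writes more than SKETCH_U64_STR_LEN bytes */
	snprintf(s, SKETCH_U64_STR_LEN, "%" PRIu64, size);
	return s;
}
```

The value 21 itself is large enough for any u64; the sketch only documents where it comes from and bounds the write.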

> + if (!s)
> + return NULL;
> + sprintf(s, "%llu", size);
> + }
> +
> + return s;
> +}
> +
> +/*
> + * This function is like the one above, only it passes the string buffer
> + * to the string_list_add function to be able to free all the strings together
> + * with the string_list_free() function
> + */
> +static char *sla_pretty_sizes(u64 size, int mode)
> +{
> + return string_list_add(df_pretty_sizes(size,mode));
> +}
> +
> +/*
> + * Add the chunk info to the chunk_info list
> + */
> +static int add_info_to_list(struct chunk_info **info_ptr,
> + int *info_count,
> + struct btrfs_chunk *chunk)
> +{
> +
> + u64 type = btrfs_stack_chunk_type(chunk);
> + u64 size = btrfs_stack_chunk_length(chunk);
> + int num_stripes = btrfs_stack_chunk_num_stripes(chunk);

why not  u64 for num_stripes here?

> + int j;
> +
> + for (j = 0 ; j < num_stripes ; j++) {
> + int i;
> + struct chunk_info *p = 0;
> + struct btrfs_stripe *stripe;
> + u64 devid;
   
 It is better to move all these declarations to the start
 of the function...
and in many places you declare or assign several values on
one line; I don't think that is good coding style ^_^

> +
> + stripe = btrfs_stripe_nr(chunk, j);
> + devid = btrfs_stack_stripe_devid(stripe);
> +
> + for (i = 0 ; i < *info_count ; i++)
> + if ((*info_ptr)[i].type == type &&
> + (*info_ptr)[i].devid == devid &&
> +  

Re: [PATCH 1/8] Add some helpers to manage the strings allocation/deallocation.

2013-03-10 Thread Goffredo Baroncelli
On 03/10/2013 03:34 PM, Wang Shilong wrote:
> Hello,
[...]
>> +
>> +/* 
>> + * Add a string to the dynamic allocated string list
>> + */
>> +char *string_list_add(char *s)
>> +{
>> +int  size;
>> +
>   
>   I'd prefer to have a check here firstly, like:
>   if (!s)
>   return s;
> 
>   Since this function is called directly without any check about 'char *s'
> in your next patch..
>   
>   Thanks,
>   Wang

Thanks for the review.
However, this check is not mandatory. Even if s == NULL, we just store a null
pointer, which is meaningless but not dangerous, because it is legal to
free() a null pointer.

From man 3 free:

void free(void *ptr);
[...]
If ptr is NULL, no operation is performed.

I will add your suggestions in the next submission.
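For illustration, here is a minimal sketch of the helper with the suggested NULL check. It also keeps the old list when realloc() fails, whereas the quoted patch assigns the result of realloc() straight to strings_to_free and so loses the old pointer on failure. The sketch_ names are hypothetical, and freeing s on failure is a design choice of this sketch, not something taken from the patch.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the patch's static variables. */
static char **sketch_strings;
static size_t sketch_count;

/*
 * Remember s so that sketch_list_free() can release it later.
 * A NULL s is passed straight through and not stored.
 * On allocation failure the existing list is left intact,
 * s is freed and NULL is returned.
 */
static char *sketch_list_add(char *s)
{
	char **tmp;

	if (!s)
		return NULL;

	tmp = realloc(sketch_strings, sizeof(*tmp) * (sketch_count + 1));
	if (!tmp) {
		free(s);
		return NULL;
	}
	sketch_strings = tmp;
	sketch_strings[sketch_count++] = s;
	return s;
}

/* Free every stored string and the list itself. */
static void sketch_list_free(void)
{
	size_t i;

	for (i = 0; i < sketch_count; i++)
		free(sketch_strings[i]);
	free(sketch_strings);
	sketch_strings = NULL;
	sketch_count = 0;
}
```

Callers such as sla_pretty_sizes() would then get NULL back both when the inner allocation failed and when the list could not grow.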
> 
>> +size = sizeof(void *) * ++count_string_to_free;
>> +strings_to_free = realloc(strings_to_free, size);
>> +
>> +/* if we don't have enough memory, we have more serius
>> +   problem than that a wrong handling of not enough memory */
>> +if (!strings_to_free) {
>> +fprintf(stderr, "add_string_to_free(): Not enough memory\n");
>> +count_string_to_free = 0;
>> +return NULL;
>> +}
>> +
>> +strings_to_free[count_string_to_free-1] = s;
>> +return s;
>> +}
>> +
>> +/*
>> + * Free the dynamic allocated strings list
>> + */
>> +void string_list_free()
>> +{
>> +int i;
>> +for (i = 0 ; i < count_string_to_free ; i++)
>> +free(strings_to_free[i]);
>> +
>> +free(strings_to_free);
>> +
>> +strings_to_free = 0;
>> +count_string_to_free = 0;
>> +}
>> +
>> +
>> diff --git a/string_list.h b/string_list.h
>> new file mode 100644
>> index 000..fdc027d
>> --- /dev/null
>> +++ b/string_list.h
>> @@ -0,0 +1,23 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License v2 as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public
>> + * License along with this program; if not, write to the
>> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
>> + * Boston, MA 021110-1307, USA.
>> + */
>> +
>> +#ifndef STRING_LIST_H
>> +#define STRING_LIST_H
>> +
>> +char *string_list_add(char *s);
>> +void string_list_free();
>> +
>> +#endif
>> -- 
>> 1.7.10.4
>>
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 1/8] Add some helpers to manage the strings allocation/deallocation.

2013-03-10 Thread Wang Shilong
Hello,

> From: Goffredo Baroncelli 
> 
> This patch adds some helpers to manage the strings allocation and
> deallocation.
> The function string_list_add(char *) adds the passed string to a list;
> the function string_list_free() frees all the strings together.
> 
> Signed-off-by: Goffredo Baroncelli 
> ---
> Makefile  |3 ++-
> string_list.c |   65 +
> string_list.h |   23 
> 3 files changed, 90 insertions(+), 1 deletion(-)
> create mode 100644 string_list.c
> create mode 100644 string_list.h
> 
> diff --git a/Makefile b/Makefile
> index 596bf93..0d6c43a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -5,7 +5,8 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
> root-tree.o dir-item.o file-item.o inode-item.o \
> inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \
> volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \
> -   send-stream.o send-utils.o qgroup.o raid6.o
> +   send-stream.o send-utils.o qgroup.o raid6.o \
> +   string_list.o
> cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
>  cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
>  cmds-quota.o cmds-qgroup.o cmds-replace.o
> diff --git a/string_list.c b/string_list.c
> new file mode 100644
> index 000..f840048
> --- /dev/null
> +++ b/string_list.c
> @@ -0,0 +1,65 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "string_list.h"
> +
> +/* To store the strings */
> +static void **strings_to_free;
> +static int count_string_to_free;
> +
> +/* 
> + * Add a string to the dynamic allocated string list
> + */
> +char *string_list_add(char *s)
> +{
> + int  size;
> +

I'd prefer to have a check here firstly, like:
if (!s)
return s;

Since this function is called directly without any check about 'char *s'
in your next patch..

Thanks,
Wang

> + size = sizeof(void *) * ++count_string_to_free;
> + strings_to_free = realloc(strings_to_free, size);
> +
> + /* if we don't have enough memory, we have more serius
> +problem than that a wrong handling of not enough memory */
> + if (!strings_to_free) {
> + fprintf(stderr, "add_string_to_free(): Not enough memory\n");
> + count_string_to_free = 0;
> + return NULL;
> + }
> +
> + strings_to_free[count_string_to_free-1] = s;
> + return s;
> +}
> +
> +/*
> + * Free the dynamic allocated strings list
> + */
> +void string_list_free()
> +{
> + int i;
> + for (i = 0 ; i < count_string_to_free ; i++)
> + free(strings_to_free[i]);
> +
> + free(strings_to_free);
> +
> + strings_to_free = 0;
> + count_string_to_free = 0;
> +}
> +
> +
> diff --git a/string_list.h b/string_list.h
> new file mode 100644
> index 000..fdc027d
> --- /dev/null
> +++ b/string_list.h
> @@ -0,0 +1,23 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public
> + * License v2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; if not, write to the
> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> + * Boston, MA 021110-1307, USA.
> + */
> +
> +#ifndef STRING_LIST_H
> +#define STRING_LIST_H
> +
> +char *string_list_add(char *s);
> +void string_list_free();
> +
> +#endif
> -- 
> 1.7.10.4
> 



Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Goffredo Baroncelli
Hi Martin,

On 03/10/2013 12:23 PM, Martin Steigerwald wrote:
> Hi Hugo,
> 
> Am Samstag, 9. März 2013 schrieb Hugo Mills:
>>Some time ago, and occasionally since, we've discussed altering the
>> "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
>> number of copies, m is the number of (data) devices in a stripe per copy,
>> and p is the number of parity devices in a stripe.
>>
>>The current kernel implementation uses as many devices as it can in
>> the striped modes (RAID-0, -10, -5, -6), and in this implementation,
>> that is written as "mS" (with a literal "m"). The mS and pP sections are
>> omitted if the value is 1S or 0P.
>>
>>The magic look-up table for old-style / new-style is:
>>
>> single   1C (or omitted, in btrfs fi df output)
>> RAID-0   1CmS
>> RAID-1   2C
>> DUP  2CD
> 
> What does the "D" in "2CD" mean? It's not explained above, unless I'm
> missing something.

This means DUP (two copies on the same disk); I only understood that by
reading the code.

> 
>
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH 5/5] Add man page description for nCmSpP replication levels

2013-03-10 Thread Goffredo Baroncelli
Hi Hugo,

could you please also add a section to the btrfs man page that describes
the nCmSpP levels?

Thanks.
GB



On 03/09/2013 09:31 PM, Hugo Mills wrote:
> Signed-off-by: Hugo Mills 
> ---
>  man/btrfs.8.in  |9 +
>  man/mkfs.btrfs.8.in |   24 +++-
>  2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/man/btrfs.8.in b/man/btrfs.8.in
> index 94f4ffe..2799ec7 100644
> --- a/man/btrfs.8.in
> +++ b/man/btrfs.8.in
> @@ -25,6 +25,8 @@ btrfs \- control a btrfs filesystem
>  [-s \fIstart\fR] [-t \fIsize\fR] -[vf] <\fIfile\fR>|<\fIdir\fR> \
>  [<\fIfile\fR>|<\fIdir\fR>...]
>  .PP
> +\fBbtrfs\fP \fBfilesystem df\fP [-r]\fI  \fP
> +.PP
>  \fBbtrfs\fP \fBfilesystem sync\fP\fI  \fP
>  .PP
>  \fBbtrfs\fP \fBfilesystem resize\fP\fI [devid:][+/\-][gkm]|[devid:]max 
> \fP
> @@ -217,6 +219,13 @@ don't use it if you use snapshots, have de-duplicated 
> your data or made
>  copies with \fBcp --reflink\fP.
>  .TP
>  
> +\fBfilesystem df\fR [-r] \fI\fR
> +Show usage information for the filesystem identified by \fI\fR.
> +
> +\fB-r, --raid\fP Use old-style "RAID-n" terminology to show replication types
> +
> +.TP
> +
>  \fBfilesystem sync\fR\fI  \fR
>  Force a sync for the filesystem identified by \fI\fR.
>  .TP
> diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
> index 41163e0..2e71e65 100644
> --- a/man/mkfs.btrfs.8.in
> +++ b/man/mkfs.btrfs.8.in
> @@ -37,7 +37,29 @@ mkfs.btrfs uses all the available storage for the 
> filesystem.
>  .TP
>  \fB\-d\fR, \fB\-\-data \fItype\fR
>  Specify how the data must be spanned across the devices specified. Valid
> -values are raid0, raid1, raid10 or single.
> +values are of the form <n>C[D][<m>S[<p>P]], where <n> is the number of copies
> +of data, <m> is the number of stripes per copy, and <p> is the number of 
> parity
> +stripes. The <m> parameter must (currently) be a literal "m", indicating that
> +as many stripes as possible will be used. The letter D may be added to the
> +number of copies, to indicate non-redundant copies (e.g. on the same device).
> +
> +The following deprecated values may also be used:
> +.RS 16
> +.P
> +single   1C
> +.P
> +raid0    1CmS
> +.P
> +raid1    2C
> +.P
> +dup      2CD
> +.P
> +raid10   2CmS
> +.P
> +raid5    1CmS1P
> +.P
> +raid6    1CmS2P
> +.RS -16
>  .TP
>  \fB\-f\fR
>  Force overwrite when an existing filesystem is detected on the device.


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support

2013-03-10 Thread Martin Steigerwald
Am Sonntag, 10. März 2013 schrieb Martin Steigerwald:
> Am Sonntag, 10. März 2013 schrieb Goffredo Baroncelli:
> > Hi all,
> 
> Hi Goffredo,
> 
> > This is the third version of my patches that show how the data is
> > stored in a btrfs filesystem. I rebased all the patches on the
> > latest mason git. I tried to address Zach's concern about the use
> > of string_list_add() in df_pretty_sizes(): string_list_add()
> > is removed from df_pretty_sizes(), and I created the new function
> > sla_pretty_sizes(), which calls df_pretty_sizes() and
> > string_list_add().
> 
> Thanks for your new round of patches.

My MTA returned the following in response to my first reply to you:

: host mta5.am0.yahoodns.net[74.6.136.244] said:
554 delivery error: dd This user doesn't have a yahoo.com account
(goffredo.baronce...@yahoo.com) [-5] - mta1233.mail.sk1.yahoo.com (in reply
to end of DATA command)

Replying to the list because I obviously cannot reach your personal mail
address right now.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support

2013-03-10 Thread Martin Steigerwald
Am Sonntag, 10. März 2013 schrieb Goffredo Baroncelli:
> Hi all, 

Hi Goffredo,

> This is the third version of my patches that show how the data is
> stored in a btrfs filesystem. I rebased all the patches on the latest
> mason git. I tried to address Zach's concern about the use of
> string_list_add() in df_pretty_sizes(): string_list_add() is
> removed from df_pretty_sizes(), and I created the new function
> sla_pretty_sizes(), which calls df_pretty_sizes() and string_list_add().

Thanks for your new round of patches.

> Unfortunately I noticed a regression which passed all the reviews until
> now: the command btrfs fi df previously didn't require the root
> capability, but with my patches it does, because I need some info
> about the chunks and so I need to use "BTRFS_IOC_TREE_SEARCH".
> 
> I think that there are the following possibilities:
> 1) accept this regression
> 2) remove the command "btrfs fi df" and leave only "btrfs fi disk-usage"
> and "btrfs dev disk-usage"
> 3) add a new ioctl which could be used without root capability. Of
> course this ioctl would return only a subset of the
> BTRFS_IOC_TREE_SEARCH info
> 
> I think that 3) would be the "long term" solution. I am not happy
> about 1), so as a "short term" solution I think that we should go with
> 2). What do you think?

Hmm, but the new btrfs fi df is exactly the command that gives a good overview:

> Below the description of the patches.
> 
> --
> 
> These patches update the btrfs fi df command and add two new commands:
> - btrfs filesystem disk-usage 
> - btrfs device disk-usage 
> 
> The command "btrfs filesystem df" now shows only the disk
> usage/available.
> 
> $ sudo btrfs filesystem df /mnt/btrfs1/
> Disk size:   400.00GB
> Disk allocated:8.04GB
> Disk unallocated:391.97GB
> Used: 11.29MB
> Free (Estimated):250.45GB   (Max: 396.99GB, min: 201.00GB)
> Data to disk ratio:  63 %
> 
> The "Free (Estimated)" value tries to estimate the free space on the
> basis of the chunk usage. Max and min are the maximum available space
> (if the next chunks are allocated as SINGLE) and the minimum (if the
> next chunks are allocated as DUP/RAID1/RAID10).

What information can't fi df display without root permissions? Maybe it's
okay to just omit it for now when run as a regular user, or to display a
"run as root" hint instead?
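
For what it is worth, the Max and min values in the quoted example look
consistent with a simple bound: the free space inside the already allocated
chunks plus either all of the unallocated space (SINGLE) or half of it
(DUP/RAID1/RAID10). Below is a minimal sketch assuming that model. The
function and struct names are hypothetical, and the 5.02GB of free space
inside chunks is not printed by the tool; it is inferred here as
396.99GB - 391.97GB.

```c
#include <assert.h>

/* Hypothetical container for the two bounds, in GB. */
struct free_bounds {
	double min_gb;	/* next chunks allocated as DUP/RAID1/RAID10 */
	double max_gb;	/* next chunks allocated as SINGLE */
};

/*
 * Assumed model, not taken from the patch: space that is still free inside
 * allocated chunks is usable as-is, while unallocated disk space yields
 * either all of its size (SINGLE) or half of it (two copies of everything).
 */
struct free_bounds estimate_free_bounds(double free_in_chunks_gb,
					double unallocated_gb)
{
	struct free_bounds b;

	assert(free_in_chunks_gb >= 0.0 && unallocated_gb >= 0.0);
	b.max_gb = free_in_chunks_gb + unallocated_gb;
	b.min_gb = free_in_chunks_gb + unallocated_gb / 2.0;
	return b;
}
```

With free_in_chunks_gb = 5.02 and unallocated_gb = 391.97 this gives roughly
396.99 and 201.00, which matches the quoted output to within rounding.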

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


[PATCH 5/8] Add command btrfs filesystem disk-usage

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Signed-off-by: Goffredo Baroncelli 
---
 cmds-fi-disk_usage.c |  432 ++
 cmds-fi-disk_usage.h |3 +
 cmds-filesystem.c|2 +
 utils.c  |   58 +++
 utils.h  |3 +
 5 files changed, 498 insertions(+)

diff --git a/cmds-fi-disk_usage.c b/cmds-fi-disk_usage.c
index 50b2fae..cb680e6 100644
--- a/cmds-fi-disk_usage.c
+++ b/cmds-fi-disk_usage.c
@@ -20,11 +20,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "utils.h"
 #include "kerncompat.h"
 #include "ctree.h"
 #include "string_list.h"
+#include "string_table.h"
 
 #include "commands.h"
 
@@ -45,6 +47,13 @@ struct chunk_info {
u64 num_stripes;
 };
 
+/* to store information about the disks */
+struct disk_info {
+   u64 devid;
+   char path[BTRFS_DEVICE_PATH_NAME_MAX];
+   u64 size;
+};
+
 /*
  * Pretty print the size
  */
@@ -528,3 +537,426 @@ int cmd_filesystem_df(int argc, char **argv)
return 0;
 }
 
+/*
+ *  Helper to sort the disk_info structure
+ */
+static int cmp_disk_info(const void *a, const void *b)
+{
+   return strcmp(((struct disk_info *)a)->path,
+   ((struct disk_info *)b)->path);
+}
+
+/*
+ *  This function loads the disk_info structures and puts them in an array
+ */
+static int load_disks_info(int fd,
+  struct disk_info **disks_info_ptr,
+  int *disks_info_count)
+{
+
+   int ret, i, ndevs;
+   struct btrfs_ioctl_fs_info_args fi_args;
+   struct btrfs_ioctl_dev_info_args dev_info;
+   struct disk_info *info;
+
+   *disks_info_count = 0;
+   *disks_info_ptr = 0;
+
+   ret = ioctl(fd, BTRFS_IOC_FS_INFO, &fi_args);
+   if (ret < 0) {
+   fprintf(stderr, "ERROR: cannot get filesystem info\n");
+   return -1;
+   }
+
+   info = malloc(sizeof(struct disk_info) * fi_args.num_devices);
+   if (!info) {
+   fprintf(stderr, "ERROR: not enough memory\n");
+   return -1;
+   }
+
+   for (i = 0, ndevs = 0 ; i <= fi_args.max_id ; i++) {
+
+   BUG_ON(ndevs >= fi_args.num_devices);
+   ret = get_device_info(fd, i, &dev_info);
+
+   if (ret == -ENODEV)
+   continue;
+   if (ret) {
+   fprintf(stderr,
+   "ERROR: cannot get info about device devid=%d\n",
+   i);
+   free(info);
+   return -1;
+   }
+
+   info[ndevs].devid = dev_info.devid;
+   strcpy(info[ndevs].path, (char *)dev_info.path);
+   info[ndevs].size = get_partition_size((char *)dev_info.path);
+   ++ndevs;
+   }
+
+   BUG_ON(ndevs != fi_args.num_devices);
+   qsort(info, fi_args.num_devices,
+   sizeof(struct disk_info), cmp_disk_info);
+
+   *disks_info_count = fi_args.num_devices;
+   *disks_info_ptr = info;
+
+   return 0;
+
+}
+
+/*
+ *  This function computes the size of a chunk in a disk
+ */ 
+static u64 calc_chunk_size(struct chunk_info *ci)
+{
+   if (ci->type & BTRFS_BLOCK_GROUP_RAID0)
+   return ci->size / ci->num_stripes;
+   else if (ci->type & BTRFS_BLOCK_GROUP_RAID1)
+   return ci->size ;
+   else if (ci->type & BTRFS_BLOCK_GROUP_DUP)
+   return ci->size ;
+   else if (ci->type & BTRFS_BLOCK_GROUP_RAID5)
+   return ci->size / (ci->num_stripes -1);
+   else if (ci->type & BTRFS_BLOCK_GROUP_RAID6)
+   return ci->size / (ci->num_stripes -2);
+   else if (ci->type & BTRFS_BLOCK_GROUP_RAID10)
+   return ci->size / ci->num_stripes;
+   return ci->size;
+}
+
+/*
+ *  This function prints the results of the command btrfs fi disk-usage
+ *  in tabular format
+ */
+static void _cmd_filesystem_disk_usage_tabular(int mode,
+   struct btrfs_ioctl_space_args *sargs,
+   struct chunk_info *chunks_info_ptr,
+   int chunks_info_count,
+   struct disk_info *disks_info_ptr,
+   int disks_info_count)
+{
+   int i;
+   u64 total_unused = 0;
+   struct string_table *matrix = 0;
+   int  ncols, nrows;
+   
+
+   ncols = sargs->total_spaces + 2;
+   nrows = 2 + 1 + disks_info_count + 1 + 2;
+
+   matrix = table_create(ncols, nrows);
+   if (!matrix) {
+   fprintf(stderr, "ERROR: not enough memory\n");
+   return;
+   }
+
+   /* header */
+   for (i = 0; i < sargs->total_spaces; i++) {
+   const char *description;
+
+   u64 flags = sargs->spaces[i].flags;
+   description = btrfs_flags2description(flags);
+
+   table

[PATCH 7/8] Add btrfs device disk-usage command

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Signed-off-by: Goffredo Baroncelli 
---
 cmds-device.c|3 ++
 cmds-fi-disk_usage.c |  141 ++
 cmds-fi-disk_usage.h |3 ++
 3 files changed, 147 insertions(+)

diff --git a/cmds-device.c b/cmds-device.c
index 198ad68..0dbc02c 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -27,6 +27,7 @@
 #include "ctree.h"
 #include "ioctl.h"
 #include "utils.h"
+#include "cmds-fi-disk_usage.h"
 
 #include "commands.h"
 
@@ -403,6 +404,8 @@ const struct cmd_group device_cmd_group = {
{ "scan", cmd_scan_dev, cmd_scan_dev_usage, NULL, 0 },
{ "ready", cmd_ready_dev, cmd_ready_dev_usage, NULL, 0 },
{ "stats", cmd_dev_stats, cmd_dev_stats_usage, NULL, 0 },
+   { "disk-usage", cmd_device_disk_usage,
+   cmd_device_disk_usage_usage, NULL, 0 },
{ 0, 0, 0, 0, 0 }
}
 };
diff --git a/cmds-fi-disk_usage.c b/cmds-fi-disk_usage.c
index cb680e6..f64e483 100644
--- a/cmds-fi-disk_usage.c
+++ b/cmds-fi-disk_usage.c
@@ -959,4 +959,145 @@ int cmd_filesystem_disk_usage(int argc, char **argv)
return 0;
 }
 
+static void print_disk_chunks(int fd,
+   u64 devid,
+   u64 total_size,
+   struct chunk_info *chunks_info_ptr,
+   int chunks_info_count,
+   int mode)
+{
+   int i;
+   u64 allocated = 0;
+   char *s;
+
+   for (i = 0 ; i < chunks_info_count ; i++) {
+   const char *description;
+   const char *r_mode;
+   u64 flags;
+   u64 size;
+
+   if (chunks_info_ptr[i].devid != devid)
+   continue;
+
+   flags = chunks_info_ptr[i].type;
+
+   description = btrfs_flags2description(flags);
+   r_mode = btrfs_flags2profile(flags);
+   size = calc_chunk_size(chunks_info_ptr+i);
+   s = sla_pretty_sizes(size, mode);
+   printf("   %s,%s:%*s%10s\n",
+   description,
+   r_mode,
+   (int)(20 - strlen(description) - strlen(r_mode)), "",
+   s);
+
+   allocated += size;
+
+   }
+   s = sla_pretty_sizes(total_size - allocated, mode);
+   printf("   Unallocated: %*s%10s\n",
+   (int)(20 - strlen("Unallocated")), "",
+   s);
+
+}
+
+static int _cmd_device_disk_usage(int fd, char *path, int mode)
+{
+   int i;
+   int ret = 0;
+   int info_count = 0;
+   struct chunk_info *info_ptr = 0;
+   struct disk_info *disks_info_ptr = 0;
+   int disks_info_count = 0;
+
+   if (load_chunk_info(fd, &info_ptr, &info_count) ||
+   load_disks_info(fd, &disks_info_ptr, &disks_info_count)) {
+   ret = -1;
+   goto exit;
+   }
+
+   for (i = 0 ; i < disks_info_count ; i++) {
+   char *s;
+
+   s = sla_pretty_sizes(disks_info_ptr[i].size, mode);
+   printf("%s\t%10s\n", disks_info_ptr[i].path, s);
+
+   print_disk_chunks(fd, disks_info_ptr[i].devid,
+   disks_info_ptr[i].size,
+   info_ptr, info_count,
+   mode);
+   printf("\n");
+
+   }
+
+
+exit:
+
+   string_list_free();
+   if (disks_info_ptr)
+   free(disks_info_ptr);
+   if (info_ptr)
+   free(info_ptr);
+
+   return ret;
+}
+
+const char * const cmd_device_disk_usage_usage[] = {
+   "btrfs device disk-usage [-b]  [..]",
+   "Show which chunks are in a device.",
+   "",
+   "-b\tSet byte as unit",
+   NULL
+};
+
+int cmd_device_disk_usage(int argc, char **argv)
+{
+
+   int flags = DF_HUMAN_UNIT;
+   int i, more_than_one = 0;
+
+   optind = 1;
+   while (1) {
+   int c = getopt(argc, argv, "b");
+
+   if (c < 0)
+   break;
+
+   switch (c) {
+   case 'b':
+   flags &= ~DF_HUMAN_UNIT;
+   break;
+   default:
+   usage(cmd_device_disk_usage_usage);
+   }
+   }
+
+   if (check_argc_min(argc - optind, 1)) {
+   usage(cmd_device_disk_usage_usage);
+   return 21;
+   }
+
+   for (i = optind; i < argc ; i++) {
+   int r, fd;
+   if (more_than_one)
+   printf("\n");
+
+   fd = open_file_or_dir(argv[i]);
+   if (fd < 0) {
+   fprintf(stderr, "ERROR: can't access '%s'\n",
+   argv[i]);
+   return 12;
+   }
+   r = _cmd_device_disk_usage(fd, argv[i], flags);

[PATCH 8/8] Create a new entry in btrfs man page for btrfs device disk-usage.

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Signed-off-by: Goffredo Baroncelli 
---
 man/btrfs.8.in |8 
 1 file changed, 8 insertions(+)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 50dc510..e60c81f 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -46,6 +46,8 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBdevice delete\fP\fI  [...]  \fP
 .PP
+\fBbtrfs\fP \fBdevice disk-usage\fP\fI [-b]  [...] \fP
+.PP
 \fBbtrfs\fP \fBreplace start\fP \fI[-Bfr] |  
\fP
 .PP
 \fBbtrfs\fP \fBreplace status\fP \fI[-1] \fP
@@ -360,6 +362,12 @@ Add device(s) to the filesystem identified by \fI\fR.
 Remove device(s) from a filesystem identified by \fI\fR.
 .TP
 
+\fBdevice disk-usage\fR\fI [-b]  [..] \fR
+Show which chunks are in a device.
+
+\fB-b\fP set byte as unit.
+.TP
+
 \fBdevice scan\fR \fI[--all-devices| [...]\fR
 If one or more devices are passed, these are scanned for a btrfs filesystem. 
 If no devices are passed, \fBbtrfs\fR scans all the block devices listed
-- 
1.7.10.4



[PATCH 4/8] Add helpers functions to handle the printing of data in tabular format.

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

This patch adds some functions to manage the printing of data in
tabular format.

The function
    struct string_table *table_create(int columns, int rows)
creates an (empty) table.

The functions
    char *table_printf(struct string_table *tab, int column,
        int row, char *fmt, ...)
    char *table_vprintf(struct string_table *tab, int column,
        int row, char *fmt, va_list ap)
populate the table with text. To align the text to the left, prefix it
with '<'; to align it to the right, prefix it with '>'. If the first
character is '=', the text is replaced by a sequence of '=' that fills
the column width.

The function
    void table_free(struct string_table *)
frees all the data associated with the table.

The function
    void table_dump(struct string_table *tab)
prints the table on stdout.
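
To make the prefix convention concrete, here is a minimal, self-contained
sketch of how a single cell could be rendered to a given column width.
render_cell() is a hypothetical helper written only for this description;
it is not the table_dump() implementation from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render one cell into 'out', padded to 'width'
 * characters, following the '<' / '>' / '=' prefix convention.
 */
void render_cell(char *out, size_t outlen, const char *cell, int width)
{
	assert(out != NULL && cell != NULL);
	assert(width >= 0 && outlen > (size_t)width);

	if (cell[0] == '=') {
		/* '=': fill the whole column with '=' characters */
		memset(out, '=', (size_t)width);
		out[width] = '\0';
	} else if (cell[0] == '<') {
		/* '<': left aligned, padding goes on the right */
		snprintf(out, outlen, "%-*s", width, cell + 1);
	} else {
		/* '>' (or no prefix at all): right aligned, padding on the left */
		const char *text = (cell[0] == '>') ? cell + 1 : cell;

		snprintf(out, outlen, "%*s", width, text);
	}
}
```

With a width of 8, "<Data" is rendered as "Data" followed by four spaces,
">8.04GB" as two spaces followed by "8.04GB", and "=" as eight '='
characters.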

Signed-off-by: Goffredo Baroncelli 
---
 Makefile   |2 +-
 string_table.c |  157 
 string_table.h |   36 +
 3 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 string_table.c
 create mode 100644 string_table.h

diff --git a/Makefile b/Makefile
index bd792b6..fd1b312 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o 
print-tree.o \
  inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \
  volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \
  send-stream.o send-utils.o qgroup.o raid6.o \
- string_list.o
+ string_list.o string_table.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
   cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-fi-disk_usage.o
diff --git a/string_table.c b/string_table.c
new file mode 100644
index 000..7efff87
--- /dev/null
+++ b/string_table.c
@@ -0,0 +1,157 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "string_table.h"
+
+/*
+ *  This function creates an array of char * which will represent a table
+ */
+struct string_table *table_create(int columns, int rows)
+{
+   struct string_table *p;
+   int size;
+   
+   
+   size = sizeof( struct string_table ) + 
+   rows * columns* sizeof(char *);
+   p = calloc(1, size);
+
+   if (!p) return NULL;
+
+   p->ncols = columns;
+   p->nrows = rows;
+
+   return p;
+}
+
+/*
+ * This function is like vprintf(), but stores the result in a cell of
+ * the table.
+ * If fmt starts with '<', the text is left aligned; if fmt starts with
+ * '>', the text is right aligned. If fmt is equal to '=', the text will
+ * be replaced by a run of '=' sized to the column width.
+char *table_vprintf(struct string_table *tab, int column, int row,
+ char *fmt, va_list ap)
+{
+   int idx = tab->ncols*row+column;
+   char *msg = calloc(100, sizeof(char));
+
+   if (!msg)
+   return NULL;
+
+   if (tab->cells[idx])
+   free(tab->cells[idx]);
+   tab->cells[idx] = msg;
+   vsnprintf(msg, 99, fmt, ap);
+
+   return msg;
+}
+
+
+/*
+ * This function is like printf(), but stores the result in a cell of
+ * the table.
+ */
+char *table_printf(struct string_table *tab, int column, int row,
+ char *fmt, ...)
+{
+   va_list ap;
+   char *ret;
+
+   va_start(ap, fmt);
+   ret = table_vprintf(tab, column, row, fmt, ap);
+   va_end(ap);
+
+   return ret;
+}
+
+/*
+ * This function dumps the table. Every "=" string will be replaced by
+ * a run of '=' as long as the column is wide
+ */
+void table_dump(struct string_table *tab)
+{
+   int sizes[tab->ncols];
+   int i, j;
+
+   for (i = 0 ; i < tab->ncols ; i++) {
+   sizes[i] = 0;
+   for (j = 0 ; j < tab->nrows ; j++) {
+   int idx = i + j*tab->ncols;
+   int s;
+
+   if (!tab->cells[idx])
+   continue;
+
+   s = strlen(tab->cells[idx]) - 1;
+   if (s < 1 || tab->cells[idx][0] == '=')
+

[PATCH 6/8] Create entry in man page for btrfs filesystem disk-usage

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Signed-off-by: Goffredo Baroncelli 
---
 man/btrfs.8.in |   13 +
 1 file changed, 13 insertions(+)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index e2f86ea..50dc510 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -29,6 +29,9 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBfilesystem resize\fP\fI [devid:][+/\-][gkm]|[devid:]max 
\fP
 .PP
+\fBbtrfs\fP \fBfilesystem disk-usage [-t][-b]\fP\fI  
+[path..]\fP
+.PP
 \fBbtrfs\fP \fBfilesystem label\fP\fI  [newlabel]\fP
 .PP
 \fBbtrfs\fP \fBfilesystem df\fP\fI [-b] \fIpath [path..]\fR\fP
@@ -251,6 +254,16 @@ it with the new desired size.  When recreating the 
partition make sure to use
 the same starting disk cylinder as before.
 .TP
 
+\fBfilesystem disk-usage\fP [-t][-b] \fIpath [path..]\fR
+
+Show on which disks the chunks are allocated.
+
+\fB-b\fP Set byte as unit
+
+\fB-t\fP Show data in tabular format
+
+.TP
+
 \fBfilesystem label\fP\fI  [newlabel]\fP
 Show or update the label of a filesystem. \fI\fR is used to identify the
 filesystem. 
-- 
1.7.10.4



[PATCH 3/8] Create the man page entry for the command btrfs fi df

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Signed-off-by: Goffredo Baroncelli 
---
 man/btrfs.8.in |   49 +
 1 file changed, 49 insertions(+)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 94f4ffe..e2f86ea 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -31,6 +31,8 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBfilesystem label\fP\fI  [newlabel]\fP
 .PP
+\fBbtrfs\fP \fBfilesystem df\fP\fI [-b] \fIpath [path..]\fR\fP
+.PP
 \fBbtrfs\fP \fBfilesystem balance\fP\fI  \fP
 .PP
 \fBbtrfs\fP \fBdevice scan\fP\fI [--all-devices| [...]]\fP
@@ -266,6 +268,53 @@ NOTE: Currently there are the following limitations:
 - the filesystem should not have more than one device.
 .TP
 
+\fBfilesystem df\fP [-b] \fIpath [path..]\fR
+
+Show space usage information for a mount point.
+
+\fB-b\fP Set byte as unit
+
+The command \fBbtrfs filesystem df\fP is used to query how much space on the
+disk(s) is used and to get an estimate of the free
+space of the filesystem.
+The output of the command \fBbtrfs filesystem df\fP shows:
+
+.RS
+.IP \fBDisk\ size\fP
+the total size of the disks which compose the filesystem.
+
+.IP \fBDisk\ allocated\fP
+the size of the area of the disks used by the chunks.
+
+.IP \fBDisk\ unallocated\fP 
+the size of the area of the disks which is free (i.e.
+the difference between the two values above).
+
+.IP \fBUsed\fP
+the portion of the logical space used by files and metadata.
+
+.IP \fBFree\ (estimated)\fP
+the estimated free space available: i.e. how much space can be used
+by the user. The estimate
+cannot be exact because it depends on the allocation policy (DUP, Single,
+RAID1...) of the metadata and data chunks. If every chunk is stored as
+"Single", the sum of the \fBfree (estimated)\fP space and the \fBused\fP
+space is equal to the \fBdisk size\fP.
+Otherwise, if all the chunks are mirrored (raid1 or raid10) or duplicated,
+the sum of the \fBfree (estimated)\fP space and the \fBused\fP space is
+half of the \fBdisk size\fP. Normally the \fBfree (estimated)\fP is between
+these two limits.
+
+.IP \fBData\ to\ disk\ ratio\fP
+the ratio between the \fBlogical size\fP (i.e. the space provided by
+the chunks) and the \fBdisk allocated\fP (by the chunks). Normally it is
+lower than 100% because the metadata is duplicated for redundancy.
+If all the data and metadata are duplicated (or have a profile like RAID1)
+the \fBData\ to\ disk\ ratio\fP could be 50%.
+
+.RE
+.TP
+
 \fBfilesystem show\fR [--all-devices||]\fR
 Show the btrfs filesystem with some additional info. If no \fIUUID\fP or 
 \fIlabel\fP is passed, \fBbtrfs\fR show info of all the btrfs filesystem.
-- 
1.7.10.4



[PATCH 2/8] Enhance the command btrfs filesystem df.

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Enhance the command "btrfs filesystem df" to show space usage information
for one or more mount points. It also shows an estimate of the available
space, based on the space currently used.

Signed-off-by: Goffredo Baroncelli 
---
 Makefile |2 +-
 cmds-fi-disk_usage.c |  530 ++
 cmds-fi-disk_usage.h |   25 +++
 cmds-filesystem.c|  125 +---
 ctree.h  |   17 +-
 utils.c  |   14 ++
 utils.h  |2 +
 7 files changed, 589 insertions(+), 126 deletions(-)
 create mode 100644 cmds-fi-disk_usage.c
 create mode 100644 cmds-fi-disk_usage.h

diff --git a/Makefile b/Makefile
index 0d6c43a..bd792b6 100644
--- a/Makefile
+++ b/Makefile
@@ -9,7 +9,7 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o 
print-tree.o \
  string_list.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
-  cmds-quota.o cmds-qgroup.o cmds-replace.o
+  cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-fi-disk_usage.o
 
 CHECKFLAGS= -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise \
-Wuninitialized -Wshadow -Wundef
diff --git a/cmds-fi-disk_usage.c b/cmds-fi-disk_usage.c
new file mode 100644
index 000..50b2fae
--- /dev/null
+++ b/cmds-fi-disk_usage.c
@@ -0,0 +1,530 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "kerncompat.h"
+#include "ctree.h"
+#include "string_list.h"
+
+#include "commands.h"
+
+#include "version.h"
+
+#define DF_HUMAN_UNIT  (1<<0)
+
+/* 
+ * To store the size information about the chunks:
+ * the chunks info are grouped by the tuple (type, devid, num_stripes),
+ * i.e. if two chunks are of the same type (RAID1, DUP...), are on the
+ * same disk, have the same stripes then their sizes are grouped
+ */
+struct chunk_info {
+   u64 type;
+   u64 size;
+   u64 devid;
+   u64 num_stripes;
+};
+
+/*
+ * Pretty print the size
+ */
+static char *df_pretty_sizes(u64 size, int mode)
+{
+   char *s;
+
+   if (mode & DF_HUMAN_UNIT) {
+   s = pretty_sizes(size);
+   if (!s)
+   return NULL;
+   } else {
+   s = malloc(21);
+   if (!s)
+   return NULL;
+   sprintf(s, "%llu", size);
+   }
+
+   return s;
+}
+
+/*
+ * This function is like the one above, except that it passes the string
+ * buffer to string_list_add() so that all the strings can be freed together
+ * with the string_list_free() function
+ */
+static char *sla_pretty_sizes(u64 size, int mode)
+{
+   return string_list_add(df_pretty_sizes(size,mode));
+}
+
+/*
+ * Add the chunk info to the chunk_info list
+ */
+static int add_info_to_list(struct chunk_info **info_ptr,
+   int *info_count,
+   struct btrfs_chunk *chunk)
+{
+
+   u64 type = btrfs_stack_chunk_type(chunk);
+   u64 size = btrfs_stack_chunk_length(chunk);
+   int num_stripes = btrfs_stack_chunk_num_stripes(chunk);
+   int j;
+
+   for (j = 0 ; j < num_stripes ; j++) {
+   int i;
+   struct chunk_info *p = 0;
+   struct btrfs_stripe *stripe;
+   u64 devid;
+
+   stripe = btrfs_stripe_nr(chunk, j);
+   devid = btrfs_stack_stripe_devid(stripe);
+
+   for (i = 0 ; i < *info_count ; i++)
+   if ((*info_ptr)[i].type == type &&
+   (*info_ptr)[i].devid == devid &&
+   (*info_ptr)[i].num_stripes == num_stripes ) {
+   p = (*info_ptr) + i;
+   break;
+   }
+
+   if (!p) {
+   int size = sizeof(struct chunk_info) * (*info_count+1);
+   struct chunk_info *res = realloc(*info_ptr, size);
+
+   if (!res) {
+   fprintf(stderr, "ERROR: not enough memory\n");
+   return -1;
+   }
+
+   *info_ptr = res;
+

[PATCH 1/8] Add some helpers to manage the strings allocation/deallocation.

2013-03-10 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

This patch adds some helpers to manage string allocation and
deallocation.
The function string_list_add(char *) adds the passed string to a list;
the function string_list_free() frees all the strings together.
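
The pattern can be sketched in a self-contained form as follows. This is an
illustrative variant, not the code in the diff below: the names
track_string() and free_tracked_strings() are made up, and unlike the patch
it keeps the old array when realloc() fails and frees the string that could
not be tracked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Strings waiting to be freed together (illustrative names). */
static char **tracked;
static int tracked_count;

/* Remember 's' so it can be freed later; returns 's', or NULL on failure. */
char *track_string(char *s)
{
	char **grown;

	if (!s)
		return NULL;	/* nothing to track, e.g. a failed allocation */

	grown = realloc(tracked, sizeof(*tracked) * (tracked_count + 1));
	if (!grown) {
		free(s);	/* the caller only sees NULL, so free it here */
		return NULL;	/* the old array is untouched and still valid */
	}

	tracked = grown;
	tracked[tracked_count++] = s;
	return s;
}

/* Free every tracked string and reset the list. */
void free_tracked_strings(void)
{
	int i;

	for (i = 0; i < tracked_count; i++)
		free(tracked[i]);
	free(tracked);
	tracked = NULL;
	tracked_count = 0;
}
```

A caller wraps each freshly allocated string in track_string() and calls
free_tracked_strings() once on the way out, instead of freeing every string
individually.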

Signed-off-by: Goffredo Baroncelli 
---
 Makefile  |3 ++-
 string_list.c |   65 +
 string_list.h |   23 
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 string_list.c
 create mode 100644 string_list.h

diff --git a/Makefile b/Makefile
index 596bf93..0d6c43a 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,8 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o 
print-tree.o \
  root-tree.o dir-item.o file-item.o inode-item.o \
  inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \
  volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \
- send-stream.o send-utils.o qgroup.o raid6.o
+ send-stream.o send-utils.o qgroup.o raid6.o \
+ string_list.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
   cmds-quota.o cmds-qgroup.o cmds-replace.o
diff --git a/string_list.c b/string_list.c
new file mode 100644
index 000..f840048
--- /dev/null
+++ b/string_list.c
@@ -0,0 +1,65 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "string_list.h"
+
+/* To store the strings */
+static void **strings_to_free;
+static int count_string_to_free;
+
+/* 
+ * Add a string to the dynamic allocated string list
+ */
+char *string_list_add(char *s)
+{
+   int  size;
+
+   size = sizeof(void *) * ++count_string_to_free;
+   strings_to_free = realloc(strings_to_free, size);
+
+   /* if we run out of memory here, we have more serious problems
+  than mishandling the out-of-memory case */
+   if (!strings_to_free) {
+   fprintf(stderr, "add_string_to_free(): Not enough memory\n");
+   count_string_to_free = 0;
+   return NULL;
+   }
+
+   strings_to_free[count_string_to_free-1] = s;
+   return s;
+}
+
+/*
+ * Free the dynamic allocated strings list
+ */
+void string_list_free()
+{
+   int i;
+   for (i = 0 ; i < count_string_to_free ; i++)
+   free(strings_to_free[i]);
+
+   free(strings_to_free);
+
+   strings_to_free = 0;
+   count_string_to_free = 0;
+}
+
+
diff --git a/string_list.h b/string_list.h
new file mode 100644
index 000..fdc027d
--- /dev/null
+++ b/string_list.h
@@ -0,0 +1,23 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 02111-1307, USA.
+ */
+
+#ifndef STRING_LIST_H
+#define STRING_LIST_H
+
+char *string_list_add(char *s);
+void string_list_free();
+
+#endif
-- 
1.7.10.4



[PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support

2013-03-10 Thread Goffredo Baroncelli
Hi all, 

This is the third version of my patches, which show how the data
is stored in a btrfs filesystem. I rebased all the patches on the latest
mason git. I tried to address Zach's concern about the use of
string_list_add() in df_pretty_sizes(): string_list_add() is
removed from df_pretty_sizes(), and I created the new function
sla_pretty_sizes(), which calls df_pretty_sizes() and string_list_add().
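
The pattern behind string_list_add() and a wrapper such as sla_pretty_sizes()
can be sketched as follows: every heap-allocated string is recorded in a list,
and a single call at the end releases them all. This is only an illustration,
not the code from the patch: deferred_add(), deferred_pending(),
deferred_free_all(), format_gb() and sla_format_gb() are hypothetical names,
and unlike the posted code the sketch keeps the old list when realloc() fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the code from the patch. */
static char **deferred;        /* strings waiting to be freed */
static size_t deferred_count;  /* number of recorded strings */

/* Record a malloc()ed string so it can be freed later; returns s. */
char *deferred_add(char *s)
{
	char **tmp;

	if (!s)
		return NULL;

	/* Use a temporary so the old list survives a failed realloc(). */
	tmp = realloc(deferred, sizeof(*tmp) * (deferred_count + 1));
	if (!tmp) {
		free(s);	/* cannot track it, so do not leak it */
		return NULL;
	}

	deferred = tmp;
	deferred[deferred_count++] = s;
	return s;
}

/* Number of strings currently recorded. */
size_t deferred_pending(void)
{
	return deferred_count;
}

/* Free every recorded string, then the list itself. */
void deferred_free_all(void)
{
	size_t i;

	for (i = 0; i < deferred_count; i++)
		free(deferred[i]);
	free(deferred);

	deferred = NULL;
	deferred_count = 0;
}

/* Hypothetical formatter that returns a malloc()ed string. */
char *format_gb(double bytes)
{
	char *buf = malloc(32);

	if (buf)
		snprintf(buf, 32, "%.2fGB", bytes / (1024.0 * 1024.0 * 1024.0));
	return buf;
}

/* Wrapper in the spirit of sla_pretty_sizes(): format, then record. */
char *sla_format_gb(double bytes)
{
	return deferred_add(format_gb(bytes));
}
```

A caller can then pass sla_format_gb(x) straight to printf() as many times as
needed and call deferred_free_all() once before returning, while the plain
formatter stays free of any list handling.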

Unfortunately I noticed a regression which slipped through all the reviews until now:
the command btrfs fi df previously didn't require root capability;
now with my patches it does, because I need some information
about the chunks, so I need to use "BTRFS_IOC_TREE_SEARCH".

I think that there are the following possibilities:
1) accept this regression
2) remove the command "btrfs fi df" and leave only "btrfs fi disk-usage" and
   "btrfs dev disk-usage"
3) add a new ioctl which could be used without root capability. Of course
   this ioctl would return only a subset of the BTRFS_IOC_TREE_SEARCH info

I think that 3) would be the long-term solution. I am not happy with
1), so as a short-term solution I think that we should go with 2).
What do you think?

Below is the description of the patches.

--

These patches update the btrfs fi df command and add two new commands:
- btrfs filesystem disk-usage 
- btrfs device disk-usage 

The command "btrfs filesystem df" now shows only the disk usage/available.

$ sudo btrfs filesystem df /mnt/btrfs1/
Disk size:           400.00GB
Disk allocated:        8.04GB
Disk unallocated:    391.97GB
Used:                 11.29MB
Free (Estimated):    250.45GB   (Max: 396.99GB, min: 201.00GB)
Data to disk ratio:      63 %

The "Free (Estimated)" value tries to estimate the free space
on the basis of the chunk usage. Max and min are the maximum available
space (if the next chunks are allocated as SINGLE) and the minimum one
(if the next chunks are allocated as DUP/RAID1/RAID10).
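
The bounds described above can be sketched as below. This is an illustration
under stated assumptions, not the code from the patches: the struct and
function names are hypothetical, and the text does not spell out how the
estimate itself is derived, so free_estimated() simply assumes that future
chunks keep the current data-to-disk ratio.

```c
#include <stdint.h>

/* Hypothetical summary of a filesystem; the field names are assumptions. */
struct space_summary {
	uint64_t unallocated;   /* raw bytes not yet assigned to any chunk */
	uint64_t chunk_logical; /* usable bytes provided by allocated chunks */
	uint64_t chunk_raw;     /* raw bytes consumed by allocated chunks */
	uint64_t used;          /* bytes in use inside the allocated chunks */
};

/* Space still free inside the chunks that are already allocated. */
uint64_t free_in_chunks(const struct space_summary *s)
{
	return s->chunk_logical - s->used;
}

/* Max: every future chunk is SINGLE, so one raw byte is one usable byte. */
uint64_t free_max(const struct space_summary *s)
{
	return free_in_chunks(s) + s->unallocated;
}

/* Min: every future chunk is DUP/RAID1/RAID10, two raw bytes per usable byte. */
uint64_t free_min(const struct space_summary *s)
{
	return free_in_chunks(s) + s->unallocated / 2;
}

/*
 * Estimate (assumption): future chunks keep the current ratio of
 * usable bytes to raw bytes. Floating point avoids 64-bit overflow
 * in the intermediate product on large filesystems.
 */
uint64_t free_estimated(const struct space_summary *s)
{
	double ratio;

	if (s->chunk_raw == 0)
		return free_max(s);

	ratio = (double)s->chunk_logical / (double)s->chunk_raw;
	return free_in_chunks(s) + (uint64_t)((double)s->unallocated * ratio);
}
```

With this model the estimate always lies between the two bounds as long as
the current ratio is between 1/2 and 1; RAID5/6 chunks fall inside that range
as well.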

The other two commands show the chunks in the disks.

$ sudo btrfs filesystem disk-usage /mnt/btrfs1/
Data,Single: Size:8.00MB, Used:0.00
   /dev/vdb 8.00MB

Data,RAID6: Size:2.00GB, Used:11.25MB
   /dev/vdb 1.00GB
   /dev/vdc 1.00GB
   /dev/vdd 1.00GB
   /dev/vde 1.00GB

Metadata,Single: Size:8.00MB, Used:0.00
   /dev/vdb 8.00MB

Metadata,RAID5: Size:3.00GB, Used:36.00KB
   /dev/vdb 1.00GB
   /dev/vdc 1.00GB
   /dev/vdd 1.00GB
   /dev/vde 1.00GB

System,Single: Size:4.00MB, Used:0.00
   /dev/vdb 4.00MB

System,RAID5: Size:12.00MB, Used:4.00KB
   /dev/vdb 4.00MB
   /dev/vdc 4.00MB
   /dev/vdd 4.00MB
   /dev/vde 4.00MB

Unallocated:
   /dev/vdb 97.98GB
   /dev/vdc 98.00GB
   /dev/vdd 98.00GB
   /dev/vde 98.00GB

or in tabular format

$ sudo ./btrfs filesystem disk-usage -t /mnt/btrfs1/
         Data   Data    Metadata Metadata System System
         Single RAID6   Single   RAID5    Single RAID5   Unallocated

/dev/vdb 8.00MB  1.00GB   8.00MB   1.00GB 4.00MB  4.00MB     97.98GB
/dev/vdc      -  1.00GB        -   1.00GB      -  4.00MB     98.00GB
/dev/vdd      -  1.00GB        -   1.00GB      -  4.00MB     98.00GB
/dev/vde      -  1.00GB        -   1.00GB      -  4.00MB     98.00GB
         ====== ======= ======== ======== ====== ======= ===========
Total    8.00MB  2.00GB   8.00MB   3.00GB 4.00MB 12.00MB    391.97GB
Used       0.00 11.25MB     0.00  36.00KB   0.00  4.00KB

This is the most complete output: it shows which
disks a chunk uses and the usage of every chunk.

Finally the last command shows which chunks a disk hosts:

$ sudo ./btrfs device disk-usage /mnt/btrfs1/
/dev/vdb             100.00GB
   Data,Single:        8.00MB
   Data,RAID6:         1.00GB
   Metadata,Single:    8.00MB
   Metadata,RAID5:     1.00GB
   System,Single:      4.00MB
   System,RAID5:       4.00MB
   Unallocated:       97.98GB

/dev/vdc             100.00GB
   Data,RAID6:         1.00GB
   Metadata,RAID5:     1.00GB
   System,RAID5:       4.00MB
   Unallocated:       98.00GB

/dev/vdd             100.00GB
   Data,RAID6:         1.00GB
   Metadata,RAID5:     1.00GB
   System,RAID5:       4.00MB
   Unallocated:       98.00GB

/dev/vde             100.00GB
   Data,RAID6:         1.00GB
   Metadata,RAID5:     1.00GB
   System,RAID5:       4.00MB
   Unallocated:       98.00GB

This is more or less the same information as above, only grouped by disk.
Unfortunately I don't have any information about chunk usage on a per-disk basis.

Comments are welcome.
The code is pullable from
http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git
branch
df-du-raid56

BR
G.Baroncelli


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Roger Binns
On 09/03/13 22:37, Harald Glatt wrote:
> I have to add something to my own message: Even the notion of thinking 
> in 'how many devices do I want to give away for redundancy' is 
> outdated...

Devices are the only real tangible thing that you can actually point at.
They are the final physical unit and what gets added or replaced, and can
be put in another machine for recovery.

> What it really comes down to is how much space am I willing to
> sacrifice so that my reliability is increasing.

The answer as a whole is simple - you want all unused space for
reliability and performance.  If you put 5GB of data on 10GB of space then
50% is the value.

> Rather than addressing that at a per-drive level with a futuristic fs
> like btrfs I think setting a percent value of total space would be
> best.

The problem is that drives are what fail and what get added or removed.  A
percent value makes that considerably harder to deal with, especially for
changes after the initial setup.

It would be nice to be able to indicate that some data/files/directories
are more important than others (eg cache/spool/trash directories are
unimportant, my documents and photos are very important).

> This is pretty much what it would be like in a perfect world, in my
> opinion :)

Conceptually I want to point at the drives and say "do the right thing"
without any further configuration, monitoring or micro-managing.  In the
short/medium term I'd even be happy to run a btrfs cron job that digs
around, rearranges things and gives me a final report with a colour coding
homeland security style.  When it hits yellow, it is time to delete files
or add more storage.

Roger



Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Martin Steigerwald
On Sunday, 10 March 2013, Harald Glatt wrote:
> > Very good points,
> > 
> > I was also gonna write something along the lines of 'all that matters is
> > achieving the minimum amount of redundancy, as requested by the user,
> > at the maximum possible performance'.
> > 
> > After reading your post now, Roger, I'm much more clear on what I
> > actually wanted to say, which is pretty much the same thing:
> > 
> > In paradise really all I would have to tell btrfs is how many drives
> > I'm willing to give away so that they will be used exclusively for
> > redundancy. Everything else btrfs should figure out by itself. Not
> > just because it's simpler for the user, but also because btrfs
> > actually is in a position to KNOW better.
[…]
> > It sounds quite futuristic to me, but it is definitely something that
> > we have to achieve hopefully rather sooner than later :)
> > 
> > I'm looking forward to it!
> 
> I have to add something to my own message: Even the notion of thinking
> in 'how many devices do I want to give away for redundancy' is
> outdated... What it really comes down to is how much space am I
> willing to sacrifice so that my reliability is increasing. Rather than
> addressing that at a per-drive level with a futuristic fs like btrfs I
> think setting a percent value of total space would be best. Here are
> my n hard drives, I want to give up thirty percent of maximum space so
> that basically I can lose that amount of space across any device
> combination and be safe. Do this for me and do it at maximum possible
> performance with the device count, and type, and sizes that I've given
> you. And if I'm not using the filesystem much, if I have tons of space
> free, feel free to build even more redundancy while idle.
> 
> This is pretty much what it would be like in a perfect world, in my
> opinion :)

Still, the way to a perfect world is taken in steps. Thus I agree with Hugo
to start somewhere, and I think for an admin who wants to know and specify
exactly what is going on, the patches Hugo offered for discussion are a
huge step forward.

I think they are a good base to build upon.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Martin Steigerwald
Hi Hugo,

On Saturday, 9 March 2013, Hugo Mills wrote:
>Some time ago, and occasionally since, we've discussed altering the
> "RAID-n" terminology to change it to an "nCmSpP" format, where n is the
> number of copies, m is the number of (data) devices in a stripe per copy,
> and p is the number of parity devices in a stripe.
> 
>The current kernel implementation uses as many devices as it can in
> the striped modes (RAID-0, -10, -5, -6), and in this implementation,
> that is written as "mS" (with a literal "m"). The mS and pP sections are
> omitted if the value is 1S or 0P.
> 
>The magic look-up table for old-style / new-style is:
> 
> single   1C (or omitted, in btrfs fi df output)
> RAID-0   1CmS
> RAID-1   2C
> DUP      2CD

What does the "D" in "2CD" mean? It's not explained above, unless I am
missing something.

> RAID-10  2CmS
> RAID-5   1CmS1P
> RAID-6   1CmS2P

I think it's great to clarify the RAID-level terminology, but I find the new
notation a bit, hmmm, cryptic.

Maybe for displaying it would be nice to show a more verbose format like

1 copy, many stripes, 1 parity (1CmS1P)

by default and the abbreviated one in parentheses?
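
Both forms could be generated from the same three numbers. The sketch below
is only an illustration of the nCmSpP rules quoted above (1S and 0P are
omitted, a literal "m" means as many devices as possible); struct profile,
profile_short() and profile_verbose() are hypothetical names, the convention
that stripes == 0 stands for "m" is an assumption, and the unexplained "D"
of DUP is not covered.

```c
#include <stdio.h>

/* Hypothetical description of a replication profile. */
struct profile {
	int copies;   /* n in "nC" */
	int stripes;  /* 1 = no striping, 0 = as many devices as possible ("m") */
	int parity;   /* p in "pP" */
};

/* Write the short form, e.g. "1CmS1P"; "1S" and "0P" are omitted. */
void profile_short(const struct profile *p, char *buf, size_t len)
{
	char stripes[16] = "";
	char parity[16] = "";

	if (p->stripes == 0)
		snprintf(stripes, sizeof(stripes), "mS");
	else if (p->stripes != 1)
		snprintf(stripes, sizeof(stripes), "%dS", p->stripes);

	if (p->parity != 0)
		snprintf(parity, sizeof(parity), "%dP", p->parity);

	snprintf(buf, len, "%dC%s%s", p->copies, stripes, parity);
}

/* Write the verbose form followed by the short form in parentheses. */
void profile_verbose(const struct profile *p, char *buf, size_t len)
{
	char abbrev[48];
	char stripes[32];

	profile_short(p, abbrev, sizeof(abbrev));

	if (p->stripes == 0)
		snprintf(stripes, sizeof(stripes), "many stripes");
	else
		snprintf(stripes, sizeof(stripes), "%d stripe%s",
			 p->stripes, p->stripes == 1 ? "" : "s");

	snprintf(buf, len, "%d cop%s, %s, %d parity (%s)",
		 p->copies, p->copies == 1 ? "y" : "ies",
		 stripes, p->parity, abbrev);
}
```

For RAID-5, described here as { 1, 0, 1 }, profile_verbose() writes
"1 copy, many stripes, 1 parity (1CmS1P)", and profile_short() applied to
RAID-1 as { 2, 1, 0 } writes "2C".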

Any other ideas to make it less cryptic?

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7