Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-11-12 Thread Jan Engelhardt

On Friday 2012-09-28 10:58, Hugo Mills wrote:

   Data_to_disk_ratio, maybe?

 Why use underscores instead of spaces?

   So that you can use, say, read in the shell to extract data from
each line. To that end, there should be a space between the value and
the unit throughout.

Eww. Having a special single-line output mode would be much better
for these kinds of integration.

Is it too far-fetched to make the info available through sysfs?

 space_used=$(cat /sys/.../space_used)

is so much preferable to an awkful

 space_used=$(btrfs fi df | awk ...)

and hoping that the line is actually in the df output.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-29 Thread Goffredo Baroncelli

On 09/28/2012 10:13 PM, Hugo Mills wrote:

Summary:
  Disk_size:                135.00 GiB
  Disk_allocated:            10.51 GiB
  Disk_unallocated:         124.49 GiB
  Used:                       2.59 GiB
  Free_(Estimated):          91.93 GiB
  Average_disk_efficiency:      70 %

Details:
  Chunk-type  Mode    Disk-allocated        Used   Available
  Data        Single          4.01GB      2.16GB      1.87GB
  System      DUP            16.00MB      4.00KB      7.99MB
  System      Single          4.00MB        0.00      4.00MB
  Metadata    DUP             6.00GB    429.16MB      2.57GB
  Metadata    Single          8.00MB        0.00      8.00MB

Where:
  Disk-allocated    -  space used on the disk by the chunk
  Disk-size         -  size of the disk
  Disk-unallocated  -  disk not used in any chunk
  Used              -  space used by the files/metadata

The problem here is that if you're using raw storage, the Used
value in the second stanza grows twice as fast as the user expects.


This is the misunderstanding I talked about before.

If you look at the Metadata DUP line, you can see that Disk-allocated
is about 6GB, whereas the sum of Used and Available is 3GB.


I.e. if you create a 1GB file, Used always increases by 1GB and
Available always decreases by 1GB, whether you are using DUP, Single or
RAID*.



I think this second stanza should at minimum include the cooked values
used in btrfs fi df, because those reflect the user's experience. Then
adding [some of?] the raw values you've got here to help connect the
values to the raw data in the first stanza of output.


The only raw values are the ones prefixed with Disk. The other ones are
net of the DUP/Single/RAID redundancy.




As I said above, it's the connection between "I wrote a 1GiB file
to my filesystem" and "why have my numbers increased/decreased by
2GiB(*)/1.2GiB(**)?"


I repeat: if the chunk is DUP-ed and you create a 1GB file:
- Disk-allocated increases by 2GB (supposing that all the chunks are full)
- Used increases by 1GB
- Available decreases by 1GB
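
The accounting in the list above can be sketched in a few lines of Python. This is only an illustration, not btrfs code: RAW_FACTOR and write_file are hypothetical names, the RAID-1 factor comes from the (*) footnote, and the deltas simply restate the list under its own assumption that all existing chunks are full.

```python
# Hypothetical illustration: raw bytes consumed per logical byte.
# Single keeps one copy; DUP and RAID-1 keep two.
RAW_FACTOR = {"Single": 1, "DUP": 2, "RAID1": 2}


def write_file(size_gb, profile):
    """Return the change of each counter after writing size_gb of data.

    Assumes every existing chunk is full, so new chunks get allocated.
    """
    return {
        "Disk-allocated": size_gb * RAW_FACTOR[profile],  # raw, on disk
        "Used": size_gb,                                  # net of redundancy
        "Available": -size_gb,                            # net of redundancy
    }


for profile in RAW_FACTOR:
    print(profile, write_file(1, profile))
```

Only the Disk-prefixed counter depends on the profile; Used and Available move by the size of the file in every case.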




(*) RAID-1
(**) RAID-5-ish


Ciao
Goffredo


Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-29 Thread Sébastien Maury

Hi,

First of all, I have to say that I'm not a Linux specialist, so my
point of view sits somewhere between that of a Linux admin and that of
a user.

I may also say stupid things, so please excuse me in advance :p

The first difference between the original command and the one under
discussion is in the values for the DUP parts (one has to be multiplied
by 2, whereas the other is already multiplied by 2).

I think this should be indicated somewhere in order to avoid confusion.
This has been pointed out already, but whatever the output is, it is
essential to know whether a value is raw or not, and whether it has to
be multiplied or divided.


Also, I agree with Hugo about making the output easier to parse from
scripts.
The units should also be settable, so that all values can use the same
unit.


Basically, this new output is more explicit for me and removes a bit of
confusion.


However, the Average_disk_efficiency part seems confusing, as I'm not
sure the term "efficiency" is correct there.
It makes me ask some questions: why is this much allocated? When will
it allocate more? How much might be allocated? ...
So, for me, this percentage doesn't indicate whether disk space is used
efficiently or not; it indicates how much needed to be allocated
(depending on the chunk size).
In this example, 30% of the allocation is indeed unused, but it will be
used as the data on the disk grows.
For me it's similar to a LUN created with thick provisioning: I might
not need all the space, but I don't want to be stuck if I do need it.

(I don't know if I'm clear on that part.)

Am I wrong in saying that Free_(Estimated) is a false value, as the
size of the snapshots isn't included?
Let's say I have about 10 GB of snapshots: then
Free_(Estimated) = Free_(Estimated) - snapshot size, no?
Is it possible to include the snapshot size somewhere (maybe not in the
summary or details, but in another section or behind an option)?


Finally, I agree that linear growth is the best model currently.
For several reasons, some already explained by Hugo, and because, as
far as I understand, there is no single way to know very accurately
how your disk is used. That said, the point is at least to give the
most accurate data possible and to be able to interpret it.
In a production environment, I can't afford to say "sorry, the app
crashed because my disk is full". So I need a view of what's happening
on my disk.
Even if it lacks perfect accuracy, I can set thresholds to avoid any
problem (a warning at 70% of disk full, for example).
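
A threshold check of this kind could look roughly like the Python sketch below. The function name over_threshold and the "/" path are assumptions made for illustration. shutil.disk_usage reports the generic figures the operating system gives for a mounted filesystem, so on btrfs it is subject to the same estimation limits discussed in this thread.

```python
import shutil

WARN_RATIO = 0.70  # the 70% example threshold mentioned above


def over_threshold(used, total, ratio=WARN_RATIO):
    """Return True when used/total has reached the warning ratio."""
    return total > 0 and used / total >= ratio


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # named tuple: total, used, free (bytes)
    if over_threshold(usage.used, usage.total):
        print("warning: filesystem is at least 70% full")
```

Keeping the comparison in its own function makes it easy to feed it other figures later, for example the Used and Disk_size values from the proposed summary.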


So, I guess I would change some terms to indicate more precisely which
values are raw and which are already computed.
I would also not use the term "efficiency", as people may wonder at
some point whether they made a mistake in choosing btrfs when they see
a percentage that is never near 100.

Data_to_disk_ratio seems preferable to me.

Cordialement,

Sébastien




Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-29 Thread Goffredo Baroncelli

Hi Sébastien,

On 09/29/2012 11:59 AM, Sébastien Maury wrote:

Hi,

First of all, i've to say that i'm not a linux specialist, so that means
my point of view is balanced between a linux admin and a user.
I may also say stupid things, so pleas excuse me in advance :p

The first difference between the original command and the discussed one
is on the value for the DUP parts (one has to be multiplied by 2,
whereas the other is already multiplied by 2).
I think this should be indicated somewhere in order to avoid confusion.
This has been pointed already, but whatever the output is, it is
essential to know if the value is raw or not, if it has to be multiplied
or divided.






Also, i do agree with Hugo concerning the output to make it easier to
parse through scripting.
The units should also be settable in order to have the same units for
all values.


I have added a -k switch, so that the output is in KiB (I tried bytes,
but then the lines become very long: 2^64 is about 20 digits in
decimal form).
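
The column-width argument is easy to check. The snippet below is only a worked illustration (decimal_digits is a hypothetical helper); it counts the digits needed for the largest 64-bit byte count and for the same size expressed in KiB.

```python
def decimal_digits(n):
    """Number of decimal digits needed to print the non-negative integer n."""
    return len(str(n))


print(decimal_digits(2**64))          # a size of 2^64 printed in bytes
print(decimal_digits(2**64 // 1024))  # the same size printed in KiB
```

Switching from bytes to KiB saves three digits per column in the worst case.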



Basically, this new output is more explicit for me and remove a bit of
confusion.


Great, I reached my first goal!



Although, the part Average_disk_efficiency seems confusing as i'm not
sure the term efficiency is correct in that part.
That makes me ask some questions : why this much allocated ? when will
it allocate more ? how much might be allocated ? ...
So, this percentage doesn't indicate an efficient usage of disk space or
not ... for me, it indicates that it needed to allocated that (depending
on the chunk size).
In this example there's indeed 30% of the allocation that is unused, but
it will be used as data will grow on the disk.


That 30% of the disk is/will be used for redundancy purposes. Moreover,
there are the chunks, which are pre-allocated areas and could influence
the free space estimation...



For me it's similar as a LUN created in thick provisioning ... i might
not need all the space, but i don't want to be stuck if i'll need it.
(dunno if i'm clear on that part)

Am i wrong in saying that Free_(Estimated) is a false value as the
snapshots size isn't included ?
Let's say i've like 10 GB of snapshots ... then
Free_(Estimated)=Free_(Estimated)-snaps size ? no ?
Is it possible to include those snaps size somewhere (maybe not to
include in the summary or details, but to add another section or option
allowing to have that info) ?


Free_(Estimated) also takes the snapshots into account. The point is
another one: the user has to know that updating a snapshotted file
(i.e. changing part of the file without increasing its size) requires
space. But the Used part takes into account all the space used, so
Free_(Estimated) is accurate.





Finally, i do agree about the linearly growth as the best model currently.
For several reasons, some already explained by Hugo, and because as far
as i understood, there is no single way to know very accurately how
your disk is used. That said, the point is at least to give the most
accurate data as possible and to be able to interpret them.
In a production environment, i can't afford to say sorry, the app is
crashed because my disk is full. So i need a view on what's happening
on my disk.
Even if it lacks perfect accuracy, i can place thresholds to avoid any
problem (70% of disk full as a warning for example).

So, i would change some terms i guess indicating more precisely the
raw data and the already computed ones.


I would like to use the Disk prefix; Raw, to me, creates more
confusion. However, we should highlight that the disk occupation is
related to the chunks, which basically means pre-allocation and not
actual usage. For example, one of my filesystems has:


ghigo@venice:~$ btrfs/btrfs-progs/btrfs fi disk /mnt/old-btrfs/
Summary:
  Path: /mnt/old-btrfs/
  Disk_size:                232.11GB
  Disk_allocated:           150.29GB
  Disk_unallocated:          81.82GB
  Used:                      19.94GB
  Free_(Estimated):         201.16GB
  Average_disk_efficiency:      95 %

Details:
  Chunk-type  Mode    Disk-allocated        Used   Available
  Data        Single        136.01GB     18.84GB    117.17GB
  System      DUP            16.00MB     28.00KB      7.97MB
  System      Single          4.00MB        0.00      4.00MB
  Metadata    DUP            14.25GB      1.10GB      6.03GB
  Metadata    Single          8.00MB        0.00      8.00MB


Note that I have 136GB of data chunks allocated, but only 18GB are used.

After a btrfs balance start I got a different picture:

Summary:
  Path: /mnt/old-btrfs/
  Disk_size:                232.11GB
  Disk_allocated:            34.13GB
  Disk_unallocated:         197.98GB
  Used:                      19.94GB
  Free_(Estimated):         177.74GB
  Average_disk_efficiency:      85 %

Details:
  Chunk-type  Mode    Disk-allocated        Used   Available
  Data        Single

Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Hugo Mills
On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
 On Thu, 27 Sep 2012 23:02:35 +0200
 Goffredo Baroncelli kreij...@libero.it wrote:
 
  Sorry for the space error:
  Below a more correct example
  
  $ btrfs filesystem disk-free /
  Summary:
  Total:  135.00GB
  Allocated:   10.51GB
  Unallocated:124.49GB
  Free_(Estimated)  86.56GB
  Average_disk_efficiency: 62 %
 
 How do you estimate Free here? Sorry I didn't check the source code in git,
 but from the Details below nothing leads me to believe that this FS is
 doomed to only be able to usefully utilize only ~86GB of the partition, and not
 more.
 
 Are you ready to answer the flood of questions from people why their disk is
 only 62% efficient, and how to tune it to 100%? :-)

   Data_to_disk_ratio, maybe?

 Why use underscores instead of spaces?

   So that you can use, say, read in the shell to extract data from
each line. To that end, there should be a space between the value and
the unit throughout.
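
To illustrate why the underscores and the space before the unit help, here is a rough Python equivalent of splitting such a line on whitespace, much as the shell's read would. The function name parse_summary_line is hypothetical and the sample line is taken from the summary quoted above.

```python
def parse_summary_line(line):
    """Split 'Key: value unit' into (key, value, unit).

    This works only because the key contains no spaces (underscores are
    used instead) and the value is separated from its unit by a space.
    """
    key, value, unit = line.split()
    return key.rstrip(":"), float(value), unit


print(parse_summary_line("  Free_(Estimated):  86.56 GiB"))
```

With a label such as "Free (Estimated):" the same split would yield four fields instead of three and the unpacking would raise ValueError, which is the scripting problem the underscores avoid.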

  Details:
  Chunk-type  Mode         Allocated        Used        Free
  ----------  ----         ---------        ----        ----

   Minor thing: The underlines are largely superfluous. Few basic CL
tools I can think of use them.

  Data        Single          4.01GB      2.16GB      1.87GB
  System      DUP            16.00MB      4.00KB      7.99MB
  System      Single          4.00MB        0.00      4.00MB
  Metadata    DUP             6.00GB    429.16MB      2.57GB
  Metadata    Single          8.00MB        0.00      8.00MB

   I think we need another column here, to indicate how much *actual*
disk space is used by each row, so adding up that column will give you
the Allocated value in the first clause. I think that's probably the
biggest cause of confusion. Raw alloc., maybe, and use the term
raw somewhere in the first clause to hammer the point home.

   My only concern here is that we're a bit too close to the existing
solution (albeit merging the two sets of output), which has proven
itself over time to be somewhat confusing. I think the Alloc_Raw
column is the minimum necessary to link the two in some easily
determinable way. Adding totals to Alloc_Raw, and Used (but not Free
or Alloc) would help, I think. I don't think it's useful to add them
to the Free or Alloc columns, because those figures change as the FS
allocates chunks, and we'll end up with people querying the fact that
the total of Free doesn't add up to any of the figures in the
summary.

   Say, something like this:

Summary_(Raw):
  Total:                    135.00 GiB
  Allocated:                 10.51 GiB
  Unallocated:              124.49 GiB
  Free_(Estimated):          86.56 GiB
  Average_disk_efficiency:      62 %

Details:
  Chunk_type  Mode    Alloc_Raw      Alloc        Used      Free
  Data        Single   4.01 GiB   4.01 GiB    2.16 GiB  1.87 GiB
  System      DUP     32.00 MiB  16.00 MiB    4.00 KiB  7.99 MiB
  System      Single   4.00 MiB   4.00 MiB      0.00 B  4.00 MiB
  Metadata    DUP     12.00 GiB   6.00 GiB  429.16 MiB  2.57 GiB
  Metadata    Single   8.00 MiB   8.00 MiB      0.00 B  8.00 MiB
  Total               16.04 GiB               2.59 GiB

   The other thing is that there should be a switch (or possibly two)
to give highly machine-readable versions of the output -- no units
(units as bytes by default, with other units settable by a switch),
tab-separated, possibly a different option for each of the above
output clauses.
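
A machine-readable mode along these lines might produce something like the output of the Python sketch below. It is not the btrfs-progs implementation: to_bytes, machine_readable and the ROWS list are hypothetical, and the rows only reuse the Alloc_Raw figures from the example table above.

```python
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def to_bytes(value, unit):
    """Convert a (value, unit) pair such as (16.00, 'MiB') to whole bytes."""
    return round(value * UNITS[unit])


# (chunk type, mode, raw allocation) from the example Details table
ROWS = [
    ("Data", "Single", (4.01, "GiB")),
    ("System", "DUP", (32.00, "MiB")),
    ("System", "Single", (4.00, "MiB")),
    ("Metadata", "DUP", (12.00, "GiB")),
    ("Metadata", "Single", (8.00, "MiB")),
]


def machine_readable(rows):
    """Return tab-separated lines with sizes in plain bytes and no units."""
    return ["\t".join([ctype, mode, str(to_bytes(*size))])
            for ctype, mode, size in rows]


print("\n".join(machine_readable(ROWS)))
```

Tab-separated fields with a fixed unit can be consumed by cut, awk or read without any knowledge of the human-readable layout.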

   Ultimately, I think the bikeshed should be turquoise.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Python is executable pseudocode; perl ---  
is executable line-noise.




Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Goffredo Baroncelli

On 09/28/2012 05:17 AM, Roman Mamedov wrote:

On Thu, 27 Sep 2012 23:02:35 +0200
Goffredo Baroncellikreij...@libero.it  wrote:


Sorry for the space error:
Below a more correct example

$ btrfs filesystem disk-free /
Summary:
 Total: 135.00GB
 Allocated:  10.51GB
 Unallocated:   124.49GB
 Free_(Estimated)  86.56GB
 Average_disk_efficiency: 62 %


How do you estimate Free here? Sorry I didn't check the source code in git,
but from the Details below nothing leads me to believe that this FS is
doomed to only be able to usefully utilize only ~86GB of the partition, and not
more.



The estimate is based on the space actually allocated on the disk and
the space available.


In the example we know that BTRFS has allocated:
- 4GB   in Single mode (4GB available, 2.16GB used)
- 16MB  in DUP mode (so 16/2=8MB available, 4KB used)
- 4MB   in Single mode (4MB available)
- 6GB   in DUP mode (6/2=3GB available, 429MB used)
- 8MB   in Single mode (8MB available)


So BTRFS has allocated 4GB+16MB+4MB+6GB+8MB = ~10GB on disk, but the
space available (within these allocated chunks) is
4GB+8MB+4MB+3GB+8MB = ~7GB.


This means that the ratio between the space available and the space
physically allocated on the disk is 7GB/10GB = 0.7. So on 135GB of
disk, only 94GB are available.


Yes, my previous 0.62 was wrong. The real ratio is 0.7.
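
The arithmetic above can be reproduced with a short Python sketch. It is a worked example only, not the code of the proposed command: CHUNKS and efficiency are hypothetical names, sizes are expressed in MB with 1GB taken as 1024MB, and the number of copies is 1 for Single and 2 for DUP.

```python
MB = 1
GB = 1024 * MB

# (space allocated on disk, number of copies) for each line of the example
CHUNKS = [
    (4 * GB, 1),   # Data, Single
    (16 * MB, 2),  # System, DUP
    (4 * MB, 1),   # System, Single
    (6 * GB, 2),   # Metadata, DUP
    (8 * MB, 1),   # Metadata, Single
]


def efficiency(chunks):
    """Ratio of available space to space physically allocated on disk."""
    allocated = sum(size for size, _ in chunks)
    available = sum(size / copies for size, copies in chunks)
    return available / allocated


ratio = efficiency(CHUNKS)
print(f"ratio = {ratio:.2f}")
print(f"usable on a 135GB disk = {135 * ratio:.1f}GB")
```

The ratio comes out at about 0.70, which gives roughly 94GB usable on a 135GB disk if the current mix of Single and DUP chunks is assumed to stay the same.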



Are you ready to answer the flood of questions from people why their disk is
only 62% efficient, and how to tune it to 100%? :-)


I don't understand your question: by default BTRFS stores all metadata
DUP-ed, which means that the space allocated on the disk is twice the
space required. Because BTRFS has a lot of metadata, it is not as
efficient as other filesystems. This is a well-known fact.


If you want to use all the space with the maximum efficiency, you could
format the filesystem with the option -m single.





Why use underscores instead of spaces?


Simplify the parsing in scripts






Details:
  Chunk-type  Mode         Allocated        Used        Free
  ----------  ----         ---------        ----        ----
  Data        Single          4.01GB      2.16GB      1.87GB
  System      DUP            16.00MB      4.00KB      7.99MB
  System      Single          4.00MB        0.00      4.00MB
  Metadata    DUP             6.00GB    429.16MB      2.57GB
  Metadata    Single          8.00MB        0.00      8.00MB






Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Goffredo Baroncelli

Hi Hugo,

On 09/28/2012 10:58 AM, Hugo Mills wrote:

On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:

On Thu, 27 Sep 2012 23:02:35 +0200
Goffredo Baroncellikreij...@libero.it  wrote:


[...]



So that you can use, say, read in the shell to extract data from
each line. To that end, there should be a space between the value and
the unit throughout.


Details:
 Chunk-typeMode   AllocatedUsedFree
 --   -   -


Minor thing: The underlines are largely superfluous. Few basic CL
tools I can think of use them.

Ok




 Data  Single4.01GB  2.16GB  1.87GB
 SystemDUP  16.00MB  4.00KB  7.99MB
 SystemSingle4.00MB0.00  4.00MB
 Metadata  DUP   6.00GB429.16MB  2.57GB
 Metadata  Single8.00MB0.00  8.00MB


I think we need another column here, to indicate how much *actual*
disk space is used by each row, so adding up that column will give you
the Allocated value in the first clause. I think that's probably the
biggest cause of confusion. Raw alloc., maybe, and use the term
raw somewhere in the first clause to hammer the point home.


I think that there is a little misunderstanding. We are saying the same
thing. Only, I call "allocated" what you call "raw alloc".




My only concern here is that we're a bit too close to the existing
solution (albeit merging the two sets of output), which has proven
itself over time to be somewhat confusing. I think the Alloc_Raw
column is the minimum necessary to link the two in some easily
determinable way. Adding totals to Alloc_Raw, and Used (but not Free
or Alloc) would help, I think. I don't think it's useful to add them
to the Free or Alloc columns, because those figures change as the FS
allocates chunks, and we'll end up with people querying the fact that
the total of Free doesn't add up to any of the figures in the
summary.

Say, something like this:

Summary_(Raw):
  Total:                    135.00 GiB
  Allocated:                 10.51 GiB
  Unallocated:              124.49 GiB
  Free_(Estimated):          86.56 GiB
  Average_disk_efficiency:      62 %

Details:
  Chunk_type  Mode    Alloc_Raw      Alloc        Used      Free
  Data        Single   4.01 GiB   4.01 GiB    2.16 GiB  1.87 GiB
  System      DUP     32.00 MiB  16.00 MiB    4.00 KiB  7.99 MiB
  System      Single   4.00 MiB   4.00 MiB      0.00 B  4.00 MiB
  Metadata    DUP     12.00 GiB   6.00 GiB  429.16 MiB  2.57 GiB
  Metadata    Single   8.00 MiB   8.00 MiB      0.00 B  8.00 MiB
  Total               16.04 GiB               2.59 GiB

The other thing is that there should be a switch (or possibly two)
to give highly machine-readable versions of the output -- no units
(units as bytes by default, with other units settable by a switch),
tab-separated, possibly a different option for each of the above
output clauses.

I fully agree. But my first concern was about the wording (in fact,
even though we are saying the same thing, you didn't understand me).


Let me propose the following:

Summary:
  Disk_size:                135.00 GiB
  Disk_allocated:            10.51 GiB
  Disk_unallocated:         124.49 GiB
  Used:                       2.59 GiB
  Free_(Estimated):          91.93 GiB
  Average_disk_efficiency:      70 %

Details:
  Chunk-type  Mode    Disk-allocated        Used   Available
  Data        Single          4.01GB      2.16GB      1.87GB
  System      DUP            16.00MB      4.00KB      7.99MB
  System      Single          4.00MB        0.00      4.00MB
  Metadata    DUP             6.00GB    429.16MB      2.57GB
  Metadata    Single          8.00MB        0.00      8.00MB

Where:
  Disk-allocated    - space used on the disk by the chunk
  Disk-size         - size of the disk
  Disk-unallocated  - disk not used in any chunk
  Used              - space used by the files/metadata
  Available         - space available in the *allocated* chunk
  Free_(Estimated)  - theoretical free space for files (Disk_size
                      * Average_disk_efficiency - Used)
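
As a worked example, the Free_(Estimated) definition can be applied to the Summary figures in a couple of lines of Python. The function name free_estimated is hypothetical, and 0.70 is the rounded efficiency as displayed.

```python
def free_estimated(disk_size, efficiency, used):
    """Free_(Estimated) = Disk_size * Average_disk_efficiency - Used."""
    return disk_size * efficiency - used


# Values from the Summary, in GiB; 0.70 is the displayed (rounded) efficiency.
print(f"{free_estimated(135.00, 0.70, 2.59):.2f} GiB")
```

This gives about 91.91 GiB. The small difference from the 91.93 GiB shown in the Summary is presumably due to the efficiency being rounded to 70 % for display.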







Ultimately, I think the bikeshed should be turquoise.

? :-)




Hugo.



Ciao
Goffredo


Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Roman Mamedov
On Fri, 28 Sep 2012 18:44:07 +0200
Goffredo Baroncelli kreij...@inwind.it wrote:

 This means that the ration of space physically allocated on the disk and 
 the space available is 7GB/10GB = 0.7 . So on 135GB of disk, only 94GB 
 are available.

You assume metadata allocation will always grow linearly with data, which is
not true. So in my opinion it is not a good estimate.

  Are you ready to answer the flood of questions from people why their disk is
  only 62% efficient, and how to tune it to 100%? :-)
 
 I don't understand your question

You mentioned that the aim was to make the output more friendly, i.e. to make
it less confusing. But I find this percentage and the way it is labeled likely
to achieve the opposite effect, causing a lot of new questions on what does
this mean (while the percentage reported is likely not even being correct),
how to improve it, etc.

 Because on BTRFS the metadata are a lot

Keep in mind that there is also inlining; so even if the space is allocated
for metadata, it will be used to store small files. So it might be not
completely fair to count the metadata allocated space as unusable space.

  Why use underscores instead of spaces?
 
 Simplify the parsing in scripts

I think it looks awkward and is not warranted since this is a primarily
user-facing utility. Also none of the other similar tools shy from having
spaces anywhere they need to, e.g.

# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
  Creation Time : Wed May 25 00:07:38 2011
 Raid Level : raid5
 Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
  Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
   Raid Devices : 5
  Total Devices : 5
Persistence : Superblock is persistent

  Intent Bitmap : Internal

Update Time : Fri Sep 28 21:20:51 2012
  State : active 
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 64K

   Name : avdeb:0  (local to host avdeb)
   UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
 Events : 14254

    Number   Major   Minor   RaidDevice State
       7       8       17        0      active sync   /dev/sdb1
       6       8       33        1      active sync   /dev/sdc1
       3       8       65        2      active sync   /dev/sde1
       4       8       49        3      active sync   /dev/sdd1
       5       8       81        4      active sync   /dev/sdf1

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/alpha/lv1
  LV Name                lv1
  VG Name                alpha
  LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 0
  LV Size                3.52 TiB
  Current LE             115431
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           252:0

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.




Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Goffredo Baroncelli

On 09/28/2012 08:02 PM, Roman Mamedov wrote:

On Fri, 28 Sep 2012 18:44:07 +0200
Goffredo Baroncellikreij...@inwind.it  wrote:


This means that the ration of space physically allocated on the disk and
the space available is 7GB/10GB = 0.7 . So on 135GB of disk, only 94GB
are available.


You assume metadata allocation will always grow linearly with data, which is
not true. So in my opinion it is not a good estimate.


I am open to suggestions on how to improve the algorithm. Today we
have only ... nothing. If I process the output of btrfs fi show I can
estimate the best case (i.e. the data have no further redundancy); my
algorithm is a bit smarter. However, I repeat: please suggest a better
algorithm.


Regarding the assumption that the data/metadata ratio is constant: yes,
I assumed that. Why should it change? Of course it could change, but
what would be a better estimate?


My algorithm is not perfect, but better than nothing.





Are you ready to answer the flood of questions from people why their disk is
only 62% efficient, and how to tune it to 100%? :-)


I don't understand your question


You mentioned that the aim was to make the output more friendly, i.e. to make
it less confusing. But I find this percentage and the way it is labeled likely
to achieve the opposite effect, causing a lot of new questions on what does
this mean (while the percentage reported is likely not even being correct),
how to improve it, etc.


These questions already exist, because:

a) free space estimation in BTRFS is very complex
b) btrfs fi df and btrfs fi show don't help to measure (or estimate)
the space available.





Because on BTRFS the metadata are a lot


Keep in mind that there is also inlining; so even if the space is allocated
for metadata, it will be used to store small files. So it might be not
completely fair to count the metadata allocated space as unusable space.


I never said that the metadata space is unusable space. The opposite is
true: I don't differentiate between data/metadata/system; I only
consider RAID/DUP/Single in terms of disk space versus available space.





Why use underscores instead of spaces?


Simplify the parsing in scripts


I think it looks awkward and is not warranted since this is a primarily
user-facing utility. Also none of the other similar tools shy from having
spaces anywhere they need to, e.g.


We could improve on this side. However, these utilities are often used
in scripts.




# mdadm --detail /dev/md0
/dev/md0:
 Version : 1.2
   Creation Time : Wed May 25 00:07:38 2011
  Raid Level : raid5
  Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
   Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
Raid Devices : 5
   Total Devices : 5
 Persistence : Superblock is persistent

   Intent Bitmap : Internal

 Update Time : Fri Sep 28 21:20:51 2012
   State : active
  Active Devices : 5
Working Devices : 5
  Failed Devices : 0
   Spare Devices : 0

  Layout : left-symmetric
  Chunk Size : 64K

Name : avdeb:0  (local to host avdeb)
UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
  Events : 14254

    Number   Major   Minor   RaidDevice State
       7       8       17        0      active sync   /dev/sdb1
       6       8       33        1      active sync   /dev/sdc1
       3       8       65        2      active sync   /dev/sde1
       4       8       49        3      active sync   /dev/sdd1
       5       8       81        4      active sync   /dev/sdf1

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/alpha/lv1
  LV Name                lv1
  VG Name                alpha
  LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 0
  LV Size                3.52 TiB
  Current LE             115431
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           252:0





Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Hugo Mills
   Hi, Goffredo,

On Fri, Sep 28, 2012 at 07:27:16PM +0200, Goffredo Baroncelli wrote:
 On 09/28/2012 10:58 AM, Hugo Mills wrote:
 On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
 On Thu, 27 Sep 2012 23:02:35 +0200
 Goffredo Baroncellikreij...@libero.it  wrote:
 
 [...]
[...]
 Details:
  Chunk-typeMode   AllocatedUsedFree
  --   -   -
[...]
  Data  Single4.01GB  2.16GB  1.87GB
  SystemDUP  16.00MB  4.00KB  7.99MB
  SystemSingle4.00MB0.00  4.00MB
  Metadata  DUP   6.00GB429.16MB  2.57GB
  Metadata  Single8.00MB0.00  8.00MB
 
 I think we need another column here, to indicate how much *actual*
 disk space is used by each row, so adding up that column will give you
 the Allocated value in the first clause. I think that's probably the
 biggest cause of confusion. Raw alloc., maybe, and use the term
 raw somewhere in the first clause to hammer the point home.
 
 I think that there is a little misunderstanding. We are saying the
 same thing. Only I call allocated what you call raw alloc

   OK, I think we need both. We need to indicate somewhere (in the
Details section in my version) both the total number of bits of rust
used and the amount of data stored. It's not good to ask the user to
know that they need to multiply/divide by two for certain storage
modes (or even more complicated for RAID-5/6). Somewhere, they will
find that values change twice as fast as they expect (or at half the
speed), and that causes problems. We need to find some way of
connecting the two in a way that makes it reasonably obvious where the
figures come from.

 My only concern here is that we're a bit too close to the existing
 solution (albeit merging the two sets of output), which has proven
 itself over time to be somewhat confusing. I think the Alloc_Raw
 column is the minimum necessary to link the two in some easily
 determinable way. Adding totals to Alloc_Raw, and Used (but not Free
 or Alloc) would help, I think. I don't think it's useful to add them
 to the Free or Alloc columns, because those figures change as the FS
 allocates chunks, and we'll end up with people querying the fact that
 the total of Free doesn't add up to any of the figures in the
 summary.
 
 Say, something like this:
 
 Summary_(Raw):
    Total:                    135.00 GiB
    Allocated:                 10.51 GiB
    Unallocated:              124.49 GiB
    Free_(Estimated):          86.56 GiB
    Average_disk_efficiency:   62 %
 
 Details:
    Chunk_type  Mode    Alloc_Raw      Alloc        Used      Free
    Data        Single   4.01 GiB   4.01 GiB    2.16 GiB  1.87 GiB
    System      DUP     32.00 MiB  16.00 MiB    4.00 KiB  7.99 MiB
    System      Single   4.00 MiB   4.00 MiB      0.00 B  4.00 MiB
    Metadata    DUP     12.00 GiB   6.00 GiB  429.16 MiB  2.57 GiB
    Metadata    Single   8.00 MiB   8.00 MiB      0.00 B  8.00 MiB
    Total               16.04 GiB               2.59 GiB
 
 The other thing is that there should be a switch (or possibly two)
 to give highly machine-readable versions of the output -- no units
 (units as bytes by default, with other units settable by a switch),
 tab-separated, possibly a different option for each of the above
 output clauses.
 I fully agree. But my first concern was about the wording (in fact,
 even though we are saying the same thing, you didn't understand me).
 
 Let me propose the following:
 
 Summary:
    Disk_size:                135.00 GiB
    Disk_allocated:            10.51 GiB
    Disk_unallocated:         124.49 GiB
    Used:                       2.59 GiB
    Free_(Estimated):          91.93 GiB
    Average_disk_efficiency:   70 %
 
 Details:
    Chunk-type  Mode    Disk-allocated      Used  Available
    Data        Single          4.01GB    2.16GB     1.87GB
    System      DUP            16.00MB    4.00KB     7.99MB
    System      Single          4.00MB      0.00     4.00MB
    Metadata    DUP             6.00GB  429.16MB     2.57GB
    Metadata    Single          8.00MB      0.00     8.00MB
 
 
 
 Where:
    Disk-allocated    - space used on the disk by the chunk
    Disk-size         - size of the disk
    Disk-unallocated  - disk not used in any chunk
    Used              - space used by the files/metadata

   The problem here is that if you're using raw storage, the Used
value in the second stanza grows twice as fast as the user expects. I
think this second stanza should at minimum include the cooked values
used in btrfs fi df, because those reflect the user's experience. Then
adding [some of?] the raw values you've got here to help connect the
values to the raw data in the first stanza of output.

   As I said above, it's the connection between I wrote a 1GiB file
to my 

Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Hugo Mills
On Sat, Sep 29, 2012 at 12:02:23AM +0600, Roman Mamedov wrote:
 On Fri, 28 Sep 2012 18:44:07 +0200
 Goffredo Baroncelli kreij...@inwind.it wrote:
 
  This means that the ratio of space physically allocated on the disk and 
  the space available is 7GB/10GB = 0.7 . So on 135GB of disk, only 94GB 
  are available.
 
 You assume metadata allocation will always grow linearly with data, which is
 not true. So in my opinion it is not a good estimate.

   No, but it's the best model we have right now. (And probably about
the best model we will have, without knowledge of the future
intentions of the user). Without inlining file data, the metadata is
dominated by checksums, which is a linear relationship (approx
1000:1). With inlining file data, metadata is probably dominated by
inline data; assuming the ratio of small-to-large files on the FS
remains unchanged in future, a linear relationship also applies. For
general usage, I'm happy to assume that the current ratio of data to
metadata will remain largely unchanged over the lifetime of the FS.

   Why use underscores instead of spaces?
  
  Simplify the parsing in scripts
 
 I think it looks awkward and is not warranted since this is a primarily
 user-facing utility. Also none of the other similar tools shy from having
 spaces anywhere they need to, e.g.
 
 # mdadm --detail /dev/md0
 /dev/md0:
         Version : 1.2
   Creation Time : Wed May 25 00:07:38 2011
      Raid Level : raid5
      Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
   Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
    Raid Devices : 5
   Total Devices : 5
     Persistence : Superblock is persistent
 
   Intent Bitmap : Internal
 
     Update Time : Fri Sep 28 21:20:51 2012
           State : active
  Active Devices : 5
 Working Devices : 5
  Failed Devices : 0
   Spare Devices : 0
 
          Layout : left-symmetric
      Chunk Size : 64K
 
            Name : avdeb:0  (local to host avdeb)
            UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
          Events : 14254
 
     Number   Major   Minor   RaidDevice State
        7       8       17        0      active sync   /dev/sdb1
        6       8       33        1      active sync   /dev/sdc1
        3       8       65        2      active sync   /dev/sde1
        4       8       49        3      active sync   /dev/sdd1
        5       8       81        4      active sync   /dev/sdf1
 
 # lvdisplay 
   --- Logical volume ---
   LV Path                /dev/alpha/lv1
   LV Name                lv1
   VG Name                alpha
   LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
   LV Write Access        read/write
   LV Creation host, time , 
   LV Status              available
   # open                 0
   LV Size                3.52 TiB
   Current LE             115431
   Segments               3
   Allocation             inherit
   Read ahead sectors     auto
   - currently set to     4096
   Block device           252:0

   ... and I've always found those hard to deal with in scripts. :)

   (But they do have plumbing options, to use the git terminology,
so I'd be happy with having a parsable output option).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hey, Virtual Memory! Now I can have a *really big* ramdisk! ---   




Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Goffredo Baroncelli

On 09/28/2012 10:13 PM, Hugo Mills wrote:

Summary:
  Disk_size:                135.00 GiB
  Disk_allocated:            10.51 GiB
  Disk_unallocated:         124.49 GiB
  Used:                       2.59 GiB
  Free_(Estimated):          91.93 GiB
  Average_disk_efficiency:   70 %

Details:
  Chunk-type  Mode    Disk-allocated      Used  Available
  Data        Single          4.01GB    2.16GB     1.87GB
  System      DUP            16.00MB    4.00KB     7.99MB
  System      Single          4.00MB      0.00     4.00MB
  Metadata    DUP             6.00GB  429.16MB     2.57GB
  Metadata    Single          8.00MB      0.00     8.00MB

Where:
  Disk-allocated    - space used on the disk by the chunk
  Disk-size         - size of the disk
  Disk-unallocated  - disk not used in any chunk
  Used              - space used by the files/metadata

The problem here is that if you're using raw storage, the Used
value in the second stanza grows twice as fast as the user expects.


This is the misunderstanding I talked about before.

If you look at the Metadata DUP line, you can see that Disk-allocated is
about 6GB, whereas the sum of Used and Available is 3GB.

I.e. if you create a 1GB file, Used always increases by 1GB and Available
always decreases by 1GB, whether you are using DUP, Single or RAID*.



I think this second stanza should at minimum include the cooked values
used in btrfs fi df, because those reflect the user's experience. Then
adding [some of?] the raw values you've got here to help connect the
values to the raw data in the first stanza of output.


The only raw values are the ones prefixed with Disk. The other ones are
net of the DUP/Single/RAID overhead.




As I said above, it's the connection between "I wrote a 1GiB file
to my filesystem" and "why have my numbers increased/decreased by
2GiB(*)/1.2GiB(**)?"


I repeat: if the chunk is DUP-ed and you create a 1GB file:
- Disk-allocated increases by 2GB (supposing that all the chunks are full)
- Used increases by 1GB
- Available decreases by 1GB
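
The accounting in the list above can be sketched in Python. This is only an illustration, not btrfs code: RAW_FACTOR and account_write are hypothetical names, the factor of 2 for DUP/RAID-1 and the rough 1.2 for a RAID-5-ish layout are the figures quoted in this thread, and it assumes that all existing chunks are full so every write needs fresh allocation.

```python
# Raw bytes consumed on disk per byte of file data, per profile.
# The values are the ones mentioned in the thread, not read from btrfs.
RAW_FACTOR = {"single": 1.0, "dup": 2.0, "raid1": 2.0, "raid5ish": 1.2}


def account_write(profile, size_gb):
    """Return the change in each counter after writing size_gb of data.

    Assumes every existing chunk is full, so the whole write has to be
    backed by newly allocated disk space.
    """
    factor = RAW_FACTOR[profile]
    return {
        "disk_allocated": size_gb * factor,  # raw value: grows by the profile factor
        "used": size_gb,                     # cooked value: grows by the file size
        "available": -size_gb,               # cooked value: shrinks by the file size
    }


if __name__ == "__main__":
    for profile in RAW_FACTOR:
        print(profile, account_write(profile, 1.0))
```

For a 1GB write to a DUP chunk this gives +2GB Disk-allocated, +1GB Used and -1GB Available, matching the list above.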




(*) RAID-1
(**) RAID-5-ish


Ciao
Goffredo


Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-28 Thread Wade Cline

On 09/28/2012 01:20 PM, Hugo Mills wrote:


On Sat, Sep 29, 2012 at 12:02:23AM +0600, Roman Mamedov wrote:

On Fri, 28 Sep 2012 18:44:07 +0200
Goffredo Baroncelli kreij...@inwind.it wrote:


This means that the ratio of space physically allocated on the disk and
the space available is 7GB/10GB = 0.7 . So on 135GB of disk, only 94GB
are available.

You assume metadata allocation will always grow linearly with data, which is
not true. So in my opinion it is not a good estimate.

No, but it's the best model we have right now. (And probably about
the best model we will have, without knowledge of the future
intentions of the user). Without inlining file data, the metadata is
dominated by checksums, which is a linear relationship (approx
1000:1). With inlining file data, metadata is probably dominated by
inline data; assuming the ratio of small-to-large files on the FS
remains unchanged in future, a linear relationship also applies. For
general usage, I'm happy to assume that the current ratio of data to
metadata will remain largely unchanged over the lifetime of the FS.

Since there really isn't a simple answer to how much free space there is,
why not have the command print an upper and lower estimate and let
the user figure out how to interpret the numbers? This would inform
the user that there is some guesswork inherent in the estimation and
also provide an educated user with more exact numbers. Something
containing information such as:

  Total:            135.00 GiB
  Allocated:         10.51 GiB
  Unallocated:      124.49 GiB
  Free_Upper_Est:   130.00 GiB
  Free_Lower_Est:    62.45 GiB



The main idea is that an informed user would know that the
upper-estimation would be for only writing, say, new data, while
the lower-estimation would be for writing everything in, say, a
RAID-1 subvolume. An uninformed user would (hopefully) realize
that he needs to read the Wiki's FAQ.


... and I've always found those hard to deal with in scripts. :)

(But they do have plumbing options, to use the git terminology,
so I'd be happy with having a parsable output option).

Hugo.


In 'df'/'du', -h is used for human-readable output, while running them
with no options gives easily parsable output.

Basically, I think that the bikeshed should be green. ;)

-Wade



[RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-27 Thread Goffredo Baroncelli

On 09/27/2012 12:44 PM, Sébastien Maury wrote:

Hi,

I've installed a new server using btrfs for my root partition (/).

It uses snapper for snapshots management and all seems to work pretty fine.

My problem is to be able to know the remaining REAL free space in my
partition.

Using different commands, i have different results, and i don't know how
to interpret them correctly :
poivron:~ # btrfs filesystem df /
Data: total=4.01GB, used=2.16GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=3.00GB, used=429.16MB
Metadata: total=8.00MB, used=0.00
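
For scripting against output in the format shown above, a parser along the following lines could work. It is a sketch under assumptions: parse_fi_df, LINE_RE and UNITS are hypothetical names, the units are treated as binary multiples (1GB = 1024MB), a line without an explicit mode is reported as Single, and the format is taken only from this sample, so other btrfs-progs versions may print something different.

```python
import re

# Sample copied from the message above.
SAMPLE = """\
Data: total=4.01GB, used=2.16GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=3.00GB, used=429.16MB
Metadata: total=8.00MB, used=0.00
"""

# Assumption: units are binary multiples; a bare number means bytes.
UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

LINE_RE = re.compile(
    r"^(?P<type>\w+)(?:, (?P<mode>\w+))?: "
    r"total=(?P<total>[\d.]+)(?P<tunit>[A-Z]*), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[A-Z]*)$"
)


def parse_fi_df(text):
    """Turn each recognised line into a dict with sizes in bytes."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a chunk line
        rows.append({
            "type": m["type"],
            "mode": m["mode"] or "Single",  # assumption: no mode means Single
            "total": round(float(m["total"]) * UNITS[m["tunit"]]),
            "used": round(float(m["used"]) * UNITS[m["uunit"]]),
        })
    return rows


if __name__ == "__main__":
    for row in parse_fi_df(SAMPLE):
        print(row["type"], row["mode"], row["total"], row["used"], sep="\t")
```

Printing one tab-separated row per chunk type, with sizes in plain bytes, is the kind of output a shell script can consume with read or cut.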


Indeed, the output of btrfs filesystem df / is not very friendly.
What about changing the output as below:


$ btrfs filesystem disk-free /
Summary:
  Total:                    135.00GB
  Allocated:                 10.51GB
  Unallocated:              124.49GB
  Free_(Estimated)           86.56GB
  Average_disk_efficiency:   62 %

Details:
  Chunk-type  Mode    Allocated      Used      Free
  ----------  ------  ---------  --------  --------
  Data        Single     4.01GB    2.16GB    1.87GB
  System      DUP       16.00MB    4.00KB    7.99MB
  System      Single     4.00MB      0.00    4.00MB
  Metadata    DUP        6.00GB  429.16MB    2.57GB
  Metadata    Single     8.00MB      0.00    8.00MB



Where Free_(Estimated) and Average_disk_efficiency are computed as:

  Average_disk_efficiency = ratio of average disk usage
      = (sum(ChunkUsed) + sum(ChunkFree)) / sum(ChunkAllocated)

  Estimated_available
      = Average_disk_efficiency * Unallocated + sum(ChunkFree)

I am open to suggestions about the terms (Used vs Allocated, Free vs
Available) or a better description of Average disk efficiency.
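
As a worked example, the two formulas above can be applied to the rows of the Details table. This is a sketch, not the code from the disk_free branch: the function names are hypothetical, the figures are the rounded values as printed in the table, and 1GB is assumed to be 1024MB.

```python
# All sizes expressed in GB, assuming 1GB = 1024MB = 1024*1024KB.
MB = 1.0 / 1024
KB = MB / 1024

# (type, mode, allocated, used, free), copied from the Details table.
CHUNKS = [
    ("Data", "Single", 4.01, 2.16, 1.87),
    ("System", "DUP", 16.00 * MB, 4.00 * KB, 7.99 * MB),
    ("System", "Single", 4.00 * MB, 0.0, 4.00 * MB),
    ("Metadata", "DUP", 6.00, 429.16 * MB, 2.57),
    ("Metadata", "Single", 8.00 * MB, 0.0, 8.00 * MB),
]
UNALLOCATED = 124.49  # from the Summary section


def disk_efficiency(chunks):
    """(sum(ChunkUsed) + sum(ChunkFree)) / sum(ChunkAllocated)."""
    allocated = sum(c[2] for c in chunks)
    usable = sum(c[3] + c[4] for c in chunks)
    return usable / allocated


def estimated_free(chunks, unallocated):
    """Average_disk_efficiency * Unallocated + sum(ChunkFree)."""
    free = sum(c[4] for c in chunks)
    return disk_efficiency(chunks) * unallocated + free


if __name__ == "__main__":
    print("efficiency: %.2f" % disk_efficiency(CHUNKS))
    print("estimated free: %.2f GB" % estimated_free(CHUNKS, UNALLOCATED))
```

Fed with the table rows as printed, this gives an efficiency of roughly 0.70 and an estimate a little under 92GB. That is close to the 70 % and 91.93 GiB of the revised example posted later in the thread, not to the 62 % and 86.56GB shown in the Summary above.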



BR
G.Baroncelli

P.S. the source can be found at

http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git

branch disk_free


Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]

2012-09-27 Thread Roman Mamedov
On Thu, 27 Sep 2012 23:02:35 +0200
Goffredo Baroncelli kreij...@libero.it wrote:

 Sorry for the space error:
 Below a more correct example
 
 $ btrfs filesystem disk-free /
 Summary:
   Total:                    135.00GB
   Allocated:                 10.51GB
   Unallocated:              124.49GB
   Free_(Estimated)           86.56GB
   Average_disk_efficiency:   62 %

How do you estimate Free here? Sorry I didn't check the source code in git,
but from the Details below nothing leads me to believe that this FS is
doomed to be able to usefully utilize only ~86GB of the partition, and not
more.

Are you ready to answer the flood of questions from people why their disk is
only 62% efficient, and how to tune it to 100%? :-)

Why use underscores instead of spaces?


 
 Details:
   Chunk-type  Mode    Allocated      Used      Free
   ----------  ------  ---------  --------  --------
   Data        Single     4.01GB    2.16GB    1.87GB
   System      DUP       16.00MB    4.00KB    7.99MB
   System      Single     4.00MB      0.00    4.00MB
   Metadata    DUP        6.00GB  429.16MB    2.57GB
   Metadata    Single     8.00MB      0.00    8.00MB

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.

