Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Friday 2012-09-28 10:58, Hugo Mills wrote:
> Data_to_disk_ratio, maybe?
>
>> Why use underscores instead of spaces?
>
> So that you can use, say, "read" in the shell to extract data from
> each line. To that end, there should be a space between the value and
> the unit throughout.

Eww. Having a special single-line output mode would be much better for
this kind of integration.

Is it too far-fetched to make the info available through sysfs?

    space_used=$(cat /sys/.../space_used)

is so much preferable to an awkful

    space_used=$(btrfs fi df | awk ...)

and hoping that the line is actually in the df output.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
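To make the scripting argument concrete, here is a minimal sketch in Python. It is not part of the proposed patch: the helper name parse_summary and the sample text are made up for illustration, and it assumes Summary lines of the form "Label: value unit" with underscores in the label and a space before the unit, as suggested in the thread.

```python
# Hypothetical helper, not part of btrfs-progs: parses "Label: value unit"
# lines such as the Summary stanza discussed in this thread.
UNIT_FACTORS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_summary(text):
    """Return {label: bytes} for sizes and {label: number} for percentages."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # A data line splits into exactly three fields. That only works
        # because the label has no spaces and the unit is set off by one.
        if len(fields) != 3 or not fields[0].endswith(":"):
            continue
        label, value, unit = fields[0][:-1], float(fields[1]), fields[2]
        if unit == "%":
            result[label] = value
        elif unit in UNIT_FACTORS:
            result[label] = int(value * UNIT_FACTORS[unit])
    return result


# Sample input modelled on the proposal; the exact layout is an assumption.
SAMPLE = """\
Summary:
Disk_size: 135.00 GiB
Disk_allocated: 10.51 GiB
Used: 2.59 GiB
Average_disk_efficiency: 70 %
"""

if __name__ == "__main__":
    for label, value in parse_summary(SAMPLE).items():
        print(label, value)
```

With spaces inside the labels, or with the unit glued to the value, the three-field split would no longer hold and the parser would need a real grammar, which is the trade-off being argued about.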
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On 09/28/2012 10:13 PM, Hugo Mills wrote:
>> Summary:
>>   Disk_size:               135.00 GiB
>>   Disk_allocated:           10.51 GiB
>>   Disk_unallocated:        124.49 GiB
>>   Used:                      2.59 GiB
>>   Free_(Estimated):         91.93 GiB
>>   Average_disk_efficiency:     70 %
>>
>> Details:
>>   Chunk-type  Mode    Disk-allocated  Used      Available
>>   Data        Single  4.01GB          2.16GB    1.87GB
>>   System      DUP     16.00MB         4.00KB    7.99MB
>>   System      Single  4.00MB          0.00      4.00MB
>>   Metadata    DUP     6.00GB          429.16MB  2.57GB
>>   Metadata    Single  8.00MB          0.00      8.00MB
>>
>> Where:
>>   Disk-allocated   - space used on the disk by the chunk
>>   Disk-size        - size of the disk
>>   Disk-unallocated - disk not used in any chunk
>>   Used             - space used by the files/metadata
>
> The problem here is that if you're using raw storage, the Used value
> in the second stanza grows twice as fast as the user expects.

This is the misunderstanding I talked about before. If you look at the
Metadata DUP line, you can see that Disk-allocated is about 6GB, whereas
if you sum Used and Available you get 3GB.

I.e. if you create a 1GB file, Used always increases by 1GB and
Available always decreases by 1GB, whether you are using DUP, Single or
RAID*.

> I think this second stanza should at minimum include the cooked values
> used in btrfs fi df, because those reflect the user's experience. Then
> adding [some of?] the raw values you've got here to help connect the
> values to the raw data in the first stanza of output.

The only raw values are the ones prefixed with "Disk". The other ones
are net of the DUP/Single/RAID overhead.

> As I said above, it's the connection between "I wrote a 1GiB file to
> my filesystem" and "why have my numbers increased/decreased by
> 2GiB(*)/1.2GiB(**)?"

I repeat: if the chunk is DUP-ed and you create a 1GB file:
- Disk-allocated increases by 2GB (supposing that all the chunks are full)
- Used increases by 1GB
- Available decreases by 1GB

> (*) RAID-1
> (**) RAID-5-ish

Ciao
Goffredo
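The accounting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the message, not btrfs code: the function names are invented, the RAID-5 factor of n/(n-1) is the usual single-parity ratio, and the six-device array used to reproduce the "1.2GiB" figure is an assumption.

```python
from fractions import Fraction


def raw_multiplier(profile, num_devices=1):
    """Raw bytes consumed on disk per byte of file data (hypothetical helper)."""
    if profile == "single":
        return Fraction(1)
    if profile in ("dup", "raid1"):
        return Fraction(2)  # two copies of every block
    if profile == "raid5":
        if num_devices < 2:
            raise ValueError("raid5 needs at least two devices")
        # One device's worth of parity per stripe.
        return Fraction(num_devices, num_devices - 1)
    raise ValueError(f"unknown profile: {profile}")


def account_write(file_gib, profile, num_devices=1):
    """Change in each counter after writing file_gib of data.

    Assumes, as in the message, that existing chunks are full, so every
    written byte forces new raw allocation.
    """
    return {
        "disk_allocated": file_gib * raw_multiplier(profile, num_devices),
        "used": Fraction(file_gib),
        "available": -Fraction(file_gib),
    }


if __name__ == "__main__":
    for profile, devices in (("single", 1), ("dup", 1), ("raid1", 2), ("raid5", 6)):
        delta = account_write(1, profile, devices)
        print(profile, {name: float(value) for name, value in delta.items()})
```

Only the "disk_allocated" delta depends on the profile; "used" and "available" move by the file size in every case, which is the distinction between the Disk-prefixed columns and the others.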
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
Hi,

First of all, I have to say that I'm not a Linux specialist, so my point
of view is balanced between that of a Linux admin and that of a user. I
may also say stupid things, so please excuse me in advance :p

The first difference between the original command and the one under
discussion is in the values for the DUP parts (one has to be multiplied
by 2, whereas the other is already multiplied by 2). I think this should
be indicated somewhere in order to avoid confusion. This has been pointed
out already, but whatever the output is, it is essential to know whether
a value is raw or not, and whether it has to be multiplied or divided.

Also, I do agree with Hugo about making the output easier to parse in
scripts. The units should also be settable, in order to have the same
units for all values.

Basically, this new output is more explicit for me and removes a bit of
confusion. However, the Average_disk_efficiency part seems confusing, as
I'm not sure the term "efficiency" is correct there. It makes me ask some
questions: why is this much allocated? When will it allocate more? How
much might be allocated? ... So this percentage doesn't indicate whether
disk space is used efficiently or not; for me, it indicates that this
much needed to be allocated (depending on the chunk size). In this
example there is indeed 30% of the allocation that is unused, but it will
be used as data grows on the disk. For me it's similar to a LUN created
with thick provisioning: I might not need all the space, but I don't want
to be stuck if I do need it. (Dunno if I'm clear on that part.)

Am I wrong in saying that Free_(Estimated) is a false value, as the size
of the snapshots isn't included? Let's say I have about 10 GB of
snapshots: then Free_(Estimated) = Free_(Estimated) - snapshot size, no?
Is it possible to include the snapshot size somewhere (maybe not in the
summary or details, but in another section or behind an option)?

Finally, I do agree that linear growth is the best model currently, for
several reasons, some already explained by Hugo, and because as far as I
understand there is no single way to know very accurately how your disk
is used. That said, the point is at least to give the most accurate data
possible and to be able to interpret it. In a production environment, I
can't afford to say "sorry, the app crashed because my disk is full". So
I need a view of what's happening on my disk. Even if it lacks perfect
accuracy, I can set thresholds to avoid any problem (a warning at 70%
disk full, for example).

So I would change some terms, I guess, to indicate more precisely which
values are raw and which are already computed. I would also not use the
term "efficiency", as people may wonder at some point whether they made a
mistake in using btrfs when they see a percentage that is never near 100.
Data_to_disk_ratio seems preferable to me.

Regards,
Sébastien

Goffredo Baroncelli <kreij...@gmail.com> wrote:
> On 09/28/2012 10:13 PM, Hugo Mills wrote:
>>> Summary:
>>>   Disk_size:               135.00 GiB
>>>   Disk_allocated:           10.51 GiB
>>>   Disk_unallocated:        124.49 GiB
>>>   Used:                      2.59 GiB
>>>   Free_(Estimated):         91.93 GiB
>>>   Average_disk_efficiency:     70 %
>>>
>>> Details:
>>>   Chunk-type  Mode    Disk-allocated  Used      Available
>>>   Data        Single  4.01GB          2.16GB    1.87GB
>>>   System      DUP     16.00MB         4.00KB    7.99MB
>>>   System      Single  4.00MB          0.00      4.00MB
>>>   Metadata    DUP     6.00GB          429.16MB  2.57GB
>>>   Metadata    Single  8.00MB          0.00      8.00MB
>>>
>>> Where:
>>>   Disk-allocated   - space used on the disk by the chunk
>>>   Disk-size        - size of the disk
>>>   Disk-unallocated - disk not used in any chunk
>>>   Used             - space used by the files/metadata
>>
>> The problem here is that if you're using raw storage, the Used value
>> in the second stanza grows twice as fast as the user expects.
>
> This is the misunderstanding I talked about before. If you look at the
> Metadata DUP line, you can see that Disk-allocated is about 6GB,
> whereas if you sum Used and Available you get 3GB.
>
> I.e. if you create a 1GB file, Used always increases by 1GB and
> Available always decreases by 1GB, whether you are using DUP, Single
> or RAID*.
>
>> I think this second stanza should at minimum include the cooked
>> values used in btrfs fi df, because those reflect the user's
>> experience. Then adding [some of?] the raw values you've got here to
>> help connect the values to the raw data in the first stanza of
>> output.
>
> The only raw values are the ones prefixed with "Disk". The other ones
> are net of the DUP/Single/RAID overhead.
>
>> As I said above, it's the connection between "I wrote a 1GiB file to
>> my filesystem" and "why have my numbers increased/decreased by
>> 2GiB(*)/1.2GiB(**)?"
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
Hi Sébastien, On 09/29/2012 11:59 AM, Sébastien Maury wrote: Hi, First of all, i've to say that i'm not a linux specialist, so that means my point of view is balanced between a linux admin and a user. I may also say stupid things, so pleas excuse me in advance :p The first difference between the original command and the discussed one is on the value for the DUP parts (one has to be multiplied by 2, whereas the other is already multiplied by 2). I think this should be indicated somewhere in order to avoid confusion. This has been pointed already, but whatever the output is, it is essential to know if the value is raw or not, if it has to be multiplied or divided. Also, i do agree with Hugo concerning the output to make it easier to parse through scripting. The units should also be settable in order to have the same units for all values. I have added a -k switch, so the output is in KiB unit (I tried bytes but so the line will became very long: 164 is about 20 digits in decimal form) Basically, this new output is more explicit for me and remove a bit of confusion. Great I reached my 1st goal ! Although, the part Average_disk_efficiency seems confusing as i'm not sure the term efficiency is correct in that part. That makes me ask some questions : why this much allocated ? when will it allocate more ? how much might be allocated ? ... So, this percentage doesn't indicate an efficient usage of disk space or not ... for me, it indicates that it needed to allocated that (depending on the chunk size). In this example there's indeed 30% of the allocation that is unused, but it will be used as data will grow on the disk. The 30% of the disk is/will be used for redundancy purpose. Moreover there are the chunk that are pre-allocated area, which could influence the free space estimation... For me it's similar as a LUN created in thick provisioning ... i might not need all the space, but i don't want to be stuck if i'll need it. 
(dunno if i'm clear on that part) Am i wrong in saying that Free_(Estimated) is a false value as the snapshots size isn't included ? Let's say i've like 10 GB of snapshots ... then Free_(Estimated)=Free_(Estimated)-snaps size ? no ? Is it possible to include those snaps size somewhere (maybe not to include in the summary or details, but to add another section or option allowing to have that info) ? Free_(Estimated) takes in account also the snapshot. The point is another one: the user has to know that updating (i.e. changing part of file without increasing its size) a snapshoted file requires space. But Used part takes in account all the space used. So Free_(Estimated) it is accurate. Finally, i do agree about the linearly growth as the best model currently. For several reasons, some already explained by Hugo, and because as far as i understood, there is no single way to know very accurately how your disk is used. That said, the point is at least to give the most accurate data as possible and to be able to interpret them. In a production environment, i can't afford to say sorry, the app is crashed because my disk is full. So i need a view on what's happening on my disk. Even if it lacks perfect accuracy, i can place thresholds to avoid any problem (70% of disk full as a warning for example). So, i would change some terms i guess indicating more precisely the raw data and the already computed ones. I would like to uses the Disk prefix. Raw to me creates more confusions. However we should highlight that the disk occupation is related to the chunks, which means basically a pre-allocation and not an using. 
For example a my filesystem has: ghigo@venice:~$ btrfs/btrfs-progs/btrfs fi disk /mnt/old-btrfs/ Summary: Path: /mnt/old-btrfs/ Disk_size:232.11GB Disk_allocated: 150.29GB Disk_unallocated: 81.82GB Used: 19.94GB Free_(Estimated): 201.16GB Average_disk_efficiency: 95 % Details: Chunk-type ModeDisk-allocatedUsed Available DataSingle136.01GB 18.84GB117.17GB System DUP16.00MB 28.00KB 7.97MB System Single 4.00MB0.00 4.00MB MetadataDUP14.25GB 1.10GB 6.03GB MetadataSingle 8.00MB0.00 8.00MB Note that I have 136GB of chunk, but only 18GB are used. After a btrfs balance start I got a different picture: Summary: Path: /mnt/old-btrfs/ Disk_size:232.11GB Disk_allocated:34.13GB Disk_unallocated: 197.98GB Used: 19.94GB Free_(Estimated): 177.74GB Average_disk_efficiency: 85 % Details: Chunk-type ModeDisk-allocatedUsed Available DataSingle
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
> On Thu, 27 Sep 2012 23:02:35 +0200
> Goffredo Baroncelli <kreij...@libero.it> wrote:
>> Sorry for the space error: Below a more correct example
>>
>> $ btrfs filesystem disk-free /
>> Summary:
>>   Total:                   135.00GB
>>   Allocated:                10.51GB
>>   Unallocated:             124.49GB
>>   Free_(Estimated)          86.56GB
>>   Average_disk_efficiency:     62 %
>
> How do you estimate Free here? Sorry I didn't check the source code in
> git, but from the Details below nothing leads me to believe that this
> FS is doomed to only be able to usefully utilize only ~86GB of the
> partition, and not more.
>
> Are you ready to answer the flood of questions from people why their
> disk is only "62% efficient", and how to tune it to 100%? :-)

   Data_to_disk_ratio, maybe?

> Why use underscores instead of spaces?

   So that you can use, say, "read" in the shell to extract data from
each line. To that end, there should be a space between the value and
the unit throughout.

>> Details:
>>   Chunk-type  Mode    Allocated  Used      Free
>>   ----------  ----    ---------  ----      ----

   Minor thing: The underlines are largely superfluous. Few basic CL
tools I can think of use them.

>>   Data        Single  4.01GB     2.16GB    1.87GB
>>   System      DUP     16.00MB    4.00KB    7.99MB
>>   System      Single  4.00MB     0.00      4.00MB
>>   Metadata    DUP     6.00GB     429.16MB  2.57GB
>>   Metadata    Single  8.00MB     0.00      8.00MB

   I think we need another column here, to indicate how much *actual*
disk space is used by each row, so adding up that column will give you
the Allocated value in the first clause. I think that's probably the
biggest cause of confusion. "Raw alloc.", maybe, and use the term "raw"
somewhere in the first clause to hammer the point home.

   My only concern here is that we're a bit too close to the existing
solution (albeit merging the two sets of output), which has proven
itself over time to be somewhat confusing. I think the Alloc_Raw column
is the minimum necessary to link the two in some easily determinable
way. Adding totals to Alloc_Raw, and Used (but not Free or Alloc) would
help, I think. I don't think it's useful to add them to the Free or
Alloc columns, because those figures change as the FS allocates chunks,
and we'll end up with people querying the fact that the total of Free
doesn't add up to any of the figures in the summary.

   Say, something like this:

Summary_(Raw):
  Total:                   135.00 GiB
  Allocated:                10.51 GiB
  Unallocated:             124.49 GiB
  Free_(Estimated):         86.56 GiB
  Average_disk_efficiency:     62 %

Details:
  Chunk_type  Mode    Alloc_Raw  Alloc      Used        Free
  Data        Single   4.01 GiB   4.01 GiB    2.16 GiB  1.87 GiB
  System      DUP     32.00 MiB  16.00 MiB    4.00 KiB  7.99 MiB
  System      Single   4.00 MiB   4.00 MiB    0.00 B    4.00 MiB
  Metadata    DUP     12.00 GiB   6.00 GiB  429.16 MiB  2.57 GiB
  Metadata    Single   8.00 MiB   8.00 MiB    0.00 B    8.00 MiB
  Total               16.04 GiB               2.59 GiB

   The other thing is that there should be a switch (or possibly two)
to give highly machine-readable versions of the output -- no units
(units as bytes by default, with other units settable by a switch),
tab-separated, possibly a different option for each of the above output
clauses.

   Ultimately, I think the bikeshed should be turquoise.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Python is executable pseudocode; perl ---
                is executable line-noise.
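As a rough sketch of the two suggestions above (an Alloc_Raw column derived from the storage mode, and a unit-free, tab-separated output), something like the following Python would do. It is not the btrfs-progs implementation: the function name, the column order and the copy counts per mode are assumptions, and the sample rows are simply copied from the mock-up as input.

```python
MIB = 2**20
GIB = 2**30

# Copies written per block for each mode (assumed; only the two modes
# that appear in the mock-up are covered).
COPIES = {"Single": 1, "DUP": 2}

# (chunk type, mode, Alloc in bytes, Used in bytes), taken from the mock-up.
ROWS = [
    ("Data", "Single", int(4.01 * GIB), int(2.16 * GIB)),
    ("System", "DUP", 16 * MIB, 4 * 2**10),
    ("System", "Single", 4 * MIB, 0),
    ("Metadata", "DUP", 6 * GIB, int(429.16 * MIB)),
    ("Metadata", "Single", 8 * MIB, 0),
]


def machine_readable(rows):
    """Return tab-separated lines: type, mode, alloc_raw, alloc, used (bytes)."""
    lines = []
    total_raw = 0
    total_used = 0
    for chunk_type, mode, alloc, used in rows:
        alloc_raw = alloc * COPIES[mode]  # raw disk space behind this row
        total_raw += alloc_raw
        total_used += used
        lines.append("\t".join([chunk_type, mode, str(alloc_raw), str(alloc), str(used)]))
    # Totals only for Alloc_Raw and Used, as suggested in the message.
    lines.append("\t".join(["Total", "", str(total_raw), "", str(total_used)]))
    return lines


if __name__ == "__main__":
    print("\n".join(machine_readable(ROWS)))
```

Keeping every value in bytes and every row at a fixed number of tab-separated fields means a consumer can split on tabs without knowing anything about units or alignment.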
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On 09/28/2012 05:17 AM, Roman Mamedov wrote:
> On Thu, 27 Sep 2012 23:02:35 +0200
> Goffredo Baroncelli <kreij...@libero.it> wrote:
>> Sorry for the space error: Below a more correct example
>>
>> $ btrfs filesystem disk-free /
>> Summary:
>>   Total:                   135.00GB
>>   Allocated:                10.51GB
>>   Unallocated:             124.49GB
>>   Free_(Estimated)          86.56GB
>>   Average_disk_efficiency:     62 %
>
> How do you estimate Free here? Sorry I didn't check the source code in
> git, but from the Details below nothing leads me to believe that this
> FS is doomed to only be able to usefully utilize only ~86GB of the
> partition, and not more.

The estimate is based on the space actually allocated on the disk and
the space available in it. In the example we know that BTRFS allocated:
- 4GB in Single mode (4GB available, 2.16GB used)
- 16MB in DUP mode (so 16/2 = 8MB available, 4KB used)
- 4MB in Single mode (4MB available)
- 6GB in DUP mode (6/2 = 3GB available, 429MB used)
- 8MB in Single mode (8MB available)

So BTRFS allocated 4GB+16MB+4MB+6GB+8MB = ~10GB on disk, but the space
available (within these allocated chunks) is 4GB+8MB+4MB+3GB+8MB = ~7GB.

This means that the ratio of the space available to the space
physically allocated on the disk is 7GB/10GB = 0.7. So on 135GB of
disk, only 94GB are available.

Yes, my previous 0.62 was wrong. The real ratio is 0.7.

> Are you ready to answer the flood of questions from people why their
> disk is only "62% efficient", and how to tune it to 100%? :-)

I don't understand your question: by default BTRFS stores all metadata
DUP-ed, which means that the space allocated on the disk is twice the
space required. Because BTRFS has a lot of metadata, this means that
BTRFS is not as efficient as other filesystems. This is a well-known
fact.

If you want to use all the space with the maximum efficiency, you could
format the filesystem with the option "-m single".

> Why use underscores instead of spaces?

To simplify the parsing in scripts.

>> Details:
>>   Chunk-type  Mode    Allocated  Used      Free
>>   ----------  ----    ---------  ----      ----
>>   Data        Single  4.01GB     2.16GB    1.87GB
>>   System      DUP     16.00MB    4.00KB    7.99MB
>>   System      Single  4.00MB     0.00      4.00MB
>>   Metadata    DUP     6.00GB     429.16MB  2.57GB
>>   Metadata    Single  8.00MB     0.00      8.00MB
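The estimate walked through above can be reproduced with a short Python sketch. This is a reading of the arithmetic in the message rather than the actual patch: the function names are invented, and the chunk sizes are the rounded figures from the list (4GB, 16MB, 4MB, 6GB, 8MB), treated as binary units.

```python
MIB = 2**20
GIB = 2**30

# Copies stored per block in each mode (only the modes in the example).
COPIES = {"Single": 1, "DUP": 2}

# (chunk type, mode, bytes allocated on disk), from the list in the message.
CHUNKS = [
    ("Data", "Single", 4 * GIB),
    ("System", "DUP", 16 * MIB),
    ("System", "Single", 4 * MIB),
    ("Metadata", "DUP", 6 * GIB),
    ("Metadata", "Single", 8 * MIB),
]


def disk_efficiency(chunks):
    """Usable space inside the allocated chunks divided by raw allocation."""
    allocated = sum(size for _, _, size in chunks)
    usable = sum(size // COPIES[mode] for _, mode, size in chunks)
    return usable / allocated


def usable_capacity(disk_size, chunks):
    """Project the current ratio onto the whole disk (the linear assumption)."""
    return disk_size * disk_efficiency(chunks)


if __name__ == "__main__":
    ratio = disk_efficiency(CHUNKS)
    print(f"efficiency: {ratio:.2f}")
    print(f"usable of 135 GiB: {usable_capacity(135 * GIB, CHUNKS) / GIB:.1f} GiB")
```

The projection is only as good as the assumption that future chunks will be allocated in the same Single/DUP mix as the current ones.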
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
Hi Hugo,

On 09/28/2012 10:58 AM, Hugo Mills wrote:
> On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
>> On Thu, 27 Sep 2012 23:02:35 +0200
>> Goffredo Baroncelli <kreij...@libero.it> wrote:
[...]
> So that you can use, say, "read" in the shell to extract data from
> each line. To that end, there should be a space between the value and
> the unit throughout.
>
>>> Details:
>>>   Chunk-type  Mode    Allocated  Used      Free
>>>   ----------  ----    ---------  ----      ----
>
> Minor thing: The underlines are largely superfluous. Few basic CL
> tools I can think of use them.

Ok

>>>   Data        Single  4.01GB     2.16GB    1.87GB
>>>   System      DUP     16.00MB    4.00KB    7.99MB
>>>   System      Single  4.00MB     0.00      4.00MB
>>>   Metadata    DUP     6.00GB     429.16MB  2.57GB
>>>   Metadata    Single  8.00MB     0.00      8.00MB
>
> I think we need another column here, to indicate how much *actual*
> disk space is used by each row, so adding up that column will give you
> the Allocated value in the first clause. I think that's probably the
> biggest cause of confusion. "Raw alloc.", maybe, and use the term
> "raw" somewhere in the first clause to hammer the point home.

I think that there is a little misunderstanding. We are saying the same
thing; only I call "allocated" what you call "raw alloc".

> My only concern here is that we're a bit too close to the existing
> solution (albeit merging the two sets of output), which has proven
> itself over time to be somewhat confusing. I think the Alloc_Raw
> column is the minimum necessary to link the two in some easily
> determinable way. Adding totals to Alloc_Raw, and Used (but not Free
> or Alloc) would help, I think. I don't think it's useful to add them
> to the Free or Alloc columns, because those figures change as the FS
> allocates chunks, and we'll end up with people querying the fact that
> the total of Free doesn't add up to any of the figures in the summary.
>
> Say, something like this:
>
> Summary_(Raw):
>   Total:                   135.00 GiB
>   Allocated:                10.51 GiB
>   Unallocated:             124.49 GiB
>   Free_(Estimated):         86.56 GiB
>   Average_disk_efficiency:     62 %
>
> Details:
>   Chunk_type  Mode    Alloc_Raw  Alloc      Used        Free
>   Data        Single   4.01 GiB   4.01 GiB    2.16 GiB  1.87 GiB
>   System      DUP     32.00 MiB  16.00 MiB    4.00 KiB  7.99 MiB
>   System      Single   4.00 MiB   4.00 MiB    0.00 B    4.00 MiB
>   Metadata    DUP     12.00 GiB   6.00 GiB  429.16 MiB  2.57 GiB
>   Metadata    Single   8.00 MiB   8.00 MiB    0.00 B    8.00 MiB
>   Total               16.04 GiB               2.59 GiB
>
> The other thing is that there should be a switch (or possibly two) to
> give highly machine-readable versions of the output -- no units (units
> as bytes by default, with other units settable by a switch),
> tab-separated, possibly a different option for each of the above
> output clauses.

I fully agree. But my first concern was about the wording (in fact,
even though we are saying the same thing, you didn't understand me).
Let me propose the following:

Summary:
  Disk_size:               135.00 GiB
  Disk_allocated:           10.51 GiB
  Disk_unallocated:        124.49 GiB
  Used:                      2.59 GiB
  Free_(Estimated):         91.93 GiB
  Average_disk_efficiency:     70 %

Details:
  Chunk-type  Mode    Disk-allocated  Used      Available
  Data        Single  4.01GB          2.16GB    1.87GB
  System      DUP     16.00MB         4.00KB    7.99MB
  System      Single  4.00MB          0.00      4.00MB
  Metadata    DUP     6.00GB          429.16MB  2.57GB
  Metadata    Single  8.00MB          0.00      8.00MB

Where:
  Disk-allocated   - space used on the disk by the chunk
  Disk-size        - size of the disk
  Disk-unallocated - disk not used in any chunk
  Used             - space used by the files/metadata
  Available        - space available in the *allocated* chunk
  Free_(Estimated) - theoretical free space for files
                     (Disk_size * Average_disk_efficiency - Used)

> Ultimately, I think the bikeshed should be turquoise.

? :-)

> Hugo.

Ciao
Goffredo
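The Free_(Estimated) definition given in the "Where:" list can be checked against the Summary figures with a small Python sketch. The function names are illustrative only; the inputs are the rounded values printed in the proposal, so the result differs slightly from the reported 91.93 GiB, presumably because the tool works from an unrounded efficiency.

```python
def free_estimated(disk_size, efficiency, used):
    """Free_(Estimated) = Disk_size * Average_disk_efficiency - Used."""
    return disk_size * efficiency - used


def implied_efficiency(disk_size, used, free):
    """Efficiency that would reproduce a reported Free_(Estimated) exactly."""
    return (free + used) / disk_size


if __name__ == "__main__":
    # Values in GiB, taken from the proposed Summary stanza.
    estimate = free_estimated(135.00, 0.70, 2.59)
    print(f"estimate from rounded inputs: {estimate:.2f} GiB")

    ratio = implied_efficiency(135.00, 2.59, 91.93)
    print(f"efficiency implied by 91.93 GiB: {ratio:.4f}")
```

Running the formula backwards, as implied_efficiency does, is a quick way to see how much of the difference is explained by the percentage being printed without decimals.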
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Fri, 28 Sep 2012 18:44:07 +0200
Goffredo Baroncelli <kreij...@inwind.it> wrote:
> This means that the ratio of the space available to the space
> physically allocated on the disk is 7GB/10GB = 0.7. So on 135GB of
> disk, only 94GB are available.

You assume metadata allocation will always grow linearly with data,
which is not true. So in my opinion it is not a good estimate.

>> Are you ready to answer the flood of questions from people why their
>> disk is only "62% efficient", and how to tune it to 100%? :-)
>
> I don't understand your question

You mentioned that the aim was to make the output more friendly, i.e.
to make it less confusing. But I find this percentage and the way it is
labeled likely to achieve the opposite effect, causing a lot of new
questions on what does this mean (while the percentage reported is
likely not even being correct), how to improve it, etc.

> Because on BTRFS the metadata are a lot

Keep in mind that there is also inlining; so even if the space is
allocated for metadata, it will be used to store small files. So it
might be not completely fair to count the metadata allocated space as
unusable space.

>> Why use underscores instead of spaces?
>
> Simplify the parsing in scripts

I think it looks awkward and is not warranted since this is a primarily
user-facing utility. Also none of the other similar tools shy from
having spaces anywhere they need to, e.g.

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed May 25 00:07:38 2011
     Raid Level : raid5
     Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
  Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Fri Sep 28 21:20:51 2012
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           Name : avdeb:0  (local to host avdeb)
           UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
         Events : 14254

    Number   Major   Minor   RaidDevice State
       7       8       17        0      active sync   /dev/sdb1
       6       8       33        1      active sync   /dev/sdc1
       3       8       65        2      active sync   /dev/sde1
       4       8       49        3      active sync   /dev/sdd1
       5       8       81        4      active sync   /dev/sdf1

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/alpha/lv1
  LV Name                lv1
  VG Name                alpha
  LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 0
  LV Size                3.52 TiB
  Current LE             115431
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           252:0

--
With respect,
Roman

~~~
"Stallman had a printer, with code he could not see.
So he began to tinker, and set the software free."
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On 09/28/2012 08:02 PM, Roman Mamedov wrote:
> On Fri, 28 Sep 2012 18:44:07 +0200
> Goffredo Baroncelli <kreij...@inwind.it> wrote:
>> This means that the ratio of the space available to the space
>> physically allocated on the disk is 7GB/10GB = 0.7. So on 135GB of
>> disk, only 94GB are available.
>
> You assume metadata allocation will always grow linearly with data,
> which is not true. So in my opinion it is not a good estimate.

I am open to suggestions on how to improve the algorithm. Today we have
only ... nothing. If I process the output of btrfs fi show I can
estimate the best case (i.e. the data have no further redundancy); my
algorithm is a bit smarter. However, I repeat: please suggest a better
algorithm.

Regarding the assumption that the data/metadata ratio is constant: yes,
I assumed that. Why should this change? Of course it could change, but
what would be a better estimate? My algorithm is not perfect, but it is
better than nothing.

>>> Are you ready to answer the flood of questions from people why their
>>> disk is only "62% efficient", and how to tune it to 100%? :-)
>>
>> I don't understand your question
>
> You mentioned that the aim was to make the output more friendly, i.e.
> to make it less confusing. But I find this percentage and the way it
> is labeled likely to achieve the opposite effect, causing a lot of new
> questions on what does this mean (while the percentage reported is
> likely not even being correct), how to improve it, etc.

These questions are already there, because free space estimation in
BTRFS is
a) very complex, and
b) btrfs fi df and btrfs fi show don't help to measure (nor estimate)
   the space available.

>> Because on BTRFS the metadata are a lot
>
> Keep in mind that there is also inlining; so even if the space is
> allocated for metadata, it will be used to store small files. So it
> might be not completely fair to count the metadata allocated space as
> unusable space.

I never said that the metadata space is unusable space. The opposite is
true: I don't differentiate between data/metadata/system; I only
consider RAID/DUP/Single in terms of disk space versus available space.

>>> Why use underscores instead of spaces?
>>
>> Simplify the parsing in scripts
>
> I think it looks awkward and is not warranted since this is a
> primarily user-facing utility. Also none of the other similar tools
> shy from having spaces anywhere they need to, e.g.

We could improve on this side. However, these utilities are often used
in scripts.

> # mdadm --detail /dev/md0
> /dev/md0:
>         Version : 1.2
>   Creation Time : Wed May 25 00:07:38 2011
>      Raid Level : raid5
>      Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
>   Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
>    Raid Devices : 5
>   Total Devices : 5
>     Persistence : Superblock is persistent
>   Intent Bitmap : Internal
>     Update Time : Fri Sep 28 21:20:51 2012
>           State : active
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>          Layout : left-symmetric
>      Chunk Size : 64K
>            Name : avdeb:0  (local to host avdeb)
>            UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
>          Events : 14254
>
>     Number   Major   Minor   RaidDevice State
>        7       8       17        0      active sync   /dev/sdb1
>        6       8       33        1      active sync   /dev/sdc1
>        3       8       65        2      active sync   /dev/sde1
>        4       8       49        3      active sync   /dev/sdd1
>        5       8       81        4      active sync   /dev/sdf1
>
> # lvdisplay
>   --- Logical volume ---
>   LV Path                /dev/alpha/lv1
>   LV Name                lv1
>   VG Name                alpha
>   LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
>   LV Write Access        read/write
>   LV Creation host, time ,
>   LV Status              available
>   # open                 0
>   LV Size                3.52 TiB
>   Current LE             115431
>   Segments               3
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     4096
>   Block device           252:0
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
   Hi, Goffredo,

On Fri, Sep 28, 2012 at 07:27:16PM +0200, Goffredo Baroncelli wrote:
> On 09/28/2012 10:58 AM, Hugo Mills wrote:
>> On Fri, Sep 28, 2012 at 09:17:59AM +0600, Roman Mamedov wrote:
>>> On Thu, 27 Sep 2012 23:02:35 +0200
>>> Goffredo Baroncelli <kreij...@libero.it> wrote:
> [...]
[...]
>>>> Details:
>>>>   Chunk-type  Mode    Allocated  Used      Free
>>>>   ----------  ----    ---------  ----      ----
[...]
>>>>   Data        Single  4.01GB     2.16GB    1.87GB
>>>>   System      DUP     16.00MB    4.00KB    7.99MB
>>>>   System      Single  4.00MB     0.00      4.00MB
>>>>   Metadata    DUP     6.00GB     429.16MB  2.57GB
>>>>   Metadata    Single  8.00MB     0.00      8.00MB
>>
>> I think we need another column here, to indicate how much *actual*
>> disk space is used by each row, so adding up that column will give
>> you the Allocated value in the first clause. I think that's probably
>> the biggest cause of confusion. "Raw alloc.", maybe, and use the term
>> "raw" somewhere in the first clause to hammer the point home.
>
> I think that there is a little misunderstanding. We are saying the
> same thing. Only I call "allocated" what you call "raw alloc"

   OK, I think we need both. We need to indicate somewhere (in the
Details section in my version) both the total number of bits of rust
used and the amount of data stored. It's not good to ask the user to
know that they need to multiply/divide by two for certain storage modes
(or even more complicated for RAID-5/6). Somewhere, they will find that
values change twice as fast as they expect (or at half the speed), and
that causes problems. We need to find some way of connecting the two in
a way that makes it reasonably obvious where the figures come from.

>> My only concern here is that we're a bit too close to the existing
>> solution (albeit merging the two sets of output), which has proven
>> itself over time to be somewhat confusing. I think the Alloc_Raw
>> column is the minimum necessary to link the two in some easily
>> determinable way. Adding totals to Alloc_Raw, and Used (but not Free
>> or Alloc) would help, I think. I don't think it's useful to add them
>> to the Free or Alloc columns, because those figures change as the FS
>> allocates chunks, and we'll end up with people querying the fact that
>> the total of Free doesn't add up to any of the figures in the
>> summary.
>>
>> Say, something like this:
>>
>> Summary_(Raw):
>>   Total:                   135.00 GiB
>>   Allocated:                10.51 GiB
>>   Unallocated:             124.49 GiB
>>   Free_(Estimated):         86.56 GiB
>>   Average_disk_efficiency:     62 %
>>
>> Details:
>>   Chunk_type  Mode    Alloc_Raw  Alloc      Used        Free
>>   Data        Single   4.01 GiB   4.01 GiB    2.16 GiB  1.87 GiB
>>   System      DUP     32.00 MiB  16.00 MiB    4.00 KiB  7.99 MiB
>>   System      Single   4.00 MiB   4.00 MiB    0.00 B    4.00 MiB
>>   Metadata    DUP     12.00 GiB   6.00 GiB  429.16 MiB  2.57 GiB
>>   Metadata    Single   8.00 MiB   8.00 MiB    0.00 B    8.00 MiB
>>   Total               16.04 GiB               2.59 GiB
>>
>> The other thing is that there should be a switch (or possibly two) to
>> give highly machine-readable versions of the output -- no units
>> (units as bytes by default, with other units settable by a switch),
>> tab-separated, possibly a different option for each of the above
>> output clauses.
>
> I fully Agree. But my first concern was about the wording (if fact
> even though we are saying the same thing you didn't understood me).
> Let me propose the following:
>
> Summary:
>   Disk_size:               135.00 GiB
>   Disk_allocated:           10.51 GiB
>   Disk_unallocated:        124.49 GiB
>   Used:                      2.59 GiB
>   Free_(Estimated):         91.93 GiB
>   Average_disk_efficiency:     70 %
>
> Details:
>   Chunk-type  Mode    Disk-allocated  Used      Available
>   Data        Single  4.01GB          2.16GB    1.87GB
>   System      DUP     16.00MB         4.00KB    7.99MB
>   System      Single  4.00MB          0.00      4.00MB
>   Metadata    DUP     6.00GB          429.16MB  2.57GB
>   Metadata    Single  8.00MB          0.00      8.00MB
>
> Where:
>   Disk-allocated   - space used on the disk by the chunk
>   Disk-size        - size of the disk
>   Disk-unallocated - disk not used in any chunk
>   Used             - space used by the files/metadata

   The problem here is that if you're using raw storage, the Used value
in the second stanza grows twice as fast as the user expects. I think
this second stanza should at minimum include the cooked values used in
btrfs fi df, because those reflect the user's experience. Then adding
[some of?] the raw values you've got here to help connect the values to
the raw data in the first stanza of output.

   As I said above, it's the connection between I wrote a 1GiB file to my
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Sat, Sep 29, 2012 at 12:02:23AM +0600, Roman Mamedov wrote:
On Fri, 28 Sep 2012 18:44:07 +0200 Goffredo Baroncelli kreij...@inwind.it wrote:

This means that the ratio of space physically allocated on the disk and the space available is 7GB/10GB = 0.7. So on 135GB of disk, only 94GB are available.

You assume metadata allocation will always grow linearly with data, which is not true. So in my opinion it is not a good estimate.

No, but it's the best model we have right now. (And probably about the best model we will have, without knowledge of the future intentions of the user). Without inlining file data, the metadata is dominated by checksums, which is a linear relationship (approx 1000:1). With inlining file data, metadata is probably dominated by inline data; assuming the ratio of small-to-large files on the FS remains unchanged in future, a linear relationship also applies. For general usage, I'm happy to assume that the current ratio of data to metadata will remain largely unchanged over the lifetime of the FS.

Why use underscores instead of spaces?

Simplify the parsing in scripts

I think it looks awkward and is not warranted since this is a primarily user-facing utility. Also none of the other similar tools shy from having spaces anywhere they need to, e.g.

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed May 25 00:07:38 2011
     Raid Level : raid5
     Array Size : 3907003136 (3726.01 GiB 4000.77 GB)
  Used Dev Size : 976750784 (931.50 GiB 1000.19 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Sep 28 21:20:51 2012
          State : active
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : avdeb:0 (local to host avdeb)
           UUID : b99961fb:ed1f76c8:ec2dad31:6db45332
         Events : 14254

    Number   Major   Minor   RaidDevice State
       7       8       17        0      active sync   /dev/sdb1
       6       8       33        1      active sync   /dev/sdc1
       3       8       65        2      active sync   /dev/sde1
       4       8       49        3      active sync   /dev/sdd1
       5       8       81        4      active sync   /dev/sdf1

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/alpha/lv1
  LV Name                lv1
  VG Name                alpha
  LV UUID                HP19fU-oMhM-sdqN-yFWa-N3Rs-ktBw-21GSD2
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 0
  LV Size                3.52 TiB
  Current LE             115431
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     4096
  Block device           252:0

... and I've always found those hard to deal with in scripts. :) (But they do have plumbing options, to use the git terminology, so I'd be happy with having a parsable output option).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Hey, Virtual Memory! Now I can have a *really big* ramdisk! ---
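To illustrate the scripting argument made above, here is a minimal Python sketch of how "Key: value unit" lines could be consumed when keys contain no spaces and the value is separated from its unit. The sample text copies the proposed layout from this thread; it is not the output of any released btrfs command, and the helper names parse_summary and to_bytes are hypothetical.

```python
# Hypothetical sample in the "Key: value unit" shape proposed in the thread.
SAMPLE = """\
Disk_size: 135.00 GiB
Disk_allocated: 10.51 GiB
Disk_unallocated: 124.49 GiB
Used: 2.59 GiB
Free_(Estimated): 91.93 GiB
Average_disk_efficiency: 70 %
"""

# Binary multipliers for the size units that appear in the sample.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_summary(text):
    """Return {key: (value, unit)} for every 'Key: value unit' line."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Underscored keys plus a space before the unit mean that every
        # well-formed line splits into exactly three tokens.
        if len(parts) != 3 or not parts[0].endswith(":"):
            continue
        key, value, unit = parts
        fields[key.rstrip(":")] = (float(value), unit)
    return fields


def to_bytes(value, unit):
    """Convert a (value, unit) pair to a whole number of bytes."""
    return int(value * UNITS[unit])


if __name__ == "__main__":
    summary = parse_summary(SAMPLE)
    print(to_bytes(*summary["Disk_unallocated"]))
```

With keys such as "Used Dev Size" in the mdadm output, the same three-token split would no longer work, which is the difficulty being described.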
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On 09/28/2012 01:20 PM, Hugo Mills wrote:
On Sat, Sep 29, 2012 at 12:02:23AM +0600, Roman Mamedov wrote:
On Fri, 28 Sep 2012 18:44:07 +0200 Goffredo Baroncelli kreij...@inwind.it wrote:

This means that the ratio of space physically allocated on the disk and the space available is 7GB/10GB = 0.7. So on 135GB of disk, only 94GB are available.

You assume metadata allocation will always grow linearly with data, which is not true. So in my opinion it is not a good estimate.

No, but it's the best model we have right now. (And probably about the best model we will have, without knowledge of the future intentions of the user). Without inlining file data, the metadata is dominated by checksums, which is a linear relationship (approx 1000:1). With inlining file data, metadata is probably dominated by inline data; assuming the ratio of small-to-large files on the FS remains unchanged in future, a linear relationship also applies. For general usage, I'm happy to assume that the current ratio of data to metadata will remain largely unchanged over the lifetime of the FS.

Since there really isn't a simple answer to how much free space there is, why not have the command print an upper and a lower estimate and let the user figure out how to interpret the numbers? This would inform the user that there is some guesswork inherent in the estimation and also provide an educated user with more exact numbers. Something containing information such as:

  Total:           135.00 GiB
  Allocated:        10.51 GiB
  Unallocated:     124.49 GiB
  Free_Upper_Est:  130.00 GiB
  Free_Lower_Est:   62.45 GiB

The main idea is that an informed user would know that the upper estimate would be for only writing, say, new data, while the lower estimate would be for writing everything in, say, a RAID-1 subvolume. An uninformed user would (hopefully) realize that he needs to read the Wiki's FAQ.

... and I've always found those hard to deal with in scripts. :) (But they do have plumbing options, to use the git terminology, so I'd be happy with having a parsable output option).

Hugo.

In 'df'/'du', -h is used for human-readable output while no options is for easily parsable output. Basically, I think that the bikeshed should be green. ;)

-Wade
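One possible reading of the upper/lower estimate idea, as a rough Python sketch. The thread does not define how the two bounds would be computed, so everything here is an assumption: the upper bound treats all unallocated space as usable at one copy per byte, the lower bound divides it by a hypothetical max_copies (2 for a RAID-1 or DUP style profile), and the free space inside existing chunks is taken as already net of replication. Metadata overhead is ignored, and the sketch does not try to reproduce the illustrative figures in the message above.

```python
def free_space_bounds(unallocated, chunk_free, max_copies=2):
    """Return (upper, lower) estimates of usable free space.

    Assumptions (not taken from btrfs itself):
    - `chunk_free` is free space inside already-allocated chunks and is
      already net of replication, so it counts fully in both bounds;
    - `unallocated` is raw disk space, usable at 1 copy in the best case
      and at `max_copies` copies in the worst case.
    """
    if max_copies < 1:
        raise ValueError("max_copies must be at least 1")
    upper = unallocated + chunk_free
    lower = unallocated / max_copies + chunk_free
    return upper, lower


if __name__ == "__main__":
    # 124.49 GiB unallocated is from the thread; 4.46 GiB is roughly the
    # sum of the Free column in the Details tables quoted in the thread.
    upper, lower = free_space_bounds(124.49, 4.46)
    print(f"Free_Upper_Est: {upper:.2f} GiB")
    print(f"Free_Lower_Est: {lower:.2f} GiB")
```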
[RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On 09/27/2012 12:44 PM, Sébastien Maury wrote:

Hi, I've installed a new server using btrfs for my root partition (/). It uses snapper for snapshots management and all seems to work pretty fine. My problem is to be able to know the remaining REAL free space in my partition. Using different commands, I have different results, and I don't know how to interpret them correctly:

poivron:~ # btrfs filesystem df /
Data: total=4.01GB, used=2.16GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=3.00GB, used=429.16MB
Metadata: total=8.00MB, used=0.00

Indeed, the output of btrfs filesystem df / is not very friendly. What about changing the output as below:

$ btrfs filesystem disk-free /
Summary:
  Total:                    135.00GB
  Allocated:                 10.51GB
  Unallocated:              124.49GB
  Free_(Estimated)           86.56GB
  Average_disk_efficiency:   62 %

Details:
  Chunk-type  Mode    Allocated      Used      Free
  ----------  ------  ---------  --------  --------
  Data        Single     4.01GB    2.16GB    1.87GB
  System      DUP       16.00MB    4.00KB    7.99MB
  System      Single     4.00MB      0.00    4.00MB
  Metadata    DUP        6.00GB  429.16MB    2.57GB
  Metadata    Single     8.00MB      0.00    8.00MB

Where Free_(Estimated) and Average_disk_efficiency are computed as:

  Average_disk_efficiency = ratio of average disk usage
                          = (sum(ChunkUsed) + sum(ChunkFree)) / sum(ChunkAllocated)

  Estimated_available = Average_disk_efficiency * Unallocated + sum(ChunkFree)

I am open to suggestions about the terms: Used vs Allocated and Free vs Available, or a better description of Average disk efficiency.

BR
G.Baroncelli

P.S. The source can be found at http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git, branch disk_free
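The two formulas in this proposal can be tried out with a short Python sketch. It is an illustration only, not the code from the disk_free branch: the chunk figures are the rounded values from the Details table, the unit conversion assumes 1 GB = 1024 MB = 1024 * 1024 KB, and the names CHUNKS, UNALLOCATED and the two functions are made up for this example.

```python
# Unit helpers, assuming binary multiples (an assumption of this sketch).
MB = 1 / 1024          # one MB expressed in GB
KB = MB / 1024         # one KB expressed in GB

# (allocated, used, free) in GB, copied from the Details table.
CHUNKS = {
    ("Data", "Single"): (4.01, 2.16, 1.87),
    ("System", "DUP"): (16.00 * MB, 4.00 * KB, 7.99 * MB),
    ("System", "Single"): (4.00 * MB, 0.0, 4.00 * MB),
    ("Metadata", "DUP"): (6.00, 429.16 * MB, 2.57),
    ("Metadata", "Single"): (8.00 * MB, 0.0, 8.00 * MB),
}
UNALLOCATED = 124.49   # GB, from the Summary


def average_disk_efficiency(chunks):
    """(sum(ChunkUsed) + sum(ChunkFree)) / sum(ChunkAllocated)."""
    allocated = sum(a for a, _, _ in chunks.values())
    used = sum(u for _, u, _ in chunks.values())
    free = sum(f for _, _, f in chunks.values())
    return (used + free) / allocated


def estimated_available(chunks, unallocated):
    """Average_disk_efficiency * Unallocated + sum(ChunkFree)."""
    free = sum(f for _, _, f in chunks.values())
    return average_disk_efficiency(chunks) * unallocated + free


if __name__ == "__main__":
    print(f"Average_disk_efficiency: {average_disk_efficiency(CHUNKS):.2f}")
    print(f"Free_(Estimated): {estimated_available(CHUNKS, UNALLOCATED):.2f}GB")
```

With the rounded table values, this comes out at an efficiency of roughly 0.70 and an estimate of roughly 92 GB, which is closer to the 70 % and 91.93 GiB quoted elsewhere in the thread than to the 62 % and 86.56GB in the sample output here. The sketch cannot tell where that difference comes from.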
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Thu, 27 Sep 2012 23:02:35 +0200 Goffredo Baroncelli kreij...@libero.it wrote:

Sorry for the spacing error; below is a more correct example:

$ btrfs filesystem disk-free /
Summary:
  Total:                    135.00GB
  Allocated:                 10.51GB
  Unallocated:              124.49GB
  Free_(Estimated)           86.56GB
  Average_disk_efficiency:   62 %

How do you estimate Free here? Sorry I didn't check the source code in git, but from the Details below nothing leads me to believe that this FS is doomed to usefully utilize only ~86GB of the partition, and not more.

Are you ready to answer the flood of questions from people asking why their disk is only 62% efficient, and how to tune it to 100%? :-)

Why use underscores instead of spaces?

Details:
  Chunk-type  Mode    Allocated      Used      Free
  ----------  ------  ---------  --------  --------
  Data        Single     4.01GB    2.16GB    1.87GB
  System      DUP       16.00MB    4.00KB    7.99MB
  System      Single     4.00MB      0.00    4.00MB
  Metadata    DUP        6.00GB  429.16MB    2.57GB
  Metadata    Single     8.00MB      0.00    8.00MB

--
With respect,
Roman

~~~
Stallman had a printer, with code he could not see.
So he began to tinker, and set the software free.