Re: compression btrfs

2013-03-26 Thread Josef Bacik
On Mon, Mar 25, 2013 at 10:03:20PM -0600, lonat_fr...@163.com wrote:
 Hi everyone,
 
   I have used btrfs as a work partition with compress=zlib. The 
 compression ratio is not satisfactory to me. 
 

So you probably want compress-force=zlib.  With just compress we will bail out
of the compression if the compressed pages are larger than the original size,
so the ratio you see can differ from what you'd get by running gzip on the same
file.  With compress-force=zlib you'll see behavior more like gzip.
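
The difference between the two options can be sketched in userspace. This is only an illustration of the behavior described above, not the kernel code: the function name store_extent and the "raw"/"zlib" labels are made up for the example, and the force path is simplified to "always keep the zlib output".

```python
import hashlib
import zlib
from typing import Tuple


def store_extent(data: bytes, force: bool) -> Tuple[str, bytes]:
    """Return (kind, payload) for one chunk of file data.

    force=False models plain compress: bail out unless zlib made it smaller.
    force=True models compress-force as described above (assumed: never bail).
    """
    packed = zlib.compress(data)
    if force or len(packed) < len(data):
        return "zlib", packed
    return "raw", data


# Repetitive text compresses well; hash output is effectively incompressible.
text = b"btrfs " * 1024
noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))

for name, chunk in (("text", text), ("noise", noise)):
    for force in (False, True):
        kind, payload = store_extent(chunk, force)
        mode = "compress-force" if force else "compress"
        print(name, mode, kind, len(chunk), "->", len(payload))
```

With the incompressible sample, the plain compress path keeps the original bytes, while the forced path keeps the (slightly larger) zlib output.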

    I tracked my workloads in btrfs. The zlib module (zlib.c) seems to work 
 well: the data of each write operation in the writepage function is 
 compressed to about 20% of its size. 
 
   I suspect the workloads may impact the btrfs behavior. My workloads include 
 a really large number of overwrite operations. 
 
    I briefly reviewed the space-reclaim code in btrfs and found that btrfs 
 kicks off a defrag when the overwritten range is smaller than 16KB, and that 
 this is the only method of reclaiming freed extents with compression. Am I 
 right?

It's 64k, and what do you mean by reclaiming freed extents?  The freed extents
will be reclaimed once they are completely overwritten.

    
    So my question is whether btrfs can successfully reclaim the overwritten 
 space when the cleaner thread cannot be started, such as when each overwrite 
 operation is larger than 16KB? 

Not sure what you mean by reclaim.  They won't be defragged if the overwrite is
above 64k, but if any write is less than 64k then it will defrag the whole file.
Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: compression btrfs

2013-03-26 Thread Josef Bacik
On Tue, Mar 26, 2013 at 10:27:34AM -0600, yiletian wrote:
 Yes, I use compress-force=zlib for my partition.
 
 Consider this scenario.
 
 We first write a file of 256KB. Assume all data is compressed to 128KB. 
 Btrfs creates an extent item in the extent tree to record the 128KB disk 
 range (named E), and it also creates a single file extent that records the 
 disk range of E.
 
 Then we overwrite from 16KB to the end of file, with size of 240KB.
 Btrfs will create a new file extent for the overwritten range.
 That is, the file has two file extents: the first records the first 16KB 
 and the second records the remaining 240KB.
 
 Then we are in a dilemma:
 1. the first one only occupies a disk range of 16KB, but the entire E is 
 reserved for it. This is because the __btrfs_drop_extents function does not 
 decrease the number of back refs of E.
 2. because the overwritten range is large enough, compress_file_range does 
 not call btrfs_add_inode_defrag to kick off a defrag for the file 
 automatically.
 
 With this dilemma, how can btrfs reclaim the 112KB disk range (at least) 
 recorded in E?
 

Oh yeah welcome to btrfs, you must be new here ;).  So yeah this is the way it
works, until we overwrite the entire extent we don't reclaim any of the space.
This includes the workload where you prealloc an 8 gig vm image and then do
random writes inside of it; you could end up using up to 16gb in the worst case
scenario.  To fix this, instead of splitting the file extents and then inc'ing
the ref of the original extent, we could split the extent ref as well, so we can
reclaim this space.  It's on my list of things to do down the road, but it keeps
getting supplanted by other priorities.  Thanks,

Josef
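
The accounting in this exchange can be followed with a toy model. It is a sketch under stated assumptions, not the btrfs extent tree: DiskExtent, FileExtent, overwrite and disk_usage are hypothetical names, an extent stays allocated as long as any file extent still points at it, and the 240KB rewrite is assumed to compress 2:1 into a 120KB extent F.

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class DiskExtent:
    name: str
    disk_bytes: int  # space the extent occupies on disk


@dataclass(frozen=True)
class FileExtent:
    start: int       # logical offset in the file
    length: int      # logical length
    extent: DiskExtent


def overwrite(file_extents, start, length, new_extent):
    """Split or drop file extents under [start, start+length) and add the new one."""
    end = start + length
    out = []
    for fe in file_extents:
        fe_end = fe.start + fe.length
        if fe.start < start:  # piece that survives in front of the overwrite
            out.append(FileExtent(fe.start, min(fe_end, start) - fe.start, fe.extent))
        if fe_end > end:      # piece that survives behind the overwrite
            tail = max(fe.start, end)
            out.append(FileExtent(tail, fe_end - tail, fe.extent))
    out.append(FileExtent(start, length, new_extent))
    return sorted(out, key=lambda fe: fe.start)


def disk_usage(file_extents):
    """An extent is charged in full while at least one file extent references it."""
    return sum(e.disk_bytes for e in {fe.extent for fe in file_extents})


E = DiskExtent("E", 128 * KIB)   # 256KB file compressed to 128KB
F = DiskExtent("F", 120 * KIB)   # assumed size of the compressed 240KB rewrite
layout = [FileExtent(0, 256 * KIB, E)]
after = overwrite(layout, 16 * KIB, 240 * KIB, F)

for fe in after:
    print(fe.start // KIB, fe.length // KIB, fe.extent.name)
print("allocated KB:", disk_usage(after) // KIB)
print("pinned in E beyond the live 16KB:", (E.disk_bytes - 16 * KIB) // KIB)
```

In this model E is only released once the first 16KB is overwritten as well, which matches the "until we overwrite the entire extent" rule above.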


Re: Re: Re: compression btrfs

2013-03-26 Thread yiletian
I think the biggest problem is how we can reclaim the space when the extent is 
a compressed one.
In this case, we may need to read and decompress data in the extent, and then 
compress the valid range to generate a new extent.
Is this process a performance killer?
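
The steps proposed here (read, decompress, recompress the valid range) can be sketched in userspace with Python's zlib module. This only shows the shape of the work and makes no claim about kernel cost: rewrite_live_range is a hypothetical helper, the sample data is made up, and the sketch inflates the whole old extent before slicing out the live range.

```python
import zlib

KIB = 1024


def rewrite_live_range(packed_extent: bytes, live_start: int, live_len: int) -> bytes:
    """Build a new compressed extent holding only the still-referenced bytes."""
    plain = zlib.decompress(packed_extent)           # read + decompress old extent
    live = plain[live_start:live_start + live_len]   # keep only the valid range
    return zlib.compress(live)                       # recompress into a new extent


original = bytes(range(256)) * KIB                   # 256KB of sample file data
old_extent = zlib.compress(original)
new_extent = rewrite_live_range(old_extent, 0, 16 * KIB)

print("old extent bytes:", len(old_extent))
print("new extent bytes:", len(new_extent))
```

Once the new extent is in place, the old one would have no remaining references and could be freed.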