Re: [PATCH -next] btrfs: compression: remove set but not used variable 'tree'
On 8.11.18 г. 4:15 ч., YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
>
> fs/btrfs/compression.c: In function 'end_compressed_bio_write':
> fs/btrfs/compression.c:232:25: warning:
>  variable 'tree' set but not used [-Wunused-but-set-variable]
>
> It is not used any more after
> commit 2922040236f9 ("btrfs: Remove extent_io_ops::writepage_end_io_hook")
>
> Signed-off-by: YueHaibing

Reviewed-by: Nikolay Borisov

> ---
>  fs/btrfs/compression.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index bde8d04..088570c 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -229,7 +229,6 @@ static noinline void end_compressed_writeback(struct inode *inode,
>   */
>  static void end_compressed_bio_write(struct bio *bio)
>  {
> -	struct extent_io_tree *tree;
>  	struct compressed_bio *cb = bio->bi_private;
>  	struct inode *inode;
>  	struct page *page;
> @@ -248,7 +247,6 @@ static void end_compressed_bio_write(struct bio *bio)
>  	 * call back into the FS and do all the end_io operations
>  	 */
>  	inode = cb->inode;
> -	tree = &BTRFS_I(inode)->io_tree;
>  	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
>  	btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0],
>  			cb->start, cb->start + cb->len - 1, NULL,
[PATCH -next] btrfs: compression: remove set but not used variable 'tree'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/btrfs/compression.c: In function 'end_compressed_bio_write':
fs/btrfs/compression.c:232:25: warning:
 variable 'tree' set but not used [-Wunused-but-set-variable]

It is not used any more after
commit 2922040236f9 ("btrfs: Remove extent_io_ops::writepage_end_io_hook")

Signed-off-by: YueHaibing
---
 fs/btrfs/compression.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index bde8d04..088570c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -229,7 +229,6 @@ static noinline void end_compressed_writeback(struct inode *inode,
  */
 static void end_compressed_bio_write(struct bio *bio)
 {
-	struct extent_io_tree *tree;
 	struct compressed_bio *cb = bio->bi_private;
 	struct inode *inode;
 	struct page *page;
@@ -248,7 +247,6 @@ static void end_compressed_bio_write(struct bio *bio)
 	 * call back into the FS and do all the end_io operations
 	 */
 	inode = cb->inode;
-	tree = &BTRFS_I(inode)->io_tree;
 	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
 	btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0],
 			cb->start, cb->start + cb->len - 1, NULL,
Re: Do btrfs compression option changes need to be atomic?
David Sterba wrote:

> The ordering of the bits set/cleared can prevent unwanted combinations,
> like: when the COMPRESS bit is set, there's no NODATACOW anymore:
>
>  - clear NODATACOW
>  - set compress_type
>  - set COMPRESS
>
> I haven't checked with the sources if this is so.

The current code doesn't use memory barriers or atomic ops, so you must
assume that those memory operations can be reordered/merged by the
compiler or the CPU.

> So the changes would be possible in one atomic (bitwise) call? Instead
> of:
>
>   clear_bit(opts, NODATACOW)
>   set_bit(opts, COMPRESS)
>
> do:
>
>   set_bits_with_mask(opts, COMPRESS, BTRFS_DATA__STORAGE_OPT)

After the mount API changes, the value for the variable will be composed
in a separate context and then applied with a single assignment after
printing any messages about changes in active options.

> Side note: up to now we're all used to the NO* naming and semantics of
> the feature bits, so it might take some time to do the mental switch.

Yeah - I've actually switched them all to positive naming in what I have
so far, as it simplifies the code and makes it easier to think about (for
me at least).

David
Re: Do btrfs compression option changes need to be atomic?
On Tue, Aug 21, 2018 at 03:34:06PM +0100, David Howells wrote:
> David Sterba wrote:
>
> > Do you mean the compression type (btrfs_fs_info::compress_type) and the
> > related bits in btrfs_fs_info::mount_opt? There are more compression
> > types but not used in the mount context. I assume you're interested
> > only in the mount-time settings, otherwise the defrag and per-inode
> > compression has higher priority over the global settings.
>
> Yes. However, it would appear that remount can race with using the fs_info
> variables, so the settings can be changing whilst you're going through the
> compress_file_range() function.

compress_file_range reads the compress_type at the beginning, so remount
does not affect that and the function will continue using it for the
extent. This is racy in the sense that there's some compression going on
while the mount options do not say so.

I think this is not a big deal: there's a sync_filesystem call at the
beginning of remount, so most of the data use the previous settings. The
window where new IO starts but remount has not updated the mount options
is short, so yes, this is racy, though I'm not sure if there's a good way
to actually measure that. The result is that e.g. some extents got
compressed but should not have been.

The remount would have to flush IO and freeze all new, then change
options and thaw, but this starts to sound too heavyweight.

> I'm not sure it'll hurt exactly, but you can also find, say, DATACOW being
> disabled whilst you're considering compressing.

I think the constraints don't need to be strictly atomic, with some
intermediate states that are valid but neither the old nor what the new
options say. The race window spans a few instructions, so as long as the
combination is valid or double-checked later, I'd consider it ok.
The ordering of the bits set/cleared can prevent unwanted combinations,
like: when the COMPRESS bit is set, there's no NODATACOW anymore:

 - clear NODATACOW
 - set compress_type
 - set COMPRESS

I haven't checked with the sources if this is so.

> > > Further to that, how much of an issue is it if the configuration is
> > > split out into its own struct that is accessed from struct
> > > btrfs_fs_info using RCU?
> >
> > Depends on how intrusive it's going to be, the mount options are tested
> > at many places. The RCU overhead and "locking" is lightweight enough so
> > it should not be a problem in principle, but without seeing the code I
> > can't tell.
>
> Actually, a better way of doing this might be to put a subset of the
> settings into a single variable, say:
>
> 	struct btrfs_fs_info {
> 		...
> 		unsigned int		data_storage_opt;
> #define BTRFS_DATA_NONE			0x0000
> #define BTRFS_DATA_COW			0x0001
> #define BTRFS_DATA_SUM			0x0002
> #define BTRFS_DATA_COMPRESS		0x0003
> #define BTRFS_DATA_FORCE_COMPRESS	0x0004
> #define BTRFS_DATA__STORAGE_OPT		0x000f
> #define BTRFS_DATA_COMPRESS_ZLIB	0x0000
> #define BTRFS_DATA_COMPRESS_LZO		0x0010
> #define BTRFS_DATA_COMPRESS_ZSTD	0x0020
> #define BTRFS_DATA__COMPRESS_TYPE	0x00f0
> #define BTRFS_DATA__COMPRESS_LEVEL	0x0f00
> 		...
> 	};
>
> That way it might be possible for the datacow, datasum, compress and
> compress-force options to be handled atomically.

So the changes would be possible in one atomic (bitwise) call? Instead
of:

  clear_bit(opts, NODATACOW)
  set_bit(opts, COMPRESS)

do:

  set_bits_with_mask(opts, COMPRESS, BTRFS_DATA__STORAGE_OPT)

?

Side note: up to now we're all used to the NO* naming and semantics of
the feature bits, so it might take some time to do the mental switch.
Re: Do btrfs compression option changes need to be atomic?
David Sterba wrote:

> Do you mean the compression type (btrfs_fs_info::compress_type) and the
> related bits in btrfs_fs_info::mount_opt? There are more compression
> types but not used in the mount context. I assume you're interested
> only in the mount-time settings, otherwise the defrag and per-inode
> compression has higher priority over the global settings.

Yes. However, it would appear that remount can race with using the
fs_info variables, so the settings can be changing whilst you're going
through the compress_file_range() function.

I'm not sure it'll hurt exactly, but you can also find, say, DATACOW
being disabled whilst you're considering compressing.

> > Further to that, how much of an issue is it if the configuration is
> > split out into its own struct that is accessed from struct
> > btrfs_fs_info using RCU?
>
> Depends on how intrusive it's going to be, the mount options are tested
> at many places. The RCU overhead and "locking" is lightweight enough so
> it should not be a problem in principle, but without seeing the code I
> can't tell.

Actually, a better way of doing this might be to put a subset of the
settings into a single variable, say:

	struct btrfs_fs_info {
		...
		unsigned int		data_storage_opt;
#define BTRFS_DATA_NONE			0x0000
#define BTRFS_DATA_COW			0x0001
#define BTRFS_DATA_SUM			0x0002
#define BTRFS_DATA_COMPRESS		0x0003
#define BTRFS_DATA_FORCE_COMPRESS	0x0004
#define BTRFS_DATA__STORAGE_OPT		0x000f
#define BTRFS_DATA_COMPRESS_ZLIB	0x0000
#define BTRFS_DATA_COMPRESS_LZO		0x0010
#define BTRFS_DATA_COMPRESS_ZSTD	0x0020
#define BTRFS_DATA__COMPRESS_TYPE	0x00f0
#define BTRFS_DATA__COMPRESS_LEVEL	0x0f00
		...
	};

That way it might be possible for the datacow, datasum, compress and
compress-force options to be handled atomically.

David
Re: Do btrfs compression option changes need to be atomic?
On Tue, Aug 21, 2018 at 02:46:30PM +0100, David Howells wrote:
> Should changes to the compression options on a btrfs mount be atomic,
> the problem being that they're split across three variables?

Do you mean the compression type (btrfs_fs_info::compress_type) and the
related bits in btrfs_fs_info::mount_opt? There are more compression
types but not used in the mount context. I assume you're interested only
in the mount-time settings, otherwise the defrag and per-inode
compression has higher priority over the global settings.

When an extent is going to be compressed, the current value of the
compression type is read and then passed around.

> Further to that, how much of an issue is it if the configuration is
> split out into its own struct that is accessed from struct
> btrfs_fs_info using RCU?

Depends on how intrusive it's going to be, the mount options are tested
at many places. The RCU overhead and "locking" is lightweight enough so
it should not be a problem in principle, but without seeing the code I
can't tell.
Re: Do btrfs compression option changes need to be atomic?
On 21 Aug 2018, at 9:46, David Howells wrote:

> Should changes to the compression options on a btrfs mount be atomic,
> the problem being that they're split across three variables?
>
> Further to that, how much of an issue is it if the configuration is
> split out into its own struct that is accessed from struct
> btrfs_fs_info using RCU?

Hi David,

They shouldn't need to be atomic; we're making compression decisions on
a per-extent basis and sometimes bailing on the compression if it
doesn't make things smaller.

-chris
Do btrfs compression option changes need to be atomic?
Should changes to the compression options on a btrfs mount be atomic,
the problem being that they're split across three variables?

Further to that, how much of an issue is it if the configuration is
split out into its own struct that is accessed from struct btrfs_fs_info
using RCU?

David
"error inheriting props for ino": Btrfs "compression" property
Hello,

On two machines I have subvolumes where I back up other hosts' root
filesystems via rsync. These subvolumes have the +c attribute on them.
During the backup, sometimes I get tons of messages like these in dmesg:

[Wed Jul 25 20:58:22 2018] BTRFS error (device dm-8): error inheriting props for ino 1213720 (root 1301): -28
[Wed Jul 25 20:58:22 2018] BTRFS error (device dm-8): error inheriting props for ino 1213723 (root 1301): -28
[Wed Jul 25 20:58:22 2018] BTRFS error (device dm-8): error inheriting props for ino 1213724 (root 1301): -28
[Wed Jul 25 20:58:22 2018] BTRFS error (device dm-8): error inheriting props for ino 1213725 (root 1301): -28

# btrfs inspect inode-resolve 1213720 .
./gemini/lib/modules/4.14.58-rm2+/kernel/virt

This seems to be related to the "compression" property in Btrfs:

# btrfs property get ./gemini/lib/modules/4.14.58-rm2+/
compression=zlib
# btrfs property get ./gemini/lib/modules/4.14.58-rm2+/kernel/virt
(no output)

Why would it fail like that? This does seem harmless, but the messages
are annoying and it's puzzling why this happens in the first place.

--
With respect,
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
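For reference when reading such messages, the trailing -28 is a negative kernel errno value; it can be decoded from userspace, for example:

```python
import errno
import os

# kernel error codes are reported as negative errno values
code = 28
print(errno.errorcode[code])   # 'ENOSPC'
print(os.strerror(code))       # the human-readable message for ENOSPC
```

So the -28 in the dmesg output above is ENOSPC coming back from the property-inheritance path.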
[PATCH v3 1/4] btrfs: compression: Add linux/sizes.h for compression.h
Since compression.h is using SZ_* macros, and if some user only includes
compression.h without linux/sizes.h, it will cause a compile error.

One example is lzo.c: if it uses BTRFS_MAX_COMPRESSED, it would cause a
compile error.

Fix it by adding linux/sizes.h in compression.h.

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/compression.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index cc605f7b23fb..317703d9b073 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -6,6 +6,7 @@
 #ifndef BTRFS_COMPRESSION_H
 #define BTRFS_COMPRESSION_H
 
+#include <linux/sizes.h>
 /*
  * We want to make sure that amount of RAM required to uncompress an extent is
  * reasonable, so we limit the total size in ram of a compressed extent to
-- 
2.17.0
[PATCH v2 1/4] btrfs: compression: Add linux/sizes.h for compression.h
Since compression.h is using SZ_* macros, and if some user only includes
compression.h without linux/sizes.h, it will cause a compile error.

One example is lzo.c: if it uses BTRFS_MAX_COMPRESSED, it would cause a
compile error.

Fix it by adding linux/sizes.h in compression.h.

Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
---
 fs/btrfs/compression.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index cc605f7b23fb..317703d9b073 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -6,6 +6,7 @@
 #ifndef BTRFS_COMPRESSION_H
 #define BTRFS_COMPRESSION_H
 
+#include <linux/sizes.h>
 /*
  * We want to make sure that amount of RAM required to uncompress an extent is
  * reasonable, so we limit the total size in ram of a compressed extent to
-- 
2.17.0
Re: [PATCH 1/4] btrfs: compression: Add linux/sizes.h for compression.h
On 17.05.2018 09:27, Qu Wenruo wrote:
> Since compression.h is using SZ_* macros, and if some user only includes
> compression.h without linux/sizes.h, it will cause compile error.
>
> One example is lzo.c, if it uses BTRFS_MAX_COMPRESSED, it would cause
> compile error.
>
> Fix it by adding linux/sizes.h in compression.h
>
> Signed-off-by: Qu Wenruo

Reviewed-by: Nikolay Borisov

> ---
>  fs/btrfs/compression.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index cc605f7b23fb..317703d9b073 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -6,6 +6,7 @@
>  #ifndef BTRFS_COMPRESSION_H
>  #define BTRFS_COMPRESSION_H
>
> +#include <linux/sizes.h>
>  /*
>   * We want to make sure that amount of RAM required to uncompress an extent is
>   * reasonable, so we limit the total size in ram of a compressed extent to
[PATCH 1/4] btrfs: compression: Add linux/sizes.h for compression.h
Since compression.h is using SZ_* macros, and if some user only includes
compression.h without linux/sizes.h, it will cause a compile error.

One example is lzo.c: if it uses BTRFS_MAX_COMPRESSED, it would cause a
compile error.

Fix it by adding linux/sizes.h in compression.h.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index cc605f7b23fb..317703d9b073 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -6,6 +6,7 @@
 #ifndef BTRFS_COMPRESSION_H
 #define BTRFS_COMPRESSION_H
 
+#include <linux/sizes.h>
 /*
  * We want to make sure that amount of RAM required to uncompress an extent is
  * reasonable, so we limit the total size in ram of a compressed extent to
-- 
2.17.0
Re: nvme+btrfs+compression sensibility and benchmark
On 18.04.2018 22:24, Chris Murphy wrote:
> On Wed, Apr 18, 2018 at 10:38 AM, Austin S. Hemmelgarn wrote:
>
>> For reference, the zstd compression in BTRFS uses level 3 by default (as
>> does zlib compression IIRC), though I'm not sure about lzop (I think it
>> uses the lowest compression setting).
>
> The user space tool, zstd, does default to 3, according to its man page.
>
>    -#     # compression level [1-19] (default: 3)
>
> However, the kernel is claiming it's level 0, which doesn't exist in the
> man page. So I have no idea what we're using. This is what I get with
> mount option compress=zstd

Currently the kernel-mode zstd compression doesn't really support any
levels (compress_level is not set, even if it's passed, and even then
zstd_set_level is also unimplemented). So this number doesn't really
make any difference.

> [    4.097858] BTRFS info (device nvme0n1p9): use zstd compression, level 0
Re: nvme+btrfs+compression sensibility and benchmark
Thank you, all

Though the info is useful, there's not a clear consensus on what I
should expect. For interest's sake, I'll post benchmarks from the device
itself when it arrives. I'm expecting at least that I'll be blown away :)

On 04/18/2018 09:23 PM, Chris Murphy wrote:
> On Wed, Apr 18, 2018 at 10:38 AM, Austin S. Hemmelgarn wrote:
>> For reference, the zstd compression in BTRFS uses level 3 by default
>> (as does zlib compression IIRC), though I'm not sure about lzop (I
>> think it uses the lowest compression setting).
>
> The user space tool, zstd, does default to 3, according to its man page.
>
>    -#     # compression level [1-19] (default: 3)
>
> However, the kernel is claiming it's level 0, which doesn't exist in
> the man page. So I have no idea what we're using. This is what I get
> with mount option compress=zstd
>
> [    4.097858] BTRFS info (device nvme0n1p9): use zstd compression, level 0
Re: nvme+btrfs+compression sensibility and benchmark
On Wed, Apr 18, 2018 at 10:38 AM, Austin S. Hemmelgarn wrote:
> For reference, the zstd compression in BTRFS uses level 3 by default (as
> does zlib compression IIRC), though I'm not sure about lzop (I think it
> uses the lowest compression setting).

The user space tool, zstd, does default to 3, according to its man page.

   -#     # compression level [1-19] (default: 3)

However, the kernel is claiming it's level 0, which doesn't exist in the
man page. So I have no idea what we're using. This is what I get with
mount option compress=zstd:

[    4.097858] BTRFS info (device nvme0n1p9): use zstd compression, level 0

-- 
Chris Murphy
Re: nvme+btrfs+compression sensibility and benchmark
On 18.04.2018 21:28, David Sterba wrote:
> On Wed, Apr 18, 2018 at 06:14:07PM +0300, Nikolay Borisov wrote:
>> On 18.04.2018 18:10, Brendan Hide wrote:
>>> Hi, all
>>>
>>> I'm looking for some advice re compression with NVME. Compression
>>> helps performance with a minor CPU hit - but is it still worth it
>>> with the far higher throughputs offered by newer PCI and NVME-type
>>> SSDs?
>>>
>>> I've ordered a PCIe-to-M.2 adapter along with a 1TB 960 Evo drive for
>>> my home desktop. I previously used compression on an older SATA-based
>>> Intel 520 SSD, where compression made sense.
>>>
>>> However, the wisdom isn't so clear-cut if the SSD is potentially
>>> faster than the compression algorithm with my CPU (aging i7 3770).
>>>
>>> Testing using a copy of the kernel source tarball in tmpfs it seems
>>> my system can compress/decompress at about 670MB/s using zstd with 8
>>> threads. lzop isn't that far behind. But I'm not sure if the
>>> benchmark I'm running is the same as how btrfs would be using it
>>> internally.
>>>
>>> Given these numbers I'm inclined to believe compression will make
>>> things slower - but can't be sure without knowing if I'm testing
>>> correctly.
>>>
>>> What is the best practice with benchmarking and with NVME/PCI
>>> storage?
>>
>> btrfs doesn't support DAX so using it on NVME doesn't make much sense
>> performance wise.
>
> Isn't NVMe just "the faster SSD"? Not the persistent memory thing.

Indeed, brain fart on my part. NVDIMM is the persistent memory thing.
Re: nvme+btrfs+compression sensibility and benchmark
On Wed, Apr 18, 2018 at 06:14:07PM +0300, Nikolay Borisov wrote:
> On 18.04.2018 18:10, Brendan Hide wrote:
> > Hi, all
> >
> > I'm looking for some advice re compression with NVME. Compression
> > helps performance with a minor CPU hit - but is it still worth it
> > with the far higher throughputs offered by newer PCI and NVME-type
> > SSDs?
> >
> > I've ordered a PCIe-to-M.2 adapter along with a 1TB 960 Evo drive for
> > my home desktop. I previously used compression on an older SATA-based
> > Intel 520 SSD, where compression made sense.
> >
> > However, the wisdom isn't so clear-cut if the SSD is potentially
> > faster than the compression algorithm with my CPU (aging i7 3770).
> >
> > Testing using a copy of the kernel source tarball in tmpfs it seems
> > my system can compress/decompress at about 670MB/s using zstd with 8
> > threads. lzop isn't that far behind. But I'm not sure if the
> > benchmark I'm running is the same as how btrfs would be using it
> > internally.
> >
> > Given these numbers I'm inclined to believe compression will make
> > things slower - but can't be sure without knowing if I'm testing
> > correctly.
> >
> > What is the best practice with benchmarking and with NVME/PCI
> > storage?
>
> btrfs doesn't support DAX so using it on NVME doesn't make much sense
> performance wise.

Isn't NVMe just "the faster SSD"? Not the persistent memory thing.
Re: nvme+btrfs+compression sensibility and benchmark
On 2018-04-18 11:10, Brendan Hide wrote:
> Hi, all
>
> I'm looking for some advice re compression with NVME. Compression helps
> performance with a minor CPU hit - but is it still worth it with the far
> higher throughputs offered by newer PCI and NVME-type SSDs?
>
> I've ordered a PCIe-to-M.2 adapter along with a 1TB 960 Evo drive for my
> home desktop. I previously used compression on an older SATA-based Intel
> 520 SSD, where compression made sense.
>
> However, the wisdom isn't so clear-cut if the SSD is potentially faster
> than the compression algorithm with my CPU (aging i7 3770).
>
> Testing using a copy of the kernel source tarball in tmpfs it seems my
> system can compress/decompress at about 670MB/s using zstd with 8
> threads. lzop isn't that far behind. But I'm not sure if the benchmark
> I'm running is the same as how btrfs would be using it internally.

BTRFS compresses chunks of 128k at a time and compresses each block one
at a time (it doesn't do multi-threaded compression). You can simulate
this a bit better by splitting the files you're trying to compress into
128k chunks (calling `split -b 131072` on the file will do this quickly
and easily), then passing all those chunks to the compression program
_at the same time_ (this eliminates the overhead of re-invoking the
compressor for each block), and then running it with one thread.

For reference, the zstd compression in BTRFS uses level 3 by default (as
does zlib compression IIRC), though I'm not sure about lzop (I think it
uses the lowest compression setting).

Note that this will still not be entirely accurate (there are
significant differences in buffer handling in the in-kernel
implementations because of memory management differences).

Another option is to see how long it takes to copy the test data into a
zRAM device. This will eliminate the storage overhead, and use the same
compression algorithms that BTRFS does (the only big difference is that
it compresses by page, so it will use 4k blocks instead of 128k). zRAM
currently doesn't support zstd (though patches have been posted), but it
by default uses lzo, and it supports deflate as well (which is
essentially the same mathematically as the 'zlib' compression method in
BTRFS).

> Given these numbers I'm inclined to believe compression will make
> things slower - but can't be sure without knowing if I'm testing
> correctly.

On NVMe, yes, it's probably not worth it for speed. It may however help
in other ways. Compressed writes are smaller than normal writes. This
means that rewriting a file that is compressed by the filesystem will
result in fewer rewritten blocks of storage, which can be useful when
dealing with flash memory. Less written data also means you leave a bit
more free space for the wear-leveling algorithms to work with, which can
improve performance on some devices.
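The single-threaded, per-128k behaviour described above can be approximated in a few lines of Python. A rough sketch using zlib level 3 as a stand-in for btrfs's zlib default (the in-kernel buffer handling differs, so treat the resulting numbers as ballpark only):

```python
import time
import zlib

BLOCK = 128 * 1024  # btrfs compresses data in 128 KiB chunks

def btrfs_like_ratio(data, level=3):
    """Compress data block-by-block with one thread, roughly like the
    kernel does, and report (compressed/original ratio, MB/s)."""
    out = 0
    start = time.perf_counter()
    for off in range(0, len(data), BLOCK):
        out += len(zlib.compress(data[off:off + BLOCK], level))
    elapsed = time.perf_counter() - start
    return out / len(data), len(data) / elapsed / 1e6

# ~1 MiB of mildly compressible sample data; substitute a real tarball
data = bytes(range(256)) * 4096
ratio, mbps = btrfs_like_ratio(data)
print(f"ratio {ratio:.2f}, {mbps:.0f} MB/s")
```

Comparing the MB/s figure against the drive's sequential write speed gives a first-order answer to whether compression would be the bottleneck on a given CPU.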
Re: nvme+btrfs+compression sensibility and benchmark
On 18.04.2018 18:10, Brendan Hide wrote:
> Hi, all
>
> I'm looking for some advice re compression with NVME. Compression helps
> performance with a minor CPU hit - but is it still worth it with the far
> higher throughputs offered by newer PCI and NVME-type SSDs?
>
> I've ordered a PCIe-to-M.2 adapter along with a 1TB 960 Evo drive for my
> home desktop. I previously used compression on an older SATA-based Intel
> 520 SSD, where compression made sense.
>
> However, the wisdom isn't so clear-cut if the SSD is potentially faster
> than the compression algorithm with my CPU (aging i7 3770).
>
> Testing using a copy of the kernel source tarball in tmpfs it seems my
> system can compress/decompress at about 670MB/s using zstd with 8
> threads. lzop isn't that far behind. But I'm not sure if the benchmark
> I'm running is the same as how btrfs would be using it internally.
>
> Given these numbers I'm inclined to believe compression will make things
> slower - but can't be sure without knowing if I'm testing correctly.
>
> What is the best practice with benchmarking and with NVME/PCI storage?

btrfs doesn't support DAX so using it on NVME doesn't make much sense
performance wise.
nvme+btrfs+compression sensibility and benchmark
Hi, all

I'm looking for some advice re compression with NVME. Compression helps
performance with a minor CPU hit - but is it still worth it with the far
higher throughputs offered by newer PCI and NVME-type SSDs?

I've ordered a PCIe-to-M.2 adapter along with a 1TB 960 Evo drive for my
home desktop. I previously used compression on an older SATA-based Intel
520 SSD, where compression made sense.

However, the wisdom isn't so clear-cut if the SSD is potentially faster
than the compression algorithm with my CPU (aging i7 3770).

Testing using a copy of the kernel source tarball in tmpfs it seems my
system can compress/decompress at about 670MB/s using zstd with 8
threads. lzop isn't that far behind. But I'm not sure if the benchmark
I'm running is the same as how btrfs would be using it internally.

Given these numbers I'm inclined to believe compression will make things
slower - but can't be sure without knowing if I'm testing correctly.

What is the best practice with benchmarking and with NVME/PCI storage?
[PATCH 2/2] Btrfs: compression: replace workspace managment code by mempool API
Mostly cleanup of old code and replace old API with new one.

1. Drop old linked list based approach
2. Replace all ERR_PTR(-ENOMEM) with NULL, as mempool code only
   understands NULL
3. mempool calls alloc methods on create/resize, so to be safe, move
   nofs_{save,restore} to appropriate places
4. Update btrfs_comp_op to use void *ws, instead of list_head *ws
5. LZO: more aggressive use of kmalloc on order 1 alloc, for more
   aggressive fallback to mempool
6. Refactor alloc functions to check every allocation, because mempool
   flags are aggressive and can fail more frequently.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/compression.c | 213 +
 fs/btrfs/compression.h |  12 +--
 fs/btrfs/lzo.c         |  64 +--
 fs/btrfs/zlib.c        |  56 +++--
 fs/btrfs/zstd.c        |  49 +++-
 5 files changed, 148 insertions(+), 246 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 02bd60357f04..869df3f5bd1b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -811,23 +811,12 @@ static void *heuristic_ws_alloc(gfp_t gfp_mask, void *pool_data)
 	return NULL;
 }
 
-struct workspaces_list {
-	struct list_head idle_ws;
-	spinlock_t ws_lock;
-	/* Number of free workspaces */
-	int free_ws;
-	/* Total number of allocated workspaces */
-	atomic_t total_ws;
-	/* Waiters for a free workspace */
-	wait_queue_head_t ws_wait;
-};
-
 struct workspace_stor {
 	mempool_t *pool;
 };
 
 static struct workspace_stor btrfs_heuristic_ws_stor;
-static struct workspaces_list btrfs_comp_ws[BTRFS_COMPRESS_TYPES];
+static struct workspace_stor btrfs_comp_stor[BTRFS_COMPRESS_TYPES];
 
 static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 	&btrfs_zlib_compress,
@@ -837,14 +826,14 @@ static const struct btrfs_compress_op * const btrfs_compress_op[] = {
 
 void __init btrfs_init_compress(void)
 {
-	struct list_head *workspace;
 	int i;
-	mempool_t *pool = btrfs_heuristic_ws_stor.pool;
+	mempool_t *pool;
 
 	/*
 	 * Preallocate one workspace for heuristic so
 	 * we can guarantee forward progress in the worst case
 	 */
+	pool = btrfs_heuristic_ws_stor.pool;
 	pool = mempool_create(1, heuristic_ws_alloc, heuristic_ws_free, NULL);
@@ -852,23 +841,17 @@ void __init btrfs_init_compress(void)
 		pr_warn("BTRFS: cannot preallocate heuristic workspace, will try later\n");
 
 	for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
-		INIT_LIST_HEAD(&btrfs_comp_ws[i].idle_ws);
-		spin_lock_init(&btrfs_comp_ws[i].ws_lock);
-		atomic_set(&btrfs_comp_ws[i].total_ws, 0);
-		init_waitqueue_head(&btrfs_comp_ws[i].ws_wait);
-
+		pool = btrfs_comp_stor[i].pool;
 		/*
 		 * Preallocate one workspace for each compression type so
 		 * we can guarantee forward progress in the worst case
 		 */
-		workspace = btrfs_compress_op[i]->alloc_workspace();
-		if (IS_ERR(workspace)) {
+		pool = mempool_create(1, btrfs_compress_op[i]->alloc_workspace,
+				      btrfs_compress_op[i]->free_workspace,
+				      NULL);
+
+		if (pool == NULL)
 			pr_warn("BTRFS: cannot preallocate compression workspace, will try later\n");
-		} else {
-			atomic_set(&btrfs_comp_ws[i].total_ws, 1);
-			btrfs_comp_ws[i].free_ws = 1;
-			list_add(workspace, &btrfs_comp_ws[i].idle_ws);
-		}
 	}
 }
 
@@ -881,6 +864,7 @@ static void *mempool_alloc_wrap(struct workspace_stor *stor)
 	int ncpu = num_online_cpus();
 
 	while (unlikely(stor->pool == NULL)) {
+		int i;
 		mempool_t *pool;
 		void *(*ws_alloc)(gfp_t gfp_mask, void *pool_data);
 		void (*ws_free)(void *element, void *pool_data);
 
@@ -888,6 +872,13 @@ static void *mempool_alloc_wrap(struct workspace_stor *stor)
 		if (stor == &btrfs_heuristic_ws_stor) {
 			ws_alloc = heuristic_ws_alloc;
 			ws_free = heuristic_ws_free;
+		} else {
+			for (i = 0; i < BTRFS_COMPRESS_TYPES; i++) {
+				if (stor == &btrfs_comp_stor[i])
+					break;
+			}
+			ws_alloc = btrfs_compress_op[i]->alloc_workspace;
+			ws_free = btrfs_compress_op[i]->free_workspace;
 		}
 
 		pool = mempool_create(1, ws_alloc, ws_free, NULL);
@@ -915,7 +906,12 @@ static void *mempool_alloc_wrap(struct workspace_stor *stor)
 	if (stor->pool->min_nr != ncpu)
Re: [PATCH 1/4] btrfs: compression: add helper for type to string conversion
On Fri, Dec 01, 2017 at 05:55:56AM +0800, Anand Jain wrote:
>
> On 12/01/2017 12:09 AM, David Sterba wrote:
> > There are several places opencoding this conversion, add a helper now
> > that we have 3 compression algorithms.
> >
> > Signed-off-by: David Sterba
> > ---
> >  fs/btrfs/compression.c | 15 +++
> >  fs/btrfs/compression.h |  2 ++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> > index 5982c8a71f02..a1a5958ff328 100644
> > --- a/fs/btrfs/compression.c
> > +++ b/fs/btrfs/compression.c
> > @@ -45,6 +45,21 @@
> >  #include "extent_io.h"
> >  #include "extent_map.h"
> >
> > +static const char* const btrfs_compress_types[] = { "", "lzo", "zlib", "zstd" };
>
> This is wrong order.
>
> compression.h:	BTRFS_COMPRESS_NONE = 0,
> compression.h:	BTRFS_COMPRESS_ZLIB = 1,
> compression.h:	BTRFS_COMPRESS_LZO = 2,
> compression.h:	BTRFS_COMPRESS_ZSTD = 3,
> compression.h:	BTRFS_COMPRESS_TYPES = 3,

This was unintentionally left as an exercise for reviewers. Will fix,
thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] btrfs: compression: add helper for type to string conversion
On 12/01/2017 12:09 AM, David Sterba wrote:
> There are several places opencoding this conversion, add a helper now
> that we have 3 compression algorithms.
>
> Signed-off-by: David Sterba
> ---
>  fs/btrfs/compression.c | 15 +++
>  fs/btrfs/compression.h |  2 ++
>  2 files changed, 17 insertions(+)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 5982c8a71f02..a1a5958ff328 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -45,6 +45,21 @@
>  #include "extent_io.h"
>  #include "extent_map.h"
>
> +static const char* const btrfs_compress_types[] = { "", "lzo", "zlib", "zstd" };

This is wrong order.

compression.h:	BTRFS_COMPRESS_NONE = 0,
compression.h:	BTRFS_COMPRESS_ZLIB = 1,
compression.h:	BTRFS_COMPRESS_LZO = 2,
compression.h:	BTRFS_COMPRESS_ZSTD = 3,
compression.h:	BTRFS_COMPRESS_TYPES = 3,

Thanks, Anand

> +const char* btrfs_compress_type2str(enum btrfs_compression_type type)
> +{
> +	switch (type) {
> +	case BTRFS_COMPRESS_LZO:
> +	case BTRFS_COMPRESS_ZLIB:
> +	case BTRFS_COMPRESS_ZSTD:
> +	case BTRFS_COMPRESS_NONE:
> +		return btrfs_compress_types[type];
> +	}
> +
> +	return NULL;
> +}
> +
>  static int btrfs_decompress_bio(struct compressed_bio *cb);
>
>  static inline int compressed_bio_size(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index 6b692903a23c..677fa4aa0bd7 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -137,6 +137,8 @@ extern const struct btrfs_compress_op btrfs_zlib_compress;
>  extern const struct btrfs_compress_op btrfs_lzo_compress;
>  extern const struct btrfs_compress_op btrfs_zstd_compress;
>
> +const char* btrfs_compress_type2str(enum btrfs_compression_type type);
> +
>  int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
>
>  #endif
[PATCH 1/4] btrfs: compression: add helper for type to string conversion
There are several places opencoding this conversion, add a helper now
that we have 3 compression algorithms.

Signed-off-by: David Sterba
---
 fs/btrfs/compression.c | 15 +++
 fs/btrfs/compression.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5982c8a71f02..a1a5958ff328 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -45,6 +45,21 @@
 #include "extent_io.h"
 #include "extent_map.h"
 
+static const char* const btrfs_compress_types[] = { "", "lzo", "zlib", "zstd" };
+
+const char* btrfs_compress_type2str(enum btrfs_compression_type type)
+{
+	switch (type) {
+	case BTRFS_COMPRESS_LZO:
+	case BTRFS_COMPRESS_ZLIB:
+	case BTRFS_COMPRESS_ZSTD:
+	case BTRFS_COMPRESS_NONE:
+		return btrfs_compress_types[type];
+	}
+
+	return NULL;
+}
+
 static int btrfs_decompress_bio(struct compressed_bio *cb);
 
 static inline int compressed_bio_size(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 6b692903a23c..677fa4aa0bd7 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -137,6 +137,8 @@ extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
 extern const struct btrfs_compress_op btrfs_zstd_compress;
 
+const char* btrfs_compress_type2str(enum btrfs_compression_type type);
+
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
 
 #endif
-- 
2.15.0
[PATCH 2/2] btrfs: compression: Return correct default zlib compression level
0 means no compression at all for zlib. Z_DEFAULT_COMPRESSION is the
correct value, which is -1 and will be converted to 6 at runtime.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b35ce16b3df3..29a41e6c9e28 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -1528,5 +1529,9 @@ unsigned int btrfs_compress_str2level(const char *str)
 	if (str[4] == ':' && '1' <= str[5] && str[5] <= '9' && str[6] == 0)
 		return str[5] - '0';
 
-	return 0;
+	/*
+	 * NOTE: Default compression level is not 0!!
+	 * 0 means no compression at all
+	 */
+	return Z_DEFAULT_COMPRESSION;
 }
-- 
2.14.3
Re: [RFC] Btrfs: compression heuristic performance
Just an info update. I did some more benchmarking, optimization and testing.
(Wrote a generalized kernel version of radix sort that can be ported to sort.c)
(Some memory allocation changes etc.)
(Also I made a mistake with an inversion of the percentage numbers)

New, cleaner numbers (I disabled the useless printf etc.):

Stats mode, heap sort (as in kernel):
 - average ... random data - ~3000 MiB/s
 - Zeroed - ~4400 MiB/s

Stats mode, radix sort:
 - average ... random data - ~4900 MiB/s  +63%
 - Zeroed - ~4100 MiB/s  -7%
   # radix has some problems with sorting zeroes :)

Non-stats mode, heap sort:
 - average ... random data - ~3500 MiB/s
 - Zeroed - ~11500 MiB/s

Non-stats mode, radix sort:
 - average ... random data - ~6000 MiB/s  +71%
 - Zeroed - ~11500 MiB/s
   # as expected no diff, sorting does nothing in that case

Another hot (and slow) path in the heuristic is the simple ilog2 =\
I gave up on porting the in-kernel x86 ilog2, so I'm not sure what happens
in the kernel with that. JFYI: it only has a significant effect together
with radix sort, because heap sort is much slower... So, numbers:

Stats mode, simple ilog2:
 - average ... random data - ~3000 MiB/s
 - Zeroed - ~4400 MiB/s

Stats mode, fast ilog2 [1]:
 - average ... random data - ~3500 MiB/s  +16%
 - Zeroed - ~4400 MiB/s
   # ilog2 just does nothing on zeroes

Non-stats mode, simple ilog2:
 - average ... random data - ~3500 MiB/s
 - Zeroed - ~11500 MiB/s

Non-stats mode, fast ilog2:
   # because byte_core_set_size avoids the worst data
 - average ... random data - ~3600 MiB/s  +2%
 - Zeroed - ~11500 MiB/s
   # as expected no diff, ilog2 does nothing in that case
   # with radix sort
 - average ... random data - ~6200 MiB/s  +77%

So, as the heuristic tries to save CPU time from compression, I think it is
acceptable to sacrifice 1 KiB of memory for buffered data (the usual
time/memory trade-off).

I will write an RFC with radix sort for compression.c in the near future.
It is hard for me to find other kernel places where this could be used for
now. IMHO, if someone else also wants it, we can always move it to lib
later.

Thanks.

Notes:
1. Fast cross-platform 64-bit ilog2 (found on the web):

const int tab64[64] = {
    63,  0, 58,  1, 59, 47, 53,  2,
    60, 39, 48, 27, 54, 33, 42,  3,
    61, 51, 37, 40, 49, 18, 28, 20,
    55, 30, 34, 11, 43, 14, 22,  4,
    62, 57, 46, 52, 38, 26, 32, 41,
    50, 36, 17, 19, 29, 10, 13, 21,
    56, 45, 25, 31, 35, 16,  9, 12,
    44, 24, 15,  8, 23,  7,  6,  5};

int ilog2_64(uint64_t value)
{
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    value |= value >> 16;
    value |= value >> 32;
    return tab64[((uint64_t)((value - (value >> 1)) * 0x07EDD5E59A4E28C2)) >> 58];
}

-- 
Have a nice day,
Timofey.
[RFC] Btrfs: compression heuristic performance
Hi David, (and list folks in CC of course).

TL;DR:
Sorting is the slowest part of the heuristic for now (in the worst case
~55% of run time). The kernel uses heap sort, so the heuristic does too.
Radix sort speeds the heuristic up a lot (~+30-40%), but radix has some
trade-offs (i.e. it needs more memory, though that can in theory be fixed
easily). Is it a reasonable idea to port radix sort to the kernel?

Long-read version below.
---
I already mentioned that I reverse-ported the heuristic code from the
kernel to userspace [1], to make experiments and profiling easier for me.
(I spent about a week on experiments and testing, profiling with Valgrind
first.)

I found the heuristic bottleneck for now: the kernel heap sort()
(sort uses about 55% of the runtime, which looks crazy to me).
And since sort is used in byte_core_set_size(), it will be called for all
non-text data, i.e. from my point of view in ~90% of cases.
(The tool has 2 modes in main: stats and non-stats %) )

To show the scale of the problem:

Stats mode:
 - average and worst case data (random) - ~2600 MiB/s
 - Zeroed data - ~3400 MiB/s

Non-stats (kernel-like):
 - average and worst case data (random) - ~3000 MiB/s
 - Zeroed data - ~8000 MiB/s

I spent several days trying to beat that in different ways =\
The only way I found that shows a good profit is replacing the kernel heap
sort with radix_sort() [2] =\

With radix sort:

Stats mode:
 - average and worst case data (random) - ~3800 MiB/s  ~+31%
 - Zeroed data - ~3400 MiB/s (not sure why performance is the same)

Non-stats (kernel-like):
 - average and worst case data (random) - ~4800 MiB/s  ~+37%
 - Zeroed data - ~8000 MiB/s

(Inlining radix_sort shows an additional +200 MiB/s.)

That seems cool but, please wait a second, nothing comes free. Radix sort
needs 2 buffers: one for counters and one for the output array. So in our
case (I use 4 bits as the radix base): 16 * 4 bytes for counters (stored
on the stack of the sort function) + 4*256 bytes for the bucket_buffer
(stored in the heuristic workspace).

So, in theory, that makes the heuristic workspace weigh more, i.e.
+1 KiB per CPU. The heuristic makes some assumptions, i.e. we really only
care about the sample size, and the sample in our case is 8 KiB max
(i.e. 8 KiB per CPU). So it is possible to replace the u32 counters in the
buckets with u16 counters and end up with the same totals:
8 KiB sample per CPU
1 KiB buckets per CPU

But the main question: does that make sense?

Thanks!

Links:
1. https://github.com/Nefelim4ag/companalyze
2. https://github.com/Nefelim4ag/companalyze/commit/1cd371eac5056927e5f567c6bd32e179ba859629

-- 
Have a nice day,
Timofey.
Re: using fio to test btrfs compression
2017-09-22 8:33 GMT+03:00 shally verma: > On Wed, Sep 20, 2017 at 5:26 PM, Timofey Titovets > wrote: >> 2017-09-20 14:10 GMT+03:00 shally verma : >>> Interesting part is I dont see "encoded" under flags. I couldn't >>> understand if flags are retrieved from btrfs file metadata info. As >>> you are running on 4.14 and I am on 4.9 >>> >>> So, am still under doubt, even with dd if files are getting compressed. >>> >>> What is the filesize shown if you run >>> btrfs fi du /mnt/test0.0 .. is it less or actual size? >>> >>> Is there any command that i can run to confirm file has been compressed? >>> >>> So far, I had my prints enabled in kernel/fs/btrfs/compression.c and >>> check in dmesg that code jumped to compress_page() func. >>> >>> Thanks >>> Shally >>> >> >> Okay, lets play different. >> encoded work for last several years for kernel releases, so you must see >> that. >> >> Reproduction script: >> #!/bin/bash -e >> >> FILE_NAME=$RANDOM$RANDOM >> TMP_DIR=$(mktemp -d) >> IMAGE_FILE="$HOME/$FILE_NAME" >> >> truncate -s 4G $IMAGE_FILE >> mkfs.btrfs -m single -L COMPRESS_TEST $IMAGE_FILE >> mount -o compress-force $IMAGE_FILE $TMP_DIR >> dd if=/dev/zero bs=128K count=2 of=$TMP_DIR/zero >> sync >> filefrag -v $TMP_DIR/zero >> umount $TMP_DIR >> rm -v $IMAGE_FILE >> >> Example output: >> ~ sudo ./btrfs_compress_test.sh >> btrfs-progs v4.13 >> See http://btrfs.wiki.kernel.org for more information. 
>> >> Label: COMPRESS_TEST >> UUID: abfedc39-dd94-4105-87d6-49eedb13467f >> Node size: 16384 >> Sector size:4096 >> Filesystem size:4.00GiB >> Block group profiles: >> Data: single8.00MiB >> Metadata: single8.00MiB >> System: single4.00MiB >> SSD detected: no >> Incompat features: extref, skinny-metadata >> Number of devices: 1 >> Devices: >> IDSIZE PATH >>1 4.00GiB /root/322906281 >> >> 2+0 records in >> 2+0 records out >> 262144 bytes (262 kB, 256 KiB) copied, 0.000197746 s, 1.3 GB/s >> Filesystem type is: 9123683e >> File size of /tmp/tmp.bDyt3EkEG5/zero is 262144 (64 blocks of 4096 bytes) >> ext: logical_offset:physical_offset: length: expected: flags: >> 0:0.. 31: 3072.. 3103: 32: encoded >> 1: 32.. 63: 3073.. 3104: 32: 3104: >> last,encoded,eof >> /tmp/tmp.bDyt3EkEG5/zero: 2 extents found >> removed '/root/322906281' >> >> Good luck. > Here's my output - Everything is same except: > > 1. nodesize and sector size = 64K > 2. extent length = 2 > 3. I don't see "encoded" in filefrag here. > > btrfs-progs v4.13 > See http://btrfs.wiki.kernel.org for more information. > > Label: COMPRESS_TEST > UUID: fad6907e-d4eb-4dbb-9014-3918a822c9ce > Node size: 65536 > Sector size:65536 > Filesystem size:4.00GiB > Block group profiles: > Data: single8.00MiB > Metadata: single8.00MiB > System: single4.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: >IDSIZE PATH > 1 4.00GiB /root/2808626087 > > 2+0 records in > 2+0 records out > 262144 bytes (262 kB) copied, 0.00028777 s, 911 MB/s > Filesystem type is: 9123683e > File size of /tmp/tmp.346ESCdOIi/zero is 262144 (4 blocks of 65536 bytes) > ext: logical_offset:physical_offset: length: expected: flags: >0:0.. 1:192.. 193: 2: >1:2.. 3:193.. 
194: 2:194: eof > /tmp/tmp.346ESCdOIi/zero: 2 extents found > removed '/root/2808626087' > > And this is my dmesg > > [170127.417119] BTRFS: device label COMPRESS_TEST devid 1 transid 5 /dev/loop0 > [170127.417493] BTRFS info (device loop0): force zlib compression > [170127.417496] BTRFS info (device loop0): disk space caching is enabled > [170127.417499] BTRFS info (device loop0): has skinny extents > [170127.425858] BTRFS info (device loop0): creating UUID tree > > This is fio --version > fio-3.0 > > What do we doubt here? > > Thanks > Shally > >> -- >> Have a nice day, >> Timofey. Sorry, no idea. -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
On Wed, Sep 20, 2017 at 5:26 PM, Timofey Titovetswrote: > 2017-09-20 14:10 GMT+03:00 shally verma : >> Interesting part is I dont see "encoded" under flags. I couldn't >> understand if flags are retrieved from btrfs file metadata info. As >> you are running on 4.14 and I am on 4.9 >> >> So, am still under doubt, even with dd if files are getting compressed. >> >> What is the filesize shown if you run >> btrfs fi du /mnt/test0.0 .. is it less or actual size? >> >> Is there any command that i can run to confirm file has been compressed? >> >> So far, I had my prints enabled in kernel/fs/btrfs/compression.c and >> check in dmesg that code jumped to compress_page() func. >> >> Thanks >> Shally >> > > Okay, lets play different. > encoded work for last several years for kernel releases, so you must see that. > > Reproduction script: > #!/bin/bash -e > > FILE_NAME=$RANDOM$RANDOM > TMP_DIR=$(mktemp -d) > IMAGE_FILE="$HOME/$FILE_NAME" > > truncate -s 4G $IMAGE_FILE > mkfs.btrfs -m single -L COMPRESS_TEST $IMAGE_FILE > mount -o compress-force $IMAGE_FILE $TMP_DIR > dd if=/dev/zero bs=128K count=2 of=$TMP_DIR/zero > sync > filefrag -v $TMP_DIR/zero > umount $TMP_DIR > rm -v $IMAGE_FILE > > Example output: > ~ sudo ./btrfs_compress_test.sh > btrfs-progs v4.13 > See http://btrfs.wiki.kernel.org for more information. > > Label: COMPRESS_TEST > UUID: abfedc39-dd94-4105-87d6-49eedb13467f > Node size: 16384 > Sector size:4096 > Filesystem size:4.00GiB > Block group profiles: > Data: single8.00MiB > Metadata: single8.00MiB > System: single4.00MiB > SSD detected: no > Incompat features: extref, skinny-metadata > Number of devices: 1 > Devices: > IDSIZE PATH >1 4.00GiB /root/322906281 > > 2+0 records in > 2+0 records out > 262144 bytes (262 kB, 256 KiB) copied, 0.000197746 s, 1.3 GB/s > Filesystem type is: 9123683e > File size of /tmp/tmp.bDyt3EkEG5/zero is 262144 (64 blocks of 4096 bytes) > ext: logical_offset:physical_offset: length: expected: flags: > 0:0.. 31: 3072.. 
3103: 32: encoded > 1: 32.. 63: 3073.. 3104: 32: 3104: > last,encoded,eof > /tmp/tmp.bDyt3EkEG5/zero: 2 extents found > removed '/root/322906281' > > Good luck. Here's my output - Everything is same except: 1. nodesize and sector size = 64K 2. extent length = 2 3. I don't see "encoded" in filefrag here. btrfs-progs v4.13 See http://btrfs.wiki.kernel.org for more information. Label: COMPRESS_TEST UUID: fad6907e-d4eb-4dbb-9014-3918a822c9ce Node size: 65536 Sector size:65536 Filesystem size:4.00GiB Block group profiles: Data: single8.00MiB Metadata: single8.00MiB System: single4.00MiB SSD detected: no Incompat features: extref, skinny-metadata Number of devices: 1 Devices: IDSIZE PATH 1 4.00GiB /root/2808626087 2+0 records in 2+0 records out 262144 bytes (262 kB) copied, 0.00028777 s, 911 MB/s Filesystem type is: 9123683e File size of /tmp/tmp.346ESCdOIi/zero is 262144 (4 blocks of 65536 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0.. 1:192.. 193: 2: 1:2.. 3:193.. 194: 2:194: eof /tmp/tmp.346ESCdOIi/zero: 2 extents found removed '/root/2808626087' And this is my dmesg [170127.417119] BTRFS: device label COMPRESS_TEST devid 1 transid 5 /dev/loop0 [170127.417493] BTRFS info (device loop0): force zlib compression [170127.417496] BTRFS info (device loop0): disk space caching is enabled [170127.417499] BTRFS info (device loop0): has skinny extents [170127.425858] BTRFS info (device loop0): creating UUID tree This is fio --version fio-3.0 What do we doubt here? Thanks Shally > -- > Have a nice day, > Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
On Mon, Sep 18, 2017 at 01:06:45PM +0530, shally verma wrote:
> Hi
>
> I wanted to test btrfs compression using the fio command, but somehow
> during fio writes I don't see the code taking the route of compressing
> blocks, whereas if I do a copy to a btrfs compression-enabled mount
> point then I can easily see the code falling through compression.c.
>
> Here's how I do my setup:
>
> 1. mkfs.btrfs /dev/sdb1
> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt
> 3. cp /mnt
> 4. dmesg shows print statements from compression.c and zlib.c confirming
>    the compression routine was invoked during write
> 5. now, a copy back from the btrfs mount point to the home directory
>    also shows the decompress call invocation
>
> Now, try the same with fio:
>
> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1
> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128
> --name=test --size=10G --runtime=180 --time_based

fio by default uses fallocate (posix_fallocate) to pre-allocate space for
the later writes, and the PREALLOC path overrides the compression path.
Like others mentioned, after fio and a sync, you'll see 'encoded' in
filefrag -v your_file.

thanks,
-liubo

> But it seems to write uncompressed data.
>
> Any help here? what's missing?
>
> Thanks
> Shally
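Based on liubo's explanation, a plausible workaround is to tell fio not to pre-allocate at all: fio's `fallocate` job option accepts `none`. A sketch of a job file along the lines of the commands in this thread (the paths and job name are illustrative, and whether this fully avoids the PREALLOC path should be verified with filefrag):

```ini
; compress-test.fio -- buffered sequential write with no preallocation,
; so btrfs should not take the PREALLOC path that skips compression
[global]
directory=/mnt
; skip posix_fallocate() before writing
fallocate=none
direct=0
buffered=1
ioengine=libaio
bs=64k
iodepth=128

[test]
rw=write
size=1G
buffer_compress_percentage=100
```

After `fio compress-test.fio` and a `sync`, `filefrag -v /mnt/test.0.0` should show `encoded` extents if compression kicked in.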
Re: using fio to test btrfs compression
shally verma posted on Wed, 20 Sep 2017 16:40:15 +0530 as excerpted:

> Is there any command that i can run to confirm file has been compressed?

There is the quite recently posted (and actively updated since then)
compsize command.

https://github.com/kilobyte/compsize

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: using fio to test btrfs compression
On Wed, Sep 20, 2017 at 4:00 PM, Timofey Titovetswrote: > 2017-09-20 13:13 GMT+03:00 shally verma : >> On Wed, Sep 20, 2017 at 3:06 PM, Timofey Titovets >> wrote: >>> 2017-09-20 11:59 GMT+03:00 shally verma : On Wed, Sep 20, 2017 at 2:25 PM, Timofey Titovets wrote: > 2017-09-20 11:44 GMT+03:00 shally verma : >> One more catch... I am initiating fio from non-btrfs filesystem i.e. >> pwd is ext4 based fs where as mount point is btrfs. >> Could that make difference? >> >>> Thanks >>> Shally > > Only matter are where you store test file =) > If you store test file on btrfs, pwd does nothing. > Then steps listed in previous mail should work right? Am listing them again here: " > > 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt > > 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k > --rw=write --iodepth=128 --name=test --size=1G > --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer > --ioengine=libaio > > 1GN file written uncompressed. Here no compression invoked (though > compress-force=zlib) > > 3. cp mnt/test ./ --> copy back fio generated test file from btrfs > mount point to local drive > > 4. hex dump test file (all FFs) -- confirmed that data is compressible > no random data. > > 5. cp test mnt/ --> now, copy same test again back to mount point > (reverse of step 3) . Now, here I see during copying compression is > invoked. > > I am using kernel 4.9 and compress-foce is said to be working for > kernel > 2.13 from wiki ... so I wonder what's so special with cp > command which is not happening during fio writes??? "- Thanks Shally > -- > Have a nice day, > Timofey. >>> >>> I did try reproduce your problem on 4.14, i'm hard to explain that >>> happens and why file created by FIO will not use compression. >>> >>> Suggest workaround: >>> rm -v /mnt/test.0.0 >>> dd if=/dev/zero of=/mnt/test.0.0 bs=1M count=1024 >> >> You mean "compression" using "dd" write. 
>> >>> fio --directory=$HOME/test/ --numjobs=1 --direct=1 --buffered=0 >>> --bs=64k --rw=write --iodepth=128 --name=test --size=1G >>> --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer >>> --ioengine=libaio >> >> You mean "no compression" during fio writes and you see similar behavior as >> I? >> BTW here --direct=0 and --buffered = 1. >> >>> >>> You can check if data compressed by filefrag -v /mnt/test.0.0 (you >>> will see encoded extents) >>> >> if I do filefrag on mnt/test.0.0, then it shows up like this: >> ext: logical_offset:physical_offset: length: expected: flags: >>0:0.. 1:1329922.. 1329923: 2: >>1:2.. 3:1329923.. 1329924: 2:1329924: >>2:4.. 5:1329924.. 1329925: 2:1329925: >> >> An new to this, how does it says extents are encoded here? >> >> Thanks >> Shally >> >>> -- >>> Have a nice day, >>> Timofey. > > Yes, i was reproduce your problem, new file created by fio not compressed. > So i write you about workaround, because i don't want to deep dig in > that problem. > > Example of encoded: > ~ dd if=/dev/zero of=./test.0.0 bs=1M count=1M count=1; sync > ~ filefrag -v test.0.0 > Filesystem type is: 9123683e > File size of test.0.0 is 1048576 (256 blocks of 4096 bytes) > ext: logical_offset:physical_offset: length: expected: flags: > 0:0.. 31: 72779979.. 72780010: 32: encoded > 1: 32.. 63: 72751509.. 72751540: 32: 72780011: encoded > 2: 64.. 95: 72755246.. 72755277: 32: 72751541: encoded > 3: 96.. 127: 72755610.. 72755641: 32: 72755278: encoded > 4: 128.. 159: 72755631.. 72755662: 32: 72755642: encoded > 5: 160.. 191: 72755722.. 72755753: 32: 72755663: encoded > 6: 192.. 223: 72757862.. 72757893: 32: 72755754: encoded > 7: 224.. 255: 72769455.. 72769486: 32: 72757894: > last,encoded,eof > test.0.0: 8 extents found > Interesting part is I dont see "encoded" under flags. I couldn't understand if flags are retrieved from btrfs file metadata info. 
As you are running on 4.14 and I am on 4.9 So, am still under doubt, even with dd if files are getting compressed. What is the filesize shown if you run btrfs fi du /mnt/test0.0 .. is it less or actual size? Is there any command that i can run to confirm file has been compressed? So far, I had my prints enabled in kernel/fs/btrfs/compression.c and check in dmesg that code jumped to compress_page() func. Thanks Shally > -- > Have a nice day, > Timofey. -- To
Re: using fio to test btrfs compression
2017-09-20 13:13 GMT+03:00 shally verma: > On Wed, Sep 20, 2017 at 3:06 PM, Timofey Titovets > wrote: >> 2017-09-20 11:59 GMT+03:00 shally verma : >>> On Wed, Sep 20, 2017 at 2:25 PM, Timofey Titovets >>> wrote: 2017-09-20 11:44 GMT+03:00 shally verma : > One more catch... I am initiating fio from non-btrfs filesystem i.e. > pwd is ext4 based fs where as mount point is btrfs. > Could that make difference? > >> Thanks >> Shally Only matter are where you store test file =) If you store test file on btrfs, pwd does nothing. >>> >>> Then steps listed in previous mail should work right? Am listing them >>> again here: >>> >>> " 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k --rw=write --iodepth=128 --name=test --size=1G --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer --ioengine=libaio 1GN file written uncompressed. Here no compression invoked (though compress-force=zlib) 3. cp mnt/test ./ --> copy back fio generated test file from btrfs mount point to local drive 4. hex dump test file (all FFs) -- confirmed that data is compressible no random data. 5. cp test mnt/ --> now, copy same test again back to mount point (reverse of step 3) . Now, here I see during copying compression is invoked. I am using kernel 4.9 and compress-foce is said to be working for kernel > 2.13 from wiki ... so I wonder what's so special with cp command which is not happening during fio writes??? >>> >>> "- >>> >>> Thanks >>> Shally >>> >>> -- Have a nice day, Timofey. >> >> I did try reproduce your problem on 4.14, i'm hard to explain that >> happens and why file created by FIO will not use compression. >> >> Suggest workaround: >> rm -v /mnt/test.0.0 >> dd if=/dev/zero of=/mnt/test.0.0 bs=1M count=1024 > > You mean "compression" using "dd" write. 
> >> fio --directory=$HOME/test/ --numjobs=1 --direct=1 --buffered=0 >> --bs=64k --rw=write --iodepth=128 --name=test --size=1G >> --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer >> --ioengine=libaio > > You mean "no compression" during fio writes and you see similar behavior as I? > BTW here --direct=0 and --buffered = 1. > >> >> You can check if data compressed by filefrag -v /mnt/test.0.0 (you >> will see encoded extents) >> > if I do filefrag on mnt/test.0.0, then it shows up like this: > ext: logical_offset:physical_offset: length: expected: flags: >0:0.. 1:1329922.. 1329923: 2: >1:2.. 3:1329923.. 1329924: 2:1329924: >2:4.. 5:1329924.. 1329925: 2:1329925: > > An new to this, how does it says extents are encoded here? > > Thanks > Shally > >> -- >> Have a nice day, >> Timofey. Yes, i was reproduce your problem, new file created by fio not compressed. So i write you about workaround, because i don't want to deep dig in that problem. Example of encoded: ~ dd if=/dev/zero of=./test.0.0 bs=1M count=1M count=1; sync ~ filefrag -v test.0.0 Filesystem type is: 9123683e File size of test.0.0 is 1048576 (256 blocks of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0.. 31: 72779979.. 72780010: 32: encoded 1: 32.. 63: 72751509.. 72751540: 32: 72780011: encoded 2: 64.. 95: 72755246.. 72755277: 32: 72751541: encoded 3: 96.. 127: 72755610.. 72755641: 32: 72755278: encoded 4: 128.. 159: 72755631.. 72755662: 32: 72755642: encoded 5: 160.. 191: 72755722.. 72755753: 32: 72755663: encoded 6: 192.. 223: 72757862.. 72757893: 32: 72755754: encoded 7: 224.. 255: 72769455.. 72769486: 32: 72757894: last,encoded,eof test.0.0: 8 extents found -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
On Wed, Sep 20, 2017 at 3:06 PM, Timofey Titovetswrote: > 2017-09-20 11:59 GMT+03:00 shally verma : >> On Wed, Sep 20, 2017 at 2:25 PM, Timofey Titovets >> wrote: >>> 2017-09-20 11:44 GMT+03:00 shally verma : One more catch... I am initiating fio from non-btrfs filesystem i.e. pwd is ext4 based fs where as mount point is btrfs. Could that make difference? > Thanks > Shally >>> >>> Only matter are where you store test file =) >>> If you store test file on btrfs, pwd does nothing. >>> >> >> Then steps listed in previous mail should work right? Am listing them >> again here: >> >> " >>> >>> 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt >>> >>> 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k >>> --rw=write --iodepth=128 --name=test --size=1G >>> --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer >>> --ioengine=libaio >>> >>> 1GN file written uncompressed. Here no compression invoked (though >>> compress-force=zlib) >>> >>> 3. cp mnt/test ./ --> copy back fio generated test file from btrfs >>> mount point to local drive >>> >>> 4. hex dump test file (all FFs) -- confirmed that data is compressible >>> no random data. >>> >>> 5. cp test mnt/ --> now, copy same test again back to mount point >>> (reverse of step 3) . Now, here I see during copying compression is >>> invoked. >>> >>> I am using kernel 4.9 and compress-foce is said to be working for >>> kernel > 2.13 from wiki ... so I wonder what's so special with cp >>> command which is not happening during fio writes??? >> >> "- >> >> Thanks >> Shally >> >> >>> -- >>> Have a nice day, >>> Timofey. > > I did try reproduce your problem on 4.14, i'm hard to explain that > happens and why file created by FIO will not use compression. > > Suggest workaround: > rm -v /mnt/test.0.0 > dd if=/dev/zero of=/mnt/test.0.0 bs=1M count=1024 You mean "compression" using "dd" write. 
> fio --directory=$HOME/test/ --numjobs=1 --direct=1 --buffered=0
> --bs=64k --rw=write --iodepth=128 --name=test --size=1G
> --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer
> --ioengine=libaio

You mean "no compression" during fio writes and you see similar behavior as I do?
BTW, here --direct=0 and --buffered=1.

> You can check if data compressed by filefrag -v /mnt/test.0.0 (you
> will see encoded extents)

If I do filefrag on mnt/test.0.0, then it shows up like this:

 ext: logical_offset: physical_offset: length: expected: flags:
   0:        0..       1:    1329922..   1329923:      2:
   1:        2..       3:    1329923..   1329924:      2:   1329924:
   2:        4..       5:    1329924..   1329925:      2:   1329925:

I'm new to this; how does this show whether the extents are encoded?

Thanks
Shally

> --
> Have a nice day,
> Timofey.
Re: using fio to test btrfs compression
2017-09-20 11:59 GMT+03:00 shally verma: > On Wed, Sep 20, 2017 at 2:25 PM, Timofey Titovets > wrote: >> 2017-09-20 11:44 GMT+03:00 shally verma : >>> One more catch... I am initiating fio from non-btrfs filesystem i.e. >>> pwd is ext4 based fs where as mount point is btrfs. >>> Could that make difference? >>> Thanks Shally >> >> Only matter are where you store test file =) >> If you store test file on btrfs, pwd does nothing. >> > > Then steps listed in previous mail should work right? Am listing them > again here: > > " >> >> 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt >> >> 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k >> --rw=write --iodepth=128 --name=test --size=1G >> --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer >> --ioengine=libaio >> >> 1GN file written uncompressed. Here no compression invoked (though >> compress-force=zlib) >> >> 3. cp mnt/test ./ --> copy back fio generated test file from btrfs >> mount point to local drive >> >> 4. hex dump test file (all FFs) -- confirmed that data is compressible >> no random data. >> >> 5. cp test mnt/ --> now, copy same test again back to mount point >> (reverse of step 3) . Now, here I see during copying compression is >> invoked. >> >> I am using kernel 4.9 and compress-foce is said to be working for >> kernel > 2.13 from wiki ... so I wonder what's so special with cp >> command which is not happening during fio writes??? > > "- > > Thanks > Shally > > >> -- >> Have a nice day, >> Timofey. I did try reproduce your problem on 4.14, i'm hard to explain that happens and why file created by FIO will not use compression. 
Suggested workaround:

rm -v /mnt/test.0.0
dd if=/dev/zero of=/mnt/test.0.0 bs=1M count=1024
fio --directory=$HOME/test/ --numjobs=1 --direct=1 --buffered=0 \
    --bs=64k --rw=write --iodepth=128 --name=test --size=1G \
    --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer \
    --ioengine=libaio

You can check whether the data is compressed with filefrag -v /mnt/test.0.0
(you will see encoded extents).

--
Have a nice day,
Timofey.
Re: using fio to test btrfs compression
On Wed, Sep 20, 2017 at 2:25 PM, Timofey Titovetswrote: > 2017-09-20 11:44 GMT+03:00 shally verma : >> One more catch... I am initiating fio from non-btrfs filesystem i.e. >> pwd is ext4 based fs where as mount point is btrfs. >> Could that make difference? >> >>> Thanks >>> Shally > > Only matter are where you store test file =) > If you store test file on btrfs, pwd does nothing. > Then steps listed in previous mail should work right? Am listing them again here: " > > 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt > > 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k > --rw=write --iodepth=128 --name=test --size=1G > --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer > --ioengine=libaio > > 1GN file written uncompressed. Here no compression invoked (though > compress-force=zlib) > > 3. cp mnt/test ./ --> copy back fio generated test file from btrfs > mount point to local drive > > 4. hex dump test file (all FFs) -- confirmed that data is compressible > no random data. > > 5. cp test mnt/ --> now, copy same test again back to mount point > (reverse of step 3) . Now, here I see during copying compression is > invoked. > > I am using kernel 4.9 and compress-foce is said to be working for > kernel > 2.13 from wiki ... so I wonder what's so special with cp > command which is not happening during fio writes??? "- Thanks Shally > -- > Have a nice day, > Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
2017-09-20 11:44 GMT+03:00 shally verma:
> One more catch... I am initiating fio from a non-btrfs filesystem, i.e.
> the pwd is an ext4-based fs whereas the mount point is btrfs.
> Could that make a difference?
>
>> Thanks
>> Shally

Only where you store the test file matters =)
If the test file is on btrfs, the pwd makes no difference.

--
Have a nice day,
Timofey.
Re: using fio to test btrfs compression
On Wed, Sep 20, 2017 at 2:06 PM, shally verma <shallyvermacav...@gmail.com> wrote: > On Mon, Sep 18, 2017 at 7:11 PM, Timofey Titovets <nefelim...@gmail.com> > wrote: >> 2017-09-18 16:28 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: >>> On Mon, Sep 18, 2017 at 1:56 PM, Timofey Titovets <nefelim...@gmail.com> >>> wrote: >>>> 2017-09-18 10:36 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: >>>>> Hi >>>>> >>>>> I wanted to test btrfs compression using fio command but somehow >>>>> during fio writes, I don't see code taking route of compression blocks >>>>> where as If I do a copy to btrfs compression enabled mount point then >>>>> I can easily see code falling through compression.c. >>>>> >>>>> Here's how I do my setup >>>>> >>>>> 1. mkfs.btrfs /dev/sdb1 >>>>> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt >>>>> 3. cp /mnt >>>>> 4. dmesg shows print staments from compression.c and zlib.c confirming >>>>> compression routine was invoked during write >>>>> 5. now, copy back from btrfs mount point to home directory also shows >>>>> decompress call invokation >>>>> >>>>> Now, try same with fio commands: >>>>> >>>>> fio command >>>>> >>>>> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1 >>>>> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128 >>>>> --name=test --size=10G --runtime=180 --time_based >>>>> >>>>> But it seems to write uncompressed data. >>>>> >>>>> Any help here? what's missing? >>>>> >>>>> Thanks >>>>> Shally >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>> the body of a message to majord...@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib >>>> 2. Tune fio to generate compressible data >>>> >>> How do I "tune" fio to generate data. 
I had assumed once compression >>> is enabled on btrfs any system fwrite call will simply compress data >>> into it .Isn't it so? >>> Can you share fio command that I can test? >>> Thanks >>> Shally >>>> >>>> -- >>>> Have a nice day, >>>> Timofey. >> >> That useless to compress uncompressible data. >> Also, as you enable compress, not compress-force >> So after first uncompressible write btrfs just stop trying compress that >> file. >> >> From man fio: >> buffer_compress_percentage=int >> If this is set, then fio will attempt to provide I/O >> buffer content (on WRITEs) that compresses to the specified level. Fio >> does this by providing a mix of random data and a fixed pattern. The >> fixed pattern is either >> zeros, or the pattern specified by buffer_pattern. If >> the pattern option is used, it might skew the compression ratio >> slightly. Note that this is per block size unit, for file/disk wide >> compression level that matches this >> setting, you'll also want to set refill_buffers. >> >> buffer_compress_chunk=int >> See buffer_compress_percentage. This setting allows fio >> to manage how big the ranges of random data and zeroed data is. >> Without this set, fio will provide buffer_compress_percentage of >> blocksize random data, followed by >> the remaining zeroed. With this set to some chunk size >> smaller than the block size, fio can alternate random and zeroed data >> throughout the I/O buffer. >> >> Good luck :) > > Now. I did following: > > 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt > > 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k > --rw=write --iodepth=128 --name=test --size=1G > --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer > --ioengine=libaio > > 1GN file written uncompressed. Here no compression invoked (though > compress-force=zlib) > > 3. cp mnt/test ./ --> copy back fio generated test file from btrfs > mount point to local drive > > 4. 
hex dump test file (all FFs) -- confirmed that data is compressible > no random data. > > 5. cp test mnt/ --> now, copy same test again back to mount point > (reverse of step 3) . Now, here I see during copying compression is > invoked. > > I am using kernel 4.9 and compress-foce is said to be working for > kernel > 2.13 from wiki ... so I wonder what's so special with cp > command which is not happening during fio writes??? > One more catch... I am initiating fio from non-btrfs filesystem i.e. pwd is ext4 based fs where as mount point is btrfs. Could that make difference? > Thanks > Shally > > >> -- >> Have a nice day, >> Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
On Mon, Sep 18, 2017 at 7:11 PM, Timofey Titovets <nefelim...@gmail.com> wrote: > 2017-09-18 16:28 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: >> On Mon, Sep 18, 2017 at 1:56 PM, Timofey Titovets <nefelim...@gmail.com> >> wrote: >>> 2017-09-18 10:36 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: >>>> Hi >>>> >>>> I wanted to test btrfs compression using fio command but somehow >>>> during fio writes, I don't see code taking route of compression blocks >>>> where as If I do a copy to btrfs compression enabled mount point then >>>> I can easily see code falling through compression.c. >>>> >>>> Here's how I do my setup >>>> >>>> 1. mkfs.btrfs /dev/sdb1 >>>> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt >>>> 3. cp /mnt >>>> 4. dmesg shows print staments from compression.c and zlib.c confirming >>>> compression routine was invoked during write >>>> 5. now, copy back from btrfs mount point to home directory also shows >>>> decompress call invokation >>>> >>>> Now, try same with fio commands: >>>> >>>> fio command >>>> >>>> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1 >>>> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128 >>>> --name=test --size=10G --runtime=180 --time_based >>>> >>>> But it seems to write uncompressed data. >>>> >>>> Any help here? what's missing? >>>> >>>> Thanks >>>> Shally >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>> the body of a message to majord...@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib >>> 2. Tune fio to generate compressible data >>> >> How do I "tune" fio to generate data. I had assumed once compression >> is enabled on btrfs any system fwrite call will simply compress data >> into it .Isn't it so? >> Can you share fio command that I can test? 
>> Thanks >> Shally >>> >>> -- >>> Have a nice day, >>> Timofey. > > That useless to compress uncompressible data. > Also, as you enable compress, not compress-force > So after first uncompressible write btrfs just stop trying compress that file. > > From man fio: > buffer_compress_percentage=int > If this is set, then fio will attempt to provide I/O > buffer content (on WRITEs) that compresses to the specified level. Fio > does this by providing a mix of random data and a fixed pattern. The > fixed pattern is either > zeros, or the pattern specified by buffer_pattern. If > the pattern option is used, it might skew the compression ratio > slightly. Note that this is per block size unit, for file/disk wide > compression level that matches this > setting, you'll also want to set refill_buffers. > > buffer_compress_chunk=int > See buffer_compress_percentage. This setting allows fio > to manage how big the ranges of random data and zeroed data is. > Without this set, fio will provide buffer_compress_percentage of > blocksize random data, followed by > the remaining zeroed. With this set to some chunk size > smaller than the block size, fio can alternate random and zeroed data > throughout the I/O buffer. > > Good luck :) Now. I did following: 1. mount -t btrfs -o compress-force=zlib /dev/sdb1 mnt 2. fio --directory=mnt/ --numjobs=1 --direct=0 --buffered=1 --bs=64k --rw=write --iodepth=128 --name=test --size=1G --buffer_compress_percentage=100 --buffer_pattern=0xFF --refill_buffer --ioengine=libaio 1GN file written uncompressed. Here no compression invoked (though compress-force=zlib) 3. cp mnt/test ./ --> copy back fio generated test file from btrfs mount point to local drive 4. hex dump test file (all FFs) -- confirmed that data is compressible no random data. 5. cp test mnt/ --> now, copy same test again back to mount point (reverse of step 3) . Now, here I see during copying compression is invoked. 
I am using kernel 4.9 and compress-foce is said to be working for kernel > 2.13 from wiki ... so I wonder what's so special with cp command which is not happening during fio writes??? Thanks Shally > -- > Have a nice day, > Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
2017-09-18 16:28 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: > On Mon, Sep 18, 2017 at 1:56 PM, Timofey Titovets <nefelim...@gmail.com> > wrote: >> 2017-09-18 10:36 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: >>> Hi >>> >>> I wanted to test btrfs compression using fio command but somehow >>> during fio writes, I don't see code taking route of compression blocks >>> where as If I do a copy to btrfs compression enabled mount point then >>> I can easily see code falling through compression.c. >>> >>> Here's how I do my setup >>> >>> 1. mkfs.btrfs /dev/sdb1 >>> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt >>> 3. cp /mnt >>> 4. dmesg shows print staments from compression.c and zlib.c confirming >>> compression routine was invoked during write >>> 5. now, copy back from btrfs mount point to home directory also shows >>> decompress call invokation >>> >>> Now, try same with fio commands: >>> >>> fio command >>> >>> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1 >>> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128 >>> --name=test --size=10G --runtime=180 --time_based >>> >>> But it seems to write uncompressed data. >>> >>> Any help here? what's missing? >>> >>> Thanks >>> Shally >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib >> 2. Tune fio to generate compressible data >> > How do I "tune" fio to generate data. I had assumed once compression > is enabled on btrfs any system fwrite call will simply compress data > into it .Isn't it so? > Can you share fio command that I can test? > Thanks > Shally >> >> -- >> Have a nice day, >> Timofey. That useless to compress uncompressible data. 
Also, you enabled compress, not compress-force, so after the first
incompressible write btrfs just stops trying to compress that file.

From man fio:

buffer_compress_percentage=int
    If this is set, then fio will attempt to provide I/O buffer content
    (on WRITEs) that compresses to the specified level. Fio does this by
    providing a mix of random data and a fixed pattern. The fixed pattern
    is either zeros, or the pattern specified by buffer_pattern. If the
    pattern option is used, it might skew the compression ratio slightly.
    Note that this is per block size unit; for a file/disk-wide
    compression level that matches this setting, you'll also want to set
    refill_buffers.

buffer_compress_chunk=int
    See buffer_compress_percentage. This setting allows fio to manage how
    big the ranges of random data and zeroed data are. Without this set,
    fio will provide buffer_compress_percentage of blocksize random data,
    followed by the remaining zeroed. With this set to some chunk size
    smaller than the block size, fio can alternate random and zeroed data
    throughout the I/O buffer.

Good luck :)

--
Have a nice day,
Timofey.
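The fio options discussed above can also be kept in a job file, which makes it easier to flip one knob at a time between runs. This is only a sketch mirroring the command line used later in this thread; the /tmp path, the /mnt mount point, and the job name are examples, not anything from the original messages.

```shell
# Write a fio job file equivalent to the thread's command line: 100%
# compressible buffers with a fixed 0xFF pattern, refilled on every
# write so the whole file stays compressible.
cat > /tmp/compress-test.fio <<'EOF'
[global]
directory=/mnt
ioengine=libaio
rw=write
bs=64k
iodepth=128
size=1G
buffer_compress_percentage=100
buffer_pattern=0xFF
refill_buffers

[test]
numjobs=1
EOF

echo "wrote /tmp/compress-test.fio"
```

Run it with `fio /tmp/compress-test.fio`, then inspect the result with `filefrag -v /mnt/test.0.0` as suggested elsewhere in the thread.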
Re: using fio to test btrfs compression
On Mon, Sep 18, 2017 at 1:56 PM, Timofey Titovets <nefelim...@gmail.com> wrote: > 2017-09-18 10:36 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: >> Hi >> >> I wanted to test btrfs compression using fio command but somehow >> during fio writes, I don't see code taking route of compression blocks >> where as If I do a copy to btrfs compression enabled mount point then >> I can easily see code falling through compression.c. >> >> Here's how I do my setup >> >> 1. mkfs.btrfs /dev/sdb1 >> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt >> 3. cp /mnt >> 4. dmesg shows print staments from compression.c and zlib.c confirming >> compression routine was invoked during write >> 5. now, copy back from btrfs mount point to home directory also shows >> decompress call invokation >> >> Now, try same with fio commands: >> >> fio command >> >> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1 >> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128 >> --name=test --size=10G --runtime=180 --time_based >> >> But it seems to write uncompressed data. >> >> Any help here? what's missing? >> >> Thanks >> Shally >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib > 2. Tune fio to generate compressible data > How do I "tune" fio to generate data. I had assumed once compression is enabled on btrfs any system fwrite call will simply compress data into it .Isn't it so? Can you share fio command that I can test? Thanks Shally > > -- > Have a nice day, > Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using fio to test btrfs compression
2017-09-18 10:36 GMT+03:00 shally verma <shallyvermacav...@gmail.com>: > Hi > > I wanted to test btrfs compression using fio command but somehow > during fio writes, I don't see code taking route of compression blocks > where as If I do a copy to btrfs compression enabled mount point then > I can easily see code falling through compression.c. > > Here's how I do my setup > > 1. mkfs.btrfs /dev/sdb1 > 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt > 3. cp /mnt > 4. dmesg shows print staments from compression.c and zlib.c confirming > compression routine was invoked during write > 5. now, copy back from btrfs mount point to home directory also shows > decompress call invokation > > Now, try same with fio commands: > > fio command > > fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1 > --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128 > --name=test --size=10G --runtime=180 --time_based > > But it seems to write uncompressed data. > > Any help here? what's missing? > > Thanks > Shally > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib 2. Tune fio to generate compressible data -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
using fio to test btrfs compression
Hi

I wanted to test btrfs compression using the fio command, but during fio
writes I don't see the code taking the route of the compression blocks,
whereas if I do a copy to a compression-enabled btrfs mount point I can
easily see the code falling through compression.c.

Here's how I do my setup:

1. mkfs.btrfs /dev/sdb1
2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt
3. cp /mnt
4. dmesg shows print statements from compression.c and zlib.c confirming
   the compression routine was invoked during the write
5. now, copying back from the btrfs mount point to the home directory also
   shows the decompress call invocation

Now, try the same with fio:

fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1
--ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128
--name=test --size=10G --runtime=180 --time_based

But it seems to write uncompressed data.

Any help here? What's missing?

Thanks
Shally
Re: Btrfs + compression = slow performance and high cpu usage
Hello again, list. I thought I would clear things up and describe what is
happening with my troubled RAID setup.

Having received the help from the list, I initially ran a full
defragmentation of all the data and recompressed everything with zlib.
That didn't help. Then I ran a full rebalance of the data and that didn't
help either. So I had to take a disk out of the RAID, copy all the data
onto it, recreate the RAID drive with a 32 KB chunk size and a 96 KB
stripe, and copy the data back. Then I added the disk back and resynced
the RAID.

So currently the RAID device is:

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                 :
RAID Level           : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                 : 21.830 TB
Sector Size          : 512
Is VD emulated       : Yes
Parity Size          : 7.276 TB
State                : Optimal
Strip Size           : 32 KB
Number Of Drives     : 4
Span Depth           : 1
Default Cache Policy : WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy : WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy    : Disk's Default
Encryption Type      : None
Bad Blocks Exist     : No
Is VD Cached         : No

It is about 40% full with compressed data:

# btrfs fi usage /mnt/arh-backup1/
Overall:
    Device size:         21.83TiB
    Device allocated:     8.98TiB
    Device unallocated:  12.85TiB
    Device missing:         0.00B
    Used:                 8.98TiB
    Free (estimated):    12.85TiB  (min: 6.43TiB)
    Data ratio:              1.00
    Metadata ratio:          2.00
    Global reserve:     512.00MiB  (used: 0.00B)

I decided to run a set of tests where a 5 GB file was created using
different block sizes and different flags: one file was generated with
urandom data and another filled with zeroes. The data was written with
compression and without compression, and it seems that without
compression it is possible to gain 30-40% speed, while the CPU was 50%
idle during the highest loads.
dd write speeds (MB/s):

flags: conv=fsync
          compress-force=zlib    compress-force=none
          RAND     ZERO          RAND     ZERO
bs=1024k  387      407           584      577
bs=512k   389      414           532      547
bs=256k   412      409           558      585
bs=128k   412      403           572      583
bs=64k    409      419           563      574
bs=32k    407      404           569      572

flags: oflag=sync
          compress-force=zlib    compress-force=none
          RAND     ZERO          RAND     ZERO
bs=1024k  86.1     97.0          203      210
bs=512k   50.6     64.4          85.0     170
bs=256k   25.0     29.8          67.6     67.5
bs=128k   13.2     16.4          48.4     49.8
bs=64k    7.4      8.3           24.5     27.9
bs=32k    3.8      4.1           14.0     13.7

flags: no flags
          compress-force=zlib    compress-force=none
          RAND     ZERO          RAND     ZERO
bs=1024k  480      419           681      595
bs=512k   422      412           633      585
bs=256k   413      384           707      712
bs=128k   414      387           695      704
bs=64k    482      467           622      587
bs=32k    416      412           610      598

I have also run a test where I filled the array to about 97% capacity and
the write speed went down by about 50% compared with the empty RAID.

Thanks for the help.

----- Original Message -----
From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk>
To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Tuesday, 1 August, 2017 10:09:03 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe
>> size". [ ... ] several back-to-back 128KiB writes [ ... ] get
>> merged by the 3ware firmware only if it has a persistent
>> cache, and maybe your 3ware does not have one,

> KOS: No I don't have persistent cache. Only the 512 Mb cache
> on board of the controller, that is BBU.

If it is a persistent cache, which can be battery-backed (as I wrote, but
it seems that you don't have too much time to read replies), then the size
of the write, 128KiB or not, should not matter much; the write will be
reported complete when it hits the persistent cache (whichever technology
it uses), and then the HA firmware will spill write-cached data to the
disks using the optimal operation width.
Unless the 3ware firmware is really terrible (and depending on model and
vintage it can be amazingly terrible), or the battery is no longer
recharging, in which case the host adapter switches to write-through.

That you see very different rates between uncompressed and compressed
writes, where the main difference is the limitation on the segment size,
seems to indicate that compressed writes involve a lot of RMW, that is,
sub-stripe updates.

As I mentioned already, it would be interesting to retry 'dd' with
different 'bs' values without compression and with 'sync' (or 'direct',
which only makes sense without compression).

> If I had additional SSD caching on the controller I would have
> mentioned it.
Re: Btrfs + compression = slow performance and high cpu usage
[ ... ] > This is the "storage for beginners" version, what happens in > practice however depends a lot on specific workload profile > (typical read/write size and latencies and rates), caching and > queueing algorithms in both Linux and the HA firmware. To add a bit of slightly more advanced discussion, the main reason for larger strips ("chunk size) is to avoid the huge latencies of disk rotation using unsynchronized disk drives, as detailed here: http://www.sabi.co.uk/blog/12-thr.html?120310#120310 That relates weakly to Btrfs. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs + compression = slow performance and high cpu usage
>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe >> size". [ ... ] several back-to-back 128KiB writes [ ... ] get >> merged by the 3ware firmware only if it has a persistent >> cache, and maybe your 3ware does not have one, > KOS: No I don't have persistent cache. Only the 512 Mb cache > on board of a controller, that is BBU. If it is a persistent cache, that can be battery-backed (as I wrote, but it seems that you don't have too much time to read replies) then the size of the write, 128KiB or not, should not matter much; the write will be reported complete when it hits the persistent cache (whichever technology it used), and then the HA fimware will spill write cached data to the disks using the optimal operation width. Unless the 3ware firmware is really terrible (and depending on model and vintage it can be amazingly terrible) or the battery is no longer recharging and then the host adapter switches to write-through. That you see very different rates between uncompressed and compressed writes, where the main difference is the limitation on the segment size, seems to indicate that compressed writes involve a lot of RMW, that is sub-stripe updates. As I mentioned already, it would be interesting to retry 'dd' with different 'bs' values without compression and with 'sync' (or 'direct' which only makes sense without compression). > If I had additional SSD caching on the controller I would have > mentioned it. So far you had not mentioned the presence of BBU cache either, which is equivalent, even if in one of your previous message (which I try to read carefully) there were these lines: Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU So perhaps someone else would have checked long ago the status of the BBU and whether the "No Write Cache if Bad BBU" case has happened. 
If the BBU is still working and the policy is still "WriteBack" then things are stranger still. > I was also under impression, that in a situation where mostly > extra large files will be stored on the massive, the bigger > strip size would indeed increase the speed, thus I went with > with the 256 Kb strip size. That runs counter to this simple story: suppose a program is doing 64KiB IO: * For *reads*, there are 4 data drives and the strip size is 16KiB: the 64KiB will be read in parallel on 4 drives. If the strip size is 256KiB then the 64KiB will be read sequentially from just one disk, and 4 successive reads will be read sequentially from the same drive. * For *writes* on a parity RAID like RAID5 things are much, much more extreme: the 64KiB will be written with 16KiB strips on a 5-wide RAID5 set in parallel to 5 drives, with 4 stripes being updated with RMW. But with 256KiB strips it will partially update 5 drives, because the stripe is 1024+256KiB, and it needs to do RMW, and four successive 64KiB drives will need to do that too, even if only one drive is updated. Usually for RAID5 there is an optimization that means that only the specific target drive and the parity drives(s) need RMW, but it is still very expensive. This is the "storage for beginners" version, what happens in practice however depends a lot on specific workload profile (typical read/write size and latencies and rates), caching and queueing algorithms in both Linux and the HA firmware. > Would I be correct in assuming that the RAID strip size of 128 > Kb will be a better choice if one plans to use the BTRFS with > compression? That would need to be tested, because of "depends a lot on specific workload profile, caching and queueing algorithms", but my expectation is the the lower the better. Given that you have 4 drives giving a 3+1 RAID set, perhaps a 32KiB or 64KiB strip size, given a data stripe size of 96KiB or 192KiB, would be better. 
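Peter's write arithmetic above can be sketched numerically. The snippet below assumes the 4-drive setup from this thread (3 data drives + 1 parity, so the data stripe is 3x the strip size) and the 128KiB writes that btrfs compression produces; it reports how many full stripes a write covers and how many KiB are left over needing read-modify-write. The helper name is ours, not anything from the thread.

```shell
# stripe_math STRIP_KIB: for a 128KiB write on a 3-data-drive RAID5,
# print "<full stripes written> <leftover KiB needing RMW>".
stripe_math() {
    local strip_kib=$1
    local write_kib=128                      # btrfs compressed extent write
    local stripe_kib=$(( 3 * strip_kib ))    # data per full stripe (3+1 RAID5)
    echo "$(( write_kib / stripe_kib )) $(( write_kib % stripe_kib ))"
}

for s in 32 64 128 256; do
    echo "strip=${s}KiB -> $(stripe_math "$s")"
done
```

With a 32KiB strip (96KiB data stripe), each 128KiB write covers one full stripe and only 32KiB needs RMW; with the original 256KiB strip (768KiB data stripe), every 128KiB write is entirely sub-stripe and forces RMW, matching "the lower the better".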
Re: Btrfs + compression = slow performance and high cpu usage
- Original Message - From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk> To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org> Sent: Tuesday, 1 August, 2017 3:14:07 PM Subject: Re: Btrfs + compression = slow performance and high cpu usage > Peter, I don't think the filefrag is showing the correct > fragmentation status of the file when the compression is used. As I wrote, "their size is just limited by the compression code" which results in "128KiB writes". On a "fresh empty Btrfs volume" the compressed extents limited to 128KiB also happen to be pretty physically contiguous, but on a more fragmented free space list they can be more scattered. KOS: Ok, thanks for pointing it out. I have compared the filefrag -v on another btrfs that is not fragmented and see the difference with what is happening on the sluggish one. 5824: 186368.. 186399: 2430093383..2430093414: 32: 2430093414: encoded 5825: 186400.. 186431: 2430093384..2430093415: 32: 2430093415: encoded 5826: 186432.. 186463: 2430093385..2430093416: 32: 2430093416: encoded 5827: 186464.. 186495: 2430093386..2430093417: 32: 2430093417: encoded 5828: 186496.. 186527: 2430093387..2430093418: 32: 2430093418: encoded 5829: 186528.. 186559: 2430093388..2430093419: 32: 2430093419: encoded 5830: 186560.. 186591: 2430093389..2430093420: 32: 2430093420: encoded As I already wrote the main issue here seems to be that we are talking about a "RAID5 with 128KiB writes and a 768KiB stripe size". On MD RAID5 the slowdown because of RMW seems only to be around 30-40%, but it looks like that several back-to-back 128KiB writes get merged by the Linux IO subsystem (not sure whether that's thoroughly legal), and perhaps they get merged by the 3ware firmware only if it has a persistent cache, and maybe your 3ware does not have one, but you have kept your counsel as to that. KOS: No I don't have persistent cache. Only the 512 Mb cache on board of a controller, that is BBU. 
If I had additional SSD caching on the controller I would have mentioned it. I was also under the impression that in a situation where mostly extra-large files will be stored on the array, a bigger strip size would indeed increase the speed, so I went with the 256 KB strip size. Would I be correct in assuming that a RAID strip size of 128 KB would be a better choice if one plans to use Btrfs with compression? thanks, kos
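The strip-size question above can be answered with a little arithmetic. Here is a minimal sketch of the read-modify-write (RMW) reasoning from this thread, assuming a 4-drive RAID5 with one parity strip per stripe; the helper names are illustrative, not from any tool discussed here:

```python
def full_stripe_kib(strip_kib: int, n_drives: int) -> int:
    """Data payload of one full stripe: one strip per drive, minus parity."""
    return strip_kib * (n_drives - 1)

def needs_rmw(write_kib: int, strip_kib: int, n_drives: int) -> bool:
    """A write that is not a whole multiple of the full stripe forces the
    controller to read old data and parity, recompute, and write back."""
    return write_kib % full_stripe_kib(strip_kib, n_drives) != 0

# The array in this thread: 4x8TB drives, 256 KB strip -> 768 KB full stripe.
assert full_stripe_kib(256, 4) == 768
# Btrfs compressed extents are capped at 128 KiB, so every such write
# is a partial-stripe update:
assert needs_rmw(128, 256, 4)
# With a 128 KB strip the full stripe would still be 384 KB -- larger
# than a 128 KiB compressed write, so the RMW penalty would remain:
assert needs_rmw(128, 128, 4)
```

So shrinking the strip size alone would not make compressed Btrfs writes stripe-aligned on this array; only a full-stripe-sized write avoids RMW.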
Re: Btrfs + compression = slow performance and high cpu usage
> Peter, I don't think the filefrag is showing the correct
> fragmentation status of the file when the compression is used.

As reported in a previous message, the output of 'filefrag -v' can be used to see what is going on:

filefrag /mnt/sde3/testfile
/mnt/sde3/testfile: 49287 extents found

Most of the latter extents are mercifully rather contiguous, their size is just limited by the compression code, here is an extract from 'filefrag -v' from around the middle:

24757: 1321888.. 1321919: 11339579.. 11339610: 32: 11339594:
24758: 1321920.. 1321951: 11339597.. 11339628: 32: 11339611:
24759: 1321952.. 1321983: 11339615.. 11339646: 32: 11339629:
24760: 1321984.. 1322015: 11339632.. 11339663: 32: 11339647:
24761: 1322016.. 1322047: 11339649.. 11339680: 32: 11339664:
24762: 1322048.. 1322079: 11339667.. 11339698: 32: 11339681:
24763: 1322080.. 1322111: 11339686.. 11339717: 32: 11339699:
24764: 1322112.. 1322143: 11339703.. 11339734: 32: 11339718:
24765: 1322144.. 1322175: 11339720.. 11339751: 32: 11339735:
24766: 1322176.. 1322207: 11339737.. 11339768: 32: 11339752:
24767: 1322208.. 1322239: 11339754.. 11339785: 32: 11339769:
24768: 1322240.. 1322271: 11339771.. 11339802: 32: 11339786:
24769: 1322272.. 1322303: 11339789.. 11339820: 32: 11339803:

But again this is on a fresh empty Btrfs volume. As I wrote, "their size is just limited by the compression code" which results in "128KiB writes". On a "fresh empty Btrfs volume" the compressed extents limited to 128KiB also happen to be pretty physically contiguous, but on a more fragmented free space list they can be more scattered. As I already wrote, the main issue here seems to be that we are talking about a "RAID5 with 128KiB writes and a 768KiB stripe size".
On MD RAID5 the slowdown because of RMW seems only to be around 30-40%, but it looks like several back-to-back 128KiB writes get merged by the Linux IO subsystem (not sure whether that's thoroughly legal), and perhaps they get merged by the 3ware firmware only if it has a persistent cache, and maybe your 3ware does not have one, but you have kept your counsel as to that.

My impression is that you read the Btrfs documentation and my replies with a lot less attention than I write them. Some of the things you have done and said make me think that you did not read https://btrfs.wiki.kernel.org/index.php/Compression and 'man 5 btrfs', for example:

"How does compression interact with direct IO or COW? Compression does not work with DIO, does work with COW and does not work for NOCOW files. If a file is opened in DIO mode, it will fall back to buffered IO.

Are there speed penalties when doing random access to a compressed file? Yes. The compression processes ranges of a file of maximum size 128 KiB and compresses each 4 KiB (or page-sized) block separately."

> I am currently defragmenting that mountpoint, ensuring that
> everything is compressed with zlib.

Defragmenting the used space might help find more contiguous allocations.

> p.s. any other suggestion that might help with the fragmentation
> and data allocation. Should I try and rebalance the data on the
> drive?

Yes, regularly, as that defragments the unused space.
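The 'filefrag -v' rows quoted earlier in the thread can be checked mechanically. A minimal sketch, assuming 4 KiB filesystem blocks (filefrag reports lengths in blocks) and that the column layout matches the extract shown; the helper name is illustrative:

```python
def extent_kib(row: str, block_kib: int = 4) -> int:
    """Length in KiB of one 'filefrag -v' extent row.

    Row format assumed from the thread's extract:
    "ext_no: logical_start.. logical_end: physical_start.. physical_end: length: expected:"
    """
    fields = [f.strip() for f in row.split(":")]
    length_blocks = int(fields[3])  # fourth colon-separated field is the length
    return length_blocks * block_kib

row = "24757: 1321888.. 1321919: 11339579.. 11339610: 32: 11339594:"
assert extent_kib(row) == 128  # 32 blocks x 4 KiB = 128 KiB per compressed extent
```

Every row in the quoted extract has a length of 32 blocks, which is exactly the 128 KiB compression-range cap the wiki passage describes.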
RE: Btrfs + compression = slow performance and high cpu usage
> -Original Message- > From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- > ow...@vger.kernel.org] On Behalf Of Konstantin V. Gavrilenko > Sent: Tuesday, 1 August 2017 7:58 PM > To: Peter Grandi <p...@btrfs.list.sabi.co.uk> > Cc: Linux fs Btrfs <linux-btrfs@vger.kernel.org> > Subject: Re: Btrfs + compression = slow performance and high cpu usage
>
> Peter, I don't think the filefrag is showing the correct fragmentation status of the file when the compression is used.
> At least the one that is installed by default in Ubuntu 16.04 - e2fsprogs | 1.42.13-1ubuntu1
>
> So for example, fragmentation of a compressed file is 320 times more than an uncompressed one.
>
> root@homenas:/mnt/storage/NEW# filefrag test5g-zeroes
> test5g-zeroes: 40903 extents found
>
> root@homenas:/mnt/storage/NEW# filefrag test5g-data
> test5g-data: 129 extents found

Compressed extents are about 128 KiB, uncompressed extents are about 128 MiB (I can't remember the exact numbers). I've had trouble with slow filesystems when using compression. The problem seems to go away when removing compression.

Paul.
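Paul's half-remembered figures line up with the extent counts reported above. A back-of-envelope check, assuming the test files are 5 GiB (as the 'test5g' names suggest) and the commonly cited Btrfs caps of 128 KiB per compressed extent and 128 MiB per uncompressed extent; these assumptions are mine, not stated in the messages:

```python
GIB = 1024 ** 3

def min_extents(file_bytes: int, max_extent_bytes: int) -> int:
    """Fewest extents a file of this size can occupy given a per-extent cap."""
    return -(-file_bytes // max_extent_bytes)  # ceiling division

compressed = min_extents(5 * GIB, 128 * 1024)        # 128 KiB cap
uncompressed = min_extents(5 * GIB, 128 * 1024 ** 2) # 128 MiB cap

assert compressed == 40960   # close to the 40903 extents reported for test5g-zeroes
assert uncompressed == 40    # same order as the 129 extents reported for test5g-data
assert compressed // uncompressed == 1024
```

So the "320 times more" fragmentation is not filefrag misreporting: it is roughly the ratio of the two extent-size caps, and filefrag is simply counting the many small compressed extents.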
Re: Btrfs + compression = slow performance and high cpu usage
Peter, I don't think the filefrag is showing the correct fragmentation status of the file when the compression is used. At least the one that is installed by default in Ubuntu 16.04 - e2fsprogs | 1.42.13-1ubuntu1

So for example, fragmentation of a compressed file is 320 times more than an uncompressed one.

root@homenas:/mnt/storage/NEW# filefrag test5g-zeroes
test5g-zeroes: 40903 extents found
root@homenas:/mnt/storage/NEW# filefrag test5g-data
test5g-data: 129 extents found

I am currently defragmenting that mountpoint, ensuring that everything is compressed with zlib.

# btrfs fi defragment -rv -czlib /mnt/arh-backup

my guess is that it will take another 24-36 hours to complete and then I will redo the test to see if that has helped. will keep the list posted.

p.s. any other suggestions that might help with the fragmentation and data allocation? Should I try and rebalance the data on the drive?

kos

- Original Message - From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk> To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org> Sent: Monday, 31 July, 2017 1:41:07 PM Subject: Re: Btrfs + compression = slow performance and high cpu usage

[ ... ] > grep 'model name' /proc/cpuinfo | sort -u > model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz

Good, contemporary CPU with all accelerations.

> The sda device is a hardware RAID5 consisting of 4x8TB drives. [ ... ] > Strip Size : 256 KB

So the full RMW data stripe length is 768KiB.

> [ ... ] don't see the previously reported behaviour of one of > the kworker consuming 100% of the cputime, but the write speed > difference between the compression ON vs OFF is pretty large.

That's weird; of course 'lzo' is a lot cheaper than 'zlib', but in my test the much higher CPU time of the latter was spread across many CPUs, while in your case it wasn't, even though the E5645 has 6 CPUs and can do 12 threads. That seemed to point to some high cost of finding free blocks, that is a very fragmented free list, or something else.
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress oflag=direct
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 26.0685 s, 206 MB/s

The results with 'oflag=direct' are not relevant, because Btrfs behaves "differently" with that.

> mountflags: > (rw,relatime,compress-force=zlib,space_cache=v2,subvolid=5,subvol=/) [ ... ]
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 77.4845 s, 69.3 MB/s
> mountflags: > (rw,relatime,compress-force=lzo,space_cache=v2,subvolid=5,subvol=/) [ ... ]
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 122.321 s, 43.9 MB/s

That's pretty good for a RAID5 with 128KiB writes and a 768KiB stripe size, on a 3ware, and it looks like the hw host adapter does not have a persistent cache (battery backed usually). My guess is that watching transfer rates and latencies with 'iostat -dk -zyx 1' did not happen.

> mountflags: (rw,relatime,space_cache=v2,subvolid=5,subvol=/) [ ... ]
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 10.1033 s, 531 MB/s

I had mentioned in my previous reply the output of 'filefrag'. That to me seems relevant here, because of RAID5 RMW and maximum extent size with Btrfs compression and strip/stripe size. Perhaps redoing the tests with a 128KiB 'bs' *without* compression would be interesting, perhaps even with 'oflag=sync' instead of 'conv=fsync'.
It is hard for me to see a speed issue here with Btrfs: for comparison I have done a simple test with a both a 3+1 MD RAID5 set with a 256KiB chunk size and a single block device on "contemporary" 1T/2TB drives, capable of sequential transfer rates of 150-190MB/s: soft# grep -A2 sdb3 /proc/mdstat md127 : active raid5 sde3[4] sdd3[2] sdc3[1] sdb3[0] 729808128 blocks super 1.0 level 5, 256k chunk, algorithm 2 [4/4] [] with compression: soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 /mnt/test5 soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3 soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=1M count=1 conv=fsync 1+0 records in 1+0 records out 1048576 bytes (10 GB) copied, 94.3605 s, 111 MB/s 0.01user 12.59system 1:34.36elapsed 13%CPU (0avgtext+0avgdata 2932maxresident)k 13042144inputs+20482144outputs (3major+345minor)pagefaults 0swaps soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile bs=1M count=1 conv=fsync 1+0 records in 1+0 records out 1048576 bytes (10 GB) copied, 93.5885 s, 112 MB/s 0.03user 12.35syst
Re: Btrfs + compression = slow performance and high cpu usage
> [ ... ] It is hard for me to see a speed issue here with > Btrfs: for comparison I have done a simple test with a both a > 3+1 MD RAID5 set with a 256KiB chunk size and a single block > device on "contemporary" 1T/2TB drives, capable of sequential > transfer rates of 150-190MB/s: [ ... ] The figures after this are a bit on the low side because I realized looking at 'vmstat' that the source block device 'sda6' was being a bottleneck, as the host has only 8GiB instead of the 16GiB I misremembered, and also 'sda' is a relatively slow flash SSD that reads are most at around 220MB/s. So I have redone the simple tests with a transfer size of 3GB, which ensures that all reads are from memory cache: with compression: soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 /mnt/test5 soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3 soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=1M count=3000 conv=fsync 3000+0 records in 3000+0 records out 3145728000 bytes (3.1 GB) copied, 15.8869 s, 198 MB/s 0.00user 2.80system 0:15.88elapsed 17%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+6148256outputs (0major+346minor)pagefaults 0swaps soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile bs=1M count=3000 conv=fsync 3000+0 records in 3000+0 records out 3145728000 bytes (3.1 GB) copied, 16.9663 s, 185 MB/s 0.00user 2.61system 0:16.96elapsed 15%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+6144672outputs (0major+346minor)pagefaults 0swaps soft# btrfs fi df /mnt/test5/ | grep Data Data, single: total=3.00GiB, used=2.28GiB soft# btrfs fi df /mnt/sdg3 | grep Data Data, single: total=3.00GiB, used=2.28GiB soft# filefrag /mnt/test5/testfile /mnt/sdg3/testfile /mnt/test5/testfile: 8811 extents found /mnt/sdg3/testfile: 8759 extents found Slightly weird that with a 3GB size the number of extents is almost double that for the 10GB, but I guess that 
depends on speed. Then without compression: soft# mount -t btrfs -o commit=10 /dev/md/test5 /mnt/test5 soft# mount -t btrfs -o commit=10 /dev/sdg3 /mnt/sdg3 soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=1M count=3000 conv=fsync 3000+0 records in 3000+0 records out 3145728000 bytes (3.1 GB) copied, 8.06841 s, 390 MB/s 0.00user 3.90system 0:08.80elapsed 44%CPU (0avgtext+0avgdata 2880maxresident)k 0inputs+6153856outputs (0major+345minor)pagefaults 0swaps soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile bs=1M count=3000 conv=fsync 3000+0 records in 3000+0 records out 3145728000 bytes (3.1 GB) copied, 30.215 s, 104 MB/s 0.00user 4.82system 0:30.93elapsed 15%CPU (0avgtext+0avgdata 2888maxresident)k 0inputs+6152128outputs (0major+347minor)pagefaults 0swaps soft# filefrag /mnt/test5/testfile /mnt/sdg3/testfile /mnt/test5/testfile: 5 extents found /mnt/sdg3/testfile: 3 extents found Also added: soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=128k count=3000 | dd iflag=fullblock of=/mnt/test5/testfile bs=128k oflag=sync 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 160.315 s, 2.5 MB/s 0.02user 0.46system 2:40.31elapsed 0%CPU (0avgtext+0avgdata 1992maxresident)k 0inputs+0outputs (0major+124minor)pagefaults 0swaps 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 160.365 s, 2.5 MB/s soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=128k count=3000 | dd iflag=fullblock of=/mnt/sdg3/testfile bs=128k oflag=sync 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 113.51 s, 3.5 MB/s 0.02user 0.56system 1:53.51elapsed 0%CPU (0avgtext+0avgdata 2156maxresident)k 0inputs+0outputs (0major+120minor)pagefaults 0swaps 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 113.544 s, 3.5 MB/s soft# filefrag /mnt/test5/testfile /mnt/sdg3/testfile 
/mnt/test5/testfile: 1 extent found /mnt/sdg3/testfile: 22 extents found soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=1M count=1000 | dd iflag=fullblock of=/mnt/test5/testfile bs=1M oflag=sync
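The 'oflag=sync' collapse in the runs above is latency-bound rather than bandwidth-bound: each 128 KiB write must be flushed before the next one starts, so throughput is set by per-write latency. A quick check against the numbers dd reported for the RAID5 run:

```python
# Figures from the dd run quoted above: 3000 synchronous 128 KiB writes
# completing in 160.315 s at a reported 2.5 MB/s.
ops = 3000
block = 128 * 1024   # bytes per synchronous write
elapsed = 160.315    # seconds reported by dd

latency_ms = elapsed / ops * 1000
throughput_mb = ops * block / elapsed / 1e6  # dd's MB = 10^6 bytes

assert 53 < latency_ms < 54        # ~53 ms per synchronous 128 KiB write
assert 2.4 < throughput_mb < 2.5   # consistent with dd's reported 2.5 MB/s
```

At ~53 ms per flushed write, each operation costs several disk revolutions plus the RAID5 read-modify-write round trip, which is why the same hardware that streams hundreds of MB/s with 'conv=fsync' drops to a few MB/s with 'oflag=sync'.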
Re: Btrfs + compression = slow performance and high cpu usage
[ ... ] > Also added: Feeling very generous :-) today, adding these too: soft# mkfs.btrfs -mraid10 -draid10 -L test5 /dev/sd{b,c,d,e}3 [ ... ] soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/sdb3 /mnt/test5 soft# rm -f /mnt/test5/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=1M count=3000 conv=fsync 3000+0 records in 3000+0 records out 3145728000 bytes (3.1 GB) copied, 14.2166 s, 221 MB/s 0.00user 2.54system 0:14.21elapsed 17%CPU (0avgtext+0avgdata 3056maxresident)k 0inputs+6144768outputs (0major+346minor)pagefaults 0swaps soft# rm -f /mnt/test5/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=128k count=3000 conv=fsync 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 2.05933 s, 191 MB/s 0.00user 0.32system 0:02.06elapsed 15%CPU (0avgtext+0avgdata 1996maxresident)k 0inputs+772512outputs (0major+124minor)pagefaults 0swaps soft# rm -f /mnt/test5/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=1M count=1000 | dd iflag=fullblock of=/mnt/test5/testfile bs=1M oflag=sync 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 60.6019 s, 17.3 MB/s 0.01user 1.04system 1:00.60elapsed 1%CPU (0avgtext+0avgdata 2888maxresident)k 0inputs+0outputs (0major+348minor)pagefaults 0swaps 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 60.4116 s, 17.4 MB/s soft# rm -f /mnt/test5/testfile soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 bs=128k count=3000 | dd iflag=fullblock of=/mnt/test5/testfile bs=128k oflag=sync 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 148.04 s, 2.7 MB/s 0.00user 0.62system 2:28.04elapsed 0%CPU (0avgtext+0avgdata 1996maxresident)k 0inputs+0outputs (0major+125minor)pagefaults 0swaps 3000+0 records in 3000+0 records out 393216000 bytes (393 MB) copied, 148.083 s, 2.7 MB/s soft# sysctl vm/drop_caches=3 vm.drop_caches = 3 soft# /usr/bin/time dd iflag=fullblock 
if=/mnt/test5/testfile bs=128k count=3000 of=/dev/zero
3000+0 records in
3000+0 records out
393216000 bytes (393 MB) copied, 1.09729 s, 358 MB/s
0.00user 0.24system 0:01.10elapsed 23%CPU (0avgtext+0avgdata 2164maxresident)k
459768inputs+0outputs (3major+121minor)pagefaults 0swaps
Re: Btrfs + compression = slow performance and high cpu usage
[ ... ] > grep 'model name' /proc/cpuinfo | sort -u > model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz Good, contemporary CPU with all accelerations. > The sda device is a hardware RAID5 consisting of 4x8TB drives. [ ... ] > Strip Size : 256 KB So the full RMW data stripe length is 768KiB. > [ ... ] don't see the previously reported behaviour of one of > the kworker consuming 100% of the cputime, but the write speed > difference between the compression ON vs OFF is pretty large. That's weird; of course 'lzo' is a lot cheaper than 'zlib', but in my test the much higher CPU time of the latter was spread across many CPUs, while in your case it wasn't, even if the E5645 has 6 CPUs and can do 12 threads. That seemed to point to some high cost of finding free blocks, that is a very fragmented free list, or something else. > dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress oflag=direct > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 26.0685 s, 206 MB/s The results with 'oflag=direct' are not relevant, because Btrfs behaves "differently" with that. > mountflags: > (rw,relatime,compress-force=zlib,space_cache=v2,subvolid=5,subvol=/) [ ... ] > dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 77.4845 s, 69.3 MB/s > mountflags: > (rw,relatime,compress-force=lzo,space_cache=v2,subvolid=5,subvol=/) [ ... ] > dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 122.321 s, 43.9 MB/s That's pretty good for a RAID5 with 128KiB writes and a 768KiB stripe size, on a 3ware, and looks like that the hw host adapter does not have a persistent cache (battery backed usually). My guess that watching transfer rates and latencies with 'iostat -dk -zyx 1' did not happen. > mountflags: (rw,relatime,space_cache=v2,subvolid=5,subvol=/) [ ... 
] > dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 10.1033 s, 531 MB/s

I had mentioned in my previous reply the output of 'filefrag'. That to me seems relevant here, because of RAID5 RMW and maximum extent size with Btrfs compression and strip/stripe size. Perhaps redoing the tests with a 128KiB 'bs' *without* compression would be interesting, perhaps even with 'oflag=sync' instead of 'conv=fsync'.

It is hard for me to see a speed issue here with Btrfs: for comparison I have done a simple test with both a 3+1 MD RAID5 set with a 256KiB chunk size and a single block device on "contemporary" 1T/2TB drives, capable of sequential transfer rates of 150-190MB/s:

soft# grep -A2 sdb3 /proc/mdstat
md127 : active raid5 sde3[4] sdd3[2] sdc3[1] sdb3[0]
729808128 blocks super 1.0 level 5, 256k chunk, algorithm 2 [4/4] []

with compression:

soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 /mnt/test5
soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3
soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile
soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=1M count=1 conv=fsync
1+0 records in
1+0 records out
1048576 bytes (10 GB) copied, 94.3605 s, 111 MB/s
0.01user 12.59system 1:34.36elapsed 13%CPU (0avgtext+0avgdata 2932maxresident)k
13042144inputs+20482144outputs (3major+345minor)pagefaults 0swaps
soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile bs=1M count=1 conv=fsync
1+0 records in
1+0 records out
1048576 bytes (10 GB) copied, 93.5885 s, 112 MB/s
0.03user 12.35system 1:33.59elapsed 13%CPU (0avgtext+0avgdata 2940maxresident)k
13042144inputs+20482400outputs (3major+346minor)pagefaults 0swaps
soft# filefrag /mnt/test5/testfile /mnt/sdg3/testfile
/mnt/test5/testfile: 48945 extents found
/mnt/sdg3/testfile: 49029 extents found
soft# btrfs fi df /mnt/test5/ | grep Data
Data, single: total=7.00GiB, used=6.55GiB
soft# btrfs fi
df /mnt/sdg3 | grep Data Data, single: total=7.00GiB, used=6.55GiB soft# sysctl vm/drop_caches=3 vm.drop_caches = 3 soft# /usr/bin/time dd iflag=fullblock if=/mnt/test5/testfile bs=1M count=1 of=/dev/zero 1+0 records in 1+0 records out 1048576 bytes (10 GB) copied, 23.2975 s, 450 MB/s 0.01user 7.59system 0:23.32elapsed 32%CPU (0avgtext+0avgdata 2932maxresident)k 13759624inputs+0outputs (3major+344minor)pagefaults 0swaps soft# sysctl vm/drop_caches=3 vm.drop_caches = 3 soft# /usr/bin/time dd iflag=fullblock if=/mnt/sdg3/testfile bs=1M count=1 of=/dev/zero 1+0 records in 1+0 records out 1048576 bytes (10 GB) copied, 35.0032 s, 300 MB/s 0.01user 8.46system 0:35.03elapsed 24%CPU (0avgtext+0avgdata 2924maxresident)k 13750568inputs+0outputs (3major+345minor)pagefaults 0swaps and
Re: Btrfs + compression = slow performance and high cpu usage
all 0.00 0.00 4.84 5.09 0.00 90.08 14:31:45all 0.17 0.00 4.67 4.75 0.00 90.42 14:31:46all 0.00 0.00 4.60 3.76 0.00 91.64 14:31:47all 0.08 0.00 5.07 3.66 0.00 91.18 14:31:48all 0.00 0.00 5.01 3.68 0.00 91.31 14:31:49all 0.00 0.00 4.76 3.68 0.00 91.56 14:31:50all 0.08 0.00 4.59 3.59 0.00 91.73 14:31:51all 0.00 0.00 2.67 1.92 0.00 95.41 - Original Message - From: "Peter Grandi" <p...@btrfs.list.sabi.co.uk> To: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org> Sent: Friday, 28 July, 2017 8:08:47 PM Subject: Re: Btrfs + compression = slow performance and high cpu usage > I am stuck with a problem of btrfs slow performance when using > compression. [ ... ] That to me looks like an issue with speed, not performance, and in particular with PEBCAK issues. As to high CPU usage, when you find a way to do both compression and checksumming without using much CPU time, please send patches urgently :-). In your case the increase in CPU time is bizarre. I have the Ubuntu 4.4 "lts-xenial" kernel and what you report does not happen here (with a few little changes): soft# grep 'model name' /proc/cpuinfo | sort -u model name : AMD FX(tm)-6100 Six-Core Processor soft# cpufreq-info | grep 'current CPU frequency' current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). soft# lsscsi | grep 'sd[ae]' [0:0:0:0]diskATA HFS256G32MNB-220 3L00 /dev/sda [5:0:0:0]diskATA ST2000DM001-1CH1 CC44 /dev/sde soft# mkfs.btrfs -f /dev/sde3 [ ... 
] soft# mount -t btrfs -o discard,autodefrag,compress=lzo,compress-force,commit=10 /dev/sde3 /mnt/sde3 soft# df /dev/sda6 /mnt/sde3 Filesystem 1M-blocks Used Available Use% Mounted on /dev/sda6 90048 76046 14003 85% / /dev/sde3 23756819235501 1% /mnt/sde3 The above is useful context information that was "amazingly" omitted from your reported. In dmesg I see (not the "force zlib compression"): [327730.917285] BTRFS info (device sde3): turning on discard [327730.917294] BTRFS info (device sde3): enabling auto defrag [327730.917300] BTRFS info (device sde3): setting 8 feature flag [327730.917304] BTRFS info (device sde3): force zlib compression [327730.917313] BTRFS info (device sde3): disk space caching is enabled [327730.917315] BTRFS: has skinny extents [327730.917317] BTRFS: flagging fs with big metadata feature [327730.920740] BTRFS: creating UUID tree and the result is: soft# pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile bs=1M count=1 oflag=direct 1+0 records in17MB/s] [==>] 11% ETA 0:15:06 1+0 records out 1048576 bytes (10 GB) copied, 112.845 s, 92.9 MB/s 0.05user 9.93system 1:53.20elapsed 8%CPU (0avgtext+0avgdata 3016maxresident)k 120inputs+20496000outputs (1major+346minor)pagefaults 0swaps 9.77GB 0:01:53 [88.3MB/s] [==>] 11% soft# btrfs fi df /mnt/sde3/ Data, single: total=10.01GiB, used=9.77GiB System, DUP: total=8.00MiB, used=16.00KiB Metadata, DUP: total=1.00GiB, used=11.66MiB GlobalReserve, single: total=16.00MiB, used=0.00B As it was running system CPU time was under 20% of one CPU: top - 18:57:29 up 3 days, 19:27, 4 users, load average: 5.44, 2.82, 1.45 Tasks: 325 total, 1 running, 324 sleeping, 0 stopped, 0 zombie %Cpu0 : 0.0 us, 2.3 sy, 0.0 ni, 91.3 id, 6.3 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu1 : 0.0 us, 1.3 sy, 0.0 ni, 78.5 id, 20.2 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu2 : 0.3 us, 5.8 sy, 0.0 ni, 81.0 id, 12.5 wa, 0.0 hi, 0.3 si, 0.0 st %Cpu3 : 0.3 us, 3.4 sy, 0.0 ni, 91.9 id, 4.4 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu4 : 0.3 us, 10.6 sy, 0.0 ni, 
55.4 id, 33.7 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu5 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 8120660 total, 5162236 used, 2958424 free, 4440100 buffers KiB Swap:0 total,0 used,0 free. 351848 cached Mem PID PPID USER PR NIVIRTRESDATA %CPU %MEM TIME+ TTY COMMAND 21047 21046 root 20 08872 26161364 12.9 0.0 0:02.31 pts/3dd iflag=fullblo+ 21045 3535 root 20 07928 1948
Re: Btrfs + compression = slow performance and high cpu usage
In addition to my previous "it does not happen here" comment, if someone is reading this thread, there are some other interesting details: > When the compression is turned off, I am able to get the > maximum 500-600 mb/s write speed on this disk (raid array) > with minimal cpu usage. No details on whether it is a parity RAID or not. > btrfs device usage /mnt/arh-backup1/ > /dev/sda, ID: 2 >Device size:21.83TiB >Device slack: 0.00B >Data,single: 9.29TiB >Metadata,single:46.00GiB >System,single: 32.00MiB >Unallocated:12.49TiB That's exactly 24TB of "Device size", of which around 45% are used, and the string "backup" may suggest that the content is backups, which may indicate a very fragmented freespace. Of course compression does not help with that, in my freshly created Btrfs volume I get as expected: soft# umount /mnt/sde3 soft# mount -t btrfs -o commit=10 /dev/sde3 /mnt/sde3 soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sde3/testfile bs=1M count=1 conv=fsync 1+0 records in 1+0 records out 1048576 bytes (10 GB) copied, 103.747 s, 101 MB/s 0.00user 11.56system 1:44.86elapsed 11%CPU (0avgtext+0avgdata 3072maxresident)k 20480672inputs+20498272outputs (1major+349minor)pagefaults 0swaps soft# filefrag /mnt/sde3/testfile /mnt/sde3/testfile: 11 extents found versus: soft# umount /mnt/sde3 soft# mount -t btrfs -o commit=10,compress=lzo,compress-force /dev/sde3 /mnt/sde3 soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sde3/testfile bs=1M count=1 conv=fsync 1+0 records in 1+0 records out 1048576 bytes (10 GB) copied, 109.051 s, 96.2 MB/s 0.02user 13.03system 1:49.49elapsed 11%CPU (0avgtext+0avgdata 3068maxresident)k 20494784inputs+20492320outputs (1major+347minor)pagefaults 0swaps soft# filefrag /mnt/sde3/testfile /mnt/sde3/testfile: 49287 extents found Most the latter extents are mercifully rather contiguous, their size is just limited by the compression code, here is an extract from 'filefrag -v' from around the middle: 24757: 1321888.. 
1321919: 11339579.. 11339610: 32: 11339594:
24758: 1321920.. 1321951: 11339597.. 11339628: 32: 11339611:
24759: 1321952.. 1321983: 11339615.. 11339646: 32: 11339629:
24760: 1321984.. 1322015: 11339632.. 11339663: 32: 11339647:
24761: 1322016.. 1322047: 11339649.. 11339680: 32: 11339664:
24762: 1322048.. 1322079: 11339667.. 11339698: 32: 11339681:
24763: 1322080.. 1322111: 11339686.. 11339717: 32: 11339699:
24764: 1322112.. 1322143: 11339703.. 11339734: 32: 11339718:
24765: 1322144.. 1322175: 11339720.. 11339751: 32: 11339735:
24766: 1322176.. 1322207: 11339737.. 11339768: 32: 11339752:
24767: 1322208.. 1322239: 11339754.. 11339785: 32: 11339769:
24768: 1322240.. 1322271: 11339771.. 11339802: 32: 11339786:
24769: 1322272.. 1322303: 11339789.. 11339820: 32: 11339803:

But again this is on a fresh empty Btrfs volume.
Re: Btrfs + compression = slow performance and high cpu usage
On Fri, Jul 28, 2017 at 06:20:14PM +, William Muriithi wrote: > Hi Roman, > > > autodefrag > > This sure sounded like a good thing to enable? on paper? right?... > > The moment you see anything remotely weird about btrfs, this is the first thing you have to disable and retest without. Oh wait, the first would be qgroups, this one is second. > > What's the problem with autodefrag? I am also using it, so you caught my attention when you implied that it shouldn't be used. According to docs, it seem like one of the very mature feature of the filesystem. See below for the doc I am referring to > > https://btrfs.wiki.kernel.org/index.php/Status > > I am using it as I assumed it could prevent the filesystem being too fragmented long term, but never thought there was price to pay for using it

It introduces additional I/O on writes, as it modifies a small area surrounding any write or cluster of writes. I'm not aware of it causing massive slowdowns, in the way qgroups does in some situations. If your system is already marginal in terms of being able to support the I/O required, then turning on autodefrag will make things worse (but you may be heading for _much_ worse performance in the future as the FS becomes more fragmented -- depending on your write patterns and use case).

Hugo.

-- Hugo Mills | Great oxymorons of the world, no. 6: hugo@... carfax.org.uk | Mature Student http://carfax.org.uk/ | PGP: E2AB1DE4 |
RE: Btrfs + compression = slow performance and high cpu usage
Hi Roman,

> autodefrag

This sure sounded like a good thing to enable? on paper? right?... The moment you see anything remotely weird about btrfs, this is the first thing you have to disable and retest without. Oh wait, the first would be qgroups, this one is second.

What's the problem with autodefrag? I am also using it, so you caught my attention when you implied that it shouldn't be used. According to the docs, it seems like one of the very mature features of the filesystem. See below for the doc I am referring to:

https://btrfs.wiki.kernel.org/index.php/Status

I am using it as I assumed it could prevent the filesystem from becoming too fragmented long term, but never thought there was a price to pay for using it.

Regards, William
Re: Btrfs + compression = slow performance and high cpu usage
> I am stuck with a problem of btrfs slow performance when using > compression. [ ... ] That to me looks like an issue with speed, not performance, and in particular with PEBCAK issues. As to high CPU usage, when you find a way to do both compression and checksumming without using much CPU time, please send patches urgently :-). In your case the increase in CPU time is bizarre. I have the Ubuntu 4.4 "lts-xenial" kernel and what you report does not happen here (with a few little changes): soft# grep 'model name' /proc/cpuinfo | sort -u model name : AMD FX(tm)-6100 Six-Core Processor soft# cpufreq-info | grep 'current CPU frequency' current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). soft# lsscsi | grep 'sd[ae]' [0:0:0:0]diskATA HFS256G32MNB-220 3L00 /dev/sda [5:0:0:0]diskATA ST2000DM001-1CH1 CC44 /dev/sde soft# mkfs.btrfs -f /dev/sde3 [ ... ] soft# mount -t btrfs -o discard,autodefrag,compress=lzo,compress-force,commit=10 /dev/sde3 /mnt/sde3 soft# df /dev/sda6 /mnt/sde3 Filesystem 1M-blocks Used Available Use% Mounted on /dev/sda6 90048 76046 14003 85% / /dev/sde3 23756819235501 1% /mnt/sde3 The above is useful context information that was "amazingly" omitted from your reported. 
In dmesg I see (note the "force zlib compression"):

[327730.917285] BTRFS info (device sde3): turning on discard
[327730.917294] BTRFS info (device sde3): enabling auto defrag
[327730.917300] BTRFS info (device sde3): setting 8 feature flag
[327730.917304] BTRFS info (device sde3): force zlib compression
[327730.917313] BTRFS info (device sde3): disk space caching is enabled
[327730.917315] BTRFS: has skinny extents
[327730.917317] BTRFS: flagging fs with big metadata feature
[327730.920740] BTRFS: creating UUID tree

and the result is:

soft# pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile bs=1M count=1 oflag=direct
1+0 records in
1+0 records out
1048576 bytes (10 GB) copied, 112.845 s, 92.9 MB/s
0.05user 9.93system 1:53.20elapsed 8%CPU (0avgtext+0avgdata 3016maxresident)k
120inputs+20496000outputs (1major+346minor)pagefaults 0swaps
9.77GB 0:01:53 [88.3MB/s] [==>                  ] 11%

soft# btrfs fi df /mnt/sde3/
Data, single: total=10.01GiB, used=9.77GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=11.66MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

As it was running, system CPU time was under 20% of one CPU:

top - 18:57:29 up 3 days, 19:27,  4 users,  load average: 5.44, 2.82, 1.45
Tasks: 325 total,   1 running, 324 sleeping,   0 stopped,   0 zombie
%Cpu0 : 0.0 us, 2.3 sy, 0.0 ni, 91.3 id,  6.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 1.3 sy, 0.0 ni, 78.5 id, 20.2 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.3 us, 5.8 sy, 0.0 ni, 81.0 id, 12.5 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3 : 0.3 us, 3.4 sy, 0.0 ni, 91.9 id,  4.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.3 us, 10.6 sy, 0.0 ni, 55.4 id, 33.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id,  0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem:   8120660 total,  5162236 used,  2958424 free,  4440100 buffers
KiB Swap:        0 total,        0 used,        0 free.
  351848 cached Mem

  PID  PPID USER  PR  NI    VIRT    RES   DATA  %CPU %MEM    TIME+ TTY   COMMAND
21047 21046 root  20   0    8872   2616   1364  12.9  0.0  0:02.31 pts/3 dd iflag=fullblo+
21045  3535 root  20   0    7928   1948    460  12.3  0.0  0:00.72 pts/3 pv -tpreb /dev/s+
21019     2 root  20   0       0      0      0   1.3  0.0  0:42.88 ?     [kworker/u16:1]

Of course "oflag=direct" is a rather "optimistic" option in this
context, so I tried again with something more sensible:

soft# pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile bs=1M count=1 conv=fsync
1+0 records in
1+0 records out
1048576 bytes (10 GB) copied, 110.523 s, 94.9 MB/s
0.03user 8.94system 1:50.71elapsed 8%CPU (0avgtext+0avgdata 3024maxresident)k
136inputs+20499648outputs (1major+348minor)pagefaults 0swaps
9.77GB 0:01:50 [90.3MB/s] [==>                  ] 11%

soft# btrfs fi df /mnt/sde3/
Data, single: total=7.01GiB, used=6.35GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=15.81MiB
GlobalReserve,
Re: Btrfs + compression = slow performance and high cpu usage
On Fri, 28 Jul 2017 17:40:50 +0100 (BST)
"Konstantin V. Gavrilenko" wrote:

> Hello list,
>
> I am stuck with a problem of btrfs slow performance when using
> compression.
>
> when the compress-force=lzo mount flag is enabled, the performance
> drops to 30-40 mb/s and one of the btrfs processes utilises 100% cpu
> time.
> mount options: btrfs
> relatime,discard,autodefrag,compress=lzo,compress-force,space_cache=v2,commit=10

It does not work like that: you need to set compress-force=lzo (and
remove compress=). With your setup I believe you currently use
compress-force[=zlib] (the default), overriding compress=lzo, since it's
later in the options order.

Secondly,

> autodefrag

This sure sounded like a good thing to enable? on paper? right?... The
moment you see anything remotely weird about btrfs, this is the first
thing you have to disable and retest without. Oh wait, the first would
be qgroups, this one is second.

Finally, what is the reasoning behind "commit=10", and did you check
with the default value of 30?

--
With respect,
Roman
Btrfs + compression = slow performance and high cpu usage
Hello list,

I am stuck with a problem of btrfs slow performance when using
compression.

When the compress-force=lzo mount flag is enabled, the performance drops
to 30-40 MB/s and one of the btrfs processes utilises 100% cpu time.

mount options: btrfs
relatime,discard,autodefrag,compress=lzo,compress-force,space_cache=v2,commit=10

The command I am using to test the write throughput is

# pv -tpreb /dev/sdb | dd of=./testfile bs=1M oflag=direct
# top -d 1

top - 15:49:13 up  1:52,  2 users,  load average: 5.28, 2.32, 1.39
Tasks: 320 total,   2 running, 318 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  2.0 sy,  0.0 ni, 77.0 id, 21.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  1.0 sy,  0.0 ni, 90.0 id,  9.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  1.0 sy,  0.0 ni, 72.0 id, 27.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  1.0 sy,  0.0 ni, 57.0 id, 42.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni, 96.0 id,  4.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni, 94.0 id,  6.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  1.0 sy,  0.0 ni, 95.1 id,  3.9 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  1.0 us,  2.0 sy,  0.0 ni, 24.0 id, 73.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni, 81.8 id, 18.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  1.0 us,  0.0 sy,  0.0 ni, 98.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  2.0 sy,  0.0 ni, 83.3 id, 14.7 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32934136 total, 10137496 free,   602244 used, 22194396 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 30525664 avail Mem

  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
37017 root  20   0       0      0      0 R 100.0  0.0  0:32.42 kworker/u49:8
36732 root  20   0       0      0      0 D   4.0  0.0  0:02.40 btrfs-transacti
40105 root  20   0    8388   3040   2000 D   4.0  0.0  0:02.88 dd

The kworker process that causes the high cpu usage is most likely
searching for the free space.
# echo l > /proc/sysrq-trigger
# dmesg -T

[Fri Jul 28 15:57:51 2017] CPU: 1 PID: 36430 Comm: kworker/u49:2 Not tainted 4.10.0-28-generic #32~16.04.2-Ubuntu
[Fri Jul 28 15:57:51 2017] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1b 11/16/2012
[Fri Jul 28 15:57:51 2017] Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
[Fri Jul 28 15:57:51 2017] task: 9ddce6206a40 task.stack: aa9121f6c000
[Fri Jul 28 15:57:51 2017] RIP: 0010:rb_next+0x1e/0x40
[Fri Jul 28 15:57:51 2017] RSP: 0018:aa9121f6fb40 EFLAGS: 0282
[Fri Jul 28 15:57:51 2017] RAX: 9dddc34df1b0 RBX: 0001 RCX: 1000
[Fri Jul 28 15:57:51 2017] RDX: 9dddc34df708 RSI: 9ddccaf470a4 RDI: 9dddc34df2d0
[Fri Jul 28 15:57:51 2017] RBP: aa9121f6fb40 R08: 0001 R09: 3000
[Fri Jul 28 15:57:51 2017] R10:  R11: 0002 R12: 9ddccaf47080
[Fri Jul 28 15:57:51 2017] R13: 1000 R14: aa9121f6fc50 R15: 9dddc34df2d0
[Fri Jul 28 15:57:51 2017] FS: () GS:9ddcefa4() knlGS:
[Fri Jul 28 15:57:51 2017] CS: 0010 DS: ES: CR0: 80050033
[Fri Jul 28 15:57:51 2017] Call Trace:
[Fri Jul 28 15:57:51 2017]  btrfs_find_space_for_alloc+0xde/0x270 [btrfs]
[Fri Jul 28 15:57:51 2017]  find_free_extent.isra.68+0x3c6/0x1040 [btrfs]
[Fri Jul 28 15:57:51 2017]  btrfs_reserve_extent+0xab/0x210 [btrfs]
[Fri Jul 28 15:57:51 2017]  submit_compressed_extents+0x154/0x580 [btrfs]
[Fri Jul 28 15:57:51 2017]  ? submit_compressed_extents+0x580/0x580 [btrfs]
[Fri Jul 28 15:57:51 2017]  async_cow_submit+0x82/0x90 [btrfs]
[Fri Jul 28 15:57:51 2017]  btrfs_scrubparity_helper+0x1fe/0x300 [btrfs]
[Fri Jul 28 15:57:51 2017]  btrfs_delalloc_helper+0xe/0x10 [btrfs]
[Fri Jul 28 15:57:51 2017]  process_one_work+0x16b/0x4a0
[Fri Jul 28 15:57:51 2017]  worker_thread+0x4b/0x500
[Fri Jul 28 15:57:51 2017]  kthread+0x109/0x140

When the compression is turned off, I am able to get the maximum 500-600
MB/s write speed on this disk (raid array) with minimal cpu usage.
mount options: relatime,discard,autodefrag,space_cache=v2,commit=10

# iostat -m 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    7.74   10.77    0.00   81.40

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda2376.00 0.00
RE: Btrfs Compression
> -----Original Message-----
> From: Austin S. Hemmelgarn [mailto:ahferro...@gmail.com]
> Sent: Thursday, 6 July 2017 9:52 PM
> To: Paul Jones <p...@pauljones.id.au>; linux-btrfs@vger.kernel.org
> Subject: Re: Btrfs Compression
>
> On 2017-07-05 23:19, Paul Jones wrote:
> > While reading the thread about adding zstd compression, it occurred
> > to me that there is potentially another thing affecting performance -
> > compressed extent size. (Correct my terminology if it's incorrect.) I
> > have two near identical RAID1 filesystems (used for backups) on near
> > identical discs (HGST 3T), one compressed and one not. The
> > filesystems have about 40 snapshots and are about 50% full. The
> > uncompressed filesystem runs at about 60 MB/s, the compressed
> > filesystem at about 5-10 MB/s. There is noticeably more "noise" from
> > the compressed filesystem from all the head thrashing that happens
> > while rsync is running.
> >
> > Which brings me to my point - in terms of performance for
> > compression, is there some low hanging fruit in adjusting the extent
> > size to be more like uncompressed extents so there is not so much
> > seeking happening? With spinning discs with large data sets it seems
> > pointless making the numerical calculations faster if the discs can't
> > keep up. Obviously this is assuming optimisation for speed over
> > compression ratio.
> >
> > Thoughts?
>
> That really depends on too much to be certain. In all likelihood, your
> CPU or memory are your bottleneck, not your storage devices. The data
> itself gets compressed in memory, and then sent to the storage device;
> it's not streamed directly there from the compression thread, so if the
> CPU was compressing data faster than the storage devices could transfer
> it, you would (or at least, should) be seeing better performance on the
> compressed filesystem than the uncompressed one (because you transfer
> less data on the compressed filesystem), assuming the datasets are
> functionally identical.
> That in turn brings up a few other questions:
> * What are the other hardware components involved (namely, CPU, RAM,
> and storage controller)? If you're using some dinky little Atom or
> Cortex-A7 CPU (or almost anything else 32-bit running at less than 2GHz
> peak), then that's probably your bottleneck. Similarly, if you've got a
> cheap storage controller that needs a lot of attention from the CPU,
> then that's probably your bottleneck (you can check this by seeing how
> much processing power is being used when just writing to the
> uncompressed array (check how much processing power rsync uses copying
> between two tmpfs mounts, then subtract that from the total for copying
> the same data to the uncompressed filesystem)).

Hardware is AMD Phenom II X6 1055T with 8GB DDR3 on the compressed
filesystem, Intel i7-3770K with 8GB DDR3 on the uncompressed. Slight
difference, but both are up to the task.

> * Which compression algorithm are you using, lzo or zlib? If the
> answer is zlib, then what you're seeing is generally expected behavior
> except on systems with reasonably high-end CPU's and fast memory,
> because zlib is _slow_.

Zlib.

> * Are you storing the same data on both arrays? If not, then that
> immediately makes the comparison suspect (if one array is storing lots
> of small files and the other is mostly storing small numbers of large
> files, then I would expect the one with lots of small files to get
> worse performance, and compression on that one will just make things
> worse).
> This is even more important when using rsync, because the size of the
> files involved has a pretty big impact on its hashing performance and
> even data transfer rate (lots of small files == more time spent in
> syscalls other than read() and write()).

The dataset is rsync-ed to the primary backup and then to the secondary
backup, so both contain the same content.
> Additionally, when you're referring to extent size, I assume you mean
> the huge number of 128k extents that the FIEMAP ioctl (and at least
> older versions of `filefrag`) shows for compressed files? If that's the
> case, then it's important to understand that that's due to an issue
> with FIEMAP: it doesn't understand compressed extents in BTRFS
> correctly, so it shows one extent per compressed _block_ instead, even
> if they are internally one extent in BTRFS. You can verify the actual
> number of extents by checking how many runs of contiguous 128k
> 'extents' there are.

It was my understanding that compressed extents are significantly
smaller in size than uncompressed ones? (Like 64k vs 128M? Perhaps I'm
thinking of something else.) I couldn't find any info about this, but I
remember it being mentioned here before. Either way disk io is maxed
out, so something is different with compression in a way that spinning
rust doesn't seem to like.

Paul.
Re: Btrfs Compression
On 06/07/2017 13:51, Austin S. Hemmelgarn wrote:
> Additionally, when you're referring to extent size, I assume you mean
> the huge number of 128k extents that the FIEMAP ioctl (and at least
> older versions of `filefrag`) shows for compressed files? If that's the
> case, then it's important to understand that that's due to an issue
> with FIEMAP: it doesn't understand compressed extents in BTRFS
> correctly, so it shows one extent per compressed _block_ instead, even
> if they are internally one extent in BTRFS. You can verify the actual
> number of extents by checking how many runs of contiguous 128k
> 'extents' there are.

This is in fact the problem: compressed extents are far less likely to
be contiguous than uncompressed extents (even compensating for the
FIEMAP limitations). When calling defrag on these files BTRFS is likely
to ignore the fragmentation too: when I modeled the cost of reading a
file as stored versus the ideal cost if it were in one single block, I
got this surprise. Uncompressed files can be fully defragmented most of
the time; compressed files usually reach a fragmentation cost of
approximately 1.5x to 2.5x the ideal case after defragmentation (it
seems to depend on how the whole filesystem is used).

Lionel
Re: Btrfs Compression
On 2017-07-05 23:19, Paul Jones wrote:
> While reading the thread about adding zstd compression, it occurred to
> me that there is potentially another thing affecting performance -
> compressed extent size. (Correct my terminology if it's incorrect.) I
> have two near identical RAID1 filesystems (used for backups) on near
> identical discs (HGST 3T), one compressed and one not. The filesystems
> have about 40 snapshots and are about 50% full. The uncompressed
> filesystem runs at about 60 MB/s, the compressed filesystem at about
> 5-10 MB/s. There is noticeably more "noise" from the compressed
> filesystem from all the head thrashing that happens while rsync is
> running.
>
> Which brings me to my point - in terms of performance for compression,
> is there some low hanging fruit in adjusting the extent size to be more
> like uncompressed extents so there is not so much seeking happening?
> With spinning discs with large data sets it seems pointless making the
> numerical calculations faster if the discs can't keep up. Obviously
> this is assuming optimisation for speed over compression ratio.
>
> Thoughts?

That really depends on too much to be certain. In all likelihood, your
CPU or memory are your bottleneck, not your storage devices. The data
itself gets compressed in memory, and then sent to the storage device;
it's not streamed directly there from the compression thread, so if the
CPU was compressing data faster than the storage devices could transfer
it, you would (or at least, should) be seeing better performance on the
compressed filesystem than the uncompressed one (because you transfer
less data on the compressed filesystem), assuming the datasets are
functionally identical.

That in turn brings up a few other questions:

* What are the other hardware components involved (namely, CPU, RAM, and
storage controller)? If you're using some dinky little Atom or Cortex-A7
CPU (or almost anything else 32-bit running at less than 2GHz peak),
then that's probably your bottleneck.
Similarly, if you've got a cheap storage controller that needs a lot of
attention from the CPU, then that's probably your bottleneck (you can
check this by seeing how much processing power is being used when just
writing to the uncompressed array (check how much processing power rsync
uses copying between two tmpfs mounts, then subtract that from the total
for copying the same data to the uncompressed filesystem)).

* Which compression algorithm are you using, lzo or zlib? If the answer
is zlib, then what you're seeing is generally expected behavior except
on systems with reasonably high-end CPU's and fast memory, because zlib
is _slow_.

* Are you storing the same data on both arrays? If not, then that
immediately makes the comparison suspect (if one array is storing lots
of small files and the other is mostly storing small numbers of large
files, then I would expect the one with lots of small files to get worse
performance, and compression on that one will just make things worse).
This is even more important when using rsync, because the size of the
files involved has a pretty big impact on its hashing performance and
even data transfer rate (lots of small files == more time spent in
syscalls other than read() and write()).

Additionally, when you're referring to extent size, I assume you mean
the huge number of 128k extents that the FIEMAP ioctl (and at least
older versions of `filefrag`) shows for compressed files? If that's the
case, then it's important to understand that that's due to an issue with
FIEMAP: it doesn't understand compressed extents in BTRFS correctly, so
it shows one extent per compressed _block_ instead, even if they are
internally one extent in BTRFS. You can verify the actual number of
extents by checking how many runs of contiguous 128k 'extents' there
are.
Btrfs Compression
While reading the thread about adding zstd compression, it occurred to
me that there is potentially another thing affecting performance -
compressed extent size. (Correct my terminology if it's incorrect.) I
have two near identical RAID1 filesystems (used for backups) on near
identical discs (HGST 3T), one compressed and one not. The filesystems
have about 40 snapshots and are about 50% full. The uncompressed
filesystem runs at about 60 MB/s, the compressed filesystem at about
5-10 MB/s. There is noticeably more "noise" from the compressed
filesystem from all the head thrashing that happens while rsync is
running.

Which brings me to my point - in terms of performance for compression,
is there some low hanging fruit in adjusting the extent size to be more
like uncompressed extents so there is not so much seeking happening?
With spinning discs with large data sets it seems pointless making the
numerical calculations faster if the discs can't keep up. Obviously this
is assuming optimisation for speed over compression ratio.

Thoughts?

Paul.
Re: [PATCH v6] Btrfs: compression must free at least one sector size
On Tue, Jun 06, 2017 at 02:41:15PM +0300, Timofey Titovets wrote:
> Btrfs already skips storing data where compression didn't free at
> least one byte. Let's improve the logic and check that compression
> frees at least one sector size, because otherwise it is useless to
> store this data compressed.
>
> Signed-off-by: Timofey Titovets

Reviewed-by: David Sterba

I've updated the changelog a bit.
[PATCH v6] Btrfs: compression must free at least one sector size
Btrfs already skips storing data where compression didn't free at least
one byte. Let's improve the logic and check that compression frees at
least one sector size, because otherwise it is useless to store this
data compressed.

Signed-off-by: Timofey Titovets
Cc: David Sterba
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17cbe930..7646f46b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -609,11 +609,10 @@ static noinline void compress_file_range(struct inode *inode,
 	/*
 	 * one last check to make sure the compression is really a
 	 * win, compare the page count read with the blocks on disk
+	 * compression must free at least one sector size
 	 */
 	total_in = ALIGN(total_in, PAGE_SIZE);
-	if (total_compressed >= total_in) {
-		will_compress = 0;
-	} else {
+	if (total_compressed + blocksize <= total_in) {
 		num_bytes = total_in;
 		*num_added += 1;
--
2.13.0
Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size
On Mon, Jun 05, 2017 at 09:56:14PM +0300, Timofey Titovets wrote:
> 2017-06-05 19:10 GMT+03:00 David Sterba:
> > On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote:
> >> Btrfs already skips storing data where compression didn't free at
> >> least one byte. Let's improve the logic and check that compression
> >> frees at least one sector size, because otherwise it is useless to
> >> store this data compressed.
> >>
> >> Signed-off-by: Timofey Titovets
> >> ---
> >>  fs/btrfs/inode.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 17cbe930..2793007b 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode *inode,
> >>  	/*
> >>  	 * one last check to make sure the compression is really a
> >>  	 * win, compare the page count read with the blocks on disk
> >> +	 * compression must free at least one sector size
> >>  	 */
> >>  	total_in = ALIGN(total_in, PAGE_SIZE);
> >> -	if (total_compressed >= total_in) {
> >> +	if (total_compressed + blocksize > total_in) {
> >
> > We're doing aligned calculation here, shouldn't this be >= ?
> >
> > If total_compressed + blocksize == total_in, we have saved exactly
> > one blocksize.
>
> We discussed that already in:
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg64350.html
>
> IIRC, long story short:
> - input data size 8192
> - output data size 4096
>
> This is inverted logic, i.e. it checks whether compression can be
> skipped: if the comparison is true -> skip compression.
> In case of the above check, old logic:
>   total_compressed >= total_in
>   4096 >= 8192 -> will_compress=1
> With blocksize added:
>   4096+4096 >= 8192 -> will_compress=0
> So this must be changed to:
>   4096+4096 > 8192 -> will_compress=1
> because compression saved one blocksize.
>
> Also will_compress is not used after this code, so in theory this code
> can be refactored to be more obvious, like that:
>
>  	total_in = ALIGN(total_in, PAGE_SIZE);
> -	if (total_compressed + blocksize > total_in) {
> -		will_compress = 0;
> -	} else {
> +	if (total_compressed + blocksize <= total_in) {

Thanks. That way it's much more obvious to read, please update the patch
in that regard (ie. also removing will_compress).

>  		num_bytes = total_in;
>  		*num_added += 1;
Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size
2017-06-05 19:10 GMT+03:00 David Sterba:
> On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote:
>> Btrfs already skips storing data where compression didn't free at
>> least one byte. Let's improve the logic and check that compression
>> frees at least one sector size, because otherwise it is useless to
>> store this data compressed.
>>
>> Signed-off-by: Timofey Titovets
>> ---
>>  fs/btrfs/inode.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 17cbe930..2793007b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode *inode,
>>  	/*
>>  	 * one last check to make sure the compression is really a
>>  	 * win, compare the page count read with the blocks on disk
>> +	 * compression must free at least one sector size
>>  	 */
>>  	total_in = ALIGN(total_in, PAGE_SIZE);
>> -	if (total_compressed >= total_in) {
>> +	if (total_compressed + blocksize > total_in) {
>
> We're doing aligned calculation here, shouldn't this be >= ?
>
> If total_compressed + blocksize == total_in, we have saved exactly one
> blocksize.

We discussed that already in:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg64350.html

IIRC, long story short:
- input data size 8192
- output data size 4096

This is inverted logic, i.e. it checks whether compression can be
skipped: if the comparison is true -> skip compression.
In case of the above check, old logic:
  total_compressed >= total_in
  4096 >= 8192 -> will_compress=1
With blocksize added:
  4096+4096 >= 8192 -> will_compress=0
So this must be changed to:
  4096+4096 > 8192 -> will_compress=1
because compression saved one blocksize.

Also will_compress is not used after this code, so in theory this code
can be refactored to be more obvious, like that:

 	total_in = ALIGN(total_in, PAGE_SIZE);
-	if (total_compressed + blocksize > total_in) {
-		will_compress = 0;
-	} else {
+	if (total_compressed + blocksize <= total_in) {
 		num_bytes = total_in;
 		*num_added += 1;

Thanks
--
Have a nice day,
Timofey.
Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size
On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote:
> Btrfs already skips storing data where compression didn't free at
> least one byte. Let's improve the logic and check that compression
> frees at least one sector size, because otherwise it is useless to
> store this data compressed.
>
> Signed-off-by: Timofey Titovets
> ---
>  fs/btrfs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 17cbe930..2793007b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode *inode,
>  	/*
>  	 * one last check to make sure the compression is really a
>  	 * win, compare the page count read with the blocks on disk
> +	 * compression must free at least one sector size
>  	 */
>  	total_in = ALIGN(total_in, PAGE_SIZE);
> -	if (total_compressed >= total_in) {
> +	if (total_compressed + blocksize > total_in) {

We're doing aligned calculation here, shouldn't this be >= ?

If total_compressed + blocksize == total_in, we have saved exactly one
blocksize.
[PATCH v5 2/2] Btrfs: compression must free at least one sector size
Btrfs already skips storing data where compression didn't free at least
one byte. Let's improve the logic and check that compression frees at
least one sector size, because otherwise it is useless to store this
data compressed.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 17cbe930..2793007b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode *inode,
 	/*
 	 * one last check to make sure the compression is really a
 	 * win, compare the page count read with the blocks on disk
+	 * compression must free at least one sector size
 	 */
 	total_in = ALIGN(total_in, PAGE_SIZE);
-	if (total_compressed >= total_in) {
+	if (total_compressed + blocksize > total_in) {
 		will_compress = 0;
 	} else {
 		num_bytes = total_in;
--
2.13.0
[PATCH v5 0/2] Btrfs: compression fixes
First patch: sync the comparison logic of compressed/uncompressed sizes
from zlib to lzo.

Second patch: force btrfs to not store data as compressed if compression
will not free at least one sector size, because it's useless in terms of
saving storage space and reading data from disk; as a result throughput
suffers.

Changes since v1:
- Merge patches for zlib and lzo into one
- Sync check logic for zlib and lzo
- Check profit after all data are compressed (not while compressing)

Changes since v2:
- Fix comparison logic: it's enough if compressed size + PAGE_SIZE is
  not bigger than input data size

Changes since v3:
- Use btrfs sector size directly instead of assuming that
  PAGE_SIZE == sectorsize
  Thanks to Chandan

Changes since v4:
- Old first patch - merged (fix typo)
- New first patch - synced some logic (see patch) from zlib to lzo
- Check compressed/uncompressed size in the right place,
  compress_file_range()
  Thanks to David

Timofey Titovets (2):
  Btrfs: lzo.c compressed data size must be less then input size
  Btrfs: compression must free at least one sector size

 fs/btrfs/inode.c | 3 ++-
 fs/btrfs/lzo.c   | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

--
2.13.0
Re: [PATCH v4 2/2] Btrfs: compression must free at least one sector size
2017-05-29 17:23 GMT+03:00 David Sterba:
> On Thu, May 25, 2017 at 09:12:20PM +0300, Timofey Titovets wrote:
>> Btrfs already skips storing data where compression didn't free at
>> least one byte. Let's improve the logic and check that compression
>> frees at least one sector size, because otherwise it is useless to
>> store this data compressed.
>
> Yeah, there's room for improvement.
>
> Saving at least one sectorsize sounds ok to me. I'm not sure if this
> should be implemented inside the compressors (lzo, zlib). There's the
> quick shortcut (the check "if (tot_in > 8192 && tot_in < tot_out)"),
> but otherwise the overall decision whether to use the compressed data
> is done in compress_file_range.
>
> 601         if (will_compress) {
> 602                 /*
> 603                  * we aren't doing an inline extent round the compressed size
> 604                  * up to a block size boundary so the allocator does sane
> 605                  * things
> 606                  */
> 607                 total_compressed = ALIGN(total_compressed, blocksize);
> 608
> 609                 /*
> 610                  * one last check to make sure the compression is really a
> 611                  * win, compare the page count read with the blocks on disk
> 612                  */
> 613                 total_in = ALIGN(total_in, PAGE_SIZE);
> 614                 if (total_compressed >= total_in) {
> 615                         will_compress = 0;
> 616                 } else {
> ...
>
> so the check would go to line 614.
>
> There's one case that your patch misses and it's the compressed inline
> extent. As we'd never submit more than one sectorsize of data to
> compression, the savings would be bigger than one page and thus we'd
> skip the compression.
>
> This could be fixed easily though, but I'd like to use it as an
> example why the decision should be moved upwards in the callchain
> (ie. to compress_file_range).

Thanks for the advice David, I will update the patch.
Also, as I move the check logic to its new place, I want to send another patch that will fix the difference in behaviour of the check logic in lzo/zlib, i.e.:

lzo.c:
232     if (tot_out > tot_in)
233             goto out;

zlib.c:
194     if (workspace->strm.total_out >= workspace->strm.total_in) {
195             ret = -E2BIG;
196             goto out;
197     }

I think the zlib logic is smarter, because if the compressed size == the uncompressed size, storing the data compressed is also useless.

--
Have a nice day,
Timofey.
Re: [PATCH v4 2/2] Btrfs: compression must free at least one sector size
On Thu, May 25, 2017 at 09:12:20PM +0300, Timofey Titovets wrote:
> Btrfs already skip store of data where compression didn't
> free at least one byte. Let's make logic better and make check
> that compression free at least one sector size
> because in another case it useless to store this data compressed

Yeah, there's room for improvement.

Saving at least one sectorsize sounds ok to me. I'm not sure if this
should be implemented inside the compressors (lzo, zlib). There's the
quick shortcut (the check "if (tot_in > 8192 && tot_in < tot_out)"), but
otherwise the overall decision whether to use the compressed data is
done in compress_file_range.

601         if (will_compress) {
602                 /*
603                  * we aren't doing an inline extent round the compressed size
604                  * up to a block size boundary so the allocator does sane
605                  * things
606                  */
607                 total_compressed = ALIGN(total_compressed, blocksize);
608
609                 /*
610                  * one last check to make sure the compression is really a
611                  * win, compare the page count read with the blocks on disk
612                  */
613                 total_in = ALIGN(total_in, PAGE_SIZE);
614                 if (total_compressed >= total_in) {
615                         will_compress = 0;
616                 } else {
...

so the check would go to line 614.

There's one case that your patch misses and it's the compressed inline extent.
As we'd never submit more than one sectorsize of data to compression, the
savings would be bigger than one page and thus we'd skip the compression.

This could be fixed easily though, but I'd like to use it as an example why the
decision should be moved upwards in the callchain (ie. to compress_file_range).
[PATCH v4 2/2] Btrfs: compression must free at least one sector size
Btrfs already skip store of data where compression didn't free at least one byte. Let's make logic better and make check that compression free at least one sector size because in another case it useless to store this data compressed Signed-off-by: Timofey Titovets--- fs/btrfs/lzo.c | 9 - fs/btrfs/zlib.c | 7 ++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c index bd0b0938..4aafae6f 100644 --- a/fs/btrfs/lzo.c +++ b/fs/btrfs/lzo.c @@ -26,6 +26,7 @@ #include #include #include "compression.h" +#include "ctree.h" #define LZO_LEN4 @@ -99,6 +100,7 @@ static int lzo_compress_pages(struct list_head *ws, int nr_pages = 0; struct page *in_page = NULL; struct page *out_page = NULL; + u32 sectorsize; unsigned long bytes_left; unsigned long len = *total_out; unsigned long nr_dest_pages = *out_pages; @@ -229,8 +231,13 @@ static int lzo_compress_pages(struct list_head *ws, in_len = min(bytes_left, PAGE_SIZE); } - if (tot_out > tot_in) + /* Compression must save at least one sectorsize */ + sectorsize = btrfs_inode_sectorsize(mapping->host); + + if (tot_out + sectorsize > tot_in) { + ret = -E2BIG; goto out; + } /* store the size of all chunks of compressed data */ cpage_out = kmap(pages[0]); diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c index 135b1082..f9957248 100644 --- a/fs/btrfs/zlib.c +++ b/fs/btrfs/zlib.c @@ -31,6 +31,7 @@ #include #include #include "compression.h" +#include "ctree.h" struct workspace { z_stream strm; @@ -86,6 +87,7 @@ static int zlib_compress_pages(struct list_head *ws, int nr_pages = 0; struct page *in_page = NULL; struct page *out_page = NULL; + u32 sectorsize; unsigned long bytes_left; unsigned long len = *total_out; unsigned long nr_dest_pages = *out_pages; @@ -191,7 +193,10 @@ static int zlib_compress_pages(struct list_head *ws, goto out; } - if (workspace->strm.total_out >= workspace->strm.total_in) { + /* Compression must save at least one sectorsize */ + sectorsize = 
btrfs_inode_sectorsize(mapping->host); + + if (workspace->strm.total_out + sectorsize > workspace->strm.total_in) { ret = -E2BIG; goto out; } -- 2.13.0
[PATCH v4 0/2] Btrfs: compression fixes
First patch: Fix a copy-paste typo in a debug message for lzo.c; lzo is not deflate

Second patch: Force btrfs to not store data as compressed if compression will not free at least one sector size, because it's useless in terms of saving storage space and of reading data from disk; as a result, performance suffers.

Changes since v1:
- Merge the patches for zlib and lzo into one
- Sync the check logic for zlib and lzo
- Check the profit after all data are compressed (not while compressing)

Changes since v2:
- Fix the comparison logic; it's enough if compressed size + PAGE_SIZE is not bigger than the input data size

Changes since v3:
- Use the btrfs sector size directly instead of assuming that PAGE_SIZE == sectorsize

Timofey Titovets (2):
  Btrfs: lzo.c pr_debug() deflate->lzo
  Btrfs: compression must free at least one sector size

 fs/btrfs/lzo.c  | 11 +--
 fs/btrfs/zlib.c | 7 ++-
 2 files changed, 15 insertions(+), 3 deletions(-)

--
2.13.0
Re: [PATCH v3 2/2] Btrfs: compression must free at least PAGE_SIZE
2017-05-25 15:51 GMT+03:00 Chandan Rajendra:
...
> Apologies for the delayed response.
>
> I am not really sure if compression code must save atleast one sectorsize
> worth of space. But if other developers agree to it, then the above
> 'if' condition can be replaced with,
>
> u32 sectorsize = btrfs_inode_sectorsize(mapping->host);
> ...
> ...
>
> if (tot_out + sectorsize > tot_in) {
> --
> chandan

Thanks a lot! This approach is much simpler than I imagined; I will update the patch set and resend.
Thank you!

--
Have a nice day,
Timofey.
Re: [PATCH v3 2/2] Btrfs: compression must free at least PAGE_SIZE
On Sunday, May 21, 2017 12:10:39 AM IST Timofey Titovets wrote: > Btrfs already skip store of data where compression didn't free at least one > byte. > So make logic better and make check that compression free at least one > PAGE_SIZE, > because in another case it useless to store this data compressed > > Signed-off-by: Timofey Titovets> --- > fs/btrfs/lzo.c | 5 - > fs/btrfs/zlib.c | 3 ++- > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c > index bd0b0938..39678499 100644 > --- a/fs/btrfs/lzo.c > +++ b/fs/btrfs/lzo.c > @@ -229,8 +229,11 @@ static int lzo_compress_pages(struct list_head *ws, > in_len = min(bytes_left, PAGE_SIZE); > } > > - if (tot_out > tot_in) > + /* Compression must save at least one PAGE_SIZE */ > + if (tot_out + PAGE_SIZE > tot_in) { > + ret = -E2BIG; > goto out; > + } Apologies for the delayed response. I am not really sure if compression code must save atleast one sectorsize worth of space. But if other developers agree to it, then the above 'if' condition can be replaced with, u32 sectorsize = btrfs_inode_sectorsize(mapping->host); ... ... 
if (tot_out + sectorsize > tot_in) { > > /* store the size of all chunks of compressed data */ > cpage_out = kmap(pages[0]); > diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c > index 135b1082..11e117b5 100644 > --- a/fs/btrfs/zlib.c > +++ b/fs/btrfs/zlib.c > @@ -191,7 +191,8 @@ static int zlib_compress_pages(struct list_head *ws, > goto out; > } > > - if (workspace->strm.total_out >= workspace->strm.total_in) { > + /* Compression must save at least one PAGE_SIZE */ > + if (workspace->strm.total_out + PAGE_SIZE > workspace->strm.total_in) { > ret = -E2BIG; > goto out; > } > -- > 2.13.0 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- chandan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
Hi Timofey,

[auto build test ERROR on v4.9-rc8]
[also build test ERROR on next-20170522]
[cannot apply to btrfs/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-lzo-c-pr_debug-deflate-lzo/20170523-110651
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   fs/btrfs/lzo.c: In function 'lzo_compress_pages':
>> fs/btrfs/lzo.c:233:27: error: expected expression before '>' token
     if (tot_out + PAGE_SIZE => tot_in) {
                              ^

vim +233 fs/btrfs/lzo.c

   227         in_page = find_get_page(mapping, start >> PAGE_SHIFT);
   228         data_in = kmap(in_page);
   229         in_len = min(bytes_left, PAGE_SIZE);
   230         }
   231
   232         /* Compression must save at least one PAGE_SIZE */
 > 233         if (tot_out + PAGE_SIZE => tot_in) {
   234                 ret = -E2BIG;
   235                 goto out;
   236         }

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                Intel Corporation
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
Timofey Titovets posted on Mon, 22 May 2017 01:32:21 +0300 as excerpted: > 2017-05-21 20:30 GMT+03:00 Roman Mamedov: >> On Sun, 21 May 2017 19:54:05 +0300 Timofey Titovets >> wrote: >> >>> Sorry, but i know about subpagesize-blocksize patch set, but i don't >>> understand where you see conflict? >>> >>> Can you explain what you mean? >>> >>> By PAGE_SIZE i mean fs cluster size in my patch set. >> >> This appears to be exactly the conflict. Subpagesize blocksize patchset >> would make it possible to use e.g. Btrfs with 4K block (cluster) size >> on a MIPS machine with 64K-sized pages. Would your checking for >> PAGE_SIZE still be correct then? > > Nope >> I guess Duncan's question was why not compare against block size from >> the get go, rather than create more places for Chandan to scour through >> to eliminate all "blocksize = pagesize" assumptions... > - If I try to export sector size to compression code, [...] > it's convert small patch in a big patch series, and you know whats > happens with big patch series... Good point... > - AFAIK in V21 subpage patch set compression on machines with 64KiB > doesn't work as expected [1]. > So, subpage patch series for compression code should reworked, > doesn't matter will my patches merged or not. I guess I'd just like to have Chandan's input here too. It sounds like it shouldn't be too much more to deal with given the size and complexity of that set already, but just to know that he's aware of the it and that it's coordinated there so as not to just be making that journey unnecessarily harder than it is already. [Not that I as a non-dev can really say anything, but it can't /hurt/ to already have his ack, when this gets reviewed by those whose decision /does/ count.] -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." 
Richard Stallman
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
2017-05-21 20:30 GMT+03:00 Roman Mamedov:
> On Sun, 21 May 2017 19:54:05 +0300
> Timofey Titovets wrote:
>
>> Sorry, but i know about subpagesize-blocksize patch set, but i don't
>> understand where you see conflict?
>>
>> Can you explain what you mean?
>>
>> By PAGE_SIZE i mean fs cluster size in my patch set.
>
> This appears to be exactly the conflict. Subpagesize blocksize patchset would
> make it possible to use e.g. Btrfs with 4K block (cluster) size on a MIPS
> machine with 64K-sized pages. Would your checking for PAGE_SIZE still be
> correct then?

Nope, the logic would be incorrect: it would not allow compressing blocks even when that would lead to a profit. But btrfs, now and since 2009, can't be mounted with a cluster size different from the page size. IMHO, since I work with code from the latest stable, this is not a big issue (see below).

>> So if and when subpage patch set would merged, PAGE_SIZE should be
>> replaced with sector size, and all continue work correctly.
>
> I guess Duncan's question was why not compare against block size from the get
> go, rather than create more places for Chandan to scour through to eliminate
> all "blocksize = pagesize" assumptions...
> --
> With respect,
> Roman

I did not want to hurt anyone, but:
- I do it like this because it's easy and safe for all the other code and for btrfs stability
  (also I don't have enough kernel/C experience to make a complex rework)
- If I tried to export the sector size to the compression code, I would have to rework a lot of code around it, not just fix 2 lines of code; it's complex, it's unsafe, it would break Chandan's patches (because the subpage blocksize work does the same things), and it would turn a small patch into a big patch series, and you know what happens with big patch series...
- I want to make compression better now, for everyone, not after several years, if possible.
  (The subpage patches have been unfinished and unmerged since 2012, i.e. around five years.)
- It should not be a problem for merging Chandan's patch set, IMHO, because this does not touch how btrfs works with disk and memory.
- AFAIK in the v21 subpage patch set, compression on machines with 64KiB pages doesn't work as expected [1].
  So the subpage patch series for the compression code should be reworked anyway, no matter whether my patches are merged or not.

1. https://www.spinics.net/lists/linux-btrfs/msg59393.html

Thanks.
--
Have a nice day,
Timofey.
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
On Sun, 21 May 2017 19:54:05 +0300 Timofey Titovets wrote:

> Sorry, but i know about subpagesize-blocksize patch set, but i don't
> understand where you see conflict?
>
> Can you explain what you mean?
>
> By PAGE_SIZE i mean fs cluster size in my patch set.

This appears to be exactly the conflict. Subpagesize blocksize patchset would
make it possible to use e.g. Btrfs with 4K block (cluster) size on a MIPS
machine with 64K-sized pages. Would your checking for PAGE_SIZE still be
correct then?

> So if and when subpage patch set would merged, PAGE_SIZE should be
> replaced with sector size, and all continue work correctly.

I guess Duncan's question was why not compare against block size from the get
go, rather than create more places for Chandan to scour through to eliminate
all "blocksize = pagesize" assumptions...

--
With respect,
Roman
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
2017-05-21 4:38 GMT+03:00 Duncan <1i5t5.dun...@cox.net>: > Timofey Titovets posted on Sat, 20 May 2017 21:30:47 +0300 as excerpted: > >> 2017-05-20 20:14 GMT+03:00 Kai Krakow: >> >>> BTW: What's the smallest block size that btrfs stores? Is it always >>> PAGE_SIZE? I'm not familiar with btrfs internals... > > Thanks for asking the question. =:^) I hadn't made the below conflicting > patch sets association until I saw it. > >> https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs > >> AFAIK btrfs works with storage and account data by PAGE_SIZEd block, >> so it's must be usefull to check if compressed size will give as at >> least one PAGE_SIZE space. > > [Not a dev, just a btrfs list regular and btrfs user. If the devs say > different...] > > While AFAIK, btrfs now saves data in page-size blocks (tho note that if > the entire file is under a block it may be inlined into metadata > depending on various limits, at least one of which is configurable)... > > There is a sub-page-size patch set that has already been thru several > revision cycles and AFAIK, remains on the roadmap for eventual merge. > > You'd need to check the list or ask the devs about current status, but > it's quite possible the current code checking for at least one byte of > compression savings, instead of the at least PAGE_SIZE bytes of savings > that at present would make more sense, is in anticipation of that patch > set being merged. > > If the sub-page-size block patch-set is no longer anticipated to be > merged in the relatively near future, it may be worth changing the > compression profit checks to PAGE_SIZE minimum, as this patch set does, > but the two patch sets clearly do conflict in concept, so merging this > one is unlikely to be wise if the other one is still on track for merging. Sorry, but i know about subpagesize-blocksize patch set, but i don't understand where you see conflict? Can you explain what you mean? By PAGE_SIZE i mean fs cluster size in my patch set. 
In theory it's also possible to use a magic constant like 4096, but then this change would be useless for PowerPC64. Also (AFAIK) most storage engines use a 4KiB+ sector size, so (IMHO) it's bad practice to use a block size smaller than 4KiB and to try to save less than 4KiB. As this is only a decision-logic change, it can't and will not break anything in the subpage patch set; it can only lead to false positives, where well-compressed data will not be saved as compressed. So if and when the subpage patch set is merged, PAGE_SIZE should be replaced with the sector size, and everything will continue to work correctly.

Thanks.
--
Have a nice day,
Timofey.
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
Timofey Titovets posted on Sat, 20 May 2017 21:30:47 +0300 as excerpted: > 2017-05-20 20:14 GMT+03:00 Kai Krakow: > >> BTW: What's the smallest block size that btrfs stores? Is it always >> PAGE_SIZE? I'm not familiar with btrfs internals... Thanks for asking the question. =:^) I hadn't made the below conflicting patch sets association until I saw it. > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs > AFAIK btrfs works with storage and account data by PAGE_SIZEd block, > so it's must be usefull to check if compressed size will give as at > least one PAGE_SIZE space. [Not a dev, just a btrfs list regular and btrfs user. If the devs say different...] While AFAIK, btrfs now saves data in page-size blocks (tho note that if the entire file is under a block it may be inlined into metadata depending on various limits, at least one of which is configurable)... There is a sub-page-size patch set that has already been thru several revision cycles and AFAIK, remains on the roadmap for eventual merge. You'd need to check the list or ask the devs about current status, but it's quite possible the current code checking for at least one byte of compression savings, instead of the at least PAGE_SIZE bytes of savings that at present would make more sense, is in anticipation of that patch set being merged. If the sub-page-size block patch-set is no longer anticipated to be merged in the relatively near future, it may be worth changing the compression profit checks to PAGE_SIZE minimum, as this patch set does, but the two patch sets clearly do conflict in concept, so merging this one is unlikely to be wise if the other one is still on track for merging. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." 
Richard Stallman
[PATCH v3 2/2] Btrfs: compression must free at least PAGE_SIZE
Btrfs already skip store of data where compression didn't free at least one byte. So make logic better and make check that compression free at least one PAGE_SIZE, because in another case it useless to store this data compressed Signed-off-by: Timofey Titovets--- fs/btrfs/lzo.c | 5 - fs/btrfs/zlib.c | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c index bd0b0938..39678499 100644 --- a/fs/btrfs/lzo.c +++ b/fs/btrfs/lzo.c @@ -229,8 +229,11 @@ static int lzo_compress_pages(struct list_head *ws, in_len = min(bytes_left, PAGE_SIZE); } - if (tot_out > tot_in) + /* Compression must save at least one PAGE_SIZE */ + if (tot_out + PAGE_SIZE > tot_in) { + ret = -E2BIG; goto out; + } /* store the size of all chunks of compressed data */ cpage_out = kmap(pages[0]); diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c index 135b1082..11e117b5 100644 --- a/fs/btrfs/zlib.c +++ b/fs/btrfs/zlib.c @@ -191,7 +191,8 @@ static int zlib_compress_pages(struct list_head *ws, goto out; } - if (workspace->strm.total_out >= workspace->strm.total_in) { + /* Compression must save at least one PAGE_SIZE */ + if (workspace->strm.total_out + PAGE_SIZE > workspace->strm.total_in) { ret = -E2BIG; goto out; } -- 2.13.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/2] Btrfs: compression fixes
First patch: fix a copy-paste typo in a debug message in lzo.c; lzo is not deflate

Second patch: force btrfs to not store data as compressed if compression will not free at least one PAGE_SIZE, because it's useless in terms of storage and of reading data from disk; as a result, performance suffers

Changes since v1:
- Merge the patches for zlib and lzo into one
- Sync the check logic for zlib and lzo
- Check the profit after all data are compressed (not while compressing)

Changes since v2:
- Fix the comparison logic; it's enough if compressed size + PAGE_SIZE is not bigger than the input data size

Timofey Titovets (2):
  Btrfs: lzo.c pr_debug() deflate->lzo
  Btrfs: compression must free at least PAGE_SIZE

 fs/btrfs/lzo.c  | 7 +--
 fs/btrfs/zlib.c | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

--
2.13.0
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
2017-05-20 20:14 GMT+03:00 Kai Krakow: > Am Sat, 20 May 2017 19:49:53 +0300 > schrieb Timofey Titovets : > >> Btrfs already skip store of data where compression didn't free at >> least one byte. So make logic better and make check that compression >> free at least one PAGE_SIZE, because in another case it useless to >> store this data compressed >> >> Signed-off-by: Timofey Titovets >> --- >> fs/btrfs/lzo.c | 5 - >> fs/btrfs/zlib.c | 3 ++- >> 2 files changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c >> index bd0b0938..7f38bc3c 100644 >> --- a/fs/btrfs/lzo.c >> +++ b/fs/btrfs/lzo.c >> @@ -229,8 +229,11 @@ static int lzo_compress_pages(struct list_head >> *ws, in_len = min(bytes_left, PAGE_SIZE); >> } >> >> - if (tot_out > tot_in) >> + /* Compression must save at least one PAGE_SIZE */ >> + if (tot_out + PAGE_SIZE => tot_in) { > > Shouldn't this be ">" instead of ">=" (btw, I don't think => works)... (D'oh, typo, thanks) > Given the case that tot_in is 8192, and tot_out is 4096, we saved a > complete page but 4096+4096 would still be equal to 8192. > > The former logic only pretended that there is no point in compression > if we saved 0 bytes. Before my changes, lzo: if compressed data use same or less space in compare to not compressed -> save compressed version zlib: if profit of compression will not save at least one byte -> not compress that data zlib logic is right, so i just did copy-paste that, but after my changes this case where compressed size == input size doesn't valid (i.e. it's ok then tot_out + PAGE_SIZE == tot_in), so you right ">=" -> ">" i will fix that and resend patch set, thanks. > BTW: What's the smallest block size that btrfs stores? Is it always > PAGE_SIZE? I'm not familiar with btrfs internals... 
https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs
AFAIK btrfs stores and accounts data in PAGE_SIZEd blocks, so it must be useful
to check whether compression will save us at least one PAGE_SIZE of space.

>
>> + ret = -E2BIG;
>> goto out;
>> + }
>>
>> /* store the size of all chunks of compressed data */
>> cpage_out = kmap(pages[0]);
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index 135b1082..2b04259b 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -191,7 +191,8 @@ static int zlib_compress_pages(struct list_head
>> *ws, goto out;
>> }
>>
>> - if (workspace->strm.total_out >= workspace->strm.total_in) {
>> + /* Compression must save at least one PAGE_SIZE */
>> + if (workspace->strm.total_out + PAGE_SIZE >=
>> workspace->strm.total_in) { ret = -E2BIG;
>
> Same as above...
>
>> goto out;
>> }
>> --
>> 2.13.0
>
>
> --
> Regards,
> Kai

--
Have a nice day,
Timofey.
Re: [PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
Am Sat, 20 May 2017 19:49:53 +0300 schrieb Timofey Titovets: > Btrfs already skip store of data where compression didn't free at > least one byte. So make logic better and make check that compression > free at least one PAGE_SIZE, because in another case it useless to > store this data compressed > > Signed-off-by: Timofey Titovets > --- > fs/btrfs/lzo.c | 5 - > fs/btrfs/zlib.c | 3 ++- > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c > index bd0b0938..7f38bc3c 100644 > --- a/fs/btrfs/lzo.c > +++ b/fs/btrfs/lzo.c > @@ -229,8 +229,11 @@ static int lzo_compress_pages(struct list_head > *ws, in_len = min(bytes_left, PAGE_SIZE); > } > > - if (tot_out > tot_in) > + /* Compression must save at least one PAGE_SIZE */ > + if (tot_out + PAGE_SIZE => tot_in) { Shouldn't this be ">" instead of ">=" (btw, I don't think => works)... Given the case that tot_in is 8192, and tot_out is 4096, we saved a complete page but 4096+4096 would still be equal to 8192. The former logic only pretended that there is no point in compression if we saved 0 bytes. BTW: What's the smallest block size that btrfs stores? Is it always PAGE_SIZE? I'm not familiar with btrfs internals... > + ret = -E2BIG; > goto out; > + } > > /* store the size of all chunks of compressed data */ > cpage_out = kmap(pages[0]); > diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c > index 135b1082..2b04259b 100644 > --- a/fs/btrfs/zlib.c > +++ b/fs/btrfs/zlib.c > @@ -191,7 +191,8 @@ static int zlib_compress_pages(struct list_head > *ws, goto out; > } > > - if (workspace->strm.total_out >= workspace->strm.total_in) { > + /* Compression must save at least one PAGE_SIZE */ > + if (workspace->strm.total_out + PAGE_SIZE >= > workspace->strm.total_in) { ret = -E2BIG; Same as above... > goto out; > } > -- > 2.13.0 -- Regards, Kai Replies to list-only preferred. 
[PATCH v2 0/2] Btrfs: compression fixes
First patch: fix a copy-paste typo in a debug message in lzo.c; lzo is not deflate

Second patch: force btrfs to not store data as compressed if compression will not free at least one PAGE_SIZE, because it's useless in terms of storage and of reading data from disk; as a result, performance suffers

Changes since v1:
- Merge the patches for zlib and lzo into one
- Sync the check logic for zlib and lzo
- Check the profit after all data are compressed (not while compressing)

Timofey Titovets (2):
  Btrfs: lzo.c pr_debug() deflate->lzo
  Btrfs: compression must free at least PAGE_SIZE

 fs/btrfs/lzo.c  | 7 +--
 fs/btrfs/zlib.c | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

--
2.13.0
[PATCH v2 2/2] Btrfs: compression must free at least PAGE_SIZE
Btrfs already skip store of data where compression didn't free at least one byte. So make logic better and make check that compression free at least one PAGE_SIZE, because in another case it useless to store this data compressed Signed-off-by: Timofey Titovets--- fs/btrfs/lzo.c | 5 - fs/btrfs/zlib.c | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c index bd0b0938..7f38bc3c 100644 --- a/fs/btrfs/lzo.c +++ b/fs/btrfs/lzo.c @@ -229,8 +229,11 @@ static int lzo_compress_pages(struct list_head *ws, in_len = min(bytes_left, PAGE_SIZE); } - if (tot_out > tot_in) + /* Compression must save at least one PAGE_SIZE */ + if (tot_out + PAGE_SIZE => tot_in) { + ret = -E2BIG; goto out; + } /* store the size of all chunks of compressed data */ cpage_out = kmap(pages[0]); diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c index 135b1082..2b04259b 100644 --- a/fs/btrfs/zlib.c +++ b/fs/btrfs/zlib.c @@ -191,7 +191,8 @@ static int zlib_compress_pages(struct list_head *ws, goto out; } - if (workspace->strm.total_out >= workspace->strm.total_in) { + /* Compression must save at least one PAGE_SIZE */ + if (workspace->strm.total_out + PAGE_SIZE >= workspace->strm.total_in) { ret = -E2BIG; goto out; } -- 2.13.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] Btrfs: compression fixes
First patch fixes a copy-paste typo in a debug message in lzo.c; lzo is not deflate

Second and third patches force btrfs not to compress data if compression will not free at least one PAGE_SIZE, because it's useless in terms of storage and of reading data from disk; as a result, performance suffers

Timofey Titovets (3):
  Btrfs: lzo.c pr_debug() deflate->lzo
  Btrfs: lzo compression must free at least PAGE_SIZE
  Btrfs: zlib compression must free at least PAGE_SIZE

 fs/btrfs/lzo.c  | 4 ++--
 fs/btrfs/zlib.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

--
2.13.0
[PATCH v5 2/2] generic: make 17[1-4] work well when btrfs compression is enabled
When enabling btrfs compression, the original code cannot fill the fs correctly, so here we introduce _fill_fs() in common/populate, which keeps creating and writing files until an enospc error occurs. Note _fill_fs is copied from tests/generic/256, but with some minor modifications.

Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com>
---
V2: In common/, I didn't find an existing function suitable for these 4 test cases to fill the fs, so I still use _pwrite_byte() with a big enough file length to fill the fs. Note, for btrfs, metadata space is still not full, only data space is full, but that's OK for these 4 test cases.

All these 4 cases pass on xfs and btrfs (without compression); if btrfs has compression enabled, these 4 cases will fail with a false enospc error. I have sent kernel patches to fix this bug.

V3: Introduce _fill_fs in common/rc to fill the fs.
V4: Fix some issues suggested by Eryu and Darrick.
V5: Put _fill_fs() in common/populate.
---
 common/populate   | 67 +++
 tests/generic/171 |  4 ++--
 tests/generic/172 |  5 +++--
 tests/generic/173 |  4 ++--
 tests/generic/174 |  4 ++--
 tests/generic/256 | 66 +-
 6 files changed, 82 insertions(+), 68 deletions(-)

diff --git a/common/populate b/common/populate
index 3b9b531..78e9809 100644
--- a/common/populate
+++ b/common/populate
@@ -539,3 +539,70 @@ _scratch_fuzz_test() {
 	(find "${SCRATCH_MNT}/test.1/" -type f -size -1048576k -print0 | xargs -0 cat) >/dev/null 2>&1
 }
+
+# Fill a file system by repeatedly creating files in the given folder
+# starting with the given file size. Files are reduced in size when
+# they can no longer fit until no more files can be created.
+_fill_fs()
+{
+	local file_size=$1
+	local dir=$2
+	local block_size=$3
+	local switch_user=$4
+	local file_count=1
+	local bytes_written=0
+	local use_falloc=1;
+
+	if [ $# -ne 4 ]; then
+		echo "Usage: _fill_fs filesize dir blocksize switch_user"
+		exit 1
+	fi
+
+	if [ $switch_user -eq 0 ]; then
+		mkdir -p $dir
+	else
+		_user_do "mkdir -p $dir"
+	fi
+	if [ ! -d $dir ]; then
+		return 0;
+	fi
+
+	testio=`$XFS_IO_PROG -F -fc "falloc 0 $block_size" $dir/$$.xfs_io 2>&1`
+	echo $testio | grep -q "not found" && use_falloc=0
+	echo $testio | grep -q "Operation not supported" && use_falloc=0
+
+	if [ $file_size -lt $block_size ]; then
+		file_size=$block_size
+	fi
+
+	while [ $file_size -ge $block_size ]; do
+		bytes_written=0
+		if [ $switch_user -eq 0 ]; then
+			if [ $use_falloc -eq 0 ]; then
+				$XFS_IO_PROG -fc "pwrite -b 8388608 0 $file_size" \
+					$dir/$file_count
+			else
+				$XFS_IO_PROG -fc "falloc 0 $file_size" \
+					$dir/$file_count
+			fi
+		else
+			if [ $use_falloc -eq 0 ]; then
+				_user_do "$XFS_IO_PROG -f -c \"pwrite -b 8388608 0 \
+					$file_size\" $dir/$file_count"
+			else
+				_user_do "$XFS_IO_PROG -f -c \"falloc 0 \
+					$file_size\" $dir/$file_count"
+			fi
+		fi
+
+		if [ -f $dir/$file_count ]; then
+			bytes_written=$(stat -c '%s' $dir/$file_count)
+		fi
+
+		# If there was no room to make the file, then divide it in
+		# half, and keep going
+		if [ $bytes_written -lt $file_size ]; then
+			file_size=$((file_size / 2))
+		fi
+		file_count=$((file_count + 1))
+	done
+}
diff --git a/tests/generic/171 b/tests/generic/171
index a69f798..d0cd192 100755
--- a/tests/generic/171
+++ b/tests/generic/171
@@ -38,6 +38,7 @@ _cleanup()
 # get standard environment, filters and checks
 . ./common/rc
+. ./common/populate
 . ./common/filter
 . ./common/attr
 . ./common/reflink
@@ -76,8 +77,7 @@ sync
 echo "Allocate the rest of the space"
 nr_free=$(stat -f -c '%f' $testdir)
-touch $testdir/file0 $testdir/file1
-_pwrite_byte 0x61 0 $((blksz * nr_free)) $testdir/eat_my_space >> $seqres.full 2>&1
+_fill_fs $((nr_free * blksz)) $testdir/space $blksz 0 >> $seqres.full 2>&1
 sync
 echo "CoW the big file"
diff --git a/tests/generic/172 b/tests/generic/172
index 8192290..d943e64 100755
--- a/tests/generic/172
+++ b/tests/g
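The core of _fill_fs above is its halving loop: keep writing files of file_size until a write comes up short, then halve file_size and continue until it drops below block_size. The loop can be exercised in isolation; in this sketch, write_file is a hypothetical stand-in for xfs_io that simulates a 10000-byte device, so no real filesystem is involved:

```shell
#!/bin/bash
# Simulate _fill_fs's size-halving loop against a fake device with
# 10000 bytes of capacity and a 512-byte block size. write_file
# (hypothetical) reports how many bytes actually fit, the way a
# short pwrite would on a nearly full filesystem.
capacity=10000
used=0

write_file() {	# $1 = requested size; sets $bytes_written
	local want=$1
	local room=$((capacity - used))
	if [ "$want" -le "$room" ]; then
		bytes_written=$want
	else
		bytes_written=$room
	fi
	used=$((used + bytes_written))
}

file_size=4096
block_size=512
file_count=0

while [ "$file_size" -ge "$block_size" ]; do
	write_file "$file_size"
	# A short write means the fs is (nearly) full: halve and
	# retry, exactly as _fill_fs does.
	if [ "$bytes_written" -lt "$file_size" ]; then
		file_size=$((file_size / 2))
	fi
	file_count=$((file_count + 1))
done

echo "files=$file_count used=$used"	# prints: files=6 used=10000
```

Two full-size files fit, the third is a short write that consumes the remaining space, and the size then halves down past block_size, guaranteeing termination even on a compressing filesystem where write sizes and on-disk consumption diverge.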
Re: [PATCH v4] generic: make 17[1-4] work well when btrfs compression is enabled
On Tue, Nov 01, 2016 at 04:49:34PM +0800, Wang Xiaoguang wrote: > hi Darrick, > > Common/populate needs xfs_io supports falloc and fpunch, > so I didn't put _fill_fs() in common/populate. Tests will include common/rc first, and so pick up the functionality _fill_fs requires before it's included from common/populate. So there is no reason to put it in common/rc. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] generic: make 17[1-4] work well when btrfs compression is enabled
hi Darrick,

Common/populate needs xfs_io to support falloc and fpunch, so I didn't put _fill_fs() in common/populate.

Regards,
Xiaoguang Wang

On 11/01/2016 04:45 PM, Wang Xiaoguang wrote:
> When enabling btrfs compression, original codes can not fill fs
> correctly, here we introduce _fill_fs() in common/rc, which'll keep
> creating and writing files until enospc error occurs. Note _fill_fs
> is copied from tests/generic/256, but with some minor modifications.
>
> Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com>
> [...]
[PATCH v4] generic: make 17[1-4] work well when btrfs compression is enabled
When enabling btrfs compression, the original code cannot fill the fs correctly, so here we introduce _fill_fs() in common/rc, which keeps creating and writing files until an enospc error occurs. Note _fill_fs is copied from tests/generic/256, but with some minor modifications.

Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com>
---
V2: In common/, I didn't find an existing function suitable for these 4 test cases to fill the fs, so I still use _pwrite_byte() with a big enough file length to fill the fs. Note, for btrfs, metadata space is still not full, only data space is full, but that's OK for these 4 test cases.

All these 4 cases pass on xfs and btrfs (without compression); if btrfs has compression enabled, these 4 cases will fail with a false enospc error. I have sent kernel patches to fix this bug.

V3: Introduce _fill_fs in common/rc to fill the fs.
V4: Fix some issues suggested by Eryu and Darrick.
---
 common/rc         | 68 +++
 tests/generic/171 |  3 +--
 tests/generic/172 |  4 ++--
 tests/generic/173 |  3 +--
 tests/generic/174 |  3 +--
 tests/generic/256 | 65
 6 files changed, 78 insertions(+), 68 deletions(-)

diff --git a/common/rc b/common/rc
index 7a9fc90..7628a0e 100644
--- a/common/rc
+++ b/common/rc
@@ -4003,6 +4003,74 @@ _require_xfs_mkfs_without_validation()
 	fi
 }
+
+# Fill a file system by repeatedly creating files in the given folder
+# starting with the given file size. Files are reduced in size when
+# they can no longer fit until no more files can be created.
+_fill_fs()
+{
+	local file_size=$1
+	local dir=$2
+	local block_size=$3
+	local switch_user=$4
+	local file_count=1
+	local bytes_written=0
+	local use_falloc=1;
+
+	if [ $# -ne 4 ]; then
+		echo "Usage: _fill_fs filesize dir blocksize switch_user"
+		exit 1
+	fi
+
+	if [ $switch_user -eq 0 ]; then
+		mkdir -p $dir
+	else
+		_user_do "mkdir -p $dir"
+	fi
+	if [ ! -d $dir ]; then
+		return 0;
+	fi
+
+	testio=`$XFS_IO_PROG -F -fc "falloc 0 $block_size" $dir/$$.xfs_io 2>&1`
+	echo $testio | grep -q "not found" && use_falloc=0
+	echo $testio | grep -q "Operation not supported" && use_falloc=0
+
+	if [ $file_size -lt $block_size ]; then
+		file_size=$block_size
+	fi
+
+	while [ $file_size -ge $block_size ]; do
+		bytes_written=0
+		if [ $switch_user -eq 0 ]; then
+			if [ $use_falloc -eq 0 ]; then
+				$XFS_IO_PROG -fc "pwrite -b 8388608 0 $file_size" \
+					$dir/$file_count
+			else
+				$XFS_IO_PROG -fc "falloc 0 $file_size" \
+					$dir/$file_count
+			fi
+		else
+			if [ $use_falloc -eq 0 ]; then
+				_user_do "$XFS_IO_PROG -f -c \"pwrite -b 8388608 0 \
+					$file_size\" $dir/$file_count"
+			else
+				_user_do "$XFS_IO_PROG -f -c \"falloc 0 \
+					$file_size\" $dir/$file_count"
+			fi
+		fi
+
+		if [ -f $dir/$file_count ]; then
+			bytes_written=$(stat -c '%s' $dir/$file_count)
+		fi
+
+		# If there was no room to make the file, then divide it in
+		# half, and keep going
+		if [ $bytes_written -lt $file_size ]; then
+			file_size=$((file_size / 2))
+		fi
+		file_count=$((file_count + 1))
+	done
+}
+
 init_rc
diff --git a/tests/generic/171 b/tests/generic/171
index a69f798..906e4db 100755
--- a/tests/generic/171
+++ b/tests/generic/171
@@ -76,8 +76,7 @@ sync
 echo "Allocate the rest of the space"
 nr_free=$(stat -f -c '%f' $testdir)
-touch $testdir/file0 $testdir/file1
-_pwrite_byte 0x61 0 $((blksz * nr_free)) $testdir/eat_my_space >> $seqres.full 2>&1
+_fill_fs $((nr_free * blksz)) $testdir/space $blksz 0 >> $seqres.full 2>&1
 sync
 echo "CoW the big file"
diff --git a/tests/generic/172 b/tests/generic/172
index 8192290..f6fcdc9 100755
--- a/tests/generic/172
+++ b/tests/generic/172
@@ -57,6 +57,7 @@
 testdir=$SCRATCH_MNT/test-$seq
 mkdir $testdir
 echo "Reformat with appropriate size"
+blksz="$(get_block_size $testdir)"
 umount $SCRATCH_MNT
 file_size=$((168
Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled
On Fri, Oct 28, 2016 at 03:05:55PM +0800, Wang Xiaoguang wrote:
> hi,
>
> On 10/27/2016 07:25 PM, Eryu Guan wrote:
> > On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote:
> > > When enabling btrfs compression, original codes can not fill fs
> > > correctly, here we introduce _fill_fs() in common/rc, which'll keep
> > > creating and writing files until enospc error occurs. Note _fill_fs
> > > is copied from tests/generic/256, but with some minor modifications.
> > >
> > > Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com>
> > Looks fine to me overall, generic/17[1-4] and generic/256 passed on xfs,
> > btrfs and btrfs with compress. But I'd like Darrick to review it as well :)
> Could you please give me your btrfs's kernel version?
> When enabling btrfs compression run these 4 test cases, I often got
> enospc error, it seems that you didn't run into enospc errors.

I'm using 4.9-rc1 kernel, with both test_dev and scratch_dev 15G in size, btrfs-progs v4.6

Thanks,
Eryu
Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled
On Fri, Oct 28, 2016 at 03:00:29PM +0800, Wang Xiaoguang wrote: > hi, > > On 10/28/2016 01:13 AM, Darrick J. Wong wrote: > > On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote: > > > When enabling btrfs compression, original codes can not fill fs > > > correctly, here we introduce _fill_fs() in common/rc, which'll keep > > > creating and writing files until enospc error occurs. Note _fill_fs > > > is copied from tests/generic/256, but with some minor modifications. > > > > > > Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com> > > > --- > > > V2: In common/, I did't find an existing function suitable for > > > these 4 test cases to fill fs, so I still use _pwrite_byte() with > > > a big enough file length fo fill fs. Note, for btrfs, metadata space > > > still is not full, only data space is full, but it's OK for these > > > 4 test cases. > > > > > > All these 4 cases pass in xfs and btrfs(without compression), if > > > btrfs has compression enabled, these 4 cases will fail for false > > > enospc error, I have sent kernel patches to fix this bug. > > > > > > V3: Introduce _fill_fs in common/rc to fill fs. > > > --- > > > common/rc | 50 ++ > > > tests/generic/171 | 4 +--- > > > tests/generic/172 | 4 ++-- > > > tests/generic/173 | 4 +--- > > > tests/generic/174 | 4 +--- > > > tests/generic/256 | 65 > > > +-- > > > 6 files changed, 60 insertions(+), 71 deletions(-) > > > > > > diff --git a/common/rc b/common/rc > > > index 7a9fc90..0e1ac5d 100644 > > > --- a/common/rc > > > +++ b/common/rc > > Would you mind moving this to common/populate, where I've been (slowly) > > collecting all the "create stuff inside the filesystem" routines? > OK, sure. > > > > (It's part of a slow-moving effort to declutter common/rc.) > > > > > @@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation() > > > fi > > > } > > > +# Fill a file system by repeatedly creating files in the given folder > > > +# starting with the given file size. 
Files are reduced in size when > > > +# they can no longer fit until no more files can be created. > > > +_fill_fs() > > > +{ > > > + local file_size=$1 > > > + local dir=$2 > > > + local block_size=$3 > > > + local switch_user=$4 > > > + local file_count=1 > > > + local bytes_written=0 > > > + > > > + if [ $# -ne 4 ]; then > > > + echo "Usage: _fill_fs filesize dir blocksize" > > > + exit 1 > > > + fi > > > + > > > + if [ $switch_user -eq 0 ]; then > > > + mkdir -p $dir > > > + else > > > + _user_do "mkdir -p $dir" > > > + fi > > > + > > > + if [ $file_size -lt $block_size ]; then > > > + $file_size = $block_size > > > + fi > > > + > > > + while [ $file_size -ge $block_size ]; do > > > + bytes_written=0 > > > + if [ $switch_user -eq 0 ]; then > > > + $XFS_IO_PROG -f -c "pwrite 0 $file_size" \ > > > + $dir/$file_count > /dev/null > > If you want to speed this up some more you could pass "-b 8388608" in > > the pwrite string so that xfs_io will allocate an 8MB memory buffer > > internally when writing out data. > > > > $XFS_IO_PROG -f -c "pwrite -b 8388608 0 $file_size" $dir/$file_count > Thanks for this info, I'll use it. > > > > > + else > > > + _user_do "$XFS_IO_PROG -f -c \"pwrite 0 $file_size\" \ > > > + $dir/$file_count > /dev/null" > > Also, why redirect to /dev/null? I think helper functions generally let > > the individual test decide where the stdout & stderr output ultimately > > end up. Most tests seem to dump to $seqres.full or /dev/null, but we > > should let the test decide where potentially useful diagnostic > > information ends up. > Agree :) > > > /me also wonders if falloc would be a faster way to consume disk blocks > > than pwrite, but ... > Oh, it is. Falloc is not affected by compression. One problem with falloc is that not all filesystems support it, e.g. ext3 and ext2. So you have to fall back to pwrite if falloc is not supported. 
Thanks,
Eryu
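The fallback Eryu describes can be sketched as follows. Here alloc_file is a hypothetical helper, and fallocate(1) stands in for the xfs_io falloc probe the patch actually uses; the dd branch covers filesystems such as ext2/ext3 that lack preallocation:

```shell
#!/bin/bash
# Try preallocation first; if the filesystem or tool rejects it,
# fall back to writing real data. The fallback writes zeros, which
# a compressing filesystem would squash -- the reason the patch
# prefers falloc: per the thread, falloc is not affected by
# compression, so preallocated extents reliably consume space.
alloc_file() {
	local path=$1 size=$2
	if ! fallocate -l "$size" "$path" 2>/dev/null; then
		# No falloc support (or no fallocate binary): write
		# real bytes instead, in block-sized chunks.
		dd if=/dev/zero of="$path" bs=4096 \
			count=$((size / 4096)) status=none
	fi
}

tmp=$(mktemp -d)
alloc_file "$tmp/testfile" 8192
size_out=$(stat -c '%s' "$tmp/testfile")
echo "allocated $size_out bytes"	# prints: allocated 8192 bytes
rm -r "$tmp"
```

Either branch leaves a file of the requested size, which is the property _fill_fs depends on when it compares bytes_written against file_size.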
Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled
hi, On 10/27/2016 07:25 PM, Eryu Guan wrote: On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote: When enabling btrfs compression, original codes can not fill fs correctly, here we introduce _fill_fs() in common/rc, which'll keep creating and writing files until enospc error occurs. Note _fill_fs is copied from tests/generic/256, but with some minor modifications. Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com> Looks fine to me overall, generic/17[1-4] and generic/256 passed on xfs, btrfs and btrfs with compress. But I'd like Darrick to review it as well :) Could you please give me your btrfs's kernel version? When enabling btrfs compression run these 4 test cases, I often got enospc error, it seems that you didn't run into enospc errors. --- V2: In common/, I did't find an existing function suitable for these 4 test cases to fill fs, so I still use _pwrite_byte() with a big enough file length fo fill fs. Note, for btrfs, metadata space still is not full, only data space is full, but it's OK for these 4 test cases. All these 4 cases pass in xfs and btrfs(without compression), if btrfs has compression enabled, these 4 cases will fail for false enospc error, I have sent kernel patches to fix this bug. V3: Introduce _fill_fs in common/rc to fill fs. --- common/rc | 50 ++ tests/generic/171 | 4 +--- tests/generic/172 | 4 ++-- tests/generic/173 | 4 +--- tests/generic/174 | 4 +--- tests/generic/256 | 65 +-- 6 files changed, 60 insertions(+), 71 deletions(-) diff --git a/common/rc b/common/rc index 7a9fc90..0e1ac5d 100644 --- a/common/rc +++ b/common/rc @@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation() fi } +# Fill a file system by repeatedly creating files in the given folder +# starting with the given file size. Files are reduced in size when +# they can no longer fit until no more files can be created. 
+_fill_fs()
+{
+	local file_size=$1
+	local dir=$2
+	local block_size=$3
+	local switch_user=$4
+	local file_count=1
+	local bytes_written=0
+
+	if [ $# -ne 4 ]; then
+		echo "Usage: _fill_fs filesize dir blocksize"

The usage info here is wrong, missing the "switch user" argument.

Thanks for review, I'll fix it in v4.

Regards,
Xiaoguang Wang

Thanks,
Eryu
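The v4 revision fixes this by naming all four parameters in the usage string. A minimal sketch of the corrected guard, with a hypothetical _fill_fs_args wrapper standing in for the check at the top of _fill_fs (the real helper uses exit rather than return):

```shell
#!/bin/bash
# Corrected argument guard: four positional parameters, every one
# named in the usage message (v3 omitted switch_user).
_fill_fs_args() {
	if [ $# -ne 4 ]; then
		echo "Usage: _fill_fs filesize dir blocksize switch_user" >&2
		return 1
	fi
	return 0
}

_fill_fs_args 1048576 /tmp/fill 4096 0 && echo "accepted"	# prints: accepted
_fill_fs_args 1048576 /tmp/fill 4096 2>/dev/null || echo "rejected"	# prints: rejected
```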
Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled
hi, On 10/28/2016 01:13 AM, Darrick J. Wong wrote: On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote: When enabling btrfs compression, original codes can not fill fs correctly, here we introduce _fill_fs() in common/rc, which'll keep creating and writing files until enospc error occurs. Note _fill_fs is copied from tests/generic/256, but with some minor modifications. Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com> --- V2: In common/, I did't find an existing function suitable for these 4 test cases to fill fs, so I still use _pwrite_byte() with a big enough file length fo fill fs. Note, for btrfs, metadata space still is not full, only data space is full, but it's OK for these 4 test cases. All these 4 cases pass in xfs and btrfs(without compression), if btrfs has compression enabled, these 4 cases will fail for false enospc error, I have sent kernel patches to fix this bug. V3: Introduce _fill_fs in common/rc to fill fs. --- common/rc | 50 ++ tests/generic/171 | 4 +--- tests/generic/172 | 4 ++-- tests/generic/173 | 4 +--- tests/generic/174 | 4 +--- tests/generic/256 | 65 +-- 6 files changed, 60 insertions(+), 71 deletions(-) diff --git a/common/rc b/common/rc index 7a9fc90..0e1ac5d 100644 --- a/common/rc +++ b/common/rc Would you mind moving this to common/populate, where I've been (slowly) collecting all the "create stuff inside the filesystem" routines? OK, sure. (It's part of a slow-moving effort to declutter common/rc.) @@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation() fi } +# Fill a file system by repeatedly creating files in the given folder +# starting with the given file size. Files are reduced in size when +# they can no longer fit until no more files can be created. 
+_fill_fs() +{ + local file_size=$1 + local dir=$2 + local block_size=$3 + local switch_user=$4 + local file_count=1 + local bytes_written=0 + + if [ $# -ne 4 ]; then + echo "Usage: _fill_fs filesize dir blocksize" + exit 1 + fi + + if [ $switch_user -eq 0 ]; then + mkdir -p $dir + else + _user_do "mkdir -p $dir" + fi + + if [ $file_size -lt $block_size ]; then + $file_size = $block_size + fi + + while [ $file_size -ge $block_size ]; do + bytes_written=0 + if [ $switch_user -eq 0 ]; then + $XFS_IO_PROG -f -c "pwrite 0 $file_size" \ + $dir/$file_count > /dev/null If you want to speed this up some more you could pass "-b 8388608" in the pwrite string so that xfs_io will allocate an 8MB memory buffer internally when writing out data. $XFS_IO_PROG -f -c "pwrite -b 8388608 0 $file_size" $dir/$file_count Thanks for this info, I'll use it. + else + _user_do "$XFS_IO_PROG -f -c \"pwrite 0 $file_size\" \ + $dir/$file_count > /dev/null" Also, why redirect to /dev/null? I think helper functions generally let the individual test decide where the stdout & stderr output ultimately end up. Most tests seem to dump to $seqres.full or /dev/null, but we should let the test decide where potentially useful diagnostic information ends up. Agree :) /me also wonders if falloc would be a faster way to consume disk blocks than pwrite, but ... Oh, it is. Falloc is not affected by compression. 
+ fi + + if [ -f $dir/$file_count ]; then + bytes_written=$(stat -c '%s' $dir/$file_count) + fi + + # If there was no room to make the file, then divide it in + # half, and keep going + if [ $bytes_written -lt $file_size ]; then + file_size=$((file_size / 2)) + fi + file_count=$((file_count + 1)) + done +} + init_rc diff --git a/tests/generic/171 b/tests/generic/171 index a69f798..fde8ef2 100755 --- a/tests/generic/171 +++ b/tests/generic/171 @@ -75,9 +75,7 @@ _cp_reflink $testdir/bigfile $testdir/clonefile sync echo "Allocate the rest of the space" -nr_free=$(stat -f -c '%f' $testdir) -touch $testdir/file0 $testdir/file1 -_pwrite_byte 0x61 0 $((blksz * nr_free)) $testdir/eat_my_space >> $seqres.full 2>&1 +_fill_fs $((32 * 1024 * 1024)) $testdir/space $blksz 0 >> $seqres.full 2>&1 Given that _fill_fs has exponential backoff to try to really fill the fs, why not call it with $((blksz * nr_free)) instead of a fixed 32MB? OK, I'll send v4 soon. Thanks for all
Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled
On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote: > When enabling btrfs compression, original codes can not fill fs > correctly, here we introduce _fill_fs() in common/rc, which'll keep > creating and writing files until enospc error occurs. Note _fill_fs > is copied from tests/generic/256, but with some minor modifications. > > Signed-off-by: Wang Xiaoguang <wangxg.f...@cn.fujitsu.com> > --- > V2: In common/, I did't find an existing function suitable for > these 4 test cases to fill fs, so I still use _pwrite_byte() with > a big enough file length fo fill fs. Note, for btrfs, metadata space > still is not full, only data space is full, but it's OK for these > 4 test cases. > > All these 4 cases pass in xfs and btrfs(without compression), if > btrfs has compression enabled, these 4 cases will fail for false > enospc error, I have sent kernel patches to fix this bug. > > V3: Introduce _fill_fs in common/rc to fill fs. > --- > common/rc | 50 ++ > tests/generic/171 | 4 +--- > tests/generic/172 | 4 ++-- > tests/generic/173 | 4 +--- > tests/generic/174 | 4 +--- > tests/generic/256 | 65 > +-- > 6 files changed, 60 insertions(+), 71 deletions(-) > > diff --git a/common/rc b/common/rc > index 7a9fc90..0e1ac5d 100644 > --- a/common/rc > +++ b/common/rc Would you mind moving this to common/populate, where I've been (slowly) collecting all the "create stuff inside the filesystem" routines? (It's part of a slow-moving effort to declutter common/rc.) > @@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation() > fi > } > > +# Fill a file system by repeatedly creating files in the given folder > +# starting with the given file size. Files are reduced in size when > +# they can no longer fit until no more files can be created. 
> +_fill_fs() > +{ > + local file_size=$1 > + local dir=$2 > + local block_size=$3 > + local switch_user=$4 > + local file_count=1 > + local bytes_written=0 > + > + if [ $# -ne 4 ]; then > + echo "Usage: _fill_fs filesize dir blocksize" > + exit 1 > + fi > + > + if [ $switch_user -eq 0 ]; then > + mkdir -p $dir > + else > + _user_do "mkdir -p $dir" > + fi > + > + if [ $file_size -lt $block_size ]; then > + $file_size = $block_size > + fi > + > + while [ $file_size -ge $block_size ]; do > + bytes_written=0 > + if [ $switch_user -eq 0 ]; then > + $XFS_IO_PROG -f -c "pwrite 0 $file_size" \ > + $dir/$file_count > /dev/null If you want to speed this up some more you could pass "-b 8388608" in the pwrite string so that xfs_io will allocate an 8MB memory buffer internally when writing out data. $XFS_IO_PROG -f -c "pwrite -b 8388608 0 $file_size" $dir/$file_count > + else > + _user_do "$XFS_IO_PROG -f -c \"pwrite 0 $file_size\" \ > + $dir/$file_count > /dev/null" Also, why redirect to /dev/null? I think helper functions generally let the individual test decide where the stdout & stderr output ultimately end up. Most tests seem to dump to $seqres.full or /dev/null, but we should let the test decide where potentially useful diagnostic information ends up. /me also wonders if falloc would be a faster way to consume disk blocks than pwrite, but ... 
> + fi > + > + if [ -f $dir/$file_count ]; then > + bytes_written=$(stat -c '%s' $dir/$file_count) > + fi > + > + # If there was no room to make the file, then divide it in > + # half, and keep going > + if [ $bytes_written -lt $file_size ]; then > + file_size=$((file_size / 2)) > + fi > + file_count=$((file_count + 1)) > + done > +} > + > init_rc > > > > diff --git a/tests/generic/171 b/tests/generic/171 > index a69f798..fde8ef2 100755 > --- a/tests/generic/171 > +++ b/tests/generic/171 > @@ -75,9 +75,7 @@ _cp_reflink $testdir/bigfile $testdir/clonefile > sync > > echo "Allocate the rest of the space" > -nr_free=$(stat -f -c '%f' $testdir) > -touch $testdir/file0 $testdir/file1 > -_pwrite_byte 0x61 0 $((blksz * nr_free)) $testdir/eat_my_space >> > $seqres.full 2>&1 > +_fill_fs $((32 * 1024