Re: [PATCH 2/2] zram: use crypto API to compress the page
On Thu, Aug 13, 2015 at 12:51:13PM +0900, Sergey Senozhatsky wrote: On (08/13/15 11:24), Joonsoo Kim wrote: Until now, zram uses compression algorithm through direct call to core algorithm function, but, it has drawback that we need to add compression algorithm manually to zram if needed. Without this work, we cannot utilize various compression algorithms in the system. Moreover, to add new compression algorithm, we need to know how to use it and this is somewhat time-consuming. I don't like this change, sorry. Here is why. One of the main reasons why I haven't implemented it using crypto API (and it was an option and I thought about it) was and still is the fact that crypto API requires tfm for both compress and decompress operations. And this is a show stopper. No matter how you implement it, it is slower than what we have by definition. I don't think it is show stopper. Current implementation that uses zstrm only for compression seems over-optimized thing to me. Some of other compression algorithms need scratch-buffer to decompress the contents. As you know, zlib is one of them. I tested some of other decompression algorithm such as quicklz, wkdm and they also need scratch-buffer. And, we cannot support crypto algorithm *module* in current implementation. If that optimization is really needed for the case that doesn't need tfm except fetching function pointer, we can implement sharable tfm in crypto subsystem. Now zram_bvec_read() operations depends on: a) other read operations because read() path now use limited in size idle stream list b) write operations because write() path uses same idle streams list Literally, you change zram_bvec_read() from a fast zram_decompress_page(zram, uncmem, index); to a slow zstrm = zcomp_strm_find(zram-comp); zram_decompress_page(zram, zstrm, uncmem, index); zcomp_strm_release(zram-comp, zstrm); you either slow down both write() and read() if you use a single idle stream list for both read and write, or double the amount of memory used by idle streams if you decide to use separate read and write idle stream lists. I see no real reason to do either of those. I like that the existing read() is as fast as probably possible. Yes, there can be some performace regression. But, by allocating some more streams, we can easily eliminate regression. Memory overhead would be really small. I think that gaining flexibility and reducing management overhead can compensate this small memory overhead. Thanks. When I tested new algorithms such as zlib, these problems make me hard to test them. To prevent these problem in the future, I think that using crypto API to compress the page is better approach and this patch implements it. The reason we need to support vairous compression algorithms is that zram's performance is highly depend on workload and compression algorithm and architecture. Every compression algorithm has it's own strong point. For example, zlib is the best for compression ratio, but, it's (de)compression speed is rather slow. Against my expectation, in my kernel build test with zram swap, in low-memory condition on x86, zlib shows best performance than others. In this case, I guess that compression ratio is the most important factor. Unlike this situation, on ARM, maybe fast (de)compression speed is the most important because it's computation speed is slower than x86. We can't expect what algorithm is the best fit for one's system, because it needs too complex calculation. We need to test it in case by case and easy to use new compression algorithm by this patch will help that situation. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com --- drivers/block/zram/Kconfig | 13 +-- drivers/block/zram/Makefile| 4 +- drivers/block/zram/zcomp.c | 87 ++ drivers/block/zram/zcomp.h | 34 + drivers/block/zram/zcomp_lz4.c | 47 --- drivers/block/zram/zcomp_lz4.h | 17 - drivers/block/zram/zcomp_lzo.c | 47 --- drivers/block/zram/zcomp_lzo.h | 17 - drivers/block/zram/zram_drv.c | 28 +- 9 files changed, 66 insertions(+), 228 deletions(-) delete mode 100644 drivers/block/zram/zcomp_lz4.c delete mode 100644 drivers/block/zram/zcomp_lz4.h delete mode 100644 drivers/block/zram/zcomp_lzo.c delete mode 100644 drivers/block/zram/zcomp_lzo.h diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index 386ba3d..36ec96f 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -1,8 +1,7 @@ config ZRAM tristate Compressed RAM block device support depends on BLOCK SYSFS ZSMALLOC - select LZO_COMPRESS - select LZO_DECOMPRESS + select CRYPTO_LZO default n help Creates virtual block devices
Re: [PATCH 2/2] zram: use crypto API to compress the page
On Thu, Aug 13, 2015 at 04:19:41PM +0900, Joonsoo Kim wrote: If that optimization is really needed for the case that doesn't need tfm except fetching function pointer, we can implement sharable tfm in crypto subsystem. I'm happy to consider changes to the crypto compression interface as long as it continues to support the existing algorithms and users. Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] zram: use crypto API to compress the page
On Thu, Aug 13, 2015 at 03:32:33PM +0800, Herbert Xu wrote: On Thu, Aug 13, 2015 at 04:19:41PM +0900, Joonsoo Kim wrote: If that optimization is really needed for the case that doesn't need tfm except fetching function pointer, we can implement sharable tfm in crypto subsystem. I'm happy to consider changes to the crypto compression interface as long as it continues to support the existing algorithms and users. Happy to hear that. I will think how it can be improved. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] zram: use crypto API to compress the page
Removed Herbert, David, Stephan not spam their mail boxes. I'd rather investigate something like this. WARNING!!! NOT TESTED AT ALL! like for real... only compile tested. RFC define ZLIB backend the same way as we already do. and introduce compression frontend -flags field. backend that requires zstream for read op will set it to ZCOMP_NEED_READ_STRM. zram checks this flag in read() path and does slow zcomp_strm_find()/zcomp_strm_release() if backend requires it; otherwise we have our fast just decompress it path. this is very ugly and quick. just as an idea at the moment. I will return to this a bit later. not-yet-signed-off-by: Sergey Senozhatsky sergey.senozhatsky.w...@gmail.com --- drivers/block/zram/Kconfig | 12 - drivers/block/zram/Makefile | 1 + drivers/block/zram/zcomp.c | 21 +++- drivers/block/zram/zcomp.h | 9 +++- drivers/block/zram/zcomp_lz4.c | 2 +- drivers/block/zram/zcomp_lzo.c | 2 +- drivers/block/zram/zcomp_zlib.c | 115 drivers/block/zram/zcomp_zlib.h | 17 ++ drivers/block/zram/zram_drv.c | 15 -- 9 files changed, 184 insertions(+), 10 deletions(-) create mode 100644 drivers/block/zram/zcomp_zlib.c create mode 100644 drivers/block/zram/zcomp_zlib.h diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index 386ba3d..7afd2ac 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -23,4 +23,14 @@ config ZRAM_LZ4_COMPRESS default n help This option enables LZ4 compression algorithm support. Compression - algorithm can be changed using `comp_algorithm' device attribute. \ No newline at end of file + algorithm can be changed using `comp_algorithm' device attribute. + +config ZRAM_ZLIB_COMPRESS + bool Enable ZLIB algorithm support + depends on ZRAM + select ZLIB_INFLATE + select ZLIB_DEFLATE + default n + help + This option enables ZLIB compression algorithm support. Compression + algorithm can be changed using `comp_algorithm' device attribute. diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile index be0763f..0922f54 100644 --- a/drivers/block/zram/Makefile +++ b/drivers/block/zram/Makefile @@ -1,5 +1,6 @@ zram-y := zcomp_lzo.o zcomp.o zram_drv.o zram-$(CONFIG_ZRAM_LZ4_COMPRESS) += zcomp_lz4.o +zram-$(CONFIG_ZRAM_ZLIB_COMPRESS) += zcomp_zlib.o obj-$(CONFIG_ZRAM) += zram.o diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c index 965d1af..4f04186 100644 --- a/drivers/block/zram/zcomp.c +++ b/drivers/block/zram/zcomp.c @@ -19,6 +19,9 @@ #ifdef CONFIG_ZRAM_LZ4_COMPRESS #include zcomp_lz4.h #endif +#ifdef CONFIG_ZRAM_ZLIB_COMPRESS +#include zcomp_zlib.h +#endif /* * single zcomp_strm backend @@ -48,6 +51,9 @@ static struct zcomp_backend *backends[] = { #ifdef CONFIG_ZRAM_LZ4_COMPRESS zcomp_lz4, #endif +#ifdef CONFIG_ZRAM_ZLIB_COMPRESS + zcomp_zlib, +#endif NULL }; @@ -313,10 +319,16 @@ int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm, zstrm-private); } -int zcomp_decompress(struct zcomp *comp, const unsigned char *src, +int zcomp_decompress(struct zcomp *comp, struct zcomp_strm *zstrm, + const unsigned char *src, size_t src_len, unsigned char *dst) { - return comp-backend-decompress(src, src_len, dst); + void *private = NULL; + + if (unlikely(zstrm)) + private = zstrm-private; + + return comp-backend-decompress(src, src_len, dst, private); } void zcomp_destroy(struct zcomp *comp) @@ -354,5 +366,10 @@ struct zcomp *zcomp_create(const char *compress, int max_strm) kfree(comp); return ERR_PTR(-ENOMEM); } + + /* FIXME quick dirty and ugly. ONLY for testing purposes */ + if (sysfs_streq(compress, zlib)) + comp-flags |= ZCOMP_NEED_READ_STRM; + return comp; } diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h index 46e2b9f..e3ac8d3 100644 --- a/drivers/block/zram/zcomp.h +++ b/drivers/block/zram/zcomp.h @@ -12,6 +12,8 @@ #include linux/mutex.h +#define ZCOMP_NEED_READ_STRM (1 0) + struct zcomp_strm { /* compression/decompression buffer */ void *buffer; @@ -31,7 +33,7 @@ struct zcomp_backend { size_t *dst_len, void *private); int (*decompress)(const unsigned char *src, size_t src_len, - unsigned char *dst); + unsigned char *dst, void *private); void *(*create)(void); void (*destroy)(void *private); @@ -44,6 +46,8 @@ struct zcomp { void *stream; struct zcomp_backend *backend; + int flags; + struct zcomp_strm *(*strm_find)(struct zcomp *comp); void (*strm_release)(struct zcomp *comp, struct zcomp_strm *zstrm);
[PATCH 2/2] zram: use crypto API to compress the page
Until now, zram uses compression algorithm through direct call to core algorithm function, but, it has drawback that we need to add compression algorithm manually to zram if needed. Without this work, we cannot utilize various compression algorithms in the system. Moreover, to add new compression algorithm, we need to know how to use it and this is somewhat time-consuming. When I tested new algorithms such as zlib, these problems make me hard to test them. To prevent these problem in the future, I think that using crypto API to compress the page is better approach and this patch implements it. The reason we need to support vairous compression algorithms is that zram's performance is highly depend on workload and compression algorithm and architecture. Every compression algorithm has it's own strong point. For example, zlib is the best for compression ratio, but, it's (de)compression speed is rather slow. Against my expectation, in my kernel build test with zram swap, in low-memory condition on x86, zlib shows best performance than others. In this case, I guess that compression ratio is the most important factor. Unlike this situation, on ARM, maybe fast (de)compression speed is the most important because it's computation speed is slower than x86. We can't expect what algorithm is the best fit for one's system, because it needs too complex calculation. We need to test it in case by case and easy to use new compression algorithm by this patch will help that situation. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com --- drivers/block/zram/Kconfig | 13 +-- drivers/block/zram/Makefile| 4 +- drivers/block/zram/zcomp.c | 87 ++ drivers/block/zram/zcomp.h | 34 + drivers/block/zram/zcomp_lz4.c | 47 --- drivers/block/zram/zcomp_lz4.h | 17 - drivers/block/zram/zcomp_lzo.c | 47 --- drivers/block/zram/zcomp_lzo.h | 17 - drivers/block/zram/zram_drv.c | 28 +- 9 files changed, 66 insertions(+), 228 deletions(-) delete mode 100644 drivers/block/zram/zcomp_lz4.c delete mode 100644 drivers/block/zram/zcomp_lz4.h delete mode 100644 drivers/block/zram/zcomp_lzo.c delete mode 100644 drivers/block/zram/zcomp_lzo.h diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index 386ba3d..36ec96f 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -1,8 +1,7 @@ config ZRAM tristate Compressed RAM block device support depends on BLOCK SYSFS ZSMALLOC - select LZO_COMPRESS - select LZO_DECOMPRESS + select CRYPTO_LZO default n help Creates virtual block devices called /dev/zramX (X = 0, 1, ...). @@ -14,13 +13,3 @@ config ZRAM disks and maybe many more. See zram.txt for more information. - -config ZRAM_LZ4_COMPRESS - bool Enable LZ4 algorithm support - depends on ZRAM - select LZ4_COMPRESS - select LZ4_DECOMPRESS - default n - help - This option enables LZ4 compression algorithm support. Compression - algorithm can be changed using `comp_algorithm' device attribute. \ No newline at end of file diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile index be0763f..9e2b79e 100644 --- a/drivers/block/zram/Makefile +++ b/drivers/block/zram/Makefile @@ -1,5 +1,3 @@ -zram-y := zcomp_lzo.o zcomp.o zram_drv.o - -zram-$(CONFIG_ZRAM_LZ4_COMPRESS) += zcomp_lz4.o +zram-y := zcomp.o zram_drv.o obj-$(CONFIG_ZRAM) += zram.o diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c index 965d1af..3dc9bfa 100644 --- a/drivers/block/zram/zcomp.c +++ b/drivers/block/zram/zcomp.c @@ -13,12 +13,9 @@ #include linux/slab.h #include linux/wait.h #include linux/sched.h +#include linux/crypto.h #include zcomp.h -#include zcomp_lzo.h -#ifdef CONFIG_ZRAM_LZ4_COMPRESS -#include zcomp_lz4.h -#endif /* * single zcomp_strm backend @@ -43,36 +40,17 @@ struct zcomp_strm_multi { wait_queue_head_t strm_wait; }; -static struct zcomp_backend *backends[] = { - zcomp_lzo, -#ifdef CONFIG_ZRAM_LZ4_COMPRESS - zcomp_lz4, -#endif - NULL -}; - -static struct zcomp_backend *find_backend(const char *compress) -{ - int i = 0; - while (backends[i]) { - if (sysfs_streq(compress, backends[i]-name)) - break; - i++; - } - return backends[i]; -} - static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm) { - if (zstrm-private) - comp-backend-destroy(zstrm-private); + if (zstrm-tfm) + crypto_free_comp(zstrm-tfm); free_pages((unsigned long)zstrm-buffer, 1); kfree(zstrm); } /* - * allocate new zcomp_strm structure with -private initialized by - * backend, return NULL on error + * allocate new zcomp_strm structure with
Re: [PATCH 2/2] zram: use crypto API to compress the page
On (08/13/15 11:24), Joonsoo Kim wrote: Until now, zram uses compression algorithm through direct call to core algorithm function, but, it has drawback that we need to add compression algorithm manually to zram if needed. Without this work, we cannot utilize various compression algorithms in the system. Moreover, to add new compression algorithm, we need to know how to use it and this is somewhat time-consuming. I don't like this change, sorry. Here is why. One of the main reasons why I haven't implemented it using crypto API (and it was an option and I thought about it) was and still is the fact that crypto API requires tfm for both compress and decompress operations. And this is a show stopper. No matter how you implement it, it is slower than what we have by definition. Now zram_bvec_read() operations depends on: a) other read operations because read() path now use limited in size idle stream list b) write operations because write() path uses same idle streams list Literally, you change zram_bvec_read() from a fast zram_decompress_page(zram, uncmem, index); to a slow zstrm = zcomp_strm_find(zram-comp); zram_decompress_page(zram, zstrm, uncmem, index); zcomp_strm_release(zram-comp, zstrm); you either slow down both write() and read() if you use a single idle stream list for both read and write, or double the amount of memory used by idle streams if you decide to use separate read and write idle stream lists. I see no real reason to do either of those. I like that the existing read() is as fast as probably possible. -ss When I tested new algorithms such as zlib, these problems make me hard to test them. To prevent these problem in the future, I think that using crypto API to compress the page is better approach and this patch implements it. The reason we need to support vairous compression algorithms is that zram's performance is highly depend on workload and compression algorithm and architecture. Every compression algorithm has it's own strong point. For example, zlib is the best for compression ratio, but, it's (de)compression speed is rather slow. Against my expectation, in my kernel build test with zram swap, in low-memory condition on x86, zlib shows best performance than others. In this case, I guess that compression ratio is the most important factor. Unlike this situation, on ARM, maybe fast (de)compression speed is the most important because it's computation speed is slower than x86. We can't expect what algorithm is the best fit for one's system, because it needs too complex calculation. We need to test it in case by case and easy to use new compression algorithm by this patch will help that situation. Signed-off-by: Joonsoo Kim iamjoonsoo@lge.com --- drivers/block/zram/Kconfig | 13 +-- drivers/block/zram/Makefile| 4 +- drivers/block/zram/zcomp.c | 87 ++ drivers/block/zram/zcomp.h | 34 + drivers/block/zram/zcomp_lz4.c | 47 --- drivers/block/zram/zcomp_lz4.h | 17 - drivers/block/zram/zcomp_lzo.c | 47 --- drivers/block/zram/zcomp_lzo.h | 17 - drivers/block/zram/zram_drv.c | 28 +- 9 files changed, 66 insertions(+), 228 deletions(-) delete mode 100644 drivers/block/zram/zcomp_lz4.c delete mode 100644 drivers/block/zram/zcomp_lz4.h delete mode 100644 drivers/block/zram/zcomp_lzo.c delete mode 100644 drivers/block/zram/zcomp_lzo.h diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig index 386ba3d..36ec96f 100644 --- a/drivers/block/zram/Kconfig +++ b/drivers/block/zram/Kconfig @@ -1,8 +1,7 @@ config ZRAM tristate Compressed RAM block device support depends on BLOCK SYSFS ZSMALLOC - select LZO_COMPRESS - select LZO_DECOMPRESS + select CRYPTO_LZO default n help Creates virtual block devices called /dev/zramX (X = 0, 1, ...). @@ -14,13 +13,3 @@ config ZRAM disks and maybe many more. See zram.txt for more information. - -config ZRAM_LZ4_COMPRESS - bool Enable LZ4 algorithm support - depends on ZRAM - select LZ4_COMPRESS - select LZ4_DECOMPRESS - default n - help - This option enables LZ4 compression algorithm support. Compression - algorithm can be changed using `comp_algorithm' device attribute. \ No newline at end of file diff --git a/drivers/block/zram/Makefile b/drivers/block/zram/Makefile index be0763f..9e2b79e 100644 --- a/drivers/block/zram/Makefile +++ b/drivers/block/zram/Makefile @@ -1,5 +1,3 @@ -zram-y := zcomp_lzo.o zcomp.o zram_drv.o - -zram-$(CONFIG_ZRAM_LZ4_COMPRESS) += zcomp_lz4.o +zram-y := zcomp.o zram_drv.o obj-$(CONFIG_ZRAM) += zram.o diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c index 965d1af..3dc9bfa 100644 ---