Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
On Mon, Apr 16, 2018 at 07:34:29PM +, Yann Collet wrote: > Hi Singh > > I don't have any strong opinion on this topic. > > You made your case clear: > your variant trades a little bit of speed for a little bit more compression > ratio. > In the context of zram, it makes sense, and I would expect it to work, as > advertised in your benchmark results. > (disclaimer: I haven't reproduced these results, just, they look reasonable > to me, I have no reason to doubt them). > > So, the issue is less about performance, than about code complexity. > > As mentioned, this is an incompatible variant. > So, it requires its own entry point, and preferably its own code path > (even if it's heavily duplicated, > mixing it with regular lz4 source code, as proposed in the patch, will be bad > for maintenance, > and can negatively impact regular lz4 usage, outside of zram). > > So that's basically the "cost" of adding this option. > > Is it worth it? > Well, this is completely outside of my responsibility area, so I really can't > tell. > You'll have to convince people in charge that the gains are worth their > complexity, > since _they_ will inherit the duty to keep the system working through its > future evolutions. > At a minimum, you are targeting maintainers of zram and the crypto interface. > For this topic, they are the right people to talk to. > > > On 4/16/18, 04:09, "Maninder Singh"wrote: > > > Hello Nick/ Yann, > > Any inputs regarding LZ4 dyn results & lz4 dyn approach. > > >Hello Nick/Sergey, > > > >Any suggestion or comments, so that we can change code and resend the > patch? > > > >> Hi Nick / Sergey, > >> > >> > >> We have compared LZ4 Dyn with Original LZ4 using some samples of > realtime application data(4Kb) > >> compressed/decompressed by ZRAM. For comparison we have used lzbench > (https://github.com/inikep/lzbench) > >> we have implemented dedicated LZ4 Dyn API & kept last literal length > as 6 to avoid overhead > >> of checks. It seems in average case there is a saving of 3~4% in > compression ratio with almost same compression > >> speed and minor loss in decompression speed (~50MB/s) when compared > with LZ4. > >> > >> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default > compressor of ZRAM. > >> Unfortunately the track record of maintaining compression code in the Linux kernel is not great. zlib for example was forked from v1.2.3, which was released in 2005, and hasn't been updated since besides some random drive-by patches which have made it diverge even further from upstream. There have even been bugs assigned CVE numbers in upstream zlib, and I don't think anyone has looked at whether the Linux kernel version has those bugs or not. The story with LZ4 is a bit better as someone updated it to v1.7.3 last year. But, it took a lot of rounds of review in which I had to point out some subtle regressions like the hash table size being accidentally changed, and things not being inlined that should be. And of course now that version is outdated already. We also have LZO and Zstandard in the kernel to maintain too, as well as XZ decompression. And very problematically, *none* of these compression algorithms have a maintainer listed in the MAINTAINERS file. So in my opinion, as a prerequisite for this change, someone would need to volunteer to actually maintain LZ4 in the kernel. Thanks, Eric
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
Hi Singh I don't have any strong opinion on this topic. You made your case clear: your variant trades a little bit of speed for a little bit more compression ratio. In the context of zram, it makes sense, and I would expect it to work, as advertised in your benchmark results. (disclaimer: I haven't reproduced these results, just, they look reasonable to me, I have no reason to doubt them). So, the issue is less about performance, than about code complexity. As mentioned, this is an incompatible variant. So, it requires its own entry point, and preferably its own code path (even if it's heavily duplicated, mixing it with regular lz4 source code, as proposed in the patch, will be bad for maintenance, and can negatively impact regular lz4 usage, outside of zram). So that's basically the "cost" of adding this option. Is it worth it? Well, this is completely outside of my responsibility area, so I really can't tell. You'll have to convince people in charge that the gains are worth their complexity, since _they_ will inherit the duty to keep the system working through its future evolutions. At a minimum, you are targeting maintainers of zram and the crypto interface. For this topic, they are the right people to talk to. On 4/16/18, 04:09, "Maninder Singh"wrote: Hello Nick/ Yann, Any inputs regarding LZ4 dyn results & lz4 dyn approach. >Hello Nick/Sergey, > >Any suggestion or comments, so that we can change code and resend the patch? > >> Hi Nick / Sergey, >> >> >> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime application data(4Kb) >> compressed/decompressed by ZRAM. For comparison we have used lzbench (https://github.com/inikep/lzbench) >> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to avoid overhead >> of checks. It seems in average case there is a saving of 3~4% in compression ratio with almost same compression >> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4. >> >> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM. >> >> Original LZ4: >> sh-3.2# ./lzbench -r -elz4 data/ >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 2205 MB/s 2217 MB/s4096 100.00 data//data_1 >> lz4 1.8.0 216 MB/s 761 MB/s2433 59.40 data//data_1 >> lz4 1.8.0 269 MB/s 877 MB/s1873 45.73 data//data_2 >> lz4 1.8.0 238 MB/s 575 MB/s2060 50.29 data//data_3 >> lz4 1.8.0 321 MB/s 1015 MB/s1464 35.74 data//data_4 >> lz4 1.8.0 464 MB/s 1090 MB/s 713 17.41 data//data_5 >> lz4 1.8.0 296 MB/s 956 MB/s1597 38.99 data//data_6 >> lz4 1.8.0 338 MB/s 994 MB/s2238 54.64 data//data_7 >> lz4 1.8.0 705 MB/s 1172 MB/s 193 4.71 data//data_8 >> lz4 1.8.0 404 MB/s 1150 MB/s1097 26.78 data//data_9 >> lz4 1.8.0 216 MB/s 921 MB/s3183 77.71 data//data_10 >> lz4 1.8.0 456 MB/s 1101 MB/s1011 24.68 data//data_11 >> lz4 1.8.0 867 MB/s 1202 MB/s 37 0.90 data//data_12 >> >> >> LZ4 Dynamic Offet: >> sh-3.2# ./lzbench -r -elz4_dyn data/ >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 2203 MB/s 2218 MB/s4096 100.00 data//data_1 >> lz4 1.8.0 218 MB/s 693 MB/s2228 54.39 data//data_1 >> lz4 1.8.0 273 MB/s 851 MB/s1739 42.46 data//data_2 >> lz4 1.8.0 230 MB/s 526 MB/s1800 43.95 data//data_3 >> lz4 1.8.0 321 MB/s 952 MB/s1357 33.13 data//data_4 >> lz4 1.8.0 470 MB/s 1075 MB/s 664 16.21 data//data_5 >> lz4 1.8.0 303 MB/s 964 MB/s1455 35.52 data//data_6 >> lz4 1.8.0 345 MB/s 951 MB/s2126 51.90 data//data_7 >> lz4 1.8.0 744 MB/s 1163 MB/s 177 4.32 data//data_8 >> lz4 1.8.0 409 MB/s 1257 MB/s1033 25.22 data//data_9 >> lz4 1.8.0 220 MB/s 857 MB/s3049 74.44 data//data_10 >> lz4 1.8.0 464 MB/s 1105 MB/s 934 22.80 data//data_11 >> lz4 1.8.0 874 MB/s 1194 MB/s 36 0.88 data//data_12 >> >> >> LZ4 Dynamic Offset with 32K data: >> sh-3.2# ./lzbench -elz4_dyn
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
Hello Nick/ Yann, Any inputs regarding LZ4 dyn results & lz4 dyn approach. >Hello Nick/Sergey, > >Any suggestion or comments, so that we can change code and resend the patch? > >> Hi Nick / Sergey, >> >> >> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime >> application data(4Kb) >> compressed/decompressed by ZRAM. For comparison we have used lzbench >> (https://github.com/inikep/lzbench) >> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to >> avoid overhead >> of checks. It seems in average case there is a saving of 3~4% in compression >> ratio with almost same compression >> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4. >> >> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor >> of ZRAM. >> >> Original LZ4: >> sh-3.2# ./lzbench -r -elz4 data/ >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 2205 MB/s 2217 MB/s4096 100.00 data//data_1 >> lz4 1.8.0 216 MB/s 761 MB/s2433 59.40 data//data_1 >> lz4 1.8.0 269 MB/s 877 MB/s1873 45.73 data//data_2 >> lz4 1.8.0 238 MB/s 575 MB/s2060 50.29 data//data_3 >> lz4 1.8.0 321 MB/s 1015 MB/s1464 35.74 data//data_4 >> lz4 1.8.0 464 MB/s 1090 MB/s 713 17.41 data//data_5 >> lz4 1.8.0 296 MB/s 956 MB/s1597 38.99 data//data_6 >> lz4 1.8.0 338 MB/s 994 MB/s2238 54.64 data//data_7 >> lz4 1.8.0 705 MB/s 1172 MB/s 193 4.71 data//data_8 >> lz4 1.8.0 404 MB/s 1150 MB/s1097 26.78 data//data_9 >> lz4 1.8.0 216 MB/s 921 MB/s3183 77.71 >> data//data_10 >> lz4 1.8.0 456 MB/s 1101 MB/s1011 24.68 >> data//data_11 >> lz4 1.8.0 867 MB/s 1202 MB/s 37 0.90 >> data//data_12 >> >> >> LZ4 Dynamic Offet: >> sh-3.2# ./lzbench -r -elz4_dyn data/ >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 2203 MB/s 2218 MB/s4096 100.00 data//data_1 >> lz4 1.8.0 218 MB/s 693 MB/s2228 54.39 data//data_1 >> lz4 1.8.0 273 MB/s 851 MB/s1739 42.46 data//data_2 >> lz4 1.8.0 230 MB/s 526 MB/s1800 43.95 data//data_3 >> lz4 1.8.0 321 MB/s 952 MB/s1357 33.13 data//data_4 >> lz4 1.8.0 470 MB/s 1075 MB/s 664 16.21 data//data_5 >> lz4 1.8.0 303 MB/s 964 MB/s1455 35.52 data//data_6 >> lz4 1.8.0 345 MB/s 951 MB/s2126 51.90 data//data_7 >> lz4 1.8.0 744 MB/s 1163 MB/s 177 4.32 data//data_8 >> lz4 1.8.0 409 MB/s 1257 MB/s1033 25.22 data//data_9 >> lz4 1.8.0 220 MB/s 857 MB/s3049 74.44 >> data//data_10 >> lz4 1.8.0 464 MB/s 1105 MB/s 934 22.80 >> data//data_11 >> lz4 1.8.0 874 MB/s 1194 MB/s 36 0.88 >> data//data_12 >> >> >> LZ4 Dynamic Offset with 32K data: >> sh-3.2# ./lzbench -elz4_dyn data/data32k >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 5285 MB/s 5283 MB/s 32768 100.00 data/data32k >> lz4 1.8.0 274 MB/s 995 MB/s 13435 41.00 data/data32k >> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB) >> >> Original LZ4 with 32K data: >> sh-3.2# ./lzbench_orig -elz4 data/data32k >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 4918 MB/s 5108 MB/s 32768 100.00 data/data32k >> lz4 1.8.0 276 MB/s 1045 MB/s 14492 44.23 data/data32k >> >> LZO1x with 32K data (Default Compressor for ZRAM): >> sh-3.2# ./lzbench -elzo1x,1 data/data32k >> lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski >> Compressor name Compress. Decompress. Compr. size Ratio Filename >> memcpy 5273 MB/s 5320 MB/s 32768 100.00 data/data32k >> lzo1x 2.09 -1 283 MB/s 465 MB/s 14292 43.62 data/data32k Thanks, Maninder Singh
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
Hello, >> (Added cover letter to avoid much text in patch description) >> >> LZ4 specification defines 2 byte offset length for 64 KB data. >> But in case of ZRAM we compress data per page and in most of >> architecture PAGE_SIZE is 4KB. So we can decide offset length based >> on actual offset value. For this we can reserve 1 bit to decide offset >> length (1 byte or 2 byte). 2 byte required only if ofsset is greater than >> 127, >> else 1 byte is enough. > >So what happens if I compress the data on a system with no dyn >offset and then send it over the network to a machine which has >dyn offset? Or, say, I have a USB stick with a compression enabled >FS, store files on a dyn offset enabled PC and then mount that USB >stick on a machine with no dyn offset support. And vice versa. lz4_dyn is not an extension of LZ4 so there is no backward compatibility. Consider this as a different algorithm adapted from LZ4 for better compression ratio. Thanks Maninder Singh
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
On (03/29/18 15:56), Maninder Singh wrote: > Hello Nick/Sergey, > > Any suggestion or comments, so that we can change code and resend the patch? Well... there were no replies to https://marc.info/?l=linux-kernel=152161450026771=2 and https://marc.info/?l=linux-kernel=152161860627974=2 -ss
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
Hello Nick/Sergey, Any suggestion or comments, so that we can change code and resend the patch? > Hi Nick / Sergey, > > > We have compared LZ4 Dyn with Original LZ4 using some samples of realtime >application data(4Kb) > compressed/decompressed by ZRAM. For comparison we have used lzbench >(https://github.com/inikep/lzbench) > we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to >avoid overhead > of checks. It seems in average case there is a saving of 3~4% in compression >ratio with almost same compression > speed and minor loss in decompression speed (~50MB/s) when compared with LZ4. > > Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor >of ZRAM. > > Original LZ4: > sh-3.2# ./lzbench -r -elz4 data/ > lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski > Compressor name Compress. Decompress. Compr. size Ratio Filename > memcpy 2205 MB/s 2217 MB/s4096 100.00 data//data_1 > lz4 1.8.0 216 MB/s 761 MB/s2433 59.40 data//data_1 > lz4 1.8.0 269 MB/s 877 MB/s1873 45.73 data//data_2 > lz4 1.8.0 238 MB/s 575 MB/s2060 50.29 data//data_3 > lz4 1.8.0 321 MB/s 1015 MB/s1464 35.74 data//data_4 > lz4 1.8.0 464 MB/s 1090 MB/s 713 17.41 data//data_5 > lz4 1.8.0 296 MB/s 956 MB/s1597 38.99 data//data_6 > lz4 1.8.0 338 MB/s 994 MB/s2238 54.64 data//data_7 > lz4 1.8.0 705 MB/s 1172 MB/s 193 4.71 data//data_8 > lz4 1.8.0 404 MB/s 1150 MB/s1097 26.78 data//data_9 > lz4 1.8.0 216 MB/s 921 MB/s3183 77.71 data//data_10 > lz4 1.8.0 456 MB/s 1101 MB/s1011 24.68 data//data_11 > lz4 1.8.0 867 MB/s 1202 MB/s 37 0.90 data//data_12 > > > LZ4 Dynamic Offet: > sh-3.2# ./lzbench -r -elz4_dyn data/ > lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski > Compressor name Compress. Decompress. Compr. size Ratio Filename > memcpy 2203 MB/s 2218 MB/s4096 100.00 data//data_1 > lz4 1.8.0 218 MB/s 693 MB/s2228 54.39 data//data_1 > lz4 1.8.0 273 MB/s 851 MB/s1739 42.46 data//data_2 > lz4 1.8.0 230 MB/s 526 MB/s1800 43.95 data//data_3 > lz4 1.8.0 321 MB/s 952 MB/s1357 33.13 data//data_4 > lz4 1.8.0 470 MB/s 1075 MB/s 664 16.21 data//data_5 > lz4 1.8.0 303 MB/s 964 MB/s1455 35.52 data//data_6 > lz4 1.8.0 345 MB/s 951 MB/s2126 51.90 data//data_7 > lz4 1.8.0 744 MB/s 1163 MB/s 177 4.32 data//data_8 > lz4 1.8.0 409 MB/s 1257 MB/s1033 25.22 data//data_9 > lz4 1.8.0 220 MB/s 857 MB/s3049 74.44 data//data_10 > lz4 1.8.0 464 MB/s 1105 MB/s 934 22.80 data//data_11 > lz4 1.8.0 874 MB/s 1194 MB/s 36 0.88 data//data_12 > > > LZ4 Dynamic Offset with 32K data: > sh-3.2# ./lzbench -elz4_dyn data/data32k > lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski > Compressor name Compress. Decompress. Compr. size Ratio Filename > memcpy 5285 MB/s 5283 MB/s 32768 100.00 data/data32k > lz4 1.8.0 274 MB/s 995 MB/s 13435 41.00 data/data32k > done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB) > > Original LZ4 with 32K data: > sh-3.2# ./lzbench_orig -elz4 data/data32k > lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski > Compressor name Compress. Decompress. Compr. size Ratio Filename > memcpy 4918 MB/s 5108 MB/s 32768 100.00 data/data32k > lz4 1.8.0 276 MB/s 1045 MB/s 14492 44.23 data/data32k > > LZO1x with 32K data (Default Compressor for ZRAM): > sh-3.2# ./lzbench -elzo1x,1 data/data32k > lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski > Compressor name Compress. Decompress. Compr. size Ratio Filename > memcpy 5273 MB/s 5320 MB/s 32768 100.00 data/data32k > lzo1x 2.09 -1 283 MB/s 465 MB/s 14292 43.62 data/data32k > > Regards, > Vaneet Narang Thanks.
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
Hi Nick / Sergey, We have compared LZ4 Dyn with Original LZ4 using some samples of realtime application data(4Kb) compressed/decompressed by ZRAM. For comparison we have used lzbench (https://github.com/inikep/lzbench) we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to avoid overhead of checks. It seems in average case there is a saving of 3~4% in compression ratio with almost same compression speed and minor loss in decompression speed (~50MB/s) when compared with LZ4. Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of ZRAM. Original LZ4: sh-3.2# ./lzbench -r -elz4 data/ lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski Compressor name Compress. Decompress. Compr. size Ratio Filename memcpy 2205 MB/s 2217 MB/s4096 100.00 data//data_1 lz4 1.8.0 216 MB/s 761 MB/s2433 59.40 data//data_1 lz4 1.8.0 269 MB/s 877 MB/s1873 45.73 data//data_2 lz4 1.8.0 238 MB/s 575 MB/s2060 50.29 data//data_3 lz4 1.8.0 321 MB/s 1015 MB/s1464 35.74 data//data_4 lz4 1.8.0 464 MB/s 1090 MB/s 713 17.41 data//data_5 lz4 1.8.0 296 MB/s 956 MB/s1597 38.99 data//data_6 lz4 1.8.0 338 MB/s 994 MB/s2238 54.64 data//data_7 lz4 1.8.0 705 MB/s 1172 MB/s 193 4.71 data//data_8 lz4 1.8.0 404 MB/s 1150 MB/s1097 26.78 data//data_9 lz4 1.8.0 216 MB/s 921 MB/s3183 77.71 data//data_10 lz4 1.8.0 456 MB/s 1101 MB/s1011 24.68 data//data_11 lz4 1.8.0 867 MB/s 1202 MB/s 37 0.90 data//data_12 LZ4 Dynamic Offet: sh-3.2# ./lzbench -r -elz4_dyn data/ lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski Compressor name Compress. Decompress. Compr. size Ratio Filename memcpy 2203 MB/s 2218 MB/s4096 100.00 data//data_1 lz4 1.8.0 218 MB/s 693 MB/s2228 54.39 data//data_1 lz4 1.8.0 273 MB/s 851 MB/s1739 42.46 data//data_2 lz4 1.8.0 230 MB/s 526 MB/s1800 43.95 data//data_3 lz4 1.8.0 321 MB/s 952 MB/s1357 33.13 data//data_4 lz4 1.8.0 470 MB/s 1075 MB/s 664 16.21 data//data_5 lz4 1.8.0 303 MB/s 964 MB/s1455 35.52 data//data_6 lz4 1.8.0 345 MB/s 951 MB/s2126 51.90 data//data_7 lz4 1.8.0 744 MB/s 1163 MB/s 177 4.32 data//data_8 lz4 1.8.0 409 MB/s 1257 MB/s1033 25.22 data//data_9 lz4 1.8.0 220 MB/s 857 MB/s3049 74.44 data//data_10 lz4 1.8.0 464 MB/s 1105 MB/s 934 22.80 data//data_11 lz4 1.8.0 874 MB/s 1194 MB/s 36 0.88 data//data_12 LZ4 Dynamic Offset with 32K data: sh-3.2# ./lzbench -elz4_dyn data/data32k lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski Compressor name Compress. Decompress. Compr. size Ratio Filename memcpy 5285 MB/s 5283 MB/s 32768 100.00 data/data32k lz4 1.8.0 274 MB/s 995 MB/s 13435 41.00 data/data32k done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB) Original LZ4 with 32K data: sh-3.2# ./lzbench_orig -elz4 data/data32k lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski Compressor name Compress. Decompress. Compr. size Ratio Filename memcpy 4918 MB/s 5108 MB/s 32768 100.00 data/data32k lz4 1.8.0 276 MB/s 1045 MB/s 14492 44.23 data/data32k LZO1x with 32K data (Default Compressor for ZRAM): sh-3.2# ./lzbench -elzo1x,1 data/data32k lzbench 1.7.3 (32-bit Linux) Assembled by P.Skibinski Compressor name Compress. Decompress. Compr. size Ratio Filename memcpy 5273 MB/s 5320 MB/s 32768 100.00 data/data32k lzo1x 2.09 -1 283 MB/s 465 MB/s 14292 43.62 data/data32k Regards, Vaneet Narang
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
On (03/21/18 19:56), Nick Terrell wrote: [..] > This seems like a reasonable extension to the algorithm, and it looks like > LZ4_DYN is about a 5% improvement to compression ratio on your benchmark. > The biggest question I have is if it is worthwhile to maintain a separate > incompatible variant of LZ4 in the kernel without any upstream for a 5% > gain? If we do want to go forward with this, we should perform more > benchmarks. > > I commented in the patch, but because the `dynOffset` variable isn't a > compile time static in LZ4_decompress_generic(), I suspect that the patch > causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll > need to re-run the benchmarks to first show that LZ4 before the patch > performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN > benchmarks. > > I would also like to see a benchmark in user-space (with the code), so we > can see the performance of LZ4 before and after the patch, as well as LZ4 > vs LZ4_DYN without anything else going on. I expect the extra branches in > the decoding loop to have an impact on speed, and I would like to see how > big the impact is without noise. Yes, I've been thinking about this. There are more branches now ("to dyn or not to dyn") in compression/decompression hot path but we see less instructions and branches in perf output at the end. And my guess is that we have a lot of noise from zram and zsmalloc. The data is XXX bytes shorter with dyn enabled, so we use YYY less moves and ZZZ less branches while we copy the data to zsmalloc and from zsmalloc, and I may be that's the root cause of "performance gain" that we see in zram-fio tests. So may be we need to run benchmarks against lz4, not zram+lz4. > CC-ing Yann Collet, the author of LZ4 Great, thanks. -ss
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
On (03/21/18 10:10), Maninder Singh wrote: > LZ4 specification defines 2 byte offset length for 64 KB data. > But in case of ZRAM we compress data per page and in most of > architecture PAGE_SIZE is 4KB. So we can decide offset length based > on actual offset value. For this we can reserve 1 bit to decide offset > length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127, > else 1 byte is enough. > > With this new implementation new offset value can be at MAX 32 KB. > > Thus we can save more memory for compressed data. > > results checked with new implementation:- > > comression size for same input source > (LZ4_DYN < LZO < LZ4) > > LZO > === > orig_data_size: 78917632 > compr_data_size: 15894668 > mem_used_total: 17117184 > > LZ4 > > orig_data_size: 78917632 > compr_data_size: 16310717 > mem_used_total: 17592320 > > LZ4_DYN > === > orig_data_size: 78917632 > compr_data_size: 15520506 > mem_used_total: 16748544 This seems like a reasonable extension to the algorithm, and it looks like LZ4_DYN is about a 5% improvement to compression ratio on your benchmark. The biggest question I have is if it is worthwhile to maintain a separate incompatible variant of LZ4 in the kernel without any upstream for a 5% gain? If we do want to go forward with this, we should perform more benchmarks. I commented in the patch, but because the `dynOffset` variable isn't a compile time static in LZ4_decompress_generic(), I suspect that the patch causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll need to re-run the benchmarks to first show that LZ4 before the patch performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN benchmarks. I would also like to see a benchmark in user-space (with the code), so we can see the performance of LZ4 before and after the patch, as well as LZ4 vs LZ4_DYN without anything else going on. I expect the extra branches in the decoding loop to have an impact on speed, and I would like to see how big the impact is without noise. CC-ing Yann Collet, the author of LZ4
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
CC-ing Nick Nick, can you take a look? Message-IDs: lkml.kernel.org/r/1521607242-3968-1-git-send-email-maninder...@samsung.com lkml.kernel.org/r/1521607242-3968-2-git-send-email-maninder...@samsung.com -ss
Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
On (03/21/18 10:10), Maninder Singh wrote: > (Added cover letter to avoid much text in patch description) > > LZ4 specification defines 2 byte offset length for 64 KB data. > But in case of ZRAM we compress data per page and in most of > architecture PAGE_SIZE is 4KB. So we can decide offset length based > on actual offset value. For this we can reserve 1 bit to decide offset > length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127, > else 1 byte is enough. So what happens if I compress the data on a system with no dyn offset and then send it over the network to a machine which has dyn offset? Or, say, I have a USB stick with a compression enabled FS, store files on a dyn offset enabled PC and then mount that USB stick on a machine with no dyn offset support. And vice versa. -ss