Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-04-16 Thread Eric Biggers
On Mon, Apr 16, 2018 at 07:34:29PM +, Yann Collet wrote:
> Hi Singh
> 
> I don't have any strong opinion on this topic.
> 
> You made your case clear: 
> your variant trades a little bit of speed for a little bit more compression 
> ratio.
> In the context of zram, it makes sense, and I would expect it to work, as 
> advertised in your benchmark results.
> (disclaimer: I haven't reproduced these results, just, they look reasonable 
> to me, I have no reason to doubt them).
> 
> So, the issue is less about performance, than about code complexity.
> 
> As mentioned, this is an incompatible variant.
> So, it requires its own entry point, and preferably its own code path
> (even if it's heavily duplicated,
> mixing it with regular lz4 source code, as proposed in the patch, will be bad 
> for maintenance, 
> and can negatively impact regular lz4 usage, outside of zram).
> 
> So that's basically the "cost" of adding this option.
> 
> Is it worth it?
> Well, this is completely outside of my responsibility area, so I really can't 
> tell.
> You'll have to convince people in charge that the gains are worth their 
> complexity,
> since _they_ will inherit the duty to keep the system working through its 
> future evolutions.
> At a minimum, you are targeting maintainers of zram and the crypto interface.
> For this topic, they are the right people to talk to.
> 
> 
> On 4/16/18, 04:09, "Maninder Singh"  wrote:
> 
> 
>  Hello Nick/ Yann,
> 
> Any inputs regarding LZ4 dyn results & lz4 dyn approach.
> 
> >Hello Nick/Sergey,
> > 
> >Any suggestion or comments, so that we can change code and resend the 
> patch?
> > 
> >> Hi Nick / Sergey,
> >> 
> >> 
> >> We have compared LZ4 Dyn with Original LZ4 using some samples of 
> realtime application data(4Kb)
> >> compressed/decompressed by ZRAM. For comparison we have used lzbench 
> (https://github.com/inikep/lzbench)
> >> we have implemented dedicated LZ4 Dyn API & kept last literal length 
> as 6 to avoid overhead 
> >> of checks. It seems in average case there is a saving of 3~4% in 
> compression ratio with almost same compression
> >> speed and minor loss in decompression speed (~50MB/s) when compared 
> with LZ4.
> >> 
> >> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default 
> compressor of ZRAM.
> >> 

Unfortunately the track record of maintaining compression code in the Linux
kernel is not great.  zlib for example was forked from v1.2.3, which was
released in 2005, and hasn't been updated since besides some random drive-by
patches which have made it diverge even further from upstream.  There have even
been bugs assigned CVE numbers in upstream zlib, and I don't think anyone has
looked at whether the Linux kernel version has those bugs or not.

The story with LZ4 is a bit better as someone updated it to v1.7.3 last year.
But, it took a lot of rounds of review in which I had to point out some subtle
regressions like the hash table size being accidentally changed, and things not
being inlined that should be.  And of course now that version is outdated
already.

We also have LZO and Zstandard in the kernel to maintain too, as well as XZ
decompression.

And very problematically, *none* of these compression algorithms have a
maintainer listed in the MAINTAINERS file.

So in my opinion, as a prerequisite for this change, someone would need to
volunteer to actually maintain LZ4 in the kernel.

Thanks,

Eric


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-04-16 Thread Yann Collet
Hi Singh

I don't have any strong opinion on this topic.

You made your case clear: 
your variant trades a little bit of speed for a little bit more compression 
ratio.
In the context of zram, it makes sense, and I would expect it to work, as 
advertised in your benchmark results.
(disclaimer: I haven't reproduced these results, just, they look reasonable to 
me, I have no reason to doubt them).

So, the issue is less about performance, than about code complexity.

As mentioned, this is an incompatible variant.
So, it requires its own entry point, and preferably its own code path
(even if it's heavily duplicated,
mixing it with regular lz4 source code, as proposed in the patch, will be bad 
for maintenance, 
and can negatively impact regular lz4 usage, outside of zram).

So that's basically the "cost" of adding this option.

Is it worth it?
Well, this is completely outside of my responsibility area, so I really can't 
tell.
You'll have to convince people in charge that the gains are worth their 
complexity,
since _they_ will inherit the duty to keep the system working through its 
future evolutions.
At a minimum, you are targeting maintainers of zram and the crypto interface.
For this topic, they are the right people to talk to.


On 4/16/18, 04:09, "Maninder Singh"  wrote:


 Hello Nick/ Yann,

Any inputs regarding LZ4 dyn results & lz4 dyn approach.

>Hello Nick/Sergey,
> 
>Any suggestion or comments, so that we can change code and resend the 
patch?
> 
>> Hi Nick / Sergey,
>> 
>> 
>> We have compared LZ4 Dyn with Original LZ4 using some samples of 
realtime application data(4Kb)
>> compressed/decompressed by ZRAM. For comparison we have used lzbench 
(https://github.com/inikep/lzbench)
>> we have implemented dedicated LZ4 Dyn API & kept last literal length as 
6 to avoid overhead 
>> of checks. It seems in average case there is a saving of 3~4% in 
compression ratio with almost same compression
>> speed and minor loss in decompression speed (~50MB/s) when compared with 
LZ4.
>> 
>> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default 
compressor of ZRAM.
>> 
>> Original LZ4:
>> sh-3.2# ./lzbench  -r  -elz4  data/
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   2205 MB/s  2217 MB/s4096 100.00 
data//data_1
>> lz4 1.8.0 216 MB/s   761 MB/s2433  59.40 
data//data_1
>> lz4 1.8.0 269 MB/s   877 MB/s1873  45.73 
data//data_2
>> lz4 1.8.0 238 MB/s   575 MB/s2060  50.29 
data//data_3
>> lz4 1.8.0 321 MB/s  1015 MB/s1464  35.74 
data//data_4
>> lz4 1.8.0 464 MB/s  1090 MB/s 713  17.41 
data//data_5
>> lz4 1.8.0 296 MB/s   956 MB/s1597  38.99 
data//data_6
>> lz4 1.8.0 338 MB/s   994 MB/s2238  54.64 
data//data_7
>> lz4 1.8.0 705 MB/s  1172 MB/s 193   4.71 
data//data_8
>> lz4 1.8.0 404 MB/s  1150 MB/s1097  26.78 
data//data_9
>> lz4 1.8.0 216 MB/s   921 MB/s3183  77.71 
data//data_10
>> lz4 1.8.0 456 MB/s  1101 MB/s1011  24.68 
data//data_11
>> lz4 1.8.0 867 MB/s  1202 MB/s  37   0.90 
data//data_12
>> 
>> 
>> LZ4 Dynamic Offet:  
>> sh-3.2# ./lzbench  -r  -elz4_dyn  data/
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   2203 MB/s  2218 MB/s4096 100.00 
data//data_1
>> lz4 1.8.0 218 MB/s   693 MB/s2228  54.39 
data//data_1
>> lz4 1.8.0 273 MB/s   851 MB/s1739  42.46 
data//data_2
>> lz4 1.8.0 230 MB/s   526 MB/s1800  43.95 
data//data_3
>> lz4 1.8.0 321 MB/s   952 MB/s1357  33.13 
data//data_4
>> lz4 1.8.0 470 MB/s  1075 MB/s 664  16.21 
data//data_5
>> lz4 1.8.0 303 MB/s   964 MB/s1455  35.52 
data//data_6
>> lz4 1.8.0 345 MB/s   951 MB/s2126  51.90 
data//data_7
>> lz4 1.8.0 744 MB/s  1163 MB/s 177   4.32 
data//data_8
>> lz4 1.8.0 409 MB/s  1257 MB/s1033  25.22 
data//data_9
>> lz4 1.8.0 220 MB/s   857 MB/s3049  74.44 
data//data_10
>> lz4 1.8.0 464 MB/s  1105 MB/s 934  22.80 
data//data_11
>> lz4 1.8.0 874 MB/s  1194 MB/s  36   0.88 
data//data_12
>> 
>> 
>> LZ4 Dynamic Offset with 32K data:
>> sh-3.2# ./lzbench -elz4_dyn 

Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-04-16 Thread Maninder Singh

 Hello Nick/ Yann,

Any inputs regarding LZ4 dyn results & lz4 dyn approach.

>Hello Nick/Sergey,
> 
>Any suggestion or comments, so that we can change code and resend the patch?
> 
>> Hi Nick / Sergey,
>> 
>> 
>> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime 
>> application data(4Kb)
>> compressed/decompressed by ZRAM. For comparison we have used lzbench 
>> (https://github.com/inikep/lzbench)
>> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to 
>> avoid overhead 
>> of checks. It seems in average case there is a saving of 3~4% in compression 
>> ratio with almost same compression
>> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.
>> 
>> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor 
>> of ZRAM.
>> 
>> Original LZ4:
>> sh-3.2# ./lzbench  -r  -elz4  data/
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   2205 MB/s  2217 MB/s4096 100.00 data//data_1
>> lz4 1.8.0 216 MB/s   761 MB/s2433  59.40 data//data_1
>> lz4 1.8.0 269 MB/s   877 MB/s1873  45.73 data//data_2
>> lz4 1.8.0 238 MB/s   575 MB/s2060  50.29 data//data_3
>> lz4 1.8.0 321 MB/s  1015 MB/s1464  35.74 data//data_4
>> lz4 1.8.0 464 MB/s  1090 MB/s 713  17.41 data//data_5
>> lz4 1.8.0 296 MB/s   956 MB/s1597  38.99 data//data_6
>> lz4 1.8.0 338 MB/s   994 MB/s2238  54.64 data//data_7
>> lz4 1.8.0 705 MB/s  1172 MB/s 193   4.71 data//data_8
>> lz4 1.8.0 404 MB/s  1150 MB/s1097  26.78 data//data_9
>> lz4 1.8.0 216 MB/s   921 MB/s3183  77.71 
>> data//data_10
>> lz4 1.8.0 456 MB/s  1101 MB/s1011  24.68 
>> data//data_11
>> lz4 1.8.0 867 MB/s  1202 MB/s  37   0.90 
>> data//data_12
>> 
>> 
>> LZ4 Dynamic Offet:  
>> sh-3.2# ./lzbench  -r  -elz4_dyn  data/
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   2203 MB/s  2218 MB/s4096 100.00 data//data_1
>> lz4 1.8.0 218 MB/s   693 MB/s2228  54.39 data//data_1
>> lz4 1.8.0 273 MB/s   851 MB/s1739  42.46 data//data_2
>> lz4 1.8.0 230 MB/s   526 MB/s1800  43.95 data//data_3
>> lz4 1.8.0 321 MB/s   952 MB/s1357  33.13 data//data_4
>> lz4 1.8.0 470 MB/s  1075 MB/s 664  16.21 data//data_5
>> lz4 1.8.0 303 MB/s   964 MB/s1455  35.52 data//data_6
>> lz4 1.8.0 345 MB/s   951 MB/s2126  51.90 data//data_7
>> lz4 1.8.0 744 MB/s  1163 MB/s 177   4.32 data//data_8
>> lz4 1.8.0 409 MB/s  1257 MB/s1033  25.22 data//data_9
>> lz4 1.8.0 220 MB/s   857 MB/s3049  74.44 
>> data//data_10
>> lz4 1.8.0 464 MB/s  1105 MB/s 934  22.80 
>> data//data_11
>> lz4 1.8.0 874 MB/s  1194 MB/s  36   0.88 
>> data//data_12
>> 
>> 
>> LZ4 Dynamic Offset with 32K data:
>> sh-3.2# ./lzbench -elz4_dyn data/data32k
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   5285 MB/s  5283 MB/s   32768 100.00 data/data32k
>> lz4 1.8.0 274 MB/s   995 MB/s   13435  41.00 data/data32k
>> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)
>> 
>> Original LZ4 with 32K data:
>> sh-3.2# ./lzbench_orig -elz4 data/data32k
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   4918 MB/s  5108 MB/s   32768 100.00 data/data32k
>> lz4 1.8.0 276 MB/s  1045 MB/s   14492  44.23 data/data32k
>> 
>> LZO1x with 32K data (Default Compressor for ZRAM): 
>> sh-3.2# ./lzbench -elzo1x,1 data/data32k
>> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
>> Compressor name Compress. Decompress. Compr. size  Ratio Filename
>> memcpy   5273 MB/s  5320 MB/s   32768 100.00 data/data32k
>> lzo1x 2.09 -1 283 MB/s   465 MB/s   14292  43.62 data/data32k


Thanks,
Maninder Singh
 


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-04-02 Thread Maninder Singh

 Hello,

>> (Added cover letter to avoid much text in patch description)
>> 
>> LZ4 specification defines 2 byte offset length for 64 KB data.
>> But in case of ZRAM we compress data per page and in most of
>> architecture PAGE_SIZE is 4KB. So we can decide offset length based
>> on actual offset value. For this we can reserve 1 bit to decide offset
>> length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 
>> 127,
>> else 1 byte is enough.
> 
>So what happens if I compress the data on a system with no dyn
>offset and then send it over the network to a machine which has
>dyn offset? Or, say, I have a USB stick with a compression enabled
>FS, store files on a dyn offset enabled PC and then mount that USB
>stick on a machine with no dyn offset support. And vice versa.


lz4_dyn is not an extension of LZ4 so there is no backward compatibility.
Consider this as a different algorithm adapted from LZ4 for better compression 
ratio.

Thanks
Maninder Singh


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-29 Thread Sergey Senozhatsky
On (03/29/18 15:56), Maninder Singh wrote:
> Hello Nick/Sergey,
> 
> Any suggestion or comments, so that we can change code and resend the patch?

Well... there were no replies to
 https://marc.info/?l=linux-kernel=152161450026771=2
and
 https://marc.info/?l=linux-kernel=152161860627974=2

-ss


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-29 Thread Maninder Singh

Hello Nick/Sergey,
 
Any suggestion or comments, so that we can change code and resend the patch?
 
> Hi Nick / Sergey,
> 
> 
> We have compared LZ4 Dyn with Original LZ4 using some samples of realtime 
>application data(4Kb)
> compressed/decompressed by ZRAM. For comparison we have used lzbench 
>(https://github.com/inikep/lzbench)
> we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to 
>avoid overhead 
> of checks. It seems in average case there is a saving of 3~4% in compression 
>ratio with almost same compression
> speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.
> 
> Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor 
>of ZRAM.
> 
> Original LZ4:
> sh-3.2# ./lzbench  -r  -elz4  data/
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name Compress. Decompress. Compr. size  Ratio Filename
> memcpy   2205 MB/s  2217 MB/s4096 100.00 data//data_1
> lz4 1.8.0 216 MB/s   761 MB/s2433  59.40 data//data_1
> lz4 1.8.0 269 MB/s   877 MB/s1873  45.73 data//data_2
> lz4 1.8.0 238 MB/s   575 MB/s2060  50.29 data//data_3
> lz4 1.8.0 321 MB/s  1015 MB/s1464  35.74 data//data_4
> lz4 1.8.0 464 MB/s  1090 MB/s 713  17.41 data//data_5
> lz4 1.8.0 296 MB/s   956 MB/s1597  38.99 data//data_6
> lz4 1.8.0 338 MB/s   994 MB/s2238  54.64 data//data_7
> lz4 1.8.0 705 MB/s  1172 MB/s 193   4.71 data//data_8
> lz4 1.8.0 404 MB/s  1150 MB/s1097  26.78 data//data_9
> lz4 1.8.0 216 MB/s   921 MB/s3183  77.71 data//data_10
> lz4 1.8.0 456 MB/s  1101 MB/s1011  24.68 data//data_11
> lz4 1.8.0 867 MB/s  1202 MB/s  37   0.90 data//data_12
> 
> 
> LZ4 Dynamic Offet:  
> sh-3.2# ./lzbench  -r  -elz4_dyn  data/
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name Compress. Decompress. Compr. size  Ratio Filename
> memcpy   2203 MB/s  2218 MB/s4096 100.00 data//data_1
> lz4 1.8.0 218 MB/s   693 MB/s2228  54.39 data//data_1
> lz4 1.8.0 273 MB/s   851 MB/s1739  42.46 data//data_2
> lz4 1.8.0 230 MB/s   526 MB/s1800  43.95 data//data_3
> lz4 1.8.0 321 MB/s   952 MB/s1357  33.13 data//data_4
> lz4 1.8.0 470 MB/s  1075 MB/s 664  16.21 data//data_5
> lz4 1.8.0 303 MB/s   964 MB/s1455  35.52 data//data_6
> lz4 1.8.0 345 MB/s   951 MB/s2126  51.90 data//data_7
> lz4 1.8.0 744 MB/s  1163 MB/s 177   4.32 data//data_8
> lz4 1.8.0 409 MB/s  1257 MB/s1033  25.22 data//data_9
> lz4 1.8.0 220 MB/s   857 MB/s3049  74.44 data//data_10
> lz4 1.8.0 464 MB/s  1105 MB/s 934  22.80 data//data_11
> lz4 1.8.0 874 MB/s  1194 MB/s  36   0.88 data//data_12
> 
> 
> LZ4 Dynamic Offset with 32K data:
> sh-3.2# ./lzbench -elz4_dyn data/data32k
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name Compress. Decompress. Compr. size  Ratio Filename
> memcpy   5285 MB/s  5283 MB/s   32768 100.00 data/data32k
> lz4 1.8.0 274 MB/s   995 MB/s   13435  41.00 data/data32k
> done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)
> 
> Original LZ4 with 32K data:
> sh-3.2# ./lzbench_orig -elz4 data/data32k
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name Compress. Decompress. Compr. size  Ratio Filename
> memcpy   4918 MB/s  5108 MB/s   32768 100.00 data/data32k
> lz4 1.8.0 276 MB/s  1045 MB/s   14492  44.23 data/data32k
> 
> LZO1x with 32K data (Default Compressor for ZRAM): 
> sh-3.2# ./lzbench -elzo1x,1 data/data32k
> lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
> Compressor name Compress. Decompress. Compr. size  Ratio Filename
> memcpy   5273 MB/s  5320 MB/s   32768 100.00 data/data32k
> lzo1x 2.09 -1 283 MB/s   465 MB/s   14292  43.62 data/data32k
> 
> Regards,
> Vaneet Narang
 
Thanks.
 


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-23 Thread Vaneet Narang
Hi Nick / Sergey,


We have compared LZ4 Dyn with Original LZ4 using some samples of realtime 
application data(4Kb)
compressed/decompressed by ZRAM. For comparison we have used lzbench 
(https://github.com/inikep/lzbench)
we have implemented dedicated LZ4 Dyn API & kept last literal length as 6 to 
avoid overhead 
of checks. It seems in average case there is a saving of 3~4% in compression 
ratio with almost same compression
speed and minor loss in decompression speed (~50MB/s) when compared with LZ4.

Comparison of Lz4 Dyn with LZO1x is also done as LZO1x is default compressor of 
ZRAM.

Original LZ4:
sh-3.2# ./lzbench  -r  -elz4  data/
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name Compress. Decompress. Compr. size  Ratio Filename
memcpy   2205 MB/s  2217 MB/s4096 100.00 data//data_1
lz4 1.8.0 216 MB/s   761 MB/s2433  59.40 data//data_1
lz4 1.8.0 269 MB/s   877 MB/s1873  45.73 data//data_2
lz4 1.8.0 238 MB/s   575 MB/s2060  50.29 data//data_3
lz4 1.8.0 321 MB/s  1015 MB/s1464  35.74 data//data_4
lz4 1.8.0 464 MB/s  1090 MB/s 713  17.41 data//data_5
lz4 1.8.0 296 MB/s   956 MB/s1597  38.99 data//data_6
lz4 1.8.0 338 MB/s   994 MB/s2238  54.64 data//data_7
lz4 1.8.0 705 MB/s  1172 MB/s 193   4.71 data//data_8
lz4 1.8.0 404 MB/s  1150 MB/s1097  26.78 data//data_9
lz4 1.8.0 216 MB/s   921 MB/s3183  77.71 data//data_10
lz4 1.8.0 456 MB/s  1101 MB/s1011  24.68 data//data_11
lz4 1.8.0 867 MB/s  1202 MB/s  37   0.90 data//data_12


LZ4 Dynamic Offet:  
sh-3.2# ./lzbench  -r  -elz4_dyn  data/
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name Compress. Decompress. Compr. size  Ratio Filename
memcpy   2203 MB/s  2218 MB/s4096 100.00 data//data_1
lz4 1.8.0 218 MB/s   693 MB/s2228  54.39 data//data_1
lz4 1.8.0 273 MB/s   851 MB/s1739  42.46 data//data_2
lz4 1.8.0 230 MB/s   526 MB/s1800  43.95 data//data_3
lz4 1.8.0 321 MB/s   952 MB/s1357  33.13 data//data_4
lz4 1.8.0 470 MB/s  1075 MB/s 664  16.21 data//data_5
lz4 1.8.0 303 MB/s   964 MB/s1455  35.52 data//data_6
lz4 1.8.0 345 MB/s   951 MB/s2126  51.90 data//data_7
lz4 1.8.0 744 MB/s  1163 MB/s 177   4.32 data//data_8
lz4 1.8.0 409 MB/s  1257 MB/s1033  25.22 data//data_9
lz4 1.8.0 220 MB/s   857 MB/s3049  74.44 data//data_10
lz4 1.8.0 464 MB/s  1105 MB/s 934  22.80 data//data_11
lz4 1.8.0 874 MB/s  1194 MB/s  36   0.88 data//data_12


LZ4 Dynamic Offset with 32K data:
sh-3.2# ./lzbench -elz4_dyn data/data32k
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name Compress. Decompress. Compr. size  Ratio Filename
memcpy   5285 MB/s  5283 MB/s   32768 100.00 data/data32k
lz4 1.8.0 274 MB/s   995 MB/s   13435  41.00 data/data32k
done... (cIters=1 dIters=1 cTime=1.0 dTime=2.0 chunkSize=1706MB cSpeed=0MB)

Original LZ4 with 32K data:
sh-3.2# ./lzbench_orig -elz4 data/data32k
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name Compress. Decompress. Compr. size  Ratio Filename
memcpy   4918 MB/s  5108 MB/s   32768 100.00 data/data32k
lz4 1.8.0 276 MB/s  1045 MB/s   14492  44.23 data/data32k

LZO1x with 32K data (Default Compressor for ZRAM): 
sh-3.2# ./lzbench -elzo1x,1 data/data32k
lzbench 1.7.3 (32-bit Linux)   Assembled by P.Skibinski
Compressor name Compress. Decompress. Compr. size  Ratio Filename
memcpy   5273 MB/s  5320 MB/s   32768 100.00 data/data32k
lzo1x 2.09 -1 283 MB/s   465 MB/s   14292  43.62 data/data32k

Regards,
Vaneet Narang


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky
On (03/21/18 19:56), Nick Terrell wrote:
[..]
> This seems like a reasonable extension to the algorithm, and it looks like
> LZ4_DYN is about a 5% improvement to compression ratio on your benchmark.
> The biggest question I have is if it is worthwhile to maintain a separate
> incompatible variant of LZ4 in the kernel without any upstream for a 5%
> gain? If we do want to go forward with this, we should perform more
> benchmarks.
> 
> I commented in the patch, but because the `dynOffset` variable isn't a
> compile time static in LZ4_decompress_generic(), I suspect that the patch
> causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll
> need to re-run the benchmarks to first show that LZ4 before the patch
> performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN
> benchmarks.
> 
> I would also like to see a benchmark in user-space (with the code), so we
> can see the performance of LZ4 before and after the patch, as well as LZ4
> vs LZ4_DYN without anything else going on. I expect the extra branches in
> the decoding loop to have an impact on speed, and I would like to see how
> big the impact is without noise. 

Yes, I've been thinking about this. There are more branches now
("to dyn or not to dyn") in compression/decompression hot path but
we see less instructions and branches in perf output at the end.
And my guess is that we have a lot of noise from zram and zsmalloc.
The data is XXX bytes shorter with dyn enabled, so we use YYY less
moves and ZZZ less branches while we copy the data to zsmalloc and
from zsmalloc, and I may be that's the root cause of "performance
gain" that we see in zram-fio tests. So may be we need to run
benchmarks against lz4, not zram+lz4.

> CC-ing Yann Collet, the author of LZ4

Great, thanks.

-ss


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Nick Terrell
On (03/21/18 10:10), Maninder Singh wrote:
> LZ4 specification defines 2 byte offset length for 64 KB data.
> But in case of ZRAM we compress data per page and in most of
> architecture PAGE_SIZE is 4KB. So we can decide offset length based
> on actual offset value. For this we can reserve 1 bit to decide offset
> length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127,
> else 1 byte is enough.
> 
> With this new implementation new offset value can be at MAX 32 KB.
> 
> Thus we can save more memory for compressed data.
> 
> results checked with new implementation:-
> 
> comression size for same input source
> (LZ4_DYN < LZO < LZ4)
> 
> LZO
> ===
> orig_data_size: 78917632
> compr_data_size: 15894668
> mem_used_total: 17117184
> 
> LZ4
> 
> orig_data_size: 78917632
> compr_data_size: 16310717
> mem_used_total: 17592320
> 
> LZ4_DYN
> ===
> orig_data_size: 78917632
> compr_data_size: 15520506
> mem_used_total: 16748544

This seems like a reasonable extension to the algorithm, and it looks like
LZ4_DYN is about a 5% improvement to compression ratio on your benchmark.
The biggest question I have is if it is worthwhile to maintain a separate
incompatible variant of LZ4 in the kernel without any upstream for a 5%
gain? If we do want to go forward with this, we should perform more
benchmarks.

I commented in the patch, but because the `dynOffset` variable isn't a
compile time static in LZ4_decompress_generic(), I suspect that the patch
causes a regression in decompression speed for both LZ4 and LZ4_DYN. You'll
need to re-run the benchmarks to first show that LZ4 before the patch
performs the same as LZ4 after the patch. Then re-run the LZ4 vs LZ4_DYN
benchmarks.

I would also like to see a benchmark in user-space (with the code), so we
can see the performance of LZ4 before and after the patch, as well as LZ4
vs LZ4_DYN without anything else going on. I expect the extra branches in
the decoding loop to have an impact on speed, and I would like to see how
big the impact is without noise. 

CC-ing Yann Collet, the author of LZ4



Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky
CC-ing Nick

Nick, can you take a look?

Message-IDs:
lkml.kernel.org/r/1521607242-3968-1-git-send-email-maninder...@samsung.com
lkml.kernel.org/r/1521607242-3968-2-git-send-email-maninder...@samsung.com

-ss


Re: [PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky
On (03/21/18 10:10), Maninder Singh wrote:
> (Added cover letter to avoid much text in patch description)
> 
> LZ4 specification defines 2 byte offset length for 64 KB data.
> But in case of ZRAM we compress data per page and in most of
> architecture PAGE_SIZE is 4KB. So we can decide offset length based
> on actual offset value. For this we can reserve 1 bit to decide offset
> length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127,
> else 1 byte is enough.

So what happens if I compress the data on a system with no dyn
offset and then send it over the network to a machine which has
dyn offset? Or, say, I have a USB stick with a compression enabled
FS, store files on a dyn offset enabled PC and then mount that USB
stick on a machine with no dyn offset support. And vice versa.

-ss