Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-03 Thread Sergey Senozhatsky
On (04/03/18 19:13), Vaneet Narang wrote:
> Hi Sergey,
> 
> >You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
> 2 Byte offset is not shrinked to 1 byte, Its only 1 bit is reserved out of
> 16 bits of offset. So only 15 Bits can be used to store offset value.

Yes, you are right. My bad, was thinking about something else.

> >'page should be less than 32KB', which I'm sure will be confusing. 
> lz4_dyn will work on bigger data length(> 32k) but in that case compression
> ratio may not be better than LZ4. This is same as LZ4 compressing data more 
> than 64K (16Bits). LZ4 can't store offset more than 64K similarly
> LZ4 dyn can't store offset more than 32K.

Then drop that `if PAGE_SIZE' thing. I'd rather do that stuff internally
in lz4... if it needed at all.

> >And you
> >rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
> >`#if (PAGE_SIZE < (32 * KB))'. 
> They don't need to add this code

Then drop it.

> >Apart from that, lz4_dyn supports only data
> >in up to page_size chunks. Suppose my system has page_size of less than 32K,
> >so I legitimately can enable lz4_dyn, but suppose that I will use it
> >somewhere where I don't work with page_size-d chunks. Will I able to just
> >do tfm->compress(src, sz) on random buffers? The whole thing looks to be
> >quite fragile.
> No thats not true, lz4_dyn can work for random buffers and it need not be 
> of page size chunks. There is no difference in Lz4 and Lz4 dyn working. 

You are right.

-ss


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-03 Thread Sergey Senozhatsky
On (04/03/18 19:13), Vaneet Narang wrote:
> Hi Sergey,
> 
> >You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
> 2 Byte offset is not shrinked to 1 byte, Its only 1 bit is reserved out of
> 16 bits of offset. So only 15 Bits can be used to store offset value.

Yes, you are right. My bad, was thinking about something else.

> >'page should be less than 32KB', which I'm sure will be confusing. 
> lz4_dyn will work on bigger data length(> 32k) but in that case compression
> ratio may not be better than LZ4. This is same as LZ4 compressing data more 
> than 64K (16Bits). LZ4 can't store offset more than 64K similarly
> LZ4 dyn can't store offset more than 32K.

Then drop that `if PAGE_SIZE' thing. I'd rather do that stuff internally
in lz4... if it needed at all.

> >And you
> >rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
> >`#if (PAGE_SIZE < (32 * KB))'. 
> They don't need to add this code

Then drop it.

> >Apart from that, lz4_dyn supports only data
> >in up to page_size chunks. Suppose my system has page_size of less than 32K,
> >so I legitimately can enable lz4_dyn, but suppose that I will use it
> >somewhere where I don't work with page_size-d chunks. Will I able to just
> >do tfm->compress(src, sz) on random buffers? The whole thing looks to be
> >quite fragile.
> No thats not true, lz4_dyn can work for random buffers and it need not be 
> of page size chunks. There is no difference in Lz4 and Lz4 dyn working. 

You are right.

-ss


RE: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-03 Thread Vaneet Narang
Hi Sergey,

>You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
2 Byte offset is not shrinked to 1 byte, Its only 1 bit is reserved out of
16 bits of offset. So only 15 Bits can be used to store offset value.

>'page should be less than 32KB', which I'm sure will be confusing. 
lz4_dyn will work on bigger data length(> 32k) but in that case compression
ratio may not be better than LZ4. This is same as LZ4 compressing data more 
than 64K (16Bits). LZ4 can't store offset more than 64K similarly
LZ4 dyn can't store offset more than 32K.

There is a handling in LZ4 code for this and similar handling added for LZ4 Dyn.

Handling in LZ4 Dyn: max_distance is 32K for lz4_dyn and will be 64K for LZ4
int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

>And you
>rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
>`#if (PAGE_SIZE < (32 * KB))'. 
They don't need to add this code, they just need to choose right compression 
algorithm 
that fits their requirement. If source length is less than 32K then lz4_dyn 
would give better compression ratio then LZ4. 

Considering ZRAM as a user for LZ4 dyn, we have added this check for PAGE_SIZE 
which
is source length. This code adds lz4 dyn to preferred list of compression 
algorithm 
when PAGE size is less than 32K. 

>Apart from that, lz4_dyn supports only data
>in up to page_size chunks. Suppose my system has page_size of less than 32K,
>so I legitimately can enable lz4_dyn, but suppose that I will use it
>somewhere where I don't work with page_size-d chunks. Will I able to just
>do tfm->compress(src, sz) on random buffers? The whole thing looks to be
>quite fragile.
No thats not true, lz4_dyn can work for random buffers and it need not be 
of page size chunks. There is no difference in Lz4 and Lz4 dyn working. 

Only difference is LZ4 dyn doesn't use fixed offset size, this concept already 
getting 
used in LZO which uses dynamic size of Metadata based on Match Length and Match 
offset.
It uses different markers for this which defines length of meta data.

lzodefs.h:

#define M1_MAX_OFFSET   0x0400
#define M2_MAX_OFFSET   0x0800
#define M3_MAX_OFFSET   0x4000
#define M4_MAX_OFFSET   0xbfff

#define M1_MIN_LEN  2
#define M1_MAX_LEN  2
#define M2_MIN_LEN  3
#define M2_MAX_LEN  8
#define M3_MIN_LEN  3
#define M3_MAX_LEN  33
#define M4_MIN_LEN  3
#define M4_MAX_LEN  9

#define M1_MARKER   0
#define M2_MARKER   64
#define M3_MARKER   32
#define M4_MARKER   16

Similarly for LZ4 Dyn, we have used 1 bit as a marker to determine offset 
length.

Thanks & Regards,
Vaneet Narang

rcptInfo.txt
Description: Binary data


RE: Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-03 Thread Vaneet Narang
Hi Sergey,

>You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
2 Byte offset is not shrinked to 1 byte, Its only 1 bit is reserved out of
16 bits of offset. So only 15 Bits can be used to store offset value.

>'page should be less than 32KB', which I'm sure will be confusing. 
lz4_dyn will work on bigger data length(> 32k) but in that case compression
ratio may not be better than LZ4. This is same as LZ4 compressing data more 
than 64K (16Bits). LZ4 can't store offset more than 64K similarly
LZ4 dyn can't store offset more than 32K.

There is a handling in LZ4 code for this and similar handling added for LZ4 Dyn.

Handling in LZ4 Dyn: max_distance is 32K for lz4_dyn and will be 64K for LZ4
int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

>And you
>rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
>`#if (PAGE_SIZE < (32 * KB))'. 
They don't need to add this code, they just need to choose right compression 
algorithm 
that fits their requirement. If source length is less than 32K then lz4_dyn 
would give better compression ratio then LZ4. 

Considering ZRAM as a user for LZ4 dyn, we have added this check for PAGE_SIZE 
which
is source length. This code adds lz4 dyn to preferred list of compression 
algorithm 
when PAGE size is less than 32K. 

>Apart from that, lz4_dyn supports only data
>in up to page_size chunks. Suppose my system has page_size of less than 32K,
>so I legitimately can enable lz4_dyn, but suppose that I will use it
>somewhere where I don't work with page_size-d chunks. Will I able to just
>do tfm->compress(src, sz) on random buffers? The whole thing looks to be
>quite fragile.
No thats not true, lz4_dyn can work for random buffers and it need not be 
of page size chunks. There is no difference in Lz4 and Lz4 dyn working. 

Only difference is LZ4 dyn doesn't use fixed offset size, this concept already 
getting 
used in LZO which uses dynamic size of Metadata based on Match Length and Match 
offset.
It uses different markers for this which defines length of meta data.

lzodefs.h:

#define M1_MAX_OFFSET   0x0400
#define M2_MAX_OFFSET   0x0800
#define M3_MAX_OFFSET   0x4000
#define M4_MAX_OFFSET   0xbfff

#define M1_MIN_LEN  2
#define M1_MAX_LEN  2
#define M2_MIN_LEN  3
#define M2_MAX_LEN  8
#define M3_MIN_LEN  3
#define M3_MAX_LEN  33
#define M4_MIN_LEN  3
#define M4_MAX_LEN  9

#define M1_MARKER   0
#define M2_MARKER   64
#define M3_MARKER   32
#define M4_MARKER   16

Similarly for LZ4 Dyn, we have used 1 bit as a marker to determine offset 
length.

Thanks & Regards,
Vaneet Narang

rcptInfo.txt
Description: Binary data


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-03 Thread Sergey Senozhatsky
On (04/02/18 11:21), Maninder Singh wrote:
[..]
> >>  static const char * const backends[] = {
> >>  "lzo",
> >>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
> >>  "lz4",
> >> +#if (PAGE_SIZE < (32 * KB))
> >> +"lz4_dyn",
> >> +#endif
> > 
> >This is not the list of supported algorithms. It's the list of
> >recommended algorithms. You can configure zram to use any of
> >available and known to Crypto API algorithms. Including lz4_dyn
> >on PAGE_SIZE > 32K systems.
> > 
> Yes, we want to integrate new compression(lz4_dyn) for ZRAM 
> only if PAGE_SIZE is less than 32KB to get maximum benefit. 
> so we added lz4_dyn to available list of ZRAM compression alhorithms.

Which is not what I was talking about.

You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
'page should be less than 32KB', which I'm sure will be confusing. And you
rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
`#if (PAGE_SIZE < (32 * KB))'. Apart from that, lz4_dyn supports only data
in up to page_size chunks. Suppose my system has page_size of less than 32K,
so I legitimately can enable lz4_dyn, but suppose that I will use it
somewhere where I don't work with page_size-d chunks. Will I able to just
do tfm->compress(src, sz) on random buffers? The whole thing looks to be
quite fragile.

-ss


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-03 Thread Sergey Senozhatsky
On (04/02/18 11:21), Maninder Singh wrote:
[..]
> >>  static const char * const backends[] = {
> >>  "lzo",
> >>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
> >>  "lz4",
> >> +#if (PAGE_SIZE < (32 * KB))
> >> +"lz4_dyn",
> >> +#endif
> > 
> >This is not the list of supported algorithms. It's the list of
> >recommended algorithms. You can configure zram to use any of
> >available and known to Crypto API algorithms. Including lz4_dyn
> >on PAGE_SIZE > 32K systems.
> > 
> Yes, we want to integrate new compression(lz4_dyn) for ZRAM 
> only if PAGE_SIZE is less than 32KB to get maximum benefit. 
> so we added lz4_dyn to available list of ZRAM compression alhorithms.

Which is not what I was talking about.

You shrink a 2 bytes offset down to a 1 byte offset, thus you enforce that
'page should be less than 32KB', which I'm sure will be confusing. And you
rely on lz4_dyn users to do the right thing - namely, to use that 'nice'
`#if (PAGE_SIZE < (32 * KB))'. Apart from that, lz4_dyn supports only data
in up to page_size chunks. Suppose my system has page_size of less than 32K,
so I legitimately can enable lz4_dyn, but suppose that I will use it
somewhere where I don't work with page_size-d chunks. Will I able to just
do tfm->compress(src, sz) on random buffers? The whole thing looks to be
quite fragile.

-ss


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-02 Thread Maninder Singh
Hi,

>> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
>> index 4ed0a78..5bc5aab 100644
>> --- a/drivers/block/zram/zcomp.c
>> +++ b/drivers/block/zram/zcomp.c
>> @@ -17,11 +17,15 @@
>>  #include 
>>  
>>  #include "zcomp.h"
>> +#define KB(1 << 10)
>>  
>>  static const char * const backends[] = {
>>  "lzo",
>>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>>  "lz4",
>> +#if (PAGE_SIZE < (32 * KB))
>> +"lz4_dyn",
>> +#endif
> 
>This is not the list of supported algorithms. It's the list of
>recommended algorithms. You can configure zram to use any of
>available and known to Crypto API algorithms. Including lz4_dyn
>on PAGE_SIZE > 32K systems.
> 
>-ss
 
Yes, we want to integrate new compression(lz4_dyn) for ZRAM 
only if PAGE_SIZE is less than 32KB to get maximum benefit. 
so we added lz4_dyn to available list of ZRAM compression alhorithms.

Thanks,
Manider Singh


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-04-02 Thread Maninder Singh
Hi,

>> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
>> index 4ed0a78..5bc5aab 100644
>> --- a/drivers/block/zram/zcomp.c
>> +++ b/drivers/block/zram/zcomp.c
>> @@ -17,11 +17,15 @@
>>  #include 
>>  
>>  #include "zcomp.h"
>> +#define KB(1 << 10)
>>  
>>  static const char * const backends[] = {
>>  "lzo",
>>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>>  "lz4",
>> +#if (PAGE_SIZE < (32 * KB))
>> +"lz4_dyn",
>> +#endif
> 
>This is not the list of supported algorithms. It's the list of
>recommended algorithms. You can configure zram to use any of
>available and known to Crypto API algorithms. Including lz4_dyn
>on PAGE_SIZE > 32K systems.
> 
>-ss
 
Yes, we want to integrate new compression(lz4_dyn) for ZRAM 
only if PAGE_SIZE is less than 32KB to get maximum benefit. 
so we added lz4_dyn to available list of ZRAM compression alhorithms.

Thanks,
Manider Singh


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-23 Thread Vaneet Narang

Hi Nick,

Thanks for your comments, Please check my reply to few of your comments.
I will be sharing benchmarking figures separately. 

> 
>> +if (curr_offset > 127) {
>> +curr_offset = (curr_offset << 1) | DYN_BIT;
>> +LZ4_writeLE16(op, (U16)curr_offset);
>> +op += 2;
>> +} else {
>> +curr_offset = curr_offset << 1;
>> +*op = (BYTE)curr_offset;
>> +op++;
>> +}
> 
>The standard way to do variable sized integers is to use the high-bit as
>the control bit, not the low-bit. Do you have benchmarks to show that using
>the low-bit is faster than using the high-bit? If so, lets comment in the
>code, if not lets use the high-bit.
> 
We are not sure about performance difference of using low or high bit but Since 
as per 
LZ4 specification, offset needs to be stored in little endian format so keeping 
Low bit 
as control bit makes it easier to retreive offset when offset is spread across 
two bytes.

offset = LZ4_readLE16(ip);
if (offset & 0x1) {
offset = offset >> 1; // Just one Right shift
ip += 2;
} else {
offset = (offset & 0xff) >> 1; // Only Two Operation for one byte offset
ip += 1;
}

So not sure if keeping High bit will make much difference as we will be needing 
same or more operations to get offset in case of High bit.

>>  /* copy literals */
>>  cpy = op + length;
>>  if (((endOnInput) && ((cpy > (partialDecoding ? oexit : 
>> oend - MFLIMIT))
>> -|| (ip + length > iend - (2 + 1 + LASTLITERALS
>> -|| ((!endOnInput) && (cpy > oend - 
>> WILDCOPYLENGTH))) {
>> +|| (ip + length > iend - (2 + LASTLITERALS
>> +|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 
>> 1))) {
>>  if (partialDecoding) {
>>  if (cpy > oend) {
>>  /*
>> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>>  break;
>>  }
>>
>> -LZ4_wildCopy(op, ip, cpy);
>> +if (dynOffset && length < 4)
>> +LZ4_copy4(op, ip);
>> +else
>> +LZ4_wildCopy(op, ip, cpy);
>> +
> 
>The LZ4 format enforces that the last 5 bytes are literals so that
>LZ4_wildCopy() can be used here. I suspect that having this extra branch
>here for `dynOffset` mode hurts decompression speed.
> 
This check is made to handle one byte read overflow when decompressing last 
frame. wildCopy does 8 byte copy blindly.

Issue Case:
0 length literal followed by 1 byte offset and then it ends with one byte token 
and 5 Byte Last Literals.

With this combination only 7 bytes (1 Byte Offset + 1 Byte Token + 5 Byte 
Literal) remains at the end of 
input buffer so reading 8 bytes from input buffer results in 1 byte overflow.

Since 1 byte offset is not there in original LZ4, so this issue is not 
possible. To avoid overhead of this check, 
we planned to use 6 Byte as Last literal Length.

-#define LASTLITERALS5
+#define LASTLITERALS6

>>
>>  int LZ4_decompress_safe(const char *source, char *dest,
>> -int compressedSize, int maxDecompressedSize)
>> +int compressedSize, int maxDecompressedSize, bool dynOffset)
>>  {
>>  return LZ4_decompress_generic(source, dest, compressedSize,
>>  maxDecompressedSize, endOnInputSize, full, 0,
>> -noDict, (BYTE *)dest, NULL, 0);
>> +noDict, (BYTE *)dest, NULL, 0, dynOffset);
> 
>You'll need to use the same trick that LZ4_compress_fast() uses, by hard
>coding `dynOffset`. We want the compiler to generate too version of
>LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
>That way the tight loop won't the branches that check `dynOffset`.
> 
>if (dynOffset)
>return LZ4_decompress_generic(..., true);
>else
>return LZ4_decompress_generic(..., false);
> 
>Without this trick, I expect that this patch causes a regression to both
>LZ4 and LZ4_DYN decompression speed.

Since there is no backward compatibility of our solution with original LZ4 so we
are bit confused if we should make it as separate API to avoid overhead of 
dynOffset
checks every where in the code and also to avoid changing prototype of exported 
functions 
LZ4_decompress/LZ4_compress OR we should keep these checks to avoid redundant 
code. 
Kindly suggest

Thanks & Regards,
Vaneet Narang


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-23 Thread Vaneet Narang

Hi Nick,

Thanks for your comments, Please check my reply to few of your comments.
I will be sharing benchmarking figures separately. 

> 
>> +if (curr_offset > 127) {
>> +curr_offset = (curr_offset << 1) | DYN_BIT;
>> +LZ4_writeLE16(op, (U16)curr_offset);
>> +op += 2;
>> +} else {
>> +curr_offset = curr_offset << 1;
>> +*op = (BYTE)curr_offset;
>> +op++;
>> +}
> 
>The standard way to do variable sized integers is to use the high-bit as
>the control bit, not the low-bit. Do you have benchmarks to show that using
>the low-bit is faster than using the high-bit? If so, lets comment in the
>code, if not lets use the high-bit.
> 
We are not sure about performance difference of using low or high bit but Since 
as per 
LZ4 specification, offset needs to be stored in little endian format so keeping 
Low bit 
as control bit makes it easier to retreive offset when offset is spread across 
two bytes.

offset = LZ4_readLE16(ip);
if (offset & 0x1) {
offset = offset >> 1; // Just one Right shift
ip += 2;
} else {
offset = (offset & 0xff) >> 1; // Only Two Operation for one byte offset
ip += 1;
}

So not sure if keeping High bit will make much difference as we will be needing 
same or more operations to get offset in case of High bit.

>>  /* copy literals */
>>  cpy = op + length;
>>  if (((endOnInput) && ((cpy > (partialDecoding ? oexit : 
>> oend - MFLIMIT))
>> -|| (ip + length > iend - (2 + 1 + LASTLITERALS
>> -|| ((!endOnInput) && (cpy > oend - 
>> WILDCOPYLENGTH))) {
>> +|| (ip + length > iend - (2 + LASTLITERALS
>> +|| ((!endOnInput) && (cpy > oend - WILDCOPYLENGTH - 
>> 1))) {
>>  if (partialDecoding) {
>>  if (cpy > oend) {
>>  /*
>> @@ -188,13 +190,31 @@ static FORCE_INLINE int LZ4_decompress_generic(
>>  break;
>>  }
>>
>> -LZ4_wildCopy(op, ip, cpy);
>> +if (dynOffset && length < 4)
>> +LZ4_copy4(op, ip);
>> +else
>> +LZ4_wildCopy(op, ip, cpy);
>> +
> 
>The LZ4 format enforces that the last 5 bytes are literals so that
>LZ4_wildCopy() can be used here. I suspect that having this extra branch
>here for `dynOffset` mode hurts decompression speed.
> 
This check is made to handle one byte read overflow when decompressing last 
frame. wildCopy does 8 byte copy blindly.

Issue Case:
0 length literal followed by 1 byte offset and then it ends with one byte token 
and 5 Byte Last Literals.

With this combination only 7 bytes (1 Byte Offset + 1 Byte Token + 5 Byte 
Literal) remains at the end of 
input buffer so reading 8 bytes from input buffer results in 1 byte overflow.

Since 1 byte offset is not there in original LZ4, so this issue is not 
possible. To avoid overhead of this check, 
we planned to use 6 Byte as Last literal Length.

-#define LASTLITERALS5
+#define LASTLITERALS6

>>
>>  int LZ4_decompress_safe(const char *source, char *dest,
>> -int compressedSize, int maxDecompressedSize)
>> +int compressedSize, int maxDecompressedSize, bool dynOffset)
>>  {
>>  return LZ4_decompress_generic(source, dest, compressedSize,
>>  maxDecompressedSize, endOnInputSize, full, 0,
>> -noDict, (BYTE *)dest, NULL, 0);
>> +noDict, (BYTE *)dest, NULL, 0, dynOffset);
> 
>You'll need to use the same trick that LZ4_compress_fast() uses, by hard
>coding `dynOffset`. We want the compiler to generate too version of
>LZ4_decompress_generic(), one with `dynOffset` and one with `!dynOffset`.
>That way the tight loop won't the branches that check `dynOffset`.
> 
>if (dynOffset)
>return LZ4_decompress_generic(..., true);
>else
>return LZ4_decompress_generic(..., false);
> 
>Without this trick, I expect that this patch causes a regression to both
>LZ4 and LZ4_DYN decompression speed.

Since there is no backward compatibility of our solution with original LZ4 so we
are bit confused if we should make it as separate API to avoid overhead of 
dynOffset
checks every where in the code and also to avoid changing prototype of exported 
functions 
LZ4_decompress/LZ4_compress OR we should keep these checks to avoid redundant 
code. 
Kindly suggest

Thanks & Regards,
Vaneet Narang


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-22 Thread kbuild test robot
Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-s1-03221113 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/squashfs/lz4_wrapper.c: In function 'lz4_uncompress':
>> fs/squashfs/lz4_wrapper.c:110:8: error: too few arguments to function 
>> 'LZ4_decompress_safe'
 res = LZ4_decompress_safe(stream->input, stream->output,
   ^~~
   In file included from fs/squashfs/lz4_wrapper.c:13:0:
   include/linux/lz4.h:301:5: note: declared here
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
^~~

vim +/LZ4_decompress_safe +110 fs/squashfs/lz4_wrapper.c

9c06a46f Phillip Lougher2014-11-27   91  
9c06a46f Phillip Lougher2014-11-27   92  
9c06a46f Phillip Lougher2014-11-27   93  static int lz4_uncompress(struct 
squashfs_sb_info *msblk, void *strm,
9c06a46f Phillip Lougher2014-11-27   94 struct buffer_head **bh, int b, 
int offset, int length,
9c06a46f Phillip Lougher2014-11-27   95 struct squashfs_page_actor 
*output)
9c06a46f Phillip Lougher2014-11-27   96  {
9c06a46f Phillip Lougher2014-11-27   97 struct squashfs_lz4 *stream = 
strm;
9c06a46f Phillip Lougher2014-11-27   98 void *buff = stream->input, 
*data;
9c06a46f Phillip Lougher2014-11-27   99 int avail, i, bytes = length, 
res;
9c06a46f Phillip Lougher2014-11-27  100  
9c06a46f Phillip Lougher2014-11-27  101 for (i = 0; i < b; i++) {
9c06a46f Phillip Lougher2014-11-27  102 avail = min(bytes, 
msblk->devblksize - offset);
9c06a46f Phillip Lougher2014-11-27  103 memcpy(buff, 
bh[i]->b_data + offset, avail);
9c06a46f Phillip Lougher2014-11-27  104 buff += avail;
9c06a46f Phillip Lougher2014-11-27  105 bytes -= avail;
9c06a46f Phillip Lougher2014-11-27  106 offset = 0;
9c06a46f Phillip Lougher2014-11-27  107 put_bh(bh[i]);
9c06a46f Phillip Lougher2014-11-27  108 }
9c06a46f Phillip Lougher2014-11-27  109  
d21b5ff1 Sven Schmidt   2017-02-24 @110 res = 
LZ4_decompress_safe(stream->input, stream->output,
d21b5ff1 Sven Schmidt   2017-02-24  111 length, output->length);
d21b5ff1 Sven Schmidt   2017-02-24  112  
d21b5ff1 Sven Schmidt   2017-02-24  113 if (res < 0)
9c06a46f Phillip Lougher2014-11-27  114 return -EIO;
9c06a46f Phillip Lougher2014-11-27  115  
d21b5ff1 Sven Schmidt   2017-02-24  116 bytes = res;
9c06a46f Phillip Lougher2014-11-27  117 data = 
squashfs_first_page(output);
9c06a46f Phillip Lougher2014-11-27  118 buff = stream->output;
9c06a46f Phillip Lougher2014-11-27  119 while (data) {
09cbfeaf Kirill A. Shutemov 2016-04-01  120 if (bytes <= PAGE_SIZE) 
{
9c06a46f Phillip Lougher2014-11-27  121 memcpy(data, 
buff, bytes);
9c06a46f Phillip Lougher2014-11-27  122 break;
9c06a46f Phillip Lougher2014-11-27  123 }
09cbfeaf Kirill A. Shutemov 2016-04-01  124 memcpy(data, buff, 
PAGE_SIZE);
09cbfeaf Kirill A. Shutemov 2016-04-01  125 buff += PAGE_SIZE;
09cbfeaf Kirill A. Shutemov 2016-04-01  126 bytes -= PAGE_SIZE;
9c06a46f Phillip Lougher2014-11-27  127 data = 
squashfs_next_page(output);
9c06a46f Phillip Lougher2014-11-27  128 }
9c06a46f Phillip Lougher2014-11-27  129 squashfs_finish_page(output);
9c06a46f Phillip Lougher2014-11-27  130  
d21b5ff1 Sven Schmidt   2017-02-24  131 return res;
9c06a46f Phillip Lougher2014-11-27  132  }
9c06a46f Phillip Lougher2014-11-27  133  

:: The code at line 110 was first introduced by commit
:: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change 
usage of LZ4 to work with new LZ4 version

:: TO: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-22 Thread kbuild test robot
Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-s1-03221113 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/squashfs/lz4_wrapper.c: In function 'lz4_uncompress':
>> fs/squashfs/lz4_wrapper.c:110:8: error: too few arguments to function 
>> 'LZ4_decompress_safe'
 res = LZ4_decompress_safe(stream->input, stream->output,
   ^~~
   In file included from fs/squashfs/lz4_wrapper.c:13:0:
   include/linux/lz4.h:301:5: note: declared here
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
^~~

vim +/LZ4_decompress_safe +110 fs/squashfs/lz4_wrapper.c

9c06a46f Phillip Lougher2014-11-27   91  
9c06a46f Phillip Lougher2014-11-27   92  
9c06a46f Phillip Lougher2014-11-27   93  static int lz4_uncompress(struct 
squashfs_sb_info *msblk, void *strm,
9c06a46f Phillip Lougher2014-11-27   94 struct buffer_head **bh, int b, 
int offset, int length,
9c06a46f Phillip Lougher2014-11-27   95 struct squashfs_page_actor 
*output)
9c06a46f Phillip Lougher2014-11-27   96  {
9c06a46f Phillip Lougher2014-11-27   97 struct squashfs_lz4 *stream = 
strm;
9c06a46f Phillip Lougher2014-11-27   98 void *buff = stream->input, 
*data;
9c06a46f Phillip Lougher2014-11-27   99 int avail, i, bytes = length, 
res;
9c06a46f Phillip Lougher2014-11-27  100  
9c06a46f Phillip Lougher2014-11-27  101 for (i = 0; i < b; i++) {
9c06a46f Phillip Lougher2014-11-27  102 avail = min(bytes, 
msblk->devblksize - offset);
9c06a46f Phillip Lougher2014-11-27  103 memcpy(buff, 
bh[i]->b_data + offset, avail);
9c06a46f Phillip Lougher2014-11-27  104 buff += avail;
9c06a46f Phillip Lougher2014-11-27  105 bytes -= avail;
9c06a46f Phillip Lougher2014-11-27  106 offset = 0;
9c06a46f Phillip Lougher2014-11-27  107 put_bh(bh[i]);
9c06a46f Phillip Lougher2014-11-27  108 }
9c06a46f Phillip Lougher2014-11-27  109  
d21b5ff1 Sven Schmidt   2017-02-24 @110 res = 
LZ4_decompress_safe(stream->input, stream->output,
d21b5ff1 Sven Schmidt   2017-02-24  111 length, output->length);
d21b5ff1 Sven Schmidt   2017-02-24  112  
d21b5ff1 Sven Schmidt   2017-02-24  113 if (res < 0)
9c06a46f Phillip Lougher2014-11-27  114 return -EIO;
9c06a46f Phillip Lougher2014-11-27  115  
d21b5ff1 Sven Schmidt   2017-02-24  116 bytes = res;
9c06a46f Phillip Lougher2014-11-27  117 data = 
squashfs_first_page(output);
9c06a46f Phillip Lougher2014-11-27  118 buff = stream->output;
9c06a46f Phillip Lougher2014-11-27  119 while (data) {
09cbfeaf Kirill A. Shutemov 2016-04-01  120 if (bytes <= PAGE_SIZE) 
{
9c06a46f Phillip Lougher2014-11-27  121 memcpy(data, 
buff, bytes);
9c06a46f Phillip Lougher2014-11-27  122 break;
9c06a46f Phillip Lougher2014-11-27  123 }
09cbfeaf Kirill A. Shutemov 2016-04-01  124 memcpy(data, buff, 
PAGE_SIZE);
09cbfeaf Kirill A. Shutemov 2016-04-01  125 buff += PAGE_SIZE;
09cbfeaf Kirill A. Shutemov 2016-04-01  126 bytes -= PAGE_SIZE;
9c06a46f Phillip Lougher2014-11-27  127 data = 
squashfs_next_page(output);
9c06a46f Phillip Lougher2014-11-27  128 }
9c06a46f Phillip Lougher2014-11-27  129 squashfs_finish_page(output);
9c06a46f Phillip Lougher2014-11-27  130  
d21b5ff1 Sven Schmidt   2017-02-24  131 return res;
9c06a46f Phillip Lougher2014-11-27  132  }
9c06a46f Phillip Lougher2014-11-27  133  

:: The code at line 110 was first introduced by commit
:: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change 
usage of LZ4 to work with new LZ4 version

:: TO: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-22 Thread kbuild test robot
Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-x073-201811 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/pstore/platform.c: In function 'decompress_lz4':
>> fs/pstore/platform.c:357:8: error: too few arguments to function 
>> 'LZ4_decompress_safe'
 ret = LZ4_decompress_safe(in, out, inlen, outlen);
   ^~~
   In file included from fs/pstore/platform.c:38:0:
   include/linux/lz4.h:301:5: note: declared here
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
^~~

vim +/LZ4_decompress_safe +357 fs/pstore/platform.c

8cfc8ddc Geliang Tang 2016-02-18  352  
8cfc8ddc Geliang Tang 2016-02-18  353  static int decompress_lz4(void *in, void 
*out, size_t inlen, size_t outlen)
8cfc8ddc Geliang Tang 2016-02-18  354  {
8cfc8ddc Geliang Tang 2016-02-18  355   int ret;
8cfc8ddc Geliang Tang 2016-02-18  356  
d21b5ff1 Sven Schmidt 2017-02-24 @357   ret = LZ4_decompress_safe(in, out, 
inlen, outlen);
d21b5ff1 Sven Schmidt 2017-02-24  358   if (ret < 0) {
d21b5ff1 Sven Schmidt 2017-02-24  359   /*
d21b5ff1 Sven Schmidt 2017-02-24  360* LZ4_decompress_safe will 
return an error code
d21b5ff1 Sven Schmidt 2017-02-24  361* (< 0) if decompression failed
d21b5ff1 Sven Schmidt 2017-02-24  362*/
d21b5ff1 Sven Schmidt 2017-02-24  363   pr_err("LZ4_decompress_safe 
error, ret = %d!\n", ret);
8cfc8ddc Geliang Tang 2016-02-18  364   return -EIO;
8cfc8ddc Geliang Tang 2016-02-18  365   }
8cfc8ddc Geliang Tang 2016-02-18  366  
d21b5ff1 Sven Schmidt 2017-02-24  367   return ret;
8cfc8ddc Geliang Tang 2016-02-18  368  }
8cfc8ddc Geliang Tang 2016-02-18  369  

:: The code at line 357 was first introduced by commit
:: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change 
usage of LZ4 to work with new LZ4 version

:: TO: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-22 Thread kbuild test robot
Hi Maninder,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc6]
[cannot apply to next-20180322]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Maninder-Singh/cover-letter-lz4-Implement-lz4-with-dynamic-offset-length/20180323-064137
config: i386-randconfig-x073-201811 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   fs/pstore/platform.c: In function 'decompress_lz4':
>> fs/pstore/platform.c:357:8: error: too few arguments to function 
>> 'LZ4_decompress_safe'
 ret = LZ4_decompress_safe(in, out, inlen, outlen);
   ^~~
   In file included from fs/pstore/platform.c:38:0:
   include/linux/lz4.h:301:5: note: declared here
int LZ4_decompress_safe(const char *source, char *dest, int compressedSize,
^~~

vim +/LZ4_decompress_safe +357 fs/pstore/platform.c

8cfc8ddc Geliang Tang 2016-02-18  352  
8cfc8ddc Geliang Tang 2016-02-18  353  static int decompress_lz4(void *in, void 
*out, size_t inlen, size_t outlen)
8cfc8ddc Geliang Tang 2016-02-18  354  {
8cfc8ddc Geliang Tang 2016-02-18  355   int ret;
8cfc8ddc Geliang Tang 2016-02-18  356  
d21b5ff1 Sven Schmidt 2017-02-24 @357   ret = LZ4_decompress_safe(in, out, 
inlen, outlen);
d21b5ff1 Sven Schmidt 2017-02-24  358   if (ret < 0) {
d21b5ff1 Sven Schmidt 2017-02-24  359   /*
d21b5ff1 Sven Schmidt 2017-02-24  360* LZ4_decompress_safe will 
return an error code
d21b5ff1 Sven Schmidt 2017-02-24  361* (< 0) if decompression failed
d21b5ff1 Sven Schmidt 2017-02-24  362*/
d21b5ff1 Sven Schmidt 2017-02-24  363   pr_err("LZ4_decompress_safe 
error, ret = %d!\n", ret);
8cfc8ddc Geliang Tang 2016-02-18  364   return -EIO;
8cfc8ddc Geliang Tang 2016-02-18  365   }
8cfc8ddc Geliang Tang 2016-02-18  366  
d21b5ff1 Sven Schmidt 2017-02-24  367   return ret;
8cfc8ddc Geliang Tang 2016-02-18  368  }
8cfc8ddc Geliang Tang 2016-02-18  369  

:: The code at line 357 was first introduced by commit
:: d21b5ff12df45a65bb220c7e8103a5f0f5609377 fs/pstore: fs/squashfs: change 
usage of LZ4 to work with new LZ4 version

:: TO: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Maninder Singh
CC: Vaneet Narang.

 
On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
>  const tableType_t tableType,
>  const dict_directive dict,
>  const dictIssue_directive dictIssue,
> -const U32 acceleration)
> +const U32 acceleration,
> +const Dynamic_Offset dynOffset)
>  {
>  const BYTE *ip = (const BYTE *) source;
>  const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  BYTE *op = (BYTE *) dest;
>  BYTE * const olimit = op + maxOutputSize;
> +int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;
 
Lets mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile time constants, I expect the speed to
be impacted.
 
>
>  U32 forwardH;
>  size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>  for ( ; ; ) {
>  const BYTE *match;
>  BYTE *token;
> +int curr_offset;
>
>  /* Find a match */
>  {
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>  : 0)
>  || ((tableType == byU16)
>  ? 0
> -: (match + MAX_DISTANCE < ip))
> +: (match + max_distance < ip))
>  || (LZ4_read32(match + refDelta)
>  != LZ4_read32(ip)));
>  }
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  _next_match:
>  /* Encode Offset */
> -LZ4_writeLE16(op, (U16)(ip - match));
> -op += 2;
> +if (dynOffset) {
> +curr_offset = (U16)(ip - match);
> +
> +/*
> + * If Ofsset is greater than 127, we need 2 bytes
> + * to store it. Otherwise 1 byte is enough.
> + */
> +if (curr_offset > 127) {
> +curr_offset = (curr_offset << 1) | DYN_BIT;
> +LZ4_writeLE16(op, (U16)curr_offset);
> +op += 2;
> +} else {
> +curr_offset = curr_offset << 1;
> +*op = (BYTE)curr_offset;
> +op++;
> +}
 
The standard way to do variable sized integers is to use the high-bit as
the control bit, not the low-bit. Do you have benchmarks to show that using
the low-bit is faster than using the high-bit? If so, lets comment in the
code, if not lets use the high-bit.
 
> +} else {
> +LZ4_writeLE16(op, (U16)(ip - match));
> +op += 2;
> +}
>
>  /* Encode MatchLength */
>  {
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize, 0,
>  noLimit, byU16, noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  else
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize, 0,
>  noLimit, tableType, noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  } else {
>  if (inputSize < LZ4_64Klimit)
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize,
>  maxOutputSize, limitedOutput, byU16, noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  else
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize,
>  maxOutputSize, limitedOutput, tableType, 
>noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  }
>  }
>
> +static int LZ4_compress_fast_extState_dynamic(
> +void *state,
> +const char *source,
> +char *dest,
> 

Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Maninder Singh
CC: Vaneet Narang.

 
On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
>  const tableType_t tableType,
>  const dict_directive dict,
>  const dictIssue_directive dictIssue,
> -const U32 acceleration)
> +const U32 acceleration,
> +const Dynamic_Offset dynOffset)
>  {
>  const BYTE *ip = (const BYTE *) source;
>  const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  BYTE *op = (BYTE *) dest;
>  BYTE * const olimit = op + maxOutputSize;
> +int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;
 
Lets mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile time constants, I expect the speed to
be impacted.
 
>
>  U32 forwardH;
>  size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>  for ( ; ; ) {
>  const BYTE *match;
>  BYTE *token;
> +int curr_offset;
>
>  /* Find a match */
>  {
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>  : 0)
>  || ((tableType == byU16)
>  ? 0
> -: (match + MAX_DISTANCE < ip))
> +: (match + max_distance < ip))
>  || (LZ4_read32(match + refDelta)
>  != LZ4_read32(ip)));
>  }
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  _next_match:
>  /* Encode Offset */
> -LZ4_writeLE16(op, (U16)(ip - match));
> -op += 2;
> +if (dynOffset) {
> +curr_offset = (U16)(ip - match);
> +
> +/*
> + * If Ofsset is greater than 127, we need 2 bytes
> + * to store it. Otherwise 1 byte is enough.
> + */
> +if (curr_offset > 127) {
> +curr_offset = (curr_offset << 1) | DYN_BIT;
> +LZ4_writeLE16(op, (U16)curr_offset);
> +op += 2;
> +} else {
> +curr_offset = curr_offset << 1;
> +*op = (BYTE)curr_offset;
> +op++;
> +}
 
The standard way to do variable sized integers is to use the high-bit as
the control bit, not the low-bit. Do you have benchmarks to show that using
the low-bit is faster than using the high-bit? If so, lets comment in the
code, if not lets use the high-bit.
 
> +} else {
> +LZ4_writeLE16(op, (U16)(ip - match));
> +op += 2;
> +}
>
>  /* Encode MatchLength */
>  {
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize, 0,
>  noLimit, byU16, noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  else
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize, 0,
>  noLimit, tableType, noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  } else {
>  if (inputSize < LZ4_64Klimit)
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize,
>  maxOutputSize, limitedOutput, byU16, noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  else
>  return LZ4_compress_generic(ctx, source,
>  dest, inputSize,
>  maxOutputSize, limitedOutput, tableType, 
>noDict,
> -noDictIssue, acceleration);
> +noDictIssue, acceleration, NoDynOffset);
>  }
>  }
>
> +static int LZ4_compress_fast_extState_dynamic(
> +void *state,
> +const char *source,
> +char *dest,
> 

Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Nick Terrell
On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
>   const tableType_t tableType,
>   const dict_directive dict,
>   const dictIssue_directive dictIssue,
> - const U32 acceleration)
> + const U32 acceleration,
> + const Dynamic_Offset dynOffset)
>  {
>   const BYTE *ip = (const BYTE *) source;
>   const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>   BYTE *op = (BYTE *) dest;
>   BYTE * const olimit = op + maxOutputSize;
> + int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

Lets mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile time constants, I expect the speed to
be impacted.

>
>   U32 forwardH;
>   size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>   for ( ; ; ) {
>   const BYTE *match;
>   BYTE *token;
> + int curr_offset;
>
>   /* Find a match */
>   {
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>   : 0)
>   || ((tableType == byU16)
>   ? 0
> - : (match + MAX_DISTANCE < ip))
> + : (match + max_distance < ip))
>   || (LZ4_read32(match + refDelta)
>   != LZ4_read32(ip)));
>   }
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  _next_match:
>   /* Encode Offset */
> - LZ4_writeLE16(op, (U16)(ip - match));
> - op += 2;
> + if (dynOffset) {
> + curr_offset = (U16)(ip - match);
> +
> + /*
> +  * If Ofsset is greater than 127, we need 2 bytes
> +  * to store it. Otherwise 1 byte is enough.
> +  */
> + if (curr_offset > 127) {
> + curr_offset = (curr_offset << 1) | DYN_BIT;
> + LZ4_writeLE16(op, (U16)curr_offset);
> + op += 2;
> + } else {
> + curr_offset = curr_offset << 1;
> + *op = (BYTE)curr_offset;
> + op++;
> + }

The standard way to do variable sized integers is to use the high-bit as
the control bit, not the low-bit. Do you have benchmarks to show that using
the low-bit is faster than using the high-bit? If so, lets comment in the
code, if not lets use the high-bit.

> + } else {
> + LZ4_writeLE16(op, (U16)(ip - match));
> + op += 2;
> + }
>
>   /* Encode MatchLength */
>   {
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize, 0,
>   noLimit, byU16, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   else
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize, 0,
>   noLimit, tableType, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   } else {
>   if (inputSize < LZ4_64Klimit)
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize,
>   maxOutputSize, limitedOutput, byU16, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   else
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize,
>   maxOutputSize, limitedOutput, tableType, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   }
>  }
>
> +static int LZ4_compress_fast_extState_dynamic(
> + void *state,
> + const char *source,
> + char *dest,
> + int inputSize,
> + int maxOutputSize,
> + int acceleration)
> +{
> + LZ4_stream_t_internal *ctx = &((LZ4_stream_t 
> *)state)->internal_donotuse;
> +
> + LZ4_resetStream((LZ4_stream_t *)state);
> +
> + if (acceleration < 1)
> +

Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Nick Terrell
On (03/21/18 10:10), Maninder Singh wrote:
> diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
> index cc7b6d4..185c358 100644
> --- a/lib/lz4/lz4_compress.c
> +++ b/lib/lz4/lz4_compress.c
> @@ -183,7 +183,8 @@ static FORCE_INLINE int LZ4_compress_generic(
>   const tableType_t tableType,
>   const dict_directive dict,
>   const dictIssue_directive dictIssue,
> - const U32 acceleration)
> + const U32 acceleration,
> + const Dynamic_Offset dynOffset)
>  {
>   const BYTE *ip = (const BYTE *) source;
>   const BYTE *base;
> @@ -199,6 +200,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>   BYTE *op = (BYTE *) dest;
>   BYTE * const olimit = op + maxOutputSize;
> + int max_distance = dynOffset ? MAX_DISTANCE_DYN : MAX_DISTANCE;

Lets mark this variable `const`. If the compiler doesn't realize that this
variable and `dynOffset` are compile time constants, I expect the speed to
be impacted.

>
>   U32 forwardH;
>   size_t refDelta = 0;
> @@ -245,6 +247,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>   for ( ; ; ) {
>   const BYTE *match;
>   BYTE *token;
> + int curr_offset;
>
>   /* Find a match */
>   {
> @@ -285,7 +288,7 @@ static FORCE_INLINE int LZ4_compress_generic(
>   : 0)
>   || ((tableType == byU16)
>   ? 0
> - : (match + MAX_DISTANCE < ip))
> + : (match + max_distance < ip))
>   || (LZ4_read32(match + refDelta)
>   != LZ4_read32(ip)));
>   }
> @@ -328,8 +331,26 @@ static FORCE_INLINE int LZ4_compress_generic(
>
>  _next_match:
>   /* Encode Offset */
> - LZ4_writeLE16(op, (U16)(ip - match));
> - op += 2;
> + if (dynOffset) {
> + curr_offset = (U16)(ip - match);
> +
> + /*
> +  * If Ofsset is greater than 127, we need 2 bytes
> +  * to store it. Otherwise 1 byte is enough.
> +  */
> + if (curr_offset > 127) {
> + curr_offset = (curr_offset << 1) | DYN_BIT;
> + LZ4_writeLE16(op, (U16)curr_offset);
> + op += 2;
> + } else {
> + curr_offset = curr_offset << 1;
> + *op = (BYTE)curr_offset;
> + op++;
> + }

The standard way to do variable sized integers is to use the high-bit as
the control bit, not the low-bit. Do you have benchmarks to show that using
the low-bit is faster than using the high-bit? If so, lets comment in the
code, if not lets use the high-bit.

> + } else {
> + LZ4_writeLE16(op, (U16)(ip - match));
> + op += 2;
> + }
>
>   /* Encode MatchLength */
>   {
> @@ -480,39 +501,70 @@ static int LZ4_compress_fast_extState(
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize, 0,
>   noLimit, byU16, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   else
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize, 0,
>   noLimit, tableType, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   } else {
>   if (inputSize < LZ4_64Klimit)
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize,
>   maxOutputSize, limitedOutput, byU16, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   else
>   return LZ4_compress_generic(ctx, source,
>   dest, inputSize,
>   maxOutputSize, limitedOutput, tableType, noDict,
> - noDictIssue, acceleration);
> + noDictIssue, acceleration, NoDynOffset);
>   }
>  }
>
> +static int LZ4_compress_fast_extState_dynamic(
> + void *state,
> + const char *source,
> + char *dest,
> + int inputSize,
> + int maxOutputSize,
> + int acceleration)
> +{
> + LZ4_stream_t_internal *ctx = &((LZ4_stream_t 
> *)state)->internal_donotuse;
> +
> + LZ4_resetStream((LZ4_stream_t *)state);
> +
> + if (acceleration < 1)
> +

Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky
On (03/21/18 10:10), Maninder Singh wrote:
[..]
> +static struct crypto_alg alg_lz4_dyn = {
> + .cra_name   = "lz4_dyn",
> + .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
> + .cra_ctxsize= sizeof(struct lz4_ctx),
> + .cra_module = THIS_MODULE,
> + .cra_list   = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
> + .cra_init   = lz4_init,
> + .cra_exit   = lz4_exit,
> + .cra_u  = { .compress = {
> + .coa_compress   = lz4_compress_crypto_dynamic,
> + .coa_decompress = lz4_decompress_crypto_dynamic } }
> +};

[..]

> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 4ed0a78..5bc5aab 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -17,11 +17,15 @@
>  #include 
>  
>  #include "zcomp.h"
> +#define KB   (1 << 10)
>  
>  static const char * const backends[] = {
>   "lzo",
>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>   "lz4",
> +#if (PAGE_SIZE < (32 * KB))
> + "lz4_dyn",
> +#endif

This is not the list of supported algorithms. It's the list of
recommended algorithms. You can configure zram to use any of
available and known to Crypto API algorithms. Including lz4_dyn
on PAGE_SIZE > 32K systems.

-ss


Re: [PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-21 Thread Sergey Senozhatsky
On (03/21/18 10:10), Maninder Singh wrote:
[..]
> +static struct crypto_alg alg_lz4_dyn = {
> + .cra_name   = "lz4_dyn",
> + .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
> + .cra_ctxsize= sizeof(struct lz4_ctx),
> + .cra_module = THIS_MODULE,
> + .cra_list   = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
> + .cra_init   = lz4_init,
> + .cra_exit   = lz4_exit,
> + .cra_u  = { .compress = {
> + .coa_compress   = lz4_compress_crypto_dynamic,
> + .coa_decompress = lz4_decompress_crypto_dynamic } }
> +};

[..]

> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index 4ed0a78..5bc5aab 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -17,11 +17,15 @@
>  #include 
>  
>  #include "zcomp.h"
> +#define KB   (1 << 10)
>  
>  static const char * const backends[] = {
>   "lzo",
>  #if IS_ENABLED(CONFIG_CRYPTO_LZ4)
>   "lz4",
> +#if (PAGE_SIZE < (32 * KB))
> + "lz4_dyn",
> +#endif

This is not the list of supported algorithms. It's the list of
recommended algorithms. You can configure zram to use any of
available and known to Crypto API algorithms. Including lz4_dyn
on PAGE_SIZE > 32K systems.

-ss


[PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-20 Thread Maninder Singh
LZ4 specification defines 2 byte offset length for 64 KB data.
But in case of ZRAM we compress data per page and in most of
architecture PAGE_SIZE is 4KB. So we can decide offset length based
on actual offset value. For this we can reserve 1 bit to decide offset
length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127,
else 1 byte is enough.

With this new implementation new offset value can be at MAX 32 KB.

Thus we can save more memory for compressed data.

results checked with new implementation:-
LZO
===
orig_data_size: 78917632
compr_data_size: 15894668
mem_used_total: 17117184

LZ4

orig_data_size: 78917632
compr_data_size: 16310717
mem_used_total: 17592320

LZ4_DYN
===
orig_data_size: 78917632
compr_data_size: 15520506
mem_used_total: 16748544

Signed-off-by: Maninder Singh 
Signed-off-by: Vaneet Narang 
---
 crypto/lz4.c   |   64 -
 drivers/block/zram/zcomp.c |4 ++
 fs/pstore/platform.c   |2 +-
 include/linux/lz4.h|   15 ++--
 lib/decompress_unlz4.c |2 +-
 lib/lz4/lz4_compress.c |   84 +++
 lib/lz4/lz4_decompress.c   |   56 -
 lib/lz4/lz4defs.h  |   11 ++
 8 files changed, 197 insertions(+), 41 deletions(-)

diff --git a/crypto/lz4.c b/crypto/lz4.c
index 2ce2660..f1a8a20 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -67,7 +67,20 @@ static int __lz4_compress_crypto(const u8 *src, unsigned int 
slen,
 u8 *dst, unsigned int *dlen, void *ctx)
 {
int out_len = LZ4_compress_default(src, dst,
-   slen, *dlen, ctx);
+   slen, *dlen, ctx, false);
+
+   if (!out_len)
+   return -EINVAL;
+
+   *dlen = out_len;
+   return 0;
+}
+
+static int __lz4_compress_crypto_dynamic(const u8 *src, unsigned int slen,
+u8 *dst, unsigned int *dlen, void *ctx)
+{
+   int out_len = LZ4_compress_default(src, dst,
+   slen, *dlen, ctx, true);
 
if (!out_len)
return -EINVAL;
@@ -91,10 +104,30 @@ static int lz4_compress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
 }
 
+static int lz4_compress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+  unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+   struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   return __lz4_compress_crypto_dynamic(src, slen, dst, dlen, 
ctx->lz4_comp_mem);
+}
+
 static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
-   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen);
+   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, false);
+
+   if (out_len < 0)
+   return -EINVAL;
+
+   *dlen = out_len;
+   return 0;
+}
+
+static int __lz4_decompress_crypto_dynamic(const u8 *src, unsigned int slen,
+  u8 *dst, unsigned int *dlen, void *ctx)
+{
+   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, true);
 
if (out_len < 0)
return -EINVAL;
@@ -117,6 +150,13 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
 }
 
+static int lz4_decompress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+unsigned int slen, u8 *dst,
+unsigned int *dlen)
+{
+   return __lz4_decompress_crypto_dynamic(src, slen, dst, dlen, NULL);
+}
+
 static struct crypto_alg alg_lz4 = {
.cra_name   = "lz4",
.cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
@@ -130,6 +170,19 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
.coa_decompress = lz4_decompress_crypto } }
 };
 
+static struct crypto_alg alg_lz4_dyn = {
+   .cra_name   = "lz4_dyn",
+   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
+   .cra_ctxsize= sizeof(struct lz4_ctx),
+   .cra_module = THIS_MODULE,
+   .cra_list   = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
+   .cra_init   = lz4_init,
+   .cra_exit   = lz4_exit,
+   .cra_u  = { .compress = {
+   .coa_compress   = lz4_compress_crypto_dynamic,
+   .coa_decompress = lz4_decompress_crypto_dynamic } }
+};
+
 static struct scomp_alg scomp = {
.alloc_ctx  = lz4_alloc_ctx,
.free_ctx   = lz4_free_ctx,
@@ -150,9 +203,16 @@ static int __init lz4_mod_init(void)
if (ret)
return ret;
 
+   ret = crypto_register_alg(_lz4_dyn);
+   if (ret) {

[PATCH 1/1] lz4: Implement lz4 with dynamic offset length.

2018-03-20 Thread Maninder Singh
LZ4 specification defines 2 byte offset length for 64 KB data.
But in case of ZRAM we compress data per page and in most of
architecture PAGE_SIZE is 4KB. So we can decide offset length based
on actual offset value. For this we can reserve 1 bit to decide offset
length (1 byte or 2 byte). 2 byte required only if ofsset is greater than 127,
else 1 byte is enough.

With this new implementation new offset value can be at MAX 32 KB.

Thus we can save more memory for compressed data.

results checked with new implementation:-
LZO
===
orig_data_size: 78917632
compr_data_size: 15894668
mem_used_total: 17117184

LZ4

orig_data_size: 78917632
compr_data_size: 16310717
mem_used_total: 17592320

LZ4_DYN
===
orig_data_size: 78917632
compr_data_size: 15520506
mem_used_total: 16748544

Signed-off-by: Maninder Singh 
Signed-off-by: Vaneet Narang 
---
 crypto/lz4.c   |   64 -
 drivers/block/zram/zcomp.c |4 ++
 fs/pstore/platform.c   |2 +-
 include/linux/lz4.h|   15 ++--
 lib/decompress_unlz4.c |2 +-
 lib/lz4/lz4_compress.c |   84 +++
 lib/lz4/lz4_decompress.c   |   56 -
 lib/lz4/lz4defs.h  |   11 ++
 8 files changed, 197 insertions(+), 41 deletions(-)

diff --git a/crypto/lz4.c b/crypto/lz4.c
index 2ce2660..f1a8a20 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -67,7 +67,20 @@ static int __lz4_compress_crypto(const u8 *src, unsigned int 
slen,
 u8 *dst, unsigned int *dlen, void *ctx)
 {
int out_len = LZ4_compress_default(src, dst,
-   slen, *dlen, ctx);
+   slen, *dlen, ctx, false);
+
+   if (!out_len)
+   return -EINVAL;
+
+   *dlen = out_len;
+   return 0;
+}
+
+static int __lz4_compress_crypto_dynamic(const u8 *src, unsigned int slen,
+u8 *dst, unsigned int *dlen, void *ctx)
+{
+   int out_len = LZ4_compress_default(src, dst,
+   slen, *dlen, ctx, true);
 
if (!out_len)
return -EINVAL;
@@ -91,10 +104,30 @@ static int lz4_compress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
 }
 
+static int lz4_compress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+  unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+   struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   return __lz4_compress_crypto_dynamic(src, slen, dst, dlen, 
ctx->lz4_comp_mem);
+}
+
 static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
-   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen);
+   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, false);
+
+   if (out_len < 0)
+   return -EINVAL;
+
+   *dlen = out_len;
+   return 0;
+}
+
+static int __lz4_decompress_crypto_dynamic(const u8 *src, unsigned int slen,
+  u8 *dst, unsigned int *dlen, void *ctx)
+{
+   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen, true);
 
if (out_len < 0)
return -EINVAL;
@@ -117,6 +150,13 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
 }
 
+static int lz4_decompress_crypto_dynamic(struct crypto_tfm *tfm, const u8 *src,
+unsigned int slen, u8 *dst,
+unsigned int *dlen)
+{
+   return __lz4_decompress_crypto_dynamic(src, slen, dst, dlen, NULL);
+}
+
 static struct crypto_alg alg_lz4 = {
.cra_name   = "lz4",
.cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
@@ -130,6 +170,19 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
.coa_decompress = lz4_decompress_crypto } }
 };
 
+static struct crypto_alg alg_lz4_dyn = {
+   .cra_name   = "lz4_dyn",
+   .cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
+   .cra_ctxsize= sizeof(struct lz4_ctx),
+   .cra_module = THIS_MODULE,
+   .cra_list   = LIST_HEAD_INIT(alg_lz4_dyn.cra_list),
+   .cra_init   = lz4_init,
+   .cra_exit   = lz4_exit,
+   .cra_u  = { .compress = {
+   .coa_compress   = lz4_compress_crypto_dynamic,
+   .coa_decompress = lz4_decompress_crypto_dynamic } }
+};
+
 static struct scomp_alg scomp = {
.alloc_ctx  = lz4_alloc_ctx,
.free_ctx   = lz4_free_ctx,
@@ -150,9 +203,16 @@ static int __init lz4_mod_init(void)
if (ret)
return ret;
 
+   ret = crypto_register_alg(_lz4_dyn);
+   if (ret) {
+   crypto_unregister_alg(_lz4);
+