Re: [RFC] LZO1X de/compression support

2007-05-23 Thread Jeremy Fitzhardinge
Bernd Petrovitsch wrote:
> The "register" keyword is and was always from start *at most* a hint to
> the C compiler to use a register for that variable (similar to "inline"
> BTW).
> So every C compiler is allowed to simply ignore the "register" for any
> reason - be it "not implemented" or "the compiler knows better".
> Trivial reason: Think of a function with 100 register variables.
>   

The only useful semantic for "register" these days is that its illegal
to take the address of one.  So it might be useful if you want to make
sure you have no aliases of a particular variable.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Bernd Petrovitsch
On Tue, 2007-05-22 at 14:38 +0530, Nitin Gupta wrote:
[...]
> On 5/18/07, Andrey Panin <[EMAIL PROTECTED]> wrote:
> > On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> > > + register const unsigned char *ip;
> >
> > register keyword is meaningless for today's compiler.
> 
> But can we assume that gcc is being used? What if we use compiler for

Yes.
If another compiler wants to compile the kernel, it must have
implemented various widely used gcc extensions.

> which it does matter (can't give example for this...)?

The "register" keyword is and was always from start *at most* a hint to
the C compiler to use a register for that variable (similar to "inline"
BTW).
So every C compiler is allowed to simply ignore the "register" for any
reason - be it "not implemented" or "the compiler knows better".
Trivial reason: Think of a function with 100 register variables.

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Jan Engelhardt

On May 22 2007 14:38, Nitin Gupta wrote:
> On 5/18/07, Andrey Panin <[EMAIL PROTECTED]> wrote:
>> On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
>> > + register const unsigned char *ip;
>> 
>> register keyword is meaningless for today's compiler.
>
> But can we assume that gcc is being used? What if we use compiler for
> which it does matter (can't give example for this...)?

How dumb must a compiler be? Although I cannot give an example either,
the register keyword MAY 'clog up' a register, making it harder for
cc to optimally use the whole register set. Imagine we had even less
regs than the x86(_32) already does.


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Jan Engelhardt

On May 18 2007 15:28, Nitin Gupta wrote:
> +/* lzo1x.h -- public interface of the LZO1X compression algorithm
> +
> +   This file is part of the LZO real-time data compression library.
> +
> +   Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer
> +   All Rights Reserved.

Oh holy, this is almost as bad as a 4-clause BSD-style license.
(Could the author just write "1996-2005" instead, like most
GNU tools do?)


Jan
-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Pekka Enberg

Nitin Gupta wrote:

Ok. I will make them inline functions now.


Btw, the only reason why any sane compiler would not inline
them is because it has reason to believe the end-result will
be more efficient. AFAIK you only need to use
__always_inline where it matters for correctness (like, for
example, in mm/slab.c) with GCC 4.0 and up.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Nitin Gupta

Hi,

On 5/18/07, Richard Purdie <[EMAIL PROTECTED]> wrote:

> This patch, as of yet, only gives 'non-safe' version of decompressor.
> The 'safe' version  will be included soon.

How are you planning to add that back?



Please see newer patch posted.


The LZO author had some concerns about this code. The major issue he
highlighted was that it was 64-bit unsafe. Have you addressed that
problem?

I found certain parts which were 64-bit unsafe - corrected them. Now,
I couldn't see any more of such instances and posted as RFC :)


 Has it been tested on 64bit?

No. I am still looking for some 64-bit machine to test on (also some
Big-endian machine).



I'm worried that in converting this code the way you have, you've
possibly introduced potential security holes. You've removed all bounds
checking and are going to have to add that back to create the "safe"
version of the decompression function. Until I mentioned it, you seemed
unaware of the potential problem and the comments above suggest you
don't understand this code as fully as I'd like with regard to
overflows.


I just did the 'logical' porting work. I don't understand the
algorithm itself since I could not find any document that describes
the same.



The version I submitted has at least been subject to userspace scrutiny
over a period of time and is basically unchanged with regard to
security. It is much uglier though.

Richard


Yes. But it will be even better if we can verify/correct this
cleaned-up version - shouldn't be that hard for just ~500 LOC :-)


Thanks for your comments.

- Nitin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Nitin Gupta

On 5/22/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

Nitin Gupta wrote:
> What if compiler decides not to actully inline them? In that case
> there will be significant perf. hit.

Then you use __always_inline.

Pekka



Ok. I will make them inline functions now.

Thanks,
Nitin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Nitin Gupta

Hi,

On 5/18/07, Andrey Panin <[EMAIL PROTECTED]> wrote:

On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> + register const unsigned char *ip;

register keyword is meaningless for today's compiler.


But can we assume that gcc is being used? What if we use compiler for
which it does matter (can't give example for this...)?



> + unsigned char *op;
> + const unsigned char * const in_end = in + in_len;
> + const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
> + const unsigned char *ii;
> + const unsigned char ** const dict = (const unsigned char **)workmem;
> +
> + op = out;

Why not write this as `unsigned char *op = out;` ?



Done.


Empty if() body ?


No empty if() now.


> + m_off -= 0x4000;

What's this magic number 0x4000 repeated everywhere ? M3_MAX_OFFSET perhaps ?


I don't know :)


> +#if defined(__LITTLE_ENDIAN)
> + m_pos = op - 1;
> + m_pos -= (*(const unsigned short *)ip) >> 2;
> +#else
> + m_pos = op - 1;
> + m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> +#endif

IMHO you could write it this way:

m_pos = op - 1 - (le16_to_cpu(*(const u16 *)ip) >> 2);



Shouldn't this be cpu_to_le16() ?



> +#if defined(__LITTLE_ENDIAN)
> + m_pos -= (*(const unsigned short *)ip) >> 2;
> +#else
> + m_pos -= (ip[0] >> 2) + (ip[1] << 6);
> +#endif

m_pos -= le16_to_cpu(*(const u16 *)ip) >> 2;


cpu_to_le16() ?


---
Can you please snip-out irrelevant code parts. It is real pain for
eyes to sneek out your comments otherwise :)


Thanks for your comments.


Cheers,
Nitin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Pekka Enberg

Nitin Gupta wrote:

What if compiler decides not to actully inline them? In that case
there will be significant perf. hit.


Then you use __always_inline.

Pekka
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-22 Thread Nitin Gupta

Hi,

On 5/18/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

On 5/18/07, Nitin Gupta <[EMAIL PROTECTED]> wrote:
> +#define DX2(p,s1,s2) \
> +   (size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0])
> +#define DX3(p,s1,s2,s3) ((DX2((p)+1,s2,s3) << (s1)) ^ (p)[0])
> +#define DMUL(a,b)  ((size_t) ((a) * (b)))
> +#define DMS(v,s)   ((size_t) (((v) & (D_MASK >> (s))) << (s)))
> +#define DM(v)  DMS(v,0)
> +
> +#define D_BITS 14
> +#define DINDEX1(d,p)   d = DM(DMUL(0x21,DX3(p,5,5,6)) >> 5)
> +#define DINDEX2(d,p)   d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f)
> +#define DENTRY(p,in)   (p)

Please make these static inline functions.



What if compiler decides not to actully inline them? In that case
there will be significant perf. hit.


> +#define PTR(a) ((unsigned long) (a))
> +#define PTR_LT(a,b)(PTR(a) < PTR(b))
> +#define PTR_GE(a,b)(PTR(a) >= PTR(b))
> +#define PTR_DIFF(a,b)  (PTR(a) - PTR(b))
> +#define pd(a,b)((size_t) ((a)-(b)))

[snip]

> +#define COPY4(dst,src) *(uint32_t *)(dst) = *(uint32_t *)(src)

Please drop these.



Done. Please see newer version posted.

Thanks for comments.

Cheers,
Nitin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-20 Thread Tomasz Chmielewski

Bill Rugolsky Jr. wrote:


I'm certainly missing something but what are the advantages of this
code (over current gzip etc.), and what will be using it?


Richard's patchset added it to the crypto library and wired it into
the JFFS2 file system.  We recently started using LZO in a userland UDP
proxy to do stateless per-packet payload compression over a WAN link.
With ~1000 octet packets, our particular data stream sees 60% compression
with zlib, and 50% compression with (mini-)LZO, but LZO runs at ~5.6x
the speed of zlib.  IIRC, that translates into > 700Mbps on the input
side on a 2GHZ Opteron, without any further tuning.

Once LZO is in the kernel, I'd like to see it wired into IPComp.
Unfortunately, last I checked only the "deflate" algorithm had an
assigned compression parameter index (CPI), so one will have to use a
private index until an official one is assigned.


I also though of using LZO compression for some of the diskless nodes
which use iSCSI over 100 Mbit or slower.


Certainly, a fast de/compression algorithm in the kernel could bring
some new, innovative uses:

- there are talks about compressed filesystems (jffs2, reiser4, LogFS) -
why no one thought about a compressed tmpfs (should be way easier than a
compressed on-disk filesystem, as we don't have to care about data
recovery in event of a failure)?

- using compression for networking (like Bill mentioned)

- compressed caching

- compressed suspend-to-disk images (should suspend/restore faster this way)


--
Tomasz Chmielewski
http://wpkg.org





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-19 Thread Richard Purdie
On Sat, 2007-05-19 at 14:55 -0400, Bill Rugolsky Jr. wrote:
> On Fri, May 18, 2007 at 11:14:57PM +0200, Krzysztof Halasa wrote:
> > I'm certainly missing something but what are the advantages of this
> > code (over current gzip etc.), and what will be using it?
> 
> Richard's patchset added it to the crypto library and wired it into
> the JFFS2 file system. 

Basically, LZO has much faster decompression speeds. For jffs2, there
was a 40% filesystem read speed improvement with LZO compared to zlib
and that resulted in a device booting 10% faster. The drawback was a
slight drop in file compression (say 5% although it varied a lot
depending on the test data). For lots of uses, that is an acceptable
compromise.

It appears resier4 is using LZO too.

> We recently started using LZO in a userland UDP
> proxy to do stateless per-packet payload compression over a WAN link.
> With ~1000 octet packets, our particular data stream sees 60% compression
> with zlib, and 50% compression with (mini-)LZO, but LZO runs at ~5.6x
> the speed of zlib.  IIRC, that translates into > 700Mbps on the input
> side on a 2GHZ Opteron, without any further tuning.

Those figures sound comparable with my experiences...

Richard


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-19 Thread Bill Rugolsky Jr.
On Fri, May 18, 2007 at 11:14:57PM +0200, Krzysztof Halasa wrote:
> I'm certainly missing something but what are the advantages of this
> code (over current gzip etc.), and what will be using it?

Richard's patchset added it to the crypto library and wired it into
the JFFS2 file system.  We recently started using LZO in a userland UDP
proxy to do stateless per-packet payload compression over a WAN link.
With ~1000 octet packets, our particular data stream sees 60% compression
with zlib, and 50% compression with (mini-)LZO, but LZO runs at ~5.6x
the speed of zlib.  IIRC, that translates into > 700Mbps on the input
side on a 2GHZ Opteron, without any further tuning.

Once LZO is in the kernel, I'd like to see it wired into IPComp.
Unfortunately, last I checked only the "deflate" algorithm had an
assigned compression parameter index (CPI), so one will have to use a
private index until an official one is assigned.

Regards,

Bill Rugolsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-19 Thread devzero
>I'm certainly missing something but what are the advantages of this
>code (over current gzip etc.), and what will be using it?

lzo compresses/decompresses much faster and using less cpu

this is how it compares:

bzip2:  best compression, but damn slow performance
gzip: good compression with good performance
lzo:  not that good compression but stunning performance 

reiser4 and compressed caching is alrady using lzo compression, but they bring 
their own implementation - there are other projects which could make use of it 
- so it`s better to share the code by making it an integral part of the kernel.

roland

> Facts for LZO (at least for original code. Should hold true for this
> port also - hence the RFC!):
> - The compressor can never overrun buffer.
> - The "non-safe" version of decompressor can never overrun buffer if
> compressed data is unmodified. I am not sure about this if compressed
> data is malicious (to be confirmed from the author).
> - The "safe" version can never crash (buffer overrun etc.) - confirmed
> from the author.

I'm certainly missing something but what are the advantages of this
code (over current gzip etc.), and what will be using it?
-- 
Krzysztof Halasa
___
SMS schreiben mit WEB.DE FreeMail - einfach, schnell und
kostenguenstig. Jetzt gleich testen! http://f.web.de/?mc=021192

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Krzysztof Halasa
"Nitin Gupta" <[EMAIL PROTECTED]> writes:

> Facts for LZO (at least for original code. Should hold true for this
> port also - hence the RFC!):
> - The compressor can never overrun buffer.
> - The "non-safe" version of decompressor can never overrun buffer if
> compressed data is unmodified. I am not sure about this if compressed
> data is malicious (to be confirmed from the author).
> - The "safe" version can never crash (buffer overrun etc.) - confirmed
> from the author.

I'm certainly missing something but what are the advantages of this
code (over current gzip etc.), and what will be using it?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Matt Mackall
On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> +/* lzo1x.h -- public interface of the LZO1X compression algorithm
> +
> +   This file is part of the LZO real-time data compression library.
> +
> +   Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer

This could stand some compression.

> +   All Rights Reserved.

All rights are not reserved, otherwise it wouldn't be GPL.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Andrey Panin
On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> Hi,
> 
> This is kernel port of LZO1X de/compression algo stripped down to just ~500 
> LOC!
> It is derived from original LZO 2.02 code found at:
> http://www.oberhumer.com/opensource/lzo/download/
> The code has also been reformatted to match general kernel style.
> 
> Facts for LZO (at least for original code. Should hold true for this
> port also - hence the RFC!):
> - The compressor can never overrun buffer.
> - The "non-safe" version of decompressor can never overrun buffer if
> compressed data is unmodified. I am not sure about this if compressed
> data is malicious (to be confirmed from the author).
> - The "safe" version can never crash (buffer overrun etc.) - confirmed
> from the author.
> 
> This patch, as of yet, only gives 'non-safe' version of decompressor.
> The 'safe' version  will be included soon.
> 
> Since 'non-safe' version has no problems if compressed data is
> unmodified, it is useful in cases we have such guarantees on
> compressed data and hence don't want additional overhead of 'safe'
> version. For e.g. Compressed Caching project
> (http://linuxcompressed.sourceforge.net) has been using the 'non-safe'
> version of LZO1X since long time without any problems w.r.t.
> de/compression itself.
> 
> For now, I have tested this on x86 only.
> 
> Signed-off-by: Nitin Gupta <[EMAIL PROTECTED]>
> ---
> 
> diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h
> new file mode 100755
> index 000..aae547e
> --- /dev/null
> +++ b/include/linux/lzo1x.h
> @@ -0,0 +1,61 @@
> +/* lzo1x.h -- public interface of the LZO1X compression algorithm
> +
> +   This file is part of the LZO real-time data compression library.
> +
> +   Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer
> +   All Rights Reserved.
> +
> +   The LZO library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU General Public License,
> +   version 2, as published by the Free Software Foundation.
> +
> +   The LZO library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with the LZO library; see the file COPYING.
> +   If not, write to the Free Software Foundation, Inc.,
> +   51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> +
> +   Markus F.X.J. Oberhumer
> +   <[EMAIL PROTECTED]>
> +   http://www.oberhumer.com/opensource/lzo/
> +
> +
> +   This file is modified version of lzo1x.h found in original LZO 2.02
> +   code. Some additional changes have also been made to make it work
> +   in kernel space.
> +
> +   Nitin Gupta
> +   <[EMAIL PROTECTED]>
> + */
> +
> +#ifndef __LZO1X_H
> +#define __LZO1X_H
> +
> +/* Size of temp buffer (workmem) required by lzo1x_compress */
> +#define LZO1X_WORKMEM_SIZE   ((size_t) (16384L * sizeof(unsigned char *)))
> +
> +/* LZO1X_1 compression */
> +int
> +lzo1x_compress(const unsigned char *src, size_t src_len,
> + unsigned char *dst, size_t *dst_len,
> + void *workmem);
> +
> +/* LZO1X_1 decompression */
> +int
> +lzo1x_decompress(const unsigned char *src, size_t src_len,
> + unsigned char *dst, size_t *dst_len);
> +
> +#endif
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 2e7ae6b..9d30b1f 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -64,6 +64,12 @@ config ZLIB_INFLATE
>  config ZLIB_DEFLATE
>   tristate
> 
> +config LZO1X
> + tristate "LZO1X Compression/Decompression"
> + help
> +   Compression: LZO1X-1
> +   Decompression: LZO1X
> +
> #
> # Generic allocator support is selected if needed
> #
> diff --git a/lib/Makefile b/lib/Makefile
> index c8c8e20..4dad99d 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -49,6 +49,7 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
> obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
> obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
> obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
> +obj-$(CONFIG_LZO1X)+= lzo1x/
> 
>  obj-$(CONFIG_TEXTSEARCH) += textsearch.o
> obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
> diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
> new file mode 100644
> index 000..322683e
> --- /dev/null
> +++ b/lib/lzo1x/Makefil

Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Richard Purdie
On Fri, 2007-05-18 at 16:57 +0530, Nitin Gupta wrote:
> On 5/18/07, Heikki Orsila <[EMAIL PROTECTED]> wrote:
> > Good work..
> >
> > On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> > > Facts for LZO (at least for original code. Should hold true for this
> > > port also - hence the RFC!):
> > > - The compressor can never overrun buffer.
> > > - The "non-safe" version of decompressor can never overrun buffer if
> > > compressed data is unmodified. I am not sure about this if compressed
> > > data is malicious (to be confirmed from the author).
> > > - The "safe" version can never crash (buffer overrun etc.) - confirmed
> > > from the author.
> >
> > What's the proof?
> 
> I confirmned these from the author - I just ported this code. I think
> he can answer you better - CC'ed him  :-)

The thing is this is more your code now, not his and you should be able
to answer this question...

Regards,

Richard

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Richard Purdie
Hi,

On Fri, 2007-05-18 at 15:28 +0530, Nitin Gupta wrote:
> This is kernel port of LZO1X de/compression algo stripped down to just ~500 
> LOC!
> It is derived from original LZO 2.02 code found at:
> http://www.oberhumer.com/opensource/lzo/download/
> The code has also been reformatted to match general kernel style.
> 
> Facts for LZO (at least for original code. Should hold true for this
> port also - hence the RFC!):
> - The compressor can never overrun buffer.
> - The "non-safe" version of decompressor can never overrun buffer if
> compressed data is unmodified. I am not sure about this if compressed
> data is malicious (to be confirmed from the author).

If the data is malicious, it *can* overrun the buffer.

> - The "safe" version can never crash (buffer overrun etc.) - confirmed
> from the author.
> This patch, as of yet, only gives 'non-safe' version of decompressor.
> The 'safe' version  will be included soon.

How are you planning to add that back?

> Since 'non-safe' version has no problems if compressed data is
> unmodified, it is useful in cases we have such guarantees on
> compressed data and hence don't want additional overhead of 'safe'
> version. For e.g. Compressed Caching project
> (http://linuxcompressed.sourceforge.net) has been using the 'non-safe'
> version of LZO1X since long time without any problems w.r.t.
> de/compression itself.
> 
> For now, I have tested this on x86 only.

The LZO author had some concerns about this code. The major issue he
highlighted was that it was 64-bit unsafe. Have you addressed that
problem? Has it been tested on 64bit?

I'm worried that in converting this code the way you have, you've
possibly introduced potential security holes. You've removed all bounds
checking and are going to have to add that back to create the "safe"
version of the decompression function. Until I mentioned it, you seemed
unaware of the potential problem and the comments above suggest you
don't understand this code as fully as I'd like with regard to
overflows.

The version I submitted has at least been subject to userspace scrutiny
over a period of time and is basically unchanged with regard to
security. It is much uglier though.

Richard


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Nitin Gupta

Hi,

Thanks for review. My comments inline.

On 5/18/07, Heikki Orsila <[EMAIL PROTECTED]> wrote:

Good work..

On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> Facts for LZO (at least for original code. Should hold true for this
> port also - hence the RFC!):
> - The compressor can never overrun buffer.
> - The "non-safe" version of decompressor can never overrun buffer if
> compressed data is unmodified. I am not sure about this if compressed
> data is malicious (to be confirmed from the author).
> - The "safe" version can never crash (buffer overrun etc.) - confirmed
> from the author.

What's the proof?


I confirmned these from the author - I just ported this code. I think
he can answer you better - CC'ed him  :-)



> +/* LZO1X_1 compression */
> +int
> +lzo1x_compress(const unsigned char *src, size_t src_len,
> + unsigned char *dst, size_t *dst_len,
> + void *workmem);

int lzo1x_compress(const unsigned char *src, size_t src_len,
   unsigned char *dst, size_t *dst_len,
   void *workmem);

is the preferred style.



OK. Changed.



> + register const unsigned char *ip;

Is the register directive really useful? Or any subsequent usage of that
directive?


The author must be having some performance gain with this directive.
Though I didn't test performance changes with/without this directive.


> + DINDEX1(dindex,ip);

Put a space after the delimiter: DINDEX1(dindex, ip); This happens in
many places in the source, fix them all.


OK.


Useless brackets: (unsigned char) tt


OK.


> + }
> + do *op++ = *ii++; while (--t > 0);

memcpy(op, ii, t); ? Happens in other places as well.


I looked more carefully into such cases. Following type of code blocks
are repeated at several places:

---
   COPY4(op,ip);
   op += 4;
   ip += 4;
   if (--t > 0) {
   if (t >= 4) {
   do {
   COPY4(op,ip);
   op += 4; ip += 4; t -= 4;
   } while (t >= 4);
   if (t > 0)
   do
   *op++ = *ip++;
   while (--t > 0);
   } else
   do
   *op++ = *ip++;
   while (--t > 0);
   }
---
Such entire blocks can be replaced by simple:
memcpy(op, ip, t + 4);
Since kernel has separate memcpy() implementation optimized for
specific archs, we shouldn't loose on perf while having simpler (and
shorter) code.

I will work on this and post again.



> +#define COPY4(dst,src)   *(uint32_t *)(dst) = *(uint32_t *)(src)

Use u32.



What is the problem with uint32_t? Anyhow, I think COPY4 will
disappear after those memcpy changes :)

Thanks for comments. I will post revised patch soon.


Cheers,
Nitin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Pekka Enberg

On 5/18/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

So how about making that a little less verbose. Say like:

Copyright (c) 1996-2005 Markus Franz Xaver and Johannes Oberhumer


Oh, the author's name really is "Markus Franz Xaver Johannes
Oberhumer." But please make the copyright statement one line.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Pekka Enberg

On 5/18/07, Nitin Gupta <[EMAIL PROTECTED]> wrote:

+   Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer


[snip]

So how about making that a little less verbose. Say like:

Copyright (c) 1996-2005 Markus Franz Xaver and Johannes Oberhumer


+#define DX2(p,s1,s2) \
+   (size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0])
+#define DX3(p,s1,s2,s3) ((DX2((p)+1,s2,s3) << (s1)) ^ (p)[0])
+#define DMUL(a,b)  ((size_t) ((a) * (b)))
+#define DMS(v,s)   ((size_t) (((v) & (D_MASK >> (s))) << (s)))
+#define DM(v)  DMS(v,0)
+
+#define D_BITS 14
+#define DINDEX1(d,p)   d = DM(DMUL(0x21,DX3(p,5,5,6)) >> 5)
+#define DINDEX2(d,p)   d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f)
+#define DENTRY(p,in)   (p)


Please make these static inline functions.


+#define PTR(a) ((unsigned long) (a))
+#define PTR_LT(a,b)(PTR(a) < PTR(b))
+#define PTR_GE(a,b)(PTR(a) >= PTR(b))
+#define PTR_DIFF(a,b)  (PTR(a) - PTR(b))
+#define pd(a,b)((size_t) ((a)-(b)))


[snip]


+#define COPY4(dst,src) *(uint32_t *)(dst) = *(uint32_t *)(src)


Please drop these.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] LZO1X de/compression support

2007-05-18 Thread Heikki Orsila
Good work..

On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote:
> Facts for LZO (at least for original code. Should hold true for this
> port also - hence the RFC!):
> - The compressor can never overrun buffer.
> - The "non-safe" version of decompressor can never overrun buffer if
> compressed data is unmodified. I am not sure about this if compressed
> data is malicious (to be confirmed from the author).
> - The "safe" version can never crash (buffer overrun etc.) - confirmed
> from the author.

What's the proof?

> +/* LZO1X_1 compression */
> +int
> +lzo1x_compress(const unsigned char *src, size_t src_len,
> + unsigned char *dst, size_t *dst_len,
> + void *workmem);

int lzo1x_compress(const unsigned char *src, size_t src_len,
   unsigned char *dst, size_t *dst_len,
   void *workmem);

is the preferred style.




> +   Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer
> +   Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer
> +   All Rights Reserved.
> +
> +   The LZO library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU General Public License,
> +   version 2, as published by the Free Software Foundation.
> +
> +   The LZO library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with the LZO library; see the file COPYING.
> +   If not, write to the Free Software Foundation, Inc.,
> +   51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
> +
> +   Markus F.X.J. Oberhumer
> +   <[EMAIL PROTECTED]>
> +   http://www.oberhumer.com/opensource/lzo/
> +
> +
> +   This file is derived from lzo1x_1.c and lzo1x_c.ch found in original
> +   LZO 2.02 code. Some additional changes have also been made to make
> +   it work in kernel space.
> +
> +   Nitin Gupta
> +   <[EMAIL PROTECTED]>
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "lzo1x_int.h"
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("LZO1X Compression");
> +
> +/* compress a block of data. */
> +static unsigned int
> +lzo1x_compress_worker(const unsigned char *in, size_t in_len,
> + unsigned char *out, size_t *out_len,
> + void *workmem)
> +{
> + register const unsigned char *ip;

Is the register directive really useful? Or any subsequent usage of that 
directive?

> + unsigned char *op;
> + const unsigned char * const in_end = in + in_len;
> + const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5;
> + const unsigned char *ii;
> + const unsigned char ** const dict = (const unsigned char **)workmem;
> +
> + op = out;
> + ip = in;
> + ii = ip;
> +
> + ip += 4;
> + for (;;) {
> + register const unsigned char *m_pos;
> + size_t m_off;
> + size_t m_len;
> + size_t dindex;
> +
> + DINDEX1(dindex,ip);

Put a space after the delimiter: DINDEX1(dindex, ip); This happens in 
many places in the source, fix them all.

> + m_pos = dict[dindex];
> +
> + if (LZO_CHECK_MPOS_NON_DET(m_pos,m_off,in,ip,M4_MAX_OFFSET))
> + goto literal;
> + if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
> + goto try_match;
> +
> + DINDEX2(dindex,ip);
> + m_pos = dict[dindex];
> +
> + if (LZO_CHECK_MPOS_NON_DET(m_pos,m_off,in,ip,M4_MAX_OFFSET))
> + goto literal;
> + if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3])
> + goto try_match;
> +
> + goto literal;
> +
> +try_match:
> +if (*(const unsigned short *)m_pos != *(const unsigned short *)ip) 
> {
> +} else {
> + if (likely(m_pos[2] == ip[2]))
> + goto match;
> +}
> +
> + /* a literal */
> +literal:
> + dict[dindex] = ip;
> + ++ip;
> + if (unlikely(ip >= ip_end))
> + break;
> + continue;
> +
> +
> + /* a match */
> +match:
> + dict[dindex] = ip;
> + /* store current literal run */
> + if (pd(ip,ii) > 0) {
> + register size_t t = pd(ip,ii);
> + if (t <= 3)
> + op[-2] |= (unsigned char)(t);
> + else if (t <= 18)
> + *op++ = (unsigned char)(t - 3);
> + else {
> + register size_t tt = t - 18;
> + *op++ = 0;
> + while (tt > 255) {
> + tt -= 255;
> + *op++ = 0;
> + }
> + *op++ = (unsigned char)(tt);

Useless brackets: (unsigned char) tt

> + }

[RFC] LZO1X de/compression support

2007-05-18 Thread Nitin Gupta

Hi,

This is kernel port of LZO1X de/compression algo stripped down to just ~500 LOC!
It is derived from original LZO 2.02 code found at:
http://www.oberhumer.com/opensource/lzo/download/
The code has also been reformatted to match general kernel style.

Facts for LZO (at least for original code. Should hold true for this
port also - hence the RFC!):
- The compressor can never overrun buffer.
- The "non-safe" version of decompressor can never overrun buffer if
compressed data is unmodified. I am not sure about this if compressed
data is malicious (to be confirmed from the author).
- The "safe" version can never crash (buffer overrun etc.) - confirmed
from the author.

This patch, as of yet, only gives 'non-safe' version of decompressor.
The 'safe' version  will be included soon.

Since 'non-safe' version has no problems if compressed data is
unmodified, it is useful in cases we have such guarantees on
compressed data and hence don't want additional overhead of 'safe'
version. For e.g. Compressed Caching project
(http://linuxcompressed.sourceforge.net) has been using the 'non-safe'
version of LZO1X since long time without any problems w.r.t.
de/compression itself.

For now, I have tested this on x86 only.

Signed-off-by: Nitin Gupta <[EMAIL PROTECTED]>
---

diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h
new file mode 100755
index 000..aae547e
--- /dev/null
+++ b/include/linux/lzo1x.h
@@ -0,0 +1,61 @@
+/* lzo1x.h -- public interface of the LZO1X compression algorithm
+
+   This file is part of the LZO real-time data compression library.
+
+   Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer
+   Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer
+   All Rights Reserved.
+
+   The LZO library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU General Public License,
+   version 2, as published by the Free Software Foundation.
+
+   The LZO library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with the LZO library; see the file COPYING.
+   If not, write to the Free Software Foundation, Inc.,
+   51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+
+   Markus F.X.J. Oberhumer
+   <[EMAIL PROTECTED]>
+   http://www.oberhumer.com/opensource/lzo/
+
+
+   This file is modified version of lzo1x.h found in original LZO 2.02
+   code. Some additional changes have also been made to make it work
+   in kernel space.
+
+   Nitin Gupta
+   <[EMAIL PROTECTED]>
+ */
+
+#ifndef __LZO1X_H
+#define __LZO1X_H
+
+/* Size of temp buffer (workmem) required by lzo1x_compress */
+#define LZO1X_WORKMEM_SIZE ((size_t) (16384L * sizeof(unsigned char *)))
+
+/* LZO1X_1 compression */
+int
+lzo1x_compress(const unsigned char *src, size_t src_len,
+   unsigned char *dst, size_t *dst_len,
+   void *workmem);
+
+/* LZO1X_1 decompression */
+int
+lzo1x_decompress(const unsigned char *src, size_t src_len,
+   unsigned char *dst, size_t *dst_len);
+
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index 2e7ae6b..9d30b1f 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -64,6 +64,12 @@ config ZLIB_INFLATE
 config ZLIB_DEFLATE
tristate

+config LZO1X
+   tristate "LZO1X Compression/Decompression"
+   help
+ Compression: LZO1X-1
+ Decompression: LZO1X
+
#
# Generic allocator support is selected if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index c8c8e20..4dad99d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/
obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/
obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
+obj-$(CONFIG_LZO1X)+= lzo1x/

 obj-$(CONFIG_TEXTSEARCH) += textsearch.o
obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile
new file mode 100644
index 000..322683e
--- /dev/null
+++ b/lib/lzo1x/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_LZO1X) += lzo1x.o
+lzo1x-objs := lzo1x_compress.o lzo1x_decompress.o
diff --git a/lib/lzo1x/lzo1x_compress.c b/lib/lzo1x/lzo1x_compress.c
new file mode 100755
index 000..02e3324
--- /dev/null
+++ b/lib/lzo1x/lzo1x_compress.c
@@ -0,0 +1,255 @@
+/* lzo1x_compress.c -- LZO1X-1