Re: [RFC] LZO1X de/compression support
Bernd Petrovitsch wrote: > The "register" keyword is and was always from start *at most* a hint to > the C compiler to use a register for that variable (similar to "inline" > BTW). > So every C compiler is allowed to simply ignore the "register" for any > reason - be it "not implemented" or "the compiler knows better". > Trivial reason: Think of a function with 100 register variables. > The only useful semantic for "register" these days is that its illegal to take the address of one. So it might be useful if you want to make sure you have no aliases of a particular variable. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On Tue, 2007-05-22 at 14:38 +0530, Nitin Gupta wrote: [...] > On 5/18/07, Andrey Panin <[EMAIL PROTECTED]> wrote: > > On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > > > + register const unsigned char *ip; > > > > register keyword is meaningless for today's compiler. > > But can we assume that gcc is being used? What if we use compiler for Yes. If another compiler wants to compile the kernel, it must have implemented various widely used gcc extensions. > which it does matter (can't give example for this...)? The "register" keyword is and was always from start *at most* a hint to the C compiler to use a register for that variable (similar to "inline" BTW). So every C compiler is allowed to simply ignore the "register" for any reason - be it "not implemented" or "the compiler knows better". Trivial reason: Think of a function with 100 register variables. Bernd -- Firmix Software GmbH http://www.firmix.at/ mobil: +43 664 4416156 fax: +43 1 7890849-55 Embedded Linux Development and Services - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On May 22 2007 14:38, Nitin Gupta wrote: > On 5/18/07, Andrey Panin <[EMAIL PROTECTED]> wrote: >> On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: >> > + register const unsigned char *ip; >> >> register keyword is meaningless for today's compiler. > > But can we assume that gcc is being used? What if we use compiler for > which it does matter (can't give example for this...)? How dumb must a compiler be? Although I cannot give an example either, the register keyword MAY 'clog up' a register, making it harder for cc to optimally use the whole register set. Imagine we had even less regs than the x86(_32) already does. Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On May 18 2007 15:28, Nitin Gupta wrote: > +/* lzo1x.h -- public interface of the LZO1X compression algorithm > + > + This file is part of the LZO real-time data compression library. > + > + Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer > + All Rights Reserved. Oh holy, this is almost as bad as a 4-clause BSD-style license. (Could the author just write "1996-2005" instead, like most GNU tools do?) Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Nitin Gupta wrote: Ok. I will make them inline functions now. Btw, the only reason why any sane compiler would not inline them is because it has reason to believe the end-result will be more efficient. AFAIK you only need to use __always_inline where it matters for correctness (like, for example, in mm/slab.c) with GCC 4.0 and up. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Hi, On 5/18/07, Richard Purdie <[EMAIL PROTECTED]> wrote: > This patch, as of yet, only gives 'non-safe' version of decompressor. > The 'safe' version will be included soon. How are you planning to add that back? Please see newer patch posted. The LZO author had some concerns about this code. The major issue he highlighted was that it was 64-bit unsafe. Have you addressed that problem? I found certain parts which were 64-bit unsafe - corrected them. Now, I couldn't see any more of such instances and posted as RFC :) Has it been tested on 64bit? No. I am still looking for some 64-bit machine to test on (also some Big-endian machine). I'm worried that in converting this code the way you have, you've possibly introduced potential security holes. You've removed all bounds checking and are going to have to add that back to create the "safe" version of the decompression function. Until I mentioned it, you seemed unaware of the potential problem and the comments above suggest you don't understand this code as fully as I'd like with regard to overflows. I just did the 'logical' porting work. I don't understand the algorithm itself since I could not find any document that describes the same. The version I submitted has at least been subject to userspace scrutiny over a period of time and is basically unchanged with regard to security. It is much uglier though. Richard Yes. But it will be even better if we can verify/correct this cleaned-up version - shouldn't be that hard for just ~500 LOC :-) Thanks for your comments. - Nitin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On 5/22/07, Pekka Enberg <[EMAIL PROTECTED]> wrote: Nitin Gupta wrote: > What if compiler decides not to actully inline them? In that case > there will be significant perf. hit. Then you use __always_inline. Pekka Ok. I will make them inline functions now. Thanks, Nitin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Hi, On 5/18/07, Andrey Panin <[EMAIL PROTECTED]> wrote: On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > + register const unsigned char *ip; register keyword is meaningless for today's compiler. But can we assume that gcc is being used? What if we use compiler for which it does matter (can't give example for this...)? > + unsigned char *op; > + const unsigned char * const in_end = in + in_len; > + const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5; > + const unsigned char *ii; > + const unsigned char ** const dict = (const unsigned char **)workmem; > + > + op = out; Why not write this as `unsigned char *op = out;` ? Done. Empty if() body ? No empty if() now. > + m_off -= 0x4000; What's this magic number 0x4000 repeated everywhere ? M3_MAX_OFFSET perhaps ? I don't know :) > +#if defined(__LITTLE_ENDIAN) > + m_pos = op - 1; > + m_pos -= (*(const unsigned short *)ip) >> 2; > +#else > + m_pos = op - 1; > + m_pos -= (ip[0] >> 2) + (ip[1] << 6); > +#endif IMHO you could write it this way: m_pos = op - 1 - (le16_to_cpu(*(const u16 *)ip) >> 2); Shouldn't this be cpu_to_le16() ? > +#if defined(__LITTLE_ENDIAN) > + m_pos -= (*(const unsigned short *)ip) >> 2; > +#else > + m_pos -= (ip[0] >> 2) + (ip[1] << 6); > +#endif m_pos -= le16_to_cpu(*(const u16 *)ip) >> 2; cpu_to_le16() ? --- Can you please snip-out irrelevant code parts. It is real pain for eyes to sneek out your comments otherwise :) Thanks for your comments. Cheers, Nitin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Nitin Gupta wrote: What if compiler decides not to actully inline them? In that case there will be significant perf. hit. Then you use __always_inline. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Hi, On 5/18/07, Pekka Enberg <[EMAIL PROTECTED]> wrote: On 5/18/07, Nitin Gupta <[EMAIL PROTECTED]> wrote: > +#define DX2(p,s1,s2) \ > + (size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0]) > +#define DX3(p,s1,s2,s3) ((DX2((p)+1,s2,s3) << (s1)) ^ (p)[0]) > +#define DMUL(a,b) ((size_t) ((a) * (b))) > +#define DMS(v,s) ((size_t) (((v) & (D_MASK >> (s))) << (s))) > +#define DM(v) DMS(v,0) > + > +#define D_BITS 14 > +#define DINDEX1(d,p) d = DM(DMUL(0x21,DX3(p,5,5,6)) >> 5) > +#define DINDEX2(d,p) d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f) > +#define DENTRY(p,in) (p) Please make these static inline functions. What if compiler decides not to actully inline them? In that case there will be significant perf. hit. > +#define PTR(a) ((unsigned long) (a)) > +#define PTR_LT(a,b)(PTR(a) < PTR(b)) > +#define PTR_GE(a,b)(PTR(a) >= PTR(b)) > +#define PTR_DIFF(a,b) (PTR(a) - PTR(b)) > +#define pd(a,b)((size_t) ((a)-(b))) [snip] > +#define COPY4(dst,src) *(uint32_t *)(dst) = *(uint32_t *)(src) Please drop these. Done. Please see newer version posted. Thanks for comments. Cheers, Nitin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Bill Rugolsky Jr. wrote: I'm certainly missing something but what are the advantages of this code (over current gzip etc.), and what will be using it? Richard's patchset added it to the crypto library and wired it into the JFFS2 file system. We recently started using LZO in a userland UDP proxy to do stateless per-packet payload compression over a WAN link. With ~1000 octet packets, our particular data stream sees 60% compression with zlib, and 50% compression with (mini-)LZO, but LZO runs at ~5.6x the speed of zlib. IIRC, that translates into > 700Mbps on the input side on a 2GHZ Opteron, without any further tuning. Once LZO is in the kernel, I'd like to see it wired into IPComp. Unfortunately, last I checked only the "deflate" algorithm had an assigned compression parameter index (CPI), so one will have to use a private index until an official one is assigned. I also though of using LZO compression for some of the diskless nodes which use iSCSI over 100 Mbit or slower. Certainly, a fast de/compression algorithm in the kernel could bring some new, innovative uses: - there are talks about compressed filesystems (jffs2, reiser4, LogFS) - why no one thought about a compressed tmpfs (should be way easier than a compressed on-disk filesystem, as we don't have to care about data recovery in event of a failure)? - using compression for networking (like Bill mentioned) - compressed caching - compressed suspend-to-disk images (should suspend/restore faster this way) -- Tomasz Chmielewski http://wpkg.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On Sat, 2007-05-19 at 14:55 -0400, Bill Rugolsky Jr. wrote: > On Fri, May 18, 2007 at 11:14:57PM +0200, Krzysztof Halasa wrote: > > I'm certainly missing something but what are the advantages of this > > code (over current gzip etc.), and what will be using it? > > Richard's patchset added it to the crypto library and wired it into > the JFFS2 file system. Basically, LZO has much faster decompression speeds. For jffs2, there was a 40% filesystem read speed improvement with LZO compared to zlib and that resulted in a device booting 10% faster. The drawback was a slight drop in file compression (say 5% although it varied a lot depending on the test data). For lots of uses, that is an acceptable compromise. It appears resier4 is using LZO too. > We recently started using LZO in a userland UDP > proxy to do stateless per-packet payload compression over a WAN link. > With ~1000 octet packets, our particular data stream sees 60% compression > with zlib, and 50% compression with (mini-)LZO, but LZO runs at ~5.6x > the speed of zlib. IIRC, that translates into > 700Mbps on the input > side on a 2GHZ Opteron, without any further tuning. Those figures sound comparable with my experiences... Richard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On Fri, May 18, 2007 at 11:14:57PM +0200, Krzysztof Halasa wrote: > I'm certainly missing something but what are the advantages of this > code (over current gzip etc.), and what will be using it? Richard's patchset added it to the crypto library and wired it into the JFFS2 file system. We recently started using LZO in a userland UDP proxy to do stateless per-packet payload compression over a WAN link. With ~1000 octet packets, our particular data stream sees 60% compression with zlib, and 50% compression with (mini-)LZO, but LZO runs at ~5.6x the speed of zlib. IIRC, that translates into > 700Mbps on the input side on a 2GHZ Opteron, without any further tuning. Once LZO is in the kernel, I'd like to see it wired into IPComp. Unfortunately, last I checked only the "deflate" algorithm had an assigned compression parameter index (CPI), so one will have to use a private index until an official one is assigned. Regards, Bill Rugolsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
>I'm certainly missing something but what are the advantages of this >code (over current gzip etc.), and what will be using it? lzo compresses/decompresses much faster and using less cpu this is how it compares: bzip2: best compression, but damn slow performance gzip: good compression with good performance lzo: not that good compression but stunning performance reiser4 and compressed caching is alrady using lzo compression, but they bring their own implementation - there are other projects which could make use of it - so it`s better to share the code by making it an integral part of the kernel. roland > Facts for LZO (at least for original code. Should hold true for this > port also - hence the RFC!): > - The compressor can never overrun buffer. > - The "non-safe" version of decompressor can never overrun buffer if > compressed data is unmodified. I am not sure about this if compressed > data is malicious (to be confirmed from the author). > - The "safe" version can never crash (buffer overrun etc.) - confirmed > from the author. I'm certainly missing something but what are the advantages of this code (over current gzip etc.), and what will be using it? -- Krzysztof Halasa ___ SMS schreiben mit WEB.DE FreeMail - einfach, schnell und kostenguenstig. Jetzt gleich testen! http://f.web.de/?mc=021192 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
"Nitin Gupta" <[EMAIL PROTECTED]> writes: > Facts for LZO (at least for original code. Should hold true for this > port also - hence the RFC!): > - The compressor can never overrun buffer. > - The "non-safe" version of decompressor can never overrun buffer if > compressed data is unmodified. I am not sure about this if compressed > data is malicious (to be confirmed from the author). > - The "safe" version can never crash (buffer overrun etc.) - confirmed > from the author. I'm certainly missing something but what are the advantages of this code (over current gzip etc.), and what will be using it? -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > +/* lzo1x.h -- public interface of the LZO1X compression algorithm > + > + This file is part of the LZO real-time data compression library. > + > + Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer This could stand some compression. > + All Rights Reserved. All rights are not reserved, otherwise it wouldn't be GPL. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On 138, 05 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > Hi, > > This is kernel port of LZO1X de/compression algo stripped down to just ~500 > LOC! > It is derived from original LZO 2.02 code found at: > http://www.oberhumer.com/opensource/lzo/download/ > The code has also been reformatted to match general kernel style. > > Facts for LZO (at least for original code. Should hold true for this > port also - hence the RFC!): > - The compressor can never overrun buffer. > - The "non-safe" version of decompressor can never overrun buffer if > compressed data is unmodified. I am not sure about this if compressed > data is malicious (to be confirmed from the author). > - The "safe" version can never crash (buffer overrun etc.) - confirmed > from the author. > > This patch, as of yet, only gives 'non-safe' version of decompressor. > The 'safe' version will be included soon. > > Since 'non-safe' version has no problems if compressed data is > unmodified, it is useful in cases we have such guarantees on > compressed data and hence don't want additional overhead of 'safe' > version. For e.g. Compressed Caching project > (http://linuxcompressed.sourceforge.net) has been using the 'non-safe' > version of LZO1X since long time without any problems w.r.t. > de/compression itself. > > For now, I have tested this on x86 only. > > Signed-off-by: Nitin Gupta <[EMAIL PROTECTED]> > --- > > diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h > new file mode 100755 > index 000..aae547e > --- /dev/null > +++ b/include/linux/lzo1x.h > @@ -0,0 +1,61 @@ > +/* lzo1x.h -- public interface of the LZO1X compression algorithm > + > + This file is part of the LZO real-time data compression library. > + > + Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer > + All Rights Reserved. > + > + The LZO library is free software; you can redistribute it and/or > + modify it under the terms of the GNU General Public License, > + version 2, as published by the Free Software Foundation. > + > + The LZO library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with the LZO library; see the file COPYING. > + If not, write to the Free Software Foundation, Inc., > + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. > + > + Markus F.X.J. Oberhumer > + <[EMAIL PROTECTED]> > + http://www.oberhumer.com/opensource/lzo/ > + > + > + This file is modified version of lzo1x.h found in original LZO 2.02 > + code. Some additional changes have also been made to make it work > + in kernel space. > + > + Nitin Gupta > + <[EMAIL PROTECTED]> > + */ > + > +#ifndef __LZO1X_H > +#define __LZO1X_H > + > +/* Size of temp buffer (workmem) required by lzo1x_compress */ > +#define LZO1X_WORKMEM_SIZE ((size_t) (16384L * sizeof(unsigned char *))) > + > +/* LZO1X_1 compression */ > +int > +lzo1x_compress(const unsigned char *src, size_t src_len, > + unsigned char *dst, size_t *dst_len, > + void *workmem); > + > +/* LZO1X_1 decompression */ > +int > +lzo1x_decompress(const unsigned char *src, size_t src_len, > + unsigned char *dst, size_t *dst_len); > + > +#endif > diff --git a/lib/Kconfig b/lib/Kconfig > index 2e7ae6b..9d30b1f 100644 > --- a/lib/Kconfig > +++ b/lib/Kconfig > @@ -64,6 +64,12 @@ config ZLIB_INFLATE > config ZLIB_DEFLATE > tristate > > +config LZO1X > + tristate "LZO1X Compression/Decompression" > + help > + Compression: LZO1X-1 > + Decompression: LZO1X > + > # > # Generic allocator support is selected if needed > # > diff --git a/lib/Makefile b/lib/Makefile > index c8c8e20..4dad99d 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -49,6 +49,7 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o > obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/ > obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/ > obj-$(CONFIG_REED_SOLOMON) += reed_solomon/ > +obj-$(CONFIG_LZO1X)+= lzo1x/ > > obj-$(CONFIG_TEXTSEARCH) += textsearch.o > obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o > diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile > new file mode 100644 > index 000..322683e > --- /dev/null > +++ b/lib/lzo1x/Makefil
Re: [RFC] LZO1X de/compression support
On Fri, 2007-05-18 at 16:57 +0530, Nitin Gupta wrote: > On 5/18/07, Heikki Orsila <[EMAIL PROTECTED]> wrote: > > Good work.. > > > > On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > > > Facts for LZO (at least for original code. Should hold true for this > > > port also - hence the RFC!): > > > - The compressor can never overrun buffer. > > > - The "non-safe" version of decompressor can never overrun buffer if > > > compressed data is unmodified. I am not sure about this if compressed > > > data is malicious (to be confirmed from the author). > > > - The "safe" version can never crash (buffer overrun etc.) - confirmed > > > from the author. > > > > What's the proof? > > I confirmned these from the author - I just ported this code. I think > he can answer you better - CC'ed him :-) The thing is this is more your code now, not his and you should be able to answer this question... Regards, Richard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Hi, On Fri, 2007-05-18 at 15:28 +0530, Nitin Gupta wrote: > This is kernel port of LZO1X de/compression algo stripped down to just ~500 > LOC! > It is derived from original LZO 2.02 code found at: > http://www.oberhumer.com/opensource/lzo/download/ > The code has also been reformatted to match general kernel style. > > Facts for LZO (at least for original code. Should hold true for this > port also - hence the RFC!): > - The compressor can never overrun buffer. > - The "non-safe" version of decompressor can never overrun buffer if > compressed data is unmodified. I am not sure about this if compressed > data is malicious (to be confirmed from the author). If the data is malicious, it *can* overrun the buffer. > - The "safe" version can never crash (buffer overrun etc.) - confirmed > from the author. > This patch, as of yet, only gives 'non-safe' version of decompressor. > The 'safe' version will be included soon. How are you planning to add that back? > Since 'non-safe' version has no problems if compressed data is > unmodified, it is useful in cases we have such guarantees on > compressed data and hence don't want additional overhead of 'safe' > version. For e.g. Compressed Caching project > (http://linuxcompressed.sourceforge.net) has been using the 'non-safe' > version of LZO1X since long time without any problems w.r.t. > de/compression itself. > > For now, I have tested this on x86 only. The LZO author had some concerns about this code. The major issue he highlighted was that it was 64-bit unsafe. Have you addressed that problem? Has it been tested on 64bit? I'm worried that in converting this code the way you have, you've possibly introduced potential security holes. You've removed all bounds checking and are going to have to add that back to create the "safe" version of the decompression function. Until I mentioned it, you seemed unaware of the potential problem and the comments above suggest you don't understand this code as fully as I'd like with regard to overflows. The version I submitted has at least been subject to userspace scrutiny over a period of time and is basically unchanged with regard to security. It is much uglier though. Richard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Hi, Thanks for review. My comments inline. On 5/18/07, Heikki Orsila <[EMAIL PROTECTED]> wrote: Good work.. On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > Facts for LZO (at least for original code. Should hold true for this > port also - hence the RFC!): > - The compressor can never overrun buffer. > - The "non-safe" version of decompressor can never overrun buffer if > compressed data is unmodified. I am not sure about this if compressed > data is malicious (to be confirmed from the author). > - The "safe" version can never crash (buffer overrun etc.) - confirmed > from the author. What's the proof? I confirmned these from the author - I just ported this code. I think he can answer you better - CC'ed him :-) > +/* LZO1X_1 compression */ > +int > +lzo1x_compress(const unsigned char *src, size_t src_len, > + unsigned char *dst, size_t *dst_len, > + void *workmem); int lzo1x_compress(const unsigned char *src, size_t src_len, unsigned char *dst, size_t *dst_len, void *workmem); is the preferred style. OK. Changed. > + register const unsigned char *ip; Is the register directive really useful? Or any subsequent usage of that directive? The author must be having some performance gain with this directive. Though I didn't test performance changes with/without this directive. > + DINDEX1(dindex,ip); Put a space after the delimiter: DINDEX1(dindex, ip); This happens in many places in the source, fix them all. OK. Useless brackets: (unsigned char) tt OK. > + } > + do *op++ = *ii++; while (--t > 0); memcpy(op, ii, t); ? Happens in other places as well. I looked more carefully into such cases. Following type of code blocks are repeated at several places: --- COPY4(op,ip); op += 4; ip += 4; if (--t > 0) { if (t >= 4) { do { COPY4(op,ip); op += 4; ip += 4; t -= 4; } while (t >= 4); if (t > 0) do *op++ = *ip++; while (--t > 0); } else do *op++ = *ip++; while (--t > 0); } --- Such entire blocks can be replaced by simple: memcpy(op, ip, t + 4); Since kernel has separate memcpy() implementation optimized for specific archs, we shouldn't loose on perf while having simpler (and shorter) code. I will work on this and post again. > +#define COPY4(dst,src) *(uint32_t *)(dst) = *(uint32_t *)(src) Use u32. What is the problem with uint32_t? Anyhow, I think COPY4 will disappear after those memcpy changes :) Thanks for comments. I will post revised patch soon. Cheers, Nitin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On 5/18/07, Pekka Enberg <[EMAIL PROTECTED]> wrote: So how about making that a little less verbose. Say like: Copyright (c) 1996-2005 Markus Franz Xaver and Johannes Oberhumer Oh, the author's name really is "Markus Franz Xaver Johannes Oberhumer." But please make the copyright statement one line. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
On 5/18/07, Nitin Gupta <[EMAIL PROTECTED]> wrote: + Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer [snip] So how about making that a little less verbose. Say like: Copyright (c) 1996-2005 Markus Franz Xaver and Johannes Oberhumer +#define DX2(p,s1,s2) \ + (size_t)((p)[2]) << (s2)) ^ (p)[1]) << (s1)) ^ (p)[0]) +#define DX3(p,s1,s2,s3) ((DX2((p)+1,s2,s3) << (s1)) ^ (p)[0]) +#define DMUL(a,b) ((size_t) ((a) * (b))) +#define DMS(v,s) ((size_t) (((v) & (D_MASK >> (s))) << (s))) +#define DM(v) DMS(v,0) + +#define D_BITS 14 +#define DINDEX1(d,p) d = DM(DMUL(0x21,DX3(p,5,5,6)) >> 5) +#define DINDEX2(d,p) d = (d & (D_MASK & 0x7ff)) ^ (D_HIGH | 0x1f) +#define DENTRY(p,in) (p) Please make these static inline functions. +#define PTR(a) ((unsigned long) (a)) +#define PTR_LT(a,b)(PTR(a) < PTR(b)) +#define PTR_GE(a,b)(PTR(a) >= PTR(b)) +#define PTR_DIFF(a,b) (PTR(a) - PTR(b)) +#define pd(a,b)((size_t) ((a)-(b))) [snip] +#define COPY4(dst,src) *(uint32_t *)(dst) = *(uint32_t *)(src) Please drop these. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] LZO1X de/compression support
Good work.. On Fri, May 18, 2007 at 03:28:31PM +0530, Nitin Gupta wrote: > Facts for LZO (at least for original code. Should hold true for this > port also - hence the RFC!): > - The compressor can never overrun buffer. > - The "non-safe" version of decompressor can never overrun buffer if > compressed data is unmodified. I am not sure about this if compressed > data is malicious (to be confirmed from the author). > - The "safe" version can never crash (buffer overrun etc.) - confirmed > from the author. What's the proof? > +/* LZO1X_1 compression */ > +int > +lzo1x_compress(const unsigned char *src, size_t src_len, > + unsigned char *dst, size_t *dst_len, > + void *workmem); int lzo1x_compress(const unsigned char *src, size_t src_len, unsigned char *dst, size_t *dst_len, void *workmem); is the preferred style. > + Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer > + Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer > + All Rights Reserved. > + > + The LZO library is free software; you can redistribute it and/or > + modify it under the terms of the GNU General Public License, > + version 2, as published by the Free Software Foundation. > + > + The LZO library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with the LZO library; see the file COPYING. > + If not, write to the Free Software Foundation, Inc., > + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. > + > + Markus F.X.J. Oberhumer > + <[EMAIL PROTECTED]> > + http://www.oberhumer.com/opensource/lzo/ > + > + > + This file is derived from lzo1x_1.c and lzo1x_c.ch found in original > + LZO 2.02 code. Some additional changes have also been made to make > + it work in kernel space. > + > + Nitin Gupta > + <[EMAIL PROTECTED]> > + */ > + > +#include > +#include > +#include > + > +#include "lzo1x_int.h" > + > +MODULE_LICENSE("GPL"); > +MODULE_DESCRIPTION("LZO1X Compression"); > + > +/* compress a block of data. */ > +static unsigned int > +lzo1x_compress_worker(const unsigned char *in, size_t in_len, > + unsigned char *out, size_t *out_len, > + void *workmem) > +{ > + register const unsigned char *ip; Is the register directive really useful? Or any subsequent usage of that directive? > + unsigned char *op; > + const unsigned char * const in_end = in + in_len; > + const unsigned char * const ip_end = in + in_len - M2_MAX_LEN - 5; > + const unsigned char *ii; > + const unsigned char ** const dict = (const unsigned char **)workmem; > + > + op = out; > + ip = in; > + ii = ip; > + > + ip += 4; > + for (;;) { > + register const unsigned char *m_pos; > + size_t m_off; > + size_t m_len; > + size_t dindex; > + > + DINDEX1(dindex,ip); Put a space after the delimiter: DINDEX1(dindex, ip); This happens in many places in the source, fix them all. > + m_pos = dict[dindex]; > + > + if (LZO_CHECK_MPOS_NON_DET(m_pos,m_off,in,ip,M4_MAX_OFFSET)) > + goto literal; > + if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3]) > + goto try_match; > + > + DINDEX2(dindex,ip); > + m_pos = dict[dindex]; > + > + if (LZO_CHECK_MPOS_NON_DET(m_pos,m_off,in,ip,M4_MAX_OFFSET)) > + goto literal; > + if (m_off <= M2_MAX_OFFSET || m_pos[3] == ip[3]) > + goto try_match; > + > + goto literal; > + > +try_match: > +if (*(const unsigned short *)m_pos != *(const unsigned short *)ip) > { > +} else { > + if (likely(m_pos[2] == ip[2])) > + goto match; > +} > + > + /* a literal */ > +literal: > + dict[dindex] = ip; > + ++ip; > + if (unlikely(ip >= ip_end)) > + break; > + continue; > + > + > + /* a match */ > +match: > + dict[dindex] = ip; > + /* store current literal run */ > + if (pd(ip,ii) > 0) { > + register size_t t = pd(ip,ii); > + if (t <= 3) > + op[-2] |= (unsigned char)(t); > + else if (t <= 18) > + *op++ = (unsigned char)(t - 3); > + else { > + register size_t tt = t - 18; > + *op++ = 0; > + while (tt > 255) { > + tt -= 255; > + *op++ = 0; > + } > + *op++ = (unsigned char)(tt); Useless brackets: (unsigned char) tt > + }
[RFC] LZO1X de/compression support
Hi, This is kernel port of LZO1X de/compression algo stripped down to just ~500 LOC! It is derived from original LZO 2.02 code found at: http://www.oberhumer.com/opensource/lzo/download/ The code has also been reformatted to match general kernel style. Facts for LZO (at least for original code. Should hold true for this port also - hence the RFC!): - The compressor can never overrun buffer. - The "non-safe" version of decompressor can never overrun buffer if compressed data is unmodified. I am not sure about this if compressed data is malicious (to be confirmed from the author). - The "safe" version can never crash (buffer overrun etc.) - confirmed from the author. This patch, as of yet, only gives 'non-safe' version of decompressor. The 'safe' version will be included soon. Since 'non-safe' version has no problems if compressed data is unmodified, it is useful in cases we have such guarantees on compressed data and hence don't want additional overhead of 'safe' version. For e.g. Compressed Caching project (http://linuxcompressed.sourceforge.net) has been using the 'non-safe' version of LZO1X since long time without any problems w.r.t. de/compression itself. For now, I have tested this on x86 only. Signed-off-by: Nitin Gupta <[EMAIL PROTECTED]> --- diff --git a/include/linux/lzo1x.h b/include/linux/lzo1x.h new file mode 100755 index 000..aae547e --- /dev/null +++ b/include/linux/lzo1x.h @@ -0,0 +1,61 @@ +/* lzo1x.h -- public interface of the LZO1X compression algorithm + + This file is part of the LZO real-time data compression library. + + Copyright (C) 2005 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 2004 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 2003 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 2002 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 2001 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 2000 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 1999 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 1998 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 1997 Markus Franz Xaver Johannes Oberhumer + Copyright (C) 1996 Markus Franz Xaver Johannes Oberhumer + All Rights Reserved. + + The LZO library is free software; you can redistribute it and/or + modify it under the terms of the GNU General Public License, + version 2, as published by the Free Software Foundation. + + The LZO library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with the LZO library; see the file COPYING. + If not, write to the Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + + Markus F.X.J. Oberhumer + <[EMAIL PROTECTED]> + http://www.oberhumer.com/opensource/lzo/ + + + This file is modified version of lzo1x.h found in original LZO 2.02 + code. Some additional changes have also been made to make it work + in kernel space. + + Nitin Gupta + <[EMAIL PROTECTED]> + */ + +#ifndef __LZO1X_H +#define __LZO1X_H + +/* Size of temp buffer (workmem) required by lzo1x_compress */ +#define LZO1X_WORKMEM_SIZE ((size_t) (16384L * sizeof(unsigned char *))) + +/* LZO1X_1 compression */ +int +lzo1x_compress(const unsigned char *src, size_t src_len, + unsigned char *dst, size_t *dst_len, + void *workmem); + +/* LZO1X_1 decompression */ +int +lzo1x_decompress(const unsigned char *src, size_t src_len, + unsigned char *dst, size_t *dst_len); + +#endif diff --git a/lib/Kconfig b/lib/Kconfig index 2e7ae6b..9d30b1f 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -64,6 +64,12 @@ config ZLIB_INFLATE config ZLIB_DEFLATE tristate +config LZO1X + tristate "LZO1X Compression/Decompression" + help + Compression: LZO1X-1 + Decompression: LZO1X + # # Generic allocator support is selected if needed # diff --git a/lib/Makefile b/lib/Makefile index c8c8e20..4dad99d 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -49,6 +49,7 @@ obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o obj-$(CONFIG_ZLIB_INFLATE) += zlib_inflate/ obj-$(CONFIG_ZLIB_DEFLATE) += zlib_deflate/ obj-$(CONFIG_REED_SOLOMON) += reed_solomon/ +obj-$(CONFIG_LZO1X)+= lzo1x/ obj-$(CONFIG_TEXTSEARCH) += textsearch.o obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o diff --git a/lib/lzo1x/Makefile b/lib/lzo1x/Makefile new file mode 100644 index 000..322683e --- /dev/null +++ b/lib/lzo1x/Makefile @@ -0,0 +1,2 @@ +obj-$(CONFIG_LZO1X) += lzo1x.o +lzo1x-objs := lzo1x_compress.o lzo1x_decompress.o diff --git a/lib/lzo1x/lzo1x_compress.c b/lib/lzo1x/lzo1x_compress.c new file mode 100755 index 000..02e3324 --- /dev/null +++ b/lib/lzo1x/lzo1x_compress.c @@ -0,0 +1,255 @@ +/* lzo1x_compress.c -- LZO1X-1