Re: Native (GZIP) decompress not faster than builtin

2009-05-10 Thread Stefan Podkowinski
Jens,

As your test shows, using a native codec won't make much sense for
small files, since the JNI overhead involved will likely outweigh
any possible gains. With all the performance improvements in Java 5
and 6, it's reasonable to ask whether the native implementation really
does improve performance. I'd look at it as another option for
squeezing out some more performance if you really need to.
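
To make that concrete, something like the sketch below is what I have
in mind: use the built-in stream by default and opt into the native
codec only when native zlib is actually loaded and the input is large
enough to amortize the JNI overhead. (The helper class and the 1 MB
cut-over are made up for illustration; benchmark your own data to pick
a real threshold.)

import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.zlib.ZlibFactory;
import org.apache.hadoop.util.ReflectionUtils;

public class GzipInputFactory {
    // Arbitrary cut-over for illustration; measure your own data.
    private static final long NATIVE_THRESHOLD = 1L << 20; // 1 MB

    /** Returns a decompressing stream, using the native codec only when it helps. */
    public static InputStream open(InputStream raw, long compressedSize,
                                   Configuration conf) throws IOException {
        if (compressedSize >= NATIVE_THRESHOLD && ZlibFactory.isNativeZlibLoaded(conf)) {
            GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
            return codec.createInputStream(raw); // native zlib via JNI
        }
        return new GZIPInputStream(raw); // pure-Java built-in
    }
}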

- Stefan



Native (GZIP) decompress not faster than builtin

2009-05-10 Thread Jens Riboe
Hi,

During the past week I decided to use native decompression for a Hadoop job
(using 0.20.0). But before implementing it I decided to write a small
benchmark, just to understand how much faster (better) it would be. The
results came as a surprise:

May 6, 2009 10:56:47 PM org.apache.hadoop.util.NativeCodeLoader
INFO: Loaded the native-hadoop library
May 6, 2009 10:56:47 PM org.apache.hadoop.io.compress.zlib.ZlibFactory
INFO: Successfully loaded & initialized native-zlib library
May 6, 2009 10:56:47 PM org.apache.hadoop.io.compress.CodecPool getDecompressor
INFO: Got brand-new decompressor
Time of Hadoop  decompressor running 'small' job = 0:00:01.684 (1.684 ms/file)
Time of Hadoop  decompressor running 'large' job = 0:00:10.074 (1007.400 ms/file)
Time of Vanilla decompressor running 'small' job = 0:00:01.340 (1.340 ms/file)
Time of Vanilla decompressor running 'large' job = 0:00:10.094 (1009.400 ms/file)
Hadoop vs. Vanilla [small]: 125.67%
Hadoop vs. Vanilla [large]: 99.80%

For a small file, Hadoop's native decompression takes 25% longer to run than
Java's built-in GZIPInputStream, and for a file a few megabytes in size the
speed difference is negligible.

I wrote a blog post about it, which also contains the full source code of the
benchmark:
http://blog.ribomation.com/2009/05/07/comparison-of-decompress-ways-in-hadoop/
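
In essence, the comparison boils down to a timing loop like this (a
simplified sketch, not the exact benchmark code; see the blog post for
that):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class DecompressBenchmark {

    /** Reads the stream to exhaustion and closes it. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        for (int n; (n = in.read(buf)) > 0; ) {
            total += n;
        }
        in.close();
        return total;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        for (String path : args) { // each argument is a .gz file
            long t0 = System.nanoTime();
            drain(codec.createInputStream(new FileInputStream(path))); // Hadoop codec
            long t1 = System.nanoTime();
            drain(new GZIPInputStream(new FileInputStream(path)));     // built-in
            long t2 = System.nanoTime();
            System.out.printf("%s: hadoop=%.3f ms, vanilla=%.3f ms%n",
                              path, (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}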

My questions are:
[1]  Am I missing some key information about how to correctly use native GZIP
decompression?
I'm using codec pooling, by the way (see the sketch after these questions).

[2]  Will native decompression only pay off for files larger than 100 MB or
1000 MB?
In my application I'm reading many KB-sized .gz files from an
external source,
so I cannot change the compression method or the file size.

[3]  Has anybody experienced something similar to my result?
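
To clarify what I mean by codec pooling, here is roughly how I use it
(a simplified sketch, not my exact code):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.GzipCodec;

public class PooledGunzip {

    /** Decompresses one .gz file, borrowing a decompressor from the pool. */
    public static long decompress(String path, GzipCodec codec) throws IOException {
        Decompressor decomp = CodecPool.getDecompressor(codec);
        try {
            InputStream in = codec.createInputStream(new FileInputStream(path), decomp);
            try {
                byte[] buf = new byte[64 * 1024];
                long total = 0;
                for (int n; (n = in.read(buf)) > 0; ) {
                    total += n;
                }
                return total;
            } finally {
                in.close();
            }
        } finally {
            CodecPool.returnDecompressor(decomp); // hand it back for reuse
        }
    }
}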


Kind regards /jens