>> - solves zlib version conflict problem by static linking zlib 1.2.3.
Oh, OK. I missed it!
/Ed
On Wed, Nov 5, 2008 at 10:40 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Hi, your contribution is welcome :)
>
> Here are a few comments:
>
> 1) We can't distribute any GPL or LGPL products with Hadoop. AFAIK,
> zlib is under a pure GPL license. Does this need a zlib in the
> lib folder?
> 2) Yes, you can create a Jira issue for this. If you attach your
> patch and submit it, it will be reviewed by active committers.
>
> Yours,
> Edward
>
> 2008/11/5 Daehyun Kim [Log Modeling] <[EMAIL PROTECTED]>:
>> Hello,
>>
>> I'm new to this mailing list, and this is my first attempt at contributing.
>>
>>
>>
>> We have made a patch that enables multiple map tasks for one large *gzipped*
>> file. We call the patch RAgzip, short for Random Access gzip. It is similar
>> to HADOOP-3646, which supports large bzip2 files, and is an alternative to
>> the approach of PIG-42, which requires re-compression.
>>
>>
>>
>> RAgzip uses zlib's inflatePrime function, which supports random access on a
>> gzipped file. Since inflatePrime is available from version 1.2.2.4 onward,
>> RAgzip requires zlib 1.2.2.4 or higher. (We tested with zlib 1.2.3.)
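
The version requirement above can be checked before relying on inflatePrime. The following is a minimal sketch in Python, assuming the zlib version string starts with the usual dotted numeric form; `parse_version` and `supports_inflate_prime` are hypothetical helper names, not part of RAgzip or Hadoop.

```python
import re
import zlib

# Minimum zlib version stated in the message above.
MIN_ZLIB = (1, 2, 2, 4)


def parse_version(text):
    """Turn a string such as '1.2.3' into a 4-tuple of ints, padded with zeros."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = [int(p) for p in match.group(0).split(".")][:4] if match else []
    return tuple(parts + [0] * (4 - len(parts)))


def supports_inflate_prime(version=zlib.ZLIB_RUNTIME_VERSION):
    """Return True if the given zlib version is 1.2.2.4 or newer."""
    return parse_version(version) >= MIN_ZLIB
```

Called without arguments, `supports_inflate_prime()` checks the zlib that the Python interpreter itself loaded, which may differ from the zlib that libhadoop.so links against.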
>>
>>
>>
>> RAgzip requires a preprocessing step that creates an access point (.ap)
>> file, which acts as an index of the gzipped file's chunks. (Unfortunately,
>> the preprocessing step appears to be inherently sequential; we have not
>> found a way to parallelize it.)
>>
>>
>>
>> RAgzip splits the gzipped file using the .ap file. More specifically, when
>> a map task starts, RAgzip reads the .ap file, gets the start position and
>> the compression information of one partition of the gzipped file,
>> decompresses that partition, and feeds it to the map task as input.
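
The access point idea described above can be illustrated with a small Python sketch. This is not the RAgzip implementation: Python's zlib module does not expose inflatePrime, so the sketch swaps in in-memory snapshots taken with `Decompress.copy()` as a stand-in for the compression information stored in an .ap file. The function names, the tuple layout, and the `span` parameter are assumptions made for illustration.

```python
import gzip
import zlib


def build_access_points(gz_bytes, span=4096):
    """Scan the gzip stream once, front to back, and snapshot the decoder
    state every `span` compressed bytes.

    Each access point is (compressed_offset, uncompressed_offset, snapshot).
    """
    decoder = zlib.decompressobj(wbits=31)  # 31 = 15-bit window + gzip framing
    points = []
    out_pos = 0
    for in_pos in range(0, len(gz_bytes), span):
        # Each snapshot depends on everything decoded before it, which is
        # why this indexing pass runs sequentially.
        points.append((in_pos, out_pos, decoder.copy()))
        out_pos += len(decoder.decompress(gz_bytes[in_pos:in_pos + span]))
    return points


def read_partition(gz_bytes, points, index):
    """Decompress only the partition that starts at access point `index`.

    Returns (uncompressed_offset, partition_bytes).
    """
    in_pos, out_pos, snapshot = points[index]
    end = points[index + 1][0] if index + 1 < len(points) else len(gz_bytes)
    decoder = snapshot.copy()  # copy so the stored snapshot stays reusable
    return out_pos, decoder.decompress(gz_bytes[in_pos:end])
```

Each call to `read_partition` touches only its own slice of compressed bytes, so separate workers could process different partitions independently once the index exists, and concatenating the partitions in order reproduces the original uncompressed data.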
>>
>>
>>
>> In short, you may use RAgzip by just changing InputFormat to
>> RAGZIPInputFormat.
>>
>>
>>
>> We have packaged RAgzip in two forms:
>>
>> 1. jar
>>
>> - does not touch the Hadoop core
>>
>> - solves zlib version conflict problem by static linking zlib 1.2.3.
>>
>> 2. hadoop patch
>>
>> - integrated into Hadoop core
>>
>> - patches ZlibDecompressor.{c,java}: libhadoop.so changes
>>
>> - the version of zlib on the system should be 1.2.2.4 or higher.
>>
>>
>>
>> What I want to ask is:
>>
>> How should we contribute RAgzip to Hadoop? May I just submit the Hadoop
>> patch (package 2) to JIRA?
>>
>> I have read http://wiki.apache.org/hadoop/HowToContribute and changed our
>> source code to meet the coding style.
>>
>>
>>
>> Any comments will be appreciated.
>>
>> Thank you.
>>
>>
>>
>> - Daehyun Kim
>>
>>
>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> [EMAIL PROTECTED]
> http://blog.udanax.org
>