[ https://issues.apache.org/jira/browse/HADOOP-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694424#action_12694424 ]
Hadoop QA commented on HADOOP-4652:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12400082/HADOOP-4652-v3.patch
against trunk revision 760783.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/87/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/87/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/87/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-minerva.apache.org/87/console
This message is automatically generated.
> RAgzip: multiple map tasks for a large gzipped file
> ---------------------------------------------------
>
> Key: HADOOP-4652
> URL: https://issues.apache.org/jira/browse/HADOOP-4652
> Project: Hadoop Core
> Issue Type: Improvement
> Components: io, mapred, native
> Affects Versions: 0.18.3, 0.19.0
> Reporter: Daehyun Kim
> Assignee: Daehyun Kim
> Priority: Minor
> Attachments: HADOOP-4652-v2.patch, HADOOP-4652-v3.patch,
> HADOOP-4652.path
>
>
> Currently, Hadoop processes a gzipped file with only one map task.
> We have made a patch that enables multiple map tasks for one large gzipped
> file. We call the patch RAgzip.
> To run multiple map tasks over a gzipped file, just change the job's
> InputFormat to RAGZIPInputFormat.
> The options used by RAGZIPInputFormat are described in its javadoc.
> RAgzip uses zlib's inflatePrime function to support random access into a
> gzipped file.
> Because inflatePrime was introduced in zlib 1.2.2.4, RAgzip requires
> zlib 1.2.2.4 or higher (we tested with zlib 1.2.3).
> RAgzip requires a preprocessing step that creates an access point (.ap)
> file, which acts as an index of the gzipped file's chunks.
> The access point (.ap) file is stored in the same directory as the gzipped
> file: for "/user/hadoop/test.gz", the .ap file is created as
> "/user/hadoop/test.gz.ap".
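Since RAgzip depends on inflatePrime, a setup script or native-library loader might want to confirm the zlib version before enabling RAGZIPInputFormat. The sketch below compares dotted version strings against the 1.2.2.4 minimum stated in the description. The class ZlibVersionCheck and its methods are hypothetical and not part of the patch; it assumes the version string is obtained elsewhere (for example from zlib's zlibVersion() in native code), since java.util.zip does not expose it.

```java
/**
 * Hypothetical helper (not part of the RAgzip patch): decides whether a zlib
 * version string is at least 1.2.2.4, the first release with inflatePrime.
 * Assumes the caller already has the version string from native code.
 */
final class ZlibVersionCheck {

    static final String MIN_VERSION = "1.2.2.4";

    /** Compares dotted versions numerically; missing components count as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int va = i < pa.length ? leadingInt(pa[i]) : 0;
            int vb = i < pb.length ? leadingInt(pb[i]) : 0;
            if (va != vb) {
                return Integer.compare(va, vb);
            }
        }
        return 0;
    }

    /** Parses the leading digits of one component, e.g. "1-motley" gives 1. */
    private static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
    }

    /** True if the given zlib version should provide inflatePrime. */
    static boolean supportsInflatePrime(String zlibVersion) {
        return compareVersions(zlibVersion, MIN_VERSION) >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.2.1", "1.2.2.3", "1.2.2.4", "1.2.3"}) {
            System.out.println(v + " -> " + supportsInflatePrime(v));
        }
    }
}
```

Running main prints false for 1.2.1 and 1.2.2.3, and true for 1.2.2.4 and 1.2.3 (the version the reporter tested with).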
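The access point (.ap) index and its naming rule can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the real .ap format used by RAgzip is not described in the issue, so the fields below, the class name AccessPointIndex, the sample offsets, and the rule that a split "owns" the access points whose compressed offset falls inside its byte range are all hypothetical. Each assumed access point records the compressed byte offset where inflation can resume, the bit position within that byte (the non-byte-aligned case inflatePrime can handle), and the matching uncompressed offset. A working reader would also need the 32 KB of uncompressed data preceding each point as a dictionary, as zlib's zran.c example does; that is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of an RAgzip access point (.ap) index.
 * Field layout and split-ownership rule are assumptions for illustration.
 */
final class AccessPointIndex {

    /** One place where decompression can resume (assumed fields). */
    static final class AccessPoint {
        final long compressedOffset;   // byte in the .gz file where inflation resumes
        final int bitOffset;           // bit position within that byte, for unaligned points
        final long uncompressedOffset; // matching position in the decompressed data

        AccessPoint(long compressedOffset, int bitOffset, long uncompressedOffset) {
            this.compressedOffset = compressedOffset;
            this.bitOffset = bitOffset;
            this.uncompressedOffset = uncompressedOffset;
        }
    }

    // Access points sorted by compressed offset so byte-range splits map to them quickly.
    private final TreeMap<Long, AccessPoint> byCompressedOffset = new TreeMap<>();

    void add(AccessPoint ap) {
        byCompressedOffset.put(ap.compressedOffset, ap);
    }

    /**
     * Access points owned by a split covering compressed bytes [start, start + length).
     * Every point falls in exactly one split, so no chunk is processed twice.
     */
    List<AccessPoint> pointsForSplit(long start, long length) {
        return new ArrayList<>(
            byCompressedOffset.subMap(start, true, start + length, false).values());
    }

    /** Naming rule from the description: the index sits next to the data file. */
    static String accessPointPathFor(String gzPath) {
        return gzPath + ".ap";
    }

    public static void main(String[] args) {
        // Made-up sample offsets, for illustration only.
        AccessPointIndex index = new AccessPointIndex();
        index.add(new AccessPoint(0, 0, 0));
        index.add(new AccessPoint(100, 3, 1000));
        index.add(new AccessPoint(250, 5, 2600));
        index.add(new AccessPoint(400, 0, 4100));

        System.out.println(accessPointPathFor("/user/hadoop/test.gz"));
        long splitSize = 200;
        for (long start = 0; start < 600; start += splitSize) {
            StringBuilder line = new StringBuilder(
                "split [" + start + ", " + (start + splitSize) + "):");
            for (AccessPoint ap : index.pointsForSplit(start, splitSize)) {
                line.append(' ').append(ap.compressedOffset);
            }
            System.out.println(line);
        }
    }
}
```

Running main prints "/user/hadoop/test.gz.ap", then "split [0, 200): 0 100", "split [200, 400): 250" and "split [400, 600): 400". Under this assumed rule, a map task would inflate from each point it owns up to the next point, even if that next point lies in the following split. With Hadoop's own Path class, gzPath.suffix(".ap") should produce the same file name.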
--
This message is automatically generated by JIRA.