[jira] Commented: (HADOOP-6092) No space left on device
[ https://issues.apache.org/jira/browse/HADOOP-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893951#action_12893951 ]

Meng Mao commented on HADOOP-6092:
----------------------------------

Well, tonight we had a failure with more than 12TB free. So the problem gets worse for us. Here is a sample of the errors we got:

java.io.IOException: Task: attempt_201005271420_14408_r_00_0 - The reduce copier failed
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/hadoop/hadoop-metadata/cache/mapred/local/taskTracker/jobcache/job_201005271420_14408/attempt_201005271420_14408_r_00_0/output/map_1043.out
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2513)

Error: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:260)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:84)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:217)
	at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:157)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2533)

Error: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:260)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:84)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:189)
	at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
	at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorCombiner.reduce(ValueAggregatorCombiner.java:68)
	at com.visiblemeasures.overdrive.hadoop.metrics.OverdriveAggregator$OverdriveAggregatorCombiner.reduce(OverdriveAggregator.java:315)
	at com.visiblemeasures.overdrive.hadoop.metrics.OverdriveAggregator$OverdriveAggregatorCombiner.reduce(OverdriveAggregator.java:286)
	at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1148)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2642)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2580)

> No space left on device
> -----------------------
>
>                 Key: HADOOP-6092
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6092
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.19.0
>        Environment: ubuntu0.8.4
>           Reporter: mawanqiang
>
> Exception in thread "main" org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> 	at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
> 	at
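A pattern consistent with the report above ("no space left" despite terabytes free cluster-wide) is a single full mapred.local.dir volume on one node, which is what LocalDirAllocator checks when it fails to find a valid local directory. A minimal JDK-only sketch for checking each configured local directory's own partition, rather than the cluster total (the paths are placeholders, not taken from the issue):

```java
import java.io.File;

public class LocalDirSpace {
    public static void main(String[] args) {
        // Placeholder default: pass the node's actual mapred.local.dir
        // entries as arguments instead.
        String[] dirs = args.length > 0 ? args : new String[] { "/tmp" };
        for (String d : dirs) {
            File f = new File(d);
            // getUsableSpace() reports free space on the partition backing
            // this directory -- the number that matters for local writes,
            // not the aggregate across all nodes and disks.
            System.out.printf("%s usable=%dGB total=%dGB%n",
                d, f.getUsableSpace() >> 30, f.getTotalSpace() >> 30);
        }
    }
}
```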
[jira] Assigned: (HADOOP-6887) Need a separate metrics per garbage collector
[ https://issues.apache.org/jira/browse/HADOOP-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Lu reassigned HADOOP-6887:
-------------------------------

    Assignee: Luke Lu

> Need a separate metrics per garbage collector
> ---------------------------------------------
>
>                 Key: HADOOP-6887
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6887
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Bharath Mundlapudi
>            Assignee: Luke Lu
>             Fix For: 0.22.0
>
> In addition to the current GC metrics, which are the sum across all collectors, we need separate per-collector metrics for monitoring young-generation and old-generation collections, w.r.t. collection count and collection time.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
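The per-collector breakdown requested here is already exposed by the JVM through JMX; wiring it into Hadoop's metrics system is what the issue would add. A minimal JDK-only sketch of reading count and time per collector:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class PerCollectorGc {
    public static void main(String[] args) {
        // The JVM registers one MXBean per collector, e.g. a young-generation
        // collector such as "PS Scavenge" and an old-generation one such as
        // "PS MarkSweep"; exact names depend on the JVM and GC flags.
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```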
[jira] Updated: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HADOOP-6706:
-----------------------------------------

    Status: Open  (was: Patch Available)

> Relogin behavior for RPC clients could be improved
> --------------------------------------------------
>
>                 Key: HADOOP-6706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6706
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.22.0
>         Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch
>
> Currently, the relogin in the RPC client happens only on a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
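The fix described amounts to widening the exception handling that triggers a relogin-and-retry. A hedged, self-contained sketch of that pattern; connect() and relogin() are illustrative stubs, not Hadoop's actual org.apache.hadoop.ipc.Client internals:

```java
public class ReloginSketch {
    int attempts = 0;

    void connect() throws Exception {
        // Stub: fail once with a non-SASL exception, like the
        // IllegalStateException seen when the client's ticket is invalid.
        if (attempts++ == 0) {
            throw new IllegalStateException("client ticket invalid");
        }
    }

    void relogin() {
        // Stub standing in for refreshing Kerberos credentials
        // (in Hadoop, UserGroupInformation.reloginFromKeytab()).
    }

    void setupConnection() throws Exception {
        try {
            connect();
        } catch (Exception e) { // previously only SaslException was caught
            relogin();
            connect();          // a single retry with fresh credentials
        }
    }

    public static void main(String[] args) throws Exception {
        ReloginSketch s = new ReloginSketch();
        s.setupConnection();
        System.out.println("connected after " + s.attempts + " attempts");
    }
}
```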
[jira] Updated: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HADOOP-6706:
-----------------------------------------

    Attachment: HADOOP-6706.5.patch

New patch uploaded. The previous patch had an error.

> Relogin behavior for RPC clients could be improved
> --------------------------------------------------
>
>                 Key: HADOOP-6706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6706
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.22.0
>         Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch
>
> Currently, the relogin in the RPC client happens only on a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HADOOP-6706:
-----------------------------------------

    Status: Patch Available  (was: Open)

> Relogin behavior for RPC clients could be improved
> --------------------------------------------------
>
>                 Key: HADOOP-6706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6706
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.22.0
>         Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch
>
> Currently, the relogin in the RPC client happens only on a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6890:
----------------------------------

    Attachment: improveListFiles.patch

This patch:
1. Moves listFiles to be a method of FileContext#Util;
2. Modifies FileContext#Util#listFiles so that the subtree is traversed in depth-first order;
3. Improves the comments.

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
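For illustration, depth-first traversal with an explicit stack in plain java.io, not Hadoop's FileContext#Util#listFiles; the helper below is a simplified, hypothetical sketch that merely skips a directory that vanishes mid-traversal, the concurrent-deletion case review comment 2 asks about:

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

public class DepthFirstList {
    // Prints all regular files under root, fully visiting each directory's
    // subtree before moving on to its siblings (depth-first order).
    public static void list(File root) {
        Deque<File> stack = new ArrayDeque<File>();
        stack.push(root);
        while (!stack.isEmpty()) {
            File f = stack.pop();
            if (f.isDirectory()) {
                File[] children = f.listFiles();
                // null here means the directory was deleted between the
                // isDirectory() check and the listing -- skip it rather
                // than fail, one of the options discussed in the review.
                if (children != null) {
                    for (File c : children) {
                        stack.push(c);
                    }
                }
            } else {
                System.out.println(f.getPath());
            }
        }
    }

    public static void main(String[] args) {
        list(new File(args.length > 0 ? args[0] : "."));
    }
}
```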
[jira] Updated: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6890:
----------------------------------

    Attachment: (was: improveListFiles.patch)

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6890:
----------------------------------

    Attachment: improveListFiles.patch

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6890:
----------------------------------

    Status: Patch Available  (was: Open)

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894117#action_12894117 ]

Hadoop QA commented on HADOOP-6706:
-----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12450920/HADOOP-6706.5.patch
against trunk revision 980648.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    -1 javadoc. The javadoc tool appears to have generated 1 warning message.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/651/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/651/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/651/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/651/console

This message is automatically generated.

> Relogin behavior for RPC clients could be improved
> --------------------------------------------------
>
>                 Key: HADOOP-6706
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6706
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.22.0
>         Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch
>
> Currently, the relogin in the RPC client happens only on a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894123#action_12894123 ]

Hadoop QA commented on HADOOP-6890:
-----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12450922/improveListFiles.patch
against trunk revision 980648.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 javadoc. The javadoc tool appears to have generated 1 warning message.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/652/console

This message is automatically generated.

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-6890:
----------------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

I've committed this!

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894143#action_12894143 ]

Hairong Kuang commented on HADOOP-6890:
---------------------------------------

Thanks, Suresh, for your quick review! The 6 javadoc warnings are all security-related and not introduced by this patch.

> Improve listFiles API introduced by HADOOP-6870
> -----------------------------------------------
>
>                 Key: HADOOP-6890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6890
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: improveListFiles.patch
>
> This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
> 1. General comment: I have concerns about recursive listing. This could be abused by the applications, creating a lot of requests into HDFS.
> 2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application has a partial result. Should we ignore a directory if it was in the stack and is not found later when iterating through it?
> 3. FileSystem.java
>    * listFile() - method javadoc could be better organized: first write about the case where the path is a directory, with the two cases recursive=true and false; then the case where the path is a file, with the two cases recursive=true and false.
>    * listFile() - document throwing RuntimeException and UnsupportedOperationException and the possible causes. IOException is no longer thrown.
> 4. TestListFiles.java
>    * testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6632) Support for using different Kerberos keys for different instances of Hadoop services
[ https://issues.apache.org/jira/browse/HADOOP-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894158#action_12894158 ]

Todd Lipcon commented on HADOOP-6632:
-------------------------------------

It looks like the 6632.mr.patch portion was committed to ydist but not trunk - was this intentional?

> Support for using different Kerberos keys for different instances of Hadoop services
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6632
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.22.0
>         Attachments: 6632.mr.patch, c6632-05.patch, c6632-07.patch, HADOOP-6632-Y20S-18.patch, HADOOP-6632-Y20S-22.patch
>
> We tested using the same Kerberos key for all datanodes in an HDFS cluster, and the same Kerberos key for all TaskTrackers in a MapRed cluster, but it doesn't work. The reason is that when the datanodes try to authenticate to the namenode all at once, the Kerberos authenticators they send to the namenode may have the same timestamp and will be rejected as replay requests. This JIRA makes it possible to use a unique key for each service instance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6632) Support for using different Kerberos keys for different instances of Hadoop services
[ https://issues.apache.org/jira/browse/HADOOP-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894170#action_12894170 ]

Devaraj Das commented on HADOOP-6632:
-------------------------------------

Yes, this was intentional. The mr patch seemed like a hack, and that's why we didn't commit it to trunk; instead we raised MAPREDUCE-1824 to discuss it. BTW, the problem the mr patch attempted to address will be significantly less severe once HADOOP-6706 is committed, since that adds retries on failures caused by the false replay-attack detection in the RPC servers. MAPREDUCE-1824 takes a low priority.

> Support for using different Kerberos keys for different instances of Hadoop services
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6632
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.22.0
>         Attachments: 6632.mr.patch, c6632-05.patch, c6632-07.patch, HADOOP-6632-Y20S-18.patch, HADOOP-6632-Y20S-22.patch
>
> We tested using the same Kerberos key for all datanodes in an HDFS cluster, and the same Kerberos key for all TaskTrackers in a MapRed cluster, but it doesn't work. The reason is that when the datanodes try to authenticate to the namenode all at once, the Kerberos authenticators they send to the namenode may have the same timestamp and will be rejected as replay requests. This JIRA makes it possible to use a unique key for each service instance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
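As an illustration of the per-instance keys this line of work enables: in later secure-Hadoop configurations, each daemon logs in from its own keytab under a host-specific principal, so no two instances share a key (property names follow the later HDFS security config; the realm and paths below are placeholders, not values from this issue):

```xml
<!-- Illustrative only. _HOST is expanded to the local hostname, so every
     datanode authenticates with its own principal and key instead of one
     key shared cluster-wide, avoiding identical-authenticator replays. -->
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf/dn.keytab</value>
</property>
```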
[jira] Created: (HADOOP-6892) Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
------------------------------------------------------------------------------------------

                 Key: HADOOP-6892
                 URL: https://issues.apache.org/jira/browse/HADOOP-6892
             Project: Hadoop Common
          Issue Type: New Feature
          Components: security
    Affects Versions: 0.22.0
            Reporter: Jakob Homan
            Assignee: Jakob Homan
             Fix For: 0.22.0

HDFS-1150 will have changes to the start-up scripts and HttpServer. These are handled here.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6892) Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
[ https://issues.apache.org/jira/browse/HADOOP-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HADOOP-6892:
--------------------------------

    Attachment: HADOOP-6892.patch

Patch for trunk. It's actually smaller than it appears: HttpServer had some strangely indented code that the patch moved in and out of. A straightforward port of the 0.20 work.

> Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6892
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6892
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.22.0
>         Attachments: HADOOP-6892.patch
>
> HDFS-1150 will have changes to the start-up scripts and HttpServer. These are handled here.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6892) Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
[ https://issues.apache.org/jira/browse/HADOOP-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HADOOP-6892:
--------------------------------

    Status: Patch Available  (was: Open)

Submitting patch.

> Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6892
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6892
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.22.0
>         Attachments: HADOOP-6892.patch
>
> HDFS-1150 will have changes to the start-up scripts and HttpServer. These are handled here.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6632) Support for using different Kerberos keys for different instances of Hadoop services
[ https://issues.apache.org/jira/browse/HADOOP-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894192#action_12894192 ]

Todd Lipcon commented on HADOOP-6632:
-------------------------------------

Thanks, Devaraj. That makes sense.

> Support for using different Kerberos keys for different instances of Hadoop services
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6632
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>             Fix For: 0.22.0
>         Attachments: 6632.mr.patch, c6632-05.patch, c6632-07.patch, HADOOP-6632-Y20S-18.patch, HADOOP-6632-Y20S-22.patch
>
> We tested using the same Kerberos key for all datanodes in an HDFS cluster, and the same Kerberos key for all TaskTrackers in a MapRed cluster, but it doesn't work. The reason is that when the datanodes try to authenticate to the namenode all at once, the Kerberos authenticators they send to the namenode may have the same timestamp and will be rejected as replay requests. This JIRA makes it possible to use a unique key for each service instance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6892) Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
[ https://issues.apache.org/jira/browse/HADOOP-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894196#action_12894196 ]

Hadoop QA commented on HADOOP-6892:
-----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12450926/HADOOP-6892.patch
against trunk revision 980953.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    -1 javadoc. The javadoc tool appears to have generated 1 warning message.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/653/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/653/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/653/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/653/console

This message is automatically generated.

> Common component of HDFS-1150 (Verify datanodes' identities to clients in secure clusters)
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6892
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6892
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 0.22.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.22.0
>         Attachments: HADOOP-6892.patch
>
> HDFS-1150 will have changes to the start-up scripts and HttpServer. These are handled here.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HADOOP-6706: - Status: Open (was: Patch Available) Relogin behavior for RPC clients could be improved -- Key: HADOOP-6706 URL: https://issues.apache.org/jira/browse/HADOOP-6706 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch Currently, the relogin in the RPC client happens on only a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
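The fix amounts to treating more exception types, not just SaslException, as a cue to re-login and retry. A minimal stdlib sketch of that pattern (the `ReloginRetry` class, its `Relogin` hook, and the trigger set are illustrative stand-ins, not Hadoop's actual RPC client code):

```java
import java.util.Set;
import java.util.concurrent.Callable;

public class ReloginRetry {
  /** Hypothetical hook standing in for a Kerberos relogin (e.g. from a keytab). */
  public interface Relogin { void relogin(); }

  /**
   * Runs the call; if it fails with one of the configured exception types
   * (e.g. IllegalStateException from an invalid ticket, not only
   * SaslException), performs a relogin and retries exactly once.
   */
  public static <T> T callWithRelogin(Callable<T> call, Relogin login,
      Set<Class<? extends Exception>> triggers) throws Exception {
    try {
      return call.call();
    } catch (Exception e) {
      if (triggers.stream().noneMatch(t -> t.isInstance(e))) {
        throw e;              // not an auth-shaped failure: propagate
      }
      login.relogin();        // refresh the (possibly invalid) credentials
      return call.call();     // single retry after relogin
    }
  }

  public static void main(String[] args) throws Exception {
    // First attempt throws IllegalStateException (invalid ticket); retry succeeds.
    int[] attempts = {0};
    String result = callWithRelogin(
        () -> {
          if (attempts[0]++ == 0) throw new IllegalStateException("ticket invalid");
          return "ok";
        },
        () -> System.out.println("relogin"),
        Set.of(IllegalStateException.class));
    System.out.println(result); // prints "ok"
  }
}
```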
[jira] Updated: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HADOOP-6706: - Attachment: HADOOP-6706.6.patch New patch addressing the comments. Relogin behavior for RPC clients could be improved -- Key: HADOOP-6706 URL: https://issues.apache.org/jira/browse/HADOOP-6706 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch, HADOOP-6706.6.patch Currently, the relogin in the RPC client happens on only a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HADOOP-6706: - Status: Patch Available (was: Open) Relogin behavior for RPC clients could be improved -- Key: HADOOP-6706 URL: https://issues.apache.org/jira/browse/HADOOP-6706 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch, HADOOP-6706.6.patch Currently, the relogin in the RPC client happens on only a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-6706) Relogin behavior for RPC clients could be improved
[ https://issues.apache.org/jira/browse/HADOOP-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894219#action_12894219 ] Hadoop QA commented on HADOOP-6706: ---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12450934/HADOOP-6706.6.patch
against trunk revision 980953.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/654/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/654/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/654/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/654/console

This message is automatically generated.
Relogin behavior for RPC clients could be improved -- Key: HADOOP-6706 URL: https://issues.apache.org/jira/browse/HADOOP-6706 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: 6706-bp20-2.patch, 6706.bp20.1.patch, 6706.bp20.patch, HADOOP-6706-BP20-fix1.patch, HADOOP-6706-BP20-fix2.patch, HADOOP-6706-BP20-fix3.patch, HADOOP-6706.2.patch, HADOOP-6706.4.patch, HADOOP-6706.5.patch, HADOOP-6706.6.patch Currently, the relogin in the RPC client happens on only a SaslException. But we have seen cases where other exceptions are thrown (like IllegalStateException when the client's ticket is invalid). This jira is to fix that behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6890) Improve listFiles API introduced by HADOOP-6870
[ https://issues.apache.org/jira/browse/HADOOP-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-6890: -- Attachment: listFilesInFS.patch This patch makes the implementation of listFiles in FileSystem the same as the one in FileContext, except that it does not handle symbolic links. This will make the code easier to maintain later on. Improve listFiles API introduced by HADOOP-6870 --- Key: HADOOP-6890 URL: https://issues.apache.org/jira/browse/HADOOP-6890 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: improveListFiles.patch, listFilesInFS.patch This jira is mainly for addressing Suresh's review comments for HADOOP-6870:
1. General comment: I have concerns about recursive listing. This could be abused by applications, creating a lot of requests into HDFS.
2. Any deletion of files/directories while recursing through directories results in a RuntimeException, and the application gets a partial result. Should we ignore a directory that was on the stack but is not found later when iterating through it?
3. FileSystem.java
* listFile() - the method javadoc could be better organized: first describe the case where the path is a directory (recursive=true and false), then the case where it is a file (recursive=true and false).
* listFile() - document the throwing of RuntimeException and UnsupportedOperationException and their possible causes. IOException is no longer thrown.
4. TestListFiles.java
* testDirectory() - the comments "test empty directory" and "test directory with 1 file" should be moved up to the relevant sections of the test.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
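Suresh's second concern (deletion during recursion surfacing as a RuntimeException plus a partial result) can be met by simply skipping a directory that has vanished by the time it is iterated. A stdlib sketch of that policy (the `TolerantLister` class is hypothetical; Hadoop's real listFiles returns a RemoteIterator rather than a List):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TolerantLister {
  /**
   * Lists regular files under root, recursing if asked. A directory that
   * disappears between being queued and being iterated is silently skipped
   * instead of aborting the listing.
   */
  public static List<Path> listFiles(Path root, boolean recursive) throws IOException {
    List<Path> out = new ArrayList<>();
    Deque<Path> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Path dir = stack.pop();
      try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
        for (Path p : ds) {
          if (Files.isDirectory(p)) {
            if (recursive) stack.push(p);  // visit subdirectory later
          } else {
            out.add(p);
          }
        }
      } catch (NoSuchFileException gone) {
        // Directory was deleted after we queued it: ignore and keep going.
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("demo");
    Files.createFile(root.resolve("a.txt"));
    System.out.println(listFiles(root, true)); // lists the one file
  }
}
```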
[jira] Updated: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Roelofs updated HADOOP-6837: - Attachment: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch Hand-edited pseudo-patch for easier reviewing. New files from the LZMA SDK were diffed against that (version 9.12), and patch-hunks containing only trivial changes (Java package, formatting, Apache boilerplate, white space, etc.) were replaced with a corresponding meta-comment. Note that the 9.12 diffs were diff -ruw (i.e., ignore whitespace-only changes on single lines) and only one directory level above the relevant code (LZMA-SDK vs. LZMA). The Java and C diffs were then combined, and the truly new files (e.g., LzmaCodec.java) were then added back from the original patch. 3700 lines vs. original 10400+ Support for LZMA compression Key: HADOOP-6837 URL: https://issues.apache.org/jira/browse/HADOOP-6837 Project: Hadoop Common Issue Type: Improvement Components: io Reporter: Nicholas Carlini Assignee: Nicholas Carlini Attachments: HADOOP-6837-lzma-1-20100722.non-trivial.pseudo-patch, HADOOP-6837-lzma-1-20100722.patch, HADOOP-6837-lzma-c-20100719.patch, HADOOP-6837-lzma-java-20100623.patch Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves higher compression ratios than both gzip and bzip2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-6349) Implement FastLZCodec for fastlz/lzo algorithm
[ https://issues.apache.org/jira/browse/HADOOP-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Carlini updated HADOOP-6349: - Attachment: hadoop-6349-3.patch Another patch. There is still debug code scattered about (commented out), as I might need to put it to use at some point. This code isn't tested as well as the last patch.

Adds support for native compression/decompression. Native compression is 230% faster than Java; native decompression is 70% faster than Java.

Somewhat-large redesign of the compressor. Compression is now fifty times faster when compressing around 64MB. The compressor used to keep in memory all input it had previously processed and arraycopy it to a new array every time it needed more space, so compressing 64MB of data while calling write every 64k ended up copying ~32GB through memory (this is how it was for my test case). Compress 128MB of data writing every 1k instead, and you would copy 8.8TB through memory.

Also modified the compressor to include an end-of-stream marker, so the decompressor can set itself to finished and the stream can return -1. The end-of-stream mark is indicated by setting the four unused bytes after the input size to high in a final chunk of length 0. This way, any decompressor that does not support the end-of-stream marker will never read those bytes; it will just decompress an empty block and not notice anything is wrong.

Adds another method to TestCodecPerformance which has it load a (relatively small) input file into memory and from it generate 64MB of data to compress. (It does this by taking random substrings from 16 to 128 bytes at random offsets until there are 64MB.) It then directly compresses the 64MB from memory to memory and times that. These times seem to be more reflective than timing the compression of "key %d value %d" data or of random data. Right now this mode is enabled by calling it with the -input flag.
Ported the Adler32 code to C; it is used when using native libraries. Added a constant in the compressor to allow uncompressible data to instead be copied over byte for byte. This decreases the speed of the compressor by ~10% as it results in another memcpy, but it can more than double the speed of decompression.

Here's what the new part of TestCodecPerformance gives when fed a log file. For comparison: DefaultCodec gets the size down to 11% and BZip2Codec down to 8%.

Previous patch:
10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total decompressed size: 640 MB.
10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total compressed size: 177 MB (27% of original).
10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total compression time: 381868 ms (1716 KBps).
10/07/29 11:51:39 INFO compress.TestCodecPerformance: Total decompression time: 5051 ms (126 MBps).

Current patch, native C:
10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total decompressed size: 640 MB.
10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total compressed size: 177 MB (27% of original).
10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total compression time: 3314 ms (193 MBps).
10/07/29 11:56:57 INFO compress.TestCodecPerformance: Total decompression time: 2861 ms (223 MBps).

Current patch, pure Java:
10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total decompressed size: 640 MB.
10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total compressed size: 177 MB (27% of original).
10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total compression time: 7891 ms (81 MBps).
10/07/29 12:15:50 INFO compress.TestCodecPerformance: Total decompression time: 5077 ms (126 MBps).
Implement FastLZCodec for fastlz/lzo algorithm -- Key: HADOOP-6349 URL: https://issues.apache.org/jira/browse/HADOOP-6349 Project: Hadoop Common Issue Type: New Feature Components: io Reporter: William Kinney Attachments: hadoop-6349-1.patch, hadoop-6349-2.patch, hadoop-6349-3.patch, HADOOP-6349-TestFastLZCodec.patch, HADOOP-6349.patch, TestCodecPerformance.java, TestCodecPerformance.java, testCodecPerfResults.tsv Per [HADOOP-4874|http://issues.apache.org/jira/browse/HADOOP-4874], FastLZ is a good (speed, license) alternative to LZO. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
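The copy-cost figures Nicholas quotes follow from the old compressor's quadratic behavior: if each of n writes re-copies all prior input, total traffic is roughly n(n+1)/2 times the write size. A quick check of both quoted numbers (the class name is mine, for illustration):

```java
public class CopyCost {
  /** Total bytes moved if every write of writeSize bytes re-copies all prior input. */
  public static double totalCopied(long totalInput, long writeSize) {
    long n = totalInput / writeSize;               // number of writes
    return (double) n * (n + 1) / 2 * writeSize;   // 1 + 2 + ... + n copies of writeSize
  }

  public static void main(String[] args) {
    // 64MB input, write() every 64k: ~3.44e10 bytes moved -- the "~32GB" quoted above.
    System.out.printf("64MB @ 64k writes: %.2e bytes%n",
        totalCopied(64L << 20, 64L << 10));
    // 128MB input, write() every 1k: ~8.8e12 bytes moved -- the "8.8TB" quoted above.
    System.out.printf("128MB @ 1k writes: %.2e bytes%n",
        totalCopied(128L << 20, 1L << 10));
  }
}
```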
[jira] Commented: (HADOOP-6837) Support for LZMA compression
[ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894246#action_12894246 ] Greg Roelofs commented on HADOOP-6837: -- First, apologies for having steered Nicholas wrong on the liblzma issue. As Hong noted, it provides a much saner (that is, zlib-like) API for this sort of thing, but I mistakenly thought it shared the GPL license of (parts of) xz, so we ignored it and he worked on the LZMA SDK code instead. (The latter did include a Java port, however; liblzma does not.) Overall, the 20100722 patch looks pretty decent (given the starting material), but it does include some less-than-beautiful workarounds to cope with the impedance mismatch between push- and pull-style I/O models. In light of the fact that liblzma is, in fact, public-domain software (every file under xz-4.999.9beta-143-g3e49/src/liblzma is either explicitly in the public domain or has been automatically generated by such a file), I'm going to ask that Nicholas redo the native-code version to use liblzma rather than the SDK. (Unfortunately, it looks like the transformation from C SDK to liblzma was a significant amount of work, so it doesn't appear that a trivial liblzma-ification of the Java SDK code is likely. If Nicholas concurs with that assessment, we can instead file a separate JIRA to port the liblzma C code to Java.) Note that liblzma includes an LZMA2 codec, so Scott Carey's splittable-codec suggestion is within reach, too. OK, enough preamble. There were a number of relatively minor style issues, which I'll simply list below, but the five main concerns were: - FakeInputStream.java, FakeOutputStream.java: the linked lists of byte arrays are tough to swallow, even given the push/pull problem, even given our previous discussions on the matter.
It would be good to know what the stats are on these things in typical cases--how frequently does overflow occur in LzmaInputStream, for example, and how many buffers are used?
- Is the Code(..., len) call in LzmaInputStream guaranteed to produce len bytes if it returns true? The calling read() function assumes this, but it's not at all obvious to me; the parameter is outSize in Code(), and I don't see that it's decremented or even really used at all (other than being stored in oldOutSize), unless it's buried inside macros defined elsewhere.
The next two (or perhaps three) are no longer directly relevant, but they're general things to watch out for:
- The return value from inStream.read() in LzmaNativeInputStream.java is ignored even though there's no guarantee the call will return the requested number of bytes. A comment ("never have to call ... again") reiterates this error.
- There's no need for recursion in LzmaNativeOutputStream.java's write() method; iterative solutions tend to be far cleaner, I think. (Probably ditto for LzmaNativeInputStream.java's read() method.)
- LzmaCompressor.c has a pair of memleaks (state->outStream, state->inStream).
Here are the minor readability/maintainability/cosmetic/style issues:
* add LZMA SDK version (apparently 9.12) and perhaps its release date to the boilerplate
* tabs/formatting of LZMA SDK code (CRC.java, BinTree, etc.): I _think_ tabs are frowned upon in Hadoop, though I might be wrong; at any rate, they seem to be rarely used
** for easy Hadoop-style formatting, indent -i2 -br -l80 is a start (though it's sometimes confused by Java/C++ constructs)
* reuse existing CRC implementation(s)? (JDK has one, Hadoop has another)
* prefer lowercase lzma for subdirs
* use uppercase LZMA when referring to codec/algorithm (e.g., comments)
* add README mentioning anything special/weird/etc.
(e.g., weird hashtable issue); summary of changes made for Hadoop; potential Java/C diffs; binary compatibility between various output formats (other than trivial headers/footers); LZMA2 (splittable, not yet implemented); liblzma (much cleaner, more zlib-like implementation, still PD); etc.
* ant javadoc run yet? (apparently not recently)
* line lengths, particularly comments (LzmaInputStream.java, etc.): should be no more than 80 columns in general (Hadoop guidelines)
* avoid generic variable names for globals and class members; use existing conventions where possible (e.g., look at gzip/zlib and bzip2 code)
* LzmaCodec.java:
** uppercase LZMA when referring to codec/algorithm in general
** funcionality x 4
** throws ... { continuation line: don't further indent
* LZ/InWindow.java
** leftover debug code at end
* RangeCoder/Encoder.java
** spurious blank line (else just boilerplate)
* FakeOutputStream.java:
** stuffeed
** ammount
** isOverflow() -> didOverflow()
* LzmaInputStream.java:
** [uses FakeOutputStream]
** bufferd
** we 've
** index -> overflowIndex (or similar): too generic
**
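The ignored inStream.read() return value that Greg flags is a classic bug: read(buf, off, len) may legally return fewer than len bytes. The standard fix is a fill loop, sketched here outside the Hadoop code base (the `ReadFully` class is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
  /**
   * Fills buf[off..off+len) or throws. Loops because a single read() call
   * gives no guarantee of returning the requested number of bytes -- the
   * mistake flagged in the LzmaNativeInputStream review.
   */
  public static void readFully(InputStream in, byte[] buf, int off, int len)
      throws IOException {
    while (len > 0) {
      int n = in.read(buf, off, len);
      if (n < 0) throw new EOFException("stream ended with " + len + " bytes missing");
      off += n;  // advance past what we actually got
      len -= n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] buf = new byte[4];
    readFully(new ByteArrayInputStream(new byte[]{10, 20, 30, 40}), buf, 0, 4);
    System.out.println(buf[3]); // prints 40
  }
}
```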