[jira] Commented: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878507#action_12878507 ] Hadoop QA commented on MAPREDUCE-1853:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446976/cache-task-attempts.diff against trunk revision 953976.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/241/console
This message is automatically generated.
MultipleOutputs does not cache TaskAttemptContext - Key: MAPREDUCE-1853 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0 Environment: OSX 10.6 java6 Reporter: Torsten Curdt Priority: Critical Fix For: 0.22.0 Attachments: cache-task-attempts.diff In MultipleOutputs there is
{code}
private TaskAttemptContext getContext(String nameOutput) throws IOException {
  // The following trick leverages the instantiation of a record writer via
  // the job thus supporting arbitrary output formats.
  Job job = new Job(context.getConfiguration());
  job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput));
  job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput));
  job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput));
  TaskAttemptContext taskContext = new TaskAttemptContextImpl(
      job.getConfiguration(), context.getTaskAttemptID());
  return taskContext;
}
{code}
so for every reduce call it creates a new Job instance, which creates a new LocalJobRunner. That does not sound like a good idea. You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized" messages. This should probably also be added to 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
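The attached patch isn't reproduced here, but the usual fix for this pattern is to build each per-named-output context once and reuse it on subsequent calls. A minimal, Hadoop-free sketch of that caching pattern (the class and its names are illustrative, not the actual MultipleOutputs code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a lazy per-key cache, standing in for a
// Map<String, TaskAttemptContext> keyed by named output. The expensive
// factory (here, whatever builds the context) runs once per key.
class ContextCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> factory;

    ContextCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // computeIfAbsent invokes the factory only on the first lookup,
        // so repeated reduce() calls reuse the same instance.
        return cache.computeIfAbsent(key, factory);
    }
}
```

With this shape, getContext(nameOutput) would hit the factory once per named output instead of constructing a new Job (and LocalJobRunner) on every reduce call.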
[jira] Commented: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878509#action_12878509 ] Amareshwari Sriramadasu commented on MAPREDUCE-1853: Test failure is due to MAPREDUCE-1834.
[jira] Commented: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878508#action_12878508 ] Hadoop QA commented on MAPREDUCE-1857:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446851/patch-1857.txt against trunk revision 953976.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/568/console
This message is automatically generated.
Remove unused stream.numinputspecs configuration Key: MAPREDUCE-1857 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1857 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Priority: Trivial Fix For: 0.22.0 Attachments: patch-1857.txt The configuration stream.numinputspecs is just set and not read anywhere. It can be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878510#action_12878510 ] Amareshwari Sriramadasu commented on MAPREDUCE-1857: Test failure is due to MAPREDUCE-1834. bq. -1 tests included. The patch removes unused code. So, no tests are added.
[jira] Created: (MAPREDUCE-1863) [Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen
[Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen -- Key: MAPREDUCE-1863 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1863 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amar Kamat Assignee: Amar Kamat All the traces generated by Rumen for jobs having failed task attempts have a null value for failedMapAttemptCDFs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1863) [Rumen] Null failedMapAttemptCDFs in job traces generated by Rumen
[ https://issues.apache.org/jira/browse/MAPREDUCE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-1863: Fix Version/s: 0.22.0 Affects Version/s: 0.22.0
[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878515#action_12878515 ] Amareshwari Sriramadasu commented on MAPREDUCE-1122: Users can specify the Mapper/Reducer to be a Java Mapper/Reducer or a command. They can also specify the input format, output format and partitioner for their streaming job. The tables below summarize the mapper or reducer in use when streaming supports both the old and new API. Note: in the tables below, NS stands for 'Not specified'.

*Table 1* Mapper-in-use for a given spec, when num reducers = 0:
||Mapper || InputFormat || OutputFormat || Valid conf?|| Mapper-in-use ||
|Command|NS|NS|Yes|New|
|Command|Old|NS|Yes|Old|
|Command|Old|Old|Yes|Old|
|Command|Old|New|{color:red}No{color}|
|Command|New|NS|Yes|New|
|Command|New|Old|{color:red}No{color}|
|Command|New|New|Yes|New|
|Old|NS|NS|Yes|Old|
|Old|NS|Old|Yes|Old|
|Old|Old|NS|Yes|Old|
|Old|Old|Old|Yes|Old|
|Old|-|New|{color:red}No{color}|
|Old|New|-|{color:red}No{color}|
|New|NS|NS|Yes|New|
|New|NS|New|Yes|New|
|New|New|NS|Yes|New|
|New|New|New|Yes|New|
|New|-|Old|{color:red}No{color}|
|New|Old|-|{color:red}No{color}|

*Table 2* Mapper-in-use for a given spec, when num reducers != 0:
||Mapper || InputFormat || Partitioner || Valid conf?|| Mapper-in-use ||
|Command|NS|NS|Yes|New|
|Command|Old|NS|Yes|Old|
|Command|Old|Old|Yes|Old|
|Command|Old|New|{color:red}No{color}|
|Command|New|NS|Yes|New|
|Command|New|Old|{color:red}No{color}|
|Command|New|New|Yes|New|
|Old|NS|NS|Yes|Old|
|Old|NS|Old|Yes|Old|
|Old|Old|NS|Yes|Old|
|Old|Old|Old|Yes|Old|
|Old|New|-|{color:red}No{color}|
|Old|-|New|{color:red}No{color}|
|New|NS|NS|Yes|New|
|New|NS|New|Yes|New|
|New|New|NS|Yes|New|
|New|New|New|Yes|New|
|New|Old|-|{color:red}No{color}|
|New|-|Old|{color:red}No{color}|

*Table 3* Reducer-in-use for a given spec:
||Reducer || OutputFormat || Valid conf?|| Reducer-in-use ||
|Command|NS|Yes|New|
|Command|Old|Yes|Old|
|Command|New|Yes|New|
|Old|NS|Yes|Old|
|New|NS|Yes|New|
|Old|Old|Yes|Old|
|New|New|Yes|New|
|Old|New|{color:red}No{color}|
|New|Old|{color:red}No{color}|

streaming with custom input format does not support the new API --- Key: MAPREDUCE-1122 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.1 Environment: any OS Reporter: Keith Jackson When trying to implement a custom input format for use with streaming, I have found that streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires the old API, org.apache.hadoop.mapred.InputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
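Table 3 above is small enough to express as a resolution function. The following is a hypothetical sketch, not the actual streaming code: a command reducer follows the output format (defaulting to the new API), while a Java reducer must match the output format or the configuration is invalid.

```java
// Illustrative stand-in for the streaming reducer-API resolution in Table 3.
class ReducerApiResolver {
    enum Spec { COMMAND, OLD, NEW, NS }

    static Spec resolve(Spec reducer, Spec outputFormat) {
        if (reducer == Spec.NS) {
            // Table 3 assumes a reducer is specified.
            throw new IllegalArgumentException("reducer must be specified");
        }
        if (reducer == Spec.COMMAND) {
            // A command reducer can use either API; the output format decides,
            // defaulting to the new API when it is not specified.
            return outputFormat == Spec.OLD ? Spec.OLD : Spec.NEW;
        }
        if (outputFormat != Spec.NS && outputFormat != reducer) {
            // e.g. old-API reducer with new-API output format: invalid conf
            throw new IllegalArgumentException("Invalid conf: " + reducer
                + " reducer with " + outputFormat + " output format");
        }
        return reducer;
    }
}
```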
[jira] Commented: (MAPREDUCE-1851) Document configuration parameters in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878530#action_12878530 ] Ravi Gummadi commented on MAPREDUCE-1851: It seems stream.jobLog_ is not documented anywhere and does not seem useful. We can remove it altogether, maybe in a separate JIRA. So let us not document it here? stream.addenvironment seems to be an internal property and is not intended for hadoop streaming users. Let us not document it. We can add the config property stream.stderr.reporter.prefix with the default value reporter:. This would need changes to the questions/answers related to updating status and updating counters in the FAQ? Document configuration parameters in streaming -- Key: MAPREDUCE-1851 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1851 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/streaming, documentation Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1851.txt There are several streaming options such as stream.map.output.field.separator, stream.num.map.output.key.fields, stream.map.input.field.separator, stream.reduce.input.field.separator, stream.map.input.ignoreKey, stream.non.zero.exit.is.failure etc. which are spread everywhere. These should be documented in a single place with description and default value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878532#action_12878532 ] Ravi Gummadi commented on MAPREDUCE-1857: There seems to be another config property, stream.debug, that is seen only in some unit tests. I don't know if it was added for some purpose earlier, but it doesn't seem to be used anywhere in the source code. So can we remove that also in this patch itself?
[jira] Updated: (MAPREDUCE-1857) Remove unused stream.numinputspecs configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Status: Open (was: Patch Available)
[jira] Updated: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Summary: Remove unused streaming configuration from src (was: Remove unused stream.numinputspecs configuration)
[jira] Commented: (MAPREDUCE-1851) Document configuration parameters in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878537#action_12878537 ] Ravi Gummadi commented on MAPREDUCE-1851: We could also specify, for the 4 properties stream.map.input, stream.map.output, stream.reduce.input and stream.reduce.output, that these take the values given with -D only if the -io identifier is not used. In other words, should we say that the -io identifier will replace these 4 properties with the identifier?
[jira] Updated: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Attachment: patch-1857-1.txt The patch removes stream.debug from testcases and the unused stream.recordreader.compression from TestGZipInput. It also removes commented lines in some testcases.
[jira] Updated: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1857: Status: Patch Available (was: Open)
[jira] Commented: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878544#action_12878544 ] Amareshwari Sriramadasu commented on MAPREDUCE-1857: bq. unused stream.recordreader.compression from TestGZipInput. When the test was added, this configuration was read by StreamLineRecordReader. Now, that RecordReader no longer exists.
[jira] Commented: (MAPREDUCE-1857) Remove unused streaming configuration from src
[ https://issues.apache.org/jira/browse/MAPREDUCE-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878546#action_12878546 ] Ravi Gummadi commented on MAPREDUCE-1857: Patch looks good. +1
[jira] Commented: (MAPREDUCE-1829) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878659#action_12878659 ] Scott Chen commented on MAPREDUCE-1829: Thanks for your help, Vinod and Ravi :) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort() -- Key: MAPREDUCE-1829 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1829 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1829-20100610.txt, MAPREDUCE-1829.txt findSpeculativeTask needs only one candidate to speculate, so it does not need to sort the whole list. Sorting may look OK, but someone can still submit big jobs with small slow-task thresholds; in that case, the sorting becomes expensive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
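The change described in the summary is easy to sketch: when only one candidate is needed, a single O(n) Collections.min() pass replaces an O(n log n) sort. The Task class and the progress-rate criterion below are illustrative stand-ins, not the real JobInProgress code:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SpeculativeCandidate {
    // Illustrative stand-in for a running task attempt.
    static final class Task {
        final String id;
        final double progressRate;
        Task(String id, double progressRate) {
            this.id = id;
            this.progressRate = progressRate;
        }
    }

    // One O(n) scan picks the slowest task (lowest progress rate);
    // no need to sort the whole list just to take its first element.
    static Task findCandidate(List<Task> running) {
        return Collections.min(running,
            Comparator.comparingDouble((Task t) -> t.progressRate));
    }
}
```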
[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878719#action_12878719 ] Konstantin Boudnik commented on MAPREDUCE-1854: bq. The tasktrackerStatus is a writable object, shouldn't the inner class of a writable object be public for others to use? You might be right. However, this field has package-private access, and I believe this has been done for a reason. I am not enough of an expert on MR's internals to tell you one way or another. However, from the common standpoint, such widening of permissions isn't advisable. bq. My real intention of having an abstract parent class is to have common functionality that can be shared. If in the future we see that the number of such shared functions is growing and it becomes useful to move them all to a common parent, we might do just that. However, two functions don't look like a good justification to me. Thanks for the explanations on the Common classes' modifications. They all make sense. I guess these changes will have to end up in a separate JIRA though. What about getting rid of the script wrappers for the ssh functionality and sleep? [herriot] Automate health script system test Key: MAPREDUCE-1854 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854 Project: Hadoop Map/Reduce Issue Type: New Feature Components: test Environment: Herriot framework Reporter: Balaji Rajagopalan Assignee: Balaji Rajagopalan Attachments: health_script_5.txt Original Estimate: 120h Remaining Estimate: 120h There are three scenarios: 1. Induce an error from the health script and verify that the task tracker is blacklisted. 2. Make the health script time out and verify that the task tracker is blacklisted. 3. Make an error in the health script path and make sure the task tracker stays healthy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1774) Large-scale Automated Framework
[ https://issues.apache.org/jira/browse/MAPREDUCE-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1774: Attachment: MAPREDUCE-1774.patch - This patch addresses audit warnings caused by missing Apache license boilerplate in a couple of places. - Javac warnings are caused by using the deprecated {{JobConf}} and {{JobContext}} in two new classes from the {{testjar}} package. While this is a valid issue, I am not sure it has to be fought considering the 2K+ similar warnings all over the MR code. - Core test failures are old: they have been around for at least 6 days and this patch hasn't caused any new ones. The contrib test failure seems irrelevant (a Mumak testcase, {{TestSimulatorDeterministicReplay}}, has been timing out for over 10 days). Large-scale Automated Framework --- Key: MAPREDUCE-1774 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1774 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Attachments: MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch, MAPREDUCE-1774.patch This is the MapReduce part of HADOOP-6332. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1559) The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1559: Attachment: MR-1559.1.patch Patch for trunk uploaded. The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem -- Key: MAPREDUCE-1559 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1559 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: MR-1559.1.patch, mr-1559.patch The submitJob RPC finally creates a timer task for renewing the delegation tokens of the submitting user. This timer task inherits the context of the RPC handler that runs in the context of the job submitting user, and when it tries to create a filesystem, the RPC client tries to use the user's credentials. This should instead use the JobTracker's credentials. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1559) The DelegationTokenRenewal timer task should use the jobtracker's credentials to create the filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1559: Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Attachment: MAPREDUCE-1848-20100614.txt Put number of speculative, data local, rack local tasks in JobTracker metrics - Key: MAPREDUCE-1848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1848-20100614.txt It would be nice if we could collect this information in the JobTracker metrics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Attachment: (was: MAPREDUCE-1848-20100614.txt)
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Attachment: MAPREDUCE-1848-20100614.txt
[jira] Updated: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1848: Status: Patch Available (was: Open)
[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878783#action_12878783 ] Scott Chen commented on MAPREDUCE-1848: --- Add four methods to JobTrackerInstrumentation to collect speculative, data local, and rack local tasks: {code} public void speculateMap(TaskAttemptID taskAttemptID) public void speculateReduce(TaskAttemptID taskAttemptID) public void launchDataLocalMap(TaskAttemptID taskAttemptID) public void launchRackLocalMap(TaskAttemptID taskAttemptID) {code} Put number of speculative, data local, rack local tasks in JobTracker metrics - Key: MAPREDUCE-1848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1848-20100614.txt It would be nice if we could collect this information in JobTracker metrics -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
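The four hooks above lend themselves to simple counters. A minimal sketch of that idea, assuming a plain counting subclass; the class name, counter fields, and the use of String attempt IDs in place of TaskAttemptID are all invented for illustration and are not the attached patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: each proposed instrumentation hook bumps a counter
// that a metrics-update thread could later publish as JobTracker metrics.
public class SpeculativeTaskCounters {
    private final AtomicLong speculativeMaps = new AtomicLong();
    private final AtomicLong speculativeReduces = new AtomicLong();
    private final AtomicLong dataLocalMaps = new AtomicLong();
    private final AtomicLong rackLocalMaps = new AtomicLong();

    // Method names mirror the four hooks quoted in the comment above.
    public void speculateMap(String taskAttemptId)       { speculativeMaps.incrementAndGet(); }
    public void speculateReduce(String taskAttemptId)    { speculativeReduces.incrementAndGet(); }
    public void launchDataLocalMap(String taskAttemptId) { dataLocalMaps.incrementAndGet(); }
    public void launchRackLocalMap(String taskAttemptId) { rackLocalMaps.incrementAndGet(); }

    public long getSpeculativeMaps()    { return speculativeMaps.get(); }
    public long getSpeculativeReduces() { return speculativeReduces.get(); }
    public long getDataLocalMaps()      { return dataLocalMaps.get(); }
    public long getRackLocalMaps()      { return rackLocalMaps.get(); }
}
```

AtomicLong keeps the hooks safe to call from concurrent scheduling paths without extra locking.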
[jira] Updated: (MAPREDUCE-1850) Include job submit host information (name and ip) in jobconf and jobdetails display
[ https://issues.apache.org/jira/browse/MAPREDUCE-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1850: Attachment: mapred-1850.patch This is a forward port of a patch for an earlier release. It fixes deprecated APIs; for trunk, Job.java and Configuration still need to be fixed. Include job submit host information (name and ip) in jobconf and jobdetails display --- Key: MAPREDUCE-1850 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1850 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Krishna Ramachandran Assignee: Krishna Ramachandran Attachments: mapred-1850.patch Enhancement to identify the source (submit host and ip) of a job request. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Status: Open (was: Patch Available) Delete the co-located replicas when raiding file Key: MAPREDUCE-1831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt In raid, it is good to have the blocks on the same stripe located on different machines. This way, when one machine is down, it does not break two blocks on the stripe. By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the replica missing probability). One way to do this is to add a new BlockPlacementPolicy that deletes replicas that are co-located. Then, when raiding the file, we can make the remaining replicas live on different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Status: Patch Available (was: Open) Delete the co-located replicas when raiding file Key: MAPREDUCE-1831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt In raid, it is good to have the blocks on the same stripe located on different machines. This way, when one machine is down, it does not break two blocks on the stripe. By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the replica missing probability). One way to do this is to add a new BlockPlacementPolicy that deletes replicas that are co-located. Then, when raiding the file, we can make the remaining replicas live on different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
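The O(p^3) to O(p^4) claim in the issue above can be checked numerically. A small illustration, where p = 0.01 is an assumed replica-missing probability chosen only for concreteness:

```java
// Numeric check of the claimed improvement: requiring one more independent
// replica failure shrinks the loss probability by roughly a factor of 1/p.
public class RaidLossOdds {
    public static void main(String[] args) {
        double p = 0.01; // assumed per-replica missing probability (illustrative)
        double before = Math.pow(p, 3); // O(p^3): co-located replicas, fewer independent failures needed
        double after  = Math.pow(p, 4); // O(p^4): replicas spread across machines
        System.out.printf("O(p^3) = %.2e, O(p^4) = %.2e, improvement = %.0fx%n",
                before, after, before / after);
    }
}
```

For p = 0.01 this is a factor-of-100 reduction in the block error probability, which matches the "huge improvement" the issue describes.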
[jira] Commented: (MAPREDUCE-647) Update the DistCp forrest doc to make it consistent with the latest changes (5472, 5620, 5762, 5826)
[ https://issues.apache.org/jira/browse/MAPREDUCE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878823#action_12878823 ] Rodrigo Schmidt commented on MAPREDUCE-647: --- Nicholas, would you mind reviewing this patch? Update the DistCp forrest doc to make it consistent with the latest changes (5472, 5620, 5762, 5826) Key: MAPREDUCE-647 URL: https://issues.apache.org/jira/browse/MAPREDUCE-647 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Attachments: MAPREDUCE-647.patch New features have been added to DistCp and the documentation must be updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1548) Hadoop archives should be able to preserve times and other properties from original files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated MAPREDUCE-1548: --- Status: Open (was: Patch Available) Hadoop archives should be able to preserve times and other properties from original files - Key: MAPREDUCE-1548 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1548 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1548.0.patch Files inside hadoop archives don't keep their original: - modification time - access time - permission - owner - group All such properties are currently taken from the file storing the archive index, not from the stored files. This doesn't look correct. It should be possible to preserve the original properties of the stored files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1548) Hadoop archives should be able to preserve times and other properties from original files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated MAPREDUCE-1548: --- Status: Patch Available (was: Open) Hadoop archives should be able to preserve times and other properties from original files - Key: MAPREDUCE-1548 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1548 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Affects Versions: 0.22.0 Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Fix For: 0.22.0 Attachments: MAPREDUCE-1548.0.patch Files inside hadoop archives don't keep their original: - modification time - access time - permission - owner - group All such properties are currently taken from the file storing the archive index, not from the stored files. This doesn't look correct. It should be possible to preserve the original properties of the stored files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878824#action_12878824 ] Dick King commented on MAPREDUCE-323: - After some discussions, we've come to some decisions. 1: We'll store the completed jobs' history files in the DFS done history files tree, in the following fixed format: {{DONE/job-tracker-instance-ID/YYYY/MM/DD/987654/}} The job tracker instance ID includes both the job tracker machine name and the epoch time of the instance start. There won't be very many directories on this level. {{YYYY/MM/DD}} documents the date of completion [actually, the date that the history file is copied to DFS]. {{987654}} are the leading six digits of the job serial number, considered as a nine-digit integer. The leading zeros ARE included, so the directories can be enumerated correctly in lexicographical order. Therefore, no directory will have more than 2000 files, except in the unlikely case that there are more than 2 million jobs in one day. 2: We will modify the web application, {{jobhistory.jsp}}, in the following ways: 2a: We will decide how many jobs to filter based on the following criteria: 2a1: We stop at 11 tranches of serial numbers [the tenth boundary] or a day boundary, whichever comes first [but that page delivers buttons inviting you to ask for previous days, or more tranches]. Of course, as now, we stop at 100 items if we get that many items before crossing the directory boundary, but in the new code we will remember where to continue. However, in the new codebase we won't {{ls}} the files we don't present, improving the responsiveness accordingly. 2b: We will present the job history links, newest first.
2b1: To make this coherent, we will remember where we left off for pagination. To summarize how the code will work, the pagination controls will look like this: Available Jobs in History (displaying 100 jobs from 1 to 100) {{[show all] [show 1000 per page] [show entire day] [first page][last page]}} {{ golem-jt1.megacorp.com-2010-05-18 golem-jt1.megacorp.com-2010-04-18 }} [current JT instance, previous and/or following. This line of pagination controls is omitted if there is only one.] {{ newest 2010/06/14 2010/06/13 2010/06/12 2010/06/11 2010/06/10 oldest }} [current day, two days previous, two days succeeding -- only within the current JT instance] {{ oldest 1 2 3 4 5 next newest }} [directional words change when the search direction changes] 2c: There is a notion of search direction. Currently we display oldest first, but I'm thinking of changing that because I judge most recent first to be the better default, especially as uptimes increase as the product becomes more mature. What do you think? Users can change direction by going to the last page -- or the oldest/newest date -- or the oldest/newest task tracker. When you've done that, the navigation cursors change so you're going in the right direction. Improve the way job history files are managed - Key: MAPREDUCE-323 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0, 0.22.0 Reporter: Amar Kamat Assignee: Dick King Priority: Critical Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we group all the jobs under a _user_ folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc but using _username_ will make the search much more efficient and also will not result in a namespace explosion.
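The serial-number bucketing described in point 1 above (pad the serial to nine digits, keep the leading six, so lexicographic order matches numeric order and each bucket spans 1000 consecutive serials) can be sketched as follows; the class and method names are hypothetical, not part of the proposal:

```java
// Hypothetical helper illustrating the bucket-directory rule described above.
public class HistoryBucket {
    static String bucketFor(int jobSerial) {
        // Zero-pad to nine digits, then keep the leading six digits;
        // the retained leading zeros make lexicographic enumeration correct.
        return String.format("%09d", jobSerial).substring(0, 6);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(42));        // "000000": serials 0-999 share a bucket
        System.out.println(bucketFor(987654321)); // "987654"
    }
}
```

With two files per job (history plus conf), a 1000-serial bucket holds at most 2000 files, matching the bound stated above.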
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1073: - Status: Open (was: Patch Available) Assignee: Dick King Progress reported for pipes tasks is incorrect. --- Key: MAPREDUCE-1073 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 0.20.1 Reporter: Sreekanth Ramakrishnan Assignee: Dick King Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch Currently in pipes, in {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}}, we do the following: {code} while (input.next(key, value)) { downlink.mapItem(key, value); if (skipping) { downlink.flush(); } } {code} This would result in consumption of all the records for the current task, taking task progress to 100% while the actual pipes application is still trailing behind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
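A toy illustration (plain Java, not Hadoop code; the record counts are invented) of why draining the reader in that loop misreports progress: the Java side measures how much input it has pushed downlink, while the external pipes process may have consumed far fewer records:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simulates the mismatch described above: the loop pushes every record
// without waiting for the external process, so input-based progress hits
// 100% while the application's real progress is still tiny.
public class PipesProgressSketch {
    public static void main(String[] args) {
        int totalRecords = 1000;
        Queue<Integer> downlinkBuffer = new ArrayDeque<>();
        int pushed = 0;
        for (int record = 0; record < totalRecords; record++) {
            downlinkBuffer.add(record); // analogous to downlink.mapItem(key, value)
            pushed++;
        }
        double reportedProgress = (double) pushed / totalRecords; // 1.0
        int processedByApp = 10; // the external process lags behind (invented value)
        double actualProgress = (double) processedByApp / totalRecords;
        System.out.printf("reported=%.2f actual=%.2f%n", reportedProgress, actualProgress);
    }
}
```

Progress derived from input consumption is only meaningful when the consumer is synchronous, which the pipes downlink is not.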
[jira] Updated: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1778: Attachment: mapred-1778.20S-1.patch revised 20S patch (git pull) after repo sync CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable -- Key: MAPREDUCE-1778 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Reporter: Amar Kamat Assignee: Krishna Ramachandran Attachments: mapred-1778-1.patch, mapred-1778-2.patch, mapred-1778-3.patch, mapred-1778-4.patch, mapred-1778.20S-1.patch, mapred-1778.20S.patch, mapred-1778.patch If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then CompletedJobStatusStore silently ignores the failure and disables CompletedJobStatusStore. Ideally the JobTracker should bail out early indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-469) Support concatenated gzip and bzip2 files
[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Roelofs updated MAPREDUCE-469: --- Attachment: grr-hadoop-common.dif.20100614c grr-hadoop-mapreduce.dif.20100614c Almost-final gzip concatenation code (several style-related issues to deal with, but working code, both native and non-native, with no debug statements) and a halfway test case (need to get bzip2 half working). Summary: I implemented an Inflater-based Decompressor with manual gzip header/trailer parsing and CRC checks, and added new getRemaining() and resetPartially() methods to the interface. I also modified DecompressorStream to support concatenated streams (decompress() and getCompressedData() methods). For backward compatibility, the default behavior is unchanged; one needs to set the new io.compression.gzip.concat config option to true to turn it on. Since bzip2 apparently changed its behavior without such a setting, perhaps this is overkill... Anyway, this is against trunk (as of a week or two ago). I still need to check it against Yahoo's tree, deal with the FIXMEs, update my source tree(s), run test-patch, etc. Also, I haven't included the (binary) test files here; I'll do so in one of the next versions of the patch. Support concatenated gzip and bzip2 files - Key: MAPREDUCE-469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tom White Assignee: Greg Roelofs Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c When running MapReduce with concatenated gzip files as input only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. 
(See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
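What "concatenated gzip" means can be demonstrated with the JDK's own java.util.zip: two independently gzipped members glued together byte-for-byte form a valid .gz stream, and a conforming reader must decompress all members, not just the first. (Recent JDKs' GZIPInputStream does so; whether a given Hadoop codec does is exactly what this issue addresses. The class below is a standalone sketch, not Hadoop code.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Builds a two-member gzip stream and decompresses it in one pass.
public class ConcatGzipDemo {
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes("UTF-8")); // closing the stream writes the member trailer
        }
        return bos.toByteArray();
    }

    static String gunzipAll(byte[] data) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzip("hello "));
        cat.write(gzip("world"));
        // A reader that stops after the first member would print only "hello ".
        System.out.println(gunzipAll(cat.toByteArray()));
    }
}
```

This mirrors the behavior of `cat a.gz b.gz | gunzip`, which the gzip manual documents as valid usage.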
[jira] Commented: (MAPREDUCE-469) Support concatenated gzip and bzip2 files
[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878848#action_12878848 ] David Ciemiewicz commented on MAPREDUCE-469: On vacation Mon-Wed Feb 15-17. Offsite Thu-Fri, Feb 18-19. Support concatenated gzip and bzip2 files - Key: MAPREDUCE-469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Tom White Assignee: Greg Roelofs Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c When running MapReduce with concatenated gzip files as input only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1864) PipeMapRed.java has unintialized members log_ and LOGNAME
PipeMapRed.java has unintialized members log_ and LOGNAME -- Key: MAPREDUCE-1864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1864 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Fix For: 0.22.0 PipeMapRed.java has members log_ and LOGNAME, which are never initialized yet are used for logging in several places. They should be removed, and PipeMapRed should use commons LogFactory and Log for logging. This would improve code maintainability. Also, as per [comment | https://issues.apache.org/jira/browse/MAPREDUCE-1851?focusedCommentId=12878530page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878530], the stream.joblog_ configuration property can be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1864) PipeMapRed.java has uninitialized members log_ and LOGNAME
[ https://issues.apache.org/jira/browse/MAPREDUCE-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1864: --- Summary: PipeMapRed.java has uninitialized members log_ and LOGNAME (was: PipeMapRed.java has unintialized members log_ and LOGNAME ) PipeMapRed.java has uninitialized members log_ and LOGNAME --- Key: MAPREDUCE-1864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1864 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Fix For: 0.22.0 PipeMapRed.java has members log_ and LOGNAME, which are never initialized yet are used for logging in several places. They should be removed, and PipeMapRed should use commons LogFactory and Log for logging. This would improve code maintainability. Also, as per [comment | https://issues.apache.org/jira/browse/MAPREDUCE-1851?focusedCommentId=12878530page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878530], the stream.joblog_ configuration property can be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1765) Streaming doc - change StreamXmlRecord to StreamXmlRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1765: --- Status: Patch Available (was: Open) Assignee: Corinne Chandel Fix Version/s: 0.22.0 Streaming doc - change StreamXmlRecord to StreamXmlRecordReader Key: MAPREDUCE-1765 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1765 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, documentation Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Minor Fix For: 0.22.0 Attachments: streaming-doc.patch Streaming doc - fix typo. CHANGE: hadoop jar hadoop-streaming.jar -inputreader StreamXmlRecord,begin=BEGIN_STRING,end=END_STRING . (rest of the command) TO THIS: hadoop jar hadoop-streaming.jar -inputreader StreamXmlRecordReader,begin=BEGIN_STRING,end=END_STRING . (rest of the command) Note: No new test code; changes to documentation only. See: Bugzilla Ticket 2520942 - XML Streaming -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.