[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885843#action_12885843 ]

Aaron Kimball commented on MAPREDUCE-1920:
------------------------------------------

I agree that this shouldn't break :) And yet, I configured MapReduce as a straight-up pseudo-distributed instance. I didn't set anything other than mapred.job.tracker and fs.default.name in the conf files. My application calls job.getCounters() immediately upon return from job.waitForCompletion(). Is it possible that jobs are retiring instantaneously / "very quickly" in a manner that races with my application? Is there a guaranteed window of time during which a job won't be retired? I feel there should be a guaranteed minimum; maybe this is a fixed amount of time, or maybe as long as the original reference to a Job object on the client is live. (Easier said than done in the latter case -- maybe the Job could be configured in such a way as to reserve the right to retrieve its Counters or other post-execution data at least once after waitForCompletion() returns?)

> Job.getCounters() returns null when using a cluster
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1920
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>            Reporter: Aaron Kimball
>            Priority: Critical
>
> Calling Job.getCounters() after the job has completed (successfully) returns null.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
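[Editor's note] The race described above (the job retiring between waitForCompletion() and getCounters()) suggests a client-side mitigation: poll the fetch until it returns non-null or a deadline passes. The sketch below is a generic, self-contained helper, not Hadoop code; the helper name and the idea of calling it with `() -> job.getCounters()` are assumptions for illustration only.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical client-side workaround: retry a fetch that may race with
 *  job retirement, instead of assuming a single call will succeed. */
public class RetryFetch {
    public static <T> T fetchWithRetry(Supplier<T> fetcher, long timeoutMs,
                                       long intervalMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            T result = fetcher.get();      // e.g. () -> job.getCounters()
            if (result != null) {
                return result;             // fetched before the job retired
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;               // give up: job likely retired
            }
            TimeUnit.MILLISECONDS.sleep(intervalMs);
        }
    }
}
```

This only narrows the window; the guaranteed-minimum retention the comment asks for would still have to come from the framework side.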
[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885841#action_12885841 ]

Balaji Rajagopalan commented on MAPREDUCE-1854:
-----------------------------------------------

How about creating src/test/system/fw and src/test/system/tc directories and having the scripts live in those two different directories?

> [herriot] Automate health script system test
> --------------------------------------------
>
>                 Key: MAPREDUCE-1854
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: test
>         Environment: Herriot framework
>            Reporter: Balaji Rajagopalan
>            Assignee: Balaji Rajagopalan
>         Attachments: health_script_5.txt, health_script_7.txt, health_script_trunk.txt, health_script_y20.txt
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> 1. There are three scenarios: first, induce an error from the health script and verify that the task tracker is blacklisted.
> 2. Make the health script time out and verify the task tracker is blacklisted.
> 3. Make an error in the health script path and make sure the task tracker stays healthy.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1741) Automate the test scenario of job related files are moved from history directory to done directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iyappan Srinivasan updated MAPREDUCE-1741:
------------------------------------------

    Attachment: MAPREDUCE-1741.patch

Patch for trunk.

> Automate the test scenario of job related files are moved from history directory to done directory
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1741
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1741
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Iyappan Srinivasan
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1741.patch, MAPREDUCE-1741.patch, TestJobHistoryLocation-ydist-security-patch.txt, TestJobHistoryLocation-ydist-security-patch.txt, TestJobHistoryLocation-ydist-security-patch.txt, TestJobHistoryLocation.patch, TestJobHistoryLocation.patch, TestJobHistoryLocation.patch
>
> Job related files are moved from history directory to done directory, when
> 1) Job succeeds
> 2) Job is killed
> 3) When 100 files are put in the done directory
> 4) When multiple jobs are completed at the same time, some successful, some failed.
> Also, two files, conf.xml and job files should be present in the done directory.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1741) Automate the test scenario of job related files are moved from history directory to done directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Iyappan Srinivasan updated MAPREDUCE-1741:
------------------------------------------

    Attachment: TestJobHistoryLocation-ydist-security-patch.txt

Removed the assert statements from the private method and moved those checks into the test method block, where they are now called. Also added a variable, retiredJobInterval, which is taken from mapred.jobtracker.retirejob.check and used when waiting for the job to finish.

> Automate the test scenario of job related files are moved from history directory to done directory
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1741
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1741
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Iyappan Srinivasan
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1741.patch, TestJobHistoryLocation-ydist-security-patch.txt, TestJobHistoryLocation-ydist-security-patch.txt, TestJobHistoryLocation-ydist-security-patch.txt, TestJobHistoryLocation.patch, TestJobHistoryLocation.patch, TestJobHistoryLocation.patch
>
> Job related files are moved from history directory to done directory, when
> 1) Job succeeds
> 2) Job is killed
> 3) When 100 files are put in the done directory
> 4) When multiple jobs are completed at the same time, some successful, some failed.
> Also, two files, conf.xml and job files should be present in the done directory.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885824#action_12885824 ]

Vinay Kumar Thota commented on MAPREDUCE-1713:
----------------------------------------------

Cos, we have already opened a JIRA (HADOOP-6772) for Common, and it has also been committed to trunk.

> Utilities for system tests specific.
> ------------------------------------
>
>                 Key: MAPREDUCE-1713
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch
>
> 1. A method for restarting the daemon with a new configuration.
>    public static void restartCluster(Hashtable props, String confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>    public void resetCluster() throws Exception;
> 3. A method for waiting until the daemon stops.
>    public void waitForClusterToStop() throws Exception;
> 4. A method for waiting until the daemon starts.
>    public void waitForClusterToStart() throws Exception;
> 5. A method for checking whether the job has started or not.
>    public boolean isJobStarted(JobID id) throws IOException;
> 6. A method for checking whether the task has started or not.
>    public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885821#action_12885821 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1248:
----------------------------------------------------

bq. -1 contrib tests.

This is due to MAPREDUCE-1834 and MAPREDUCE-1375. The javac warnings failure needs investigation.

> Redundant memory copying in StreamKeyValUtil
> --------------------------------------------
>
>                 Key: MAPREDUCE-1248
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Ruibang He
>            Priority: Minor
>         Attachments: MAPREDUCE-1248-v1.0.patch
>
> I found that when MROutputThread collects the output of the Reducer, it calls StreamKeyValUtil.splitKeyVal() and two local byte arrays are allocated there for each line of output. Later these two byte arrays are passed to the variables key and val. There are two memory copies here: one is the System.arraycopy() call, the other is inside key.set() / val.set().
> This doubles the memory copying for the whole output (which may lead to higher CPU consumption) and causes frequent temporary object allocation.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885819#action_12885819 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1122:
----------------------------------------------------

bq. -1 contrib tests.

The failure is because of MAPREDUCE-1834.

> streaming with custom input format does not support the new API
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-1122
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.1
>         Environment: any OS
>            Reporter: Keith Jackson
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: patch-1122.txt
>
> When trying to implement a custom input format for use with streaming, I have found that streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires the old API, org.apache.hadoop.mapred.InputFormat.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885817#action_12885817 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1920:
----------------------------------------------------

Are you sure that the job is not retired? I strongly feel this should not break, because many unit tests call this API. For example, TestMiniMRDFSSort calls this API and runs successfully on branch 0.21.

> Job.getCounters() returns null when using a cluster
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1920
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>            Reporter: Aaron Kimball
>            Priority: Critical
>
> Calling Job.getCounters() after the job has completed (successfully) returns null.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1375:
-----------------------------------------------

    Component/s: contrib/streaming

> TestFileArgs fails intermittently
> ---------------------------------
>
>                 Key: MAPREDUCE-1375
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming, test
>            Reporter: Amar Kamat
>            Assignee: Todd Lipcon
>             Fix For: 0.22.0
>
>         Attachments: mapreduce-1375.txt, TEST-org.apache.hadoop.streaming.TestFileArgs.txt
>
> TestFileArgs failed once for me with the following error
> {code}
> expected:<[job.jar
> sidefile
> tmp
> ]> but was:<[]>
> 	at org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
> 	at org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1820) InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885811#action_12885811 ]

Hadoop QA commented on MAPREDUCE-1820:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12448832/M1820-4.patch
  against trunk revision 960808.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 2 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/console

This message is automatically generated.

> InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1820
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1820
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>         Attachments: M1820-4.patch, MAPREDUCE-1820-2.patch, MAPREDUCE-1820-3.patch, MAPREDUCE-1820.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I tried to use the InputSampler on a SequenceFile and found that it comes up with duplicate keys in the sample. The problem was tracked down to the fact that the Text object returned from the reader is essentially a wrapper pointing to a byte array, which changes as the sequence file reader progresses. There was also a bug in that the reader should be initialized before use. I am attaching a patch that fixes both of the issues.
> --Alex K

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1923) Support arbitrary precision in the distbbp example
Support arbitrary precision in the distbbp example
--------------------------------------------------

                 Key: MAPREDUCE-1923
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1923
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: examples
            Reporter: Tsz Wo (Nicholas), SZE
            Assignee: Tsz Wo (Nicholas), SZE
            Priority: Minor

The precision obtained by _distbbp_ is limited by Java {{double}} (IEEE 754 64-bit), which has machine epsilon e=2^(-53). When it is used to compute the 10^15 th bit of π, only 26-bit precision with 99.998% confidence is obtained. (Will provide the error analysis later.) It would be great if it supported arbitrary precision arithmetic.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
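[Editor's note] The epsilon limitation described above is easy to demonstrate in plain Java, and java.math.BigDecimal is one standard route to arbitrary precision. This sketch is illustrative only; it is not the distbbp code and makes no claim about how the issue was eventually implemented.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class EpsilonDemo {
    public static void main(String[] args) {
        // Machine epsilon for IEEE 754 double: adding 2^-53 to 1.0 is lost
        // to rounding, so the sum compares equal to 1.0.
        double eps = Math.pow(2, -53);
        System.out.println(1.0 + eps == 1.0);   // prints true

        // BigDecimal carries the requested number of significant digits,
        // so the same tiny term survives the addition.
        BigDecimal small = new BigDecimal(2).pow(-53, new MathContext(40));
        System.out.println(
            BigDecimal.ONE.add(small).compareTo(BigDecimal.ONE) > 0);  // true
    }
}
```

The MathContext precision (40 digits here) would become a tunable parameter in an arbitrary-precision distbbp.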
[jira] Created: (MAPREDUCE-1922) Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack
Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack
-------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1922
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
         Environment: All
            Reporter: Milind Bhandarkar
            Assignee: Arun C Murthy

As more and more applications use the combine file input format (to reduce the number of mappers), formats with column groups implemented as different HDFS files (Zebra, HBase), and composite input formats (map-side joins), data-locality and rack-locality lose their meaning. (A map task reading only one column group, say 20% of its input, locally and 80% remotely still gets flagged as a data-local map.) So, my suggestion is to drop these counters and instead replace them with HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. These counters will make it easier to reason about read performance for maps.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1758) Building blocks for the herriot test cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885795#action_12885795 ]

Konstantin Boudnik commented on MAPREDUCE-1758:
-----------------------------------------------

bq. For generating the patch for external, first the dependent patches needs to be forward ported first.

You can apply the needed patches in order to generate one for this JIRA (the dependencies are already listed). Does it seem to be a problem?

> Building blocks for the herriot test cases
> ------------------------------------------
>
>                 Key: MAPREDUCE-1758
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1758
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Balaji Rajagopalan
>            Assignee: Balaji Rajagopalan
>            Priority: Minor
>         Attachments: bb_patch.txt, bb_patch_1.txt, bb_patch_2.txt
>
> There is so much commonality in the test cases that we are writing, so it is pertinent to create reusable code. The common methods will be added to the herriot framework.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885793#action_12885793 ]

Konstantin Boudnik commented on MAPREDUCE-1794:
-----------------------------------------------

In trunk, tests are supposed to go to {{src/test/system/test}}. Please refit the patch. Also, please make sure that the other new tests you guys were working on are placed into that location.

> Test the job status of lost task trackers before and after the timeout.
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1794
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: 1794-ydist-security.patch, 1794_lost_tasktracker.patch, MAPREDUCE-1794.patch
>
> This test covers the following scenarios.
> 1. Verify whether the job succeeded when the task tracker is lost and comes back alive before the timeout.
> 2. Verify the job status and the killed attempts of a task -- whether the job succeeded and the killed attempts match -- when the task trackers are lost and the timeout expires for all four attempts of a task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885794#action_12885794 ]

Konstantin Boudnik commented on MAPREDUCE-1713:
-----------------------------------------------

+1 patch looks good. Has the JIRA for Common been opened and fixed yet?

> Utilities for system tests specific.
> ------------------------------------
>
>                 Key: MAPREDUCE-1713
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch
>
> 1. A method for restarting the daemon with a new configuration.
>    public static void restartCluster(Hashtable props, String confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>    public void resetCluster() throws Exception;
> 3. A method for waiting until the daemon stops.
>    public void waitForClusterToStop() throws Exception;
> 4. A method for waiting until the daemon starts.
>    public void waitForClusterToStart() throws Exception;
> 5. A method for checking whether the job has started or not.
>    public boolean isJobStarted(JobID id) throws IOException;
> 6. A method for checking whether the task has started or not.
>    public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1913) [Herriot] Couple of issues occurred while running the tests in a cluster with security enabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885792#action_12885792 ]

Konstantin Boudnik commented on MAPREDUCE-1913:
-----------------------------------------------

- For trunk, the config key is already defined, as in
{noformat}
src/java/org/apache/hadoop/mapreduce/MRJobConfig.java:
  public static final String JOB_CANCEL_DELEGATION_TOKEN = "mapreduce.job.complete.cancel.delegation.tokens";
{noformat}
- Also, please link this JIRA to its blockers, if any.

> [Herriot] Couple of issues occurred while running the tests in a cluster with security enabled.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1913
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1913
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>            Reporter: Vinay Kumar Thota
>            Assignee: Vinay Kumar Thota
>         Attachments: 1913-ydist-security.patch, MAPREDUCE-1913.patch
>
> 1. The new configuration directory is not cleaned up after resetting to the default configuration directory in the pushconfig functionality. Because of this, a permission-denied problem occurs for the folder if another user tries running the tests in the same cluster with the pushconfig functionality. I saw this issue while running the tests on a cluster with security enabled as a different user.
> I have added the functionality for the above issue and am attaching the patch.
> 2. An IOException is thrown saying the token has expired while running the tests. I saw this issue on a secure cluster.
> This issue has been resolved by setting the following attribute in the configuration:
> mapreduce.job.complete.cancel.delegation.tokens=false
> I am adding/updating this attribute in the push configuration functionality while creating the new configuration.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
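[Editor's note] For readers unfamiliar with Hadoop site configuration, the workaround named above would typically be expressed as a property element in the pushed *-site.xml file. The property name and value are quoted from the comment; where exactly the snippet is placed in the Herriot pushconfig flow is an assumption.

```xml
<!-- Keep delegation tokens alive after job completion so that
     subsequent test jobs in the same run do not fail with an
     expired-token IOException (placement is illustrative). -->
<property>
  <name>mapreduce.job.complete.cancel.delegation.tokens</name>
  <value>false</value>
</property>
```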
[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885789#action_12885789 ]

Konstantin Boudnik commented on MAPREDUCE-1854:
-----------------------------------------------

- This script seems to be a test-related thing: {{src/test/system/scripts/healthScriptError}}. So, should it be part of the framework scripts?
- Inconsistent formatting:
{noformat}
+  private void deleteFileOnRemoteHost(String path, String hostname)
+  {
{noformat}
and
{noformat}
+  private void verifyTTBlackList(Configuration conf, TTClient client, String
+      errorMessage) throws IOException{
{noformat}

Looks good otherwise.

> [herriot] Automate health script system test
> --------------------------------------------
>
>                 Key: MAPREDUCE-1854
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: test
>         Environment: Herriot framework
>            Reporter: Balaji Rajagopalan
>            Assignee: Balaji Rajagopalan
>         Attachments: health_script_5.txt, health_script_7.txt, health_script_trunk.txt, health_script_y20.txt
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> 1. There are three scenarios: first, induce an error from the health script and verify that the task tracker is blacklisted.
> 2. Make the health script time out and verify the task tracker is blacklisted.
> 3. Make an error in the health script path and make sure the task tracker stays healthy.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1889) [herriot] Ability to restart a single node for pushconfig
[ https://issues.apache.org/jira/browse/MAPREDUCE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885787#action_12885787 ]

Konstantin Boudnik commented on MAPREDUCE-1889:
-----------------------------------------------

- Technically, you might end up with a situation where the same host is running two different daemons, say a JT and a TT, or a NN and a second DN. I believe in such a situation this new method {{public RemoteProcess getDaemonProcess(String hostname)}} won't be deterministic. That's why we have the {{HadoopDaemonInfo}} class with a role for every daemon. Perhaps the method should take an extra parameter and return only daemons with a specific role.
- The JavaDoc hasn't been updated for the change of the signature {{String pushConfig(String localDir) throws IOException;}}. Also, will the change of the method signature affect existing tests?

And please name the patches {{something.patch}} instead of .txt or anything else.

> [herriot] Ability to restart a single node for pushconfig
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-1889
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1889
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: test
>            Reporter: Balaji Rajagopalan
>            Assignee: Balaji Rajagopalan
>         Attachments: restartDaemon.txt, restartDaemon_1.txt
>
> Right now the pushconfig is supported only at a cluster level; this jira will introduce the functionality to be supported at node level.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
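[Editor's note] The review point above -- that a hostname alone cannot identify a daemon when one host runs, say, both a JT and a TT -- can be sketched with a role-qualified lookup. Every name here is illustrative, not the actual Herriot API.

```java
import java.util.List;

/** Sketch of looking up a daemon by (host, role) instead of host alone. */
public class DaemonLookup {
    public enum Role { JOBTRACKER, TASKTRACKER, NAMENODE, DATANODE }

    /** Minimal stand-in for the role-carrying daemon descriptor. */
    public static class DaemonInfo {
        final String hostname;
        final Role role;
        public DaemonInfo(String hostname, Role role) {
            this.hostname = hostname;
            this.role = role;
        }
    }

    /** Returns the daemon on the given host with the given role, or null. */
    public static DaemonInfo getDaemonProcess(List<DaemonInfo> daemons,
                                              String hostname, Role role) {
        for (DaemonInfo d : daemons) {
            if (d.hostname.equals(hostname) && d.role == role) {
                return d;  // unambiguous even with two daemons on one host
            }
        }
        return null;
    }
}
```

With only the hostname as a key, a host running both roles would make the result depend on iteration order; the extra role parameter removes that ambiguity.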
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885784#action_12885784 ]

Konstantin Boudnik commented on MAPREDUCE-1730:
-----------------------------------------------

I think we have discussed this on a number of occasions: please do not use Thread.sleep(5); directly. I believe there's a utility method for this.

> Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1730
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>    Affects Versions: 0.22.0
>            Reporter: Iyappan Srinivasan
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch
>
> Automate, using the herriot framework, the test scenario that successful/killed jobs' memory is properly removed from the jobtracker after these jobs retire.
> This should test that when successful and failed jobs are retired, their jobInProgress objects are removed properly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
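[Editor's note] The usual alternative to a bare Thread.sleep() in tests is a bounded wait-for-condition helper. The sketch below shows the general shape only; the actual Herriot utility the reviewer refers to may differ in name and signature.

```java
import java.util.function.BooleanSupplier;

/** Illustrative condition-based wait: polls until the condition holds or a
 *  deadline passes, keeping the only sleep small and bounded. */
public final class WaitUtil {
    private WaitUtil() {}

    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;          // condition met before the deadline
            }
            Thread.sleep(pollMs);     // bounded poll, not an arbitrary sleep
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }
}
```

A test would then write something like `waitFor(() -> jobIsRetired(id), 30000, 500)` instead of sleeping for a guessed duration.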
[jira] Updated: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Ramachandran updated MAPREDUCE-1921:
--------------------------------------------

    Status: Patch Available  (was: Open)

> IOExceptions should contain the filename of the broken input files
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1921
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Krishna Ramachandran
>            Assignee: Krishna Ramachandran
>         Attachments: mapreduce-1921.patch
>
> If bzip or other decompression fails, the IOException does not contain the name of the broken file that caused the exception.
> It would be nice if such actions could be avoided in the future by having the names of the broken files spelled out in the exception.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"
[ https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885778#action_12885778 ]

Konstantin Boudnik commented on MAPREDUCE-1871:
-----------------------------------------------

bq. Code is right now like this. If you check the testcases, values are getting received in the way you mentioned. When using Aspectj, I cannot use ArrayLists or Integer Arrays as return values in JobTracker. So, used int array.

I think there's some confusion here. The current code uses an int array as a return type, and then you access the content of the array by indexing its elements, i.e. {{int succeededTasksSinceStartBeforeJob = ttAllInfo[1];}}. This is bad for at least two reasons:
- it is hard to say what [1] or [2] means. You can mitigate this by having named constants for the array elements (although it is still a C-like programming style);
- you have to keep the order of elements in sync on both the producer (JT) and consumer (your test) sides. This is ugly and hard to maintain.

What I have suggested is this. Instead of an int array, have a class Foo with a number of int fields, a constructor, and a bunch of getters returning int as well. Instead of creating an array, you now instantiate an object of type Foo, passing whatever values you need to its constructor. The method signature {{public int[] JobTracker.getInfoFromAllClientsForAllTaskType()}} will change to {{public Foo JobTracker.getInfoFromAllClientsForAllTaskType()}}. Your test will access the needed value via particular getters on the object, say {{foo.getSucceededTasksSinceStartBeforeJob()}} (also, that name doesn't make much sense to me... "since start before"?). Hope it makes more sense now.

> Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1871
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>          Components: test
>            Reporter: Iyappan Srinivasan
>            Assignee: Iyappan Srinivasan
>         Attachments: 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
> Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. Total no. of tasks and successful tasks should be equal to the corresponding no. of tasks specified in TTs' logs.
> 2) Fail a task on a tasktracker. The Node UI should update the status of tasks on that TT accordingly.
> 3) Kill a task on a tasktracker. The Node UI should update the status of tasks on that TT accordingly.
> 4) Positive: Run simultaneous jobs and check if all the fields are populated with proper values of tasks. The Node UI should have correct values for all the fields mentioned above.
> 5) Check the fields across a one-hour window. Fields related to the hour should be updated after every hour.
> 6) Check the fields across a one-day window. Fields related to the day should be updated after every day.
> 7) Restart a TT and bring it back. The UI should retain the field values.
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
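[Editor's note] The review suggestion above (replace the int[] return value with a small value class) can be sketched as follows. The class and getter names are hypothetical stand-ins for whatever fields the JobTracker aspect actually returns.

```java
/** Sketch of the suggested replacement for an int[] return value:
 *  a value class with named getters instead of magic array indices. */
public class TaskTrackerTaskStats {
    private final int totalTasks;
    private final int succeededTasks;

    public TaskTrackerTaskStats(int totalTasks, int succeededTasks) {
        this.totalTasks = totalTasks;
        this.succeededTasks = succeededTasks;
    }

    public int getTotalTasks()     { return totalTasks; }
    public int getSucceededTasks() { return succeededTasks; }
}
```

The test then reads `stats.getSucceededTasks()` instead of `ttAllInfo[1]`, so the producer and consumer no longer have to agree on element order.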
[jira] Updated: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1921: Attachment: mapreduce-1921.patch patch to include filename in i/o exception > IOExceptions should contain the filename of the broken input files > -- > > Key: MAPREDUCE-1921 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Krishna Ramachandran >Assignee: Krishna Ramachandran > Attachments: mapreduce-1921.patch > > > If bzip or other decompression fails, the IOException does not contain the > name of the broken file that caused the exception. > It would be nice if such actions could be avoided in the future by having the > name of the files that are broken spelled > out in the exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files
IOExceptions should contain the filename of the broken input files -- Key: MAPREDUCE-1921 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Krishna Ramachandran Assignee: Krishna Ramachandran If bzip or other decompression fails, the IOException does not contain the name of the broken file that caused the exception. It would be nice if such actions could be avoided in the future by having the name of the files that are broken spelled out in the exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
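The fix being discussed amounts to catching the decompressor's IOException and rethrowing it with the offending file name attached. A minimal sketch of that pattern — the method name, message text, and stand-in failure are illustrative, not taken from mapreduce-1921.patch:

```java
import java.io.IOException;

// Sketch of wrapping a low-level IOException so it names the broken input file.
public class NamedIOExceptions {
    static void readCompressed(String fileName) throws IOException {
        try {
            // Stand-in for a codec/decompression failure deep inside the reader.
            throw new IOException("unexpected end of stream");
        } catch (IOException e) {
            // Rethrow with the file name, keeping the original as the cause.
            throw new IOException("Error reading input file " + fileName, e);
        }
    }

    public static void main(String[] args) {
        try {
            readCompressed("part-00000.bz2");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // now names the broken file
        }
    }
}
```

Keeping the original exception as the cause preserves the codec's stack trace while the top-level message identifies the file.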
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1309: - Attachment: mr-1309-yhadoop-20.10.patch patch for yahoo hadoop 20.10. not to be committed. > I want to change the rumen job trace generator to use a more modular internal > structure, to allow for more input log formats > - > > Key: MAPREDUCE-1309 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tools/rumen >Reporter: Dick King >Assignee: Dick King > Fix For: 0.21.0 > > Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, > demuxer-plus-concatenated-files--2010-01-06.patch, > demuxer-plus-concatenated-files--2010-01-08-b.patch, > demuxer-plus-concatenated-files--2010-01-08-c.patch, > demuxer-plus-concatenated-files--2010-01-08-d.patch, > demuxer-plus-concatenated-files--2010-01-08.patch, > demuxer-plus-concatenated-files--2010-01-11.patch, > mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, > mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, > mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, > mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, > mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, > mr-1309-yhadoop-20.10.patch, rumen-yhadoop-20.patch > > > There are two orthogonal questions to answer when processing a job tracker > log: how will the logs and the xml configuration files be packaged, and in > which release of hadoop map/reduce were the logs generated? The existing > rumen only has a couple of answers to this question. The new engine will > handle three answers to the version question: 0.18, 0.20 and current, and two > answers to the packaging question: separate files with names derived from the > job ID, and concatenated files with a header between sections [used for > easier file interchange]. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885726#action_12885726 ] Hadoop QA commented on MAPREDUCE-1906: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448507/MAPREDUCE-1906-0.21.patch against trunk revision 960808. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/console This message is automatically generated. 
> Lower minimum heartbeat interval for tasktracker > Jobtracker > - > > Key: MAPREDUCE-1906 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.20.2 >Reporter: Scott Carey > Attachments: MAPREDUCE-1906-0.21.patch > > > I get a 0% to 15% performance increase for smaller clusters by making the > heartbeat throttle stop penalizing clusters with less than 300 nodes. > Between 0.19 and 0.20, the default minimum heartbeat interval increased from > 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large > clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats > per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
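Scott's numbers can be sanity-checked with a little arithmetic. Assuming the JobTracker is sized for roughly 100 heartbeats/sec and the minimum interval floors at 3 s — our reading of the 0.20 defaults, not the patch itself:

```java
// Back-of-the-envelope check: if the JT serves at most ~100 heartbeats/sec,
// a cluster of N trackers could heartbeat every N/100 seconds; the 3 s floor
// caps a 10-node cluster at 10/3 ~= 3.3 heartbeats/sec cluster-wide.
public class HeartbeatMath {
    static double clusterHeartbeatsPerSec(int nodes, double minIntervalSec, double jtMaxHps) {
        double intervalSec = Math.max(minIntervalSec, nodes / jtMaxHps);
        return nodes / intervalSec;
    }

    public static void main(String[] args) {
        // 10 nodes, 3 s floor: only ~3.3 heartbeats/sec, far below JT capacity.
        System.out.println(Math.round(clusterHeartbeatsPerSec(10, 3.0, 100) * 10) / 10.0);
        // 1000 nodes: the floor no longer binds; the JT runs at its 100/sec cap.
        System.out.println(clusterHeartbeatsPerSec(1000, 3.0, 100));
    }
}
```

This is why dropping the floor only helps clusters below roughly 300 nodes, where the floor (rather than JT capacity) is the binding constraint.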
[jira] Updated: (MAPREDUCE-1820) InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1820: - Status: Patch Available (was: Open) > InputSampler does not create a deep copy of the key object when creating a > sample, which causes problems with some formats like SequenceFile > --- > > Key: MAPREDUCE-1820 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1820 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Alex Kozlov >Assignee: Alex Kozlov > Attachments: M1820-4.patch, MAPREDUCE-1820-2.patch, > MAPREDUCE-1820-3.patch, MAPREDUCE-1820.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > I tried to use the InputSampler on a SequenceFile and found that > it comes up with duplicate keys in the sample. The problem was tracked down > to the fact that the Text object returned from the reader is essentially a > wrapper pointing to a byte array, which changes as the sequence file reader > progresses. There was also a bug in that the reader should be initialized > before use. I am attaching a patch that fixes both of the issues. > --Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1820) InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1820: - Attachment: M1820-4.patch Added a unit test. Ideally, this should be in 0.21. > InputSampler does not create a deep copy of the key object when creating a > sample, which causes problems with some formats like SequenceFile > --- > > Key: MAPREDUCE-1820 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1820 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Alex Kozlov >Assignee: Alex Kozlov > Attachments: M1820-4.patch, MAPREDUCE-1820-2.patch, > MAPREDUCE-1820-3.patch, MAPREDUCE-1820.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > I tried to use the InputSampler on a SequenceFile and found that > it comes up with duplicate keys in the sample. The problem was tracked down > to the fact that the Text object returned from the reader is essentially a > wrapper pointing to a byte array, which changes as the sequence file reader > progresses. There was also a bug in that the reader should be initialized > before use. I am attaching a patch that fixes both of the issues. > --Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
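The aliasing bug behind this issue is easy to reproduce outside Hadoop. In the sketch below, MutableKey stands in for org.apache.hadoop.io.Text: the reader reuses one mutable object, so a sampler that stores the bare reference ends up with N copies of the last key, while a deep copy preserves each sampled value. The class and method names are illustrative, not from the patch:

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates why InputSampler must deep-copy keys: the record reader
// overwrites one reused buffer in place as it advances.
public class SamplerAliasing {
    static final class MutableKey {
        private byte[] bytes = new byte[0];
        void set(String s) { bytes = s.getBytes(); }   // reader overwrites in place
        MutableKey copy() {                            // deep copy of the backing array
            MutableKey k = new MutableKey();
            k.bytes = bytes.clone();
            return k;
        }
        @Override public String toString() { return new String(bytes); }
    }

    // Returns the sampled keys as strings; deepCopy=false reproduces the bug.
    static List<String> sample(String[] records, boolean deepCopy) {
        MutableKey reused = new MutableKey();
        List<MutableKey> keys = new ArrayList<>();
        for (String record : records) {
            reused.set(record);
            keys.add(deepCopy ? reused.copy() : reused); // bare reference vs. copy
        }
        List<String> out = new ArrayList<>();
        for (MutableKey k : keys) out.add(k.toString());
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(sample(records, false)); // [c, c, c] - duplicate keys
        System.out.println(sample(records, true));  // [a, b, c]
    }
}
```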
[jira] Created: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
Job.getCounters() returns null when using a cluster --- Key: MAPREDUCE-1920 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Aaron Kimball Priority: Critical Calling Job.getCounters() after the job has completed (successfully) returns null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885663#action_12885663 ] Aaron Kimball commented on MAPREDUCE-1920: -- The new API seems to have an issue w.r.t. counters. Calling Job.getCounters() after the job has completed (successfully) returns null. I can see all the counters there on the JobTracker status web page. They have the correct values. But I can't access them programmatically. So, this is returning null: {code} public class Job extends JobContextImpl implements JobContext { ... public Counters getCounters() throws IOException, InterruptedException { ensureState(JobState.RUNNING); return cluster.getClient().getJobCounters(getJobID()); } } {code} This seems to work fine with the LocalJobRunner. > Job.getCounters() returns null when using a cluster > --- > > Key: MAPREDUCE-1920 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Aaron Kimball >Priority: Critical > > Calling Job.getCounters() after the job has completed (successfully) returns > null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
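Until the retirement race is resolved, one client-side workaround is to poll getCounters() briefly before giving up. The sketch below simulates that with a generic retry helper; the fetch lambda stands in for job.getCounters(), which may return null once the JobTracker has retired the job, and the retry policy is purely illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Generic "retry while null" helper, sketching a client-side workaround for
// the getCounters() retirement race described above.
public class RetryNullable {
    static <T> T retry(Supplier<T> fetch, int attempts, long sleepMs) {
        for (int i = 0; i < attempts; i++) {
            T value = fetch.get();      // e.g. job.getCounters()
            if (value != null) return value;
            try {
                Thread.sleep(sleepMs);  // back off before the next attempt
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;                    // caller decides how to report failure
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated getCounters(): null on the first two calls, then a value.
        String counters = retry(() -> calls.incrementAndGet() < 3 ? null : "counters", 5, 10);
        System.out.println(counters);
    }
}
```

A real fix belongs on the server side (a guaranteed minimum retention window, as Aaron suggests); this only narrows the window in which the client loses the race.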
[jira] Updated: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1920: - Affects Version/s: 0.21.0 > Job.getCounters() returns null when using a cluster > --- > > Key: MAPREDUCE-1920 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Aaron Kimball >Priority: Critical > > Calling Job.getCounters() after the job has completed (successfully) returns > null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Kumar Thota updated MAPREDUCE-1919: - Attachment: MAPREDUCE-1919.patch Patch for trunk. > [Herriot] Test for verification of per cache file ref count. > - > > Key: MAPREDUCE-1919 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch > > > It covers the following scenarios. > 1. Run the job with two distributed cache files and verify whether the job > succeeds. > 2. Run the job with distributed cache files and remove one cache file from > the DFS once it is localized. Verify that the job fails. > 3. Run the job with two distributed cache files, with the size of one file > larger than local.cache.size. Verify whether the job succeeds. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Kumar Thota updated MAPREDUCE-1919: - Attachment: 1919-ydist-security.patch patch for Yahoo dist security branch. > [Herriot] Test for verification of per cache file ref count. > - > > Key: MAPREDUCE-1919 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1919-ydist-security.patch > > > It covers the following scenarios. > 1. Run the job with two distributed cache files and verify whether the job > succeeds. > 2. Run the job with distributed cache files and remove one cache file from > the DFS once it is localized. Verify that the job fails. > 3. Run the job with two distributed cache files, with the size of one file > larger than local.cache.size. Verify whether the job succeeds. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.
[Herriot] Test for verification of per cache file ref count. - Key: MAPREDUCE-1919 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919 Project: Hadoop Map/Reduce Issue Type: Task Components: test Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota It covers the following scenarios. 1. Run the job with two distributed cache files and verify whether the job succeeds. 2. Run the job with distributed cache files and remove one cache file from the DFS once it is localized. Verify that the job fails. 3. Run the job with two distributed cache files, with the size of one file larger than local.cache.size. Verify whether the job succeeds. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1918) Add documentation to Rumen
Add documentation to Rumen -- Key: MAPREDUCE-1918 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1918 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Add forrest documentation to Rumen tool. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Patch Available (was: Open) Is it possible to consider this for 0.21? > Lower minimum heartbeat interval for tasktracker > Jobtracker > - > > Key: MAPREDUCE-1906 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.2, 0.20.1 >Reporter: Scott Carey > Attachments: MAPREDUCE-1906-0.21.patch > > > I get a 0% to 15% performance increase for smaller clusters by making the > heartbeat throttle stop penalizing clusters with less than 300 nodes. > Between 0.19 and 0.20, the default minimum heartbeat interval increased from > 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large > clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats > per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"
[ https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iyappan Srinivasan updated MAPREDUCE-1871: -- Attachment: MAPREDUCE-1871.patch patch for trunk > Create automated test scenario for "Collect information about number of tasks > succeeded / total per time unit for a tasktracker" > > > Key: MAPREDUCE-1871 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Reporter: Iyappan Srinivasan >Assignee: Iyappan Srinivasan > Attachments: 1871-ydist-security-patch.txt, > 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, > 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch > > > Create automated test scenario for "Collect information about number of tasks > succeeded / total per time unit for a tasktracker" > 1) Verification of all the above mentioned fields with the specified TTs. > Total no. of tasks and successful tasks should be equal to the corresponding > no. of tasks specified in TTs logs > 2) Fail a task on tasktracker. Node UI should update the status of tasks on > that TT accordingly. > 3) Kill a task on tasktracker. Node UI should update the status of tasks on > that TT accordingly > 4) Positive: Run simultaneous jobs and check if all the fields are populated > with proper values of tasks. Node UI should have correct values for all the > fields mentioned above. > 5) Check the fields across a one-hour window. Fields related to the hour should be > updated after every hour > 6) Check the fields across a one-day window. Fields related to the day should be > updated after every day > 7) Restart a TT and bring it back. UI should retain the field values. > 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1917) Semantics of map.input.bytes is not consistent
Semantics of map.input.bytes is not consistent -- Key: MAPREDUCE-1917 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1917 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Environment: All Reporter: Milind Bhandarkar Assignee: Arun C Murthy map.input.bytes counter is updated by RecordReader. For sequence files, it is the size of the raw data, which may be compressed. For text files, it is the size of uncompressed data. For PigStorage, it is always 0. This request is to have a consistent semantics for this counter. Since HDFS_BYTES_READ already shows the raw split size read by the mapper, MAP_INPUT_BYTES should be the size of uncompressed data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885598#action_12885598 ] Hadoop QA commented on MAPREDUCE-1248: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426511/MAPREDUCE-1248-v1.0.patch against trunk revision 960808. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/console This message is automatically generated. 
> Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > Attachments: MAPREDUCE-1248-v1.0.patch > > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal() and two local byte arrays are allocated there > for each line of output. Later these two byte arrays are passed to the key > and val variables. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This means the whole output is copied twice in memory (which may lead to > higher CPU consumption), with frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"
[ https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885596#action_12885596 ] Iyappan Srinivasan commented on MAPREDUCE-1871: --- testTaskTrackerInfoAll: I don't think it is a good idea to wait for an arbitrary delay; replace this with polling logic, adding functionality via aspects if required. + //Waiting for 20 seconds to make sure that all the completed tasks + //are reflected in their corresponding Tasktracker boxes. + Thread.sleep(2); The same comment holds for testTaskTrackerInfoKilled and other places where an arbitrary delay is used. - Replaced with the TaskTracker heartbeat interval used as the delay. Changed in all places. countLoop++ is a vestigial variable and has to be removed. - Removed. FailedMapperClass still exists as an inner class; move it to testjar, or reuse the FailedMapper already available in testjar. - Removed; reusing FailedMapper. + public static TTClient getTTClientIns(MRCluster cluster, TaskInfo taskInfo) + throws IOException { Apologies that my previous comment was not clear. You did move the method to TTClient as I mentioned, but my intention was different. I do not like static methods in TTClient; I would rather have a non-static method in MRCluster. The general guideline for building blocks is to add a helper method to the class whose member variables it uses most. Helper methods kept as static methods in test cases are highly unlikely to be reused by anyone, so please refrain from adding static methods; getTTClientIns should be moved to MRCluster as a non-static method. Having more static methods gives a C flavor to the code, with less emphasis on object-oriented design. - Moved to MRCluster. 
+ private int getInfoFromAllClients(String timePeriod, String taskType) + throws Exception { + List ttClients = cluster.getTTClients(); + LOG.info("ttClients.size() :" + ttClients.size()); + + int totalTasksCount = 0; + int totalTasksRanForJob = 0; + for ( int i = 0; i< ttClients.size(); i++) { + TTClient ttClient = (TTClient)ttClients.get(i); + TaskTrackerStatus ttStatus = ttClient.getStatus(); + int totalTasks = remoteJTClient.getTaskTrackerLevelStatistics( + ttStatus, timePeriod, taskType); + totalTasksCount += totalTasks; + } + return totalTasksCount; + } The above code can be refactored to use the new method which gets all the information in a single shot; no looping through task trackers is required on the client side, which will reduce the number of RPC calls. - Refactored and made a part of TTClient, with the testcase just calling it. > Create automated test scenario for "Collect information about number of tasks > succeeded / total per time unit for a tasktracker" > > > Key: MAPREDUCE-1871 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Reporter: Iyappan Srinivasan >Assignee: Iyappan Srinivasan > Attachments: 1871-ydist-security-patch.txt, > 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, > 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch > > > Create automated test scenario for "Collect information about number of tasks > succeeded / total per time unit for a tasktracker" > 1) Verification of all the above mentioned fields with the specified TTs. > Total no. of tasks and successful tasks should be equal to the corresponding > no. of tasks specified in TTs logs > 2) Fail a task on tasktracker. Node UI should update the status of tasks on > that TT accordingly. > 3) Kill a task on tasktracker. 
Node UI should update the status of tasks on > that TT accordingly > 4) Positive: Run simultaneous jobs and check if all the fields are populated > with proper values of tasks. Node UI should have correct values for all the > fields mentioned above. > 5) Check the fields across a one-hour window. Fields related to the hour should be > updated after every hour > 6) Check the fields across a one-day window. Fields related to the day should be > updated after every day > 7) Restart a TT and bring it back. UI should retain the field values. > 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
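The "poll instead of a fixed sleep" guidance from the review thread can be sketched as a small helper that re-checks a condition at the heartbeat interval until a timeout expires. The names and timeouts here are illustrative, not from the patch:

```java
import java.util.function.BooleanSupplier;

// Polls a condition until it holds or a deadline passes, instead of sleeping
// for an arbitrary fixed period and hoping the cluster has caught up.
public class PollUntil {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true; // e.g. task visible in TT stats
            try {
                Thread.sleep(intervalMs);              // e.g. the TT heartbeat interval
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return condition.getAsBoolean();               // one last check at the deadline
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50 ms; the poll returns as soon as it does.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 500, 10);
        System.out.println(ok);
    }
}
```

Compared with a fixed 20-second sleep, this returns as soon as the condition holds and fails deterministically at the timeout rather than racing the cluster.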
[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"
[ https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iyappan Srinivasan updated MAPREDUCE-1871: -- Attachment: 1871-ydist-security-patch.txt New patch addressing Balaji's comments > Create automated test scenario for "Collect information about number of tasks > succeeded / total per time unit for a tasktracker" > > > Key: MAPREDUCE-1871 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Reporter: Iyappan Srinivasan >Assignee: Iyappan Srinivasan > Attachments: 1871-ydist-security-patch.txt, > 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, > 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch > > > Create automated test scenario for "Collect information about number of tasks > succeeded / total per time unit for a tasktracker" > 1) Verification of all the above mentioned fields with the specified TTs. > Total no. of tasks and successful tasks should be equal to the corresponding > no. of tasks specified in TTs logs > 2) Fail a task on tasktracker. Node UI should update the status of tasks on > that TT accordingly. > 3) Kill a task on tasktracker. Node UI should update the status of tasks on > that TT accordingly > 4) Positive: Run simultaneous jobs and check if all the fields are populated > with proper values of tasks. Node UI should have correct values for all the > fields mentioned above. > 5) Check the fields across a one-hour window. Fields related to the hour should be > updated after every hour > 6) Check the fields across a one-day window. Fields related to the day should be > updated after every day > 7) Restart a TT and bring it back. UI should retain the field values. > 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885573#action_12885573 ] Hadoop QA commented on MAPREDUCE-1122: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448755/patch-1122.txt against trunk revision 960808. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 92 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/console This message is automatically generated. 
> streaming with custom input format does not support the new API > --- > > Key: MAPREDUCE-1122 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 > Environment: any OS >Reporter: Keith Jackson >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1122.txt > > > When trying to implement a custom input format for use with streaming, I have > found that streaming does not support the new API, > org.apache.hadoop.mapreduce.InputFormat, but requires the old API, > org.apache.hadoop.mapred.InputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-615) need more unit tests for Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved MAPREDUCE-615. --- Resolution: Invalid Currently streaming has more than 20 unit tests. Please open different issues for any specific feature to be tested. > need more unit tests for Hadoop streaming > - > > Key: MAPREDUCE-615 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-615 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Runping Qi > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-583) get rid of excessive flushes from PipeMapper/Reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved MAPREDUCE-583. --- Resolution: Duplicate Fixed by HADOOP-3429
> get rid of excessive flushes from PipeMapper/Reducer
>
> Key: MAPREDUCE-583
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-583
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Joydeep Sen Sarma
>
> there's a flush on the buffered output streams in mapper/reducer for every row of data.
>
> // 2/4 Hadoop to Tool
> if (numExceptions_ == 0) {
>   if (!this.ignoreKey) {
>     write(key);
>     clientOut_.write('\t');
>   }
>   write(value);
>   if (!this.skipNewline) {
>     clientOut_.write('\n');
>   }
>   clientOut_.flush();
> } else {
>   numRecSkipped_++;
> }
>
> tried to measure impact of removing this. number of context switches reported by vmstat shows marked decline.
> with flush (10 second intervals):
> r  b  swpd  free    buff   cache    si so bi   bo    in   cs    us sy id wa
> 4  2  784   23140   83352  3114648  0  0  4819 32397 1175 13220 59 11 13 17
> 1  2  784   129724  80704  3075696  0  0  4614 27196 1156 14797 49 11 19 21
> 4  0  784   24160   83440  3174880  0  0  96   36070 1337 10976 67 11 9  12
> 5  0  784   155872  84400  3158840  0  0  125  44084 1280 11044 68 14 10 8
> 2  1  784   365128  87048  2892032  0  0  119  38472 1317 11610 69 14 10 7
> without flush:
> 5  0  784   24652   56056  3217864  0  0  310  29499 1379 7603  76 9  7  8
> 5  3  784   118456  54568  3209992  0  0  3249 33426 1173 6828  63 11 12 14
> 0  2  784   227628  54820  3198560  0  0  7840 30063 1146 8899  60 10 15 15
> 3  1  784   25608   55048  3313512  0  0  3251 36276 1194 7915  60 10 15 15
> 1  2  784   197324  49968  3194572  0  0  4714 35479 1281 8204  62 13 12 13
> cs goes down by about 20-30%. but having trouble measuring overall speed improvement (too many variables due to spec. execution etc. - need better benchmark).
> can't hurt. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
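[Editor's note] A minimal, hypothetical sketch of the change described above (not the actual PipeMapRed code — `RowWriter` and its method names are illustrative): let a BufferedOutputStream absorb per-row writes and flush explicitly only once at the end, so each record no longer costs a syscall and the context switch it implies.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for the PipeMapper/PipeReducer output path:
// rows are buffered and the stream is flushed once at the end, not per row.
public class RowWriter {
    private final OutputStream out;

    public RowWriter(OutputStream raw) {
        // 64 KB buffer; drains to the pipe only when full or on finish()
        this.out = new BufferedOutputStream(raw, 64 * 1024);
    }

    public void writeRow(byte[] key, byte[] value) throws IOException {
        out.write(key);
        out.write('\t');
        out.write(value);
        out.write('\n');
        // no flush here -- this is the per-row clientOut_.flush() removed
    }

    public void finish() throws IOException {
        out.flush(); // single flush when the input is exhausted
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        RowWriter w = new RowWriter(sink);
        w.writeRow("k".getBytes(), "v".getBytes());
        w.finish();
        System.out.println(sink);
    }
}
```

The buffer still flushes on its own whenever it fills, so memory use stays bounded while the syscall rate drops to roughly one per 64 KB of output instead of one per record.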
[jira] Updated: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil
[ https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1248: --- Status: Patch Available (was: Open) Patch looks good. Submitting for hudson. > Redundant memory copying in StreamKeyValUtil > > > Key: MAPREDUCE-1248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/streaming >Reporter: Ruibang He >Priority: Minor > Attachments: MAPREDUCE-1248-v1.0.patch > > > I found that when MROutputThread collects the output of the Reducer, it calls > StreamKeyValUtil.splitKeyVal() and two local byte-arrays are allocated there > for each line of output. Later these two byte-arrays are passed to the variables > key and val. There are two memory copies here: one is the > System.arraycopy() call, the other is inside key.set() / val.set(). > This doubles the memory copying for the whole output (which may lead to > higher CPU consumption) and causes frequent temporary object allocation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
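[Editor's note] The redundant copy can be avoided by handing out views of the shared line buffer instead of System.arraycopy()-ing into fresh byte[] temporaries before key.set()/val.set(). A hypothetical sketch using ByteBuffer.wrap(), which shares the backing array rather than copying (the actual fix in Hadoop would more likely call Text.set(byte[], int, int) directly on the line buffer; the class and method names here are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical splitKeyVal without intermediate byte[] allocation:
// the returned buffers are zero-copy views over the original line.
public class KeyValSplit {
    public static ByteBuffer[] split(byte[] line, int start, int length) {
        int sep = -1;
        for (int i = start; i < start + length; i++) {
            if (line[i] == '\t') { sep = i; break; }
        }
        if (sep < 0) { // no separator: whole line is the key, empty value
            return new ByteBuffer[] {
                ByteBuffer.wrap(line, start, length),
                ByteBuffer.wrap(line, start + length, 0)
            };
        }
        return new ByteBuffer[] {
            ByteBuffer.wrap(line, start, sep - start),
            ByteBuffer.wrap(line, sep + 1, start + length - sep - 1)
        };
    }

    // decode a view for display; duplicate() so the view is not consumed
    static String asString(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    public static void main(String[] args) {
        byte[] row = "foo\tbar".getBytes(StandardCharsets.UTF_8);
        ByteBuffer[] kv = split(row, 0, row.length);
        System.out.println(asString(kv[0]) + " / " + asString(kv[1]));
    }
}
```

This leaves exactly one copy per field (inside whatever set()/decode consumes the view) instead of two, which is the saving the issue describes.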
[jira] Resolved: (MAPREDUCE-622) Streaming should include more unit tests to test more features that it provides.
[ https://issues.apache.org/jira/browse/MAPREDUCE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved MAPREDUCE-622. --- Resolution: Invalid Currently streaming has more than 20 unit tests. Please open different issues for any specific feature to be tested. > Streaming should include more unit tests to test more features that it > provides. > > > Key: MAPREDUCE-622 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-622 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Mahadev konar >Priority: Minor > > Currently streaming has only one test that runs with ant test. It should > include more tests to check for the features that streaming provides. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1138) Erroneous output folder handling in streaming testcases
[ https://issues.apache.org/jira/browse/MAPREDUCE-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved MAPREDUCE-1138. Resolution: Duplicate Fixed by MAPREDUCE-1888 > Erroneous output folder handling in streaming testcases > --- > > Key: MAPREDUCE-1138 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1138 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Amar Kamat > > The output folder is shared across testcases. Ideally we should use a different > output folder for each testcase. Also, deletion failures are silently > ignored. MAPREDUCE-947 fixed some part of the o/p dir cleaning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-591) TestStreamingStderr fails occasionally
[ https://issues.apache.org/jira/browse/MAPREDUCE-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved MAPREDUCE-591. --- Resolution: Cannot Reproduce Haven't seen this failure in recent times. Please reopen if you see the failure again. > TestStreamingStderr fails occasionally > --- > > Key: MAPREDUCE-591 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-591 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Hemanth Yamijala > > TestStreamingStderr fails occasionally with a timeout on trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-581) slurpHadoop(Path, FileSystem) ignores result of java.io.InputStream.read(byte[], int, int)
[ https://issues.apache.org/jira/browse/MAPREDUCE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu resolved MAPREDUCE-581. --- Resolution: Invalid The code in question no longer exists. > slurpHadoop(Path, FileSystem) ignores result of > java.io.InputStream.read(byte[], int, int) > -- > > Key: MAPREDUCE-581 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-581 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Nigel Daley > > org.apache.hadoop.streaming.StreamUtil.java line 326 > This method call ignores the return value of java.io.InputStream.read() which > may read fewer bytes than requested. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
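[Editor's note] For reference, the usual shape of the fix for this class of bug is a loop that honors the return value of read() — sketched here against a plain InputStream, since slurpHadoop itself no longer exists (class and method names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: fill buf[off..off+len) by looping, because a single read() call
// may legally return fewer bytes than requested. Returns bytes actually read.
public class ReadFully {
    public static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                break; // EOF before len bytes; caller sees the short count
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes();
        byte[] buf = new byte[5];
        int n = readFully(new ByteArrayInputStream(data), buf, 0, 5);
        System.out.println(n + " bytes: " + new String(buf, 0, n));
    }
}
```

Ignoring the return value works by accident on local files (where read() usually fills the buffer) and then silently truncates data on sockets and HDFS streams, which is why findbugs-style checkers flag it.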
[jira] Created: (MAPREDUCE-1916) Usage should be added to HadoopStreaming.java
Usage should be added to HadoopStreaming.java - Key: MAPREDUCE-1916 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1916 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Amareshwari Sriramadasu Priority: Minor Fix For: 0.22.0 The command: bin/hadoop jar streaming.jar just prints: No Arguments Given! It should also print the valid arguments. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1517) streaming should support running in the background
[ https://issues.apache.org/jira/browse/MAPREDUCE-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885488#action_12885488 ] Amareshwari Sriramadasu commented on MAPREDUCE-1517: Bochun, can you update the patch to trunk and upload it again? One comment on the patch: * Update the -background option in exitUsage() with a proper description and specify it as optional. > streaming should support running in the background > -- > > Key: MAPREDUCE-1517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1517 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/streaming >Reporter: Bochun Bai > Attachments: contrib-streaming-background-2.patch, > contrib-streaming-background.patch, contrib-streaming-background.patch > > > StreamJob submits the job and uses a while loop to monitor the progress. > I would prefer it to run in the background. > Just adding "&" at the end of the command is an alternative solution, but it keeps a > java process on the client machine. > When submitting hundreds of jobs at the same time, the client machine is overloaded. > Add a -background option to StreamJob, telling it to only submit and not > monitor the progress. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
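[Editor's note] The shape of the proposed -background behavior can be sketched with a stand-in interface (the names below are hypothetical, not the actual StreamJob API): submission returns a job id immediately, and the monitoring loop runs only when not in background mode.

```java
// Hypothetical sketch of a -background option: submit-and-return versus
// submit-then-monitor. JobClient here is an illustrative stand-in, not
// the real Hadoop client API.
public class BackgroundSubmit {
    interface JobClient {
        String submit();                       // hand the job to the cluster
        void monitorUntilDone(String jobId);   // the existing progress loop
    }

    static String run(JobClient client, boolean background) {
        String jobId = client.submit();
        if (!background) {
            client.monitorUntilDone(jobId);    // current blocking behavior
        }
        // with -background the client JVM can exit right after submit(),
        // freeing the submit machine when hundreds of jobs launch at once
        return jobId;
    }
}
```

The point of the design is that the cluster already tracks job state on its own, so the client-side while loop is pure observation and can be skipped without affecting the job.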
[jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1122: --- Status: Patch Available (was: Open) Hadoop Flags: [Incompatible change] Fix Version/s: 0.22.0 Patch is ready for review. > streaming with custom input format does not support the new API > --- > > Key: MAPREDUCE-1122 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 > Environment: any OS >Reporter: Keith Jackson >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1122.txt > > > When trying to implement a custom input format for use with streaming, I have > found that streaming does not support the new API, > org.apache.hadoop.mapreduce.InputFormat, but requires the old API, > org.apache.hadoop.mapred.InputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1122: --- Attachment: patch-1122.txt Attaching a patch which does the following:
* Deprecates all the library classes in streaming such as AutoInputFormat, StreamInputFormat, StreamXmlRecordReader etc. and adds new classes which use the new api.
* Changes the tools DumpTypedBytes and LoadTypedBytes to use new api classes.
* Adds StreamJobConfig holding all the configuration properties used in streaming.
* Adds classes StreamingMapper, StreamingReducer and StreamingCombiner which extend the new api Mapper and Reducer classes.
** Adds a class StreamingProcess which starts the streaming process and the MR output/error threads, and waits for the threads. This functionality is in PipeMapRed.java for the old api mapper/reducer; PipeMapper and PipeReducer extend PipeMapRed and implement the old Mapper/Reducer interfaces. We cannot make StreamingMapper/StreamingReducer extend StreamingProcess because in the new api Mapper and Reducer are classes, not interfaces, so this was moved into a separate class that StreamingMapper/StreamingReducer compose.
** InputWriter and OutputReader, added in HADOOP-1722, take a PipeMapRed instance as a constructor parameter. That no longer makes sense because process handling for the new api mapper/reducer is served by the separate class StreamingProcess. So the following incompatible change was made (it looks clean now):
*** Changes the OutputReader constructor to take DataInput as a parameter, instead of PipeMapRed.
*** Changes the InputWriter constructor to take DataOutput as a parameter, instead of PipeMapRed.
* Moves some utility methods in PipeMapRed to StreamUtil.
* Removes the deprecated StreamJob(String[] argv, boolean mayExit) constructor; deprecates static public JobConf createJob(String[] argv); and adds static public Job createStreamingJob(String[] argv).
* Refactors setJobConf() into multiple setters to set the appropriate mapper/reducer in use.
* Adds unit tests for all the usecases described [above|https://issues.apache.org/jira/browse/MAPREDUCE-1122?focusedCommentId=12878515&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878515]
> streaming with custom input format does not support the new API > --- > > Key: MAPREDUCE-1122 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 > Environment: any OS >Reporter: Keith Jackson >Assignee: Amareshwari Sriramadasu > Attachments: patch-1122.txt > > > When trying to implement a custom input format for use with streaming, I have > found that streaming does not support the new API, > org.apache.hadoop.mapreduce.InputFormat, but requires the old API, > org.apache.hadoop.mapred.InputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885482#action_12885482 ] Amareshwari Sriramadasu commented on MAPREDUCE-1122: For supporting the new api in streaming, the implementation involves two major tasks:
# Setting the job configuration for the streaming job: set the appropriate mapper and reducer depending on the arguments passed. Summarizing the above requirements table:
** The old api mapper, PipeMapper, is used as the mapper for the job only if the mapper is a command and a) an old api input format is passed, or b) #reduces = 0 and an old api output format is passed, or c) #reduces != 0 and an old api partitioner is passed.
** Similarly, the old api reducer, PipeReducer, is used as the reducer for the job only if the reducer is a command and an old api output format is passed.
# Implementation of the new api streaming mapper, reducer, etc.
> streaming with custom input format does not support the new API > --- > > Key: MAPREDUCE-1122 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 > Environment: any OS >Reporter: Keith Jackson >Assignee: Amareshwari Sriramadasu > > When trying to implement a custom input format for use with streaming, I have > found that streaming does not support the new API, > org.apache.hadoop.mapreduce.InputFormat, but requires the old API, > org.apache.hadoop.mapred.InputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
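[Editor's note] The selection rule in the first task above can be written down as a plain predicate. A sketch with booleans standing in for the actual instanceof checks on the configured classes (names are illustrative, not the patch's code):

```java
// Hypothetical encoding of the rule above: the old api PipeMapper is chosen
// only when the mapper is a command AND one of the old-api conditions holds.
public class ApiChoice {
    static boolean useOldApiPipeMapper(boolean mapperIsCommand,
                                       boolean oldApiInputFormat,
                                       int numReduces,
                                       boolean oldApiOutputFormat,
                                       boolean oldApiPartitioner) {
        if (!mapperIsCommand) {
            return false; // a class mapper is used as-is, whichever api it has
        }
        return oldApiInputFormat                                  // (a)
            || (numReduces == 0 && oldApiOutputFormat)            // (b)
            || (numReduces != 0 && oldApiPartitioner);            // (c)
    }

    static boolean useOldApiPipeReducer(boolean reducerIsCommand,
                                        boolean oldApiOutputFormat) {
        return reducerIsCommand && oldApiOutputFormat;
    }
}
```

Writing the rule as a pure function like this makes the requirements table directly testable, which is presumably what the unit tests mentioned in the patch description cover against the real configuration objects.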