[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-06 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885843#action_12885843
 ] 

Aaron Kimball commented on MAPREDUCE-1920:
--

I agree that this shouldn't break :) And yet, I configured MapReduce as a 
straight-up pseudo-distributed instance. I didn't set anything other than 
mapred.job.tracker and fs.default.name in the conf files. 

My application calls job.getCounters() immediately upon return from 
job.waitForCompletion(). Could it be that jobs are retiring instantaneously / 
"very quickly" in a way that races with my application? Is there a guaranteed 
window of time during which a job won't be retired?

I feel like there should be a guaranteed minimum; maybe it is measured in 
time, or maybe the job is kept as long as the original reference to a Job 
object on the client is live. (Easier said than done in the latter case -- 
maybe the Job could be configured to reserve the right to retrieve its 
Counters or other post-execution data at least once after 
waitForCompletion() returns?)


> Job.getCounters() returns null when using a cluster
> ---
>
> Key: MAPREDUCE-1920
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Aaron Kimball
>Priority: Critical
>
> Calling Job.getCounters() after the job has completed (successfully) returns 
> null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test

2010-07-06 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885841#action_12885841
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1854:
---

How about creating src/test/system/fw and src/test/system/tc directories and 
keeping the two kinds of scripts in those two separate directories?

> [herriot] Automate health script system test
> 
>
> Key: MAPREDUCE-1854
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
> Environment: Herriot framework
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: health_script_5.txt, health_script_7.txt, 
> health_script_trunk.txt, health_script_y20.txt
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> 1. There are three scenarios. First, induce an error from the health script 
> and verify that the task tracker is blacklisted. 
> 2. Make the health script time out and verify that the task tracker is 
> blacklisted. 
> 3. Make an error in the health script path and make sure the task tracker 
> stays healthy. 




[jira] Updated: (MAPREDUCE-1741) Automate the test scenario of job related files are moved from history directory to done directory

2010-07-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1741:
--

Attachment: MAPREDUCE-1741.patch

patch for trunk

> Automate the test scenario of  job related files are moved from history 
> directory to done directory
> ---
>
> Key: MAPREDUCE-1741
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1741
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1741.patch, MAPREDUCE-1741.patch, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation.patch, TestJobHistoryLocation.patch, 
> TestJobHistoryLocation.patch
>
>
> Job related files are moved from history directory to done directory, when
> 1) Job succeeds
> 2) Job is killed
> 3) When 100 files are put in the done directory
> 4) When multiple jobs are completed at the same time, some successful, some 
> failed.
> Also, two files, conf.xml and the job file, should be present in the done 
> directory.




[jira] Updated: (MAPREDUCE-1741) Automate the test scenario of job related files are moved from history directory to done directory

2010-07-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1741:
--

Attachment: TestJobHistoryLocation-ydist-security-patch.txt

Removed the assert statements from the private methods; the checks are now 
done in the test method block.

Also added a variable called retiredJobInterval, which is taken from 
mapred.jobtracker.retirejob.check and used when waiting for a job to finish.

> Automate the test scenario of  job related files are moved from history 
> directory to done directory
> ---
>
> Key: MAPREDUCE-1741
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1741
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1741.patch, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation-ydist-security-patch.txt, 
> TestJobHistoryLocation.patch, TestJobHistoryLocation.patch, 
> TestJobHistoryLocation.patch
>
>
> Job related files are moved from history directory to done directory, when
> 1) Job succeeds
> 2) Job is killed
> 3) When 100 files are put in the done directory
> 4) When multiple jobs are completed at the same time, some successful, some 
> failed.
> Also, two files, conf.xml and the job file, should be present in the done 
> directory.




[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.

2010-07-06 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885824#action_12885824
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1713:
--

Cos, we have already opened a JIRA (HADOOP-6772) for Common, and it has 
already been committed to trunk.

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, 
> systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch
>
>
> 1. A method for restarting the daemon with a new configuration.
>   public static void restartCluster(Hashtable props, String 
> confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3. A method for waiting until the daemon stops.
>   public void waitForClusterToStop() throws Exception;
> 4. A method for waiting until the daemon starts.
>   public void waitForClusterToStart() throws Exception;
> 5. A method for checking whether a job has started.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6. A method for checking whether a task has started.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;




[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885821#action_12885821
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1248:


bq. -1 contrib tests.
This is due to MAPREDUCE-1834 and MAPREDUCE-1375.

The javac warnings failure needs investigation.

> Redundant memory copying in StreamKeyValUtil
> 
>
> Key: MAPREDUCE-1248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: Ruibang He
>Priority: Minor
> Attachments: MAPREDUCE-1248-v1.0.patch
>
>
> I found that when MROutputThread collects the output of the Reducer, it calls 
> StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there 
> for each line of output. Later these two byte arrays are passed to the 
> variables key and val. There are two memory copies here: one is the 
> System.arraycopy() call, the other is inside key.set() / val.set().
> This doubles the memory copying for the whole output (which may lead to 
> higher CPU consumption) and causes frequent temporary object allocation.
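The double copy described above can be illustrated without Hadoop. A minimal sketch, where `KeyWrapper` is a hypothetical stand-in for `Text` (whose `set()` always copies its input) and the method names are illustrative, not from the actual patch:

```java
import java.util.Arrays;

public class CopyDemo {
    // Stand-in for org.apache.hadoop.io.Text: set() always copies its input.
    static class KeyWrapper {
        private byte[] bytes = new byte[0];
        void set(byte[] src, int off, int len) {
            bytes = Arrays.copyOfRange(src, off, off + len); // copies here
        }
        byte[] get() { return bytes; }
    }

    // Pattern the issue describes: a temporary array is allocated and filled
    // (copy #1), then set() copies it again (copy #2).
    static KeyWrapper twoCopies(byte[] line, int keyLen) {
        byte[] tmp = new byte[keyLen];
        System.arraycopy(line, 0, tmp, 0, keyLen); // copy #1
        KeyWrapper k = new KeyWrapper();
        k.set(tmp, 0, tmp.length);                 // copy #2
        return k;
    }

    // Possible fix: pass the original buffer with an offset and length,
    // so only set() copies, and no temporary array is allocated.
    static KeyWrapper oneCopy(byte[] line, int keyLen) {
        KeyWrapper k = new KeyWrapper();
        k.set(line, 0, keyLen); // the single copy
        return k;
    }

    public static void main(String[] args) {
        byte[] line = "key\tvalue".getBytes();
        // Both paths produce the same key; one just does half the copying.
        System.out.println(Arrays.equals(twoCopies(line, 3).get(),
                                         oneCopy(line, 3).get())); // true
    }
}
```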




[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885819#action_12885819
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1122:


bq. -1 contrib tests.
The failure is because of MAPREDUCE-1834.

> streaming with custom input format does not support the new API
> ---
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.1
> Environment: any OS
>Reporter: Keith Jackson
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have 
> found that streaming does not support the new API, 
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API, 
> org.apache.hadoop.mapred.InputFormat.




[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885817#action_12885817
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1920:


Are you sure that the job is not retired? I strongly feel this should not 
break, because there are many unit tests calling this API. For example, 
TestMiniMRDFSSort calls this API and runs successfully on branch 0.21.

> Job.getCounters() returns null when using a cluster
> ---
>
> Key: MAPREDUCE-1920
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Aaron Kimball
>Priority: Critical
>
> Calling Job.getCounters() after the job has completed (successfully) returns 
> null.




[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1375:
---

Component/s: contrib/streaming

> TestFileArgs fails intermittently
> -
>
> Key: MAPREDUCE-1375
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming, test
>Reporter: Amar Kamat
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: mapreduce-1375.txt, 
> TEST-org.apache.hadoop.streaming.TestFileArgs.txt
>
>
> TestFileArgs failed once for me with the following error
> {code}
> expected:<[job.jar
> sidefile
> tmp
> ]> but was:<[]>
> at 
> org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
> at 
> org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
> {code}




[jira] Commented: (MAPREDUCE-1820) InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile

2010-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885811#action_12885811
 ] 

Hadoop QA commented on MAPREDUCE-1820:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448832/M1820-4.patch
  against trunk revision 960808.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/593/console

This message is automatically generated.

> InputSampler does not create a deep copy of the key object when creating a 
> sample, which causes problems with some formats like SequenceFile
> ---
>
> Key: MAPREDUCE-1820
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1820
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
> Attachments: M1820-4.patch, MAPREDUCE-1820-2.patch, 
> MAPREDUCE-1820-3.patch, MAPREDUCE-1820.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I tried to use the InputSampler on a SequenceFile and found that 
> it comes up with duplicate keys in the sample.  The problem was tracked down 
> to the fact that the Text object returned from the reader is essentially a 
> wrapper pointing to a byte array, which changes as the sequence file reader 
> progresses.  There was also a bug in that the reader should be initialized 
> before use.  I am attaching a patch that fixes both of the issues.  
> --Alex K
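The aliasing bug described above can be reproduced without Hadoop. A minimal sketch, assuming a hypothetical `MutableKey` standing in for the reader's reused `Text` object (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SamplerAliasingDemo {
    // Stand-in for a record reader's reused key (like Text wrapping a byte array).
    static class MutableKey {
        private final StringBuilder buf = new StringBuilder();
        void set(String s) { buf.setLength(0); buf.append(s); }
        @Override public String toString() { return buf.toString(); }
    }

    // Buggy sampling: every stored "sample" is a reference to the same reused
    // key, so all samples end up holding the reader's final contents.
    static List<String> sampleWithoutCopy(String[] input) {
        MutableKey reused = new MutableKey();
        List<MutableKey> samples = new ArrayList<>();
        for (String s : input) { reused.set(s); samples.add(reused); }
        List<String> out = new ArrayList<>();
        for (MutableKey k : samples) out.add(k.toString());
        return out;
    }

    // Fixed sampling: deep-copy the key's contents before storing it.
    static List<String> sampleWithCopy(String[] input) {
        MutableKey reused = new MutableKey();
        List<String> samples = new ArrayList<>();
        for (String s : input) { reused.set(s); samples.add(reused.toString()); }
        return samples;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c"};
        System.out.println(sampleWithoutCopy(keys)); // [c, c, c] -- all duplicates
        System.out.println(sampleWithCopy(keys));    // [a, b, c]
    }
}
```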




[jira] Created: (MAPREDUCE-1923) Support arbitrary precision in the distbbp example

2010-07-06 Thread Tsz Wo (Nicholas), SZE (JIRA)
Support arbitrary precision in the distbbp example
--

 Key: MAPREDUCE-1923
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1923
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: examples
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor


The precision obtained by _distbbp_ is limited by Java {{double}} (IEEE 754 
64-bit), which has machine epsilon e=2^(-53).  When it is used to compute the 
10^15 th bit of π, only 26-bit precision with 99.998% confidence is 
obtained.  (Will provide the error analysis later.)   It would be great if it 
supported arbitrary-precision arithmetic.




[jira] Created: (MAPREDUCE-1922) Counters for data-local and rack-local tasks should be replaced by bytes-read-local and bytes-read-rack

2010-07-06 Thread Milind Bhandarkar (JIRA)
Counters for data-local and rack-local tasks should be replaced by 
bytes-read-local and bytes-read-rack
---

 Key: MAPREDUCE-1922
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1922
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Arun C Murthy


As more and more applications use the combine file input format (to reduce the 
number of mappers), formats with column groups implemented as different HDFS 
files (Zebra, HBase), and composite input formats (map-side joins), 
data-locality and rack-locality lose their meaning. (A map task reading only 
one column group, say 20% of its input, locally and 80% remotely still gets 
flagged as a data-local map.)

So, my suggestion is to drop these counters and instead replace them with 
HDFS_LOCAL_BYTES_READ, HDFS_RACK_BYTES_READ, and HDFS_TOTAL_BYTES_READ. These 
counters will make it easier to reason about read performance for maps.




[jira] Commented: (MAPREDUCE-1758) Building blocks for the herriot test cases

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885795#action_12885795
 ] 

Konstantin Boudnik commented on MAPREDUCE-1758:
---

bq. For generating the patch for external, first the dependent patches needs to 
be forward ported first. 
You can apply the needed patches in order to generate one for this JIRA (the 
dependencies are already listed). Does that seem to be a problem?

> Building blocks for the  herriot test cases 
> 
>
> Key: MAPREDUCE-1758
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1758
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
>Priority: Minor
> Attachments: bb_patch.txt, bb_patch_1.txt, bb_patch_2.txt
>
>
> There is so much commonality in the test cases we are writing that it is 
> pertinent to create reusable code. The common methods will be added to the 
> Herriot framework. 




[jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885793#action_12885793
 ] 

Konstantin Boudnik commented on MAPREDUCE-1794:
---

In trunk, tests are supposed to go to {{src/test/system/test}}. Please refit 
the patch. Also, please make sure that the other new tests you guys were 
working on are placed into that location.

> Test the job status of lost task trackers before and after the timeout.
> ---
>
> Key: MAPREDUCE-1794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1794-ydist-security.patch, 1794_lost_tasktracker.patch, 
> MAPREDUCE-1794.patch
>
>
> This test covers the following scenarios.
> 1. Verify whether the job succeeds when the task tracker is lost and comes 
> back alive before the timeout.
> 2. Verify the job status and the killed attempts of a task -- whether the job 
> succeeds and the killed attempts match -- when the task trackers are lost and 
> the timeout expires for all four attempts of a task. 




[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885794#action_12885794
 ] 

Konstantin Boudnik commented on MAPREDUCE-1713:
---

+1, the patch looks good. Was the JIRA for Common opened and fixed yet?

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, 
> systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch
>
>
> 1. A method for restarting the daemon with a new configuration.
>   public static void restartCluster(Hashtable props, String 
> confFile) throws Exception;
> 2. A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3. A method for waiting until the daemon stops.
>   public void waitForClusterToStop() throws Exception;
> 4. A method for waiting until the daemon starts.
>   public void waitForClusterToStart() throws Exception;
> 5. A method for checking whether a job has started.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6. A method for checking whether a task has started.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;




[jira] Commented: (MAPREDUCE-1913) [Herriot] Couple of issues occurred while running the tests in a cluster with security enabled.

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885792#action_12885792
 ] 

Konstantin Boudnik commented on MAPREDUCE-1913:
---

- For trunk, the config key is already defined, as in
{noformat}
src/java/org/apache/hadoop/mapreduce/MRJobConfig.java:  public static final 
String JOB_CANCEL_DELEGATION_TOKEN = 
"mapreduce.job.complete.cancel.delegation.tokens";
{noformat}
- Also, please link this JIRA to its blockers, if any.

> [Herriot] Couple of issues occurred while running the tests in a cluster with 
> security enabled.
> ---
>
> Key: MAPREDUCE-1913
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1913
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1913-ydist-security.patch, MAPREDUCE-1913.patch
>
>
> 1. The new configuration directory is not cleaned up after resetting to the 
> default configuration directory in the pushconfig functionality. Because of 
> this, a permission-denied error occurs for a folder if another user tries 
> running the tests in the same cluster with the pushconfig functionality. I 
> saw this issue while running the tests as a different user on a cluster with 
> security enabled.
> I have added the functionality for the above issue and am attaching the patch.
> 2. An IOException saying the token is expired is thrown while running the 
> tests. I saw this issue on a secure cluster.
> This issue has been resolved by setting the following attribute in the 
> configuration:
> mapreduce.job.complete.cancel.delegation.tokens=false
> The push configuration functionality adds/updates this attribute while 
> creating the new configuration.
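The workaround in item 2 is a standard Hadoop configuration property, so it can also be set in a conf XML file pushed to the cluster. A sketch of the equivalent entry (the description text is illustrative, not from the actual patch):

```xml
<property>
  <name>mapreduce.job.complete.cancel.delegation.tokens</name>
  <value>false</value>
  <description>Keep the job's delegation tokens valid after the job
  completes, so follow-up test steps do not hit expired-token
  IOExceptions on a secure cluster.</description>
</property>
```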




[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885789#action_12885789
 ] 

Konstantin Boudnik commented on MAPREDUCE-1854:
---

- This script, {{src/test/system/scripts/healthScriptError}}, seems to be a 
test-related thing. So, should it be part of the framework scripts?
- Inconsistent formatting:
{noformat}
+  private void deleteFileOnRemoteHost(String path, String hostname) 
+  {
{noformat}
and
{noformat}
+  private void verifyTTBlackList(Configuration conf, TTClient client, String 
+  errorMessage) throws IOException{   
{noformat}

Looks good otherwise.

> [herriot] Automate health script system test
> 
>
> Key: MAPREDUCE-1854
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
> Environment: Herriot framework
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: health_script_5.txt, health_script_7.txt, 
> health_script_trunk.txt, health_script_y20.txt
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> 1. There are three scenarios. First, induce an error from the health script 
> and verify that the task tracker is blacklisted. 
> 2. Make the health script time out and verify that the task tracker is 
> blacklisted. 
> 3. Make an error in the health script path and make sure the task tracker 
> stays healthy. 




[jira] Commented: (MAPREDUCE-1889) [herriot] Ability to restart a single node for pushconfig

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885787#action_12885787
 ] 

Konstantin Boudnik commented on MAPREDUCE-1889:
---

- Technically, you might end up with a situation where the same host runs two 
different daemons, say JT and TT, or NN and a second DN. I believe in such a 
situation this new method {{+  public RemoteProcess getDaemonProcess(String 
hostname) { }} will not be deterministic. That's why we have the 
{{HadoopDaemonInfo}} class with a role for every daemon. Perhaps the method 
should take an extra parameter and return only daemons with a specific role.
- The JavaDoc hasn't been updated for the change of the signature
{{+  String pushConfig(String localDir) throws IOException;}}
Also, will the change of the method signature affect existing tests?

And please name the patches {{something.patch}} instead of .txt or anything else.

> [herriot] Ability to restart a single node for pushconfig
> -
>
> Key: MAPREDUCE-1889
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1889
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: test
>Reporter: Balaji Rajagopalan
>Assignee: Balaji Rajagopalan
> Attachments: restartDaemon.txt, restartDaemon_1.txt
>
>
> Right now the pushconfig is supported only at a cluster level, this jira will 
> introduce the functionality to be supported at node level. 




[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885784#action_12885784
 ] 

Konstantin Boudnik commented on MAPREDUCE-1730:
---

I think we have discussed this on a number of occasions: please do not use 
Thread.sleep(5); directly. I believe there's a utility method for this.
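The point above, polling for a condition through a shared helper instead of sprinkling bare Thread.sleep calls, can be sketched as follows. The names `WaitUtil`, `waitFor`, and `Condition` are hypothetical, not the actual Herriot utility:

```java
public class WaitUtil {
    // A predicate the caller wants to become true.
    interface Condition { boolean met(); }

    // Polls the condition at a fixed interval until it holds or the timeout
    // elapses. Returns the final state of the condition.
    static boolean waitFor(Condition c, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (c.met()) return true;
            Thread.sleep(intervalMs); // the sleep lives in one place only
        }
        return c.met();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after roughly 50 ms.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50,
                             1000, 10);
        System.out.println(ok); // true
    }
}
```

A test would then call something like `waitFor(() -> job.isRetired(), timeout, interval)` rather than sleeping for a guessed duration.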

> Automate test scenario for successful/killed jobs' memory is properly removed 
> from jobtracker after these jobs retire.
> --
>
> Key: MAPREDUCE-1730
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1730.patch, TestJobRetired.patch, 
> TestJobRetired.patch, TestRetiredJobs-ydist-security-patch.txt, 
> TestRetiredJobs.patch
>
>
> Automate, using the Herriot framework, the test scenario that 
> successful/killed jobs' memory is properly removed from the jobtracker after 
> these jobs retire.
> This should test that when successful and failed jobs are retired, their 
> JobInProgress objects are removed properly.




[jira] Updated: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files

2010-07-06 Thread Krishna Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Ramachandran updated MAPREDUCE-1921:


Status: Patch Available  (was: Open)

> IOExceptions should contain the filename of the broken input files
> --
>
> Key: MAPREDUCE-1921
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Krishna Ramachandran
>Assignee: Krishna Ramachandran
> Attachments: mapreduce-1921.patch
>
>
> If bzip or other decompression fails, the IOException does not contain the 
> name of the broken file that caused the exception.
> It would be nice if such situations could be diagnosed in the future by 
> having the names of the broken files spelled out in the exception. 
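The improvement described above amounts to wrapping the original exception with the offending file's name while keeping it as the cause. A minimal sketch; the helper and its name are illustrative, not from the actual patch:

```java
import java.io.IOException;

public class NamedIOException {
    // A supplier that may fail with an IOException (e.g. a read/decompress step).
    interface IOSupplier<T> { T get() throws IOException; }

    // Runs the action; on failure, re-throws with the file name in the
    // message and the original exception preserved as the cause.
    static <T> T withFileName(String fileName, IOSupplier<T> action)
            throws IOException {
        try {
            return action.get();
        } catch (IOException e) {
            throw new IOException("Error while reading " + fileName, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated decompression failure; the file name is hypothetical.
            withFileName("part-00000.bz2", () -> {
                throw new IOException("unexpected end of stream");
            });
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Error while reading part-00000.bz2
            System.out.println(e.getCause().getMessage()); // unexpected end of stream
        }
    }
}
```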




[jira] Commented: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-06 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885778#action_12885778
 ] 

Konstantin Boudnik commented on MAPREDUCE-1871:
---

bq. Code is right now like this. If you check the testcases, values are getting 
received in the way you mentioned. When using Aspectj, I cannot use ArrayLists 
or Integer Arrays as return values in JobTracker. So, used int array .

I think there's some confusion here. The current code uses an int array as a 
return type, and then you access the contents of the array by element, i.e. 
{{int succeededTasksSinceStartBeforeJob = ttAllInfo[1];}}. This is bad for at 
least two reasons:
- It is hard to say what [1] or [2] means. You can mitigate this with named 
constants for the array indices (although it is still a C-like programming 
style).
- You have to keep the order of the elements in sync on both the producer (JT) 
and consumer (your test) sides. This is ugly and hard to maintain.

What I suggested is this: instead of an int array, have a class Foo with a 
number of int fields, a constructor, and a bunch of getters returning int. 
Instead of creating an array, you'll instantiate an object of type Foo, 
passing whatever values you need to its constructor. The method signature 
{{public int[] JobTracker.getInfoFromAllClientsForAllTaskType()}} will change 
to {{public Foo JobTracker.getInfoFromAllClientsForAllTaskType()}}. Your test 
will then access each needed value via a particular getter on the object, say 
{{foo.getSucceededTasksSinceStartBeforeJob()}} (also, that name doesn't make 
much sense to me... sinceStartBefore?).

Hope it makes more sense now.
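The suggested refactoring might look like the following sketch. The class and getter names are illustrative only, not the actual Herriot API:

```java
// Sketch of the suggested value class replacing the int[] return type.
// Names are illustrative; the real fields would mirror whatever the
// JobTracker currently packs into the array.
public class TaskTrackerTaskStats {
  private final int succeededTasksSinceStart;
  private final int totalTasksSinceStart;

  public TaskTrackerTaskStats(int succeededTasksSinceStart, int totalTasksSinceStart) {
    this.succeededTasksSinceStart = succeededTasksSinceStart;
    this.totalTasksSinceStart = totalTasksSinceStart;
  }

  // Consumers read named getters instead of magic indices like ttAllInfo[1].
  public int getSucceededTasksSinceStart() { return succeededTasksSinceStart; }

  public int getTotalTasksSinceStart() { return totalTasksSinceStart; }
}
```

getInfoFromAllClientsForAllTaskType() would then return a TaskTrackerTaskStats instead of int[], so the producer and the test no longer need to agree on element order.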


> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour
> 6)  Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day
> 7)  Restart a TT and bring it back.  UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Updated: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files

2010-07-06 Thread Krishna Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Ramachandran updated MAPREDUCE-1921:


Attachment: mapreduce-1921.patch

patch to include filename in i/o exception

> IOExceptions should contain the filename of the broken input files
> --
>
> Key: MAPREDUCE-1921
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Krishna Ramachandran
>Assignee: Krishna Ramachandran
> Attachments: mapreduce-1921.patch
>
>
> If bzip or other decompression fails, the IOException does not contain the 
> name of the broken file that caused the exception.
> It would be nice if such hunts could be avoided in the future by spelling out 
> the names of the broken files in the exception. 




[jira] Created: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files

2010-07-06 Thread Krishna Ramachandran (JIRA)
IOExceptions should contain the filename of the broken input files
--

 Key: MAPREDUCE-1921
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Krishna Ramachandran
Assignee: Krishna Ramachandran


If bzip or other decompression fails, the IOException does not contain the 
name of the broken file that caused the exception.

It would be nice if such hunts could be avoided in the future by spelling out 
the names of the broken files in the exception. 
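One way to implement this is to catch and re-wrap the exception at the point where the file name is known. A minimal sketch, where the helper name and codec call are stand-ins, not the actual Hadoop code:

```java
import java.io.IOException;

public class WrapWithFilename {
  // Hypothetical helper: re-throw a decompression failure with the file name
  // attached, so the broken input is identifiable from the message alone.
  public static byte[] readCompressed(String fileName) throws IOException {
    try {
      return decompress(fileName);
    } catch (IOException e) {
      // Wrap the original exception; the cause chain is preserved.
      throw new IOException("Error decompressing input file " + fileName, e);
    }
  }

  // Stand-in for a real codec call; always fails here for demonstration.
  private static byte[] decompress(String fileName) throws IOException {
    throw new IOException("unexpected end of stream");
  }

  public static void main(String[] args) {
    try {
      readCompressed("part-00000.bz2");
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```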





[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-07-06 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1309:
-

Attachment: mr-1309-yhadoop-20.10.patch

Patch for Yahoo Hadoop 20.10; not to be committed.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tools/rumen
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, 
> mr-1309-yhadoop-20.10.patch, rumen-yhadoop-20.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to these questions.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].




[jira] Commented: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

2010-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885726#action_12885726
 ] 

Hadoop QA commented on MAPREDUCE-1906:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12448507/MAPREDUCE-1906-0.21.patch
  against trunk revision 960808.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/console

This message is automatically generated.

> Lower minimum heartbeat interval for tasktracker > Jobtracker
> -
>
> Key: MAPREDUCE-1906
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.1, 0.20.2
>Reporter: Scott Carey
> Attachments: MAPREDUCE-1906-0.21.patch
>
>
> I get a 0% to 15% performance increase for smaller clusters by making the 
> heartbeat throttle stop penalizing clusters with less than 300 nodes.
> Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
> 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
> clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
> per second?  
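The arithmetic behind the 3.3 figure can be sketched as follows. The constants and method shape approximate the 0.20-era JobTracker heartbeat computation; this is a hedged reconstruction, not the exact source:

```java
// Hedged sketch of the 0.20-era heartbeat interval computation.
public class HeartbeatInterval {
  static final int NUM_HEARTBEATS_IN_SECOND = 100; // JT-wide throttle
  static final int HEARTBEAT_INTERVAL_MIN = 3000;  // ms; was 2000 in 0.19

  static int intervalMillis(int clusterSize) {
    int scaled = (int) (1000 * Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND));
    return Math.max(scaled, HEARTBEAT_INTERVAL_MIN);
  }

  public static void main(String[] args) {
    // A 10-node cluster is held to 3000 ms per tracker:
    // 10 trackers / 3 s = about 3.3 heartbeats/sec cluster-wide,
    // far below the 100/sec the JobTracker could absorb.
    System.out.println(intervalMillis(10));   // 3000
    System.out.println(intervalMillis(400));  // 4000
  }
}
```

Lowering HEARTBEAT_INTERVAL_MIN only changes behavior for clusters small enough that the scaled term stays below the minimum, which is why large clusters are unaffected.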




[jira] Updated: (MAPREDUCE-1820) InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile

2010-07-06 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1820:
-

Status: Patch Available  (was: Open)

> InputSampler does not create a deep copy of the key object when creating a 
> sample, which causes problems with some formats like SequenceFile
> ---
>
> Key: MAPREDUCE-1820
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1820
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
> Attachments: M1820-4.patch, MAPREDUCE-1820-2.patch, 
> MAPREDUCE-1820-3.patch, MAPREDUCE-1820.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I tried to use the InputSampler on a SequenceFile and found that 
> it comes up with duplicate keys in the sample.  The problem was tracked down 
> to the fact that the Text object returned from the reader is essentially a 
> wrapper pointing to a byte array, which changes as the sequence file reader 
> progresses.  There was also a bug in that the reader should be initialized 
> before use.  I am attaching a patch that fixes both of these issues.  
> --Alex K
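The aliasing bug described above can be illustrated without Hadoop. Here a reused StringBuilder stands in for Hadoop's Text reusing its byte array; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Pure-Java analogue of the sampling bug: the "reader" mutates one buffer in
// place per record, so storing the reference yields a sample full of the last
// value; storing a copy (the deep-copy fix) keeps each sampled key distinct.
public class SamplerAliasing {
  public static List<CharSequence> sample(String[] records, boolean deepCopy) {
    StringBuilder reused = new StringBuilder(); // mutated in place per record
    List<CharSequence> sampled = new ArrayList<>();
    for (String r : records) {
      reused.setLength(0);
      reused.append(r);
      // Without a copy, every list entry aliases the same buffer.
      sampled.add(deepCopy ? reused.toString() : reused);
    }
    return sampled;
  }
}
```

With deepCopy=false, every entry of the returned sample reads as the last record, which is exactly the duplicate-keys symptom reported for SequenceFile input.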




[jira] Updated: (MAPREDUCE-1820) InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile

2010-07-06 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1820:
-

Attachment: M1820-4.patch

Added a unit test. Ideally, this should be in 0.21.

> InputSampler does not create a deep copy of the key object when creating a 
> sample, which causes problems with some formats like SequenceFile
> ---
>
> Key: MAPREDUCE-1820
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1820
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Alex Kozlov
>Assignee: Alex Kozlov
> Attachments: M1820-4.patch, MAPREDUCE-1820-2.patch, 
> MAPREDUCE-1820-3.patch, MAPREDUCE-1820.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I tried to use the InputSampler on a SequenceFile and found that 
> it comes up with duplicate keys in the sample.  The problem was tracked down 
> to the fact that the Text object returned from the reader is essentially a 
> wrapper pointing to a byte array, which changes as the sequence file reader 
> progresses.  There was also a bug in that the reader should be initialized 
> before use.  I am attaching a patch that fixes both of these issues.  
> --Alex K




[jira] Created: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-06 Thread Aaron Kimball (JIRA)
Job.getCounters() returns null when using a cluster
---

 Key: MAPREDUCE-1920
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Priority: Critical


Calling Job.getCounters() after the job has completed (successfully) returns 
null.





[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-06 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885663#action_12885663
 ] 

Aaron Kimball commented on MAPREDUCE-1920:
--

The new API seems to have an issue w.r.t. counters. Calling Job.getCounters() 
after the job has completed (successfully) returns null. I can see all the 
counters there on the JobTracker status web page. They have the correct values. 
But I can't access them programmatically.

So, this is returning null:

{code}
public class Job extends JobContextImpl implements JobContext {

  ...

  public Counters getCounters()
      throws IOException, InterruptedException {
    ensureState(JobState.RUNNING);
    return cluster.getClient().getJobCounters(getJobID());
  }

}
{code}


This seems to work fine with the LocalJobRunner.
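Since the suspicion is that the job retires between waitForCompletion() and getCounters(), one defensive client-side pattern is to poll rather than assume a single call succeeds. A generic sketch, not Hadoop API:

```java
// Generic retry-until-non-null helper. A caller could wrap the
// job.getCounters() call in the supplier and retry a few times before
// treating the null as fatal. Hypothetical pattern, not a fix for the JT side.
public class RetryNonNull {
  public static <T> T retry(java.util.function.Supplier<T> call,
                            int attempts, long sleepMillis)
      throws InterruptedException {
    for (int i = 0; i < attempts; i++) {
      T value = call.get();
      if (value != null) {
        return value;
      }
      Thread.sleep(sleepMillis); // back off before re-asking the cluster
    }
    return null; // still null after all attempts
  }
}
```

This only papers over the race on the client; the underlying question of how long a completed job's counters stay retrievable still needs an answer on the JobTracker side.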

> Job.getCounters() returns null when using a cluster
> ---
>
> Key: MAPREDUCE-1920
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Aaron Kimball
>Priority: Critical
>
> Calling Job.getCounters() after the job has completed (successfully) returns 
> null.




[jira] Updated: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-07-06 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1920:
-

Affects Version/s: 0.21.0

> Job.getCounters() returns null when using a cluster
> ---
>
> Key: MAPREDUCE-1920
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Aaron Kimball
>Priority: Critical
>
> Calling Job.getCounters() after the job has completed (successfully) returns 
> null.




[jira] Updated: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-06 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1919:
-

Attachment: MAPREDUCE-1919.patch

Patch for trunk.

> [Herriot] Test for verification of per cache file ref  count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeds or not.
> 2.  Run the job with distributed cache files and remove one cache file from 
> the DFS when it is localized.  Verify whether the job fails or not.
> 3.  Run the job with two distributed cache files, where the size of one file 
> is larger than local.cache.size.  Verify whether the job succeeds or 
> not.




[jira] Updated: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-06 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1919:
-

Attachment: 1919-ydist-security.patch

patch for Yahoo dist security branch.

> [Herriot] Test for verification of per cache file ref  count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeds or not.
> 2.  Run the job with distributed cache files and remove one cache file from 
> the DFS when it is localized.  Verify whether the job fails or not.
> 3.  Run the job with two distributed cache files, where the size of one file 
> is larger than local.cache.size.  Verify whether the job succeeds or 
> not.




[jira] Created: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-06 Thread Vinay Kumar Thota (JIRA)
[Herriot] Test for verification of per cache file ref  count.
-

 Key: MAPREDUCE-1919
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota


It covers the following scenarios.

1. Run the job with two distributed cache files and verify whether the job 
succeeds or not.
2.  Run the job with distributed cache files and remove one cache file from the 
DFS when it is localized.  Verify whether the job fails or not.
3.  Run the job with two distributed cache files, where the size of one file 
is larger than local.cache.size.  Verify whether the job succeeds or not.




[jira] Created: (MAPREDUCE-1918) Add documentation to Rumen

2010-07-06 Thread Amar Kamat (JIRA)
Add documentation to Rumen
--

 Key: MAPREDUCE-1918
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1918
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0


Add forrest documentation to Rumen tool.




[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

2010-07-06 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated MAPREDUCE-1906:
---

Status: Patch Available  (was: Open)

Is it possible to consider this for 0.21?  

> Lower minimum heartbeat interval for tasktracker > Jobtracker
> -
>
> Key: MAPREDUCE-1906
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.2, 0.20.1
>Reporter: Scott Carey
> Attachments: MAPREDUCE-1906-0.21.patch
>
>
> I get a 0% to 15% performance increase for smaller clusters by making the 
> heartbeat throttle stop penalizing clusters with less than 300 nodes.
> Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
> 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
> clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
> per second?  




[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

Attachment: MAPREDUCE-1871.patch

patch for trunk

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour
> 6)  Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day
> 7)  Restart a TT and bring it back.  UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Created: (MAPREDUCE-1917) Semantics of map.input.bytes is not consistent

2010-07-06 Thread Milind Bhandarkar (JIRA)
Semantics of map.input.bytes is not consistent
--

 Key: MAPREDUCE-1917
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1917
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Arun C Murthy


The map.input.bytes counter is updated by the RecordReader. For sequence files, it 
is the size of the raw data, which may be compressed. For text files, it is the 
size of the uncompressed data. For PigStorage, it is always 0. This request is to 
give this counter consistent semantics. Since HDFS_BYTES_READ already 
shows the raw split size read by the mapper, MAP_INPUT_BYTES should be the size 
of the uncompressed data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil

2010-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885598#action_12885598
 ] 

Hadoop QA commented on MAPREDUCE-1248:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426511/MAPREDUCE-1248-v1.0.patch
  against trunk revision 960808.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/592/console

This message is automatically generated.

> Redundant memory copying in StreamKeyValUtil
> 
>
> Key: MAPREDUCE-1248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: Ruibang He
>Priority: Minor
> Attachments: MAPREDUCE-1248-v1.0.patch
>
>
> I found that when MROutputThread collects the output of the Reducer, it calls 
> StreamKeyValUtil.splitKeyVal(), and two local byte arrays are allocated there 
> for each line of output. These two byte arrays are then passed to the 
> key and val variables. Memory is copied twice here: once by the 
> System.arraycopy() call, and once inside key.set() / val.set().
> This doubles the memory copying for the whole output (which may lead to 
> higher CPU consumption) and causes frequent temporary object allocation.
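The single-copy alternative implied by the report can be sketched as follows. SimpleText stands in for Hadoop's Text, and the method shape is illustrative, not the actual StreamKeyValUtil code:

```java
// Illustration of avoiding the double copy: instead of allocating
// intermediate key/value byte arrays and then set()-ing them (two copies per
// byte), find the separator and set() directly from the line buffer with
// (buffer, offset, length) -- one copy per byte.
public class KeyValCopy {
  static class SimpleText {
    byte[] bytes = new byte[0];

    void set(byte[] src, int off, int len) {
      bytes = new byte[len];
      System.arraycopy(src, off, bytes, 0, len); // the one unavoidable copy
    }
  }

  static void splitKeyVal(byte[] line, int len, SimpleText key, SimpleText val, byte sep) {
    int pos = -1;
    for (int i = 0; i < len; i++) {
      if (line[i] == sep) { pos = i; break; }
    }
    if (pos == -1) {
      key.set(line, 0, len);   // whole line is the key
      val.set(line, 0, 0);     // empty value
    } else {
      key.set(line, 0, pos);
      val.set(line, pos + 1, len - pos - 1);
    }
  }
}
```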




[jira] Commented: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-06 Thread Iyappan Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885596#action_12885596
 ] 

Iyappan Srinivasan commented on MAPREDUCE-1871:
---

testTaskTrackerInfoAll: I don't think it is a good idea to wait for an arbitrary 
delay; replace this with polling logic, adding functionality via aspects if 
required. 

+ //Waiting for 20 seconds to make sure that all the completed tasks 
+ //are reflected in their corresponding Tasktracker boxes. 
+ Thread.sleep(2);

The same comment holds for testTaskTrackerInfoKilled and other places where 
an arbitrary delay is used. 

- Replaced with the Tasktracker heartbeat variable used as the delay. Changed 
in all places.
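The polling logic suggested above can be sketched like this. The helper is hypothetical, not part of the Herriot API:

```java
// Poll a condition at a short interval until it holds or a deadline passes,
// instead of sleeping for a fixed, arbitrary duration.
public class PollUntil {
  public static boolean poll(java.util.function.BooleanSupplier condition,
                             long timeoutMillis, long intervalMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;
      }
      Thread.sleep(intervalMillis);
    }
    return condition.getAsBoolean(); // one last check at the deadline
  }
}
```

A test would then poll for "all completed tasks reflected in the Tasktracker boxes" with a generous timeout, finishing as soon as the condition holds rather than always paying the full delay.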

countLoop++ is a vestigial variable and has to be removed. 

- Removed.

FailedMapperClass still exists as part of an inner class; move it to testjar, or 
reuse the FailedMapper already available in testjar. 

- Removed; reusing FailedMapper.

+ public static TTClient getTTClientIns(MRCluster cluster, TaskInfo taskInfo) 
+ throws IOException {

Apologies that my previous comment was not clear. You did move the method to 
TTClient as I mentioned, but my intention was different: I do not like a static 
method in TTClient; I would rather have a non-static method in MRCluster. The 
general guideline for building blocks is to add a helper method to the class 
from which it uses most of the member variables. If helper methods live as 
static methods in test cases, it is highly unlikely anyone will reuse them, so 
please refrain from adding static methods. getTTClientIns should be moved to 
MRCluster as a non-static method; having more static methods gives a C flavor 
to the code, with less emphasis on object-oriented design. 

- Moved to MRCluster.

+ private int getInfoFromAllClients(String timePeriod, String taskType) 
+     throws Exception {
+   List ttClients = cluster.getTTClients();
+   LOG.info("ttClients.size() :" + ttClients.size());
+
+   int totalTasksCount = 0;
+   int totalTasksRanForJob = 0;
+   for (int i = 0; i < ttClients.size(); i++) {
+     TTClient ttClient = (TTClient) ttClients.get(i);
+     TaskTrackerStatus ttStatus = ttClient.getStatus();
+     int totalTasks = remoteJTClient.getTaskTrackerLevelStatistics(
+         ttStatus, timePeriod, taskType);
+     totalTasksCount += totalTasks;
+   }
+   return totalTasksCount;
+ }

The above code can be refactored to use the new method that gets all the 
information in a single shot; no looping through task trackers is required on 
the client side, which will reduce the number of RPC calls. 

- Refactored and made a part of TTClient, with the testcase just calling it.


> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour
> 6)  Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day
> 7)  Restart a TT and bring it back.  UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-06 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

Attachment: 1871-ydist-security-patch.txt

New patch addressing Balaji's comments

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. The 
> total no. of tasks and successful tasks should equal the corresponding no. of 
> tasks in the TTs' logs.
> 2) Fail a task on a tasktracker. The Node UI should update the status of tasks 
> on that TT accordingly.
> 3) Kill a task on a tasktracker. The Node UI should update the status of tasks 
> on that TT accordingly.
> 4) Positive: Run simultaneous jobs and check that all the fields are populated 
> with proper task values. The Node UI should have correct values for all the 
> fields mentioned above.
> 5) Check the fields across a one-hour window. Fields related to the hour 
> should be updated every hour.
> 6) Check the fields across a one-day window. Fields related to the day should 
> be updated every day.
> 7) Restart a TT and bring it back. The UI should retain the field values.
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API

2010-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885573#action_12885573
 ] 

Hadoop QA commented on MAPREDUCE-1122:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448755/patch-1122.txt
  against trunk revision 960808.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 92 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/console

This message is automatically generated.

> streaming with custom input format does not support the new API
> ---
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.1
> Environment: any OS
>Reporter: Keith Jackson
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have 
> found that streaming does not support the new API, 
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API, 
> org.apache.hadoop.mapred.InputFormat.




[jira] Resolved: (MAPREDUCE-615) need more unit tests for Hadoop streaming

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-615.
---

Resolution: Invalid

Currently streaming has more than 20 unit tests. Please open different issues 
for any specific feature to be tested. 

> need more unit tests for Hadoop streaming
> -
>
> Key: MAPREDUCE-615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-615
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Runping Qi
>





[jira] Resolved: (MAPREDUCE-583) get rid of excessive flushes from PipeMapper/Reducer

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-583.
---

Resolution: Duplicate

Fixed by HADOOP-3429

> get rid of excessive flushes from PipeMapper/Reducer
> 
>
> Key: MAPREDUCE-583
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-583
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Joydeep Sen Sarma
>
> there's a flush on the buffered output streams in mapper/reducer for every 
> row of data.
>   // 2/4 Hadoop to Tool
>   if (numExceptions_ == 0) {
>     if (!this.ignoreKey) {
>       write(key);
>       clientOut_.write('\t');
>     }
>     write(value);
>     if (!this.skipNewline) {
>       clientOut_.write('\n');
>     }
>     clientOut_.flush();
>   } else {
>     numRecSkipped_++;
>   }
> I tried to measure the impact of removing this. The number of context 
> switches reported by vmstat shows a marked decline.
> with flush (10 second intervals):
>  r  b swpd   free   buff   cache  si so   bi    bo   in    cs us sy id wa
>  4  2  784  23140  83352 3114648   0  0 4819 32397 1175 13220 59 11 13 17
>  1  2  784 129724  80704 3075696   0  0 4614 27196 1156 14797 49 11 19 21
>  4  0  784  24160  83440 3174880   0  0   96 36070 1337 10976 67 11  9 12
>  5  0  784 155872  84400 3158840   0  0  125 44084 1280 11044 68 14 10  8
>  2  1  784 365128  87048 2892032   0  0  119 38472 1317 11610 69 14 10  7
> without flush:
>  5  0  784  24652  56056 3217864   0  0  310 29499 1379  7603 76  9  7  8
>  5  3  784 118456  54568 3209992   0  0 3249 33426 1173  6828 63 11 12 14
>  0  2  784 227628  54820 3198560   0  0 7840 30063 1146  8899 60 10 15 15
>  3  1  784  25608  55048 3313512   0  0 3251 36276 1194  7915 60 10 15 15
>  1  2  784 197324  49968 3194572   0  0 4714 35479 1281  8204 62 13 12 13
> cs goes down by about 20-30%, but I'm having trouble measuring the overall 
> speed improvement (too many variables due to speculative execution etc. -- 
> need a better benchmark).
> Can't hurt.
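The effect is easy to reproduce outside Hadoop. Below is a self-contained sketch (not the actual PipeMapRed code) that counts how many flush() calls reach the underlying stream when writing 10,000 key/value rows through a buffered stream, with and without the per-row flush:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
    // Counts how many flush() calls propagate to the underlying stream.
    static class CountingStream extends ByteArrayOutputStream {
        int flushes = 0;
        @Override
        public void flush() throws IOException {
            flushes++;
            super.flush();
        }
    }

    static void writeRecords(CountingStream raw, boolean flushPerRecord) throws IOException {
        BufferedOutputStream out = new BufferedOutputStream(raw, 64 * 1024);
        for (int i = 0; i < 10000; i++) {
            out.write(("key" + i + "\tvalue" + i + "\n").getBytes("UTF-8"));
            if (flushPerRecord) {
                out.flush(); // the per-row flush this issue wants removed
            }
        }
        out.flush(); // a single flush at the end is enough
    }

    public static void main(String[] args) throws IOException {
        CountingStream perRecord = new CountingStream();
        CountingStream endOnly = new CountingStream();
        writeRecords(perRecord, true);
        writeRecords(endOnly, false);
        System.out.println("per-record flushes: " + perRecord.flushes);
        System.out.println("end-only flushes: " + endOnly.flushes);
    }
}
```

With a real pipe to the streaming child process, each of those 10,001 flushes is a syscall plus a likely context switch, which matches the vmstat numbers above.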




[jira] Updated: (MAPREDUCE-1248) Redundant memory copying in StreamKeyValUtil

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1248:
---

Status: Patch Available  (was: Open)

Patch looks good.
Submitting for Hudson.

> Redundant memory copying in StreamKeyValUtil
> 
>
> Key: MAPREDUCE-1248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: Ruibang He
>Priority: Minor
> Attachments: MAPREDUCE-1248-v1.0.patch
>
>
> I found that when MROutputThread collects the output of the Reducer, it calls 
> StreamKeyValUtil.splitKeyVal(), which allocates two local byte arrays for 
> each line of output. These two byte arrays are then passed to the key and val 
> variables. The memory is copied twice: once by System.arraycopy(), and again 
> inside key.set() / val.set().
> This doubles the memory copying for the whole output (which may lead to 
> higher CPU consumption) and causes frequent temporary object allocation.
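The single-copy alternative is to point key.set() / val.set() directly at ranges of the line buffer instead of first copying into temporary arrays. A minimal sketch of that idea (the `Buf` class stands in for Hadoop's `Text`; this is not the patch itself):

```java
import java.util.Arrays;

public class SplitKeyVal {
    // Minimal stand-in for Hadoop's Text: set() copies the given range once.
    static class Buf {
        byte[] bytes = new byte[0];
        void set(byte[] src, int off, int len) {
            bytes = Arrays.copyOfRange(src, off, off + len);
        }
        @Override
        public String toString() {
            return new String(bytes);
        }
    }

    // One-copy split: hand key/val offsets into the line buffer directly,
    // rather than System.arraycopy-ing into two temporary byte arrays first.
    static void splitKeyVal(byte[] line, int len, Buf key, Buf val) {
        int tab = -1;
        for (int i = 0; i < len; i++) {
            if (line[i] == '\t') {
                tab = i;
                break;
            }
        }
        if (tab == -1) {
            key.set(line, 0, len); // no separator: the whole line is the key
            val.set(line, len, 0);
        } else {
            key.set(line, 0, tab);
            val.set(line, tab + 1, len - tab - 1);
        }
    }

    public static void main(String[] args) {
        Buf key = new Buf(), val = new Buf();
        byte[] line = "apple\t42".getBytes();
        splitKeyVal(line, line.length, key, val);
        System.out.println(key + "=" + val);
    }
}
```

Since Text.set(byte[], int, int) already copies internally, the temporary arrays buy nothing; dropping them halves the copying per line.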




[jira] Resolved: (MAPREDUCE-622) Streaming should include more unit tests to test more features that it provides.

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-622.
---

Resolution: Invalid

Currently streaming has more than 20 unit tests. Please open different issues 
for any specific feature to be tested.

> Streaming should include more unit tests to test more features that it 
> provides.
> 
>
> Key: MAPREDUCE-622
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-622
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Mahadev konar
>Priority: Minor
>
> Currently streaming has only one test that runs with ant test. It should 
> include more tests to check for the features that streaming provides.




[jira] Resolved: (MAPREDUCE-1138) Erroneous output folder handling in streaming testcases

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-1138.


Resolution: Duplicate

Fixed by MAPREDUCE-1888

> Erroneous output folder handling in streaming testcases
> ---
>
> Key: MAPREDUCE-1138
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1138
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Amar Kamat
>
> The output folder is shared across testcases. Ideally we should use a 
> different output folder for each testcase. Also, deletion failures are 
> silently ignored. MAPREDUCE-947 fixed some part of the o/p dir cleaning.




[jira] Resolved: (MAPREDUCE-591) TestStreamingStderr fails occasionally

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-591.
---

Resolution: Cannot Reproduce

Haven't seen this failure in recent times. Please reopen if you see the failure 
again.

> TestStreamingStderr fails occasionally
> ---
>
> Key: MAPREDUCE-591
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-591
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Hemanth Yamijala
>
> TestStreamingStderr fails occasionally with a timeout on trunk.




[jira] Resolved: (MAPREDUCE-581) slurpHadoop(Path, FileSystem) ignores result of java.io.InputStream.read(byte[], int, int)

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu resolved MAPREDUCE-581.
---

Resolution: Invalid

The code in question no longer exists.

> slurpHadoop(Path, FileSystem) ignores result of 
> java.io.InputStream.read(byte[], int, int)
> --
>
> Key: MAPREDUCE-581
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-581
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Reporter: Nigel Daley
>
> org.apache.hadoop.streaming.StreamUtil.java line 326
> This method call ignores the return value of java.io.InputStream.read() which 
> may read fewer bytes than requested.
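The standard fix for this class of bug is a read-fully loop: keep calling read() until the requested count is satisfied or EOF is hit. A minimal sketch (the helper name and signature are illustrative, not the actual StreamUtil code):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Reads exactly len bytes into buf, looping because a single
    // InputStream.read() call may return fewer bytes than requested.
    static void readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n < 0) {
                throw new EOFException(
                    "stream ended after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes("UTF-8");
        byte[] buf = new byte[data.length];
        readFully(new ByteArrayInputStream(data), buf, 0, buf.length);
        System.out.println(new String(buf, "UTF-8"));
    }
}
```

Ignoring the return value works by accident on local files but silently truncates data on sockets and HDFS streams, where short reads are routine.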




[jira] Created: (MAPREDUCE-1916) Usage should be added to HadoopStreaming.java

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)
Usage should be added to HadoopStreaming.java
-

 Key: MAPREDUCE-1916
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1916
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Priority: Minor
 Fix For: 0.22.0


The command:
bin/hadoop jar streaming.jar
just prints:
No Arguments Given!

It should also print the valid arguments.
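Something along these lines would do; note the option listing below is illustrative (it mirrors common streaming options but is not the actual HadoopStreaming.java output):

```java
public class HadoopStreamingUsage {
    // Illustrative usage text; the real option set comes from StreamJob.
    static String usage() {
        return "Usage: hadoop jar streaming.jar [options]\n"
             + "Options:\n"
             + "  -input   <path>  DFS input file(s) for the Map step\n"
             + "  -output  <path>  DFS output directory for the Reduce step\n"
             + "  -mapper  <cmd|JavaClassName>\n"
             + "  -reducer <cmd|JavaClassName>";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            // Print the usage summary instead of only the error line.
            System.out.println("No Arguments Given!");
            System.out.println(usage());
        }
    }
}
```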




[jira] Commented: (MAPREDUCE-1517) streaming should support running on background

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885488#action_12885488
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1517:


Bochun, can you update the patch to trunk and upload it again?

One comment on the patch :
* Update the -background option in exitUsage() with a proper description and 
specify it as optional.



> streaming should support running on background
> --
>
> Key: MAPREDUCE-1517
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1517
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/streaming
>Reporter: Bochun Bai
> Attachments: contrib-streaming-background-2.patch, 
> contrib-streaming-background.patch, contrib-streaming-background.patch
>
>
> StreamJob submits the job and uses a while loop to monitor its progress.
> I would prefer it to run in the background.
> Just adding "&" at the end of the command is an alternative solution, but it 
> keeps a Java process on the client machine.
> When submitting hundreds of jobs at the same time, the client machine is 
> overloaded.
> Adding a -background option to StreamJob would tell it to only submit the job 
> and not monitor its progress.
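The proposed behavior reduces to a branch between submit-and-return and submit-and-wait. A sketch of that control flow (the `Job` interface here is a stand-in, not org.apache.hadoop.mapreduce.Job; only the `-background` flag name comes from the patch):

```java
public class BackgroundFlag {
    // Stand-in for the real job handle.
    interface Job {
        void submit();
        void waitForCompletion();
    }

    static String run(String[] argv, Job job) {
        boolean background = false;
        for (String a : argv) {
            if ("-background".equals(a)) {
                background = true;
            }
        }
        job.submit();
        if (background) {
            return "submitted";    // fire and forget: no client-side monitor loop
        }
        job.waitForCompletion();   // the existing while-loop monitoring
        return "completed";
    }

    public static void main(String[] args) {
        Job noop = new Job() {
            public void submit() {}
            public void waitForCompletion() {}
        };
        System.out.println(run(new String[]{"-input", "in", "-background"}, noop));
        System.out.println(run(new String[]{"-input", "in"}, noop));
    }
}
```

The win over shell `&` is that the JVM exits right after submission, so hundreds of concurrent submissions don't pile up client-side processes.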




[jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1122:
---

   Status: Patch Available  (was: Open)
 Hadoop Flags: [Incompatible change]
Fix Version/s: 0.22.0

Patch is ready for review.

> streaming with custom input format does not support the new API
> ---
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.1
> Environment: any OS
>Reporter: Keith Jackson
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have 
> found that streaming does not support the new API, 
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API, 
> org.apache.hadoop.mapred.InputFormat.




[jira] Updated: (MAPREDUCE-1122) streaming with custom input format does not support the new API

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1122:
---

Attachment: patch-1122.txt

Attaching a patch which does the following:
* Deprecates all the library classes in streaming, such as AutoInputFormat, 
StreamInputFormat, StreamXmlRecordReader etc., and adds new classes which use 
the new api.
* Changes the tools DumpTypedBytes and LoadTypedBytes to use the new api 
classes.
* Adds StreamJobConfig holding all the configuration properties used in 
streaming.
* Adds classes StreamingMapper, StreamingReducer and StreamingCombiner which 
extend the new api Mapper and Reducer classes.
  ** Adds a class StreamingProcess which starts the streaming process and the 
MR output/error threads, waits for the threads, etc. This functionality lives 
in PipeMapRed.java for the old api mapper/reducer; PipeMapper and PipeReducer 
extend PipeMapRed and implement the old Mapper/Reducer interfaces. We cannot 
make StreamingMapper/StreamingReducer extend StreamingProcess because in the 
new api Mapper and Reducer are classes, not interfaces. So this was moved into 
a separate class that StreamingMapper/StreamingReducer compose.
  ** The InputWriter and OutputReader added in HADOOP-1722 take a PipeMapRed 
instance as a constructor parameter. That no longer makes sense because 
process handling is served by a separate class, StreamingProcess, for the new 
api mapper/reducer. So I made the following incompatible change (looks cleaner 
now):
  *** Changes the OutputReader constructor to take a DataInput parameter, 
instead of PipeMapRed
  *** Changes the InputWriter constructor to take a DataOutput parameter, 
instead of PipeMapRed
* Moves some utility methods in PipeMapRed to StreamUtil.
* Removes the deprecated StreamJob(String[] argv, boolean mayExit); deprecates 
static public JobConf createJob(String[] argv); and adds static public Job 
createStreamingJob(String[] argv)
* Refactors setJobConf() into multiple setters to set the appropriate 
mapper/reducer in use.
* Adds unit tests for all the usecases described 
[above|https://issues.apache.org/jira/browse/MAPREDUCE-1122?focusedCommentId=12878515&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12878515]


> streaming with custom input format does not support the new API
> ---
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.1
> Environment: any OS
>Reporter: Keith Jackson
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1122.txt
>
>
> When trying to implement a custom input format for use with streaming, I have 
> found that streaming does not support the new API, 
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API, 
> org.apache.hadoop.mapred.InputFormat.




[jira] Commented: (MAPREDUCE-1122) streaming with custom input format does not support the new API

2010-07-06 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885482#action_12885482
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1122:


Supporting the new api in streaming involves two major tasks:
# Setting the job configuration for the streaming job: set the appropriate 
mapper and reducer depending on the arguments passed. Summarizing the above 
requirements table:
 ** The old api mapper, PipeMapper, is used as the mapper for the job only if 
the mapper is a command and
a) an old api input format is passed, or
b) #reduces = 0 and an old api output format is passed, or
c) #reduces != 0 and an old api partitioner is passed.
 ** Similarly, the old api reducer, PipeReducer, is used as the reducer for 
the job only if the reducer is a command and an old api output format is 
passed.
# Implementation of the new api streaming mapper, reducer, etc.
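The mapper/reducer selection rule above can be written down as a pair of predicates. A sketch only: the parameter names are illustrative, not the actual StreamJob configuration code:

```java
public class StreamingApiSelection {
    // Old-api PipeMapper is chosen only for a command mapper where some
    // old-api component forces the old api.
    static boolean useOldApiMapper(boolean mapperIsCommand,
                                   boolean oldApiInputFormat,
                                   boolean oldApiOutputFormat,
                                   boolean oldApiPartitioner,
                                   int numReduces) {
        if (!mapperIsCommand) {
            return false;
        }
        if (oldApiInputFormat) {
            return true;
        }
        if (numReduces == 0) {
            return oldApiOutputFormat;
        }
        return oldApiPartitioner;
    }

    // Old-api PipeReducer is chosen only for a command reducer with an
    // old-api output format.
    static boolean useOldApiReducer(boolean reducerIsCommand,
                                    boolean oldApiOutputFormat) {
        return reducerIsCommand && oldApiOutputFormat;
    }

    public static void main(String[] args) {
        // Command mapper + old api input format: old api mapper.
        System.out.println(useOldApiMapper(true, true, false, false, 1));
        // Command mapper, all-new-api components, reduces > 0: new api mapper.
        System.out.println(useOldApiMapper(true, false, false, false, 1));
        // Command reducer with a new api output format: new api reducer.
        System.out.println(useOldApiReducer(true, false));
    }
}
```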


> streaming with custom input format does not support the new API
> ---
>
> Key: MAPREDUCE-1122
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1122
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.1
> Environment: any OS
>Reporter: Keith Jackson
>Assignee: Amareshwari Sriramadasu
>
> When trying to implement a custom input format for use with streaming, I have 
> found that streaming does not support the new API, 
> org.apache.hadoop.mapreduce.InputFormat, but requires the old API, 
> org.apache.hadoop.mapred.InputFormat.
