[jira] [Commented] (MAPREDUCE-4810) Add admin command options for ApplicationMaster

2012-12-11 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529569#comment-13529569
 ] 

Thomas Graves commented on MAPREDUCE-4810:
--

That would be great! Upload a patch when you have something working and we will 
review. If you have any questions let me know. 

> Add admin command options for ApplicationMaster
> ---
>
> Key: MAPREDUCE-4810
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4810
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 2.0.2-alpha, 0.23.4
>Reporter: Jason Lowe
>Priority: Minor
>
> It would be nice if the MR ApplicationMaster had the notion of admin options 
> in addition to the existing user options much like we have for map and reduce 
> tasks, e.g.: mapreduce.admin.map.child.java.opts vs. mapreduce.map.java.opts. 
>  This allows site-wide configuration options for MR AMs but still allows a 
> user to easily override the heap size of the AM without worrying about 
> dropping other admin-specified options.
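
A minimal sketch of the layering described above, assuming the admin string is placed first and the user string second so that a user-supplied flag such as -Xmx appears later on the command line. AmJavaOpts and combine are hypothetical names for illustration, not Hadoop APIs, and the sketch relies on the JVM honoring the last occurrence of a repeated flag.

```java
// Hypothetical helper, not Hadoop's implementation.
final class AmJavaOpts {
    /** Joins admin and user JVM options, admin first so user flags come later. */
    static String combine(String adminOpts, String userOpts) {
        String admin = adminOpts == null ? "" : adminOpts.trim();
        String user = userOpts == null ? "" : userOpts.trim();
        if (admin.isEmpty()) {
            return user;
        }
        if (user.isEmpty()) {
            return admin;
        }
        return admin + " " + user;
    }

    public static void main(String[] args) {
        // The site-wide flag is kept; the user's heap size is appended after it.
        System.out.println(combine("-Djava.net.preferIPv4Stack=true -Xmx512m", "-Xmx1024m"));
    }
}
```

With this ordering a user can change only the heap size without having to repeat the admin-specified flags.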

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4856) TestJobOutputCommitter uses same directory as TestJobCleanup

2012-12-11 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-4856:
--

Description: This can cause problems if one of the tests fails to delete.  
(was: This can cause problems if the tests are run concurrently.)

> TestJobOutputCommitter uses same directory as TestJobCleanup
> 
>
> Key: MAPREDUCE-4856
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4856
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-4856.patch
>
>
> This can cause problems if one of the tests fails to delete.
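
One way to avoid the shared directory, sketched under the assumption that each test may create its own scratch directory. PerTestDir is a hypothetical helper and is not taken from the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: every caller gets a fresh directory, so leftovers
// from one test cannot affect another.
final class PerTestDir {
    static Path create(String testName) {
        try {
            // The JDK appends a unique suffix to the given prefix.
            return Files.createTempDirectory(testName + "-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create("TestJobOutputCommitter"));
        System.out.println(create("TestJobCleanup"));
    }
}
```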



[jira] [Commented] (MAPREDUCE-5) Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException

2012-12-11 Thread David Parks (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529555#comment-13529555
 ] 

David Parks commented on MAPREDUCE-5:
-

I'm encountering this same situation on AWS's mapreduce instance using v1.0.3.

> Shuffle's getMapOutput() fails with EofException, followed by 
> IllegalStateException
> ---
>
> Key: MAPREDUCE-5
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 
> 4150 (x64) 10 node cluster
>Reporter: George Porter
> Attachments: temp.rar
>
>
> During the shuffle phase, I'm seeing a large sequence of the following 
> actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: 
> getMapOutput(attempt_200905181452_0002_m_10_0,0) failed : 
> org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 
> getMapOutput(attempt_200905181452_0002_m_10_0,0) failed : 
> org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: 
> Committed
> The map phase completes with 100%, and then the reduce phase crawls along 
> with the above errors in each of the TaskTracker logs.  None of the 
> tasktrackers get lost.  When I run non-data jobs like the 'pi' test from the 
> example jar, everything works fine.



[jira] [Commented] (MAPREDUCE-4810) Add admin command options for ApplicationMaster

2012-12-11 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529554#comment-13529554
 ] 

Jerry Chen commented on MAPREDUCE-4810:
---

I can contribute this improvement if nobody else is working on it.




[jira] [Created] (MAPREDUCE-4871) AM uses mapreduce.jobtracker.split.metainfo.maxsize but mapred-default has mapreduce.job.split.metainfo.maxsize

2012-12-11 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-4871:
-

 Summary: AM uses mapreduce.jobtracker.split.metainfo.maxsize but 
mapred-default has mapreduce.job.split.metainfo.maxsize
 Key: MAPREDUCE-4871
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4871
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.2-alpha, 0.23.3
Reporter: Jason Lowe


When the user needs to configure a larger split metainfo file size, 
mapred-default.xml points to the mapreduce.job.split.metainfo.maxsize property. 
However, the ApplicationMaster actually uses the 
mapreduce.*jobtracker*.split.metainfo.maxsize property when determining the 
largest allowed size.  This causes confusion for end users 
trying to increase the allowed limit.
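
A sketch of one way to tolerate both property names, assuming the documented key should win when both are set. A plain Map stands in for Hadoop's Configuration, SplitMetaInfoLimit is a hypothetical class, and DEFAULT_MAX is an illustrative placeholder; this is not the fix adopted for the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup helper; a Map stands in for the job configuration.
final class SplitMetaInfoLimit {
    static final String DOCUMENTED_KEY = "mapreduce.job.split.metainfo.maxsize";
    static final String LEGACY_KEY = "mapreduce.jobtracker.split.metainfo.maxsize";
    static final long DEFAULT_MAX = 10_000_000L; // placeholder default for the sketch

    /** Prefers the documented key, then the legacy key, then the default. */
    static long maxSize(Map<String, String> conf) {
        String value = conf.get(DOCUMENTED_KEY);
        if (value == null) {
            value = conf.get(LEGACY_KEY);
        }
        return value == null ? DEFAULT_MAX : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DOCUMENTED_KEY, "20000000");
        System.out.println(maxSize(conf));
    }
}
```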



[jira] [Commented] (MAPREDUCE-4396) Make LocalJobRunner work with private distributed cache

2012-12-11 Thread Yu Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529477#comment-13529477
 ] 

Yu Gao commented on MAPREDUCE-4396:
---

Hmm, the patch here is the same as that in HADOOP-8734 for LocalJobRunner, so 
either way is ok.

> Make LocalJobRunner work with private distributed cache
> ---
>
> Key: MAPREDUCE-4396
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4396
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.0.3
>Reporter: Luke Lu
>Assignee: Yu Gao
>Priority: Minor
> Attachments: mapreduce-4396-branch-1.patch, test-afterpatch.result, 
> test-beforepatch.result, test-patch.result
>
>
> Some LocalJobRunner-related unit tests fail if the user directory permissions 
> and/or umask are too restrictive.



[jira] [Commented] (MAPREDUCE-4396) Make LocalJobRunner work with private distributed cache

2012-12-11 Thread Yu Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529474#comment-13529474
 ] 

Yu Gao commented on MAPREDUCE-4396:
---

@[~eyang] Trunk does not have this issue.




[jira] [Commented] (MAPREDUCE-4866) ShuffleRamManager is limited to 2Gb of memory - we should increase that

2012-12-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529388#comment-13529388
 ] 

Chris Douglas commented on MAPREDUCE-4866:
--

The patch is against the 1.x line; the 0.20 line is not actively developed. The 
changes to make these longs were in MAPREDUCE-1182.

Pulling a >2GB segment into memory is... optimistic. It's a very specialized 
job that would benefit from relaxing this constraint.

> ShuffleRamManager is limited to 2Gb of memory - we should increase that
> ---
>
> Key: MAPREDUCE-4866
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4866
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Affects Versions: 0.20.2
> Environment: linux, 64bits cpu, more than 2Gb of memory for each 
> reducer tasks
>Reporter: Varene Olivier
>Priority: Minor
>  Labels: patch
> Attachments: M4866-0.patch, MAPREDUCE-4866-INCOMPLETE.patch
>
>
> Inside the org.apache.hadoop.mapred.ReduceTask.java, the *ShuffleRamManager* 
> is limited to allocate up to 2Gb of memory during the shuffle phase. 
> We should be able to allocate more, to take advantage of the full memory we 
> have on servers.
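
A simplified sketch of where a limit of about 2 GB can come from, assuming the budget is held in a Java int (maximum 2^31 - 1 bytes). ShuffleBudget and its methods are hypothetical and are not the ReduceTask code; the point is only that a long-typed budget is not clamped.

```java
// Hypothetical illustration of an int-typed versus a long-typed byte budget.
final class ShuffleBudget {
    /** int budget: anything above Integer.MAX_VALUE bytes is clamped. */
    static int asInt(long heapBytes, double fraction) {
        long wanted = (long) (heapBytes * fraction);
        return (int) Math.min(wanted, (long) Integer.MAX_VALUE);
    }

    /** long budget: the full requested size is preserved. */
    static long asLong(long heapBytes, double fraction) {
        return (long) (heapBytes * fraction);
    }

    public static void main(String[] args) {
        long eightGiB = 8L << 30;
        System.out.println(asInt(eightGiB, 0.5));  // clamped to 2147483647
        System.out.println(asLong(eightGiB, 0.5)); // 4294967296
    }
}
```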



[jira] [Commented] (MAPREDUCE-4870) TestMRJobsWithHistoryService causes infinite loop if it fails

2012-12-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529363#comment-13529363
 ] 

Xuan Gong commented on MAPREDUCE-4870:
--

The code looks good to me. It definitely gets out of the infinite loop. I 
checked RMAppImpl; it does not contain a transition from the failed state 
to the finished state. So the line Assert.assertEquals(RMAppState.FINISHED, 
mrCluster.getResourceManager().getRMContext().getRMApps().get(appID).getState())
will always fail in this case. 
It looks like the failure occurs because we cannot launch the container:
2012-12-11 12:02:14,938 INFO  [ContainersLauncher #0] 
nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(175)) - launchContainer: [bash, 
/Users/xgong/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService-localDir-nm-0_0/usercache/root/appcache/application_1355256124849_0001/container_1355256124849_0001_01_01/default_container_executor.sh]
It returns the non-zero exit code 127, 
which then causes the AM and the application to fail.

> TestMRJobsWithHistoryService causes infinite loop if it fails
> -
>
> Key: MAPREDUCE-4870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, trunk-win
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-4870.1.patch
>
>
> {{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and 
> sleep after job execution, checking for the application state to reach 
> {{RMAppState#FINISHED}}.  If the job fails, then the application could be in 
> a different terminal state, and this polling loop will never terminate.
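
A sketch of a polling loop that cannot spin forever, assuming it should stop on any terminal state and also give up after a bounded number of polls. The AppState enum is a local stand-in for RMAppState, and TerminalStatePoll is a hypothetical class, not the code from the attached patch.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical bounded poll; AppState is a stand-in for RMAppState.
final class TerminalStatePoll {
    enum AppState { RUNNING, FINISHED, FAILED, KILLED }

    static final Set<AppState> TERMINAL =
            EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    /** Returns the first terminal state seen, or throws after maxPolls attempts. */
    static AppState waitForTerminal(Supplier<AppState> current, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            AppState state = current.get();
            if (TERMINAL.contains(state)) {
                return state;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        throw new IllegalStateException("no terminal state after " + maxPolls + " polls");
    }

    public static void main(String[] args) {
        System.out.println(waitForTerminal(() -> AppState.FAILED, 10, 100L));
    }
}
```

A test can then assert on the returned state, so a failed job produces an assertion failure instead of a hang.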



[jira] [Commented] (MAPREDUCE-4860) Inconsistent synchronization in mapreduce.security.token.DelegationTokenRenewal

2012-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529297#comment-13529297
 ] 

Hadoop QA commented on MAPREDUCE-4860:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12560439/mr-4860.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3119//console

This message is automatically generated.

> Inconsistent synchronization in 
> mapreduce.security.token.DelegationTokenRenewal
> ---
>
> Key: MAPREDUCE-4860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.1.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: mr-4860.patch, mr-4860.patch, mr-4860.patch
>
>
> mapreduce.security.token.DelegationTokenRenewal synchronizes on 
> removeDelegationToken, but fails to synchronize on addToken, and renewing 
> tokens in run().
> This inconsistency is exposed by frequent failures of 
> TestDelegationTokenRenewal:
> {noformat}
> Error Message
> renew wasn't called as many times as expected expected:<4> but was:<5>
> Stacktrace
> junit.framework.AssertionFailedError: renew wasn't called as many times as 
> expected expected:<4> but was:<5>
>   at 
> org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal.testDTRenewal(TestDelegationTokenRenewal.java:317)
>   at 
> org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal.testDTRenewalAfterClose(TestDelegationTokenRenewal.java:338)
> {noformat}



[jira] [Updated] (MAPREDUCE-4860) Inconsistent synchronization in mapreduce.security.token.DelegationTokenRenewal

2012-12-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-4860:


Attachment: mr-4860.patch

Thanks for pointing that out, Alejandro. I verified that all operations on 
synchronized collections are synchronized on the returned object.

I updated the patch to reflect that.




[jira] [Comment Edited] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-12-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529263#comment-13529263
 ] 

Alejandro Abdelnur edited comment on MAPREDUCE-4549 at 12/11/12 7:51 PM:
-

Committed to branch-2 now. I'll wait until Friday noon to see if there are 
objections to committing this to trunk as well.

  was (Author: tucu00):
I'll be commit to branch-2 now. I'll wait till FRI noon to see if there are 
objections for committing this to trunk as well.
  
> Distributed cache conflicts breaks backwards compatability
> --
>
> Key: MAPREDUCE-4549
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 2.0.2-alpha
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4549-trunk.patch, MR-4549-branch-0.23.txt
>
>
> I recently put in MAPREDUCE-4503 which went a bit too far, and broke 
> backwards compatibility with 1.0 in distributed cache entries.  Instead of 
> changing the behavior of the distributed cache to more closely match 1.0 
> behavior I want to just change the exception to a warning message informing 
> the users that it will become an error in 2.0.
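
A sketch of the exception-to-warning change, assuming a conflict means two cache entries that resolve to the same link name. CacheConflictCheck and findConflicts are hypothetical and are not taken from the attached patches; a caller would pass each returned message to its logger instead of throwing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check: collects warnings for repeated names instead of throwing.
final class CacheConflictCheck {
    static List<String> findConflicts(List<String> linkNames) {
        Set<String> seen = new HashSet<>();
        List<String> warnings = new ArrayList<>();
        for (String name : linkNames) {
            // Set.add returns false when the name was already present.
            if (!seen.add(name)) {
                warnings.add("cache entry '" + name
                        + "' conflicts with an earlier entry; this will be an error in 2.0");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        for (String w : findConflicts(Arrays.asList("a.jar", "b.jar", "a.jar"))) {
            System.err.println("WARN " + w);
        }
    }
}
```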



[jira] [Commented] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-12-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529263#comment-13529263
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4549:
---

I'll commit to branch-2 now. I'll wait until Friday noon to see if there are 
objections to committing this to trunk as well.




[jira] [Commented] (MAPREDUCE-4860) Inconsistent synchronization in mapreduce.security.token.DelegationTokenRenewal

2012-12-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529258#comment-13529258
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4860:
---

delegationTokens is already a synchronized collection. Except for the 
synchronization around iterator() to avoid changes while iterating, I don't 
see why we need the others.
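
A small sketch of the point being made here, assuming a set created with Collections.synchronizedSet: single calls such as add and remove are already thread-safe, while iteration has to be wrapped in a block synchronized on the returned set. SynchronizedIteration is a hypothetical class, not the DelegationTokenRenewal code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical registry showing where explicit locking is and is not needed.
final class SynchronizedIteration {
    private final Set<String> tokens = Collections.synchronizedSet(new HashSet<String>());

    void add(String token) {
        tokens.add(token); // already synchronized by the wrapper
    }

    void remove(String token) {
        tokens.remove(token); // already synchronized by the wrapper
    }

    /** Copies the contents while holding the set's lock during iteration. */
    List<String> snapshot() {
        List<String> copy = new ArrayList<>();
        synchronized (tokens) { // needed: the iterator itself is not thread-safe
            for (String token : tokens) {
                copy.add(token);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        SynchronizedIteration registry = new SynchronizedIteration();
        registry.add("token-1");
        System.out.println(registry.snapshot());
    }
}
```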

> Inconsistent synchronization in 
> mapreduce.security.token.DelegationTokenRenewal
> ---
>
> Key: MAPREDUCE-4860
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4860
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.1.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: mr-4860.patch, mr-4860.patch
>
>
> mapreduce.security.token.DelegationTokenRenewal synchronizes on 
> removeDelegationToken, but fails to synchronize on addToken, and renewing 
> tokens in run().
> This inconsistency is exposed by frequent failures of 
> TestDelegationTokenRenewal:
> {noformat}
> Error Message
> renew wasn't called as many times as expected expected:<4> but was:<5>
> Stacktrace
> junit.framework.AssertionFailedError: renew wasn't called as many times as 
> expected expected:<4> but was:<5>
>   at 
> org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal.testDTRenewal(TestDelegationTokenRenewal.java:317)
>   at 
> org.apache.hadoop.mapreduce.security.token.TestDelegationTokenRenewal.testDTRenewalAfterClose(TestDelegationTokenRenewal.java:338)
> {noformat}



[jira] [Commented] (MAPREDUCE-4861) Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal

2012-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529257#comment-13529257
 ] 

Hudson commented on MAPREDUCE-4861:
---

Integrated in Hadoop-trunk-Commit #3112 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3112/])
MAPREDUCE-4861. Cleanup: Remove unused 
mapreduce.security.token.DelegationTokenRenewal. (kkambatl via tucu) (Revision 
1420345)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1420345
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/token/DelegationTokenRenewal.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/security/token/TestDelegationTokenRenewal.java


> Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal
> --
>
> Key: MAPREDUCE-4861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Fix For: 2.0.3-alpha
>
> Attachments: mr-4861.patch, mr-4861.patch, mr-4861.patch
>
>
> mapreduce.security.token.DelegationTokenRenewal doesn't seem to be used in 
> branch-2 at all. grep on trunk yields no results, not even ReflectionUtils 
> related stuff.



[jira] [Commented] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529250#comment-13529250
 ] 

Hadoop QA commented on MAPREDUCE-4549:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12560334/MAPREDUCE-4549-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3118//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3118//console

This message is automatically generated.




[jira] [Updated] (MAPREDUCE-4861) Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal

2012-12-11 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4861:
--

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Karthik. Committed to trunk and branch-2.




[jira] [Commented] (MAPREDUCE-4861) Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal

2012-12-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529236#comment-13529236
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4861:
---

+1

> Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal
> --
>
> Key: MAPREDUCE-4861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: mr-4861.patch, mr-4861.patch, mr-4861.patch
>
>
> mapreduce.security.token.DelegationTokenRenewal doesn't seem to be used in 
> branch-2 at all. grep on trunk yields no results, not even ReflectionUtils 
> related stuff.



[jira] [Commented] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-12-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529234#comment-13529234
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4549:
---

+1 pending jenkins. 





[jira] [Commented] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-12-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529233#comment-13529233
 ] 

Sandy Ryza commented on MAPREDUCE-4549:
---

Reopened and uploaded a patch for trunk




[jira] [Updated] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-12-11 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-4549:
--

Status: Patch Available  (was: Reopened)

> Distributed cache conflicts breaks backwards compatability
> --
>
> Key: MAPREDUCE-4549
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha, 0.23.3
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
> Fix For: 2.0.3-alpha, 0.23.5
>
> Attachments: MAPREDUCE-4549-trunk.patch, MR-4549-branch-0.23.txt
>
>
> I recently put in MAPREDUCE-4503 which went a bit too far, and broke 
> backwards compatibility with 1.0 in distributed cache entries.  Instead of 
> changing the behavior of the distributed cache to more closely match 1.0 
> behavior, I want to just change the exception to a warning message informing 
> the users that it will become an error in 2.0.



[jira] [Commented] (MAPREDUCE-4586) Reduce large output segments directly from remote host

2012-12-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529217#comment-13529217
 ] 

Luke Lu commented on MAPREDUCE-4586:


Separate namespace for these data (hence separate namenodes in HDFS2) would 
help as well. 

> Reduce large output segments directly from remote host
> --
>
> Key: MAPREDUCE-4586
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4586
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: task
>Reporter: Chris Douglas
>
> For some jobs, copying large output segments to the local host is 
> inefficient. The reduce can construct iterators on remote hosts, provided the 
> stream is restartable. This should reduce task latency by amortizing the cost 
> of the data transfer over the entire reduce, rather than paying it upfront.



[jira] [Commented] (MAPREDUCE-4870) TestMRJobsWithHistoryService causes infinite loop if it fails

2012-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529200#comment-13529200
 ] 

Hadoop QA commented on MAPREDUCE-4870:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12560413/MAPREDUCE-4870.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3117//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3117//console

This message is automatically generated.

> TestMRJobsWithHistoryService causes infinite loop if it fails
> -
>
> Key: MAPREDUCE-4870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, trunk-win
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-4870.1.patch
>
>
> {{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and 
> sleep after job execution, checking for the application state to reach 
> {{RMAppState#FINISHED}}.  If the job fails, then the application could be in 
> a different terminal state, and this polling loop will never terminate.



[jira] [Updated] (MAPREDUCE-4870) TestMRJobsWithHistoryService causes infinite loop if it fails

2012-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-4870:
-

Status: Patch Available  (was: Open)

> TestMRJobsWithHistoryService causes infinite loop if it fails
> -
>
> Key: MAPREDUCE-4870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, trunk-win
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-4870.1.patch
>
>
> {{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and 
> sleep after job execution, checking for the application state to reach 
> {{RMAppState#FINISHED}}.  If the job fails, then the application could be in 
> a different terminal state, and this polling loop will never terminate.



[jira] [Updated] (MAPREDUCE-4870) TestMRJobsWithHistoryService causes infinite loop if it fails

2012-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-4870:
-

Attachment: MAPREDUCE-4870.1.patch

I noticed this problem on Windows, where the test currently fails.  The 
attached patch changes the test to poll for any of the terminal states: 
{{RMAppState.FINISHED}}, {{RMAppState.FAILED}}, or {{RMAppState.KILLED}}.  
Those are all of the terminal states, right?  After the loop, I added an 
assertion that it was {{RMAppState.FINISHED}}.  For extra safety, I also 
aborted the polling loop after a maximum of 60 seconds.
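The polling approach described above could look roughly like the sketch below. It is not the attached patch: `TerminalStatePoller` and its nested `AppState` enum are hypothetical stand-ins (the real test polls `RMAppState`), and the set of terminal states is simply the three named in this comment.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper; not part of Hadoop.
final class TerminalStatePoller {
    // Stand-in for YARN's RMAppState, reduced to what the sketch needs.
    enum AppState { RUNNING, FINISHED, FAILED, KILLED }

    // Terminal states as listed in the comment above.
    static final Set<AppState> TERMINAL =
        EnumSet.of(AppState.FINISHED, AppState.FAILED, AppState.KILLED);

    /**
     * Polls until a terminal state is seen or timeoutMs elapses.
     * Returns the last observed state, which may be non-terminal on timeout.
     */
    static AppState waitForTerminal(Supplier<AppState> source, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        AppState state = source.get();
        while (!TERMINAL.contains(state) && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                break;
            }
            state = source.get();
        }
        return state;
    }
}
```

A test would then assert after the loop that the returned state is `FINISHED`, so a failed job produces an assertion failure rather than a hang.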

The test still fails on Windows on the new assertion.  We'll need to fix that 
later, but for right now, I just want to fix the infinite loop, which tends to 
ruin entire project test runs.

This patch can commit to trunk and then merge to branch-trunk-win.


> TestMRJobsWithHistoryService causes infinite loop if it fails
> -
>
> Key: MAPREDUCE-4870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, trunk-win
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: MAPREDUCE-4870.1.patch
>
>
> {{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and 
> sleep after job execution, checking for the application state to reach 
> {{RMAppState#FINISHED}}.  If the job fails, then the application could be in 
> a different terminal state, and this polling loop will never terminate.



[jira] [Created] (MAPREDUCE-4870) TestMRJobsWithHistoryService causes infinite loop if it fails

2012-12-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-4870:


 Summary: TestMRJobsWithHistoryService causes infinite loop if it 
fails
 Key: MAPREDUCE-4870
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4870
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, trunk-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{TestMRJobsWithHistoryService#testJobHistoryData}} has a periodic poll and 
sleep after job execution, checking for the application state to reach 
{{RMAppState#FINISHED}}.  If the job fails, then the application could be in a 
different terminal state, and this polling loop will never terminate.



[jira] [Commented] (MAPREDUCE-4810) Add admin command options for ApplicationMaster

2012-12-11 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529009#comment-13529009
 ] 

Thomas Graves commented on MAPREDUCE-4810:
--

UberAM is another example where you might want a different heap size. You may also 
just want to specify other Java options without having to copy all the default 
admin ones again.

> Add admin command options for ApplicationMaster
> ---
>
> Key: MAPREDUCE-4810
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4810
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 2.0.2-alpha, 0.23.4
>Reporter: Jason Lowe
>Priority: Minor
>
> It would be nice if the MR ApplicationMaster had the notion of admin options 
> in addition to the existing user options much like we have for map and reduce 
> tasks, e.g.: mapreduce.admin.map.child.java.opts vs. mapreduce.map.java.opts. 
>  This allows site-wide configuration options for MR AMs but still allows a 
> user to easily override the heap size of the AM without worrying about 
> dropping other admin-specified options.
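One way to picture the admin/user split described above is a plain concatenation with the admin options first. The sketch below is illustrative only: `AmJavaOpts` and `combine` are hypothetical names, not the eventual Hadoop implementation. It relies on the assumption that, on common JVMs such as HotSpot, the last occurrence of a repeated flag like `-Xmx` typically takes effect, so a user heap setting overrides the admin one while other admin flags survive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Hadoop.
final class AmJavaOpts {

    /** Admin options first, user options last, so a user flag appears later on the command line. */
    static List<String> combine(String adminOpts, String userOpts) {
        List<String> out = new ArrayList<>();
        addTokens(out, adminOpts);
        addTokens(out, userOpts);
        return out;
    }

    // Splits on whitespace; does not handle quoted arguments.
    private static void addTokens(List<String> out, String opts) {
        if (opts == null) {
            return;
        }
        for (String token : opts.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
    }
}
```

For example, combining `-Djava.net.preferIPv4Stack=true -Xmx512m` (admin) with `-Xmx2g` (user) yields all three tokens in that order, so the user does not need to restate the admin-specified property.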



[jira] [Resolved] (MAPREDUCE-4049) plugin for generic shuffle service

2012-12-11 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved MAPREDUCE-4049.
---

Resolution: Fixed

Chris thanks for taking a look at these JIRAs.

Withdrawing my -1. I'm OK with doing necessary tweaks in follow up JIRAs.

I'd suggest we merge MAPREDUCE-4809 and MAPREDUCE-4807 to trunk. Then we can do 
the incremental work for MAPREDUCE-4812 and MAPREDUCE-4808 in trunk as well.

Arun, is that OK with you?

> plugin for generic shuffle service
> --
>
> Key: MAPREDUCE-4049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: performance, task, tasktracker
>Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>Reporter: Avner BenHanoch
>Assignee: Avner BenHanoch
>  Labels: merge, plugin, rdma, shuffle
> Fix For: 3.0.0
>
> Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
> mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
> mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch
>
>
> Support generic shuffle service as set of two plugins: ShuffleProvider & 
> ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on 
> shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
> or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
> RDMA shuffle, the plugin can also utilize a suitable merge approach during 
> the intermediate merges. Hence, getting much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
> dependency of NodeManager with a specific version of mapreduce shuffle 
> (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
> from Auburn University with others, 
> [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins 
> (currently, based on 1.0 branch)
> # I am providing link for downloading UDA - Mellanox's open source plugin 
> that implements generic shuffle service using RDMA and levitated merge.  
> Note: At this phase, the code is in C++ through JNI and you should consider 
> it as beta only.  Still, it can serve anyone that wants to implement or 
> contribute to levitated merge. (Please be advised that levitated merge is 
> mostly suited to very fast networks) - 
> [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]



[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.

2012-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528984#comment-13528984
 ] 

Hudson commented on MAPREDUCE-4703:
---

Integrated in Hadoop-Mapreduce-trunk #1282 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1282/])
MAPREDUCE-4703. Add the ability to start the MiniMRClientCluster using the 
configurations used before it is being stopped. (ahmed.radwan via tucu) 
(Revision 1419618)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1419618
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientCluster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientClusterFactory.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRYarnClusterAdapter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRClientCluster.java


> Add the ability to start the MiniMRClientCluster using the configurations 
> used before it is being stopped.
> --
>
> Key: MAPREDUCE-4703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4703
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, test
>Affects Versions: 1.2.0, 2.0.3-alpha
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4703_branch-1.patch, 
> MAPREDUCE-4703_branch-1_rev2.patch, MAPREDUCE-4703_branch-1_rev3.patch, 
> MAPREDUCE-4703.patch, MAPREDUCE-4703_rev2.patch, MAPREDUCE-4703_rev3.patch
>
>
> The objective here is to enable starting back the cluster, after being 
> stopped, using the same configurations/port numbers used before stopping.



[jira] [Commented] (MAPREDUCE-4866) ShuffleRamManager is limited to 2Gb of memory - we should increase that

2012-12-11 Thread Varene Olivier (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528953#comment-13528953
 ] 

Varene Olivier commented on MAPREDUCE-4866:
---

Sorry Chris,

but I think your patch is incomplete; changing only maxSize is not enough.

Please have a look at the proposed patch MAPREDUCE-4866-INCOMPLETE.patch, which 
contains all the proposed changes.
That patch is itself incomplete because array sizes are limited to around 
Integer.MAX_VALUE:

ReduceTask.java > Class MapOutputCopier > method ShuffleInMemory
line 1507:byte[] shuffleData = new byte[mapOutputLength];

we need to change this allocation with some kind of big array (whose size can 
be a long) 
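A "big array" of the kind mentioned above is commonly built from several ordinary arrays addressed by a long index. The following is a minimal sketch under that assumption; `BigByteArray` is a hypothetical class, not something in either attached patch, and the chunk size is left as a constructor parameter purely for illustration.

```java
// Hypothetical helper; not part of Hadoop.
final class BigByteArray {
    private final int chunkSize;
    private final byte[][] chunks;
    private final long length;

    BigByteArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        long count = (length + chunkSize - 1) / chunkSize; // ceiling division
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks for this chunk size");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        this.chunks = new byte[(int) count][];
        for (int i = 0; i < chunks.length; i++) {
            long remaining = length - (long) i * chunkSize;
            chunks[i] = new byte[(int) Math.min(chunkSize, remaining)]; // last chunk may be short
        }
    }

    long length() {
        return length;
    }

    byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Any code that currently reads or writes `shuffleData` through `int` offsets would also need to move to `long` offsets, which is probably the larger part of the change.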

regards
Olivier

> ShuffleRamManager is limited to 2Gb of memory - we should increase that
> ---
>
> Key: MAPREDUCE-4866
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4866
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Affects Versions: 0.20.2
> Environment: linux, 64bits cpu, more than 2Gb of memory for each 
> reducer tasks
>Reporter: Varene Olivier
>Priority: Minor
>  Labels: patch
> Attachments: M4866-0.patch, MAPREDUCE-4866-INCOMPLETE.patch
>
>
> Inside the org.apache.hadoop.mapred.ReduceTask.java, the *ShuffleRamManager* 
> is limited to allocate up to 2Gb of memory during the shuffle phase. 
> We should be able to allocate more, to take advantage of the full memory we 
> have on servers.



[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.

2012-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528934#comment-13528934
 ] 

Hudson commented on MAPREDUCE-4703:
---

Integrated in Hadoop-Hdfs-trunk #1251 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1251/])
MAPREDUCE-4703. Add the ability to start the MiniMRClientCluster using the 
configurations used before it is being stopped. (ahmed.radwan via tucu) 
(Revision 1419618)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1419618
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientCluster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientClusterFactory.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRYarnClusterAdapter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRClientCluster.java


> Add the ability to start the MiniMRClientCluster using the configurations 
> used before it is being stopped.
> --
>
> Key: MAPREDUCE-4703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4703
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, test
>Affects Versions: 1.2.0, 2.0.3-alpha
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4703_branch-1.patch, 
> MAPREDUCE-4703_branch-1_rev2.patch, MAPREDUCE-4703_branch-1_rev3.patch, 
> MAPREDUCE-4703.patch, MAPREDUCE-4703_rev2.patch, MAPREDUCE-4703_rev3.patch
>
>
> The objective here is to enable starting back the cluster, after being 
> stopped, using the same configurations/port numbers used before stopping.



[jira] [Updated] (MAPREDUCE-4866) ShuffleRamManager is limited to 2Gb of memory - we should increase that

2012-12-11 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-4866:
-

Attachment: M4866-0.patch

> ShuffleRamManager is limited to 2Gb of memory - we should increase that
> ---
>
> Key: MAPREDUCE-4866
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4866
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Affects Versions: 0.20.2
> Environment: linux, 64bits cpu, more than 2Gb of memory for each 
> reducer tasks
>Reporter: Varene Olivier
>Priority: Minor
>  Labels: patch
> Attachments: M4866-0.patch, MAPREDUCE-4866-INCOMPLETE.patch
>
>
> Inside the org.apache.hadoop.mapred.ReduceTask.java, the *ShuffleRamManager* 
> is limited to allocate up to 2Gb of memory during the shuffle phase. 
> We should be able to allocate more, to take advantage of the full memory we 
> have on servers.



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-12-11 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528914#comment-13528914
 ] 

Laxman commented on MAPREDUCE-4049:
---

With this patch, we are able to meet our goals (plugging in the custom shuffle 
algorithm "Network Levitate Merge") without any issues. Thanks, Avner, for keeping 
the patch small and crisp.


> plugin for generic shuffle service
> --
>
> Key: MAPREDUCE-4049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: performance, task, tasktracker
>Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>Reporter: Avner BenHanoch
>Assignee: Avner BenHanoch
>  Labels: merge, plugin, rdma, shuffle
> Fix For: 3.0.0
>
> Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
> mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
> mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch
>
>
> Support generic shuffle service as set of two plugins: ShuffleProvider & 
> ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on 
> shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
> or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
> RDMA shuffle, the plugin can also utilize a suitable merge approach during 
> the intermediate merges. Hence, getting much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
> dependency of NodeManager with a specific version of mapreduce shuffle 
> (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
> from Auburn University with others, 
> [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins 
> (currently, based on 1.0 branch)
> # I am providing link for downloading UDA - Mellanox's open source plugin 
> that implements generic shuffle service using RDMA and levitated merge.  
> Note: At this phase, the code is in C++ through JNI and you should consider 
> it as beta only.  Still, it can serve anyone that wants to implement or 
> contribute to levitated merge. (Please be advised that levitated merge is 
> mostly suited to very fast networks) - 
> [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-12-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528913#comment-13528913
 ] 

Chris Douglas commented on MAPREDUCE-4049:
--

This is clearly related to MAPREDUCE-2454. Many of the final patches are even 
in similar styles after thorough review. But the 2454 branch was supposed to 
make it easier to review [~masokan]'s extensive refactoring, not cover every 
API change to MapReduce supporting extensions. These interfaces aren't 
permanent, there will be more issues proposing hooks and callbacks after these, 
and the more expansive patches will get revised in different branches. It's 
routine; everyone please chill.

I read the rest of the MAPREDUCE-2454 subtasks and reviewed the outstanding and 
committed patches. It looks like a good first pass for adding hooks to Tasks, 
ready to be committed and merged to trunk presently. Alejandro: OK with you if 
we close this as resolved?

> plugin for generic shuffle service
> --
>
> Key: MAPREDUCE-4049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: performance, task, tasktracker
>Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>Reporter: Avner BenHanoch
>Assignee: Avner BenHanoch
>  Labels: merge, plugin, rdma, shuffle
> Fix For: 3.0.0
>
> Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
> mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
> mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch
>
>
> Support generic shuffle service as set of two plugins: ShuffleProvider & 
> ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on 
> shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
> or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
> RDMA shuffle, the plugin can also utilize a suitable merge approach during 
> the intermediate merges. Hence, getting much better performance.
> # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
> dependency of NodeManager with a specific version of mapreduce shuffle 
> (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
> from Auburn University with others, 
> [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with suggested Top Level Design for both plugins 
> (currently, based on 1.0 branch)
> # I am providing link for downloading UDA - Mellanox's open source plugin 
> that implements generic shuffle service using RDMA and levitated merge.  
> Note: At this phase, the code is in C++ through JNI and you should consider 
> it as beta only.  Still, it can serve anyone that wants to implement or 
> contribute to levitated merge. (Please be advised that levitated merge is 
> mostly suited to very fast networks) - 
> [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]



[jira] [Commented] (MAPREDUCE-4868) Allow multiple iteration for map

2012-12-11 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528891#comment-13528891
 ] 

Radim Kolar commented on MAPREDUCE-4868:


The Mapper will get some chunk of data assigned. You can process that data 
however you wish, then write it out with context.write or anywhere else you need to.

> Allow multiple iteration for map
> 
>
> Key: MAPREDUCE-4868
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Jerry Chen
> Fix For: 3.0.0, 2.0.3-alpha
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the Mapper class allows advanced users to override the "public void 
> run(Context context)" method for more control over the execution of the 
> mapper, while the Context interface limits the operations over the data, which 
> is the foundation of "more control".
> One use case: when I am considering a Hive optimization problem, I want to 
> make two passes over the input data instead of using another job or task 
> (which may slow down the whole process). Each pass does the same thing but 
> with different parameters.
> This is a new paradigm of MapReduce usage and can be achieved easily by 
> extending the Context interface a little to give more control over the data, 
> such as resetting the input.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.

2012-12-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528869#comment-13528869
 ] 

Hudson commented on MAPREDUCE-4703:
---

Integrated in Hadoop-Yarn-trunk #62 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/62/])
MAPREDUCE-4703. Add the ability to start the MiniMRClientCluster using the 
configurations used before it is being stopped. (ahmed.radwan via tucu) 
(Revision 1419618)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1419618
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientCluster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientClusterFactory.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRYarnClusterAdapter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRClientCluster.java


> Add the ability to start the MiniMRClientCluster using the configurations 
> used before it is being stopped.
> --
>
> Key: MAPREDUCE-4703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4703
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, test
>Affects Versions: 1.2.0, 2.0.3-alpha
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4703_branch-1.patch, 
> MAPREDUCE-4703_branch-1_rev2.patch, MAPREDUCE-4703_branch-1_rev3.patch, 
> MAPREDUCE-4703.patch, MAPREDUCE-4703_rev2.patch, MAPREDUCE-4703_rev3.patch
>
>
> The objective here is to enable starting back the cluster, after being 
> stopped, using the same configurations/port numbers used before stopping.



[jira] [Commented] (MAPREDUCE-4794) DefaultSpeculator generates error messages on normal shutdown

2012-12-11 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528825#comment-13528825
 ] 

Jerry Chen commented on MAPREDUCE-4794:
---

I reproduced this problem and will help fix it.

> DefaultSpeculator generates error messages on normal shutdown
> -
>
> Key: MAPREDUCE-4794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4794
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: Jason Lowe
>
> DefaultSpeculator can log the following error message on a normal shutdown of 
> the ApplicationMaster:
> {noformat}
> 2012-11-13 01:35:31,841 ERROR [DefaultSpeculator background processing] 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: Background 
> thread returning, interrupted : java.lang.InterruptedException
> {noformat}
> and in addition for some reason it logs the corresponding backtrace to stdout.
> Like the errors fixed in MAPREDUCE-4741, this error message in the syslog and 
> backtrace on stdout can be confusing to users as to whether the job really 
> succeeded.
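
A minimal sketch of one way a background thread could avoid reporting a normal shutdown as an error, assuming a stop flag is set before the thread is interrupted. QuietShutdownWorker, runLoop and lastLogLevel are hypothetical names used only for illustration; this is not the DefaultSpeculator code, nor necessarily the fix that was applied for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a background service thread; not Hadoop's DefaultSpeculator.
final class QuietShutdownWorker {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    // Records the level the sketch would have logged at ("NONE" if nothing was logged).
    private volatile String lastLogLevel = "NONE";
    private final Thread thread = new Thread(this::runLoop, "background processing");

    private void runLoop() {
        while (!stopped.get()) {
            try {
                Thread.sleep(1000L); // placeholder for one round of periodic work
            } catch (InterruptedException e) {
                // An interrupt that follows stop() is the expected shutdown signal,
                // so it is reported at INFO; any other interrupt is still an ERROR.
                lastLogLevel = stopped.get() ? "INFO" : "ERROR";
                return;
            }
        }
    }

    void start() {
        thread.start();
    }

    // Sets the stop flag first, then interrupts, so the thread can tell the cases apart.
    void stop() {
        stopped.set(true);
        thread.interrupt();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    String lastLogLevel() {
        return lastLogLevel;
    }
}
```

The ordering in stop() is the point of the sketch: because the flag is set before the interrupt is delivered, the catch block can distinguish an orderly shutdown from an unexpected interruption.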



[jira] [Commented] (MAPREDUCE-4847) Command Parsing in Hadoop Streaming

2012-12-11 Thread Peng Lei (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528823#comment-13528823
 ] 

Peng Lei commented on MAPREDUCE-4847:
-

Hi Asokan,
Thank you. I'm new to the community. Who are the hadoop-streaming contributors? 
Are you one of them? How do I assign this issue to the hadoop-streaming owners?

-Peng


> Command Parsing in Hadoop Streaming
> ---
>
> Key: MAPREDUCE-4847
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4847
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Reporter: Peng Lei
>  Labels: features
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Hadoop streaming parses the mapper and reducer commands by itself. This is 
> not a good choice: when I write a complex mapper/reducer script inline, such 
> as 'perl -ne ...', it doesn't work.
> An alternative is to send the command to the shell by simply creating a new 
> process (sh -c "command_and_args"). This not only simplifies the streaming 
> code but also improves its capability!
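
The proposal above can be sketched roughly as follows, assuming a POSIX sh is available on the PATH. ShellCommandSketch, wrap and run are hypothetical names for illustration and are not part of the Hadoop streaming code; the sketch only shows the whole command line being handed to the shell as a single argument instead of being tokenized by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Hadoop streaming.
final class ShellCommandSketch {
    // Passes the user's command line, unparsed, as the single argument to `sh -c`.
    static List<String> wrap(String commandLine) {
        return Arrays.asList("sh", "-c", commandLine);
    }

    // Runs the command through the shell and returns its combined stdout/stderr.
    static String run(String commandLine) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(wrap(commandLine))
                .redirectErrorStream(true)
                .start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        process.waitFor();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this shape, quoting inside the command is interpreted by the shell, so an inline script such as 'perl -ne ...' would reach perl with its quoting intact. The trade-off is that the command then depends on the shell installed on each node.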



[jira] [Updated] (MAPREDUCE-2841) Task level native optimization

2012-12-11 Thread Cheng Haowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Haowei updated MAPREDUCE-2841:


Environment: x86-64 Linux/Unix  (was: x86-64 Linux)

> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux/Unix
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: DESIGN.html, dualpivot-0.patch, dualpivotv20-0.patch, 
> MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch
>
>
> I have recently been working on a native optimization for MapTask based on JNI. 
> The basic idea is to add a NativeMapOutputCollector to handle the k/v pairs 
> emitted by the mapper, so that sort, spill, and IFile serialization can all be 
> done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed 
> promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string compare is 
> supported).
> 2. IFile serialization speed is about 3x that of Java, about 500MB/s. If 
> hardware CRC32C is used, things can get much faster (1G/s).
> 3. Merge code is not completed yet, so the test uses enough io.sort.mb to 
> prevent mid-spill.
> This leads to a total speedup of 2x~3x for the whole MapTask if 
> IdentityMapper (a mapper that does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are 
> supported, and I have not thought through many things yet, such as how to 
> support map-side combine. I had some discussion with somebody familiar with 
> Hive, and it seems that these limitations won't be much of a problem for Hive 
> to benefit from those optimizations, at least. Advice or discussion about 
> improving compatibility is most welcome :) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks whether the key/value type, comparator type, and combiner are all 
> compatible; MapTask can then choose to enable NativeMapOutputCollector.
> This is only a preliminary test, and more work needs to be done. I expect 
> better final results, and I believe similar optimizations can be adopted for 
> the reduce task and shuffle too. 
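
A rough sketch of the kind of check canEnable() is described as making. The description does not show the real signature, so everything here is an assumption: NativeCollectorGate is a hypothetical class, types are passed as class-name strings to keep the sketch free of Hadoop dependencies, and the rules (only Text and BytesWritable, no custom comparator, no combiner) are inferred from the limitations listed above rather than taken from the patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical gate; the real NativeMapOutputCollector.canEnable() may differ.
final class NativeCollectorGate {
    // Assumption: only these two Writable types are handled natively.
    private static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.io.BytesWritable"));

    // Returns true only when every job setting is one the native path can handle.
    static boolean canEnable(String keyClass, String valueClass,
                             boolean usesCustomComparator, boolean hasCombiner) {
        return SUPPORTED_TYPES.contains(keyClass)
                && SUPPORTED_TYPES.contains(valueClass)
                && !usesCustomComparator   // only binary string compare is supported
                && !hasCombiner;           // map-side combine is not handled yet
    }
}
```

A caller would fall back to the regular Java collector whenever the gate returns false, which keeps unsupported jobs working unchanged.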



[jira] [Updated] (MAPREDUCE-2841) Task level native optimization

2012-12-11 Thread Cheng Haowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Haowei updated MAPREDUCE-2841:


Description: 
I'm recently working on native optimization for MapTask based on JNI. 

The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
emitted by mapper, therefore sort, spill, IFile serialization can all be done 
in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
results:

1. Sort is about 3x-10x as fast as java(only binary string compare is supported)

2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
CRC32C is used, things can get much faster(1G/

3. Merge code is not completed yet, so the test use enough io.sort.mb to 
prevent mid-spill

This leads to a total speed up of 2x~3x for the whole MapTask, if 
IdentityMapper(mapper does nothing) is used

There are limitations of course, currently only Text and BytesWritable is 
supported, and I have not think through many things right now, such as how to 
support map side combine. I had some discussion with somebody familiar with 
hive, it seems that these limitations won't be much problem for Hive to benefit 
from those optimizations, at least. Advices or discussions about improving 
compatibility are most welcome:) 

Currently NativeMapOutputCollector has a static method called canEnable(), 
which checks if key/value type, comparator type, combiner are all compatible, 
then MapTask can choose to enable NativeMapOutputCollector.

This is only a preliminary test, more work need to be done. I expect better 
final results, and I believe similar optimization can be adopt to reduce task 
and shuffle too. 






  was:
I'm recently working on native optimization for MapTask based on JNI. 

The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
emitted by mapper, therefore sort, spill, IFile serialization can all be done 
in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
results:

1. Sort is about 3x-10x as fast as java(only binary string compare is supported)

2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
CRC32C is used, things can get much faster(1G/

3. Merge code is not completed yet, so the test use enough io.sort.mb to 
prevent mid-spill

This leads to a total speed up of 2x~3x for the whole MapTask, if 
IdentityMapper(mapper does nothing) is used.

There are limitations of course, currently only Text and BytesWritable is 
supported, and I have not think through many things right now, such as how to 
support map side combine. I had some discussion with somebody familiar with 
hive, it seems that these limitations won't be much problem for Hive to benefit 
from those optimizations, at least. Advices or discussions about improving 
compatibility are most welcome:) 

Currently NativeMapOutputCollector has a static method called canEnable(), 
which checks if key/value type, comparator type, combiner are all compatible, 
then MapTask can choose to enable NativeMapOutputCollector.

This is only a preliminary test, more work need to be done. I expect better 
final results, and I believe similar optimization can be adopt to reduce task 
and shuffle too. 







> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: DESIGN.html, dualpivot-0.patch, dualpivotv20-0.patch, 
> MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch
>
>

[jira] [Updated] (MAPREDUCE-2841) Task level native optimization

2012-12-11 Thread Cheng Haowei (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Haowei updated MAPREDUCE-2841:


Description: 
I'm recently working on native optimization for MapTask based on JNI. 

The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
emitted by mapper, therefore sort, spill, IFile serialization can all be done 
in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
results:

1. Sort is about 3x-10x as fast as java(only binary string compare is supported)

2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
CRC32C is used, things can get much faster(1G/

3. Merge code is not completed yet, so the test use enough io.sort.mb to 
prevent mid-spill

This leads to a total speed up of 2x~3x for the whole MapTask, if 
IdentityMapper(mapper does nothing) is used.

There are limitations of course, currently only Text and BytesWritable is 
supported, and I have not think through many things right now, such as how to 
support map side combine. I had some discussion with somebody familiar with 
hive, it seems that these limitations won't be much problem for Hive to benefit 
from those optimizations, at least. Advices or discussions about improving 
compatibility are most welcome:) 

Currently NativeMapOutputCollector has a static method called canEnable(), 
which checks if key/value type, comparator type, combiner are all compatible, 
then MapTask can choose to enable NativeMapOutputCollector.

This is only a preliminary test, more work need to be done. I expect better 
final results, and I believe similar optimization can be adopt to reduce task 
and shuffle too. 






  was:
I'm recently working on native optimization for MapTask based on JNI. 

The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs 
emitted by mapper, therefore sort, spill, IFile serialization can all be done 
in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising 
results:

1. Sort is about 3x-10x as fast as java(only binary string compare is supported)

2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware 
CRC32C is used, things can get much faster(1G/s).

3. Merge code is not completed yet, so the test use enough io.sort.mb to 
prevent mid-spill

This leads to a total speed up of 2x~3x for the whole MapTask, if 
IdentityMapper(mapper does nothing) is used.

There are limitations of course, currently only Text and BytesWritable is 
supported, and I have not think through many things right now, such as how to 
support map side combine. I had some discussion with somebody familiar with 
hive, it seems that these limitations won't be much problem for Hive to benefit 
from those optimizations, at least. Advices or discussions about improving 
compatibility are most welcome:) 

Currently NativeMapOutputCollector has a static method called canEnable(), 
which checks if key/value type, comparator type, combiner are all compatible, 
then MapTask can choose to enable NativeMapOutputCollector.

This is only a preliminary test, more work need to be done. I expect better 
final results, and I believe similar optimization can be adopt to reduce task 
and shuffle too. 







> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: DESIGN.html, dualpivot-0.patch, dualpivotv20-0.patch, 
> MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch
>
>

[jira] [Commented] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.

2012-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528801#comment-13528801
 ] 

Hadoop QA commented on MAPREDUCE-4703:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12560370/MAPREDUCE-4703_branch-1_rev3.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3116//console

This message is automatically generated.

> Add the ability to start the MiniMRClientCluster using the configurations 
> used before it is being stopped.
> --
>
> Key: MAPREDUCE-4703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4703
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, test
>Affects Versions: 1.2.0, 2.0.3-alpha
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4703_branch-1.patch, 
> MAPREDUCE-4703_branch-1_rev2.patch, MAPREDUCE-4703_branch-1_rev3.patch, 
> MAPREDUCE-4703.patch, MAPREDUCE-4703_rev2.patch, MAPREDUCE-4703_rev3.patch
>
>
> The objective here is to enable restarting the cluster, after it has been 
> stopped, using the same configurations/port numbers used before stopping.



[jira] [Updated] (MAPREDUCE-4703) Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.

2012-12-11 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-4703:


Attachment: MAPREDUCE-4703_branch-1_rev3.patch

The branch-1 patch runs fine for me. I only see this same failure when I 
reduce the 5 sec sleep period that takes place after shutting down the 
cluster and before starting it again. The sleep period needed is machine 
dependent, which is why you are seeing the failure and I am not. I have 
modified the patch to avoid a fixed sleep period; instead, it now retries 
starting the cluster every 1 sec until it starts. Can you please test this 
patch on your machine and see if it solves the problem?
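
The retry approach described above could look roughly like the sketch below. RetryStart and startWithRetry are hypothetical names and this is not the code in the attached patch; the attempt limit is an added assumption so that the loop cannot spin forever, whereas the comment describes retrying until the cluster starts.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not the code from the MAPREDUCE-4703 patch.
final class RetryStart {
    // Calls tryStart until it reports success, pausing intervalMillis between
    // attempts. Returns the number of attempts used, or -1 if every attempt
    // failed or the calling thread was interrupted while waiting.
    static int startWithRetry(BooleanSupplier tryStart, long intervalMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryStart.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

Compared with a fixed 5 sec sleep, a fast machine proceeds as soon as the ports are free, and a slow machine simply takes more attempts instead of failing.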

> Add the ability to start the MiniMRClientCluster using the configurations 
> used before it is being stopped.
> --
>
> Key: MAPREDUCE-4703
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4703
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, test
>Affects Versions: 1.2.0, 2.0.3-alpha
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4703_branch-1.patch, 
> MAPREDUCE-4703_branch-1_rev2.patch, MAPREDUCE-4703_branch-1_rev3.patch, 
> MAPREDUCE-4703.patch, MAPREDUCE-4703_rev2.patch, MAPREDUCE-4703_rev3.patch
>
>
> The objective here is to enable restarting the cluster, after it has been 
> stopped, using the same configurations/port numbers used before stopping.



[jira] [Updated] (MAPREDUCE-4865) Launching node-level combiner at the end stage of MapTask

2012-12-11 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-4865:
--

Summary: Launching node-level combiner at the end stage of MapTask  (was: 
Launching node-level combiner the end stage of MapTask)

> Launching node-level combiner at the end stage of MapTask
> -
>
> Key: MAPREDUCE-4865
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4865
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: tasktracker
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
>
> MapTask needs to start node-level aggregation against local outputs at the 
> end stage of MapTask after calling getAggregationTargets().
> This feature is implemented with Merger and CombinerRunner.
