[jira] [Updated] (MAPREDUCE-5911) Terasort TeraOutputFormat does not check for output directory existence

2014-09-09 Thread Bruno P. Kinoshita (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno P. Kinoshita updated MAPREDUCE-5911:
--
Attachment: HADOOP-5911.patch

Hi, this is my first time writing a patch for Hadoop. It is based on the description provided by 
Ivan. I couldn't find any tests referencing this class, but no tests failed in 
Maven.

HTH, Bruno

> Terasort TeraOutputFormat does not check for output directory existence
> ---
>
> Key: MAPREDUCE-5911
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5911
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Reporter: Ivan Mitic
>Assignee: Ivan Mitic
>Priority: Minor
> Attachments: HADOOP-5911.patch
>
>
> The enforcement that the directory must not yet exist is implemented in 
> {{FileOutputFormat#checkOutputSpecs}} by throwing 
> {{FileAlreadyExistsException}}.  However, terasort uses a specialized output 
> format, {{TeraOutputFormat}}, which is a subclass of {{FileOutputFormat}}.  
> The subclass overrides {{checkOutputSpecs}}, but does not re-implement the 
> existence check and throw {{FileAlreadyExistsException}}.
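A minimal sketch of the fix pattern the description implies: the overriding checkOutputSpecs calls super.checkOutputSpecs first, so the base-class existence check is not lost. This is a pure-JDK model, not Hadoop code and not the attached patch. BaseFileOutputFormat and SortedOutputFormat are hypothetical stand-ins for FileOutputFormat and TeraOutputFormat, java.nio.file.Path replaces Hadoop's job context, and the JDK's java.nio.file.FileAlreadyExistsException stands in for Hadoop's exception of the same name.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for FileOutputFormat: the base class owns the
// "output directory must not exist yet" rule.
class BaseFileOutputFormat {
    void checkOutputSpecs(Path outDir) throws IOException {
        if (outDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outDir)) {
            // JDK exception used as a stand-in for Hadoop's FileAlreadyExistsException.
            throw new FileAlreadyExistsException(outDir.toString());
        }
    }
}

// Hypothetical stand-in for TeraOutputFormat: it overrides the check to add
// its own validation, but calls super first so the existence check is kept.
class SortedOutputFormat extends BaseFileOutputFormat {
    @Override
    void checkOutputSpecs(Path outDir) throws IOException {
        super.checkOutputSpecs(outDir);
        // Subclass-specific validation would follow here.
    }
}
```

Delegating to super keeps the rule in one place, so a later change to the base-class check is inherited automatically instead of having to be re-implemented in each subclass.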



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6078) native-task: fix gtest build on macosx

2014-09-09 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127974#comment-14127974
 ] 

Binglin Chang commented on MAPREDUCE-6078:
--

What do you mean? I guess that's the weird CMake syntax.

> native-task: fix gtest build on macosx
> --
>
> Key: MAPREDUCE-6078
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6078
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Trivial
> Attachments: MAPREDUCE-6078.v1.patch
>
>
> Compiling the HEAD code on macOS fails. It looks like MAPREDUCE-5977 
> separated the gtest compile from nttest in order to suppress compile warnings, but 
> it forgot that the additional compile flags added to nttest are also required for the 
> gtest build. This patch fixes that. 





[jira] [Commented] (MAPREDUCE-6048) TestJavaSerialization fails in trunk build

2014-09-09 Thread Bruno P. Kinoshita (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127853#comment-14127853
 ] 

Bruno P. Kinoshita commented on MAPREDUCE-6048:
---

Hi, I think the builds were removed from Jenkins, but I could **not** reproduce 
with the following settings:

Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
2014-02-14T15:37:52-03:00)
Maven home: 
/home/kinow/java/tupilabs/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/EMBEDDED
Java version: 1.7.0_65, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-35-generic", arch: "amd64", family: "unix"

> TestJavaSerialization fails in trunk build
> --
>
> Key: MAPREDUCE-6048
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6048
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> This happened in builds #1871 and #1872
> {code}
> testMapReduceJob(org.apache.hadoop.mapred.TestJavaSerialization)  Time 
> elapsed: 2.784 sec  <<< FAILURE!
> junit.framework.ComparisonFailure: expected:<[a   ]1> but was:<[0 1]1>
>   at junit.framework.Assert.assertEquals(Assert.java:100)
>   at junit.framework.Assert.assertEquals(Assert.java:107)
>   at junit.framework.TestCase.assertEquals(TestCase.java:269)
>   at 
> org.apache.hadoop.mapred.TestJavaSerialization.testMapReduceJob(TestJavaSerialization.java:127)
> {code}





[jira] [Commented] (MAPREDUCE-6075) HistoryServerFileSystemStateStore can create zero-length files

2014-09-09 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127748#comment-14127748
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-6075:
---

[~daryn], thanks for your point, you're right. +1 (non-binding) for Jason's 
change.

http://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html

> HistoryServerFileSystemStateStore can create zero-length files
> --
>
> Key: MAPREDUCE-6075
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6075
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: MAPREDUCE-6075.patch
>
>
> When the history server state store writes a token file, it uses 
> IOUtils.cleanup() to close the file, which silently ignores errors. This 
> can lead to empty token files in the state store.





[jira] [Updated] (MAPREDUCE-3024) Make all poms to have hadoop-project POM as common parent

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3024:

Fix Version/s: 2.0.0-alpha

> Make all poms to have hadoop-project POM as common parent
> -
>
> Key: MAPREDUCE-3024
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3024
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.0.0-alpha
>
>
> In order to effectively use the Maven 'versions' plugin to update version 
> numbers, all POMs should have the hadoop-project POM as their common parent.





[jira] [Updated] (MAPREDUCE-3024) Make all poms to have hadoop-project POM as common parent

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3024:

Fix Version/s: (was: 3.0.0)

> Make all poms to have hadoop-project POM as common parent
> -
>
> Key: MAPREDUCE-3024
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3024
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Fix For: 2.0.0-alpha
>
>
> In order to effectively use the Maven 'versions' plugin to update version 
> numbers, all POMs should have the hadoop-project POM as their common parent.





[jira] [Updated] (MAPREDUCE-2806) [Gridmix] Load job fails with timeout errors when resource emulation is turned on

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-2806:

Fix Version/s: (was: 3.0.0)
   1.1.0

> [Gridmix] Load job fails with timeout errors when resource emulation is 
> turned on
> -
>
> Key: MAPREDUCE-2806
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2806
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>  Labels: gridmix, loadjob, timeout
> Fix For: 1.1.0
>
>
> When the Load job's tasks are emulating CPU/memory, the task-tracker kills 
> the emulating task due to lack of status updates. The Load job has its own status 
> reporter, which dies too soon.





[jira] [Resolved] (MAPREDUCE-3168) [Gridmix] TestCompressionEmulationUtils fails after MR-3158

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved MAPREDUCE-3168.
-
   Resolution: Duplicate
Fix Version/s: (was: 3.0.0)

> [Gridmix] TestCompressionEmulationUtils fails after MR-3158
> ---
>
> Key: MAPREDUCE-3168
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3168
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/gridmix
>Affects Versions: 0.24.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>  Labels: compression-emulation, gridmix, local-job-runner
>
> TestCompressionEmulationUtils fails after MAPREDUCE-3158 as it uses local 
> job-runner to run jobs.





[jira] [Reopened] (MAPREDUCE-3168) [Gridmix] TestCompressionEmulationUtils fails after MR-3158

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened MAPREDUCE-3168:
-

> [Gridmix] TestCompressionEmulationUtils fails after MR-3158
> ---
>
> Key: MAPREDUCE-3168
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3168
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/gridmix
>Affects Versions: 0.24.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>  Labels: compression-emulation, gridmix, local-job-runner
>
> TestCompressionEmulationUtils fails after MAPREDUCE-3158 as it uses local 
> job-runner to run jobs.





[jira] [Updated] (MAPREDUCE-3191) docs for map output compression incorrectly reference SequenceFile

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3191:

Fix Version/s: (was: 2.5.0)
   (was: 3.0.0)

> docs for map output compression incorrectly reference SequenceFile
> --
>
> Key: MAPREDUCE-3191
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3191
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Chen He
>Priority: Trivial
>  Labels: documentation, noob
> Fix For: 0.23.11, 2.4.1
>
> Attachments: MAPREDUCE-3191-v2.patch, MAPREDUCE-3191.patch
>
>
> The documentation currently says that map output compression uses 
> SequenceFile compression. This hasn't been true in several years, since we 
> use IFile for intermediate data now.





[jira] [Updated] (MAPREDUCE-4868) Allow multiple iteration for map

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4868:

Fix Version/s: (was: 2.4.0)
   (was: 3.0.0)

> Allow multiple iteration for map
> 
>
> Key: MAPREDUCE-4868
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Jerry Chen
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, the Mapper class allows advanced users to override the "public void 
> run(Context context)" method for more control over the execution of the 
> mapper, while the Context interface limits the operations over the data, which are 
> the foundation of "more control".
> One use case: when working on a Hive optimization problem, I 
> want to make two passes over the input data instead of using another job or 
> task (which may slow down the whole process). Each pass does the same thing but 
> with different parameters.
> This is a new paradigm of MapReduce usage and can be achieved easily by 
> extending the Context interface a little with more control over the data, such as 
> resetting the input.
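A sketch of the two-pass idea under stated assumptions: nothing like reset() exists on Mapper.Context today, so ResettableInput is a hypothetical interface modeling the proposed extension, ListInput is an in-memory stand-in for a split, and the threshold filter is only a placeholder for "the same logic with different parameters".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension of the task context proposed in the issue.
interface ResettableInput<V> {
    boolean nextValue();
    V currentValue();
    void reset(); // rewind to the start of the split
}

// In-memory implementation used only to exercise the sketch.
class ListInput<V> implements ResettableInput<V> {
    private final List<V> values;
    private int pos = -1;

    ListInput(List<V> values) { this.values = values; }

    public boolean nextValue() { return ++pos < values.size(); }
    public V currentValue() { return values.get(pos); }
    public void reset() { pos = -1; }
}

class TwoPassRunner {
    // Runs the same per-record logic once per parameter over one split,
    // instead of launching a second job to re-read the input.
    static List<String> run(ResettableInput<Integer> input, int[] thresholds) {
        List<String> out = new ArrayList<>();
        for (int pass = 0; pass < thresholds.length; pass++) {
            input.reset();
            while (input.nextValue()) {
                int v = input.currentValue();
                if (v >= thresholds[pass]) {
                    out.add("pass" + pass + ":" + v);
                }
            }
        }
        return out;
    }
}
```

With input [1, 5, 10] and thresholds {5, 10}, the runner emits pass0:5, pass0:10 and pass1:10 from a single read of the split per pass.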





[jira] [Updated] (MAPREDUCE-5513) ConcurrentModificationException in JobControl

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5513:

Fix Version/s: (was: 3.0.0)

> ConcurrentModificationException in JobControl
> -
>
> Key: MAPREDUCE-5513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta, 0.23.9
>Reporter: Jason Lowe
>Assignee: Robert Parker
> Fix For: 0.23.10, 2.2.0
>
> Attachments: MAPREDUCE-5513-1.patch
>
>
> JobControl.toList is locking individual lists to iterate them, but those 
> lists can be modified elsewhere without holding the list lock.  The locking 
> approaches are mismatched, with toList holding the lock on the actual list 
> object while other methods hold the JobControl lock when modifying the lists.
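A simplified model of how the mismatch described above can be avoided: every read and every write of the job lists goes through one monitor, and the toList-style snapshot copies under that same lock. JobQueues and its methods are hypothetical; this is not the JobControl source or the attached patch.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// All access to the lists synchronizes on the JobQueues instance, so a
// snapshot can never observe a list in the middle of a modification.
class JobQueues {
    private final List<String> waiting = new LinkedList<>();
    private final List<String> running = new LinkedList<>();

    synchronized void addWaiting(String job) {
        waiting.add(job);
    }

    synchronized void start(String job) {
        if (waiting.remove(job)) {
            running.add(job);
        }
    }

    // Analogue of toList(): copy under the SAME lock the mutators use,
    // rather than locking the individual list objects.
    synchronized List<String> runningSnapshot() {
        return new ArrayList<>(running);
    }
}
```

Returning a copy also means callers can iterate the result without holding any lock.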





[jira] [Updated] (MAPREDUCE-6063) In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart.

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6063:

Fix Version/s: (was: 3.0.0)

> In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < 
> bufstart.
> ---
>
> Key: MAPREDUCE-6063
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6063
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.6.0
>
> Attachments: MAPREDUCE-6063.000.patch, MAPREDUCE-6063.branch-1.patch
>
>
> In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < 
> bufstart. We should change (bufvoid - bufend) + bufstart to (bufvoid - 
> bufstart) + bufend.
> That is, change
> {code}
>  long size = (bufend >= bufstart
>   ? bufend - bufstart
>   : (bufvoid - bufend) + bufstart) +
>   partitions * APPROX_HEADER_LENGTH;
> {code}
> to:
> {code}
>  long size = (bufend >= bufstart
>   ? bufend - bufstart
>   : (bufvoid - bufstart) + bufend) +
>   partitions * APPROX_HEADER_LENGTH;
> {code}
> This is because when wraparound happens (bufend < bufstart), the size should 
> be the tail segment (bufvoid - bufstart) plus the head segment (bufend).
> A similar calculation already exists elsewhere in MapTask.java:
> {code}
> mapOutputByteCounter.increment(valend >= keystart
> ? valend - keystart
> : (bufvoid - keystart) + valend);
> {code}
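A standalone model of the size arithmetic for a circular buffer of capacity bufvoid, with the corrected and original formulas side by side. The class and method names are hypothetical, and the header length is passed in as a parameter rather than taken from MapTask.

```java
// Models the size computation in sortAndSpill for a circular buffer of
// capacity bufvoid; not MapTask itself.
class SpillSizeSketch {
    // Corrected formula from the issue description.
    static long usedBytes(int bufstart, int bufend, int bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart                 // data is contiguous
            : (bufvoid - bufstart) + bufend;    // tail [bufstart, bufvoid) + head [0, bufend)
    }

    // The original (buggy) formula, kept for comparison.
    static long buggyUsedBytes(int bufstart, int bufend, int bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufend) + bufstart;
    }

    static long spillSize(int bufstart, int bufend, int bufvoid,
                          int partitions, int approxHeaderLength) {
        return usedBytes(bufstart, bufend, bufvoid)
            + (long) partitions * approxHeaderLength;
    }
}
```

Worked example: with bufvoid = 100, bufstart = 90 and bufend = 10, the data occupies bytes 90..99 and 0..9, so the correct size is (100 - 90) + 10 = 20. The original formula gives (100 - 10) + 90 = 180, which is larger than the whole buffer.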





[jira] [Updated] (MAPREDUCE-5821) IFile merge allocates new byte array for every value

2014-09-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5821:

Fix Version/s: (was: 2.5.0)
   (was: 3.0.0)

> IFile merge allocates new byte array for every value
> 
>
> Key: MAPREDUCE-5821
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5821
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: performance, task
>Affects Versions: 2.4.1
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 2.4.1
>
> Attachments: after-patch.png, before-patch.png, mapreduce-5821.txt, 
> mapreduce-5821.txt
>
>
> I wrote a standalone benchmark of the MapOutputBuffer and found that it did a 
> lot of allocations during the merge phase. After looking at an allocation 
> profile, I found that IFile.Reader.nextRawValue() would always allocate a new 
> byte array for every value, so the allocation rate goes way up during the 
> merge phase of the mapper. I imagine this also affects the reducer input, 
> though I didn't profile that.
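A sketch of the general buffer-reuse pattern that avoids a per-value allocation: the reader keeps one value buffer and grows it only when a value is larger than its current capacity. ReusableValueReader and readValue are hypothetical names; this is not IFile.Reader and not the attached patch.

```java
import java.io.DataInputStream;
import java.io.IOException;

// One shared value buffer per reader instead of a new byte[] per value.
class ReusableValueReader {
    private byte[] buf = new byte[0];
    private int allocations = 0;

    // Reads `length` bytes into the shared buffer and returns it. Only the
    // first `length` bytes are valid, and the next call overwrites them.
    byte[] readValue(DataInputStream in, int length) throws IOException {
        if (buf.length < length) {
            // Grow geometrically so a run of slightly larger values does not
            // reallocate on every call.
            buf = new byte[Math.max(length, buf.length * 2)];
            allocations++;
        }
        in.readFully(buf, 0, length);
        return buf;
    }

    int allocations() { return allocations; }
}
```

The trade-off is that callers must copy a value if they need it after the next read, which is why this contract has to be explicit wherever such a reader is used.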





[jira] [Commented] (MAPREDUCE-6075) HistoryServerFileSystemStateStore can create zero-length files

2014-09-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127432#comment-14127432
 ] 

Daryn Sharp commented on MAPREDUCE-6075:


I'm +1 on the change.  The close/null/cleanup pattern is rather common in 
Hadoop.  Using flush isn't a substitute for a close for all filesystems.  Close 
must always be allowed to throw an exception, and that exception should only be swallowed 
when another exception has already occurred.

In Java, close() is supposed to be idempotent, so a double close is fine.  Double 
closing an fd is bad because the fd may have already been recycled by another 
thread.
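The rule above (close may throw, and is swallowed only when another exception is already in flight) is exactly what Java's try-with-resources does: a close() failure propagates, and if the body already threw, it is attached as a suppressed exception instead. The sketch contrasts that with a cleanup-style helper. FailingCloseStream, TokenWriter and both write methods are hypothetical; this is not the HistoryServerFileSystemStateStore code.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stream whose close() fails, standing in for a filesystem
// where close() is what actually commits the data.
class FailingCloseStream extends OutputStream {
    private boolean closed = false;
    int closeCalls = 0;

    @Override
    public void write(int b) throws IOException {
        if (closed) throw new IOException("stream closed");
    }

    @Override
    public void close() throws IOException {
        closeCalls++;
        if (closed) {
            return; // Closeable contract: closing an already-closed stream has no effect
        }
        closed = true;
        throw new IOException("close failed: data not persisted");
    }
}

class TokenWriter {
    // try-with-resources: a close() failure reaches the caller.
    static void writeToken(OutputStream out, byte[] token) throws IOException {
        try (OutputStream o = out) {
            o.write(token);
        }
    }

    // Cleanup-style helper: the close() failure is swallowed, so the caller
    // believes the token was written even if the file ends up empty.
    static void writeTokenQuietly(OutputStream out, byte[] token) throws IOException {
        try {
            out.write(token);
        } finally {
            try {
                out.close();
            } catch (IOException ignored) {
                // error lost here
            }
        }
    }
}
```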

> HistoryServerFileSystemStateStore can create zero-length files
> --
>
> Key: MAPREDUCE-6075
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6075
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: MAPREDUCE-6075.patch
>
>
> When the history server state store writes a token file, it uses 
> IOUtils.cleanup() to close the file, which silently ignores errors. This 
> can lead to empty token files in the state store.





[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-09-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127309#comment-14127309
 ] 

Jason Lowe commented on MAPREDUCE-5891:
---

bq. a) Dynamic MR-to-YARN query: given that the NM recovery flag is a global cluster-level 
setting (although it can be configured on a per-NM basis), can we 
derive the value of mapreduce.reduce.shuffle.fetch.retry.enabled at job 
submission time from some YARN API call to RM?

The RM is unaware of whether the NM supports work-preserving restart, and I'd 
rather not add that coupling just for this.

bq. b) Shuffle protocol change. It seems Fetcher and ShuffleHandler check HTTP 
headers via property key names. So if we add a new property to indicate whether 
recovery is supported and keep the same HTTP "version" property, a 
new version of the Fetcher might be able to work with an old version of the 
ShuffleHandler, and vice versa.

True, we could add a new HTTP header that new Fetchers could query.
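A sketch of why an optional header can stay backwards-compatible: an old ShuffleHandler never sends it, so the Fetcher must treat "absent" as "not supported" rather than as an error. The header name is hypothetical, and headers are modeled as a plain Map rather than a real HTTP connection; this does not claim to match the ShuffleHandler's actual header set.

```java
import java.util.Map;

class ShuffleCapabilities {
    // Hypothetical header name, for illustration only.
    static final String RECOVERY_HEADER = "X-Hypothetical-NM-Recovery-Enabled";

    static boolean recoverySupported(Map<String, String> responseHeaders) {
        for (Map.Entry<String, String> e : responseHeaders.entrySet()) {
            // HTTP header names are case-insensitive; a null key is skipped.
            if (RECOVERY_HEADER.equalsIgnoreCase(e.getKey())) {
                return e.getValue() != null && Boolean.parseBoolean(e.getValue().trim());
            }
        }
        return false; // header absent: the server predates the feature
    }
}
```

Going the other way, an old Fetcher simply ignores a header it does not know, which is what makes the "and vice versa" case work.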

> Improved shuffle error handling across NM restarts
> --
>
> Key: MAPREDUCE-5891
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5891
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Junping Du
> Attachments: MAPREDUCE-5891-demo.patch, MAPREDUCE-5891-v2.patch, 
> MAPREDUCE-5891-v3.patch, MAPREDUCE-5891-v4.patch, MAPREDUCE-5891.patch
>
>
> To minimize the number of map fetch failures reported by reducers across an 
> NM restart it would be nice if reducers only reported a fetch failure after 
> trying for a specified period of time to retrieve the data.
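A sketch of the behavior described above: keep retrying a fetch and report a failure only once a retry window has elapsed. The Clock interface, method and parameter names are hypothetical (they are not the Fetcher's configuration keys), and the clock is injectable so the loop can be tested without real sleeps.

```java
import java.util.function.BooleanSupplier;

class FetchRetrySketch {
    interface Clock {
        long nowMillis();
        void sleepMillis(long ms);
    }

    static final Clock SYSTEM_CLOCK = new Clock() {
        public long nowMillis() { return System.currentTimeMillis(); }
        public void sleepMillis(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
    };

    // Returns true once an attempt succeeds; false means the window is used
    // up and only now should the reducer report the fetch failure.
    static boolean fetchWithRetry(BooleanSupplier attempt, long retryWindowMs,
                                  long retryIntervalMs, Clock clock) {
        long deadline = clock.nowMillis() + retryWindowMs;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (Thread.currentThread().isInterrupted()) {
                return false; // give up promptly if the task is being torn down
            }
            if (clock.nowMillis() + retryIntervalMs > deadline) {
                return false;
            }
            clock.sleepMillis(retryIntervalMs);
        }
    }
}
```

With a 3000 ms window and a 1000 ms interval, a fetch that never succeeds is attempted four times (at 0, 1000, 2000 and 3000 ms) before the failure is reported.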





[jira] [Commented] (MAPREDUCE-6078) native-task: fix gtest build on macosx

2014-09-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127269#comment-14127269
 ] 

Allen Wittenauer commented on MAPREDUCE-6078:
-

Um, should the conditional try to match on the same thing?

> native-task: fix gtest build on macosx
> --
>
> Key: MAPREDUCE-6078
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6078
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>Priority: Trivial
> Attachments: MAPREDUCE-6078.v1.patch
>
>
> Compiling the HEAD code on macOS fails. It looks like MAPREDUCE-5977 
> separated the gtest compile from nttest in order to suppress compile warnings, but 
> it forgot that the additional compile flags added to nttest are also required for the 
> gtest build. This patch fixes that. 





[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-09-09 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127212#comment-14127212
 ] 

Ming Ma commented on MAPREDUCE-5891:


The patch looks good. I like Jason's idea to have 
mapreduce.reduce.shuffle.fetch.retry.enabled use 
${yarn.nodemanager.recovery.enabled} as default value. As for the other 
approaches,

a) Dynamic MR-to-YARN query: given that the NM recovery flag is a global cluster-level 
setting (although it can be configured on a per-NM basis), can we derive 
the value of mapreduce.reduce.shuffle.fetch.retry.enabled at job submission 
time from some YARN API call to the RM?

b) Shuffle protocol change. It seems Fetcher and ShuffleHandler check HTTP 
headers via property key names. So if we add a new property to indicate whether 
recovery is supported and keep the same HTTP "version" property, a 
new version of the Fetcher might be able to work with an old version of the 
ShuffleHandler, and vice versa.

> Improved shuffle error handling across NM restarts
> --
>
> Key: MAPREDUCE-5891
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5891
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Junping Du
> Attachments: MAPREDUCE-5891-demo.patch, MAPREDUCE-5891-v2.patch, 
> MAPREDUCE-5891-v3.patch, MAPREDUCE-5891-v4.patch, MAPREDUCE-5891.patch
>
>
> To minimize the number of map fetch failures reported by reducers across an 
> NM restart it would be nice if reducers only reported a fetch failure after 
> trying for a specified period of time to retrieve the data.





[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

2014-09-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127197#comment-14127197
 ] 

Todd Lipcon commented on MAPREDUCE-2841:


bq. -1 javac. The applied patch generated 1265 javac compiler warnings (more 
than the trunk's current 1264 warnings).

This is due to needing to import the deprecated UTF8 class to provide support 
for that type.

Aside from that, seems like Jenkins is happy with the patch. The merge vote is 
already started on mapreduce-dev and is set to close 9/12 EOD PST.

> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux/Unix
>Reporter: Binglin Chang
>Assignee: Sean Zhong
> Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, 
> MAPREDUCE-2841.v2.patch, MR-2841benchmarks.pdf, dualpivot-0.patch, 
> dualpivotv20-0.patch, fb-shuffle.patch, 
> hadoop-3.0-mapreduce-2841-2014-7-17.patch, micro-benchmark.txt, 
> mr-2841-merge-2.txt, mr-2841-merge-3.patch, mr-2841-merge-4.patch, 
> mr-2841-merge.txt
>
>
> I'm currently working on native optimization for MapTask based on JNI. 
> The basic idea is to add a NativeMapOutputCollector to handle k/v pairs 
> emitted by the mapper, so that sort, spill, and IFile serialization can all be done 
> in native code. A preliminary test (on Xeon E5410, jdk6u24) showed promising 
> results:
> 1. Sort is about 3x-10x as fast as Java (only binary string compare is 
> supported).
> 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if hardware 
> CRC32C is used, things can get much faster (1G/
> 3. Merge code is not completed yet, so the test uses a large enough io.sort.mb to 
> prevent mid-spill.
> This leads to a total speedup of 2x~3x for the whole MapTask, if 
> IdentityMapper (a mapper that does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are 
> supported, and I have not thought through many things yet, such as how to 
> support map-side combine. I had some discussion with someone familiar with 
> Hive, and it seems that these limitations won't be much of a problem for Hive to 
> benefit from these optimizations, at least. Advice or discussion about 
> improving compatibility is most welcome :) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks whether the key/value types, comparator type, and combiner are all 
> compatible; then MapTask can choose to enable NativeMapOutputCollector.
> This is only a preliminary test, and more work needs to be done. I expect better 
> final results, and I believe a similar optimization can be adopted for the reduce task 
> and shuffle too. 
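A sketch of what "binary string compare" means: keys are ordered by their serialized bytes, compared lexicographically as unsigned values, which is what lets a sorter work without knowing the key type. Hadoop's Java side offers the same kind of comparison as the static WritableComparator.compareBytes; this standalone BinaryCompare class is for illustration only and is not the native code.

```java
// Unsigned lexicographic comparison of two byte ranges.
class BinaryCompare {
    static int compareBytes(byte[] a, int aOff, int aLen,
                            byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // treat bytes as unsigned 0..255
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen; // when one is a prefix of the other, the shorter sorts first
    }
}
```

The `& 0xff` matters: without it, a byte such as 0x80 would be read as -128 and sort before 0x01.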





[jira] [Created] (MAPREDUCE-6080) JHS checks YARN application ACLs to determine user's access to aggregated logs

2014-09-09 Thread Zhijie Shen (JIRA)
Zhijie Shen created MAPREDUCE-6080:
--

 Summary: JHS checks YARN application ACLs to determine user's 
access to aggregated logs
 Key: MAPREDUCE-6080
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6080
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, webapps
Affects Versions: 2.5.0, 3.0.0
Reporter: Zhijie Shen


While JHS uses JobACLsManager to check a user's access to the job history 
information, it uses ApplicationACLsManager to decide whether the user has 
access to the aggregated logs, because it directly imports AggregatedLogsBlock 
into the log web page.

In most cases, the two managers produce consistent access control. However, we 
observed a case where YARN ACLs are enabled while MR cluster ACLs are not. 
As a result, the user can view all the job information but cannot access the 
aggregated logs from JHS, which confuses the user. 
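A simplified model of the reported mismatch: two independent ACL checks, each with its own enable switch, can give the same user different answers. AclCheck is hypothetical and is not the JobACLsManager or ApplicationACLsManager API; the assumption that a disabled switch allows everyone is labeled in the code.

```java
import java.util.Set;

// Hypothetical ACL check with its own enable switch.
class AclCheck {
    private final boolean aclsEnabled;
    private final Set<String> allowedUsers;

    AclCheck(boolean aclsEnabled, Set<String> allowedUsers) {
        this.aclsEnabled = aclsEnabled;
        this.allowedUsers = allowedUsers;
    }

    boolean canView(String user) {
        // Assumption for this sketch: with ACLs disabled, every user is allowed.
        return !aclsEnabled || allowedUsers.contains(user);
    }
}
```

With MR ACLs disabled and YARN ACLs enabled for the job owner only, another user passes the job-page check but fails the log check, which is the inconsistency described in the issue.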





[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-09-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127043#comment-14127043
 ] 

Jason Lowe commented on MAPREDUCE-5891:
---

Thanks for updating the patch, Junping, and sorry for the delay in re-review.   
 The fixes all look fine.

I agree with Ming that we should be consistent about the default state of this 
feature and NM restart, although I'm not a fan of adding a YARN API to query NM 
restart.  Task containers currently don't talk with the NM, and IMHO this is 
not a good enough reason to change that.  I'm OK with adding it to the shuffle 
protocol if we can do it in a backwards-compatible way, although I don't know 
offhand how that would be accomplished.  Another approach is to try to tie the 
two properties together and have the default value of 
mapreduce.reduce.shuffle.fetch.retry.enabled in mapred-default.xml be 
$\{yarn.nodemanager.recovery.enabled\}, so they could still be set 
independently but by default the NM restart setting drives the fetch retry 
setting.
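Hadoop's Configuration expands ${...} references in property values, which is what would let the default of mapreduce.reduce.shuffle.fetch.retry.enabled follow yarn.nodemanager.recovery.enabled while still being settable on its own. ConfigSketch below is a minimal, hypothetical model of that substitution (single pass, no system properties, no cycle detection), not Configuration itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    // Returns the value with each ${name} replaced by the value of `name`;
    // unresolved references are left as written.
    String get(String key) {
        String raw = props.get(key);
        if (raw == null) {
            return null;
        }
        Matcher m = VAR.matcher(raw);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String ref = props.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(ref != null ? ref : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Setting only the NM property drives the retry flag through the reference; setting the retry flag explicitly overrides it.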

> Improved shuffle error handling across NM restarts
> --
>
> Key: MAPREDUCE-5891
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5891
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Junping Du
> Attachments: MAPREDUCE-5891-demo.patch, MAPREDUCE-5891-v2.patch, 
> MAPREDUCE-5891-v3.patch, MAPREDUCE-5891-v4.patch, MAPREDUCE-5891.patch
>
>
> To minimize the number of map fetch failures reported by reducers across an 
> NM restart it would be nice if reducers only reported a fetch failure after 
> trying for a specified period of time to retrieve the data.





[jira] [Commented] (MAPREDUCE-5972) Fix typo 'programatically' in job.xml (and a few other places)

2014-09-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127005#comment-14127005
 ] 

Hudson commented on MAPREDUCE-5972:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1866 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1866/])
MAPREDUCE-5972. Fix typo 'programatically' in job.xml (and a few other places) 
(Akira AJISAKA via aw) (aw: rev d989ac04449dc33da5e2c32a7f24d59cc92de536)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapredAppMasterRest.apt.vm
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewer.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfServlet.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/site/apt/HistoryServerRest.apt.vm
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewerPB.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfiguration.java
* hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/jquery.js


> Fix typo 'programatically' in job.xml (and a few other places)
> --
>
> Key: MAPREDUCE-5972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5972.patch
>
>
> In job.xml, there's a typo 'programatically', as shown below, when a property is 
> set from a program.
> {code}
> <property>
>   <name>mapreduce.job.map.class</name>
>   <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
>   <source>programatically</source>
> </property>
> {code}
> should be 'programmatically'.





[jira] [Commented] (MAPREDUCE-5972) Fix typo 'programatically' in job.xml (and a few other places)

2014-09-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126977#comment-14126977
 ] 

Hudson commented on MAPREDUCE-5972:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1891 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1891/])
MAPREDUCE-5972. Fix typo 'programatically' in job.xml (and a few other places) 
(Akira AJISAKA via aw) (aw: rev d989ac04449dc33da5e2c32a7f24d59cc92de536)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewerPB.java
* hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/jquery.js
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfServlet.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfiguration.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/site/apt/HistoryServerRest.apt.vm
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapredAppMasterRest.apt.vm
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewer.java


> Fix typo 'programatically' in job.xml (and a few other places)
> --
>
> Key: MAPREDUCE-5972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5972.patch
>
>
> In job.xml, a property that is set programmatically has its source recorded
> with the typo 'programatically', as shown below:
> {code}
> <property>
>   <name>mapreduce.job.map.class</name>
>   <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
>   <source>programatically</source>
> </property>
> {code}
> should be 'programmatically'.





[jira] [Commented] (MAPREDUCE-5972) Fix typo 'programatically' in job.xml (and a few other places)

2014-09-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126889#comment-14126889
 ] 

Hudson commented on MAPREDUCE-5972:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #675 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/675/])
MAPREDUCE-5972. Fix typo 'programatically' in job.xml (and a few other places) 
(Akira AJISAKA via aw) (aw: rev d989ac04449dc33da5e2c32a7f24d59cc92de536)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewerPB.java
* hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/jquery.js
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfServlet.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/site/apt/HistoryServerRest.apt.vm
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfiguration.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapredAppMasterRest.apt.vm


> Fix typo 'programatically' in job.xml (and a few other places)
> --
>
> Key: MAPREDUCE-5972
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5972
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Trivial
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5972.patch
>
>
> In job.xml, a property that is set programmatically has its source recorded
> with the typo 'programatically', as shown below:
> {code}
> <property>
>   <name>mapreduce.job.map.class</name>
>   <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
>   <source>programatically</source>
> </property>
> {code}
> should be 'programmatically'.





[jira] [Commented] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName

2014-09-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126751#comment-14126751
 ] 

Hadoop QA commented on MAPREDUCE-6079:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12667367/MAPREDUCE-6079.1.patch
against trunk revision 90c8ece.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4864//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4864//console

This message is automatically generated.

> Renaming JobImpl#username to reporterUserName
> -
>
> Key: MAPREDUCE-6079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-6079.1.patch
>
>
> In MAPREDUCE-6033, we found a bug caused by the confusingly similar field
> names {{userName}} and {{username}}. We should rename them so that they are
> easy to tell apart.
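As a rough before/after picture of the rename, here is a minimal Java sketch. `JobNamesSketch` is a hypothetical, heavily simplified stand-in for `JobImpl`, not Hadoop's code, and the roles given to the two fields in the comments are assumptions for illustration only.

```java
// Hypothetical, heavily simplified stand-in for JobImpl; NOT Hadoop's code.
// Before the rename, the two fields below were "userName" and "username",
// which differ only by letter case and are easy to mix up at call sites.
public class JobNamesSketch {
    // Field that keeps its name (assumed here: the user the job belongs to).
    private final String userName;
    // Formerly "username"; the new name states its (assumed) reporting role.
    private final String reporterUserName;

    public JobNamesSketch(String userName, String reporterUserName) {
        this.userName = userName;
        this.reporterUserName = reporterUserName;
    }

    public String getUserName() {
        return userName;
    }

    public String getReporterUserName() {
        return reporterUserName;
    }

    public static void main(String[] args) {
        JobNamesSketch job = new JobNamesSketch("alice", "yarn");
        // Distinct names make it obvious which value each call site reads.
        System.out.println("user=" + job.getUserName()
            + ", reporter=" + job.getReporterUserName());
    }
}
```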





[jira] [Commented] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName

2014-09-09 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126747#comment-14126747
 ] 

Akira AJISAKA commented on MAPREDUCE-6079:
--

Thanks for the report and the patch. +1 (non-binding) pending Jenkins.

> Renaming JobImpl#username to reporterUserName
> -
>
> Key: MAPREDUCE-6079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-6079.1.patch
>
>
> In MAPREDUCE-6033, we found a bug caused by the confusingly similar field
> names {{userName}} and {{username}}. We should rename them so that they are
> easy to tell apart.





[jira] [Updated] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName

2014-09-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6079:
--
Assignee: Tsuyoshi OZAWA
  Status: Patch Available  (was: Open)

> Renaming JobImpl#username to reporterUserName
> -
>
> Key: MAPREDUCE-6079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-6079.1.patch
>
>
> In MAPREDUCE-6033, we found a bug caused by the confusingly similar field
> names {{userName}} and {{username}}. We should rename them so that they are
> easy to tell apart.





[jira] [Updated] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName

2014-09-09 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-6079:
--
Attachment: MAPREDUCE-6079.1.patch

> Renaming JobImpl#username to reporterUserName
> -
>
> Key: MAPREDUCE-6079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-6079.1.patch
>
>
> In MAPREDUCE-6033, we found a bug caused by the confusingly similar field
> names {{userName}} and {{username}}. We should rename them so that they are
> easy to tell apart.





[jira] [Created] (MAPREDUCE-6079) Renaming JobImpl#username to reporterUserName

2014-09-09 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created MAPREDUCE-6079:
-

 Summary: Renaming JobImpl#username to reporterUserName
 Key: MAPREDUCE-6079
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6079
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Tsuyoshi OZAWA


In MAPREDUCE-6033, we found a bug caused by the confusingly similar field
names {{userName}} and {{username}}. We should rename them so that they are
easy to tell apart.


