[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116252#comment-14116252
 ] 

Hadoop QA commented on MAPREDUCE-5891:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12665542/MAPREDUCE-5891-v3.patch
  against trunk revision 270a271.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4838//console

This message is automatically generated.

> Improved shuffle error handling across NM restarts
> --
>
> Key: MAPREDUCE-5891
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5891
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Junping Du
> Attachments: MAPREDUCE-5891-demo.patch, MAPREDUCE-5891-v2.patch, 
> MAPREDUCE-5891-v3.patch, MAPREDUCE-5891.patch
>
>
> To minimize the number of map fetch failures reported by reducers across an 
> NM restart, it would be nice if reducers only reported a fetch failure after 
> trying for a specified period of time to retrieve the data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-08-29 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-5891:
--

Attachment: MAPREDUCE-5891-v3.patch



[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-08-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116244#comment-14116244
 ] 

Junping Du commented on MAPREDUCE-5891:
---

Thanks [~jlowe] for comments!
bq. SHUFFLE_FETCH_TIMEOUT_MS should be 
"mapreduce.reduce.shuffle.fetch.retry.timeout-ms"
Nice catch, done.

bq. openConnectionWithRetry calls abortConnect if stopped, but the one caller 
of this function does the same thing when it returns. Maybe 
openConnectionWithRetry should just return if stopped?
Yes. The caller can also return directly, since the upper layer already handles 
it. Fixed.

bq. Nit: The code block in copyMapOutput's catch of IOException is getting 
really long. It would be good to refactor some of this code into methods. Minor 
nit: "get failed" should be "failed".
Done.

bq. openConnectionWithRetry is being called and retries even if fetch retry is 
disabled
Good point, fixed.

bq. Shouldn't we be setting retryStartTime back to zero instead of endTime 
below?
Also good one, fixed it. 

bq. Also wondering if we should reset it after each successful transfer (e.g.: 
after a successful header parse and successful shuffle)?
May not be necessary. If retryStartTime is not 0, this fetcher hasn't 
successfully made any progress since the last failure of getMapOutput, so it 
should keep retrying and accumulating wait time until the timeout. 
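The retry-window accounting being discussed can be sketched as a tiny standalone model. The class and method names below are illustrative only, not the actual Hadoop Fetcher code; the reset-to-zero behavior follows Jason's review comment.

```java
// Simplified sketch of the fetch-retry timeout window discussed above.
// Illustrative names; not the actual org.apache.hadoop.mapreduce.task.reduce.Fetcher.
public class RetryWindow {
    private long retryStartTime = 0;      // 0 means "no failure window open"
    private final long fetchRetryTimeoutMs;

    public RetryWindow(long fetchRetryTimeoutMs) {
        this.fetchRetryTimeoutMs = fetchRetryTimeoutMs;
    }

    /** Called on a fetch failure; returns true if we should keep retrying. */
    public boolean onFailure(long nowMs) {
        if (retryStartTime == 0) {
            retryStartTime = nowMs;       // first failure opens the window
        }
        return nowMs - retryStartTime < fetchRetryTimeoutMs;
    }

    /** Called after a successful transfer: reset to zero, not to endTime. */
    public void onSuccess() {
        retryStartTime = 0;
    }
}
```

Resetting to zero (rather than to the transfer's end time) ensures a long successful transfer cannot cause the next error to time out without any retry.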



[jira] [Updated] (MAPREDUCE-2841) Task level native optimization

2014-08-29 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2841:
---

Attachment: micro-benchmark.txt

Attaching some micro-benchmark results that compare FB's collector 
(BlockMapOutputBuffer), the current MR2 collector (MapTask$MapOutputBuffer), 
and the new native collector from this JIRA.

The test here is running 30 map tasks within a single JVM, which gives the JIT 
a lot of time to warm up. Even the native collector benefits from some warmup 
(first run takes 1600ms CPU whereas later runs take 1000ms CPU time). In the 
table below, I'll express "Post-JIT" as the fastest runtime of all of the runs.

To summarize results:

*Current collector:*
First task: 7020ms CPU
Post-JIT: 5730ms CPU

*FB collector:*
First task: 4100ms CPU
Post-JIT: 2010ms CPU

*Native collector:*
First task: 1620ms CPU
Post-JIT: 970ms CPU

If you assume that people generally don't use JVM reuse, then the "first task" 
runtimes are most interesting. In that case, the native collector is winning by 
>4x vs status quo, and still >2x vs the best known Java collector. 
Additionally, the FB collector _only_ supports BytesWritable as far as I know, 
whereas the native one supports many common writable types and is also user- 
(or framework-) extensible.

If you consider the post-JIT results, the difference isn't quite as striking, 
but still pretty significant.
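The ">4x" and ">2x" claims follow directly from the first-task CPU times in the table above; a quick check (class name is illustrative):

```java
// Sanity-check the speedup ratios quoted above from the micro-benchmark numbers.
public class SpeedupCheck {
    // First-task CPU times (ms) from the table above.
    static final double CURRENT_MS = 7020, FB_MS = 4100, NATIVE_MS = 1620;

    static double speedup(double baselineMs, double candidateMs) {
        return baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        System.out.printf("native vs current collector: %.1fx%n",
                          speedup(CURRENT_MS, NATIVE_MS));
        System.out.printf("native vs FB collector:      %.1fx%n",
                          speedup(FB_MS, NATIVE_MS));
    }
}
```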

> Task level native optimization
> --
>
> Key: MAPREDUCE-2841
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
> Environment: x86-64 Linux/Unix
>Reporter: Binglin Chang
>Assignee: Sean Zhong
> Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, 
> MAPREDUCE-2841.v2.patch, dualpivot-0.patch, dualpivotv20-0.patch, 
> fb-shuffle.patch, hadoop-3.0-mapreduce-2841-2014-7-17.patch, 
> micro-benchmark.txt
>
>
> I've recently been working on native optimization for MapTask based on JNI. 
> The basic idea is to add a NativeMapOutputCollector to handle k/v pairs 
> emitted by the mapper, so that sort, spill, and IFile serialization can all 
> be done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed 
> promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string compare is 
> supported)
> 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if 
> hardware CRC32C is used, things can get much faster (1G/
> 3. Merge code is not completed yet, so the test uses enough io.sort.mb to 
> prevent mid-spill
> This leads to a total speedup of 2x~3x for the whole MapTask if 
> IdentityMapper (a mapper that does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are 
> supported, and I have not thought through many things yet, such as how to 
> support map-side combine. I had some discussion with somebody familiar with 
> Hive, and it seems these limitations won't be much of a problem for Hive to 
> benefit from these optimizations, at least. Advice or discussion about 
> improving compatibility is most welcome :) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks whether the key/value types, comparator type, and combiner are 
> all compatible; MapTask can then choose to enable NativeMapOutputCollector.
> This is only a preliminary test; more work needs to be done. I expect better 
> final results, and I believe similar optimizations can be adopted for the 
> reduce task and shuffle too. 
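The canEnable() gating described in the quoted text can be sketched as follows. This is a simplification: the class name mirrors the description, but the supported-type set and the combiner check are illustrative stand-ins (the real NativeMapOutputCollector checks more conditions, such as the comparator type).

```java
import java.util.Set;

// Illustrative sketch of canEnable()-style gating: enable the native
// collector only when the key/value types are natively supported and
// no (as yet unsupported) map-side combiner is configured.
public class NativeCollectorGate {
    private static final Set<String> SUPPORTED = Set.of(
        "org.apache.hadoop.io.Text",
        "org.apache.hadoop.io.BytesWritable");

    public static boolean canEnable(String keyClass, String valueClass,
                                    boolean hasCombiner) {
        if (hasCombiner) {
            return false; // map-side combine not yet supported, per the description
        }
        return SUPPORTED.contains(keyClass) && SUPPORTED.contains(valueClass);
    }
}
```

MapTask would call such a check once at setup and fall back to the Java collector when it returns false, so incompatible jobs keep working unchanged.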





[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

2014-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115958#comment-14115958
 ] 

Jason Lowe commented on MAPREDUCE-5891:
---

Thanks for updating the patch!  Comments:

SHUFFLE_FETCH_TIMEOUT_MS = "mapreduce.reduce.shuffle.fetch.timeout-ms" but it 
should be "mapreduce.reduce.shuffle.fetch.retry.timeout-ms"

openConnectionWithRetry calls abortConnect if stopped, but the one caller of 
this function does the same thing when it returns.  Maybe 
openConnectionWithRetry should just return if stopped?

Nit: The code block in copyMapOutput's catch of IOException is getting really 
long.  It would be good to refactor some of this code into methods

Minor nit: "get failed" should be "failed".

openConnectionWithRetry is being called and retries even if fetch retry is 
disabled

Shouldn't we be setting retryStartTime back to zero instead of endTime below?  
Otherwise the next error could timeout without any retry if the transfer before 
the error took longer than the timeout interval.
{code}
  // Refresh retryStartTime as the map task makes progress if retried before.
  if (retryStartTime != 0) {
    retryStartTime = endTime;
  }
{code}
Also wondering if we should reset it after each successful transfer (e.g.: 
after a successful header parse and successful shuffle)?




[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

2014-08-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115900#comment-14115900
 ] 

Todd Lipcon commented on MAPREDUCE-2841:


Hey Joy. Nice to hear from you, and glad to hear the benchmark was useful.

A couple interesting points:

bq. Took the average of 3 runs after one warmup run (all in same JVM)

Do you typically enable JVM reuse? How many runs do you typically get within 
the same JVM in typical Qubole applications?

I found that, if I increase the number of runs within a JVM to 30 or 40, then 
the existing collector becomes nearly as efficient as the native one. But, it 
really takes this many runs for the JIT to fully kick in. So, one of the main 
advantages of the native collector isn't that C++ code is so much faster than 
JITted Java code, but rather that, in the context of a map task, we rarely have 
a process living long enough to get the full available performance of the JIT.

I ran some benchmarks with -XX:+PrintCompilation and found that the JIT was 
indeed kicking in on the first run. But, after many runs, some key functions 
got re-jitted and became much faster.

Given that most people I know do not enable JVM reuse, and even if they do, 
typically do not manage to run 30-40 tasks within a JVM, I think there is a 
significant boost to running precompiled code for this hot part of the code.

bq. Old Collector: 20.3s
bq. New Collector: 7.48s

This is comparing the MR2 collector vs the FB collector (BMOB?). Did you also 
try the native collector? It's interesting that your "old collector" runtimes 
are so slow. Did you tweak anything about the benchmark? On my system, the 
current MR2 collector pretty quickly gets down to <10sec.

bq.  I think query latency is absolutely the wrong benchmark for measuring the 
utility of these optimizations. The problem is Hive runtime (for example) is 
dominated by startup and launch overheads for these types of queries. But in a 
CPU/throughput bound cluster - the improvements would matter much more than 
straight line query latency improvements would indicate.

Agreed. That's why the benchmark also reports total CPU time. The native 
collector is single-threaded whereas the existing MR2 collector is 
multi-threaded. So even though the wall time of a single task may not improve 
that much, it's using significantly less CPU to do the same work (meaning in a 
real job you'll get better overall throughput and cluster utilization).






[jira] [Updated] (MAPREDUCE-5931) Validate SleepJob command line parameters

2014-08-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5931:
--

   Resolution: Fixed
Fix Version/s: 2.6.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Gera!  I committed this to trunk and branch-2.

> Validate SleepJob command line parameters
> -
>
> Key: MAPREDUCE-5931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5931
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.2.1, 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Minor
> Fix For: 2.6.0
>
> Attachments: MAPREDUCE-5931.v01.patch, MAPREDUCE-5931.v02.patch, 
> MAPREDUCE-5931.v03.patch, MAPREDUCE-5931.v04.patch
>
>
> This is a minor issue per se. I had a typo in my script specifying a negative 
> number of reducers for the SleepJob. It results in an exception that is far 
> from the root cause and appeared to be a serious issue with the map-side sort.
> {noformat}
> 2014-06-17 21:42:48,072 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Ignoring exception during close for 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector@972141f
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1447)
>   at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>   at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2014-06-17 21:42:48,075 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.IllegalArgumentException
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
>   at org.apache.hadoop.mapred.SpillRecord.(SpillRecord.java:51)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1824)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1484)
>   at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:173)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {noformat}
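The root cause in the second stack trace is easy to reproduce: ByteBuffer.allocate rejects a negative capacity, which is what a negative reducer count eventually produces deep inside the map-side sort. Validating the parameter up front surfaces the error at its source. The sketch below is illustrative (the per-partition record size and the validation helper are assumptions, not the actual patch code):

```java
import java.nio.ByteBuffer;

public class ValidateDemo {
    /** Reproduces the obscure late failure from the stack trace above. */
    static boolean allocateFails(int capacity) {
        try {
            ByteBuffer.allocate(capacity); // throws IllegalArgumentException if capacity < 0
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    /** Early validation, in the spirit of MAPREDUCE-5931 (sketch). */
    static void validateNumReduces(int numReduces) {
        if (numReduces < 0) {
            throw new IllegalArgumentException(
                "numReduces must be >= 0: " + numReduces);
        }
    }
}
```

With the early check, the job fails immediately with a message naming the bad parameter instead of an IllegalArgumentException buried in SpillRecord.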





[jira] [Commented] (MAPREDUCE-5931) Validate SleepJob command line parameters

2014-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115708#comment-14115708
 ] 

Jason Lowe commented on MAPREDUCE-5931:
---

The test failures are unrelated bind exception failures.

+1 lgtm.  Committing this.



[jira] [Commented] (MAPREDUCE-5931) Validate SleepJob command line parameters

2014-08-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115587#comment-14115587
 ] 

Hadoop QA commented on MAPREDUCE-5931:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12665134/MAPREDUCE-5931.v04.patch
  against trunk revision 4bd0194.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect
  
org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4837//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4837//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5977) Fix or suppress native-task gcc warnings

2014-08-29 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115166#comment-14115166
 ] 

Binglin Chang commented on MAPREDUCE-5977:
--

decster:~/projects/hadoop-trunk> git la
2014-08-27 bfd1d75 (Todd Lipcon): MAPREDUCE-6054. native-task: Speed up tests. 
Contributed by Todd Lipcon.
2014-08-27 fad4524 (Todd Lipcon): MAPREDUCE-5977. Fix or suppress native-task 
gcc warnings. Contributed by Manu Zhang.

> Fix or suppress native-task gcc warnings
> 
>
> Key: MAPREDUCE-5977
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5977
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Reporter: Todd Lipcon
>Assignee: Manu Zhang
> Attachments: gcc_compile.log, mapreduce-5977-v2.txt, 
> mapreduce-5977-v3.txt, mapreduce-5977.txt
>
>
> Currently, building the native task code on gcc 4.8 has a fair number of 
> warnings. We should fix or suppress them so that new warnings are easier to 
> see.





[jira] [Commented] (MAPREDUCE-5977) Fix or suppress native-task gcc warnings

2014-08-29 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115162#comment-14115162
 ] 

Binglin Chang commented on MAPREDUCE-5977:
--

Hadoop has moved to git, see https://wiki.apache.org/hadoop/HowToCommitWithGit



[jira] [Commented] (MAPREDUCE-5977) Fix or suppress native-task gcc warnings

2014-08-29 Thread Manu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115154#comment-14115154
 ] 

Manu Zhang commented on MAPREDUCE-5977:
---

Hi Todd. I haven't seen this committed yet. Anything wrong with git?



[jira] [Commented] (MAPREDUCE-5050) Cannot find partition.lst in Terasort on Hadoop/Local File System

2014-08-29 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115061#comment-14115061
 ] 

Ewan Higgs commented on MAPREDUCE-5050:
---

This is a duplicate of MAPREDUCE-5528.

> Cannot find partition.lst in Terasort on Hadoop/Local File System
> -
>
> Key: MAPREDUCE-5050
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5050
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 0.20.2
> Environment: Cloudera VM CDH3u4, VMWare, Linux, Java SE 1.6.0_31-b04
>Reporter: Matt Parker
>Priority: Minor
>
> I'm trying to simulate running Hadoop on Lustre by configuring it to use the 
> local file system using a single cloudera VM (cdh3u4).
> I can generate the data just fine, but when running the sorting portion of 
> the program, I get an error about not being able to find the _partition.lst 
> file. It exists in the generated data directory.
> Perusing the TeraSort code, I see that the run method has a Path reference 
> to partition.lst which is created with the parent directory.
>   public int run(String[] args) throws Exception {
>LOG.info("starting");
>   JobConf job = (JobConf) getConf();
> >>  Path inputDir = new Path(args[0]);
> >>  inputDir = inputDir.makeQualified(inputDir.getFileSystem(job));
> >>  Path partitionFile = new Path(inputDir, 
> >> TeraInputFormat.PARTITION_FILENAME);
>   URI partitionUri = new URI(partitionFile.toString() +
>"#" + TeraInputFormat.PARTITION_FILENAME);
>   TeraInputFormat.setInputPaths(job, new Path(args[0]));
>   FileOutputFormat.setOutputPath(job, new Path(args[1]));
>   job.setJobName("TeraSort");
>   job.setJarByClass(TeraSort.class);
>   job.setOutputKeyClass(Text.class);
>   job.setOutputValueClass(Text.class);
>   job.setInputFormat(TeraInputFormat.class);
>   job.setOutputFormat(TeraOutputFormat.class);
>   job.setPartitionerClass(TotalOrderPartitioner.class);
>   TeraInputFormat.writePartitionFile(job, partitionFile);
>   DistributedCache.addCacheFile(partitionUri, job);
>   DistributedCache.createSymlink(job);
>   job.setInt("dfs.replication", 1);
>   TeraOutputFormat.setFinalSync(job, true);
>   JobClient.runJob(job);
>   LOG.info("done");
>   return 0;
>   }
> But in the configure method, the Path isn't created with the parent directory 
> reference.
> public void configure(JobConf job) {
>   try {
>     FileSystem fs = FileSystem.getLocal(job);
> >>  Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
>     splitPoints = readPartitions(fs, partFile, job);
>     trie = buildTrie(splitPoints, 0, splitPoints.length, new Text(), 2);
>   } catch (IOException ie) {
>     throw new IllegalArgumentException("can't read paritions file", ie);
>   }
> }
> I modified the code as follows, and now the sorting portion of the TeraSort 
> test works using the general file system. I think the above code is a bug.
> public void configure(JobConf job) {
>   try {
>     FileSystem fs = FileSystem.getLocal(job);
> >>  Path[] inputPaths = TeraInputFormat.getInputPaths(job);
> >>  Path partFile = new Path(inputPaths[0], TeraInputFormat.PARTITION_FILENAME);
>     splitPoints = readPartitions(fs, partFile, job);
>     trie = buildTrie(splitPoints, 0, splitPoints.length, new Text(), 2);
>   } catch (IOException ie) {
>     throw new IllegalArgumentException("can't read paritions file", ie);
>   }
> }
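The failure mode of the original configure() can be seen with plain java.nio.file: a bare relative path resolves against whatever the process working directory happens to be, which is the container directory (holding the cache symlinks) under YARN but the launch directory in a plain local-filesystem run. The directories below are made up for illustration:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelativePathSketch {
    // new Path(PARTITION_FILENAME) behaves like a bare relative path:
    // it resolves against the current working directory.
    static Path resolveInCwd(String workingDir, String fileName) {
        return Paths.get(workingDir).resolve(fileName);
    }

    public static void main(String[] args) {
        // Under YARN the cwd is the container dir where the symlink lives:
        System.out.println(resolveInCwd("/tmp/container_0001", "_partition.lst"));
        // Under file:// the cwd is wherever the job was launched, so the
        // lookup misses the file that actually sits in the input directory:
        System.out.println(resolveInCwd("/home/user", "_partition.lst"));
    }
}
```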



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-6056) nativetask: move system test working dir to target dir and cleanup test config xml files

2014-08-29 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115051#comment-14115051
 ] 

Binglin Chang commented on MAPREDUCE-6056:
--

Thanks for the patch, Manu. Some comments:
1. Most tests use System.getProperty("test.build.data", "target/test/data") to 
locate the test work dir; it would be better to follow that convention rather 
than hard-coding the path.
2. Those XML config files require an Apache license header.
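The convention referred to above can be followed with a one-line helper; the default value below matches the one quoted in the comment:

```java
public class TestDirSketch {
    // Standard Hadoop test idiom: honor -Dtest.build.data when set,
    // otherwise fall back to a directory under target/.
    static String testWorkDir() {
        return System.getProperty("test.build.data", "target/test/data");
    }

    public static void main(String[] args) {
        System.out.println(testWorkDir());
    }
}
```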


> nativetask: move system test working dir to target dir and cleanup test 
> config xml files
> 
>
> Key: MAPREDUCE-6056
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6056
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Reporter: Binglin Chang
>Assignee: Manu Zhang
>Priority: Minor
> Attachments: mapreduce-6056-v2.txt, mapreduce-6056.txt
>
>






[jira] [Updated] (MAPREDUCE-6056) nativetask: move system test working dir to target dir and cleanup test config xml files

2014-08-29 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang updated MAPREDUCE-6056:
--

Attachment: mapreduce-6056-v2.txt

Sorry, the newly added file was missing. I uploaded a new patch; it should be 
OK now.

> nativetask: move system test working dir to target dir and cleanup test 
> config xml files
> 
>
> Key: MAPREDUCE-6056
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6056
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: task
>Reporter: Binglin Chang
>Assignee: Manu Zhang
>Priority: Minor
> Attachments: mapreduce-6056-v2.txt, mapreduce-6056.txt
>
>






[jira] [Commented] (MAPREDUCE-5528) TeraSort fails with "can't read paritions file" - does not read partition file from distributed cache

2014-08-29 Thread Ewan Higgs (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115013#comment-14115013
 ] 

Ewan Higgs commented on MAPREDUCE-5528:
---

I have reproduced this issue.

> TeraSort fails with "can't read paritions file" - does not read partition 
> file from distributed cache
> -
>
> Key: MAPREDUCE-5528
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 3.0.0
>Reporter: Albert Chu
>Priority: Minor
> Attachments: MAPREDUCE-5528.patch
>
>
> I was trying to run TeraSort against a parallel networked file system, 
> setting things up via the 'file://' scheme.  I always got the following error 
> when running terasort:
> {noformat}
> 13/09/23 11:15:12 INFO mapreduce.Job: Task Id : attempt_1379960046506_0001_m_80_1, Status : FAILED
> Error: java.lang.IllegalArgumentException: can't read paritions file
> at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:678)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
> Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
> at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
> at org.apache.hadoop.util.Shell.run(Shell.java:417)
> at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
> at org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
> at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
> at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
> ... 10 more
> {noformat}
> After digging into TeraSort, I noticed that the partitions file was created 
> in the output directory, then added into the distributed cache
> {noformat}
> Path outputDir = new Path(args[1]);
> ...
> Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
> ...
> job.addCacheFile(partitionUri);
> {noformat}
> but the partitions file doesn't seem to be read back from the output 
> directory or distributed cache:
> {noformat}
> FileSystem fs = FileSystem.getLocal(conf);
> ...
> Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
> splitPoints = readPartitions(fs, partFile, conf);
> {noformat}
> It seems the file is being read from whatever the working directory is for 
> the filesystem returned from FileSystem.getLocal(conf).
> Under HDFS this code works; the working directory seems to be the distributed 
> cache (I guess by default?).
> But when I set things up with the networked file system and the 'file://' 
> scheme, the working directory was the directory I was running my Hadoop 
> binaries out of.
> The attached patch fixed things for me. It grabs the partition file from the 
> distributed cache all of the time, instead of trusting things underneath to 
> work out. It seems to be the right thing to do?
> Apologies, I was unable to get this to reproduce under the TeraSort example 
> tests, such as TestTeraSort.java, so no test added.  Not sure what the subtle 
> difference is in the setup.  I tested under both HDFS & 'file' scheme and the 
> patch worked under both.
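The patch's approach of reading the file from an explicit location, rather than trusting the working directory, can be sketched without Hadoop as a search over candidate directories. In the real patch the candidates come from the distributed-cache APIs; the helper below is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class PartitionLookupSketch {
    // Return the first candidate directory that actually contains the file,
    // instead of assuming it is in the process working directory.
    static Optional<Path> locate(String fileName, String... candidateDirs) {
        for (String dir : candidateDirs) {
            Path p = Paths.get(dir, fileName);
            if (Files.exists(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a localized cache dir containing the partition file.
        Path cacheDir = Files.createTempDirectory("cache");
        Files.createFile(cacheDir.resolve("_partition.lst"));
        System.out.println(locate("_partition.lst", "/no-such-dir", cacheDir.toString()).isPresent());
    }
}
```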


