[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984027#comment-13984027
 ] 

Sangjin Lee commented on MAPREDUCE-5861:


LGTM. I'd wait for committers to chime in. Thanks!

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch, MAPREDUCE-5861.2.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.
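For reference, a minimal sketch of the AtomicInteger variant suggested here (surrounding names taken from the snippet; note that per the later comments in this thread, the patch ultimately made the field non-volatile instead):
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Field, assuming it was declared "private volatile int finishedSubMaps":
private final AtomicInteger finishedSubMaps = new AtomicInteger(0);

// The snippet above, rewritten: incrementAndGet() performs the
// read-modify-write as a single atomic step, so no increment can be lost.
if (finishedSubMaps.incrementAndGet() == numMapTasks) {
  doneWithMaps = true;
}
{code}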



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly

2014-04-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983935#comment-13983935
 ] 

Sandy Ryza commented on MAPREDUCE-5862:
---

{code}
+checkRecordSpanningMultipleSplits("recordSpanningMultipleSplits.txt.bz2",
+  200 * 1000,
+  true);
{code}
indentation should be:
{code}
+checkRecordSpanningMultipleSplits("recordSpanningMultipleSplits.txt.bz2",
+200 * 1000, true);
{code}
I can fix these on commit.

Otherwise, the updated patch looks good to me.  [~jlowe], anything you see that 
I'm missing?

> Line records longer than 2x split size aren't handled correctly
> ---
>
> Key: MAPREDUCE-5862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: bc Wong
>Priority: Critical
> Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
>  0        100      200      300
>  |  split |  curr  |  split |
>          <--- record --->
>          90            240
> {noformat}
>   
> Currently, the first split reads the entire record, up to offset 240, 
> which is good. But the 2nd split has a bug: it produces a phantom record of 
> (200, 240).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-28 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983862#comment-13983862
 ] 

Tsz Wo Nicholas Sze commented on MAPREDUCE-5402:


Sure, I should be able to review this later this week.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, today I ran into exactly such a long tail using DistCpV2 and 
> DynamicInputFormat.  And when I tried to alleviate the problem by overriding 
> the number of mappers and the split ratio used by the DynamicInputFormat, I 
> was prevented from doing so by the hard-coded limit set in the code by the 
> MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, each of which had been running for over 2 hours.  The reason was that 
> each of the 12 files those mappers were copying was quite large (several 
> hundred megabytes in size) and took ~20 minutes to copy.  During this time, 
> all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min(numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.
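A one-line sketch of the capping idea described above (variable names hypothetical; not actual DistCp code):
{code}
// Cap chunks at the file count instead of failing on MAX_CHUNKS_TOLERABLE;
// the worst case is one file per chunk, i.e. maximum parallelization.
int numChunks = Math.min(numMaps * splitRatio, numFiles);
{code}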



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983851#comment-13983851
 ] 

Hadoop QA commented on MAPREDUCE-5861:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12642376/MAPREDUCE-5861.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4566//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4566//console

This message is automatically generated.

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch, MAPREDUCE-5861.2.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5221) Reduce side Combiner is not used when using the new API

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983850#comment-13983850
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5221:
---

ping.

> Reduce side Combiner is not used when using the new API
> ---
>
> Key: MAPREDUCE-5221
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5221
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5221.1.patch, MAPREDUCE-5221.2.patch, 
> MAPREDUCE-5221.3.patch, MAPREDUCE-5221.4.patch, MAPREDUCE-5221.5.patch, 
> MAPREDUCE-5221.6.patch, MAPREDUCE-5221.7-2.patch, MAPREDUCE-5221.7.patch, 
> MAPREDUCE-5221.8.patch, MAPREDUCE-5221.9.patch
>
>
> If a combiner is specified using o.a.h.mapreduce.Job.setCombinerClass, it 
> will be silently ignored on the reduce side, since the reduce-side usage is 
> only aware of the old-API combiner.
> This doesn't fail the job, since the new combiner key does not deprecate the 
> old key.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983845#comment-13983845
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5402:
---

Hi [~szetszwo], I saw that you've worked on distcp on Hadoop JIRA. Could you 
take a look and review this?

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, today I ran into exactly such a long tail using DistCpV2 and 
> DynamicInputFormat.  And when I tried to alleviate the problem by overriding 
> the number of mappers and the split ratio used by the DynamicInputFormat, I 
> was prevented from doing so by the hard-coded limit set in the code by the 
> MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, each of which had been running for over 2 hours.  The reason was that 
> each of the 12 files those mappers were copying was quite large (several 
> hundred megabytes in size) and took ~20 minutes to copy.  During this time, 
> all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min(numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be better than having an 
> arbitrary hard-coded limit that *prevents* proper parallelization when 
> dealing with large files and/or large numbers of mappers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983817#comment-13983817
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5861:
---

Confirmed that doneWithMaps is also accessed from only one thread. The latest 
patch makes not only finishedSubMaps but also doneWithMaps non-volatile.

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch, MAPREDUCE-5861.2.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5861:
--

Attachment: MAPREDUCE-5861.2.patch

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch, MAPREDUCE-5861.2.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983800#comment-13983800
 ] 

Hadoop QA commented on MAPREDUCE-5861:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12642364/MAPREDUCE-5861.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4565//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4565//console

This message is automatically generated.

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983792#comment-13983792
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5861:
---

Oops, I misread what you mentioned. As you said, we don't need to make 
finishedSubMaps volatile if it is only accessed from a single thread. I'll 
update the patch soon.

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983770#comment-13983770
 ] 

Sangjin Lee commented on MAPREDUCE-5861:


Hmm, I don't think AtomicInteger adds any benefit here. The current version of 
the code is *correct in terms of thread safety* (and I would argue even 
volatile is not needed). The atomic-increment concern arises only when the 
variable in question is read and updated by multiple threads. In this case, 
however, only one thread (the task runner) reads and writes this variable, so 
it gets thread safety trivially (via thread confinement).
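For illustration, a minimal sketch of the thread-confinement point (assumed shape, not the actual LocalContainerLauncher source):
{code}
// All reads and writes of finishedSubMaps happen on one thread, so a
// plain (non-volatile) int is safe: no other thread can ever observe a
// lost or stale update.
private int finishedSubMaps = 0;  // confined to the task-runner thread

void onSubMapFinished() {         // only ever called by the task runner
  if (++finishedSubMaps == numMapTasks) {
    doneWithMaps = true;
  }
}
{code}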


> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5861:
--

Assignee: Tsuyoshi OZAWA
  Status: Patch Available  (was: Open)

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Tsuyoshi OZAWA
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5861:
--

Attachment: MAPREDUCE-5861.1.patch

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: MAPREDUCE-5861.1.patch
>
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983715#comment-13983715
 ] 

Sangjin Lee commented on MAPREDUCE-5861:


Correction: even with the latest version, strictly speaking volatile is not 
necessary as long as the task runner is single-threaded.

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAPREDUCE-5830) HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3

2014-04-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-5830.


Resolution: Won't Fix

HIVE-6900 fixed it in Hive. Closing as Won't Fix unless there are other 
projects out there that need this.

MAPREDUCE-5857 is the ticket for providing users with similar functionality in 
Hadoop-2.

> HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3
> --
>
> Key: MAPREDUCE-5830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5830
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> HostUtil.getTaskLogUrl used to have a signature like this in Hadoop 2.3.0 and 
> earlier:
> public static String getTaskLogUrl(String taskTrackerHostName, String 
> httpPort, String taskAttemptID)
> but now has a signature like this:
> public static String getTaskLogUrl(String scheme, String taskTrackerHostName, 
> String httpPort, String taskAttemptID)
> This breaks source and binary backwards-compatibility.  MapReduce and Hive 
> both have references to this, so their jars compiled against 2.3 or earlier 
> do not work on 2.4.
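For illustration, one way the 2.3 signature could have been kept alongside the new one (a hypothetical sketch, not what was committed; per the resolution above, MAPREDUCE-5857 tracks the actual replacement):
{code}
// Hypothetical: retain the old signature and delegate to the new one with
// a default scheme, preserving source and binary compatibility.
public static String getTaskLogUrl(String taskTrackerHostName,
    String httpPort, String taskAttemptID) {
  return getTaskLogUrl("http", taskTrackerHostName, httpPort, taskAttemptID);
}
{code}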



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows

2014-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983447#comment-13983447
 ] 

Hadoop QA commented on MAPREDUCE-5866:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12642225/apache-yarn-1992.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.pipes.TestPipeApplication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4563//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4563//console

This message is automatically generated.

> TestFixedLengthInputFormat fails in windows
> ---
>
> Key: MAPREDUCE-5866
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5866
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-1992.0.patch
>
>
> org.apache.hadoop.mapred.TestFixedLengthInputFormat and 
> org.apache.hadoop.mapreduce.lib.input.TestFixedLengthInputFormat tests fail 
> on Windows



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly

2014-04-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983388#comment-13983388
 ] 

Hadoop QA commented on MAPREDUCE-5862:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12642294/0001-Handle-records-larger-than-2x-split-size.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

  org.apache.hadoop.mapred.TestLineRecordReader
  org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4564//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4564//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4564//console

This message is automatically generated.

> Line records longer than 2x split size aren't handled correctly
> ---
>
> Key: MAPREDUCE-5862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: bc Wong
>Priority: Critical
> Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
>  0        100      200      300
>  |  split |  curr  |  split |
>          <--- record --->
>          90            240
> {noformat}
>   
> Currently, the first split reads the entire record, up to offset 240, 
> which is good. But the 2nd split has a bug: it produces a phantom record of 
> (200, 240).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly

2014-04-28 Thread bc Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bc Wong updated MAPREDUCE-5862:
---

Attachment: 0001-Handle-records-larger-than-2x-split-size.1.patch

Thanks for taking a look, Jason. I fixed {{maxBytesToConsume}} instead, and 
added tests for the mapred variant.

I added tests for compressed input for sanity's sake. It's currently working, 
but while I'm here, why not?

> Line records longer than 2x split size aren't handled correctly
> ---
>
> Key: MAPREDUCE-5862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: bc Wong
>Priority: Critical
> Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
>  0        100      200      300
>  |  split |  curr  |  split |
>          <--- record --->
>          90            240
> {noformat}
>   
> Currently, the first split reads the entire record, up to offset 240, 
> which is good. But the 2nd split has a bug: it produces a phantom record of 
> (200, 240).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows

2014-04-28 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev moved YARN-1992 to MAPREDUCE-5866:


Key: MAPREDUCE-5866  (was: YARN-1992)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> TestFixedLengthInputFormat fails in windows
> ---
>
> Key: MAPREDUCE-5866
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5866
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-1992.0.patch
>
>
> org.apache.hadoop.mapred.TestFixedLengthInputFormat and 
> org.apache.hadoop.mapreduce.lib.input.TestFixedLengthInputFormat tests fail 
> on Windows



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5865) RecordReader is not closed in TeraInputFormat#writePartitionFile()

2014-04-28 Thread Ted Yu (JIRA)
Ted Yu created MAPREDUCE-5865:
-

 Summary: RecordReader is not closed in 
TeraInputFormat#writePartitionFile()
 Key: MAPREDUCE-5865
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5865
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ted Yu


Here is related code:
{code}
RecordReader reader = 
  inFormat.createRecordReader(splits.get(sampleStep * idx),
  context);
reader.initialize(splits.get(sampleStep * idx), context);
while (reader.nextKeyValue()) {
  sampler.addKey(new Text(reader.getCurrentKey()));
  records += 1;
  if (recordsPerSample <= records) {
break;
  }
}
{code}
reader should be closed in a finally block.
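A sketch of the suggested fix, keeping the snippet above intact and wrapping the loop:
{code}
RecordReader reader =
    inFormat.createRecordReader(splits.get(sampleStep * idx), context);
try {
  reader.initialize(splits.get(sampleStep * idx), context);
  while (reader.nextKeyValue()) {
    sampler.addKey(new Text(reader.getCurrentKey()));
    records += 1;
    if (recordsPerSample <= records) {
      break;
    }
  }
} finally {
  // Ensure the reader is released even if sampling throws.
  reader.close();
}
{code}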



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly

2014-04-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983179#comment-13983179
 ] 

Jason Lowe commented on MAPREDUCE-5862:
---

Note that I was not able to get the test to fail with compressed input even 
without the proposed fix.  The code already throws away the first record if it 
isn't the first split, and if that brings the reported position past the end of 
the current split then no records are reported for the split since 
getFilePosition > end on the first call to getNextValue().

The real issue is we aren't allowing a large enough read to occur for the first 
line read when using an uncompressed input.  Note that when we read a line 
during readNextValue() the max bytes to consume is computed as 
Math.max(maxBytesToConsume(pos), maxLineLength) but when we read the first 
"throw-away" record it is just maxBytesToConsume(pos).  This isn't an issue for 
compressed input since maxBytesToConsume always returns Integer.MAX_VALUE, but 
it's problematic for uncompressed input when the split size is less than the 
maximum line length.

Changing the record read during init to match the same max bytes computation 
that readNextValue() uses allows the test to pass, and it is a simpler change.  
Arguably maxBytesToConsume() should just take maxLineLength into account 
already so others using it in the future don't make similar mistakes for tiny 
split sizes.
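A sketch of that simpler change (hypothetical shape based on the description above; actual field and method names in LineRecordReader may differ):
{code}
// In initialize(): read the throw-away first line with the same byte
// budget that readNextValue() uses, so an uncompressed record longer
// than the split can still be skipped in one read.
int maxBytesToRead = Math.max(maxBytesToConsume(start), maxLineLength);
start += in.readLine(new Text(), 0, maxBytesToRead);
{code}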

A couple of other comments on the patch:
- There should be a corresponding test for mapred LineRecordReader
- The test sends down 9-byte splits but moves the offset 10 bytes each time.  
Seems to me "splitSize - 1" should be "splitSize" when constructing the 
FileSplits in readRecords.

> Line records longer than 2x split size aren't handled correctly
> ---
>
> Key: MAPREDUCE-5862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: bc Wong
>Priority: Critical
> Attachments: 0001-Handle-records-larger-than-2x-split-size.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
>  0        100      200      300
>  |  split |  curr  |  split |
>          <--- record --->
>          90            240
> {noformat}
>   
> Currently, the first split reads the entire record, up to offset 240, 
> which is good. But the 2nd split has a bug: it produces a phantom record of 
> (200, 240).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5812) Make job context available to OutputCommitter.isRecoverySupported()

2014-04-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983110#comment-13983110
 ] 

Hudson commented on MAPREDUCE-5812:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #5580 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5580/])
MAPREDUCE-5812. Make job context available to 
OutputCommitter.isRecoverySupported(). Contributed by Mohammad Kamrul Islam 
(jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1590668)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileOutputCommitter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/OutputCommitter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/OutputCommitter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java


>  Make job context available to OutputCommitter.isRecoverySupported()
> 
>
> Key: MAPREDUCE-5812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5812
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.3.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Fix For: 3.0.0, 2.5.0
>
> Attachments: MAPREDUCE-5812.1.patch, MAPREDUCE-5812.2.patch, 
> MAPREDUCE-5812.3.patch, MAPREDUCE-5812.4.patch, MAPREDUCE-5812.5.patch, 
> MAPREDUCE-5812.6.patch, MAPREDUCE-5812.7.patch
>
>
> Background
> ==
> Systems like Hive provide their own OutputCommitter implementation. A custom 
> implementation of isRecoverySupported() requires the task context: from 
> taskContext.getConfiguration(), Hive checks whether a Hive-specific property 
> is set, and based on its value returns true or false. However, the current 
> OutputCommitter.isRecoverySupported() provides no way of getting the task 
> config. As a result, the user can't turn the MRAM recovery feature on or off.
> Proposed resolution:
> ===
> 1. Pass the task context into the isRecoverySupported() method.
> Pros: Easy and clean.
> Cons: Possible backward-compatibility issue due to API changes. (Is it true?)
> 2. Call outputCommitter.setupTask(taskContext) from the MRAM: the new 
> OutputCommitter would store the context in a class-level variable and use it 
> from isRecoverySupported().
> Pros: No API changes and no backward-compatibility issue. This call can be 
> made from MRAppMaster.getOutputCommitter() for the old-API case.
> Cons: Might not be a very clean solution due to the class-level variable.
> Please give your comments.
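A sketch of option 1, keeping the no-arg method for compatibility (assumed shape; see the committed patch for the final API):
{code}
// New overload receives the job context, so a committer like Hive's can
// consult the job configuration when deciding whether recovery is
// supported; the default delegates to the old method.
public boolean isRecoverySupported(JobContext jobContext) throws IOException {
  return isRecoverySupported();
}
{code}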



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5812) Make job context available to OutputCommitter.isRecoverySupported()

2014-04-28 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5812:
--

   Resolution: Fixed
Fix Version/s: 2.5.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks, Mohammad!  I committed this to trunk and branch-2.

>  Make job context available to OutputCommitter.isRecoverySupported()
> 
>
> Key: MAPREDUCE-5812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5812
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.3.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Fix For: 3.0.0, 2.5.0
>
> Attachments: MAPREDUCE-5812.1.patch, MAPREDUCE-5812.2.patch, 
> MAPREDUCE-5812.3.patch, MAPREDUCE-5812.4.patch, MAPREDUCE-5812.5.patch, 
> MAPREDUCE-5812.6.patch, MAPREDUCE-5812.7.patch
>
>
> Background
> ==
> Systems like Hive provide their own OutputCommitter implementation. A custom 
> implementation of isRecoverySupported() requires the task context: from 
> taskContext.getConfiguration(), Hive checks whether a Hive-specific property 
> is set, and based on its value returns true or false. However, the current 
> OutputCommitter.isRecoverySupported() provides no way of getting the task 
> config. As a result, the user can't turn the MRAM recovery feature on or off.
> Proposed resolution:
> ===
> 1. Pass the task context into the isRecoverySupported() method.
> Pros: Easy and clean.
> Cons: Possible backward-compatibility issue due to API changes. (Is it true?)
> 2. Call outputCommitter.setupTask(taskContext) from the MRAM: the new 
> OutputCommitter would store the context in a class-level variable and use it 
> from isRecoverySupported().
> Pros: No API changes and no backward-compatibility issue. This call can be 
> made from MRAppMaster.getOutputCommitter() for the old-API case.
> Cons: Might not be a very clean solution due to the class-level variable.
> Please give your comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5861) LocalContainerLauncher does not atomically update volatile variable finishedSubMaps

2014-04-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983096#comment-13983096
 ] 

Sangjin Lee commented on MAPREDUCE-5861:


It was made volatile as part of MAPREDUCE-5841.

However, before that patch the subtask runner was single-threaded. So, I don't 
think volatile was necessary then (it is now).

> LocalContainerLauncher does not atomically update volatile variable 
> finishedSubMaps
> ---
>
> Key: MAPREDUCE-5861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Around line 374:
> {code}
>   if (++finishedSubMaps == numMapTasks) {
> doneWithMaps = true;
>   }
> {code}
> The increment of finishedSubMaps is not atomic.
> See the answer to 
> http://stackoverflow.com/questions/9749746/what-is-the-difference-of-atomic-volatile-synchronize
>  .
> AtomicInteger can be used to achieve atomicity.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5812) Make job context available to OutputCommitter.isRecoverySupported()

2014-04-28 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5812:
--

  Issue Type: Improvement  (was: Bug)
 Summary:  Make job context available to 
OutputCommitter.isRecoverySupported()  (was:  Make task context available to 
OutputCommitter.isRecoverySupported())
Hadoop Flags: Reviewed

Thanks for addressing the java warnings, Mohammad.

+1 latest patch looks good to me.  Committing this.

>  Make job context available to OutputCommitter.isRecoverySupported()
> 
>
> Key: MAPREDUCE-5812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5812
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 2.3.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: MAPREDUCE-5812.1.patch, MAPREDUCE-5812.2.patch, 
> MAPREDUCE-5812.3.patch, MAPREDUCE-5812.4.patch, MAPREDUCE-5812.5.patch, 
> MAPREDUCE-5812.6.patch, MAPREDUCE-5812.7.patch
>
>
> Background
> ==
> Systems like Hive provide their own OutputCommitter implementation. A custom 
> implementation of isRecoverySupported() requires the task context: from 
> taskContext.getConfiguration(), Hive checks whether a Hive-specific property 
> is set, and based on its value returns true or false. However, the current 
> OutputCommitter.isRecoverySupported() provides no way of getting the task 
> config. As a result, the user can't turn the MRAM recovery feature on or off.
> Proposed resolution:
> ===
> 1. Pass the task context into the isRecoverySupported() method.
> Pros: Easy and clean.
> Cons: Possible backward-compatibility issue due to API changes. (Is it true?)
> 2. Call outputCommitter.setupTask(taskContext) from the MRAM: the new 
> OutputCommitter would store the context in a class-level variable and use it 
> from isRecoverySupported().
> Pros: No API changes and no backward-compatibility issue. This call can be 
> made from MRAppMaster.getOutputCommitter() for the old-API case.
> Cons: Might not be a very clean solution due to the class-level variable.
> Please give your comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5864) The part-r-00000 did not generate when I run a sample, wordcount with non-ascii path.

2014-04-28 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982841#comment-13982841
 ] 

Binglin Chang commented on MAPREDUCE-5864:
--

More details:

{code}
2014-04-28 08:40:29,574 INFO org.apache.hadoop.mapred.Merger: Down to the last 
merge-pass, with 1 segments left of total size: 50 bytes
2014-04-28 08:40:29,705 INFO org.apache.hadoop.mapred.Task: 
Task:attempt_201404220359_0033_r_00_0 is done. And is in the process of 
commiting
2014-04-28 08:40:30,774 INFO org.apache.hadoop.mapred.Task: Task 
attempt_201404220359_0033_r_00_0 is allowed to commit now
2014-04-28 08:40:30,785 INFO 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of 
task 'attempt_201404220359_0033_r_00_0' to /dex/en??/output
2014-04-28 08:40:30,790 INFO org.apache.hadoop.mapred.Task: Task 
'attempt_201404220359_0033_r_00_0' done.

[serengeti@wdc-vhadp-pub2-dhcp-72-245 ~]$ hadoop fs -ls /dex/en??/output
-rw-r--r--   3 serengeti hadoop  0 2014-04-28 08:40 
/dex/en??/output/_SUCCESS
-rw-r--r--   3 serengeti hadoop 36 2014-04-28 08:40 
/dex/en??/output/part-r-0
drwxrwxr-x   - serengeti hadoop  0 2014-04-28 08:40 
/dex/en中文/output/_logs
{code}

Looks like some of the Chinese characters are replaced with "??". The _SUCCESS 
and part files are affected, but the _logs directory is just fine.



> The part-r-00000 did not generate when I run a sample, wordcount with 
> non-ascii path.
> -
>
> Key: MAPREDUCE-5864
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5864
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 1.2.1
> Environment: CentOS 
>Reporter: Lizhao.Du
>Assignee: Binglin Chang
> Fix For: 1.2.1
>
> Attachments: HadoopMapReduce_nonASCII_PATH.png
>
>
> When I run the command hadoop jar /opt/serengeti/hadoop-examples-1.2.1.jar 
> wordcount /user/Administrator/测试/input /user/Administrator/测试/output, the 
> output suggests that it succeeded. However, only _logs is generated under 
> /user/Administrator/测试/output; the _SUCCESS and part-r-00000 files are not 
> generated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAPREDUCE-5864) The part-r-00000 did not generate when I run a sample, wordcount with non-ascii path.

2014-04-28 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang reassigned MAPREDUCE-5864:


Assignee: Binglin Chang

> The part-r-00000 did not generate when I run a sample, wordcount with 
> non-ascii path.
> -
>
> Key: MAPREDUCE-5864
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5864
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 1.2.1
> Environment: CentOS 
>Reporter: Lizhao.Du
>Assignee: Binglin Chang
> Fix For: 1.2.1
>
> Attachments: HadoopMapReduce_nonASCII_PATH.png
>
>
> When I run the command hadoop jar /opt/serengeti/hadoop-examples-1.2.1.jar 
> wordcount /user/Administrator/测试/input /user/Administrator/测试/output, the 
> output suggests that it succeeded. However, only _logs is generated under 
> /user/Administrator/测试/output; the _SUCCESS and part-r-00000 files are not 
> generated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5864) The part-r-00000 did not generate when I run a sample, wordcount with non-ascii path.

2014-04-28 Thread Lizhao.Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lizhao.Du updated MAPREDUCE-5864:
-

Attachment: HadoopMapReduce_nonASCII_PATH.png

Screenshot of the failing command.

> The part-r-00000 did not generate when I run a sample, wordcount with 
> non-ascii path.
> -
>
> Key: MAPREDUCE-5864
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5864
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 1.2.1
> Environment: CentOS 
>Reporter: Lizhao.Du
> Fix For: 1.2.1
>
> Attachments: HadoopMapReduce_nonASCII_PATH.png
>
>
> When I run the command hadoop jar /opt/serengeti/hadoop-examples-1.2.1.jar 
> wordcount /user/Administrator/测试/input /user/Administrator/测试/output, the 
> output suggests that it succeeded. However, only _logs is generated under 
> /user/Administrator/测试/output; the _SUCCESS and part-r-00000 files are not 
> generated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5864) The part-r-00000 did not generate when I run a sample, wordcount with non-ascii path.

2014-04-28 Thread Lizhao.Du (JIRA)
Lizhao.Du created MAPREDUCE-5864:


 Summary: The part-r-00000 did not generate when I run a sample, 
wordcount with non-ascii path.
 Key: MAPREDUCE-5864
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5864
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 1.2.1
 Environment: CentOS 
Reporter: Lizhao.Du
 Fix For: 1.2.1


When I run the command hadoop jar /opt/serengeti/hadoop-examples-1.2.1.jar 
wordcount /user/Administrator/测试/input /user/Administrator/测试/output, the 
output suggests that it succeeded. However, only _logs is generated under 
/user/Administrator/测试/output; the _SUCCESS and part-r-00000 files are not 
generated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)