[jira] [Commented] (MAPREDUCE-463) The job setup and cleanup tasks should be optional

2011-08-11 Thread yuling (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083946#comment-13083946
 ] 

yuling commented on MAPREDUCE-463:
--

_temporary is created in the setup/cleanup phase and is used by some 
OutputFormats. When mapred.committer.job.setup.cleanup.needed is set to false, 
we will have to create _temporary ourselves before we use it.
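
A minimal sketch of the "create it yourself" step described above. It uses
java.nio.file on a local path purely for illustration; a real OutputFormat
would go through the Hadoop FileSystem API instead. The class and method
names (TemporaryDirSketch, ensureTemporaryDir) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TemporaryDirSketch {
    // Directory name taken from the comment above.
    static final String TEMP_DIR_NAME = "_temporary";

    // Creates <outputDir>/_temporary if it is missing and returns its path.
    // Safe to call more than once: an existing directory is left untouched.
    static Path ensureTemporaryDir(Path outputDir) {
        try {
            return Files.createDirectories(outputDir.resolve(TEMP_DIR_NAME));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The call is idempotent, so it could be made defensively right before the
first write regardless of whether the setup task ran.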

> The job setup and cleanup tasks should be optional
> --
>
> Key: MAPREDUCE-463
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-463
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.21.0
>
> Attachments: patch-463-1.txt, patch-463.txt, patch-5785-1.txt, 
> patch-5785-2.txt, patch-5785-3.txt, patch-5785-4.txt, patch-5785-5.txt, 
> patch-5785.txt
>
>
> For jobs that require low latency and do not require setup or cleanup tasks 
> for the job, it should be possible to turn them off for that job.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2129) Job may hang if mapreduce.job.committer.setup.cleanup.needed=false and mapreduce.map/reduce.failures.maxpercent>0

2011-08-11 Thread yuling (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083940#comment-13083940
 ] 

yuling commented on MAPREDUCE-2129:
---

Good patch, got it.
mapreduce.job.committer.setup.cleanup.needed was added by MAPREDUCE-463.

> Job may hang if mapreduce.job.committer.setup.cleanup.needed=false and 
> mapreduce.map/reduce.failures.maxpercent>0
> -
>
> Key: MAPREDUCE-2129
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2129
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0
>Reporter: Kang Xiao
>  Labels: hadoop
> Attachments: MAPREDUCE-2129.patch, MAPREDUCE-2129.patch
>
>
> Job may hang at RUNNING state if 
> mapreduce.job.committer.setup.cleanup.needed=false and 
> mapreduce.map/reduce.failures.maxpercent>0. It happens when some tasks fail 
> but haven't reached failures.maxpercent.





[jira] [Updated] (MAPREDUCE-2839) MR Jobs fail on a secure cluster with viewfs

2011-08-11 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-2839:
--

Status: Patch Available  (was: Open)

The call to getDelegationToken is retained for now so that hftp keeps working. 
However, this will end up fetching the delegation token twice for other 
filesystems that implement both APIs.

> MR Jobs fail on a secure cluster with viewfs
> 
>
> Key: MAPREDUCE-2839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.23.0
>
> Attachments: MR2839_0.patch
>
>
> TokenCache needs to use the new FileSystem.getDelegationTokens api for it to 
> work with viewfs.





[jira] [Updated] (MAPREDUCE-2839) MR Jobs fail on a secure cluster with viewfs

2011-08-11 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-2839:
--

Attachment: MR2839_0.patch

> MR Jobs fail on a secure cluster with viewfs
> 
>
> Key: MAPREDUCE-2839
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2839
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.23.0
>
> Attachments: MR2839_0.patch
>
>
> TokenCache needs to use the new FileSystem.getDelegationTokens api for it to 
> work with viewfs.





[jira] [Created] (MAPREDUCE-2839) MR Jobs fail on a secure cluster with viewfs

2011-08-11 Thread Siddharth Seth (JIRA)
MR Jobs fail on a secure cluster with viewfs


 Key: MAPREDUCE-2839
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2839
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: 0.23.0


TokenCache needs to use the new FileSystem.getDelegationTokens api for it to 
work with viewfs.





[jira] [Commented] (MAPREDUCE-2187) map tasks timeout during sorting

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083918#comment-13083918
 ] 

Hudson commented on MAPREDUCE-2187:
---

Integrated in Hadoop-Common-trunk-Commit #736 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/736/])
MAPREDUCE-2187 - Missed adding the file

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156962
Files : 
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestCombineOutputCollector.java


> map tasks timeout during sorting
> 
>
> Key: MAPREDUCE-2187
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2187
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2, 0.20.205.0
>Reporter: Gianmarco De Francisci Morales
>Assignee: Anupam Seth
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2187-20-security-v2.patch, 
> MAPREDUCE-2187-20-security.patch, MAPREDUCE-2187-22.patch, 
> MAPREDUCE-2187-MR-279-v2.patch, MAPREDUCE-2187-branch-MR-279.patch, 
> MAPREDUCE-2187-trunk-v2.patch, MAPREDUCE-2187-trunk-v3.patch, 
> MAPREDUCE-2187-trunk.patch
>
>
> During the execution of a large job, the map tasks timeout:
> {code}
> INFO mapred.JobClient: Task Id : attempt_201010290414_60974_m_57_1, 
> Status : FAILED
> Task attempt_201010290414_60974_m_57_1 failed to report status for 609 
> seconds. Killing!
> {code}
> The bug is in the fact that the mapper has already finished, and, according 
> to the logs, the timeout occurs during the merge sort phase.
> The intermediate data generated by the map task is quite large. So I think 
> this is the problem.
> The logs show that the merge-sort was running for 10 minutes when the task 
> was killed.
> I think the mapred.Merger should call Reporter.progress() somewhere.
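
The suggestion in the last quoted line can be sketched as a merge loop that
pings a progress callback every so many records. This is only an
illustration, not the actual mapred.Merger code: ProgressCallback is a
hypothetical stand-in for Hadoop's Reporter, and PROGRESS_INTERVAL is an
assumed value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for Hadoop's progress callback (not the real Reporter type).
interface ProgressCallback {
    void progress();
}

class ProgressReportingMerge {
    // Report liveness once per this many merged records (assumed value).
    static final int PROGRESS_INTERVAL = 10_000;

    // One sorted input run together with its current head element.
    private static final class Cursor {
        final Iterator<Integer> it;
        int head;

        Cursor(Iterator<Integer> it) {
            this.it = it;
            this.head = it.next();
        }
    }

    // K-way merge of sorted runs that calls reporter.progress() periodically,
    // so a long merge is not mistaken for a hung task.
    static List<Integer> merge(List<List<Integer>> sortedRuns, ProgressCallback reporter) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
        for (List<Integer> run : sortedRuns) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run.iterator()));
            }
        }
        List<Integer> out = new ArrayList<>();
        long merged = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.it.hasNext()) {
                c.head = c.it.next();
                heap.add(c);
            }
            if (++merged % PROGRESS_INTERVAL == 0) {
                reporter.progress();
            }
        }
        reporter.progress(); // final ping once the merge completes
        return out;
    }
}
```

Reporting by record count keeps the overhead negligible; a time-based
interval would be an equally reasonable choice.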





[jira] [Commented] (MAPREDUCE-2187) map tasks timeout during sorting

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083913#comment-13083913
 ] 

Hudson commented on MAPREDUCE-2187:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #762 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/762/])
MAPREDUCE-2187 - Missed adding the file

acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156962
Files : 
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestCombineOutputCollector.java


> map tasks timeout during sorting
> 
>
> Key: MAPREDUCE-2187
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2187
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2, 0.20.205.0
>Reporter: Gianmarco De Francisci Morales
>Assignee: Anupam Seth
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2187-20-security-v2.patch, 
> MAPREDUCE-2187-20-security.patch, MAPREDUCE-2187-22.patch, 
> MAPREDUCE-2187-MR-279-v2.patch, MAPREDUCE-2187-branch-MR-279.patch, 
> MAPREDUCE-2187-trunk-v2.patch, MAPREDUCE-2187-trunk-v3.patch, 
> MAPREDUCE-2187-trunk.patch
>
>
> During the execution of a large job, the map tasks timeout:
> {code}
> INFO mapred.JobClient: Task Id : attempt_201010290414_60974_m_57_1, 
> Status : FAILED
> Task attempt_201010290414_60974_m_57_1 failed to report status for 609 
> seconds. Killing!
> {code}
> The bug is in the fact that the mapper has already finished, and, according 
> to the logs, the timeout occurs during the merge sort phase.
> The intermediate data generated by the map task is quite large. So I think 
> this is the problem.
> The logs show that the merge-sort was running for 10 minutes when the task 
> was killed.
> I think the mapred.Merger should call Reporter.progress() somewhere.





[jira] [Updated] (MAPREDUCE-2037) Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2037:
-

Attachment: MAPREDUCE-2037.patch

Minor update, messed up diff previously.

> Capturing interim progress times, CPU usage, and memory usage, when tasks 
> reach certain progress thresholds
> ---
>
> Key: MAPREDUCE-2037
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2037
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2037.patch, MAPREDUCE-2037.patch
>
>
> We would like to capture the following information at certain progress 
> thresholds as a task runs:
>* Time taken so far
>* CPU load [either at the time the data are taken, or exponentially 
> smoothed]
>* Memory load [also either at the time the data are taken, or 
> exponentially smoothed]
> This would be taken at intervals that depend on the task progress plateaus.  
> For example, reducers have three progress ranges -- [0-1/3], (1/3-2/3], and 
> (2/3-3/3] -- where fundamentally different activities happen.  Mappers have 
> different boundaries, I understand, that are not symmetrically placed.  Data 
> capture boundaries should coincide with activity boundaries.  For the state 
> information capture [CPU and memory] we should average over the covered 
> interval.
> This data would flow in with the heartbeats.  It would be placed in the job 
> history as part of the task attempt completion event, so it could be 
> processed by rumen or some similar tool and could drive a benchmark engine.
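
As a rough illustration of the proposal above, the sketch below assigns each
sample to one of the three reducer progress ranges and averages the load
over that range. It is not taken from the attached patch; the class name
ProgressRangeAverager and its methods are hypothetical, and plain averaging
is assumed rather than exponential smoothing.

```java
class ProgressRangeAverager {
    // Reducer range upper bounds from the description: [0, 1/3], (1/3, 2/3], (2/3, 1].
    static final double[] UPPER_BOUNDS = {1.0 / 3.0, 2.0 / 3.0, 1.0};

    private final double[] sum = new double[UPPER_BOUNDS.length];
    private final int[] count = new int[UPPER_BOUNDS.length];

    // Index of the range that contains the given progress value.
    static int rangeIndex(double progress) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (progress <= UPPER_BOUNDS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS.length - 1; // clamp values slightly above 1.0
    }

    // Records one load sample (CPU or memory) taken at the given progress.
    void addSample(double progress, double load) {
        int i = rangeIndex(progress);
        sum[i] += load;
        count[i]++;
    }

    // Average load over the range, or NaN if no sample fell into it.
    double average(int range) {
        return count[range] == 0 ? Double.NaN : sum[range] / count[range];
    }
}
```

Mappers would need a different UPPER_BOUNDS array, since the description
notes that their boundaries are not symmetrically placed.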





[jira] [Updated] (MAPREDUCE-2837) MR-279: Bug fixes ported from y-merge

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2837:
-

Attachment: rest.patch

Minor update, messed up diff previously.

> MR-279: Bug fixes ported from y-merge
> -
>
> Key: MAPREDUCE-2837
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2837
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
> Attachments: rest.patch, rest.patch
>
>
> Similar to MAPREDUCE-2679.





[jira] [Commented] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083846#comment-13083846
 ] 

Luke Lu commented on MAPREDUCE-901:
---

The port, which includes smarter counter limits (framework counters are excluded 
from the limits), lgtm. Deferring +1 to Jenkins. Some improvements could be made 
to binary compatibility (for most existing job jars), which we did not and do not 
promise; deferring those to a separate JIRA if binary compatibility becomes an issue.

> Move Framework Counters into a TaskMetric structure
> ---
>
> Key: MAPREDUCE-901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Owen O'Malley
>Assignee: Luke Lu
> Fix For: 0.23.0
>
> Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, 
> MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, 
> mr-901-trunk-v1.patch
>
>
> I think we should move all of the Counters that the framework updates into a 
> single class called TaskMetrics. TaskMetrics would have specific fields for 
> each of the metrics like input records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the 
> Counters down to just the user's counters) and decrease the latency for 
> updates to the JobTracker (since Counters are sent at most 1/minute instead 
> of 1/heartbeat).
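
To make the idea in the description concrete, here is a minimal sketch of a
structure with one dedicated field per framework metric. It only mirrors the
three metrics named above and is not the design of the attached patches; the
class name TaskMetricsSketch and its methods are hypothetical.

```java
class TaskMetricsSketch {
    // One plain field per framework metric instead of generic named counters.
    private long inputRecords;
    private long inputBytes;
    private long outputRecords;

    // Accounts for a batch of consumed input.
    void addInput(long records, long bytes) {
        inputRecords += records;
        inputBytes += bytes;
    }

    // Accounts for emitted output records.
    void addOutputRecords(long records) {
        outputRecords += records;
    }

    long getInputRecords() {
        return inputRecords;
    }

    long getInputBytes() {
        return inputBytes;
    }

    long getOutputRecords() {
        return outputRecords;
    }
}
```

Fixed fields like these serialize as a few longs, without the per-counter
names that a generic Counters map would carry.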





[jira] [Created] (MAPREDUCE-2838) to fix mapreduce builds to use the new hadoop common test jars

2011-08-11 Thread Giridharan Kesavan (JIRA)
to fix mapreduce builds to use the new hadoop common test jars
--

 Key: MAPREDUCE-2838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2838
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan


mapreduce builds are still resolving the old hadoop-common-test jars. Instead, 
ivy classifiers should be used to resolve the new hadoop-common test jars, as 
maven publishes test jars with the classifier "tests" and not as a separate artifact.

[ivy:resolve]   [SUCCESSFUL ] org.apache.hadoop#hadoop-common;0.23.0-SNAPSHOT!hadoop-common.jar (1979ms)
[ivy:resolve] downloading https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common-test/0.23.0-SNAPSHOT/hadoop-common-test-0.23.0-20110727.191243-218.jar ...
[ivy:resolve]  (885kB)





[jira] [Updated] (MAPREDUCE-2837) MR-279: Bug fixes ported from y-merge

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2837:
-

Status: Patch Available  (was: Open)

> MR-279: Bug fixes ported from y-merge
> -
>
> Key: MAPREDUCE-2837
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2837
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
> Attachments: rest.patch
>
>
> Similar to MAPREDUCE-2679.





[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-901:


Fix Version/s: 0.23.0
   Status: Patch Available  (was: Open)

> Move Framework Counters into a TaskMetric structure
> ---
>
> Key: MAPREDUCE-901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Owen O'Malley
>Assignee: Luke Lu
> Fix For: 0.23.0
>
> Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, 
> MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, 
> mr-901-trunk-v1.patch
>
>
> I think we should move all of the Counters that the framework updates into a 
> single class called TaskMetrics. TaskMetrics would have specific fields for 
> each of the metrics like input records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the 
> Counters down to just the user's counters) and decrease the latency for 
> updates to the JobTracker (since Counters are sent at most 1/minute instead 
> of 1/heartbeat).





[jira] [Updated] (MAPREDUCE-2037) Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2037:
-

Fix Version/s: 0.23.0
   Status: Patch Available  (was: Open)

> Capturing interim progress times, CPU usage, and memory usage, when tasks 
> reach certain progress thresholds
> ---
>
> Key: MAPREDUCE-2037
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2037
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2037.patch
>
>
> We would like to capture the following information at certain progress 
> thresholds as a task runs:
>* Time taken so far
>* CPU load [either at the time the data are taken, or exponentially 
> smoothed]
>* Memory load [also either at the time the data are taken, or 
> exponentially smoothed]
> This would be taken at intervals that depend on the task progress plateaus.  
> For example, reducers have three progress ranges -- [0-1/3], (1/3-2/3], and 
> (2/3-3/3] -- where fundamentally different activities happen.  Mappers have 
> different boundaries, I understand, that are not symmetrically placed.  Data 
> capture boundaries should coincide with activity boundaries.  For the state 
> information capture [CPU and memory] we should average over the covered 
> interval.
> This data would flow in with the heartbeats.  It would be placed in the job 
> history as part of the task attempt completion event, so it could be 
> processed by rumen or some similar tool and could drive a benchmark engine.





[jira] [Updated] (MAPREDUCE-2837) MR-279: Bug fixes ported from y-merge

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2837:
-

Attachment: rest.patch

From y-merge, the most important fixes are to make MapOutputFile pluggable for 
MAPREDUCE-279 as we prepare to merge.

> MR-279: Bug fixes ported from y-merge
> -
>
> Key: MAPREDUCE-2837
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2837
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
> Attachments: rest.patch
>
>
> Similar to MAPREDUCE-2679.





[jira] [Created] (MAPREDUCE-2837) MR-279: Bug fixes ported from y-merge

2011-08-11 Thread Arun C Murthy (JIRA)
MR-279: Bug fixes ported from y-merge
-

 Key: MAPREDUCE-2837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2837
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arun C Murthy
 Attachments: rest.patch

Similar to MAPREDUCE-2679.





[jira] [Updated] (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-901:


Attachment: MAPREDUCE-901.patch

Patch ported from y-merge branch for ensuring we can merge MAPREDUCE-901 to 
trunk. Credit, of course, goes to Luke.

> Move Framework Counters into a TaskMetric structure
> ---
>
> Key: MAPREDUCE-901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Owen O'Malley
>Assignee: Luke Lu
> Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, 
> MAPREDUCE-901.patch, MAPREDUCE-901.patch, MAPREDUCE-901.patch, 
> mr-901-trunk-v1.patch
>
>
> I think we should move all of the Counters that the framework updates into a 
> single class called TaskMetrics. TaskMetrics would have specific fields for 
> each of the metrics like input records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the 
> Counters down to just the user's counters) and decrease the latency for 
> updates to the JobTracker (since Counters are sent at most 1/minute instead 
> of 1/heartbeat).





[jira] [Updated] (MAPREDUCE-2037) Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2037:
-

Attachment: MAPREDUCE-2037.patch

Patch ported from y-merge branch for ensuring we can merge MAPREDUCE-279 to 
trunk. Credit, of course, goes to Dick.

> Capturing interim progress times, CPU usage, and memory usage, when tasks 
> reach certain progress thresholds
> ---
>
> Key: MAPREDUCE-2037
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2037
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Dick King
>Assignee: Dick King
> Attachments: MAPREDUCE-2037.patch
>
>
> We would like to capture the following information at certain progress 
> thresholds as a task runs:
>* Time taken so far
>* CPU load [either at the time the data are taken, or exponentially 
> smoothed]
>* Memory load [also either at the time the data are taken, or 
> exponentially smoothed]
> This would be taken at intervals that depend on the task progress plateaus.  
> For example, reducers have three progress ranges -- [0-1/3], (1/3-2/3], and 
> (2/3-3/3] -- where fundamentally different activities happen.  Mappers have 
> different boundaries, I understand, that are not symmetrically placed.  Data 
> capture boundaries should coincide with activity boundaries.  For the state 
> information capture [CPU and memory] we should average over the covered 
> interval.
> This data would flow in with the heartbeats.  It would be placed in the job 
> history as part of the task attempt completion event, so it could be 
> processed by rumen or some similar tool and could drive a benchmark engine.





[jira] [Commented] (MAPREDUCE-2805) Update RAID for HDFS-2241

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083817#comment-13083817
 ] 

Hudson commented on MAPREDUCE-2805:
---

Integrated in Hadoop-Mapreduce-trunk #752 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/752/])
MAPREDUCE-2805. Update RAID for HDFS-2241.

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156513
Files : 
* 
/hadoop/common/trunk/mapreduce/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt


> Update RAID for HDFS-2241
> -
>
> Key: MAPREDUCE-2805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2805
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: m2805_20110811.patch
>
>
> {noformat}
> src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:44:
>  interface expected here
> [javac] public class RaidBlockSender implements java.io.Closeable, 
> FSConstants {
> [javac]^
> {noformat}





[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083814#comment-13083814
 ] 

Hudson commented on MAPREDUCE-2489:
---

Integrated in Hadoop-Mapreduce-trunk #752 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/752/])
MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable 
(jeffrey naisbit via mahadev)

mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156821
Files : 
* 
/hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* 
/hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java


> Jobsplits with random hostnames can make the queue unusable
> ---
>
> Key: MAPREDUCE-2489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.205.0, 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: MAPREDUCE-2489-0.20s-v2.patch, 
> MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, 
> MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, 
> MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, 
> MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, 
> MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, 
> MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for 
> the splits that were then causing the JobTracker to attempt to excessively 
> resolve host names.  This caused a major slowdown for the JobTracker.  We 
> should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure 
> that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
> could also fail the job after a certain number of failures in the resolve.
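
The verification step proposed above could look roughly like the following
syntactic check, run before any DNS lookup is attempted. This is a sketch
under assumptions, not the committed JobTracker/JobInProgress change: the
class SplitHostnameValidator is hypothetical, and the rules follow the usual
RFC 1123 label constraints.

```java
import java.util.regex.Pattern;

class SplitHostnameValidator {
    // One DNS label: letters, digits and hyphens, 1 to 63 characters,
    // not starting or ending with a hyphen.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    // Returns true only if the name is syntactically plausible, so that
    // obviously invalid split locations never reach the resolver.
    // A trailing dot is rejected here for simplicity.
    static boolean isPlausibleHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could count rejected names per job and fail the job once an agreed
threshold is exceeded, as the description also suggests.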





[jira] [Commented] (MAPREDUCE-2797) Some java files cannot be compiled

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083816#comment-13083816
 ] 

Hudson commented on MAPREDUCE-2797:
---

Integrated in Hadoop-Mapreduce-trunk #752 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/752/])
MAPREDUCE-2797. Update mapreduce tests and RAID for HDFS-2239.

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156215
Files : 
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* 
/hadoop/common/trunk/mapreduce/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRaid.java
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCacheOldApi.java


> Some java files cannot be compiled
> --
>
> Key: MAPREDUCE-2797
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2797
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: m2797_20110810.patch
>
>
> Due to the changes in HDFS-2239, the following files cannot be compiled 
> (Thanks Amar for pointing them out.)
> 1. src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
> 2. 
> src/test/mapred/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java
> 3. 
> src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCacheOldApi.java
> 4. 
> src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRaid.java





[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083809#comment-13083809
 ] 

Hudson commented on MAPREDUCE-2489:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #761 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/761/])
MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable 
(jeffrey naisbit via mahadev)

mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156821
Files : 
* 
/hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* 
/hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java


> Jobsplits with random hostnames can make the queue unusable
> ---
>
> Key: MAPREDUCE-2489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.205.0, 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: MAPREDUCE-2489-0.20s-v2.patch, 
> MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, 
> MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, 
> MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, 
> MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, 
> MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, 
> MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for 
> the splits that were then causing the JobTracker to attempt to excessively 
> resolve host names.  This caused a major slowdown for the JobTracker.  We 
> should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure 
> that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
> could also fail the job after a certain number of failures in the resolve.





[jira] [Commented] (MAPREDUCE-2805) Update RAID for HDFS-2241

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083812#comment-13083812
 ] 

Hudson commented on MAPREDUCE-2805:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #761 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/761/])
MAPREDUCE-2805. Update RAID for HDFS-2241.

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156513
Files : 
* 
/hadoop/common/trunk/mapreduce/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt


> Update RAID for HDFS-2241
> -
>
> Key: MAPREDUCE-2805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2805
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: m2805_20110811.patch
>
>
> {noformat}
> src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/datanode/RaidBlockSender.java:44:
>  interface expected here
> [javac] public class RaidBlockSender implements java.io.Closeable, 
> FSConstants {
> [javac]^
> {noformat}





[jira] [Commented] (MAPREDUCE-2797) Some java files cannot be compiled

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083811#comment-13083811
 ] 

Hudson commented on MAPREDUCE-2797:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #761 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/761/])
MAPREDUCE-2797. Update mapreduce tests and RAID for HDFS-2239.

szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156215
Files : 
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* 
/hadoop/common/trunk/mapreduce/src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRaid.java
* 
/hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCacheOldApi.java


> Some java files cannot be compiled
> --
>
> Key: MAPREDUCE-2797
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2797
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid, test
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: m2797_20110810.patch
>
>
> Due to the changes in HDFS-2239, the following files cannot be compiled 
> (Thanks Amar for pointing them out.)
> 1. src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java
> 2. 
> src/test/mapred/org/apache/hadoop/mapreduce/security/TestBinaryTokenFile.java
> 3. 
> src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCacheOldApi.java
> 4. 
> src/contrib/raid/src/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyRaid.java





[jira] [Updated] (MAPREDUCE-2649) MR279: Fate of finished Applications on RM

2011-08-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2649:
-

Status: Open  (was: Patch Available)

> MR279: Fate of finished Applications on RM
> --
>
> Key: MAPREDUCE-2649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2649-patch-mr279.txt, MAPREDUCE-2649-v2.patch, 
> MAPREDUCE-2649-v3.patch, MAPREDUCE-2649-v4.patch
>
>
> Today the RM keeps references to finished applications forever. Though this 
> is not sustainable long term, it keeps
> the user experience saner. Users can revisit RM UI and check the status of 
> their apps.
> We need to think of purging old references yet keeping the UX sane.





[jira] [Updated] (MAPREDUCE-2836) Provide option to fail jobs when submitted to non-existent pools.

2011-08-11 Thread Jeff Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Bean updated MAPREDUCE-2836:
-

Priority: Minor  (was: Major)

> Provide option to fail jobs when submitted to non-existent pools.
> -
>
> Key: MAPREDUCE-2836
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2836
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/fair-share
>Reporter: Jeff Bean
>Priority: Minor
>
> In some environments, it might be desirable to explicitly specify the fair 
> scheduler pools and to explicitly fail jobs that are not submitted to any of 
> the pools. 
> Current behavior of the fair scheduler is to submit jobs to a default pool if 
> a pool name isn't specified or to create a pool with the new name if the pool 
> name doesn't already exist. There should be a configuration option for the 
> fair scheduler that causes it to noisily fail the job if it's submitted to a 
> pool that isn't pre-specified or if the specified pool doesn't exist.





[jira] [Updated] (MAPREDUCE-2649) MR279: Fate of finished Applications on RM

2011-08-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2649:
-

Attachment: MAPREDUCE-2649-v4.patch

Updated yarn-default.xml to include the new config.

> MR279: Fate of finished Applications on RM
> --
>
> Key: MAPREDUCE-2649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2649-patch-mr279.txt, MAPREDUCE-2649-v2.patch, 
> MAPREDUCE-2649-v3.patch, MAPREDUCE-2649-v4.patch
>
>
> Today the RM keeps references to finished applications forever. Though this 
> is not sustainable long term, it keeps
> the user experience saner. Users can revisit RM UI and check the status of 
> their apps.
> We need to think of purging old references yet keeping the UX sane.





[jira] [Created] (MAPREDUCE-2836) Provide option to fail jobs when submitted to non-existent pools.

2011-08-11 Thread Jeff Bean (JIRA)
Provide option to fail jobs when submitted to non-existent pools.
-

 Key: MAPREDUCE-2836
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2836
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Jeff Bean


In some environments, it might be desirable to explicitly specify the fair 
scheduler pools and to explicitly fail jobs that are not submitted to any of 
the pools. 

Current behavior of the fair scheduler is to submit jobs to a default pool if a 
pool name isn't specified or to create a pool with the new name if the pool 
name doesn't already exist. There should be a configuration option for the fair 
scheduler that causes it to noisily fail the job if it's submitted to a pool 
that isn't pre-specified or if the specified pool doesn't exist.
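
The two behaviours described above can be sketched as follows. This is only an illustration under stated assumptions: `PoolResolver`, its `failOnUnknownPool` flag and the pool name `"default"` are hypothetical stand-ins, not the fair scheduler's actual classes or configuration keys.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the scheduler's pool lookup; names are illustrative only. */
final class PoolResolver {
  static final String DEFAULT_POOL = "default";

  private final Set<String> pools;
  private final boolean failOnUnknownPool;

  PoolResolver(Set<String> configuredPools, boolean failOnUnknownPool) {
    this.pools = new HashSet<>(configuredPools);
    this.failOnUnknownPool = failOnUnknownPool;
  }

  /** Returns the pool the job lands in, or throws when strict mode rejects the job. */
  String resolve(String requestedPool) {
    if (requestedPool == null || requestedPool.isEmpty()) {
      if (failOnUnknownPool) {
        throw new IllegalArgumentException("Job was not submitted to any pool");
      }
      return DEFAULT_POOL; // current behaviour: fall back to a default pool
    }
    if (!pools.contains(requestedPool)) {
      if (failOnUnknownPool) {
        throw new IllegalArgumentException("Unknown pool: " + requestedPool);
      }
      pools.add(requestedPool); // current behaviour: create the pool on demand
    }
    return requestedPool;
  }
}
```

With the flag off, the resolver mirrors the current lenient behaviour; with it on, a job without a pre-specified pool fails noisily at submission time.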





[jira] [Commented] (MAPREDUCE-2649) MR279: Fate of finished Applications on RM

2011-08-11 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083548#comment-13083548
 ] 

Mahadev konar commented on MAPREDUCE-2649:
--

Thomas, could you please add the property to yarn-default.xml as well? We need 
to populate our yarn-default.xml, which is currently missing quite a few of the 
config knobs.

> MR279: Fate of finished Applications on RM
> --
>
> Key: MAPREDUCE-2649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2649-patch-mr279.txt, MAPREDUCE-2649-v2.patch, 
> MAPREDUCE-2649-v3.patch
>
>
> Today the RM keeps references to finished applications forever. Though this 
> is not sustainable long term, it keeps
> the user experience saner. Users can revisit RM UI and check the status of 
> their apps.
> We need to think of purging old references yet keeping the UX sane.





[jira] [Commented] (MAPREDUCE-2649) MR279: Fate of finished Applications on RM

2011-08-11 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083540#comment-13083540
 ] 

Thomas Graves commented on MAPREDUCE-2649:
--

Reworked the patch to send an event when the RMapp completes and have the 
expirer handle it. It expires apps based on a maximum number of completed jobs.
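
For readers following along, expiry by a maximum number of completed applications can be sketched like this. It is not the patch itself: `CompletedAppStore` and `onAppCompleted` are hypothetical names, and application IDs are plain strings here for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: keeps at most maxCompleted finished apps, dropping the oldest first. */
final class CompletedAppStore {
  private final int maxCompleted;
  private final Deque<String> completed = new ArrayDeque<>();

  CompletedAppStore(int maxCompleted) {
    if (maxCompleted < 0) {
      throw new IllegalArgumentException("maxCompleted must be >= 0");
    }
    this.maxCompleted = maxCompleted;
  }

  /** Called when an application finishes; returns how many old entries were purged. */
  synchronized int onAppCompleted(String appId) {
    completed.addLast(appId);
    int purged = 0;
    while (completed.size() > maxCompleted) {
      completed.removeFirst(); // oldest completed application goes first
      purged++;
    }
    return purged;
  }

  synchronized boolean contains(String appId) {
    return completed.contains(appId);
  }

  synchronized int size() {
    return completed.size();
  }
}
```

The event sent on app completion would end up calling something like `onAppCompleted`, so purging happens only when a new app finishes rather than on a timer.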

> MR279: Fate of finished Applications on RM
> --
>
> Key: MAPREDUCE-2649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2649-patch-mr279.txt, MAPREDUCE-2649-v2.patch, 
> MAPREDUCE-2649-v3.patch
>
>
> Today the RM keeps references to finished applications forever. Though this 
> is not sustainable long term, it keeps
> the user experience saner. Users can revisit RM UI and check the status of 
> their apps.
> We need to think of purging old references yet keeping the UX sane.





[jira] [Updated] (MAPREDUCE-2649) MR279: Fate of finished Applications on RM

2011-08-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2649:
-

Release Note: 
New config added:

   // the maximum number of completed applications the RM keeps 
yarn.server.resourcemanager.expire.applications.completed.max

  was:
The new configs added are:

  // time(in ms) between when the expire applications thread checks  
yarn.server.resourcemanager.expire.applications.monitor.interval
 
  // the length of time(in ms) the RM keeps a completed application
yarn.server.resourcemanager.expire.applications.interval

   // the maximum number of completed applications per user RM keeps 
yarn.server.resourcemanager.expire.applications.user.completed.max

  Status: Patch Available  (was: Open)

> MR279: Fate of finished Applications on RM
> --
>
> Key: MAPREDUCE-2649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2649-patch-mr279.txt, MAPREDUCE-2649-v2.patch, 
> MAPREDUCE-2649-v3.patch
>
>
> Today the RM keeps references to finished applications forever. Though this 
> is not sustainable long term, it keeps
> the user experience saner. Users can revisit RM UI and check the status of 
> their apps.
> We need to think of purging old references yet keeping the UX sane.





[jira] [Updated] (MAPREDUCE-2649) MR279: Fate of finished Applications on RM

2011-08-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2649:
-

Attachment: MAPREDUCE-2649-v3.patch

> MR279: Fate of finished Applications on RM
> --
>
> Key: MAPREDUCE-2649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2649-patch-mr279.txt, MAPREDUCE-2649-v2.patch, 
> MAPREDUCE-2649-v3.patch
>
>
> Today the RM keeps references to finished applications forever. Though this 
> is not sustainable long term, it keeps
> the user experience saner. Users can revisit RM UI and check the status of 
> their apps.
> We need to think of purging old references yet keeping the UX sane.





[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083529#comment-13083529
 ] 

Hudson commented on MAPREDUCE-2489:
---

Integrated in Hadoop-Common-trunk-Commit #728 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/728/])
MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable 
(jeffrey naisbit via mahadev)

mahadev : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156821
Files : 
* 
/hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* 
/hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java


> Jobsplits with random hostnames can make the queue unusable
> ---
>
> Key: MAPREDUCE-2489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.205.0, 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: MAPREDUCE-2489-0.20s-v2.patch, 
> MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, 
> MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, 
> MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, 
> MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, 
> MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, 
> MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for 
> the splits that were then causing the JobTracker to attempt to excessively 
> resolve host names.  This caused a major slowdown for the JobTracker.  We 
> should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure 
> that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
> could also fail the job after a certain number of failures in the resolve.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083521#comment-13083521
 ] 

Luke Lu commented on MAPREDUCE-2600:


bq. when using the client API, do I have to define the dependency for one 
artifact and I'm done (all the other come as transitive dependencies and are 
implementation specific not exposed to the user)?

If you just need to use the client API, dependency on 
org.apache.hadoop:hadoop-mapreduce-client-jobclient would suffice. If it's not 
so, we need to fix it :)



> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-08-11 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-2489:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just pushed this to 0.20 security and mapred trunk. Thanks Jeff!

> Jobsplits with random hostnames can make the queue unusable
> ---
>
> Key: MAPREDUCE-2489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.205.0, 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: MAPREDUCE-2489-0.20s-v2.patch, 
> MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, 
> MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, 
> MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, 
> MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, 
> MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, 
> MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for 
> the splits that were then causing the JobTracker to attempt to excessively 
> resolve host names.  This caused a major slowdown for the JobTracker.  We 
> should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure 
> that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
> could also fail the job after a certain number of failures in the resolve.





[jira] [Updated] (MAPREDUCE-2835) Make per-job counter limits configurable

2011-08-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-2835:
-

Attachment: MAPREDUCE-2835.patch

Results of test-patch:

{noformat}
[exec] +1 overall.  
[exec] 
[exec] +1 @author.  The patch does not contain any @author tags.
[exec] 
[exec] +1 tests included.  The patch appears to include 3 new or modified 
tests.
[exec] 
[exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
[exec] 
[exec] +1 javac.  The applied patch does not increase the total number of 
javac compiler warnings.
[exec] 
[exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 
1.3.9) warnings.
[exec] 
{noformat}

> Make per-job counter limits configurable
> 
>
> Key: MAPREDUCE-2835
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2835
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.20.204.0
>Reporter: Tom White
>Assignee: Tom White
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2835.patch
>
>
> The per-job counter limits introduced in MAPREDUCE-1943 are fixed, except for 
> the total number allowed per job (mapreduce.job.counters.limit). It would be 
> useful to make them all configurable.





[jira] [Created] (MAPREDUCE-2835) Make per-job counter limits configurable

2011-08-11 Thread Tom White (JIRA)
Make per-job counter limits configurable


 Key: MAPREDUCE-2835
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2835
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.204.0
Reporter: Tom White
Assignee: Tom White
 Fix For: 0.20.205.0


The per-job counter limits introduced in MAPREDUCE-1943 are fixed, except for 
the total number allowed per job (mapreduce.job.counters.limit). It would be 
useful to make them all configurable.
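
A rough sketch of what "configurable" could mean here, assuming limits are read once from job configuration with built-in fallbacks. Only `mapreduce.job.counters.limit` comes from this issue; the other two keys, the class `CounterLimits` and all default values are made up for illustration, and `java.util.Properties` stands in for the Hadoop configuration object.

```java
import java.util.Properties;

/** Hypothetical holder for per-job counter limits; keys other than the first are invented. */
final class CounterLimits {
  static final String COUNTERS_MAX_KEY = "mapreduce.job.counters.limit"; // named in the issue
  static final String GROUPS_MAX_KEY = "example.job.counters.groups.max"; // assumption
  static final String NAME_MAX_KEY = "example.job.counter.name.max";      // assumption

  final int maxCounters;
  final int maxGroups;
  final int maxNameLength;

  CounterLimits(Properties conf) {
    // Fallback values are placeholders, not Hadoop's real defaults.
    this.maxCounters = getInt(conf, COUNTERS_MAX_KEY, 120);
    this.maxGroups = getInt(conf, GROUPS_MAX_KEY, 50);
    this.maxNameLength = getInt(conf, NAME_MAX_KEY, 64);
  }

  private static int getInt(Properties conf, String key, int defaultValue) {
    String raw = conf.getProperty(key);
    return raw == null ? defaultValue : Integer.parseInt(raw.trim());
  }

  /** Throws if a job tries to use more counters than the configured limit. */
  void checkCounterCount(int count) {
    if (count > maxCounters) {
      throw new IllegalStateException("Too many counters: " + count + " > " + maxCounters);
    }
  }
}
```

Reading every limit through the same `getInt` path is the point of the sketch: no limit stays hard-coded, and each keeps a fallback when the key is unset.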





[jira] [Created] (MAPREDUCE-2834) [MR-279] Enable dense update for file sink metrics

2011-08-11 Thread Ramya Sunil (JIRA)
[MR-279] Enable dense update for file sink metrics
--

 Key: MAPREDUCE-2834
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2834
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
 Fix For: 0.23.0


Currently, if the file sink is enabled for MRAppMaster or ResourceManager, it 
does not populate the file with all the available attributes. It would be useful 
for debugging and admin purposes to have all the metrics populated in the file.

For example, MRAppMaster metrics currently log a value only for JobsRunning, even 
though the available job-level metrics include JobsCompleted, JobsFailed, 
JobsKilled, JobsPreparing, etc.
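
To make the sparse versus dense distinction concrete, here is a small sketch. It assumes that the sparse mode emits only metrics changed since the last snapshot, which is an interpretation of the behaviour reported above; `JobMetrics` and its methods are hypothetical and are not the metrics2 API.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical metrics holder illustrating sparse vs. dense snapshots. */
final class JobMetrics {
  private final Map<String, Integer> values = new LinkedHashMap<>();
  private final Set<String> changed = new HashSet<>();

  JobMetrics(String... names) {
    for (String name : names) {
      values.put(name, 0);
    }
  }

  void set(String name, int value) {
    if (!values.containsKey(name)) {
      throw new IllegalArgumentException("Unknown metric: " + name);
    }
    values.put(name, value);
    changed.add(name);
  }

  /** dense=true emits every metric; dense=false emits only those changed since the last call. */
  Map<String, Integer> snapshot(boolean dense) {
    Map<String, Integer> out = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : values.entrySet()) {
      if (dense || changed.contains(e.getKey())) {
        out.put(e.getKey(), e.getValue());
      }
    }
    changed.clear();
    return out;
  }
}
```

In this model a sparse snapshot would show only JobsRunning if that is the only metric that ever changed, while a dense one also lists the untouched metrics with their zero values.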







[jira] [Updated] (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2011-08-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1943:
-

Fix Version/s: 0.20.203.0

This was fixed in 0.20.203.0 (see Subversion Commits tab, also commit r1077730).

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.20.203.0
>
> Attachments: MAPREDUCE-1943-0.20-yahoo.patch, 
> MAPREDUCE-1943-0.20-yahoo.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S.patch
>
>
> We have come across issues in production clusters wherein users abuse 
> counters, statusreport messages and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to jobtracker going out of 
> memory and being unresponsive. In this jira I am proposing to put sane limits 
> on the status report length, the number of counters and the size of block 
> locations returned by the input split. 





[jira] [Moved] (MAPREDUCE-2833) Job Tracker needs to collect more job/task execution stats and save them to DFS file

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-1950 to MAPREDUCE-2833:


Key: MAPREDUCE-2833  (was: HADOOP-1950)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Job Tracker needs to collect more job/task execution stats and save them to 
> DFS file
> 
>
> Key: MAPREDUCE-2833
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2833
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Runping Qi
>  Labels: newbie
>
> In order to facilitate offline analysis on the dynamic behaviors and 
> performance characteristics of map/reduce jobs, 
> we need the job tracker to collect some data about jobs and save them to DFS 
> files. Some data are  in time series form, 
> and some are not.
> Below is a preliminary list of desired data. Some of them are already 
> available in the current job trackers. Some are new.
> For each map/reduce job, we need the following non time series data:
>1. jobid, jobname,  number of mappers, number of reducers, start time, end 
> time, end of mapper phase
>2. Average (median, min, max) of successful mapper execution time, 
> input/output records/bytes
>3. Average (median, min, max) of unsuccessful mapper execution time, 
> input/output records/bytes
>4. Total mapper retries, max, average number of retries per mapper
>5. The reasons for mapper task failures.
>6. Average (median, min, max) of successful reducer execution time, 
> input/output records/bytes
>Execution time is the difference between the sort end time and the 
> task end time
>7. Average (median, min, max) of successful copy time (from the mapper 
> phase end time  to the sort start time).
>8. Average (median, min, max) of successful sorting time for successful 
> reducers
>9. Average (median, min, max) of unsuccessful reducer execution time (from 
> the end of mapper phase or the start of the task, 
>whichever later, to the end of task)
>10. Total reducer retries,  max, average number of per reducer retries
>11. The reasons for reducer task fails (user code error, lost tracker, 
> failed to write to DFS, etc.)
> For each map/reduce job, we collect the following  time series data (with one 
> minute interval):
> 1. Numbers of pending mappers, reducers
> 2. Number of running mappers, reducers
> For the job tracker, we need the following data:
> 1. Number of trackers 
> 2. Start time 
> 3. End time 
> 4. The list of map reduce jobs (their ids, starttime/endtime)
> 
> The following time series data (with one minute interval):
> 1. The number of running jobs
> 2. The numbers of running mappers/reducers
> 3. The number of pending mappers/reducers 
> The data collection should be optional. That is, a job tracker can turn off 
> such data collection, and 
> in that case, it should not pay the cost.
> The job tracker should organize the in memory version of the collected data 
> in such a way that:
> 1. it does not consume excessive amount of memory
> 2. the data may be suitable for presenting through the Web status pages.
> The data saved on DFS files should be in hadoop record format.
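
The "Average (median, min, max)" entries in the list above boil down to one summary per set of task durations. A minimal sketch, assuming durations are collected in milliseconds; `DurationSummary` is a hypothetical name and nothing here reflects existing JobTracker code.

```java
import java.util.Arrays;

/** Hypothetical summary of task execution times: min, max, mean and median. */
final class DurationSummary {
  final long min;
  final long max;
  final double mean;
  final double median;

  DurationSummary(long[] durationsMs) {
    if (durationsMs == null || durationsMs.length == 0) {
      throw new IllegalArgumentException("At least one duration is required");
    }
    long[] sorted = durationsMs.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    int n = sorted.length;
    this.min = sorted[0];
    this.max = sorted[n - 1];
    double sum = 0;
    for (long d : sorted) {
      sum += d;
    }
    this.mean = sum / n;
    // Odd count: middle element; even count: average of the two middle elements.
    this.median = (n % 2 == 1)
        ? sorted[n / 2]
        : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }
}
```

One such summary per category (successful mappers, unsuccessful mappers, reducers, copy time, sort time) would cover the non-time-series items in the list.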





[jira] [Moved] (MAPREDUCE-2831) Some changes to Record I/O interfaces

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-2030 to MAPREDUCE-2831:


Key: MAPREDUCE-2831  (was: HADOOP-2030)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Some changes to Record I/O interfaces
> -
>
> Key: MAPREDUCE-2831
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2831
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Vivek Ratan
>
> I wanted to suggest some changes to the Record I/O interfaces. 
> Under org.apache.hadoop.record, _RecordInput_ and _RecordOutput_ are the 
> interfaces to serialize and deserialize basic types for Java-generated stubs. 
> All the methods in _RecordInput_ and _RecordOutput_ take a parameter, a 
> string, called 'tag'. As far as I can see, this tag is used only for 
> XML-based serialization, to write out the name of the field that is being 
> serialized. A lot of the methods ignore it. My proposal is to eliminate this 
> parameter, for a number of reasons: 
> - We don't need to write the name of a field when serializing in XML. None of 
> the other serializers (for binary or CSV) write out the name of a field - we 
> only write the field value. The generated stubs know which field is 
> associated with which value (and now, with type information support, the 
> field name is part of the type information and is not required to be 
> serialized along with the field data). In fact, even in XML, I don't see the 
> field name being read back in, so it serves no purpose whatsoever. 
> - The tag is used occasionally in the error message, but again this can be 
> handled better by the caller of _RecordInput_ and _RecordOutput_. 
> - The tag is also used to detect whether a record is nested or not. In CSV, 
> we wrap nested records with "s{}". We also want to know whether a record is 
> nested or the top-most, so that we add a newline at the end of a top-most 
> record. If a tag is empty, it is assumed that the record is the top-most. 
> This is using the tag parameter to mean something else. It's far more 
> readable to just pass in a boolean to _startRecord()_ and _endRecord()_ which 
> directly indicates whether the record is nested or not. Or, add two 
> additional methods to _RecordOutput_ and _RecordInput_: _start()_ and 
> _stop()_, which are called at the beginning and end of every top-most record 
> while _startRecord()_ and _endRecord()_ are used only for nested records. The 
> former's slightly better, IMO, but each method is much better than using an 
> empty tag to indicate a top-level record.
> The issue with tags brings up a related issue. Sometimes, we may need to pass 
> in additional information to _RecordInput_ or _RecordOutput_. For example, 
> suppose we do need to write the field name along with the field value. We can 
> think of such a requirement in two ways. A) Such decisions of what to 
> serialize/deserialize are independent of the format/protocol that the data is 
> serialized in. If we want to write something else, that should be written 
> separately by the stub. So, if we want to serialize the field name before a 
> field value, a stub should call _RecordOutput.writeString()_ 
> first, followed by _RecordOutput.writeInt()_. The methods in 
> _RecordInput_ and _RecordOutput_  are the lowest level methods and they 
> should just be concerned with writing individual types.  B) What if a 
> protocol wants to write things differently? For example, we may want to write 
> the field name before the field value for XML only (for debugging sake, or 
> for whatever else). Or it may be that the field name and field value need to 
> be enclosed in certain tags that can't happen if you write them separately. 
> In these cases, methods in _RecordInput_ and _RecordOutput_ need to be passed 
> additional information. This can be done by providing an optional parameter 
> for these methods. Maybe a structure/class containing field information, or a 
> reference to the field itself (the Tag parameter was meant to serve a similar 
> purpose, but just passing in a String may be inadequate). For now, there is 
> no real need for either of these situations, so we should be OK with getting 
> rid of the tag parameter. 
> Similar changes need to be done to the C++ side, where we have _OArchive_ and 
> _IArchive_: 
> - The tag parameter needs to be removed
> - _startRecord()_ and _endRecord()_ in _OArchive_ and _IArchive_ need to take 
> a boolean parameter that indicates whether the record is nested or not
> - Currently, both _startRecord()_ and _endRecord()_ in  _IArchive_ take an 
> additional parameter, a reference to a hadoop record. This is never used 
> anywhere not required (the corresponding methods in _RecordInput_ and 
> _RecordOutput_ don't take any parameters, whi

[jira] [Moved] (MAPREDUCE-2832) sleeping with lock held in JobEndNotifier

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-2008 to MAPREDUCE-2832:


Key: MAPREDUCE-2832  (was: HADOOP-2008)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> sleeping with lock held in JobEndNotifier
> -
>
> Key: MAPREDUCE-2832
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2832
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> FindBugs points out a problem in JobEndNotifier from HADOOP-.
> {code}
>   synchronized (Thread.currentThread()) {
>     Thread.currentThread().sleep(notification.getRetryInterval());
>   }
> {code}
> I haven't tracked through the code, but I suspect it should be a wait instead 
> of a sleep.
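
A minimal sketch of the difference the report hints at, assuming the goal is a retry delay that does not keep a monitor locked. Thread.sleep() keeps any monitor the thread holds, while Object.wait() releases it for the duration. RetryDelay, awaitRetry and cancel are hypothetical names for illustration and are not taken from JobEndNotifier.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Hadoop: a retry delay that releases its monitor while waiting. */
final class RetryDelay {
    private final Object monitor = new Object();
    private boolean cancelled = false;

    /**
     * Waits up to the given number of milliseconds.
     * Returns true if the full delay elapsed, false if cancelled or interrupted.
     */
    boolean awaitRetry(long millis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        synchronized (monitor) {
            try {
                while (!cancelled) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return true;
                    }
                    // Timed Object.wait(): gives up the monitor while waiting, unlike sleep().
                    TimeUnit.NANOSECONDS.timedWait(monitor, remainingNanos);
                }
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    /** Wakes any waiter early; possible only because wait() released the monitor. */
    void cancel() {
        synchronized (monitor) {
            cancelled = true;
            monitor.notifyAll();
        }
    }
}
```

If nothing ever needs to wake the delay early, calling Thread.sleep() outside of any synchronized block is the simpler change.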





[jira] [Assigned] (MAPREDUCE-2745) [MR-279] NM UI should get a read-only view instead of the actual NMContext

2011-08-11 Thread Anupam Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anupam Seth reassigned MAPREDUCE-2745:
--

Assignee: Anupam Seth

> [MR-279] NM UI should get a read-only view instead of the actual NMContext 
> ---
>
> Key: MAPREDUCE-2745
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2745
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Anupam Seth
>  Labels: newbie
> Fix For: 0.23.0
>
>
> NMContext is modifiable; the UI should only get read-only access, just like 
> the AM web-ui.
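
One possible shape for such a read-only view, sketched under the assumption that the UI only needs to look up container state. ReadOnlyNodeView, SketchNodeContext and the container map are hypothetical and do not reflect the actual NMContext API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical read-only interface that a web UI could be handed. */
interface ReadOnlyNodeView {
    Map<String, String> containers();
}

/** Hypothetical mutable context; only the node manager itself calls the mutators. */
final class SketchNodeContext implements ReadOnlyNodeView {
    private final Map<String, String> containers = new ConcurrentHashMap<>();

    void addContainer(String id, String state) {
        containers.put(id, state);
    }

    @Override
    public Map<String, String> containers() {
        // Unmodifiable wrapper: reads see live data, writes throw UnsupportedOperationException.
        return Collections.unmodifiableMap(containers);
    }
}
```

Passing the context to the UI typed as ReadOnlyNodeView keeps the mutators out of reach without copying any data.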





[jira] [Moved] (MAPREDUCE-2830) Document config parameters for each Map-Reduce class/interface

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-2122 to MAPREDUCE-2830:


Component/s: (was: documentation)
 documentation
Key: MAPREDUCE-2830  (was: HADOOP-2122)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Document config parameters for each Map-Reduce class/interface
> --
>
> Key: MAPREDUCE-2830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arun C Murthy
>  Labels: newbie
>
> I propose we add a table in the javadoc for each user-facing Map-Reduce 
> interface/class which lists, and provides details of, each and every config 
> parameter which has any bearing on that interface/class. Clearly some 
> parameters affect more than one place and they should be put in more than one 
> table.
> For example: 
> Mapper -> io.sort.mb, io.sort.factor
> Reducer -> fs.inmemory.size.mb
> ...
> etc.
> It would be very nice to explain how each parameter interacts with the 
> framework and the rest of the config params.
> Thoughts?





[jira] [Updated] (MAPREDUCE-2830) Document config parameters for each Map-Reduce class/interface

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-2830:
---

Labels: newbie  (was: )

> Document config parameters for each Map-Reduce class/interface
> --
>
> Key: MAPREDUCE-2830
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2830
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Arun C Murthy
>  Labels: newbie
>
> I propose we add a table in the javadoc for each user-facing Map-Reduce 
> interface/class which lists, and provides details of, each and every config 
> parameter which has any bearing on that interface/class. Clearly some 
> parameters affect more than one place and they should be put in more than one 
> table.
> For example: 
> Mapper -> io.sort.mb, io.sort.factor
> Reducer -> fs.inmemory.size.mb
> ...
> etc.
> It would be very nice to explain how each parameter interacts with the 
> framework and the rest of the config params.
> Thoughts?





[jira] [Moved] (MAPREDUCE-2829) TestMiniMRMapRedDebugScript times out

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-2260 to MAPREDUCE-2829:


  Component/s: (was: fs)
Affects Version/s: (was: 0.16.0)
  Key: MAPREDUCE-2829  (was: HADOOP-2260)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> TestMiniMRMapRedDebugScript times out
> -
>
> Key: MAPREDUCE-2829
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2829
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
> Environment: Linux
>Reporter: Konstantin Shvachko
> Attachments: Hadoop-2260.log, testrun-2260.log
>
>
> I am running TestMiniMRMapRedDebugScript from trunk.
> This is what I see in the stdout:
> {code}
> 2007-11-22 02:21:23,494 WARN  conf.Configuration 
> (Configuration.java:loadResource(808)) - 
> hadoop/build/test/mapred/local/1_0/taskTracker/jobcache/job_200711220217_0001/task_200711220217_0001_m_00_0/job.xml:a
>  attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
> 2007-11-22 02:21:28,940 INFO  jvm.JvmMetrics (JvmMetrics.java:init(56)) - 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2007-11-22 02:22:09,504 INFO  mapred.MapTask (MapTask.java:run(127)) - 
> numReduceTasks: 0
> 2007-11-22 02:22:42,434 WARN  mapred.TaskTracker 
> (TaskTracker.java:main(1982)) - Error running child
> java.io.IOException
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:41)
>   at 
> org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript$MapClass.map(TestMiniMRMapRedDebugScript.java:35)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1977)
> {code}
> Stderr and debugout both say: Bailing out.
> BTW on Windows everything works just fine.





[jira] [Moved] (MAPREDUCE-2828) SequenceFile.MergeQueue.merge inadvertently creates merge-outputs in the wrong FileSystem, at times in the InMemoryFileSystem

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3356 to MAPREDUCE-2828:


  Component/s: (was: io)
Affects Version/s: (was: 0.16.3)
  Key: MAPREDUCE-2828  (was: HADOOP-3356)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> SequenceFile.MergeQueue.merge inadvertently creates merge-outputs in the 
> wrong FileSystem, at times in the InMemoryFileSystem
> -
>
> Key: MAPREDUCE-2828
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2828
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Minor
>
> The offending code is:
> {code:title=SequenceFile.java}
> Path outputFile = lDirAlloc.getLocalPathForWrite(
>     tmpFilename.toString(),
>     approxOutputSize, conf);
> LOG.debug("writing intermediate results to " + outputFile);
> Writer writer = cloneFileAttributes(
>     fs.makeQualified(segmentsToMerge.get(0).segmentPathName),
>     fs.makeQualified(outputFile),
>     null);
> {code}
> *fs* is InMemoryFileSystem when ReduceTask.ReduceCopier constructs it... so 
> the wrong FileSystem is used during intermediate merges.
>  





[jira] [Moved] (MAPREDUCE-2827) Test code can create Integer.MIN_INT when trying to create a random non-negative integer

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3567 to MAPREDUCE-2827:


Affects Version/s: (was: 0.17.0)
  Key: MAPREDUCE-2827  (was: HADOOP-3567)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Test code can create Integer.MIN_INT when trying to create a random 
> non-negative integer
> 
>
> Key: MAPREDUCE-2827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2827
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Tim Halloran
>Priority: Minor
>
> Sadly, Math.abs returns Integer.MIN_VALUE when passed Integer.MIN_VALUE. Thus 
> the code in 
> org.apache.hadoop.mapred.TestMapRed appears to need to consider this case.  
> Patch below.
> Index: .
> ===
> --- . (revision 8259)
> +++ . (working copy)
> @@ -97,7 +97,9 @@
>int randomCount = key.get();
>  
>for (int i = 0; i < randomCount; i++) {
> -out.collect(new IntWritable(Math.abs(r.nextInt())), new 
> IntWritable(randomVal));
> + int collectKey = Math.abs(r.nextInt());
> + if (collectKey == Integer.MIN_VALUE) collectKey = Integer.MAX_VALUE;
> +out.collect(new IntWritable(collectKey), new IntWritable(randomVal));
>}
>  }
>  public void close() {
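
A short sketch of the corner case and of one alternative to the patch above. Instead of special-casing the result of Math.abs, it clears the sign bit, so Integer.MIN_VALUE maps to 0 rather than to Integer.MAX_VALUE as in the patch. NonNegativeInts is a hypothetical helper and is not part of TestMapRed.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of TestMapRed. */
final class NonNegativeInts {
    /** Clears the sign bit, so every result lies in [0, Integer.MAX_VALUE]. */
    static int fromBits(int value) {
        return value & Integer.MAX_VALUE;
    }

    /** Draws a non-negative int from the given generator. */
    static int next(Random r) {
        return fromBits(r.nextInt());
    }

    public static void main(String[] args) {
        // Two's complement has no positive counterpart for MIN_VALUE, so abs overflows.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE);
        System.out.println(fromBits(Integer.MIN_VALUE));
    }
}
```

Note that masking does not preserve magnitude for negative inputs (for example, -1 becomes Integer.MAX_VALUE), which is fine for random keys but not a drop-in replacement for Math.abs elsewhere.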





[jira] [Updated] (MAPREDUCE-2825) Factor out commonly used code in mapred testcases

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-2825:
---

Labels: newbie  (was: )

> Factor out commonly used code in mapred testcases
> -
>
> Key: MAPREDUCE-2825
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2825
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: test
>Reporter: Amar Kamat
>Priority: Minor
>  Labels: newbie
>
> The commonly used code in the testcases is made _static_, like 
> {{TestRackAwareTaskPlacement.configureJobConf()}}. It would be nice to factor 
> out these APIs and either add them to a class like {{StringUtils}} or move 
> them into a separate dir like {{utils}}.





[jira] [Moved] (MAPREDUCE-2826) Change the job state observer classes to interfaces

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3801 to MAPREDUCE-2826:


Key: MAPREDUCE-2826  (was: HADOOP-3801)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Change the job state observer classes to interfaces
> ---
>
> Key: MAPREDUCE-2826
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2826
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Vivek Ratan
>
> Schedulers will most often want to be the observers of the job state events 
> in a single class. Therefore, I think the observers should be interfaces, so 
> that one class can implement several of them.
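
To illustrate the point about interfaces, here is a sketch in which a single scheduler class observes two kinds of job events. All names (JobAddedObserver, JobCompletedObserver, SketchScheduler) are hypothetical and do not correspond to the actual observer classes in Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical observer for job submission events. */
interface JobAddedObserver {
    void jobAdded(String jobId);
}

/** Hypothetical observer for job completion events. */
interface JobCompletedObserver {
    void jobCompleted(String jobId);
}

/** One class can implement both interfaces; it could not extend two abstract classes. */
final class SketchScheduler implements JobAddedObserver, JobCompletedObserver {
    final List<String> running = new ArrayList<>();

    @Override
    public void jobAdded(String jobId) {
        running.add(jobId);
    }

    @Override
    public void jobCompleted(String jobId) {
        running.remove(jobId);
    }
}
```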





[jira] [Moved] (MAPREDUCE-2825) Factor out commonly used code in mapred testcases

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3749 to MAPREDUCE-2825:


Component/s: (was: test)
 test
Key: MAPREDUCE-2825  (was: HADOOP-3749)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Factor out commonly used code in mapred testcases
> -
>
> Key: MAPREDUCE-2825
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2825
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: test
>Reporter: Amar Kamat
>Priority: Minor
>
> The commonly used code in the testcases is made _static_, like 
> {{TestRackAwareTaskPlacement.configureJobConf()}}. It would be nice to factor 
> out these APIs and either add them to a class like {{StringUtils}} or move 
> them into a separate dir like {{utils}}.





[jira] [Updated] (MAPREDUCE-2808) pull MAPREDUCE-2797 into mr279 branch

2011-08-11 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-2808:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Just pushed this. Thanks Thomas!

> pull MAPREDUCE-2797 into mr279 branch
> -
>
> Key: MAPREDUCE-2808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2808
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2808.patch
>
>
> The ant tar command fails in the mapreduce directory on the mr279 branch.  
> The issue was a change in hdfs and was fixed on trunk with jira 
> MAPREDUCE-2797.  Pull that change into mr279.





[jira] [Moved] (MAPREDUCE-2824) MiniMRCluster should have an idempotent shutdown

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3843 to MAPREDUCE-2824:


  Component/s: (was: test)
   test
Affects Version/s: (was: 0.19.0)
  Key: MAPREDUCE-2824  (was: HADOOP-3843)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> MiniMRCluster should have an idempotent shutdown
> 
>
> Key: MAPREDUCE-2824
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2824
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Steve Loughran
>
> It looks on a quick skim-through that the 
> org.apache.hadoop.mapred.MiniMRCluster class has nothing to stop a caller 
> calling shutdown() more than once, with possible adverse consequences. This 
> will normally only show up if a test fails at precisely the wrong place.
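
A minimal sketch of one common way to make a shutdown method idempotent, using an atomic flag so that only the first call performs the teardown. SketchCluster is a hypothetical stand-in and says nothing about how MiniMRCluster is actually structured.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a test cluster; not the MiniMRCluster implementation. */
final class SketchCluster {
    private final AtomicBoolean shutDown = new AtomicBoolean(false);
    private final AtomicInteger teardownCount = new AtomicInteger();

    /** Safe to call any number of times, from any thread. */
    void shutdown() {
        // compareAndSet succeeds for exactly one caller; every later call returns here.
        if (!shutDown.compareAndSet(false, true)) {
            return;
        }
        teardownCount.incrementAndGet(); // stands in for stopping daemons and threads
    }

    int teardownCount() {
        return teardownCount.get();
    }
}
```

With this guard, a test that fails midway and a later cleanup hook can both call shutdown() without the second call touching already-released resources.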





[jira] [Moved] (MAPREDUCE-2823) map-reduce doctor (Mr Doctor)

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3956 to MAPREDUCE-2823:


Key: MAPREDUCE-2823  (was: HADOOP-3956)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> map-reduce doctor (Mr Doctor)
> -
>
> Key: MAPREDUCE-2823
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2823
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Amir Youssefi
>
> Problem Description: 
>  Users typically submit jobs with sub-optimal parameters, resulting in 
> under-utilization, black-listed task-trackers, time-outs, re-tries etc.
>  The issue can be mitigated by submitting the job with custom Hadoop parameters.





[jira] [Moved] (MAPREDUCE-2822) TaskTracker.offerService could handle IO and Remote Exceptions better

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-3987 to MAPREDUCE-2822:


Affects Version/s: (was: 0.19.0)
  Key: MAPREDUCE-2822  (was: HADOOP-3987)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> TaskTracker.offerService could handle IO and Remote Exceptions better
> -
>
> Key: MAPREDUCE-2822
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2822
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Steve Loughran
>
> The core offerService() loop has a try/catch wrapper that catches and 
> processes exceptions. Most cause offerService() to return, which then 
> triggers a sleep and restart in the main loop. But some exceptions are just 
> logged and ignored, which may be inappropriate.





[jira] [Created] (MAPREDUCE-2821) [MR-279] Missing fields in job summary logs

2011-08-11 Thread Ramya Sunil (JIRA)
[MR-279] Missing fields in job summary logs 


 Key: MAPREDUCE-2821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Priority: Minor
 Fix For: 0.23.0


The following fields are missing in the job summary logs in mrv2:
- numSlotsPerMap
- numSlotsPerReduce
- clusterCapacity (Earlier known as clusterMapCapacity and 
clusterReduceCapacity in 0.20.x)

The first two fields show whether the job was a High RAM job, and the last 
field shows the total resources available in the cluster during job 
execution.






[jira] [Moved] (MAPREDUCE-2820) Task tracker should not ask for tasks if its temp disk space is almost full

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-4033 to MAPREDUCE-2820:


Affects Version/s: (was: 0.17.1)
  Key: MAPREDUCE-2820  (was: HADOOP-4033)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Task tracker should not ask for tasks if its temp disk space is almost full
> ---
>
> Key: MAPREDUCE-2820
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2820
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Runping Qi
>Assignee: Ravi Gummadi
>
> I observed a case where a task tracker still asked for tasks even though the 
> available disk space on the machine was less than 1%.
> Consequently, it had a hard time finishing the tasks.
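
A sketch of the kind of guard the report asks for, assuming a simple free-space fraction threshold is enough. DiskSpaceGate, hasRoom and the threshold parameter are hypothetical; only File.getUsableSpace() and File.getTotalSpace() are real JDK calls.

```java
import java.io.File;

/** Hypothetical helper for illustration; not part of the TaskTracker. */
final class DiskSpaceGate {
    /** Returns true when at least minFreeFraction of the volume is still usable. */
    static boolean hasRoom(long usableBytes, long totalBytes, double minFreeFraction) {
        if (totalBytes <= 0) {
            return false; // unknown or missing volume: treat it as full
        }
        return (double) usableBytes / totalBytes >= minFreeFraction;
    }

    /** Convenience overload that reads the figures for a local directory. */
    static boolean hasRoom(File dir, double minFreeFraction) {
        return hasRoom(dir.getUsableSpace(), dir.getTotalSpace(), minFreeFraction);
    }
}
```

A tracker could consult such a check before each request for new work and skip the request when it returns false.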





[jira] [Moved] (MAPREDUCE-2819) extend JobClient.runJob(JobConf) with the ability to take a timeout, so fail better during test runs

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-4639 to MAPREDUCE-2819:


  Component/s: (was: test)
   test
Affects Version/s: (was: 0.20.0)
   Issue Type: Bug  (was: Improvement)
  Key: MAPREDUCE-2819  (was: HADOOP-4639)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> extend JobClient.runJob(JobConf) with the ability to take a timeout, so fail 
> better during test runs
> 
>
> Key: MAPREDUCE-2819
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2819
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> Tests that submit jobs via JobClient hang until they are killed if something 
> goes wrong in the back end: JobClient does not impose limits on how long runs 
> should take, but JUnit does. If we had an overload of runJob() that took a 
> timeout, JobClient could kill a job that was taking too long and add the 
> stack trace and better diagnostics to the test reports. 
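
The idea of a bounded run can be sketched with the JDK's own executor classes. This is not the proposed runJob() overload: TimedRun and runWithTimeout are hypothetical names, and a Callable stands in for the job submission.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of JobClient. */
final class TimedRun {
    /** Runs the job on a worker thread and fails fast if it exceeds the timeout. */
    static <T> T runWithTimeout(Callable<T> job, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(job);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker, analogous to killing the job
            throw new IllegalStateException("job exceeded " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new IllegalStateException("interrupted while waiting for job", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("job failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The exception raised on timeout carries a message and cause, which is where a real implementation could attach the diagnostics mentioned above.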

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Moved] (MAPREDUCE-2818) Rest API for retrieving job / task statistics

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-4559 to MAPREDUCE-2818:


Key: MAPREDUCE-2818  (was: HADOOP-4559)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Rest API for retrieving job / task statistics 
> --
>
> Key: MAPREDUCE-2818
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2818
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Florian Leibert
>Priority: Trivial
> Attachments: HADOOP-4559v2.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> A REST API that returns a simple JSON document containing information about 
> a given job, such as min/max/avg times per task, failed tasks, etc. This 
> would be useful for allowing external restart or modification of the 
> parameters of a run.





[jira] [Moved] (MAPREDUCE-2817) MiniRMCluster hardcodes 'mapred.local.dir' configuration to 'build/test/mapred/local'

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-4536 to MAPREDUCE-2817:


Component/s: (was: test)
 test
Key: MAPREDUCE-2817  (was: HADOOP-4536)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> MiniRMCluster hardcodes 'mapred.local.dir' configuration to 
> 'build/test/mapred/local'
> -
>
> Key: MAPREDUCE-2817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
> Environment: all
>Reporter: Alejandro Abdelnur
>Priority: Minor
>
> The {{mapred.local.dir}} configuration property for the {{MiniMRCluster}} is 
> forced to {{build/test/mapred/local}}.
> This is inconvenient in different situations. For example:
> * When running multiple tests using {{MiniMRCluster}}, it is not possible to see 
> the end state of the dir for a particular test
> * When using {{MiniMRCluster}} in another build system (e.g. Maven) that uses 
> a different output directory (target instead of build)
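
One way the hardcoded path could become overridable, sketched with plain java.util.Properties. The property name test.mapred.local.dir and the TestDirs class are assumptions made for this example; they are not existing Hadoop settings.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not part of MiniMRCluster. */
final class TestDirs {
    /** The value the report says is currently forced. */
    static final String DEFAULT_LOCAL_DIR = "build/test/mapred/local";

    /** Hypothetical override key; any agreed-upon name would do. */
    static final String OVERRIDE_KEY = "test.mapred.local.dir";

    /** Uses the override when present, otherwise falls back to the current default. */
    static String localDir(Properties props) {
        return props.getProperty(OVERRIDE_KEY, DEFAULT_LOCAL_DIR);
    }
}
```

A Maven build could then pass a directory under target, and individual tests could each point at their own directory, by calling localDir(System.getProperties()).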





[jira] [Moved] (MAPREDUCE-2816) SortedMapWritable: inkonsistent put() and putAll() behaviour

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5028 to MAPREDUCE-2816:


Affects Version/s: (was: 0.19.0)
  Key: MAPREDUCE-2816  (was: HADOOP-5028)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> SortedMapWritable: inkonsistent put() and putAll() behaviour
> 
>
> Key: MAPREDUCE-2816
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2816
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
>
> The current SortedMapWritable implementation breaks support for custom 
> classes when putAll() is used. It's important for putAll() that addToMap() 
> is called to register all used classes. Please consider having putAll() 
> call put() for each map entry.
> trunk:
>   public Writable put(WritableComparable key, Writable value) {
>     addToMap(key.getClass());
>     addToMap(value.getClass());
>     return instance.put(key, value);
>   }
>   public void putAll(Map t) {
>     for (Map.Entry e : t.entrySet()) {
>       instance.put(e.getKey(), e.getValue());
>     }
>   }
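
The requested change can be sketched on a simplified stand-in, so that it compiles without Hadoop on the classpath. SketchSortedMap uses String keys and Object values in place of WritableComparable and Writable, and isRegistered is a hypothetical accessor added only to make the effect observable.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified, hypothetical stand-in for SortedMapWritable. */
final class SketchSortedMap {
    private final TreeMap<String, Object> instance = new TreeMap<>();
    private final Set<Class<?>> knownClasses = new HashSet<>();

    /** Stands in for the class registration done by addToMap(). */
    private void addToMap(Class<?> c) {
        knownClasses.add(c);
    }

    Object put(String key, Object value) {
        addToMap(key.getClass());
        addToMap(value.getClass());
        return instance.put(key, value);
    }

    /** Delegates to put(), so every entry goes through class registration. */
    void putAll(Map<String, ?> t) {
        for (Map.Entry<String, ?> e : t.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    boolean isRegistered(Class<?> c) {
        return knownClasses.contains(c);
    }
}
```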





[jira] [Moved] (MAPREDUCE-2815) JavaDoc does not generate correctly for MultithreadedMapRunner

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-4928 to MAPREDUCE-2815:


  Component/s: (was: documentation)
   documentation
Affects Version/s: (was: 0.19.0)
  Key: MAPREDUCE-2815  (was: HADOOP-4928)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> JavaDoc does not generate correctly for MultithreadedMapRunner
> --
>
> Key: MAPREDUCE-2815
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2815
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Reporter: Shane Butler
>Priority: Minor
>
> The following code in MultithreadedMapRunner.java does not get published to 
> the HTML docs correctly.
> This is what actually appears in the HTML docs:
> "It can be used instead of the default implementation, "
> This is what *should* appear:
> /**
>  * Multithreaded implementation for @link 
> org.apache.hadoop.mapred.MapRunnable.
>  * 
>  * It can be used instead of the default implementation,
>  * @link org.apache.hadoop.mapred.MapRunner, when the Map operation is not CPU
>  * bound in order to improve throughput.
>  * 
>  * Map implementations using this MapRunnable must be thread-safe.
>  * 
>  * The Map-Reduce job has to be configured to use this MapRunnable class 
> (using
>  * the JobConf.setMapRunnerClass method) and
>  * the number of thread the thread-pool can use with the
>  * mapred.map.multithreadedrunner.threads property, its default
>  * value is 10 threads.
>  * 
>  */





[jira] [Moved] (MAPREDUCE-2813) Tasks freeze with "No live nodes contain current block", job takes long time to recover

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5361 to MAPREDUCE-2813:


Affects Version/s: (was: 0.21.0)
   0.21.0
  Key: MAPREDUCE-2813  (was: HADOOP-5361)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Tasks freeze with "No live nodes contain current block", job takes long time 
> to recover
> ---
>
> Key: MAPREDUCE-2813
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2813
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Matei Zaharia
>
> Running a recent version of trunk on 100 nodes, I occasionally see some tasks 
> freeze at startup and hang the job. These tasks are not speculatively 
> executed either. Here's sample output from one of them:
> {noformat}
> 2009-02-27 15:19:10,229 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2009-02-27 15:19:10,486 INFO org.apache.hadoop.mapred.MapTask: 
> numReduceTasks: 0
> 2009-02-27 15:21:20,952 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> obtain block blk_2086525142250101885_39076 from any node:  
> java.io.IOException: No live nodes contain current block
> 2009-02-27 15:23:23,972 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> obtain block blk_2086525142250101885_39076 from any node:  
> java.io.IOException: No live nodes contain current block
> 2009-02-27 15:25:26,992 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> obtain block blk_2086525142250101885_39076 from any node:  
> java.io.IOException: No live nodes contain current block
> 2009-02-27 15:27:30,012 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: 
> java.io.IOException: Could not obtain block: blk_2086525142250101885_39076 
> file=/user/root/rand2/part-00864
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1664)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1619)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
> at 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
> at 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:155)
> 2009-02-27 15:27:30,018 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> running child
> java.io.IOException: Could not obtain block: blk_2086525142250101885_39076 
> file=/user/root/rand2/part-00864
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1664)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1619)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
> at 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
> at 
> org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:155)
> {noformat}
> Note how the DFS client fails multiple times to retrieve the block, with a 2 
> minute wait between each one, without giving up. During this time, the task 
> is *not* speculated. However, once this task finally failed, a new version of 
> it ran successfully. Getting the input file in question with bin/hadoop fs 
> -get also worked fine.
> There is no mention of the task attempt in question in the NameNode logs but 
> my guess is that something to do with RPC queues is causing its connection to 
> get lost, and the DFSClient does not recover.


[jira] [Moved] (MAPREDUCE-2814) Relax the strict type check by allowing subclasses pass the check

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5452 to MAPREDUCE-2814:


Issue Type: Bug  (was: Improvement)
   Key: MAPREDUCE-2814  (was: HADOOP-5452)
   Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Relax the strict type check by allowing subclasses pass the check
> -
>
> Key: MAPREDUCE-2814
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2814
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: He Yongqiang
>
> A type check like:
> {code}
> if (key.getClass() != keyClass)
>   throw new IOException("wrong key class: "+key.getClass().getName()
>       +" is not "+keyClass);
> if (val.getClass() != valClass)
>   throw new IOException("wrong value class: "+val.getClass().getName()
>       +" is not "+valClass);
> {code}
> is used a lot when a type check is needed. 
> I found uses of it in org.apache.hadoop.io.SequenceFile, 
> org.apache.hadoop.mapred.IFile, and org.apache.hadoop.mapred.MapTask. Because I 
> searched only for (key.getClass() != keyClass), this code may also appear in 
> other classes.
> I suggest we relax the strict type check by using 
> {code}
> if (key.getClass().isAssignableFrom(keyClass))
> {code}
> The error in my situation is listed below:
> {panel:borderStyle=dashed| borderColor=#ccc| titleBGColor=#F7D6C1| 
> bgColor=#CE}
> java.io.IOException: Type mismatch in value from map: expected 
> cn.ac.ict.vega.type.Type, recieved cn.ac.ict.vega.type.Type$Float
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:553)
>   at 
> cn.ac.ict.vega.parse.mapreduce.block.FilterColumnBlockMapper.map(FilterColumnBlockMapper.java:77)
>   at 
> cn.ac.ict.vega.parse.mapreduce.block.BlockMapRunner.run(BlockMapRunner.java:33)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>   at org.apache.hadoop.mapred.Child.main(Child.java:155)
> {panel} 
> Float is a subclass of Type, and I would like it to pass the check. I use Type 
> instead of Float because I cannot determine in advance whether the value is a 
> Float, a String, or some other subtype.
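
The difference between the strict and the relaxed check can be sketched as follows. This is a minimal illustration, not Hadoop code: TypeCheckSketch, isExactType, and isCompatibleType are hypothetical names, and Number/Integer stand in for Type/Type$Float. The direction of the call matters: the declared class is the receiver, i.e. declared.isAssignableFrom(value.getClass()).

```java
// Illustrative sketch only; the class and method names are hypothetical,
// not part of Hadoop.
class TypeCheckSketch {

  // Strict check: true only when the runtime class is exactly `declared`.
  static boolean isExactType(Object value, Class<?> declared) {
    return value.getClass() == declared;
  }

  // Relaxed check: true for `declared` itself and for any subclass or
  // implementation of it.
  static boolean isCompatibleType(Object value, Class<?> declared) {
    return declared.isAssignableFrom(value.getClass());
  }

  public static void main(String[] args) {
    Object value = Integer.valueOf(42); // Integer is a subclass of Number
    System.out.println("strict:  " + isExactType(value, Number.class));
    System.out.println("relaxed: " + isCompatibleType(value, Number.class));
  }
}
```

A caller would throw the IOException only when the relaxed predicate returns false, so a subclass value passes while an unrelated type is still rejected.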





[jira] [Moved] (MAPREDUCE-2812) Combiner that aggregates all the mappers from a machine

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5340 to MAPREDUCE-2812:


Affects Version/s: (was: 0.19.1)
  Key: MAPREDUCE-2812  (was: HADOOP-5340)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Combiner that aggregates all the mappers from a machine
> ---
>
> Key: MAPREDUCE-2812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2812
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Nathan Marz
>
> From what I can tell, the Combiner just aggregates data from a single map 
> task. It would be useful, especially during map-only jobs, to have a combiner 
> that aggregates data from all the map tasks on a given machine. My use case 
> for this is to vertically partition a set of records which start out in the 
> same files. By doing this in a map-only task, way too many files are created 
> (About 50 files are created per input split). By pumping all the data through 
> a reducer, a lot of unnecessary overhead occurs. With the proposed feature, I 
> would get 50*number of machines files rather than 50*number of input splits 
> files for this use case.





[jira] [Moved] (MAPREDUCE-2811) Adding Multiple Reducers implementations.

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5630 to MAPREDUCE-2811:


Key: MAPREDUCE-2811  (was: HADOOP-5630)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Adding Multiple Reducers implementations.
> -
>
> Key: MAPREDUCE-2811
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2811
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Sidharth Gupta
>
> Like the patch released here https://issues.apache.org/jira/browse/HADOOP-372 
> can we have a multi-format Reducer too? Someone suggested that if we need 
> different reducer and map implementations (as I do), I would be better off 
> writing 2 jobs. I don't quite agree. I am calculating 2 big matrices that 
> must be calculated in the map step, summed in the reducers, multiplied, and 
> then written to a file. The first mapper sums a matrix based on the (i,j)th 
> index (key) into the file, and the second mapper adds the N*1 dimension vector 
> that uses a new line as key. These keys must be passed as such to the reduce 
> process.





[jira] [Moved] (MAPREDUCE-2810) CLI interface for managing Jobtracker Queues and ACLs

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5676 to MAPREDUCE-2810:


Component/s: (was: conf)
 security
 job submission
Key: MAPREDUCE-2810  (was: HADOOP-5676)
Project: Hadoop Map/Reduce  (was: Hadoop Common)

> CLI interface for managing Jobtracker Queues and ACLs
> -
>
> Key: MAPREDUCE-2810
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2810
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: job submission, security
>Reporter: Rajiv Chittajallu
>
> As the number of users in a Hadoop cluster increases, it gets difficult to 
> manage Queues and ACLs in mapred-site.xml. It would be good to have a CLI 
> interface to update mapred-site.xml and validate the configuration. The CLI 
> should provide
>  - List of current ACLs
>  - Update and refresh ACLs





[jira] [Resolved] (MAPREDUCE-2809) Refactor JobRecoveryManager into a new class file.

2011-08-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved MAPREDUCE-2809.
--

Resolution: Won't Fix

Not necessary with MR-279.

> Refactor JobRecoveryManager into a new class file.
> --
>
> Key: MAPREDUCE-2809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>
> RecoveryManager in itself subsumes a lot of code, and should be moved out of 
> JobTracker.java. 





[jira] [Moved] (MAPREDUCE-2809) Refactor JobRecoveryManager into a new class file.

2011-08-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins moved HADOOP-5427 to MAPREDUCE-2809:


Issue Type: Improvement  (was: Bug)
   Key: MAPREDUCE-2809  (was: HADOOP-5427)
   Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Refactor JobRecoveryManager into a new class file.
> --
>
> Key: MAPREDUCE-2809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2809
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>
> RecoveryManager in itself subsumes a lot of code, and should be moved out of 
> JobTracker.java. 





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083292#comment-13083292
 ] 

Alejandro Abdelnur commented on MAPREDUCE-2600:
---

That seems better :).

I'm not familiar with the MR2 code distribution, but where do we find the 
MapReduce APIs? That should be a separate JAR, just the MR interface, no?

Also, when using the client API, do I just define the dependency for one 
artifact and I'm done (all the others come as transitive dependencies and are 
implementation-specific, not exposed to the user)?


> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083244#comment-13083244
 ] 

Luke Lu commented on MAPREDUCE-2600:


OK 12 modules as of now:
# yarn-api
# yarn-common
# yarn-server-common
# yarn-server-nodemanager
# yarn-server-resourcemanager
# yarn-server-tests (an integration test module)
# hadoop-mapreduce-client-core
# hadoop-mapreduce-client-common
# hadoop-mapreduce-client-shuffle (shuffle plugin for node manager)
# hadoop-mapreduce-client-app (MR app master)
# hadoop-mapreduce-client-hs  (MR job history server)
# hadoop-mapreduce-client-jobclient


> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083235#comment-13083235
 ] 

Luke Lu commented on MAPREDUCE-2600:


bq. 59 JARs for a project seems a bit too much, it seems that JARs are being 
used instead of Java packages to separate class.

No, we don't have 59 jars or 59 source roots. We only have 11 
source roots/modules; the rest are dependencies. In fact, we do separate modules 
mostly at package boundaries.

> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Assigned] (MAPREDUCE-2690) Construct the web page for default scheduler

2011-08-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned MAPREDUCE-2690:
-

Assignee: Eric Payne

> Construct the web page for default scheduler
> 
>
> Key: MAPREDUCE-2690
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2690
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Ramya Sunil
>Assignee: Eric Payne
> Fix For: 0.23.0
>
>
> Currently, the web page for the default scheduler reads "Under construction". 
> This is a long-known issue, but I could not find a tracking ticket, hence 
> this one.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083225#comment-13083225
 ] 

Alejandro Abdelnur commented on MAPREDUCE-2600:
---

59 JARs for a project seems a bit too much; it seems that JARs are being used 
instead of Java packages to separate classes.

IMO, a more logical set of JARs is along the lines of what Owen described when 
opening the JIRA: api, client, server, utils.

Even if IDEs can handle several source roots, 59 becomes cumbersome. Plus, from 
the Maven side, that means the reactor will do much more work to resolve module 
dependencies, thus slowing down the build.

Finally, I advise against merging JARs into one, as this significantly 
complicates troubleshooting.


> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083220#comment-13083220
 ] 

Luke Lu commented on MAPREDUCE-2600:


bq. I don't see how it makes development easier or faster to have lots of 
little directories.

A smaller module means a smaller code base to start from for a typical feature, 
and a much faster recompile if one doesn't use an IDE, i.e., just mvn clean 
install in the module directory. yarn has only 5 modules (including an 
integration test module), and the mapreduce runtime has 6 modules. Is this 
"lots of little directories" that's out of control?

> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Updated] (MAPREDUCE-2716) MR279: MRReliabilityTest job fails because of missing job-file.

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2716:


Status: Open  (was: Patch Available)

We found some more bugs in the existing code and new code that are causing the 
jobFile to use the incorrect user.  I will update the patch to address these 
issues.

> MR279: MRReliabilityTest job fails because of missing job-file.
> ---
>
> Key: MAPREDUCE-2716
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2716
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2716-v2.patch, MAPREDUCE-2716-v3.patch, 
> MAPREDUCE-2716-v4.patch, MAPREDUCE-2716-v5.patch, MAPREDUCE-2716.patch
>
>
> The ApplicationReport should have the jobFile (e.g. 
> hdfs://localhost:9000/tmp/hadoop-/mapred/staging//.staging/job_201107121640_0001/job.xml)
> Without it, jobs such as MRReliabilityTest fail with the following error 
> (caused by the fact that jobFile is hardcoded to "" in TypeConverter.java):
> e.g. java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:88)
> at org.apache.hadoop.fs.Path.<init>(Path.java:96)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:445)
> at org.apache.hadoop.mapreduce.Cluster.getJobs(Cluster.java:104)
> at org.apache.hadoop.mapreduce.Cluster.getAllJobs(Cluster.java:218)
> at org.apache.hadoop.mapred.JobClient.getAllJobs(JobClient.java:757)
> at 
> org.apache.hadoop.mapred.JobClient.jobsToComplete(JobClient.java:741)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.runTest(ReliabilityTest.java:219)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.runSleepJobTest(ReliabilityTest.java:133)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.run(ReliabilityTest.java:116)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.main(ReliabilityTest.java:504)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:111)
> at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:192)





[jira] [Updated] (MAPREDUCE-2808) pull MAPREDUCE-2797 into mr279 branch

2011-08-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2808:
-

Attachment: MAPREDUCE-2808.patch

> pull MAPREDUCE-2797 into mr279 branch
> -
>
> Key: MAPREDUCE-2808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2808
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2808.patch
>
>
> The ant tar command fails in the mapreduce directory on the mr279 branch.  
> The issue was a change in hdfs and was fixed on trunk with jira 
> MAPREDUCE-2797.  Pull that change into mr279.





[jira] [Updated] (MAPREDUCE-2808) pull MAPREDUCE-2797 into mr279 branch

2011-08-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2808:
-

Status: Patch Available  (was: Open)

> pull MAPREDUCE-2797 into mr279 branch
> -
>
> Key: MAPREDUCE-2808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2808
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2808.patch
>
>
> The ant tar command fails in the mapreduce directory on the mr279 branch.  
> The issue was a change in hdfs and was fixed on trunk with jira 
> MAPREDUCE-2797.  Pull that change into mr279.





[jira] [Created] (MAPREDUCE-2808) pull MAPREDUCE-2797 into mr279 branch

2011-08-11 Thread Thomas Graves (JIRA)

pull MAPREDUCE-2797 into mr279 branch
-

 Key: MAPREDUCE-2808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2808
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Minor
 Fix For: 0.23.0


The ant tar command fails in the mapreduce directory on the mr279 branch.  The 
issue was a change in hdfs and was fixed on trunk with jira MAPREDUCE-2797.  
Pull that change into mr279.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083202#comment-13083202
 ] 

Owen O'Malley commented on MAPREDUCE-2600:
--

It is a big issue for downstream users. Projects that use Hadoop already pick 
up a lot of jars and increasing the set when all of the versions are the same 
is a problem. We'll also have users using different versions of the jars, which 
won't be useful.

Having a source structure that requires an IDE to use isn't making the code 
easy for people to browse, use and modify. It will also become a maintenance 
problem as the dependency graph between the components changes.

Yes, you can munge the results together into a single jar as part of the build, 
but I don't see how it makes development easier or faster to have lots of 
little directories. 

That said, I don't have cycles to do the work right now. If no one else does 
either, we can postpone the debate.

> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Commented] (MAPREDUCE-2716) MR279: MRReliabilityTest job fails because of missing job-file.

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083185#comment-13083185
 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2716:
-

By the way, all tests pass with 'mvn test', except for TestLeafQueue - which is 
unrelated to my patch and fails without my change.

> MR279: MRReliabilityTest job fails because of missing job-file.
> ---
>
> Key: MAPREDUCE-2716
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2716
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2716-v2.patch, MAPREDUCE-2716-v3.patch, 
> MAPREDUCE-2716-v4.patch, MAPREDUCE-2716-v5.patch, MAPREDUCE-2716.patch
>
>
> The ApplicationReport should have the jobFile (e.g. 
> hdfs://localhost:9000/tmp/hadoop-/mapred/staging//.staging/job_201107121640_0001/job.xml)
> Without it, jobs such as MRReliabilityTest fail with the following error 
> (caused by the fact that jobFile is hardcoded to "" in TypeConverter.java):
> e.g. java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:88)
> at org.apache.hadoop.fs.Path.<init>(Path.java:96)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:445)
> at org.apache.hadoop.mapreduce.Cluster.getJobs(Cluster.java:104)
> at org.apache.hadoop.mapreduce.Cluster.getAllJobs(Cluster.java:218)
> at org.apache.hadoop.mapred.JobClient.getAllJobs(JobClient.java:757)
> at 
> org.apache.hadoop.mapred.JobClient.jobsToComplete(JobClient.java:741)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.runTest(ReliabilityTest.java:219)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.runSleepJobTest(ReliabilityTest.java:133)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.run(ReliabilityTest.java:116)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.main(ReliabilityTest.java:504)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:111)
> at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:192)





[jira] [Updated] (MAPREDUCE-2727) MR-279: SleepJob throws divide by zero exception when count = 0

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2727:


Status: Patch Available  (was: Open)

> MR-279: SleepJob throws divide by zero exception when count = 0
> ---
>
> Key: MAPREDUCE-2727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2727
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2727-trunk.patch, MAPREDUCE-2727.patch
>
>
> When the count is 0 for mappers or reducers, a divide-by-zero exception is 
> thrown.  There are existing checks to error out when count < 0, which 
> obviously doesn't handle the 0 case.  This is causing the MRReliabilityTest 
> to fail.
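
The gap described above (a check for count < 0 that lets 0 reach a division) can be illustrated with a small sketch. This is not the SleepJob source and not necessarily how the attached patch handles the zero case: SleepCountSketch and perRecordSleepMillis are hypothetical names, and rejecting 0 outright is only one possible guard.

```java
// Hypothetical sketch; names are illustrative and not taken from SleepJob.
class SleepCountSketch {

  // Splits a total sleep time evenly across `count` records.
  // Rejecting count <= 0 up front means the division never sees a zero divisor.
  static long perRecordSleepMillis(long totalSleepMillis, int count) {
    if (count <= 0) { // a "count < 0" check would let 0 through
      throw new IllegalArgumentException(
          "Invalid count: " + count + " (must be greater than 0)");
    }
    return totalSleepMillis / count;
  }

  public static void main(String[] args) {
    System.out.println(perRecordSleepMillis(1000L, 4));
    try {
      perRecordSleepMillis(1000L, 0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

An alternative design would accept 0 and skip the division entirely; which behavior is appropriate depends on whether a zero count is meaningful to callers.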





[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2489:


Status: Patch Available  (was: Open)

> Jobsplits with random hostnames can make the queue unusable
> ---
>
> Key: MAPREDUCE-2489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.205.0, 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: MAPREDUCE-2489-0.20s-v2.patch, 
> MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, 
> MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, 
> MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, 
> MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, 
> MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, 
> MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for 
> the splits that were then causing the JobTracker to attempt to excessively 
> resolve host names.  This caused a major slowdown for the JobTracker.  We 
> should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure 
> that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
> could also fail the job after a certain number of failures in the resolve.
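
As a rough illustration of the proposed verification, the sketch below checks hostname syntax before any DNS lookup is attempted. It is an assumption-laden example rather than the code in the attached patches: HostnameCheckSketch and looksLikeHostname are hypothetical names, and the rules follow the common letters-digits-hyphens convention (labels of 1 to 63 characters, at most 253 characters overall).

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the validation used by the MAPREDUCE-2489 patches.
class HostnameCheckSketch {

  // One label: 1-63 characters, letters/digits/hyphens,
  // neither starting nor ending with a hyphen.
  private static final Pattern LABEL =
      Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

  // Purely syntactic check; it performs no DNS lookup.
  // Dotted IPv4 literals also pass, since each part is a valid label.
  static boolean looksLikeHostname(String host) {
    if (host == null || host.isEmpty() || host.length() > 253) {
      return false;
    }
    // The -1 limit keeps trailing empty labels, so "host." and "a..b" are rejected.
    for (String label : host.split("\\.", -1)) {
      if (!LABEL.matcher(label).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(looksLikeHostname("node-1.example.com"));
    System.out.println(looksLikeHostname("not a host!"));
  }
}
```

Only names that pass such a check would be handed to the resolver; anything else could be failed immediately, which keeps malformed split locations from triggering repeated lookups.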

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2489:


Attachment: MAPREDUCE-2489-mapred-v7.patch
MAPREDUCE-2489-0.20s-v6.patch

Updated patches removing the incompatible change and corresponding code 
references here.  

All tests pass on branch-0.20-security.

branch-0.20-security test-patch results:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 

==
All tests pass in trunk except for those that are currently failing with a 
clean checkout of trunk as well (TestMRCLI, TestFileSystem, 
TestMapredSystemDir, TestLocalRunner, TestDBJob, TestDataDrivenDBInputFormat)

test-patch results for trunk:
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 system test framework.  The patch failed system test 
framework compile.
 [exec] 


The tests were added to the hadoop-common portion of this patch.
The Findbugs warning is unrelated to this patch.
The system test failure is during the -compile-fault-inject: phase because the 
hadoop-common changes have not been committed yet.




> Jobsplits with random hostnames can make the queue unusable
> ---
>
> Key: MAPREDUCE-2489
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.205.0, 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.20.205.0, 0.23.0
>
> Attachments: MAPREDUCE-2489-0.20s-v2.patch, 
> MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, 
> MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, 
> MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, 
> MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, 
> MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, 
> MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for 
> the splits that were then causing the JobTracker to attempt to excessively 
> resolve host names.  This caused a major slowdown for the JobTracker.  We 
> should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure 
> that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
> could also fail the job after a certain number of failures in the resolve.





[jira] [Assigned] (MAPREDUCE-2791) [MR-279] Missing/incorrect info on job -status CLI

2011-08-11 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned MAPREDUCE-2791:


Assignee: Devaraj K

> [MR-279] Missing/incorrect info on job -status CLI 
> ---
>
> Key: MAPREDUCE-2791
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2791
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Ramya Sunil
>Assignee: Devaraj K
> Fix For: 0.23.0
>
>
> There are a couple of details missing/incorrect on the job -status command 
> line output for completed jobs:
> 1. Incorrect job file
> 2. map() completion is always 0
> 3. reduce() completion is always set to 0
> 4. history URL is empty
> 5. Missing launched map tasks
> 6. Missing launched reduce tasks 





[jira] [Assigned] (MAPREDUCE-2796) [MR-279] Start time for all the apps is set to 0

2011-08-11 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned MAPREDUCE-2796:


Assignee: Devaraj K

> [MR-279] Start time for all the apps is set to 0
> 
>
> Key: MAPREDUCE-2796
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2796
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Ramya Sunil
>Assignee: Devaraj K
> Fix For: 0.23.0
>
>
> The start time for all the apps in the output of "job -list" is set to 0





[jira] [Updated] (MAPREDUCE-2727) MR-279: SleepJob throws divide by zero exception when count = 0

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2727:


Attachment: MAPREDUCE-2727-trunk.patch

As requested, here is a patch for trunk (which only applies to the one 
SleepJob.java file)

> MR-279: SleepJob throws divide by zero exception when count = 0
> ---
>
> Key: MAPREDUCE-2727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2727
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2727-trunk.patch, MAPREDUCE-2727.patch
>
>
> When the count is 0 for mappers or reducers, a divide-by-zero exception is 
> thrown.  There are existing checks to error out when count < 0, which 
> obviously doesn't handle the 0 case.  This is causing the MRReliabilityTest 
> to fail.
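
The gap described above (a check for count < 0 but none for count == 0) can be illustrated with a small guard. This is a sketch under assumptions: the names are hypothetical, they do not come from SleepJob.java, and the attached patch may handle the zero case differently.

```java
// Hypothetical illustration; the names do not come from SleepJob.java.
final class SleepCountCheck {
  // Sleep time per record. Returns 0 when there is nothing to process
  // instead of dividing by a zero count.
  static long perRecordSleepMillis(long totalSleepMillis, int count) {
    if (count < 0) {
      throw new IllegalArgumentException("count must be >= 0, got " + count);
    }
    if (count == 0) {
      return 0L; // the case the existing "< 0" check does not cover
    }
    return totalSleepMillis / count;
  }

  public static void main(String[] args) {
    System.out.println(perRecordSleepMillis(1000L, 4));
    System.out.println(perRecordSleepMillis(1000L, 0));
  }
}
```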





[jira] [Updated] (MAPREDUCE-2716) MR279: MRReliabilityTest job fails because of missing job-file.

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2716:


Attachment: MAPREDUCE-2716-v5.patch

Aargh... I forgot to add the jobId, and I was assigning the same jobFile to 
multiple apps/jobs.  This patch addresses these issues.

I'm currently using the JobID since that's what fromYarn(applicationId) 
returns, but I am curious if I should be trying to use JobId instead.

> MR279: MRReliabilityTest job fails because of missing job-file.
> ---
>
> Key: MAPREDUCE-2716
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2716
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2716-v2.patch, MAPREDUCE-2716-v3.patch, 
> MAPREDUCE-2716-v4.patch, MAPREDUCE-2716-v5.patch, MAPREDUCE-2716.patch
>
>
> The ApplicationReport should have the jobFile (e.g. 
> hdfs://localhost:9000/tmp/hadoop-/mapred/staging//.staging/job_201107121640_0001/job.xml)
> Without it, jobs such as MRReliabilityTest fail with the following error 
> (caused by the fact that jobFile is hardcoded to "" in TypeConverter.java):
> e.g. java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:88)
> at org.apache.hadoop.fs.Path.<init>(Path.java:96)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:445)
> at org.apache.hadoop.mapreduce.Cluster.getJobs(Cluster.java:104)
> at org.apache.hadoop.mapreduce.Cluster.getAllJobs(Cluster.java:218)
> at org.apache.hadoop.mapred.JobClient.getAllJobs(JobClient.java:757)
> at 
> org.apache.hadoop.mapred.JobClient.jobsToComplete(JobClient.java:741)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.runTest(ReliabilityTest.java:219)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.runSleepJobTest(ReliabilityTest.java:133)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.run(ReliabilityTest.java:116)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> at 
> org.apache.hadoop.mapred.ReliabilityTest.main(ReliabilityTest.java:504)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:111)
> at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:192)





[jira] [Assigned] (MAPREDUCE-2807) MR-279: AM restart does not work after RM refactor

2011-08-11 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal reassigned MAPREDUCE-2807:
-

Assignee: Sharad Agarwal

> MR-279: AM restart does not work after RM refactor
> --
>
> Key: MAPREDUCE-2807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2807
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sharad Agarwal
>Assignee: Sharad Agarwal
>
> When the AM crashes, RM is not able to launch a new App attempt.





[jira] [Commented] (MAPREDUCE-2727) MR-279: SleepJob throws divide by zero exception when count = 0

2011-08-11 Thread Jeffrey Naisbitt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083086#comment-13083086
 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2727:
-

Bobby Evans is actually currently working on removing the extra SleepJob - I 
believe there are some issues that need to be resolved there (in a separate 
Jira).

As for trunk, it does look like it has the same issue, so I'll post a patch 
for that as well.

> MR-279: SleepJob throws divide by zero exception when count = 0
> ---
>
> Key: MAPREDUCE-2727
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2727
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Jeffrey Naisbitt
>Assignee: Jeffrey Naisbitt
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-2727.patch
>
>
> When the count is 0 for mappers or reducers, a divide-by-zero exception is 
> thrown.  There are existing checks to error out when count < 0, which 
> obviously doesn't handle the 0 case.  This is causing the MRReliabilityTest 
> to fail.





[jira] [Commented] (MAPREDUCE-2677) MR-279: 404 error while accessing pages from history server

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083053#comment-13083053
 ] 

Luke Lu commented on MAPREDUCE-2677:


Looks reasonable overall. Some nits: 
# Tests a la TestAMWebApp would be good, as I also mentioned in MAPREDUCE-2676.
# Inconsistent @Override usage in HsAppController.
# HsCounterPage seems unnecessary.

> MR-279: 404 error while accessing pages from history server
> ---
>
> Key: MAPREDUCE-2677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2677
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.0
>Reporter: Ramya Sunil
>Assignee: Robert Joseph Evans
> Fix For: 0.23.0
>
> Attachments: MR-2677-v1.txt
>
>
> Accessing the following pages from the history server causes an HTTP 404 error:
> 1. Cluster-> About 
> 2. Cluster -> Applications
> 3. Cluster -> Scheduler
> 4. Application -> About





[jira] [Resolved] (MAPREDUCE-2787) MR-279: Performance improvement in running Uber MapTasks

2011-08-11 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan resolved MAPREDUCE-2787.
-

Resolution: Won't Fix

Thanks Arun and Vinod for the clarification. I am closing the ticket.

> MR-279: Performance improvement in running Uber MapTasks
> 
>
> Key: MAPREDUCE-2787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2787
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Attachments: MAPREDUCE-2787.patch
>
>
> The runUberMapTasks() in org.apache.hadoop.mapred.UberTask obtains the local 
> fileSystem and local job configuration for every task attempt.  This will 
> have a negative performance impact.





[jira] [Commented] (MAPREDUCE-2600) MR-279: simplify the jars

2011-08-11 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083026#comment-13083026
 ] 

Luke Lu commented on MAPREDUCE-2600:


bq. If the number of jars is the problem, can we just merge the jars at build 
time the way we want, using the maven shade plugin or some such?

I agree with Sharad, the current modules layout is fine. It makes working on 
individual features faster and easier. People who complain about number of 
source roots should improve their IDE fu and/or use a better IDE, IMO :)

According to recent conversations with people involved, I got the impression 
that it's just a packaging issue, i.e., having 3 combined jars plus 
dependencies in the distribution tar ball. yarn-client, yarn-servers and 
hadoop-mapreduce. So just some maven-shade-plugin fu would suffice.

> MR-279: simplify the jars 
> --
>
> Key: MAPREDUCE-2600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2600
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Luke Lu
>
> Currently the MR-279 mapreduce project generates 59 jars from 59 source 
> roots, which can be dramatically simplified.





[jira] [Commented] (MAPREDUCE-2765) DistCp Rewrite

2011-08-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083021#comment-13083021
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2765:


bq. Would the reviewers/watchers kindly comment on whether it's alright to 
deprecate the "-filelimit" and "-sizelimit" options, in DistCpV2?

Haven't looked at the code yet, but can clearly understand how useless these 
knobs are. +1 for not supporting them at all. We should probably continue to 
parse these options but just print a warning saying they are not supported.
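
The "keep parsing, but warn" behaviour suggested above might look roughly like this. It is a hypothetical sketch, not the option parser from the DistCpV2 patch; it assumes each of the two options is followed by exactly one value, and the warning text is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the option parser from the DistCpV2 patch.
final class IgnoredOptionsSketch {
  // Returns args with -filelimit/-sizelimit (and their values) removed,
  // recording one warning per dropped option.
  static List<String> stripUnsupported(String[] args, List<String> warnings) {
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if (arg.equals("-filelimit") || arg.equals("-sizelimit")) {
        warnings.add(arg + " is no longer supported and will be ignored");
        if (i + 1 < args.length) {
          i++; // assumption: the option carries one value, so skip it too
        }
      } else {
        kept.add(arg);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> warnings = new ArrayList<>();
    List<String> kept = stripUnsupported(
        new String[] {"-update", "-filelimit", "10", "/src", "/dst"}, warnings);
    System.out.println(kept);
    System.out.println(warnings);
  }
}
```

Existing scripts that still pass the old flags would keep working this way, while the warning makes it visible that the limits are no longer enforced.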

> DistCp Rewrite
> --
>
> Key: MAPREDUCE-2765
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2765
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Affects Versions: 0.20.203.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: distcpv2.20.203.patch
>
>
> This is a slightly modified version of the DistCp rewrite that Yahoo uses in 
> production today. The rewrite was ground-up, with specific focus on:
> 1. improved startup time (postponing as much work as possible to the MR job)
> 2. support for multiple copy-strategies
> 3. new features (e.g. -atomic, -async, -bandwidth.)
> 4. improved programmatic use
> Some effort has gone into refactoring what used to be achieved by a single 
> large (1.7 KLOC) source file, into a design that (hopefully) reads better too.
> The proposed DistCpV2 preserves command-line-compatibility with the old 
> version, and should be a drop-in replacement.
> New to v2:
> 1. Copy-strategies and the DynamicInputFormat:
>   A copy-strategy determines the policy by which source-file-paths are 
> distributed between map-tasks. (These boil down to the choice of the 
> input-format.) 
>   If no strategy is explicitly specified on the command-line, the policy 
> chosen is "uniform size", where v2 behaves identically to old-DistCp. (The 
> number of bytes transferred by each map-task is roughly equal, at a per-file 
> granularity.) 
>   Alternatively, v2 ships with a "dynamic" copy-strategy (in the 
> DynamicInputFormat). This policy acknowledges that 
>   (a)  dividing files based only on file-size might not be an 
> even distribution (E.g. if some datanodes are slower than others, or if some 
> files are skipped.)
>   (b) a "static" association of a source-path to a map increases 
> the likelihood of long-tails during copy.
>   The "dynamic" strategy divides the list-of-source-paths into a number 
> (> nMaps) of smaller parts. When each map completes its current list of 
> paths, it picks up a new list to process, if available. So if a map-task is 
> stuck on a slow (and not necessarily large) file, other maps can pick up the 
> slack. The thinner the file-list is sliced, the greater the parallelism (and 
> the lower the chances of long-tails). Within reason, of course: the number of 
> these short-lived list-files is capped at an overridable maximum.
>   Internal benchmarks against source/target clusters with some slow(ish) 
> datanodes have indicated significant performance gains when using the 
> dynamic-strategy. Gains are most pronounced when nFiles greatly exceeds nMaps.
>   Please note that the DynamicInputFormat might prove useful outside of 
> DistCp. It is hence available as a mapred/lib, unfettered to DistCpV2. Also 
> note that the copy-strategies have no bearing on the CopyMapper.map() 
> implementation.
>   
> 2. Improved startup-time and programmatic use:
>   When the old-DistCp runs with -update, and creates the 
> list-of-source-paths, it attempts to filter out files that might be skipped 
> (by comparing file-sizes, checksums, etc.) This significantly increases the 
> startup time (or the time spent in serial processing till the MR job is 
> launched), blocking the calling-thread. This becomes pronounced as nFiles 
> increases. (Internal benchmarks have seen situations where more time is spent 
> setting up the job than on the actual transfer.)
>   DistCpV2 postpones as much work as possible to the MR job. The 
> file-listing isn't filtered until the map-task runs (at which time, identical 
> files are skipped). DistCpV2 can now be run "asynchronously". The program 
> quits at job-launch, logging the job-id for tracking. Programmatically, the 
> DistCp.execute() returns a Job instance for progress-tracking.
>   
> 3. New features:
>   (a)   -async: As described in #2.
>   (b)   -atomic: Data is copied to a (user-specifiable) tmp-location, and 
> then moved atomically to destination.
>   (c)   -bandwidth: Enforces a limit on the bandwidth consumed per map.
>   (d)   -strategy: As above.
>   
> A more comprehensive description the newer features, how the dynamic-stra
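
The "dynamic" copy-strategy described in the quoted text (split the list of source paths into more chunks than there are maps, and let each map claim a new chunk when it finishes its current one) can be sketched with a shared queue. This is an illustration only: it uses plain threads in place of map tasks, the class and method names are hypothetical, and it is not how DynamicInputFormat is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the chunk-claiming idea; this is not DynamicInputFormat.
final class DynamicChunkSketch {
  // Split the source paths into chunks of at most chunkSize entries.
  static Queue<List<String>> chunk(List<String> paths, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    Queue<List<String>> chunks = new ConcurrentLinkedQueue<>();
    for (int i = 0; i < paths.size(); i += chunkSize) {
      int end = Math.min(i + chunkSize, paths.size());
      chunks.add(new ArrayList<>(paths.subList(i, end)));
    }
    return chunks;
  }

  // Stand-in for a map task: keeps claiming chunks until none are left
  // and returns the number of paths it handled.
  static int drain(Queue<List<String>> chunks) {
    int handled = 0;
    List<String> next;
    while ((next = chunks.poll()) != null) {
      handled += next.size(); // a real task would copy each file here
    }
    return handled;
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      paths.add("/src/file-" + i);
    }
    Queue<List<String>> chunks = chunk(paths, 3); // 4 chunks for 2 workers
    int[] counts = new int[2];
    Thread[] workers = new Thread[2];
    for (int w = 0; w < workers.length; w++) {
      final int id = w;
      workers[w] = new Thread(() -> counts[id] = drain(chunks));
      workers[w].start();
    }
    for (Thread t : workers) {
      t.join();
    }
    // Every path is handled exactly once, however the chunks were shared out.
    System.out.println(counts[0] + counts[1]);
  }
}
```

A worker stuck on a slow file simply claims fewer chunks while the others keep draining the queue, which is the long-tail behaviour the description is aiming at.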

[jira] [Created] (MAPREDUCE-2807) MR-279: AM restart does not work after RM refactor

2011-08-11 Thread Sharad Agarwal (JIRA)
MR-279: AM restart does not work after RM refactor
--

 Key: MAPREDUCE-2807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Sharad Agarwal


When the AM crashes, RM is not able to launch a new App attempt.





[jira] [Updated] (MAPREDUCE-2807) MR-279: AM restart does not work after RM refactor

2011-08-11 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-2807:
--


Seeing in RM logs:
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
ATTEMPT_FAILED at RUNNING
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:379)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:294)


> MR-279: AM restart does not work after RM refactor
> --
>
> Key: MAPREDUCE-2807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2807
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sharad Agarwal
>
> When the AM crashes, RM is not able to launch a new App attempt.





[jira] [Commented] (MAPREDUCE-2787) MR-279: Performance improvement in running Uber MapTasks

2011-08-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083010#comment-13083010
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2787:


Just to give more clarity to what Arun said: even though there are patches on 
MR-279 branch which implement the uber-task feature for the classic runtime 
(JT/TT), those patches are not going to be ported to trunk when we merge MR-279 
to trunk. OTOH, the uber-task feature for yarn+MR runtime is implemented via 
LocalContainerAllocator and LocalContainerLauncher which is what you should 
look at.

bq. Should we close this as won't fix? 
+1. Ahmed, please close this once you are convinced. Thanks!

> MR-279: Performance improvement in running Uber MapTasks
> 
>
> Key: MAPREDUCE-2787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2787
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
> Attachments: MAPREDUCE-2787.patch
>
>
> The runUberMapTasks() in org.apache.hadoop.mapred.UberTask obtains the local 
> fileSystem and local job configuration for every task attempt.  This will 
> have a negative performance impact.





[jira] [Updated] (MAPREDUCE-109) Setting up ctr-A as custom delimiter for "mapred.textoutputformat.separator"

2011-08-11 Thread Michael Katzenellenbogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Katzenellenbogen updated MAPREDUCE-109:
---

Attachment: MAPREDUCE-109-v3.patch

Fixing per ATM's comments.

> Setting up ctr-A as custom delimiter for "mapred.textoutputformat.separator"
> 
>
> Key: MAPREDUCE-109
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-109
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2, 0.23.0
>Reporter: Suhas Gogate
>Assignee: Michael Katzenellenbogen
> Attachments: MAPREDUCE-109-v2.patch, MAPREDUCE-109-v3.patch, 
> MAPREDUCE-109.patch
>
>
> The feature added by this Jira fails when the separator is a character that 
> is invalid in XML, e.g. ctrl-A: mapred.textoutputformat.separator = "\u0001".
> For example:
> String delim = "\u0001";
> Conf.set("mapred.textoutputformat.separator", delim);
> Job client serializes the jobconf with mapred.textoutputformat.separator set 
> to "\u0001" (ctrl-A) and problem happens when it is de-serialized (read back) 
> by job tracker, where it encounters invalid xml character.
> The test for this feature public : testFormatWithCustomSeparator() does not 
> serialize the jobconf after adding the separator as ctrl-A and hence does not 
> detect the specific problem.
> Here is an exception:
> 08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 
> 1
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> java.lang.RuntimeException: org.xml.sax.SAXParseException: Character 
> reference "" is an invalid XML
> character.
> at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961)
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864)
> at
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:291)
> at
> org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163)
> at
> org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:179)
> at
> org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> at org.apache.hadoop.ipc.Client.call(Client.java:715)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
> at
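
The underlying constraint is that XML 1.0 does not allow U+0001, not even as a numeric character reference, so a configuration file containing it cannot be read back. The following standalone sketch reproduces that with the JDK's own parser. It does not use Hadoop's Configuration class, and the property/value element names are only there to resemble a configuration entry.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Standalone illustration with the JDK XML parser; no Hadoop classes involved.
final class CtrlAXmlDemo {
  // Returns true if the default JDK parser accepts the document,
  // false if it rejects it with a SAXParseException.
  static boolean parses(String xml) throws Exception {
    try {
      DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXParseException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    // A tab (U+0009) is a legal XML 1.0 character.
    String tab = "<property><value>&#9;</value></property>";
    // ctrl-A (U+0001) is not, so the parser reports a fatal error.
    String ctrlA = "<property><value>&#1;</value></property>";
    System.out.println(parses(tab));
    System.out.println(parses(ctrlA));
  }
}
```

The parser may also print its own fatal-error message to stderr for the second document; that comes from the JDK's default error handler, not from this code.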





[jira] [Commented] (MAPREDUCE-2806) [Gridmix] Load job fails with timeout errors when resource emulation is turned on

2011-08-11 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082993#comment-13082993
 ] 

Amar Kamat commented on MAPREDUCE-2806:
---

This is a corner case and can be seen when the task finishes too soon. Thanks 
Vinay for reporting this.

> [Gridmix] Load job fails with timeout errors when resource emulation is 
> turned on
> -
>
> Key: MAPREDUCE-2806
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2806
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>  Labels: gridmix, loadjob, timeout
> Fix For: 0.23.0
>
>
> When the Load job's tasks are emulating cpu/memory, the task-tracker kills 
> the emulating task due to lack of status updates. Load job has its own status 
> reporter which dies too soon.





[jira] [Commented] (MAPREDUCE-109) Setting up ctr-A as custom delimiter for "mapred.textoutputformat.separator"

2011-08-11 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082990#comment-13082990
 ] 

Aaron T. Myers commented on MAPREDUCE-109:
--

Hey Michael, patch looks pretty good to me. Two tiny stylistic comments:

# Please put spaces around "=" in "{{out=new BufferedWriter(new 
FileWriter(CONFIG));}}"
# The indentation is wrong in the change to {{Configuration.java}}. Hadoop uses 
2 space indentation, not 4.

> Setting up ctr-A as custom delimiter for "mapred.textoutputformat.separator"
> 
>
> Key: MAPREDUCE-109
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-109
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2, 0.23.0
>Reporter: Suhas Gogate
>Assignee: Michael Katzenellenbogen
> Attachments: MAPREDUCE-109-v2.patch, MAPREDUCE-109.patch
>
>
> The feature added by this Jira fails when the separator is a character that 
> is invalid in XML, e.g. ctrl-A: mapred.textoutputformat.separator = "\u0001".
> For example:
> String delim = "\u0001";
> Conf.set("mapred.textoutputformat.separator", delim);
> Job client serializes the jobconf with mapred.textoutputformat.separator set 
> to "\u0001" (ctrl-A) and problem happens when it is de-serialized (read back) 
> by job tracker, where it encounters invalid xml character.
> The test for this feature public : testFormatWithCustomSeparator() does not 
> serialize the jobconf after adding the separator as ctrl-A and hence does not 
> detect the specific problem.
> Here is an exception:
> 08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 
> 1
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> java.lang.RuntimeException: org.xml.sax.SAXParseException: Character 
> reference "" is an invalid XML
> character.
> at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961)
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864)
> at
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:291)
> at
> org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163)
> at
> org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:179)
> at
> org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> at org.apache.hadoop.ipc.Client.call(Client.java:715)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
> at




