[jira] [Commented] (MAPREDUCE-4798) TestJobHistoryServer fails some times with 'java.lang.AssertionError: Address already in use'

2012-12-03 Thread sam liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509435#comment-13509435
 ] 

sam liu commented on MAPREDUCE-4798:


Thanks, Eric! Please let me know if any further action is needed.

> TestJobHistoryServer fails some times with 'java.lang.AssertionError: Address 
> already in use'
> -
>
> Key: MAPREDUCE-4798
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4798
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Affects Versions: 1.0.3
> Environment: Red Hat Ent Server 6.2
>Reporter: sam liu
>Priority: Minor
>  Labels: patch
> Attachments: MAPREDUCE-4798_branch-1.patch, MAPREDUCE-4798.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> UT Failure in IHC 1.0.3: org.apache.hadoop.mapred.TestJobHistoryServer. This 
> UT fails sometimes.
> The error message is:
> 'Testcase: testHistoryServerStandalone took 5.376 sec
>   Caused an ERROR
> Address already in use
> java.lang.AssertionError: Address already in use
>   at 
> org.apache.hadoop.mapred.TestJobHistoryServer.testHistoryServerStandalone(TestJobHistoryServer.java:113)'
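A common mitigation for this class of intermittent failure is to have the test ask the OS for a free ephemeral port instead of binding a fixed one. The sketch below is illustrative only and assumes nothing about the attached patches:

{code}
// Minimal, illustrative helper: bind port 0 so the OS picks a free port,
// then hand that port to the server under test. Not the attached patch.
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortSketch {
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {  // port 0 = OS chooses
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("free port: " + findFreePort());
  }
}
{code}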



[jira] [Commented] (MAPREDUCE-4835) AM job metrics can double-count a job if it errors after entering a completion state

2012-12-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509385#comment-13509385
 ] 

Xuan Gong commented on MAPREDUCE-4835:
--

The method "Have JobImpl.finished ignore incrementing any metrics if the job is 
already in a terminal state (SUCCEEDED/FAILED/KILLED) to avoid double-counting 
a job." may not work. But before we call the finished, the current states is 
already changed. So, it is very difficult to check previous status is terminal 
states or not.
For example, somehow we did InternalErrorTransition, it will change to state 
from succeeded to error. From the code at InternalErrorTransition, 
{code}
public void transition(JobImpl job, JobEvent event) {
  //TODO Is this JH event required.
  job.setFinishTime();
  JobUnsuccessfulCompletionEvent failedEvent =
      new JobUnsuccessfulCompletionEvent(job.oldJobId,
          job.finishTime, 0, 0,
          JobStateInternal.ERROR.toString());
  job.eventHandler.handle(new JobHistoryEvent(job.jobId, failedEvent)); // <-- this line actually changes the state
  job.finished(JobStateInternal.ERROR); // <-- this line increments the failure count a second time (the duplicate)
}
{code}
So what we can do is capture JobStateInternal previousState = job.getInternalState() 
before calling job.eventHandler.handle(new JobHistoryEvent(job.jobId, failedEvent)), 
and then check previousState to decide whether the count should be incremented.
For example, if we do not want to increment the count when a terminal state 
transitions to the error state, InternalErrorTransition could become:
{code}
public void transition(JobImpl job, JobEvent event) {
  //TODO Is this JH event required.
  job.setFinishTime();
  JobUnsuccessfulCompletionEvent failedEvent =
      new JobUnsuccessfulCompletionEvent(job.oldJobId,
          job.finishTime, 0, 0,
          JobStateInternal.ERROR.toString());
  JobStateInternal previousState = job.getInternalState();
  job.eventHandler.handle(new JobHistoryEvent(job.jobId, failedEvent));
  // Only count the job if the previous state was not already terminal
  // (SUCCEEDED/KILLED/FAILED/ERROR); for those states the count has already
  // been incremented and we do not want to do it again.
  if (previousState != JobStateInternal.SUCCEEDED
      && previousState != JobStateInternal.KILLED
      && previousState != JobStateInternal.FAILED
      && previousState != JobStateInternal.ERROR) {
    job.finished(JobStateInternal.ERROR);
  }
}
{code}

> AM job metrics can double-count a job if it errors after entering a 
> completion state
> 
>
> Key: MAPREDUCE-4835
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4835
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Priority: Minor
>
> If JobImpl enters the SUCCEEDED, FAILED, or KILLED state but then encounters 
> an invalid state transition, it could double-count the job since jobs that 
> encounter an error are considered failed jobs.  Therefore the job could be 
> counted initially as a successful, failed, or killed job, respectively, then 
> counted again as a failed job due to the internal error afterwards.



[jira] [Commented] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509066#comment-13509066
 ] 

Jason Lowe commented on MAPREDUCE-4842:
---

Here's the sequence of events that I believe led to the hang during shuffle; 
see {{MergeManager}} for the variables referenced below (a simplified sketch 
follows the list).

# Fetchers started fetching data
# Enough data finishes transferring to reach the {{commitMemory}} threshold and 
an in-memory merge starts
# While the merge takes place some of the output data is freed before the merge 
completes, lowering {{commitMemory}} and {{usedMemory}} which allows more data 
to be fetched
# Eventually we try to fetch too much data because {{usedMemory}} exceeds 
{{memoryLimit}} and further fetchers are told to WAIT
# All of the outstanding fetches complete and call {{closeInMemoryFile}}, but 
we don't start a merge because the previous merge is still marked in progress
# Merge completes, allowing a new merge to be started on the next 
{{closeInMemoryFile}} call
# With no outstanding fetches and no new fetches allowed, we never call 
{{closeInMemoryFile}} again and never start the next merge
# With no merge in progress and therefore nothing to wait upon, fetcher threads 
proceed to pummel the {{MergeManager}} asking for merge data reservations that 
are never given, and the reducer log grows rather rapidly
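
To make the bookkeeping above concrete, here is a minimal, self-contained sketch, not the Hadoop code: {{usedMemory}}, {{commitMemory}}, {{memoryLimit}} and {{closeInMemoryFile}} mirror the names in the sequence, while {{mergeThreshold}}, {{reserve}}, {{releaseDuringMerge}} and {{mergeComplete}} are simplifications assumed purely for illustration. Running it shows the final {{reserve}} call stalling forever because the only merge trigger, {{closeInMemoryFile}}, is never invoked again:

{code}
// Hypothetical, self-contained model of the sequence above, not the Hadoop code.
public class MergeRaceSketch {
  final long memoryLimit = 100, mergeThreshold = 60;
  long usedMemory = 0;      // bytes currently reserved by fetchers
  long commitMemory = 0;    // bytes in closed, merge-eligible map outputs
  boolean mergeInProgress = false;

  /** A fetcher asks for space; false means it is told to WAIT. */
  boolean reserve(long size) {
    if (usedMemory > memoryLimit) {
      return false;                         // steps 4 and 8: stalled fetcher
    }
    usedMemory += size;
    return true;
  }

  /** A fetcher finished one in-memory output; this is the only merge trigger. */
  void closeInMemoryFile(long size) {
    commitMemory += size;
    if (commitMemory >= mergeThreshold && !mergeInProgress) {
      mergeInProgress = true;
      System.out.println("in-memory merge started");
    }
  }

  /** The running merge frees its inputs before it completes (step 3). */
  void releaseDuringMerge(long size) {
    usedMemory -= size;
    commitMemory -= size;
  }

  /** Merge finishes; nothing here re-checks commitMemory, so no new merge starts. */
  void mergeComplete() {
    mergeInProgress = false;
  }

  public static void main(String[] args) {
    MergeRaceSketch m = new MergeRaceSketch();
    m.reserve(70); m.closeInMemoryFile(70);            // steps 1-2: merge starts
    m.releaseDuringMerge(70);                          // step 3: inputs freed early
    m.reserve(60); m.reserve(60);                      // more fetching, usedMemory = 120
    System.out.println(m.reserve(10));                 // step 4: false -> WAIT
    m.closeInMemoryFile(60); m.closeInMemoryFile(60);  // step 5: merge still in progress
    m.mergeComplete();                                 // step 6: no further trigger
    System.out.println(m.reserve(10));                 // steps 7-8: false, forever
  }
}
{code}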

> Shuffle race can hang reducer
> -
>
> Key: MAPREDUCE-4842
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>
> Saw an instance where the shuffle caused multiple reducers in a job to hang.  
> It looked similar to the problem described in MAPREDUCE-3721, where the 
> fetchers were all being told to WAIT by the MergeManager but no merge was 
> taking place.



[jira] [Created] (MAPREDUCE-4842) Shuffle race can hang reducer

2012-12-03 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-4842:
-

 Summary: Shuffle race can hang reducer
 Key: MAPREDUCE-4842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.5, 2.0.3-alpha
Reporter: Jason Lowe


Saw an instance where the shuffle caused multiple reducers in a job to hang.  
It looked similar to the problem described in MAPREDUCE-3721, where the 
fetchers were all being told to WAIT by the MergeManager but no merge was 
taking place.



[jira] [Updated] (MAPREDUCE-4824) Provide a mechanism for jobs to indicate they should not be recovered on restart

2012-12-03 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4824:
-

Attachment: MAPREDUCE-4824.patch

> I'm concerned that this might blow up different schedulers in different ways.

I don't think that's a problem since the code change only affects job 
submission, which kicks in before scheduling code is run.

> Maybe we need to do an 'if' check during recovery and not throw an 
> IOException?

I had another look at this and came up with a new patch. Does it look better?

The Hadoop 2 change sounds like the right approach. At first I thought we 
didn't need the property in Hadoop 2, due to MAPREDUCE-2702, but actually it 
would allow users to mark a job as non-recoverable on a per-instance basis. It 
would build on YARN-128.
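
As a rough sketch of the conf-property idea, not the attached patch (the property name {{mapreduce.job.restart.recover}} and the helper method are assumptions for illustration only), the jobtracker-side 'if' check could look something like:

{code}
// Hedged illustration only: a job-level flag read from the job conf during
// recovery; the property name and default here are assumptions, not the patch.
import org.apache.hadoop.mapred.JobConf;

public class RecoverySkipSketch {
  /** True if the submitted job opted out of recovery on jobtracker restart. */
  static boolean skipRecovery(JobConf jobConf) {
    // Default true: jobs stay recoverable unless they opt out, as a
    // non-idempotent Sqoop or HBase job would.
    return !jobConf.getBoolean("mapreduce.job.restart.recover", true);
  }

  public static void main(String[] args) {
    JobConf conf = new JobConf(false);                        // no defaults loaded
    conf.setBoolean("mapreduce.job.restart.recover", false);  // job opts out
    System.out.println(skipRecovery(conf) ? "skip during recovery"
                                          : "recover as usual");
  }
}
{code}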


> Provide a mechanism for jobs to indicate they should not be recovered on 
> restart
> 
>
> Key: MAPREDUCE-4824
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4824
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mrv1
>Affects Versions: 1.1.0
>Reporter: Tom White
>Assignee: Tom White
> Attachments: MAPREDUCE-4824.patch, MAPREDUCE-4824.patch, 
> MAPREDUCE-4824.patch, MAPREDUCE-4824.patch
>
>
> Some jobs (like Sqoop or HBase jobs) are not idempotent, so should not be 
> recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2, 
> however the approach there is not applicable for MR1, since even if we only 
> use the job-level part of the patch and add a isRecoverySupported method to 
> OutputCommitter, there is no way to use that information from the JT (which 
> initiates recovery), since the JT does not instantiate OutputCommitters - and 
> it shouldn't since they are user-level code. (In MR2 it's OK since the MR AM 
> calls the method.)
> Instead, we can add a MR configuration property to say that a job is not 
> recoverable, and the JT could safely read this from the job conf.



[jira] [Updated] (MAPREDUCE-2584) Check for serializers early, and give out more information regarding missing serializers

2012-12-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated MAPREDUCE-2584:
---

Status: Open  (was: Patch Available)

> Check for serializers early, and give out more information regarding missing 
> serializers
> 
>
> Key: MAPREDUCE-2584
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2584
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.2
>Reporter: Harsh J
>Assignee: Harsh J
>  Labels: serializers, tasks
> Attachments: MAPREDUCE-2584.r2.diff, MAPREDUCE-2584.r3.diff, 
> MAPREDUCE-2584.r4.diff, MAPREDUCE-2584.r5.diff, MAPREDUCE-2584.r6.diff, 
> MAPREDUCE-2584.r7.diff, MAPREDUCE-2584.r7.diff, MAPREDUCE-2584.r8.diff, 
> MAPREDUCE-2584.r9.diff
>
>
> As discussed on HADOOP-7328, MapReduce can handle serializers in a much 
> better way in case of bad configuration, improper imports (Some odd Text 
> class instead of the Writable Text set as key), etc..
> This issue covers the MapReduce parts of the improvements (made to IFile, 
> MapOutputBuffer, etc. and possible early-check of serializer availability 
> pre-submit) that provide more information than just an NPE as is the current 
> case.



[jira] [Commented] (MAPREDUCE-2584) Check for serializers early, and give out more information regarding missing serializers

2012-12-03 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508585#comment-13508585
 ] 

Harsh J commented on MAPREDUCE-2584:


The problem comes from the fact that when we look up IO_SERIALIZATIONS_KEY, 
we do not pass a default set of serializer classes.

We could add that in, but it may become harder to maintain if we extend the 
list. In any case, I'll upload another patch.

{code}
java.io.IOException: Couldn't find a serializer for the Map-Output Key class: 'class org.apache.hadoop.io.LongWritable'. If custom serialization is being used, ensure that the 'io.serializations' property is appropriately configured for the Job.
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSerializerSpecs(JobSubmitter.java:462)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:424)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:338)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1437)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1437)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:890)
    at org.apache.hadoop.conf.TestNoDefaultsJobConf.testNoDefaults(TestNoDefaultsJobConf.java:82)
{code}
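
For reference, a minimal sketch of the lookup difference (assumed call sites, not the SerializationFactory code): without a default the key resolves to null on a conf that carries no defaults, while the overload with a default class set falls back to WritableSerialization:

{code}
// Illustrative only: shows how passing a default class set changes the
// io.serializations lookup on a Configuration built without defaults.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.serializer.WritableSerialization;

public class SerializationLookupSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);   // no *-default.xml loaded

    // Current behaviour: no entry and no fallback -> null, which later
    // surfaces as "Couldn't find a serializer for the Map-Output Key class".
    String[] none = conf.getStrings("io.serializations");
    System.out.println(none == null ? "no serializers" : String.join(",", none));

    // The fix sketched above: supply a default class set at the lookup.
    String[] withDefault = conf.getStrings("io.serializations",
        WritableSerialization.class.getName());
    System.out.println(String.join(",", withDefault));
  }
}
{code}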

> Check for serializers early, and give out more information regarding missing 
> serializers
> 
>
> Key: MAPREDUCE-2584
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2584
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.2
>Reporter: Harsh J
>Assignee: Harsh J
>  Labels: serializers, tasks
> Attachments: MAPREDUCE-2584.r2.diff, MAPREDUCE-2584.r3.diff, 
> MAPREDUCE-2584.r4.diff, MAPREDUCE-2584.r5.diff, MAPREDUCE-2584.r6.diff, 
> MAPREDUCE-2584.r7.diff, MAPREDUCE-2584.r7.diff, MAPREDUCE-2584.r8.diff, 
> MAPREDUCE-2584.r9.diff
>
>
> As discussed on HADOOP-7328, MapReduce can handle serializers in a much 
> better way in case of bad configuration, improper imports (Some odd Text 
> class instead of the Writable Text set as key), etc..
> This issue covers the MapReduce parts of the improvements (made to IFile, 
> MapOutputBuffer, etc. and possible early-check of serializer availability 
> pre-submit) that provide more information than just an NPE as is the current 
> case.
