[jira] [Commented] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

Miklos Szegedi (JIRA) Thu, 28 Dec 2017 11:16:21 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305673#comment-16305673
 ]


Miklos Szegedi commented on MAPREDUCE-7028:
-------------------------------------------

Thank you, [~grepas], for the patch. I have a few comments:
{code}
        while (!done) {
          TaskAttemptStatus lastStatus = lastStatusRef.get();
          List<TaskAttemptId> fetchFailedMaps =
              taskAttemptStatus.fetchFailedMaps;
{code}
Since this code runs frequently, there is no need to save the list inside the 
loop on every iteration. You can save it once before the while loop. I would 
also give the variable a more specific name, such as savedFailedMaps.
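
To illustrate the suggestion, here is a minimal, self-contained sketch of a 
compareAndSet retry loop with the save hoisted above the while. It is not the 
patch itself: StatusSketch, HoistedSaveSketch and publish are hypothetical 
names, String stands in for TaskAttemptId, and both lists are assumed to be 
non-null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for TaskAttemptStatus, reduced to the one field
// under discussion; String replaces TaskAttemptId to stay self-contained.
final class StatusSketch {
    List<String> fetchFailedMaps;

    StatusSketch(List<String> fetchFailedMaps) {
        this.fetchFailedMaps = fetchFailedMaps;
    }
}

final class HoistedSaveSketch {
    // Publishes 'incoming' via compareAndSet, merging in the fetch failures
    // of whatever status it replaces. Assumes the lists are non-null.
    static void publish(AtomicReference<StatusSketch> lastStatusRef,
                        StatusSketch incoming) {
        // Saved once, before the loop: every retry starts from this list.
        final List<String> savedFailedMaps = incoming.fetchFailedMaps;
        boolean done = false;
        while (!done) {
            StatusSketch lastStatus = lastStatusRef.get();
            if (lastStatus == null) {
                incoming.fetchFailedMaps = savedFailedMaps;
            } else {
                List<String> merged = new ArrayList<>(
                    savedFailedMaps.size() + lastStatus.fetchFailedMaps.size());
                merged.addAll(savedFailedMaps);
                merged.addAll(lastStatus.fetchFailedMaps);
                incoming.fetchFailedMaps = merged;
            }
            // Fails and retries if another thread replaced lastStatus.
            done = lastStatusRef.compareAndSet(lastStatus, incoming);
        }
    }
}
```

Because savedFailedMaps is read only once, a retry after a failed 
compareAndSet rebuilds the merged list from the original input rather than 
from the result of the previous attempt.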
{code}
              taskAttemptStatus.fetchFailedMaps =
                  new ArrayList<>(taskAttemptStatus.fetchFailedMaps);
              taskAttemptStatus.fetchFailedMaps.addAll(
                  lastStatus.fetchFailedMaps);
{code}
The ArrayList should be created with an initial capacity equal to the sum of 
the sizes of the two source lists. Otherwise addAll will trigger an 
unnecessary reallocation and copy. I was thinking of something like:
{code}
          taskAttemptStatus.fetchFailedMaps =
              new ArrayList<>(taskAttemptStatus.fetchFailedMaps.size() +
                  lastStatus.fetchFailedMaps.size());
          taskAttemptStatus.fetchFailedMaps.addAll(fetchFailedMaps);
          taskAttemptStatus.fetchFailedMaps.addAll(
              lastStatus.fetchFailedMaps);
{code}
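
As a standalone illustration of the same idea, outside the Hadoop types, a 
generic helper could look like the sketch below. PresizedMergeSketch and 
concat are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class PresizedMergeSketch {
    // Concatenates two lists into a new ArrayList whose backing array is
    // allocated once, large enough for both inputs.
    static <T> List<T> concat(List<T> first, List<T> second) {
        List<T> merged = new ArrayList<>(first.size() + second.size());
        merged.addAll(first);
        merged.addAll(second);
        return merged;
    }
}
```

With the capacity given up front, neither addAll call needs to grow the 
backing array. Constructing the list from the first collection alone sizes 
it for that collection only (at least in OpenJDK), so the following addAll 
has to grow and copy it.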
We also discussed this offline with [~rkanter]. This pattern does not 
guarantee that updates are applied in order, meaning that a later update 
reporting 100% progress can be superseded by an update reporting 50%. A fair 
ReentrantLock would solve this; compareAndSet does not.
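
A rough sketch of the lock-based alternative, under assumptions: 
OrderedProgressSketch and its members are hypothetical and not part of the 
patch, and a single float stands in for the full status object. Passing true 
to the ReentrantLock constructor requests the fair policy, which under 
contention favors the longest-waiting thread.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical holder for the most recent progress value.
final class OrderedProgressSketch {
    // 'true' selects the fair policy: waiting threads acquire the lock
    // roughly in the order in which they started waiting.
    private final ReentrantLock lock = new ReentrantLock(true);
    private float lastProgress;

    // The whole read-modify-write happens under the lock, so updates are
    // applied one at a time instead of racing through a retry loop.
    void update(float progress) {
        lock.lock();
        try {
            lastProgress = progress;
        } finally {
            lock.unlock();
        }
    }

    float current() {
        lock.lock();
        try {
            return lastProgress;
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }
}
```

Note that fairness orders threads by when they reach lock(), so the 
resulting order follows arrival at the lock rather than any timestamp 
carried by the update itself.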


> Concurrent task progress updates causing NPE in Application Master
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>         Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch
>
>
> Concurrent task progress updates can cause a NullPointerException in the 
> Application Master (the stack trace below is from current trunk):
> {quote}
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 9 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
> 2017-12-20 06:49:42,383 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2450)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2433)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1362)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:154)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1543)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1535)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>         at java.lang.Thread.run(Thread.java:748)
> 2017-12-20 06:49:42,385 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
> 2017-12-20 06:49:42,386 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> {quote}
> This happened naturally in several big wordcount runs, and I could reproduce 
> this reliably by artificially making task updates more frequent.



