ConfX created MAPREDUCE-7445:
--------------------------------

             Summary: ShuffleSchedulerImpl causes ArithmeticException due to 
improper detailsInterval value checking
                 Key: MAPREDUCE-7445
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened

There is no validation of the parameter 
{{mapreduce.reduce.shuffle.maxfetchfailures}}. An invalid value such as 0 
leads to a division by zero at runtime, which throws an exception and crashes 
the system.

h2. Buggy code

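In {{ShuffleSchedulerImpl.java}}, the value of 
{{mapreduce.reduce.shuffle.maxfetchfailures}} is stored in 
{{maxFetchFailuresBeforeReporting}} without any validation. It is then used 
directly as the divisor of a modulo operation in {{checkAndInformMRAppMaster}}. 
If {{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the expression 
{{failures % maxFetchFailuresBeforeReporting}} throws an {{ArithmeticException}} 
and crashes the system. The condition is evaluated left to right with 
short-circuiting OR. As a result, the modulo is reached only when 
{{connectExcpt}} is false and the read error is not reported immediately. This 
is why the reproduction below also sets 
{{mapreduce.reduce.shuffle.notify.readerror}} to {{false}}.
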
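{noformat}
private void checkAndInformMRAppMaster(
    ...) {
  // If maxFetchFailuresBeforeReporting == 0, the modulo below throws
  // java.lang.ArithmeticException: / by zero
  if (connectExcpt || (reportReadErrorImmediately && readError)
      || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
    ...
  }
}
{noformat}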
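The following standalone Java sketch models only the boolean condition from the excerpt above. It lets you observe the short-circuit behavior and the failure without a Hadoop build. The class and method names are hypothetical. The parameters stand in for the fields and arguments of the same names in {{ShuffleSchedulerImpl}}; this is not the Hadoop source.

```java
// Hypothetical standalone model of the reporting condition in
// ShuffleSchedulerImpl.checkAndInformMRAppMaster; not the Hadoop source.
public class ReportConditionDemo {

    // Mirrors the boolean expression from the excerpt in the issue.
    static boolean shouldReport(int failures, int maxFetchFailuresBeforeReporting,
                                boolean connectExcpt, boolean reportReadErrorImmediately,
                                boolean readError, boolean hostFailed) {
        return connectExcpt || (reportReadErrorImmediately && readError)
            || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed;
    }

    public static void main(String[] args) {
        // A connection error short-circuits the OR, so the modulo is never evaluated.
        System.out.println(shouldReport(1, 0, true, false, false, false));  // prints true

        // An immediately reported read error also short-circuits before the modulo.
        System.out.println(shouldReport(1, 0, false, true, true, false));   // prints true

        // With a positive limit, the modulo works as intended.
        System.out.println(shouldReport(10, 10, false, false, false, false)); // prints true
        System.out.println(shouldReport(3, 10, false, false, false, false));  // prints false

        // With notify.readerror=false and a limit of 0, the modulo divides by zero.
        try {
            shouldReport(1, 0, false, false, true, false);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage()); // "/ by zero"
        }
    }
}
```

You can run it with {{java ReportConditionDemo.java}} (Java 11 or later). The last call fails in the same way as the stack trace in this issue.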
h2. How to reproduce

(1) Set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}} and 
{{mapreduce.reduce.shuffle.notify.readerror}}={{false}}.
(2) Run {{mvn surefire:test -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}.
h2. Stacktrace
{noformat}
java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
    at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285)
{noformat}
To reproduce the issue more easily, run the attached {{reproduce.sh}}.

We are happy to provide a patch if this issue is confirmed.
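The sketch below shows one possible shape for such a patch, as an illustration only. It validates the configured value once, when it is read, and fails fast with a message that names the property. Without such a check, the failure happens later in the modulo. The class and method names are hypothetical. Whether Hadoop should reject the value or fall back to a default is a decision for the maintainers. Rejecting negative values is a further assumption: in Java, {{x % -n}} does not throw.

```java
// Hypothetical validation helper; names are illustrative, not Hadoop APIs.
public final class FetchFailureLimit {

    // Property name taken from the issue description.
    static final String PROPERTY = "mapreduce.reduce.shuffle.maxfetchfailures";

    private FetchFailureLimit() {}

    // Returns the value unchanged if it is a safe, positive divisor;
    // otherwise fails fast with a message naming the offending property.
    static int requirePositive(int configured) {
        if (configured <= 0) {
            throw new IllegalArgumentException(
                PROPERTY + " must be a positive integer, but was " + configured);
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(requirePositive(10)); // prints 10
        try {
            requirePositive(0);
        } catch (IllegalArgumentException e) {
            // Reports the misconfiguration instead of an ArithmeticException later.
            System.out.println(e.getMessage());
        }
    }
}
```

Suppose the field is assigned once from the configuration. Applying a check like this at that point would surface the misconfiguration when the scheduler is constructed, rather than as an {{ArithmeticException}} during the shuffle.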

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
