[
https://issues.apache.org/jira/browse/MAPREDUCE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029367#comment-18029367
]
ASF GitHub Bot commented on MAPREDUCE-7445:
-------------------------------------------
github-actions[bot] commented on PR #6051:
URL: https://github.com/apache/hadoop/pull/6051#issuecomment-3395512619
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> ShuffleSchedulerImpl causes ArithmeticException due to improper
> detailsInterval value checking
> ----------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7445
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.3.3
> Reporter: ConfX
> Priority: Critical
> Labels: pull-request-available
> Attachments: reproduce.sh
>
>
> h2. What happened
> There is no value checking for the parameter
> {{{}mapreduce.reduce.shuffle.maxfetchfailures{}}}. An invalid value can lead to
> improper calculations, such as division by zero, that crash the system.
> h2. Buggy code
> In {{{}ShuffleSchedulerImpl.java{}}}, there is no value checking for
> {{maxFetchFailuresBeforeReporting}}, and this variable is used directly in
> method {{{}checkAndInformMRAppMaster{}}}. When
> {{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the modulo
> operation divides by zero and throws an ArithmeticException, crashing the system.
>
> {noformat}
> private void checkAndInformMRAppMaster(
> ...
> if (connectExcpt || (reportReadErrorImmediately && readError)
> || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed)
> {
> ...
> }{noformat}
> h2. How to reproduce
> (1) Set {{{}mapreduce.reduce.shuffle.maxfetchfailures{}}}={{{}0{}}} and
> {{{}mapreduce.reduce.shuffle.notify.readerror{}}}={{{}false{}}}.
> (2) Run {{mvn surefire:test
> -Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}
> h2. Stacktrace
> {noformat}
> java.lang.ArithmeticException: / by zero
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
>     at org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
>
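For illustration only, here is a minimal standalone sketch of the failure mode
described above and of one possible guard. It is not the Hadoop code and not
the patch from PR #6051: the class name, the method names, and the fallback
value of 10 are all assumptions made for this example. It only shows that an
integer remainder with a zero divisor throws ArithmeticException in Java, and
that checking the configured value before use avoids it.

```java
// Hypothetical sketch; none of these names exist in Hadoop.
final class FetchFailureReportingSketch {

    // Assumed fallback used when the configured value is not positive.
    static final int FALLBACK_MAX_FETCH_FAILURES = 10;

    /** Returns the configured value if it is positive, otherwise the fallback. */
    static int sanitize(int configured) {
        return configured > 0 ? configured : FALLBACK_MAX_FETCH_FAILURES;
    }

    /** Mirrors only the "failures % max == 0" term of the quoted condition. */
    static boolean isReportingPoint(int failures, int maxFetchFailuresBeforeReporting) {
        return failures % sanitize(maxFetchFailuresBeforeReporting) == 0;
    }

    public static void main(String[] args) {
        int unchecked = 0; // stands in for a misconfigured value of 0
        try {
            System.out.println(5 % unchecked);
        } catch (ArithmeticException e) {
            // Integer remainder by zero throws, as in the reported stack trace.
            System.out.println("unguarded: " + e.getMessage());
        }
        // With the guard, a configured 0 falls back to the assumed default.
        System.out.println("guarded: " + isReportingPoint(10, 0));
    }
}
```

Falling back silently is only one option. Rejecting a non-positive value with
an IllegalArgumentException when the configuration is read would surface the
misconfiguration earlier; which behaviour fits better is a design decision for
the committers.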
--
This message was sent by Atlassian Jira
(v8.20.10#820010)