[
https://issues.apache.org/jira/browse/MAPREDUCE-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024379#comment-18024379
]
ASF GitHub Bot commented on MAPREDUCE-7470:
-------------------------------------------
github-actions[bot] commented on PR #6469:
URL: https://github.com/apache/hadoop/pull/6469#issuecomment-3363697965
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> multi-thread mapreduce v1 FileOutputcommitter
> ---------------------------------------------
>
> Key: MAPREDUCE-7470
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: TianyiMa
> Priority: Major
> Labels: mapreduce, pull-request-available
> Attachments: MAPREDUCE-7470.0.patch
>
>
> In cloud environments such as AWS and Aliyun, network latency is
> non-trivial when we commit thousands of files.
> In our case, the ping delay is about 0.03 ms in the IDC, but after moving to
> the cloud it is about 3 ms, which is roughly 100x slower. We found
> that committing tens of thousands of files takes a few tens of minutes. The
> more files there are, the longer it takes.
> So we propose a new committer algorithm, a variant of committer
> algorithm version 1, called version 3. To reduce the commit time,
> algorithm 3 uses a thread pool to commit the job's final output.
> Our test results in cloud production show that the new algorithm 3
> decreases the commit time by several tens of times.
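
The thread-pool idea in the description above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
in MAPREDUCE-7470.0.patch and not Hadoop's FileOutputCommitter: it uses
java.nio.file on a local filesystem as a stand-in for the job's output
filesystem, and the class name ParallelCommitSketch, the method commitAll,
and its parameters are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local files stand in for task output on a remote store.
final class ParallelCommitSketch {

    /**
     * Moves every regular file directly under pendingDir into finalDir,
     * issuing the moves from a fixed-size thread pool so that per-file
     * latency overlaps instead of adding up. Returns the number of files moved.
     */
    static int commitAll(Path pendingDir, Path finalDir, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(finalDir);

        // Snapshot the listing first so the directory stream is closed
        // before any file is moved.
        List<Path> files;
        try (Stream<Path> listing = Files.list(pendingDir)) {
            files = listing.filter(Files::isRegularFile)
                           .collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path file : files) {
                Callable<Path> move = () -> Files.move(
                        file,
                        finalDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                pending.add(pool.submit(move));
            }

            // Wait for every move; surface the first failure as an IOException.
            int committed = 0;
            for (Future<Path> result : pending) {
                try {
                    result.get();
                    committed++;
                } catch (ExecutionException e) {
                    throw new IOException("commit failed", e.getCause());
                }
            }
            return committed;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ParallelCommitSketch.commitAll(pendingDir, finalDir, 16) would
move the files with up to 16 moves in flight at once. If each move is
dominated by network round trips, a sequential loop costs roughly (number of
files) x (per-file latency), while a pool of T threads can approach that
total divided by T, which is consistent with the speedup the description
reports. Note that this sketch does not roll back files already moved when a
later move fails.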
--
This message was sent by Atlassian Jira
(v8.20.10#820010)