[
https://issues.apache.org/jira/browse/MAPREDUCE-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801918#comment-17801918
]
Steve Loughran commented on MAPREDUCE-7465:
-------------------------------------------
bq. I understand that developers are reluctant to integrate the PR, but it
does solve my problem correctly. I would appreciate it being disabled by
default but configurable with a property, so I can enable it and still use the
official release without applying a local patch to my jars.
I really, really, really don't want to do this, as (a) it's a critical piece
of code and (b) we'd be committing to maintaining it.
bq. There is no problem of "controlling the throttling" in the Hadoop Azure
ABFS code... it is already very bad, using the legacy class
java.net.HttpURLConnection and establishing very slow HTTPS connections one by
one without keeping TCP sockets alive.
bq. We do have throttling, however: not from a single JVM, but because we have
so many (>= 1000) Spark applications running concurrently, and so many useless
"prefetching threads"! Trying to control throttling from a single JVM is in my
opinion useless. Azure ABFS can support 20 million operations per hour (and
per storage account), and Microsoft Azure was even able to increase that
further.
bq. Trying to control throttling from a single JVM is in my opinion useless.
We see scale issues in job commits from renames... if you aren't seeing them,
it's because of our attempts to handle this (HADOOP-18002 and related).
ABFS also supports 100-continue, self-throttling and other mechanisms for
cross-JVM throttling. That's better than the S3A code by a large margin, where
even large directory deletions can blow up and make a mess of retries within
the AWS SDK itself.
It's precisely because of rename scale issues during job commit that the
manifest committer was written; it is used in production ABFS deployments.
If you are targeting ABFS storage, it's the way to get robust job commit even
on heavily loaded stores, and it is shipping in current releases. If you are
having problems getting it working, I'm happy to help you get set up.
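
For reference, binding abfs:// output to the manifest committer is a
per-scheme factory setting. The following is a minimal sketch, assuming the
manifest committer classes (hadoop-mapreduce-client-core) are on the
classpath; Spark jobs additionally need the committer bindings from the
spark-hadoop-cloud module.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

/**
 * Minimal sketch: route committer creation for abfs:// destinations through
 * the manifest committer factory via the per-scheme factory property.
 */
public class EnableManifestCommitter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Bind the abfs:// scheme to the manifest committer factory.
    conf.set("mapreduce.outputcommitter.factory.scheme.abfs",
        "org.apache.hadoop.mapreduce.lib.output.committer.manifest."
            + "ManifestCommitterFactory");
    Job job = Job.getInstance(conf, "manifest-committer-demo");
    // ... set input/output formats and paths, then submit as usual.
  }
}
{code}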
> performance problem in FileOutputCommitter for a big list processed by a
> single thread
> ----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: performance
> Affects Versions: 3.2.3, 3.3.2, 3.2.4, 3.3.5, 3.3.3, 3.3.4, 3.3.6
> Reporter: Arnaud Nauwynck
> Priority: Minor
> Labels: pull-request-available
>
> When committing a big Hadoop job (for example via Spark) with many
> partitions, the class FileOutputCommitter processes thousands of dirs/files
> to rename with a single thread. This is a performance issue, caused by many
> waits on FileSystem storage operations.
> I propose that above a configurable threshold (default=3, set via the
> property 'mapreduce.fileoutputcommitter.parallel.threshold'), the class
> FileOutputCommitter processes the list of files to rename using parallel
> threads, via the default JVM ExecutorService (ForkJoinPool.commonPool()).
> See Pull-Request:
> [https://github.com/apache/hadoop/pull/6378|https://github.com/apache/hadoop/pull/6378]
> Notice that sub-class instances of FileOutputCommitter are supposed to be
> created at runtime depending on a configurable property
> ([PathOutputCommitterFactory.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/PathOutputCommitterFactory.java]).
> But for example with Parquet + Spark, this is buggy and cannot be changed at
> runtime.
> There is an ongoing Jira and PR to fix it in Parquet + Spark:
> [https://issues.apache.org/jira/browse/PARQUET-2416|https://issues.apache.org/jira/browse/PARQUET-2416]
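
For readers of this thread, the approach the quoted description proposes
amounts roughly to the following. This is an illustrative sketch only, not the
PR's actual code; the class and method names (ParallelRenameSketch, renameAll)
and the threshold handling are hypothetical.
{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch: rename a batch of task outputs into the destination directory,
 * going parallel once the list exceeds a threshold.
 */
public class ParallelRenameSketch {

  static void renameAll(FileSystem fs, List<FileStatus> sources, Path destDir,
      int threshold) throws IOException {
    if (sources.size() <= threshold) {
      // Small lists: keep the existing single-threaded behaviour.
      for (FileStatus st : sources) {
        fs.rename(st.getPath(), new Path(destDir, st.getPath().getName()));
      }
      return;
    }
    // Above the threshold, fan the renames out over the JVM's common
    // ForkJoinPool, which backs parallel streams.
    try {
      sources.parallelStream().forEach(st -> {
        try {
          fs.rename(st.getPath(), new Path(destDir, st.getPath().getName()));
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }
}
{code}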