Moditha Hewasinghage created HADOOP-18757:
---------------------------------------------

             Summary: S3A Committer finalizing the commits in a single thread
                 Key: HADOOP-18757
                 URL: https://issues.apache.org/jira/browse/HADOOP-18757
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.3.5
            Reporter: Moditha Hewasinghage


The S3A Committer is bottlenecked on the driver when finalizing the commits: only a single thread appears to be used to finalize them. In our experiment, a job writing 36,000 files spent almost 2 hours committing, each file taking 0.1 - 0.5 seconds, while all the executors stayed idle and the driver committed alone. The driver log snippets below (compared against Spark 3.4.0) support this theory.
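
A quick sanity check on the numbers (my own arithmetic, not taken from the logs): 36,000 commits at 0.1 - 0.5 seconds each is roughly 1 to 5 hours when executed serially, which matches the observed ~2 hour commit phase; spread across the intended 32 threads it should instead take on the order of 2 to 9 minutes.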

The most likely cause is the [ThreadPoolExecutor|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html] created in [https://github.com/apache/hadoop/blob/706d88266abcee09ed78fbaa0ad5f74d818ab0e9/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitContext.java#L239-L244], where an unbounded LinkedBlockingQueue is combined with a corePoolSize of 0, so the pool never creates more than a single thread.

From the ThreadPoolExecutor documentation:


{code:java}
Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue 
without a predefined capacity) will cause new tasks to wait in the queue when 
all corePoolSize threads are busy. Thus, no more than corePoolSize threads will 
ever be created. (And the value of the maximumPoolSize therefore doesn't have 
any effect.) This may be appropriate when each task is completely independent 
of others, so tasks cannot affect each others execution; for example, in a web 
page server. While this style of queuing can be useful in smoothing out 
transient bursts of requests, it admits the possibility of unbounded work queue 
growth when commands continue to arrive on average faster than they can be 
processed.{code}

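The behaviour is easy to reproduce outside Hadoop. Below is a minimal, standalone sketch (not code from the Hadoop repository; the class name, the pool size of 32 and the 100 dummy tasks are made up for illustration) showing that a ThreadPoolExecutor built with corePoolSize = 0 and an unbounded LinkedBlockingQueue never grows past one worker:
{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CorePoolZeroDemo {
  public static void main(String[] args) throws Exception {
    // Same shape as the CommitContext pool: corePoolSize = 0,
    // maximumPoolSize = 32, unbounded work queue.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        0, 32, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    // Submit 100 tasks that each take ~100 ms, roughly one "commit" each.
    for (int i = 0; i < 100; i++) {
      pool.execute(() -> {
        try {
          Thread.sleep(100);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    Thread.sleep(1000);
    // Only one worker thread ever exists; everything else just queues up.
    System.out.println("pool size = " + pool.getPoolSize());     // prints 1
    System.out.println("queued    = " + pool.getQueue().size()); // most tasks still waiting

    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}
Running the same sketch with corePoolSize equal to maximumPoolSize (or with a bounded queue / SynchronousQueue handoff) reports a pool size of 32, which is presumably the level of parallelism the committer intends when it logs "creating thread pool of size 32".
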
Magic Committer, Spark 3.5.0-SNAPSHOT:
{code:java}
2023-05-26 15:35:04,852 DEBUG impl.CommitContext: creating thread pool of size 32
2023-05-26 15:35:04,922 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s) for resource profile id: 0.
2023-05-26 15:35:07,910 INFO commit.AbstractS3ACommitter: Starting: committing the output of 36000 task(s)
2023-05-26 15:35:07,914 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
2023-05-26 15:35:07,941 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
2023-05-26 15:35:08,036 DEBUG impl.CommitContext: creating thread pool of size 32
2023-05-26 15:35:08,037 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit{version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000000_1920855/__base/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='SCp78J9aYOmnPwrtfd5a.Q6B9Zu6olQw3eZcpyt.W.BCu6.M6fz54nlTe2ZYUDicSskFgPocsjlVXAbfiPn3Xu26MzXNdWcg5j_kBGg9iWpeoWh4K21gt7bbKetML95MXAck15yP.VGFeOleoAspUg--', created=1685113555232, saved=1685113555232, size=110636173, date='Fri May 26 15:05:55 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000000_1920855', notes='', etags=[64b9d14a89890cf7f931ff05074f6b6a,7a8c22d95d85736806205ad0f84094a5,82f55a1b0595b38420c6b55cb2763a46,212a3105afcf4a9045730f885a622e0a,b563ec91932a66498e8e9d717b625961,0425b7ae17a8a2af80734f0adf4debe9,496e9fbc08a583657fd29779dcd848d6,72bff33d090699baacb6a6171760a675,b2c6b9b76256416d304392bfbabca382,1a8f63d960cf50589b83b407ddfc77c7,d79cf49f16cd61fb00faf67034268529,e3b7ea2bd0fff2e20a89eccd0ebbeb4a,8edc254a94902fee2884350b4a89f223,f0858853a4542b562c6973e28ee40c19]}
2023-05-26 15:35:08,037 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173
2023-05-26 15:35:08,198 DEBUG impl.CommitOperations: Successful commit of file length 110636173
2023-05-26 15:35:08,199 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173: duration 0:00.162s
2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset: duration 0:00.294s
2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
2023-05-26 15:35:08,208 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
2023-05-26 15:35:08,232 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit{version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000001_1920856/__base/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='I.5Kns.n5JKLbupr5.AWbzApL9YDGhQyI438WQT3JXUjevkQS8u1aQwQhkVbwBoxnFXItCbKJCS8EwPdtoijtcUFrjU0eU3pNvt7lwcYRJ0iklLwXexHbYMpZq3pDj0udwsHCq0trnqyq2Ee.o.prw--', created=1685113548923, saved=1685113548923, size=110570244, date='Fri May 26 15:05:48 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000001_1920856', notes='', etags=[835dd18d4ce995c22c1a84de5cd26fd8,c1f2acd3360b2f94ce90ef0d80edd7e5,208787fce146d917dbc50b1cf4adff4c,a66b228379201a0a278061b096dc2faf,58134b7ae228f0d2a6879ec53a3f35c1,e847a7614639d3f1a43a1b3090b603e1,2d8eac2399238e6d356fe0bdfafd076e,e0c3a964ee2a3c270f67e7e4a7791440,84b05192e896e41e1d16aa363f0eb48b,c3cf764e4361da4573b8dfa6361e4174,78b145f682946c7e18750758da3578cd,d9cd438899d20522668b9cd7f61cc098,e8b3944929f17d927ed7aef31800bdbf,52d279a75e9382ead94fe4d9b81c3bf9]}
2023-05-26 15:35:08,232 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244
2023-05-26 15:35:08,438 DEBUG impl.CommitOperations: Successful commit of file length 110570244
2023-05-26 15:35:08,438 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244: duration 0:00.206s
2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset: duration 0:00.240s
2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
2023-05-26 15:35:08,448 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
2023-05-26 15:35:08,489 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit{version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000002_1920857/__base/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='AB982jhG0EAfnSxXWphX7R.QfqrkS2ipP6APqJZYGlqDl335WX4X2xn81wy7l2i5NBQrvG.eVcQgYFudk51wi5EMDJRNVz__SEWyF0xTncpIHpaC5xn8La.AY8gzCHQrAfRJ83nErQhe4Idmhcs04w--', created=1685113553072, saved=1685113553072, size=110136058, date='Fri May 26 15:05:53 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000002_1920857', notes='', etags=[f043628528fc9bdd6db4107fd175eb1d,11e36b4bde35bdb1d2e160ffd5fcc3f6,5105cb3ebee1b27cfec66f7682679316,1a85e5c9477e855e38f7bd4e81bed28e,7c57360865e12c19b3125cbfb8c786f8,59d928629e7afff4c4c038a1dadd0ce7,bfcedee2f51b45c20a097fbbae85a48f,7e3ff2665714d8cdc6ec20134856eb4c,0117e5c47f94a9ff4d8d0eafd1f6b76e,3c4f27e69544e4cc5297c56413231639,a32cd6422aaae63bdcbaafa7400ab889,1d5d07f0a0b258773993a456bc7ec7ee,a220e786aa36f643edc330a184e23d88,e19a85ab3accaa4dc697e7dfbf5d5325]}
2023-05-26 15:35:08,489 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058
2023-05-26 15:35:08,646 DEBUG impl.CommitOperations: Successful commit of file length 110136058
2023-05-26 15:35:08,646 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058: duration 0:00.157s
{code}
Magic Committer, Spark 3.4.0:

{code:java}
2023-05-24 09:47:19,906 INFO yarn.YarnAllocator: Driver requested a total number of 524 executor(s) for resource profile id: 0.
2023-05-24 09:47:23,064 INFO commit.AbstractS3ACommitter: Starting: committing the output of 36000 task(s)
2023-05-24 09:47:23,069 DEBUG commit.AbstractS3ACommitter: Task committer attempt_202305240933298436217226765687147_0000_m_000000_0: creating thread pool of size 32
2023-05-24 09:47:23,074 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,074 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000006.pendingset
2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
2023-05-24 09:47:23,078 DEBUG commit.Tasks: Executing task
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
