RussellSpitzer opened a new issue, #6367:
URL: https://github.com/apache/iceberg/issues/6367
### Apache Iceberg version
1.1.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Partial progress currently works as described in the following pseudocode:
```
Rewrite Job Thread Pool, in parallel {
  rewriteFiles for a partition/fileGroup // Data files generated here
  add result of rewrite to commit queue
}
Commit Thread {
  when enough fileGroups have been rewritten, perform a commit // Manifests generated at this point in time
}
Once the parallel phase has completed {
  await termination of the single-threaded commit service (10 minutes or die)
}
```
See
https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java#L350-L357
https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L179-L188
And
https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L228-L240
The original assumption here was that all commits would be finished within 10
minutes of the rewrite completing, since the commit phase should be relatively
fast and the rewrite phase long. There are a few issues with this. Some users
may run the "parallel" phase on a very large cluster, which lets the rewrites
complete quickly, but the new files require a huge amount of new metadata,
which in turn requires a large number of new manifest files.
In one of our internal examples we have a very large partial progress
rewrite in 10 parts. The rewrites all finish at around the same time, which
just enqueues all the commits to then run in sequence. The timeline looks
roughly like this (imagine there are only five commit groups):
```
All Rewrites Begin
1/5 of file groups rewritten
1st Commit Begins
2/5 of file groups rewritten
3/5 of file groups rewritten
4/5 of file groups rewritten
1st Commit Finishes
2nd Commit Begins
5/5 of file groups rewritten
10 Minute Timer Begins to Finish Commits
2nd Commit Finishes
3rd Commit Begins
// Timeout!
```
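The failure mode in the timeline above can be reproduced with plain `java.util.concurrent`, without any Iceberg code. The following is only a minimal sketch: the class name `CommitTimeoutSketch`, the method `allCommitsFinish`, and the millisecond durations are made up for illustration, and `Thread.sleep` stands in for the real cost of writing manifests and committing. It queues several slow "commits" on a single-threaded executor and then waits for termination with a deadline, roughly the shape of the pseudocode earlier in this issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CommitTimeoutSketch {

  // Hypothetical helper: returns true only if every queued commit finished
  // before the deadline, false if the wait timed out (or was interrupted).
  static boolean allCommitsFinish(int commits, long commitMillis, long timeoutMillis) {
    // Single-threaded service: commits run strictly one after another.
    ExecutorService commitService = Executors.newSingleThreadExecutor();
    for (int i = 0; i < commits; i++) {
      commitService.submit(() -> {
        try {
          // Stand-in for manifest writing plus the actual commit.
          Thread.sleep(commitMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    // All rewrites are done: stop accepting work and start the timer.
    commitService.shutdown();
    try {
      boolean finished = commitService.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
      if (!finished) {
        // Deadline passed with commits still queued: give up on the rest.
        commitService.shutdownNow();
      }
      return finished;
    } catch (InterruptedException e) {
      commitService.shutdownNow();
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With five commits of 200 ms each and a 300 ms deadline, the sequential work needs about a second, so the wait times out even though each individual commit is well under the deadline. That is the scaled-down equivalent of the 10 minute timer expiring during the 3rd commit.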
I think the best way to improve this, and to increase the throughput of the
operation, is to move the actual writing of manifests into the parallel portion
of the operation. In this case we could probably do this by building our commit
groups in the service's offer method rather than in the service thread itself;
then the service thread can just check for completed commit groups.
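One possible shape for this idea is sketched below. It is not the existing `RewriteDataFilesCommitManager` code: the class `OfferSideBatchingSketch`, its methods, and the use of `String` for file groups are all hypothetical, and `prepare` is only a placeholder for the expensive manifest-writing step. The point it illustrates is that the thread calling `offer` (a rewrite thread) assembles and prepares a commit group, so the commit thread only has to poll for groups that are already complete.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class OfferSideBatchingSketch {
  private final int groupsPerCommit;
  // File groups that have been rewritten but not yet assigned to a commit group.
  private final List<String> pending = new ArrayList<>();
  // Commit groups that are fully prepared and waiting for the commit thread.
  private final ConcurrentLinkedQueue<List<String>> readyCommits = new ConcurrentLinkedQueue<>();

  public OfferSideBatchingSketch(int groupsPerCommit) {
    this.groupsPerCommit = groupsPerCommit;
  }

  // Called from the parallel rewrite threads when a file group is done.
  public void offer(String fileGroup) {
    List<String> batch = null;
    synchronized (pending) {
      pending.add(fileGroup);
      if (pending.size() >= groupsPerCommit) {
        batch = new ArrayList<>(pending);
        pending.clear();
      }
    }
    if (batch != null) {
      // The expensive preparation runs on the calling thread, outside the
      // lock, so several commit groups can be prepared concurrently.
      readyCommits.add(prepare(batch));
    }
  }

  // Placeholder for manifest writing (an assumption, not Iceberg's API).
  private static List<String> prepare(List<String> batch) {
    return Collections.unmodifiableList(batch);
  }

  // Called from the commit thread; returns null if nothing is ready yet.
  public List<String> pollReadyCommit() {
    return readyCommits.poll();
  }

  public int pendingCount() {
    synchronized (pending) {
      return pending.size();
    }
  }
}
```

In this sketch, offering five file groups with `groupsPerCommit` set to 2 yields two ready commit groups and leaves one file group pending; a real implementation would still need to flush that remainder once the parallel phase ends.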