RussellSpitzer commented on code in PR #6588:
URL: https://github.com/apache/iceberg/pull/6588#discussion_r1070261827
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java:
##########
@@ -47,4 +47,8 @@ private SparkSQLProperties() {}
public static final String PRESERVE_DATA_GROUPING =
"spark.sql.iceberg.planning.preserve-data-grouping";
public static final boolean PRESERVE_DATA_GROUPING_DEFAULT = false;
+
+  // Controls how many physical file deletes to execute in parallel when not otherwise specified
+  public static final String DELETE_PARALLELISM = "driver-delete-default-parallelism";
+  public static final String DELETE_PARALLELISM_DEFAULT = "25";
Review Comment:
With S3's request throttling at around 4k requests per second, this default leaves us a lot of headroom.
Assuming a 50ms response time:
4000 max requests per second / 20 requests per thread per second =~ 200 max
concurrent requests.
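The back-of-the-envelope math above can be sketched as follows; all the numbers (the 4k/s throttle limit and the 50ms latency) are assumptions from this comment, not measured values:

```java
public class ThrottleHeadroom {
  // Assumed S3 request throttling limit (requests per second)
  static final int MAX_REQUESTS_PER_SECOND = 4000;
  // Assumed average DELETE response time in milliseconds
  static final int RESPONSE_TIME_MS = 50;

  // Requests a single thread can issue per second at the assumed latency
  static int requestsPerThreadPerSecond() {
    return 1000 / RESPONSE_TIME_MS; // 1000ms / 50ms = 20
  }

  // Concurrent threads needed to saturate the assumed throttle limit
  static int maxConcurrentRequests() {
    return MAX_REQUESTS_PER_SECOND / requestsPerThreadPerSecond(); // 4000 / 20 = 200
  }

  public static void main(String[] args) {
    System.out.println(maxConcurrentRequests()); // prints 200
  }
}
```

At the default of 25 threads, that works out to roughly 500 requests per second, well under the assumed 4000/s ceiling.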
Another option would be to also incorporate the "bulk delete" APIs, but
that would only help with S3-based filesystems.
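As a rough illustration of how a driver-side parallelism setting like this bounds concurrent deletes, here is a generic fixed-size-thread-pool sketch; it is not Iceberg's actual implementation, and `deleteFile` is a hypothetical stand-in for a real FileIO delete call:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelDeleteSketch {
  // Stand-in for the configured delete parallelism (the proposed default is 25)
  static final int DELETE_PARALLELISM_DEFAULT = 25;

  // Submit every delete to a pool whose size caps concurrent requests
  static void deleteInParallel(List<String> paths) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(DELETE_PARALLELISM_DEFAULT);
    for (String path : paths) {
      pool.submit(() -> deleteFile(path));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }

  // Hypothetical placeholder; a real implementation would call the storage layer
  static void deleteFile(String path) {
    System.out.println("deleting " + path);
  }
}
```

Because the pool size is fixed, at most `DELETE_PARALLELISM_DEFAULT` deletes are in flight at once, which keeps the request rate comfortably under the throttling estimate above.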
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]