kbendick commented on issue #5453:
URL: https://github.com/apache/iceberg/issues/5453#issuecomment-1207554438

   The stack trace reads like there's an S3 request timeout.
   
   Can you provide the following information?
   
   1. The exact Iceberg runtime jar dependency used (ensure that you're using 
the Spark 3.3 Iceberg bundle).
   2. The catalog you are using (e.g., Hadoop, Hive, DynamoDB, etc.).
   3. The Spark configuration used, including configuration settings for 
initializing the Iceberg catalog plus any non-default Spark configs applied.
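   For reference, here's a minimal sketch of the kind of configuration that would answer (1) and (3). The package coordinate is the actual Spark 3.3 / Scala 2.12 Iceberg runtime bundle; the catalog name, warehouse path, and job file are placeholders for illustration:
   
   ```shell
   # Hypothetical spark-submit invocation -- catalog name "my_catalog" and the
   # warehouse path are illustrative; swap in your actual settings.
   spark-submit \
     --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.0 \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.type=hadoop \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/warehouse \
     my_job.py
   ```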
   
   Have you confirmed that this same code over _the same input data_ works with 
your previous Spark 3.2 setup? Not just previous runs, but this _same_ input 
data. Given that there seems to be an S3 upload timeout, it would be really 
helpful to compare the previous setup and the new setup over the exact same 
dataset so the two can truly be compared. Otherwise it's hard to rule out 
simple input skew (e.g., the input dataset is much larger than normal, or it's 
much more skewed on the columns being sorted on and thus takes much longer to 
sort). 
   
   It would also be _very_ helpful to provide the query plan from your old 
setup and from the new setup (either the output of `EXPLAIN EXTENDED` or a 
screenshot of the whole DAG for the query from the SQL tab), for both the old 
Spark 3.2 with Iceberg 0.13 and the new Spark 3.3 with Iceberg 0.14.
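   As a sketch, the plan text can be captured from the CLI like this (the query and output filename are hypothetical; substitute the actual statement that feeds your write):
   
   ```shell
   # Dump the extended plan for the query that performs the write,
   # once per environment, so the two plans can be diffed side by side.
   spark-sql -e "EXPLAIN EXTENDED SELECT * FROM my_catalog.db.my_table" > plan_spark33.txt
   ```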
   
   Assuming that you're using the correct Spark Iceberg runtime JAR for Spark 
3.3.0, I'm wondering if maybe Spark's adaptive query execution is lowering the 
parallelism of the write stage, which would increase the size of each data 
upload to S3 and could lead to the timeout.
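   If AQE coalescing does turn out to be the culprit, one experiment worth trying (these are standard Spark 3.x AQE settings; the size threshold below is illustrative, not a recommendation):
   
   ```shell
   # Keep AQE on but stop it from merging small shuffle partitions ahead of the write:
   --conf spark.sql.adaptive.coalescePartitions.enabled=false
   # ...or, less drastically, lower the advisory target so coalesced partitions stay smaller:
   --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=64m
   ```
   
   Comparing the write-stage task count with and without these settings should confirm or rule out the theory.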


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to