kbendick commented on issue #5453: URL: https://github.com/apache/iceberg/issues/5453#issuecomment-1207564941
To get around the issue, assuming you're using `S3FileIO` (which it seems like you are), you might consider increasing the number of multipart upload threads, if the issue is indeed just a plain S3 upload timeout:
- https://iceberg.apache.org/docs/latest/aws/#progressive-multipart-upload

Another thing that might help is letting Spark sort the arrays with the built-in `array_sort` instead of a UDF that sorts the whole array. You could possibly still use a UDF for just the comparison logic, but that might not be needed, and keep in mind that UDFs are best avoided whenever Spark's built-in functions can achieve the same thing.

Examples of `sort_array` and `array_sort` (without a UDF, or with a UDF used only for the comparison and not for sorting the whole array): https://towardsdatascience.com/the-definitive-way-to-sort-arrays-in-spark-1224f5529961
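
A rough sketch of both suggestions in PySpark, in case it helps. Everything specific here is an assumption rather than something taken from your job: the catalog name `my_catalog`, the column `items`, the struct field `score`, and the thread count are all hypothetical. The `s3.multipart.num-threads` property is the one I understand the linked AWS docs page to describe, so please double check it against the docs for your Iceberg version. The comparator is written as a SQL lambda inside `expr`, which should work on Spark 3.0 and later.

```python
# Hypothetical catalog name; replace it with the one your job uses.
CATALOG = "my_catalog"


def s3_multipart_conf(catalog: str, num_threads: int) -> dict:
    """Build Spark conf entries that select S3FileIO and size its upload thread pool."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Assumed property name, taken from the Iceberg AWS docs page linked above.
        f"{prefix}.s3.multipart.num-threads": str(num_threads),
    }


# Built-in array_sort with a comparator lambda instead of a whole-array UDF.
# Sorts the structs in `items` by their `score` field, highest first
# (column and field names are hypothetical).
SORT_EXPR = """
array_sort(items, (l, r) ->
  CASE WHEN l.score > r.score THEN -1
       WHEN l.score < r.score THEN 1
       ELSE 0 END)
"""


def sort_items(df):
    """Return df with `items` sorted by the comparator above, no UDF involved."""
    # Imported lazily so the config helper can be used without pyspark installed.
    from pyspark.sql import functions as F

    return df.withColumn("items", F.expr(SORT_EXPR))
```

To use it, pass each entry of `s3_multipart_conf(CATALOG, 16)` to `SparkSession.builder.config(key, value)` before creating the session, then call `sort_items(df)` in place of the UDF. If a plain ascending or descending sort of the elements is enough, `sort_array(col, asc=False)` avoids the comparator entirely.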
