[GitHub] [iceberg] kbendick commented on issue #5453: Issue after migrating to Spark 3.3.0 and Iceberg 14.0

GitBox Sun, 07 Aug 2022 18:57:14 -0700


kbendick commented on issue #5453:
URL: https://github.com/apache/iceberg/issues/5453#issuecomment-1207564941


   If the issue is indeed just a plain S3 upload timeout, and assuming you're 
using `S3FileIO` (which it seems like you are), you could work around it by 
increasing the number of multipart upload threads:
   - https://iceberg.apache.org/docs/latest/aws/#progressive-multipart-upload
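
   As a rough sketch of what that could look like: the snippet below builds the Spark conf entries for a catalog that uses `S3FileIO`. The catalog name `my_catalog` and the thread count `16` are placeholders I made up, and the `s3.multipart.num-threads` property name is taken from the AWS docs page linked above, so please double check it against the Iceberg version you're running.

```python
# Sketch only: "my_catalog" and 16 are hypothetical placeholders, and the
# property names should be verified against the Iceberg AWS docs for your version.
def s3_multipart_conf(catalog: str, num_threads: int) -> dict:
    """Build Spark conf entries that raise S3FileIO's multipart upload thread count."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Use Iceberg's S3FileIO for this catalog.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Number of threads used for uploading parts to S3.
        f"{prefix}.s3.multipart.num-threads": str(num_threads),
    }


conf = s3_multipart_conf("my_catalog", 16)

# Print the entries in the form accepted by spark-submit / spark-shell.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

   The same key/value pairs can also be passed to `SparkSession.builder.config(key, value)` when the session is created.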
   
   Another thing that might help is letting Spark sort the arrays with the 
built-in `array_sort` instead of a UDF. You could still use a UDF for just the 
comparison logic, but that might not be needed. Keep in mind that UDFs are 
best avoided whenever Spark's built-in functions can achieve the same thing.
   
   Examples of `sort_array` and `array_sort`, both without a UDF and with a 
UDF used only for the comparison rather than for sorting the whole array: 
https://towardsdatascience.com/the-definitive-way-to-sort-arrays-in-spark-1224f5529961
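
   For illustration, here is a minimal sketch of the UDF-free approach. It only builds Spark SQL expression strings and hands them to `F.expr`, so the sorting itself stays inside Spark's built-in functions. The helper names and the `values` column are hypothetical, the comparator form of `array_sort` needs Spark 3.0 or later, and this comparator does not treat null elements specially.

```python
# Sketch only: helper names and the "values" column are hypothetical.
def sort_array_expr(column: str, ascending: bool = True) -> str:
    """SQL expression using the built-in sort_array (natural ordering)."""
    return f"sort_array({column}, {'true' if ascending else 'false'})"


def array_sort_desc_expr(column: str) -> str:
    """SQL expression using array_sort with a lambda comparator (descending)."""
    return (
        f"array_sort({column}, (l, r) -> "
        "CASE WHEN l > r THEN -1 WHEN l < r THEN 1 ELSE 0 END)"
    )


def with_sorted_arrays(df, column: str):
    """Add ascending and descending sorted copies of an array column, no UDF involved."""
    # Imported lazily so the expression builders above work without pyspark installed.
    from pyspark.sql import functions as F

    return (
        df.withColumn(f"{column}_asc", F.expr(sort_array_expr(column)))
          .withColumn(f"{column}_desc", F.expr(array_sort_desc_expr(column)))
    )


print(sort_array_expr("values"))
print(array_sort_desc_expr("values"))
```

   With an existing DataFrame `df` that has an array column named `values`, `with_sorted_arrays(df, "values")` would return it with the two sorted columns added.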


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

