bendevera commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1861920740
@stevenzwu thank you for the quick response! Okay, we will run some `BucketPartitioner` tests for our use case by copying the code manually. Smart shuffling sounds interesting, and we would certainly test it out. A lot of the use cases we deal with fit `BucketPartitioner` well conceptually, so we can test to verify. The current `DistributionMode.HASH` implementation is too slow at computing the data file path for each record, and we've noticed processing rates take a huge hit when enabling it. Glad to see features being extended! We will read up more on the smart shuffling design and see if we can get involved.
