FangYongs opened a new pull request, #522:
URL: https://github.com/apache/flink-table-store/pull/522

   Currently the sink operator in Flink shuffles data by bucket id only, which 
causes data skew when a table has a single bucket but multiple partitions: all 
records share the same bucket id, so they are routed to the same sink task. 
This PR adds support for shuffling data by both bucket id and partition when 
`sink.shuffle-by-partition.enable` is set.
   
   The main changes are:
   1. Added the config option `sink.shuffle-by-partition.enable` to enable 
shuffling data by partition
   2. Added `PartitionComputer` to extract the partition from row data
   3. Added shuffling by partition to `BucketStreamPartitioner`
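
   To illustrate the idea behind the changes listed above, here is a minimal, self-contained sketch of channel selection. It is not the code in this PR: the class `ShuffleChannelSketch`, the method `selectChannel`, and the use of a `String` for the partition are all hypothetical, and the real `BucketStreamPartitioner` may combine the partition and bucket differently.

```java
import java.util.Objects;

/** Hypothetical sketch; not the actual BucketStreamPartitioner from this PR. */
final class ShuffleChannelSketch {

    private ShuffleChannelSketch() {}

    /**
     * Picks the downstream channel for a record.
     *
     * @param partition          partition of the record (assumed here to be a plain string)
     * @param bucket             bucket id of the record
     * @param numChannels        number of downstream sink channels, must be positive
     * @param shuffleByPartition stands in for `sink.shuffle-by-partition.enable`
     */
    static int selectChannel(
            String partition, int bucket, int numChannels, boolean shuffleByPartition) {
        if (numChannels <= 0) {
            throw new IllegalArgumentException("numChannels must be positive");
        }
        // Without the option only the bucket id decides the channel, so a table
        // with a single bucket sends every record to the same channel.
        int key = shuffleByPartition ? Objects.hash(partition, bucket) : bucket;
        // floorMod keeps the result in [0, numChannels) even for negative hashes.
        return Math.floorMod(key, numChannels);
    }
}
```

   With the flag off, `selectChannel("dt=2023-01-01", 0, 4, false)` and `selectChannel("dt=2023-01-02", 0, 4, false)` both return channel 0. With the flag on, the partition also feeds the hash, so records from different partitions of the same bucket can land on different channels.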
   
   The main tests are:
   1. Added `FileStoreShuffleBucketTest` to test shuffling data by bucket and partition
   


