liyubin117 commented on PR #4974: URL: https://github.com/apache/paimon/pull/4974#issuecomment-2609279542
> I'm not sure if these modifications are effective, so let me give you my suggestion: we only need to modify `PartitionIndex.assign`.
>
> Old code:
>
> ```java
> // 3. create a new bucket
> for (int i = 0; i < Short.MAX_VALUE; i++) {
>     if (bucketFilter.test(i) && !totalBucket.contains(i)) {
>         hash2Bucket.put(hash, (short) i);
>         nonFullBucketInformation.put(i, 1L);
>         totalBucket.add(i);
>         return i;
>     }
> }
>
> // 4. too many buckets, throw exception
> @SuppressWarnings("OptionalGetWithoutIsPresent")
> int maxBucket = totalBucket.stream().mapToInt(Integer::intValue).max().getAsInt();
> throw new RuntimeException(
>         String.format(
>                 "Too more bucket %s, you should increase target bucket row number %s.",
>                 maxBucket, targetBucketRowNumber));
> ```
>
> New code:
>
> ```java
> // 3. create a new bucket
> for (int i = 0; i < max_buckets; i++) {
>     if (bucketFilter.test(i) && !totalBucket.contains(i)) {
>         hash2Bucket.put(hash, (short) i);
>         nonFullBucketInformation.put(i, 1L);
>         totalBucket.add(i);
>         return i;
>     }
> }
>
> // 4. exceeded max_buckets, just pick a bucket for the record:
> // pick the minimum bucket (belonging to this task) for the record.
> ```

After offline discussion, we have reached a consensus: we can't just update the `PartitionIndex` logic, because it doesn't handle `SimpleHashBucketAssigner`. When the buckets are full, a random bucket is selected for writing.
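To make the proposed flow concrete, here is a minimal, hypothetical sketch of a capped assigner. It is not Paimon's actual `PartitionIndex` or `SimpleHashBucketAssigner` code; the class name `CappedBucketAssigner`, the `maxBuckets` field, and the simplified bookkeeping maps are all assumptions made for illustration. It follows the suggested shape: reuse a cached bucket, fill a non-full bucket, create new buckets only up to the cap, and then fall back to an existing bucket instead of throwing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified sketch of the proposed assignment flow.
public class CappedBucketAssigner {
    private final int maxBuckets;             // assumed cap on bucket count
    private final long targetBucketRowNumber; // rows per bucket before it is "full"
    private final Map<Integer, Long> bucketRows = new HashMap<>();
    private final Map<Integer, Integer> hash2Bucket = new HashMap<>();
    private final Set<Integer> totalBuckets = new HashSet<>();

    public CappedBucketAssigner(int maxBuckets, long targetBucketRowNumber) {
        this.maxBuckets = maxBuckets;
        this.targetBucketRowNumber = targetBucketRowNumber;
    }

    public int assign(int hash) {
        // 1. reuse the bucket this hash was already routed to
        Integer cached = hash2Bucket.get(hash);
        if (cached != null) {
            return cached;
        }
        // 2. fill an existing, non-full bucket first
        for (int b : totalBuckets) {
            long rows = bucketRows.getOrDefault(b, 0L);
            if (rows < targetBucketRowNumber) {
                hash2Bucket.put(hash, b);
                bucketRows.put(b, rows + 1);
                return b;
            }
        }
        // 3. create a new bucket, but only up to maxBuckets
        for (int i = 0; i < maxBuckets; i++) {
            if (!totalBuckets.contains(i)) {
                hash2Bucket.put(hash, i);
                bucketRows.put(i, 1L);
                totalBuckets.add(i);
                return i;
            }
        }
        // 4. cap reached and all buckets full: pick an existing bucket
        //    (here the minimum) instead of throwing an exception
        int fallback = totalBuckets.stream().mapToInt(Integer::intValue).min().getAsInt();
        hash2Bucket.put(hash, fallback);
        bucketRows.merge(fallback, 1L, Long::sum);
        return fallback;
    }
}
```

The key design change in the sketch is step 4: once `maxBuckets` is reached, the record still lands in a valid bucket, trading per-bucket row-count guarantees for a bounded bucket count.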