liyubin117 commented on PR #4974: URL: https://github.com/apache/paimon/pull/4974#issuecomment-2609279542
> I'm not sure if these modifications are effective, so let me give you my suggestion: we only need to modify `PartitionIndex.assign`.
>
> Old code:
>
> ```java
> // 3. create a new bucket
> for (int i = 0; i < Short.MAX_VALUE; i++) {
>     if (bucketFilter.test(i) && !totalBucket.contains(i)) {
>         hash2Bucket.put(hash, (short) i);
>         nonFullBucketInformation.put(i, 1L);
>         totalBucket.add(i);
>         return i;
>     }
> }
>
> // 4. too many buckets, throw exception
> @SuppressWarnings("OptionalGetWithoutIsPresent")
> int maxBucket = totalBucket.stream().mapToInt(Integer::intValue).max().getAsInt();
> throw new RuntimeException(
>         String.format(
>                 "Too more bucket %s, you should increase target bucket row number %s.",
>                 maxBucket, targetBucketRowNumber));
> ```
>
> New code:
>
> ```java
> // 3. create a new bucket
> for (int i = 0; i < max_buckets; i++) {
>     if (bucketFilter.test(i) && !totalBucket.contains(i)) {
>         hash2Bucket.put(hash, (short) i);
>         nonFullBucketInformation.put(i, 1L);
>         totalBucket.add(i);
>         return i;
>     }
> }
>
> // 4. exceeded max_buckets, just pick a bucket for the record:
> // pick the minimum bucket (belonging to this task) for the record.
> ```

After offline discussion, we have reached a consensus: we can't just update the `PartitionIndex` logic, because it doesn't handle `SimpleHashBucketAssigner`. When the buckets are full, a random bucket is selected for writing.
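To make the proposed flow concrete, here is a minimal, hypothetical sketch of a capped assigner. It is not Paimon's actual `PartitionIndex` or `SimpleHashBucketAssigner` code; the class name `CappedBucketAssigner`, the `maxBuckets` field, and the simplified bookkeeping maps are all assumptions made for illustration. It follows the suggested shape: reuse a cached bucket, fill a non-full bucket, create new buckets only up to the cap, and then fall back to an existing bucket instead of throwing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified sketch of the proposed assignment flow.
public class CappedBucketAssigner {
    private final int maxBuckets;             // assumed cap on bucket count
    private final long targetBucketRowNumber; // rows per bucket before it is "full"
    private final Map<Integer, Long> bucketRows = new HashMap<>();
    private final Map<Integer, Integer> hash2Bucket = new HashMap<>();
    private final Set<Integer> totalBuckets = new HashSet<>();

    public CappedBucketAssigner(int maxBuckets, long targetBucketRowNumber) {
        this.maxBuckets = maxBuckets;
        this.targetBucketRowNumber = targetBucketRowNumber;
    }

    public int assign(int hash) {
        // 1. reuse the bucket this hash was already routed to
        Integer cached = hash2Bucket.get(hash);
        if (cached != null) {
            return cached;
        }
        // 2. fill an existing, non-full bucket first
        for (int b : totalBuckets) {
            long rows = bucketRows.getOrDefault(b, 0L);
            if (rows < targetBucketRowNumber) {
                hash2Bucket.put(hash, b);
                bucketRows.put(b, rows + 1);
                return b;
            }
        }
        // 3. create a new bucket, but only up to maxBuckets
        for (int i = 0; i < maxBuckets; i++) {
            if (!totalBuckets.contains(i)) {
                hash2Bucket.put(hash, i);
                bucketRows.put(i, 1L);
                totalBuckets.add(i);
                return i;
            }
        }
        // 4. cap reached and all buckets full: pick an existing bucket
        //    (here the minimum) instead of throwing an exception
        int fallback = totalBuckets.stream().mapToInt(Integer::intValue).min().getAsInt();
        hash2Bucket.put(hash, fallback);
        bucketRows.merge(fallback, 1L, Long::sum);
        return fallback;
    }
}
```

The key design change in the sketch is step 4: once `maxBuckets` is reached, the record still lands in a valid bucket, trading per-bucket row-count guarantees for a bounded bucket count.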