izhangzhihao opened a new issue, #3776: URL: https://github.com/apache/paimon/issues/3776
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

0.9.0

### Compute Engine

Flink 1.17.2

### Minimal reproduce step

```sql
CREATE TABLE IF NOT EXISTS a_one_billion_table (
    id STRING,
    ... ...
    PRIMARY KEY (id) NOT ENFORCED
) WITH ('bucket' = '-1');

-- Make sure more than 1 billion records have already been inserted into `a_one_billion_table`.
-- Then run the new Flink job below. `source_table` may hold only 10000 records, yet the checkpoint will fail.
INSERT INTO a_one_billion_table SELECT * FROM source_table;
```

### What doesn't meet your expectations?

The checkpoint failed with this error:

```
2024-07-18 16:09:32,895 WARN  org.apache.flink.runtime.taskmanager.Task [] - dynamic-bucket-assigner (1/1)#0 switched from RUNNING to FAILED with failure cause: java.lang.IllegalArgumentException: Too large (1466616922 expected elements with load factor 0.75)
    at org.apache.paimon.shade.it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:208)
    at org.apache.paimon.shade.it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.<init>(Int2ShortOpenHashMap.java:103)
    at org.apache.paimon.shade.it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.<init>(Int2ShortOpenHashMap.java:116)
    at org.apache.paimon.utils.Int2ShortHashMap.<init>(Int2ShortHashMap.java:35)
    at org.apache.paimon.utils.Int2ShortHashMap$Builder.build(Int2ShortHashMap.java:70)
    at org.apache.paimon.index.PartitionIndex.loadIndex(PartitionIndex.java:138)
    at org.apache.paimon.index.HashBucketAssigner.loadIndex(HashBucketAssigner.java:166)
    at org.apache.paimon.index.HashBucketAssigner.assign(HashBucketAssigner.java:83)
    at org.apache.paimon.flink.sink.HashBucketAssignerOperator.processElement(HashBucketAssignerOperator.java:98)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:246)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:217)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:169)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:616)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:1080)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:1029)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:959)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:938)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:567)
    at java.lang.Thread.run(Thread.java:879) [?:1.8.0_372]
```

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
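For context on the `Too large (1466616922 expected elements with load factor 0.75)` message, here is a minimal sketch of the sizing arithmetic that appears to be involved. It assumes the shaded fastutil check rounds `expected / loadFactor` up to the next power of two and rejects anything above 2^30 slots. The class and method names (`TableSizeCheck`, `requiredSlots`, `fits`) are hypothetical and not part of Paimon or fastutil.

```java
public class TableSizeCheck {
    // Assumed upper bound on the backing array of an open-addressing map: 2^30 slots.
    static final long MAX_SLOTS = 1L << 30;

    // Smallest power of two greater than or equal to x (for x >= 1).
    static long nextPowerOfTwo(long x) {
        if (x <= 1) return 1;
        return Long.highestOneBit(x - 1) << 1;
    }

    // Slots needed to hold `expected` entries without exceeding the load factor.
    static long requiredSlots(long expected, double loadFactor) {
        long needed = (long) Math.ceil(expected / loadFactor);
        return Math.max(2, nextPowerOfTwo(needed));
    }

    // True if a single map could hold `expected` entries under the assumed bound.
    static boolean fits(long expected, double loadFactor) {
        return requiredSlots(expected, loadFactor) <= MAX_SLOTS;
    }

    public static void main(String[] args) {
        long expected = 1_466_616_922L; // value taken from the error message
        System.out.println(requiredSlots(expected, 0.75)); // 2147483648, i.e. 2^31
        System.out.println(fits(expected, 0.75));          // false
        System.out.println((long) (MAX_SLOTS * 0.75));     // 805306368
    }
}
```

Under these assumptions, a single in-memory index map with load factor 0.75 tops out at roughly 805 million entries, so a partition index with about 1.47 billion keys cannot be loaded into one map regardless of how small `source_table` is.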
