[jira] [Updated] (HUDI-2190) Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute
[ https://issues.apache.org/jira/browse/HUDI-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2190:
-------------------------------
Description: SparkBulkInsertPreppedCommitActionExecutor#execute contains an unnecessary try/catch, as do some other executor classes. (was: SparkBulkInsertPreppedCommitActionExecutor#execute has a try/catch, but it is unnecessary.)

> Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute
>
> Key: HUDI-2190
> URL: https://issues.apache.org/jira/browse/HUDI-2190
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Reporter: zhangminglei
> Priority: Major
>
> SparkBulkInsertPreppedCommitActionExecutor#execute contains an unnecessary try/catch, as do some other executor classes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HUDI-2190) Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute
zhangminglei created HUDI-2190:
-------------------------------
Summary: Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute
Key: HUDI-2190
URL: https://issues.apache.org/jira/browse/HUDI-2190
Project: Apache Hudi
Issue Type: Improvement
Components: Spark Integration
Reporter: zhangminglei

SparkBulkInsertPreppedCommitActionExecutor#execute has a try/catch, but it is unnecessary.
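To make the issue concrete, here is a hedged, hypothetical sketch of the anti-pattern being reported (not the actual Hudi sources; class and method names are illustrative): a try/catch that only rethrows adds nothing, because the wrapped call's exception would propagate anyway.

```java
// Illustrative sketch of the redundant try/catch described in HUDI-2190.
// Names are hypothetical, not the real SparkBulkInsertPreppedCommitActionExecutor.
public class RedundantCatchDemo {

    static class HoodieException extends RuntimeException {
        HoodieException(String msg) { super(msg); }
    }

    // Before: the catch block merely rethrows, so it adds nothing.
    static String executeWithCatch() {
        try {
            return bulkInsert();
        } catch (HoodieException e) {
            throw e; // redundant: the exception would propagate unchanged anyway
        }
    }

    // After: the unnecessary try/catch is removed; behavior is identical.
    static String execute() {
        return bulkInsert();
    }

    static String bulkInsert() {
        // stand-in for the actual write path
        return "commit-0001";
    }

    public static void main(String[] args) {
        System.out.println(execute().equals(executeWithCatch())); // true
    }
}
```

Removing such a catch is purely a cleanup: callers observe the same result and the same exceptions either way.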
[jira] [Created] (HUDI-2187) Hive integration Improvement

zhangminglei created HUDI-2187:
-------------------------------
Summary: Hive integration Improvement
Key: HUDI-2187
URL: https://issues.apache.org/jira/browse/HUDI-2187
Project: Apache Hudi
Issue Type: Improvement
Components: Hive Integration
Reporter: zhangminglei
Assignee: zhangminglei

See the details in the RFC doc: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment
[jira] [Updated] (HUDI-2181) Refine for FlinkCreateHandle
[ https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2181:
-------------------------------
Summary: Refine for FlinkCreateHandle (was: Refine doc for FlinkCreateHandle)

> Refine for FlinkCreateHandle
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent mini-batches; instead, every insert batch creates a new file.
[jira] [Updated] (HUDI-2181) Refine doc for FlinkCreateHandle
[ https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2181:
-------------------------------
Summary: Refine doc for FlinkCreateHandle (was: Refine the doc for FlinkCreateHandle)

> Refine doc for FlinkCreateHandle
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: zhangminglei
> Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent mini-batches; instead, every insert batch creates a new file.
[jira] [Assigned] (HUDI-2181) Refine doc for FlinkCreateHandle
[ https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei reassigned HUDI-2181:
----------------------------------
Assignee: zhangminglei

> Refine doc for FlinkCreateHandle
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent mini-batches; instead, every insert batch creates a new file.
[jira] [Created] (HUDI-2181) Refine the doc for FlinkCreateHandle
zhangminglei created HUDI-2181:
-------------------------------
Summary: Refine the doc for FlinkCreateHandle
Key: HUDI-2181
URL: https://issues.apache.org/jira/browse/HUDI-2181
Project: Apache Hudi
Issue Type: Improvement
Components: Flink Integration
Reporter: zhangminglei

FlinkCreateHandle does not append to the original file for subsequent mini-batches; instead, every insert batch creates a new file.
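A minimal toy model of the behavior the issue documents (plain Java, with assumed names; not the real FlinkCreateHandle API): each mini-batch flush rolls over to a brand-new data file instead of appending to the file from the previous flush.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a create-style write handle: every flush produces a new file
// name (fileId + write token) rather than appending to the previous file.
public class CreateHandleDemo {

    private final String fileId;
    private int writeToken = 0;
    private final List<String> filesCreated = new ArrayList<>();

    CreateHandleDemo(String fileId) {
        this.fileId = fileId;
    }

    // Flush one mini-batch: a fresh data file is "created" each time.
    String flushMiniBatch(List<String> records) {
        String file = fileId + "_" + (writeToken++) + ".parquet";
        filesCreated.add(file); // in a real handle this would be a filesystem write
        return file;
    }

    List<String> filesCreated() {
        return filesCreated;
    }

    public static void main(String[] args) {
        CreateHandleDemo handle = new CreateHandleDemo("f1");
        handle.flushMiniBatch(List.of("record-a"));
        handle.flushMiniBatch(List.of("record-b"));
        // Two mini-batches -> two distinct files, no append to the first one.
        System.out.println(handle.filesCreated());
    }
}
```

The consequence is that frequent small flushes multiply the number of small files, which is why refining (or documenting) this behavior matters.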
[jira] [Commented] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379078#comment-17379078 ]

zhangminglei commented on HUDI-2162:
------------------------------------
Since it is for internal usage, it is confusing for a user who does not use HoodieTableSink to have to set internal parameters. Other than that, it is fine to set the timeout.

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics; this kind of usage is too fragile in this context.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Description:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics; this kind of usage is too fragile in this context.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

(was:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. This kind of usage is too fragile in this context. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
)

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics; this kind of usage is too fragile in this context.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Description:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. This kind of usage is too fragile in this context. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

(was:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
)

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. This kind of usage is too fragile in this context. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Assigned] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei reassigned HUDI-2162:
----------------------------------
Assignee: zhangminglei

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Description:
Committing an instant and fetching the instant are asynchronous (there is no ordering between them), so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

(was:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
)

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous (there is no ordering between them), so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Description:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

(was:
Committing an instant and fetching the instant are asynchronous (there is no ordering between them), so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
)

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Assigned] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei reassigned HUDI-2162:
----------------------------------
Assignee: (was: zhangminglei)

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Summary: Instant is null causes flushBuffer to fail (was: Instant is null causes flushBuffer to fail)

> Instant is null causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null occasionally causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Summary: Instant is null occasionally causes flushBuffer to fail (was: Instant is null causes flushBuffer to fail)

> Instant is null occasionally causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Updated] (HUDI-2162) Instant is null causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-2162:
-------------------------------
Description:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

(was:
Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Caused by: org.apache.hudi.exception.HoodieException: Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
)

> Instant is null causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Assigned] (HUDI-2162) Instant is null causes flushBuffer to fail

[ https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei reassigned HUDI-2162:
----------------------------------
Assignee: zhangminglei

> Instant is null causes flushBuffer to fail
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Blocker
>
> Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.
>
> Caused by: org.apache.hudi.exception.HoodieException: Timeout(0ms) while waiting for instant null to commit
> at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
[jira] [Created] (HUDI-2162) Instant is null causes flushBuffer to fail

zhangminglei created HUDI-2162:
-------------------------------
Summary: Instant is null causes flushBuffer to fail
Key: HUDI-2162
URL: https://issues.apache.org/jira/browse/HUDI-2162
Project: Apache Hudi
Issue Type: Bug
Components: Flink Integration
Reporter: zhangminglei

Committing an instant and fetching the instant are asynchronous, so the fetched instant can be null; the default waiting time of 0 ms, which must be greater than the checkpoint timeout (ckpTimeout), causes the exception shown below. WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java API users under exactly-once semantics.

Caused by: org.apache.hudi.exception.HoodieException: Timeout(0ms) while waiting for instant null to commit
at org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
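To see why a 0 ms acknowledgment timeout fails immediately whenever the instant has not been committed yet, here is a hedged, simplified stand-in for the wait loop in StreamWriteFunction#instantToWrite (names, signatures, and the poll interval are assumptions for illustration, not the real Hudi code):

```java
import java.util.function.Supplier;

// Simplified sketch of an instant-wait loop: poll until the instant is
// available or the timeout budget is exhausted. With a 0 ms budget, a null
// instant fails on the very first check, producing a message like
// "Timeout(0ms) while waiting for instant null to commit".
public class InstantWaitDemo {

    static final long POLL_INTERVAL_MS = 10;

    static String instantToWrite(Supplier<String> fetchInstant, long timeoutMs)
            throws InterruptedException {
        long waited = 0;
        String instant = fetchInstant.get();
        while (instant == null) {
            if (waited >= timeoutMs) {
                throw new IllegalStateException(
                        "Timeout(" + timeoutMs + "ms) while waiting for instant null to commit");
            }
            Thread.sleep(POLL_INTERVAL_MS);
            waited += POLL_INTERVAL_MS;
            instant = fetchInstant.get(); // the commit happens asynchronously elsewhere
        }
        return instant;
    }

    // Demonstration helper: does the wait end in a timeout?
    static boolean timesOut(Supplier<String> fetchInstant, long timeoutMs) {
        try {
            instantToWrite(fetchInstant, timeoutMs);
            return false;
        } catch (IllegalStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        }
    }

    public static void main(String[] args) {
        // Instant never arrives and the budget is 0 ms: immediate timeout.
        System.out.println(timesOut(() -> null, 0)); // true
        // Instant is already committed: returned right away, even with 0 ms budget.
        System.out.println(timesOut(() -> "20210715120000", 0)); // false
    }
}
```

This is why the issue argues the waiting time should exceed the checkpoint timeout: a budget of 0 leaves no window at all for the asynchronous commit to land.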
[jira] [Updated] (HUDI-1918) Incorrect keyBy field would cause serious data skew

[ https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangminglei updated HUDI-1918:
-------------------------------
Description: In the code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]), keyBy(HoodieRecord::getPartitionPath) can cause serious data skew in a real data warehouse scenario. We can instead shuffle the data by record key to avoid multiple subtasks writing to the same bucket at the same time, just like the pipeline in HoodieTableSink. (was: In the code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]), in a real data warehouse scenario the partition path is mostly based on log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) can cause serious data skew. We can instead shuffle the data by record key to avoid multiple subtasks writing to the same bucket at the same time, just like the pipeline in HoodieTableSink.)

> Incorrect keyBy field would cause serious data skew
>
> Key: HUDI-1918
> URL: https://issues.apache.org/jira/browse/HUDI-1918
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: zhangminglei
> Assignee: zhangminglei
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.9.0
>
> In the code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]), keyBy(HoodieRecord::getPartitionPath) can cause serious data skew in a real data warehouse scenario.
>
> We can instead shuffle the data by record key to avoid multiple subtasks writing to the same bucket at the same time, just like the pipeline in HoodieTableSink.
[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew
[ https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangminglei updated HUDI-1918: --- Description: In the actual data warehouse scenario, the partition path is mostly based on log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) would cause serious data skew. We can actually shuffle data by record key here to avoid multiple subtasks writing to a bucket at the same time, just like the pipeline in HoodieTableSink. was: The code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) that in the actual data warehouse, partition path is mostly based on log_date or log_hour, so keyBy (HoodieRecord::getPartitionPath) would cause serious data skew. we can actually shuffle data by record key here to avoid multiple subtasks write to a bucket at the same time, just like the pipeline in HoodieTableSink. > Incorrect keyby field would cause serious data skew > --- > > Key: HUDI-1918 > URL: https://issues.apache.org/jira/browse/HUDI-1918 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: zhangminglei >Assignee: zhangminglei >Priority: Critical > Labels: pull-request-available > > In the actual data warehouse scenario, the partition path is mostly based > on log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code > ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) > would cause serious data skew. > We can actually shuffle data by record key here to avoid multiple subtasks > writing to a bucket at the same time, just like the pipeline in HoodieTableSink. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew
[ https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangminglei updated HUDI-1918: --- Description: In the actual data warehouse, the partition path is mostly based on log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) would cause serious data skew. We can actually shuffle data by record key here to avoid multiple subtasks writing to a bucket at the same time, just like the pipeline in HoodieTableSink. was: The code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) that in the actual data warehouse, partition path is mostly based on log_date or log_hour, so keyBy (HoodieRecord::getPartitionPath) would cause serious data skew. we can actually shuffle data by record key here to avoid multiple subtasks write to a bucket at the same time, just like the pipeline in HoodieTableSink. > Incorrect keyby field would cause serious data skew > --- > > Key: HUDI-1918 > URL: https://issues.apache.org/jira/browse/HUDI-1918 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: zhangminglei >Assignee: zhangminglei >Priority: Critical > > In the actual data warehouse, the partition path is mostly based on > log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code > ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) > would cause serious data skew. > We can actually shuffle data by record key here to avoid multiple subtasks > writing to a bucket at the same time, just like the pipeline in HoodieTableSink. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew
[ https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangminglei updated HUDI-1918: --- Description: In the actual data warehouse, the partition path is mostly based on log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) would cause serious data skew. We can actually shuffle data by record key here to avoid multiple subtasks writing to a bucket at the same time, just like the pipeline in HoodieTableSink. was: The code (https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92), in the actual data warehouse, partition path is mostly based on log_date or log_hour, so keyBy (HoodieRecord::getPartitionPath) would cause serious data skew. we can actually shuffle data by record key here, just like the pipeline in HoodieTableSink. > Incorrect keyby field would cause serious data skew > --- > > Key: HUDI-1918 > URL: https://issues.apache.org/jira/browse/HUDI-1918 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: zhangminglei >Assignee: zhangminglei >Priority: Critical > > In the actual data warehouse, the partition path is mostly based on > log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code > ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]) > would cause serious data skew. > We can actually shuffle data by record key here to avoid multiple subtasks > writing to a bucket at the same time, just like the pipeline in HoodieTableSink. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1918) Incorrect keyby field would cause serious data skew
zhangminglei created HUDI-1918: -- Summary: Incorrect keyby field would cause serious data skew Key: HUDI-1918 URL: https://issues.apache.org/jira/browse/HUDI-1918 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: zhangminglei Assignee: zhangminglei In the actual data warehouse, the partition path is mostly based on log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) in the code (https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92) would cause serious data skew. We can actually shuffle data by record key here, just like the pipeline in HoodieTableSink. -- This message was sent by Atlassian Jira (v8.3.4#803005)
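The skew argument in HUDI-1918 can be illustrated with a small, dependency-free sketch (plain Java rather than the Flink API; the class and helper names are hypothetical). Hashing by a hot partition path such as log_date sends every record of the day to one subtask, while hashing by a high-cardinality record key spreads records evenly:

```java
import java.util.Arrays;

// Hypothetical sketch: count how many of 10,000 records land on each of 4
// subtasks when keyed by partition path versus by record key.
public class KeyBySkewDemo {
    // Mimics a hash-based key partitioner: key -> subtask index.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int[] byPartitionPath = new int[parallelism];
        int[] byRecordKey = new int[parallelism];
        for (int i = 0; i < 10_000; i++) {
            String partitionPath = "log_date=2021-07-13"; // one hot partition
            String recordKey = "uuid-" + i;               // high cardinality
            byPartitionPath[subtaskFor(partitionPath, parallelism)]++;
            byRecordKey[subtaskFor(recordKey, parallelism)]++;
        }
        // All 10,000 records pile onto a single subtask in the first case;
        // the second case is roughly uniform.
        System.out.println("keyBy(partitionPath): " + Arrays.toString(byPartitionPath));
        System.out.println("keyBy(recordKey):     " + Arrays.toString(byRecordKey));
    }
}
```

Keying by record key also gives the single-writer-per-bucket property the issue mentions: records with the same key, and hence the same target bucket, always land on the same subtask.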
[jira] [Updated] (HUDI-1913) Using streams instead of loops for readline
[ https://issues.apache.org/jira/browse/HUDI-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangminglei updated HUDI-1913: --- Description: Using streams instead of loops improves readability and makes the code more compact. For example, we could use _BufferedReader.lines_ instead of _BufferedReader.readLine_ with a loop around it. was: Using streams instead of loops improves the readability and makes the code more compact. > Using streams instead of loops for readline > --- > > Key: HUDI-1913 > URL: https://issues.apache.org/jira/browse/HUDI-1913 > Project: Apache Hudi > Issue Type: Improvement > Components: CLI, Common Core, Utilities >Reporter: zhangminglei >Priority: Minor > > Using streams instead of loops improves readability and makes the code > more compact. > For example, we could use _BufferedReader.lines_ instead of > _BufferedReader.readLine_ with a loop around it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
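The suggested refactor can be sketched side by side. This is a generic, self-contained example of the `BufferedReader.lines()` pattern the issue proposes, not code from the Hudi codebase:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ReadLinesDemo {

    // Loop style: call readLine() repeatedly until it returns null.
    static List<String> readViaLoop(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Stream style: BufferedReader.lines() collapses the loop into one
    // expression; the stream must be consumed before the reader is closed.
    static List<String> readViaStream(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "first\nsecond\nthird";
        System.out.println(readViaLoop(text));   // [first, second, third]
        System.out.println(readViaStream(text)); // [first, second, third]
    }
}
```

Both methods yield the same lines; the stream version is shorter and composes with further `filter`/`map` steps without extra loop code.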
[jira] [Commented] (HUDI-1913) Using streams instead of loops for readline
[ https://issues.apache.org/jira/browse/HUDI-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346629#comment-17346629 ] zhangminglei commented on HUDI-1913: I will open a PR for this soon. > Using streams instead of loops for readline > --- > > Key: HUDI-1913 > URL: https://issues.apache.org/jira/browse/HUDI-1913 > Project: Apache Hudi > Issue Type: Improvement > Components: CLI, Common Core, Utilities >Reporter: zhangminglei >Priority: Minor > > Using streams instead of loops improves the readability and makes the code > more compact. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1913) Using streams instead of loops for readline
zhangminglei created HUDI-1913: -- Summary: Using streams instead of loops for readline Key: HUDI-1913 URL: https://issues.apache.org/jira/browse/HUDI-1913 Project: Apache Hudi Issue Type: Improvement Components: CLI, Common Core, Utilities Reporter: zhangminglei Using streams instead of loops improves the readability and makes the code more compact. -- This message was sent by Atlassian Jira (v8.3.4#803005)