[jira] [Updated] (HUDI-2190) Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute

2021-07-17 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2190:
---
Description: SparkBulkInsertPreppedCommitActionExecutor#execute has a try-catch
block, as do several other classes, but it is unnecessary.  (was:
SparkBulkInsertPreppedCommitActionExecutor#execute has a try catch, but it is
unnecessary.)

> Unnecessary exception catch in 
> SparkBulkInsertPreppedCommitActionExecutor#execute
> -
>
> Key: HUDI-2190
> URL: https://issues.apache.org/jira/browse/HUDI-2190
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: zhangminglei
>Priority: Major
>
> SparkBulkInsertPreppedCommitActionExecutor#execute has a try-catch block, as
> do several other classes, but it is unnecessary.
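For context, the pattern being flagged is a catch block that only wraps and
rethrows. A minimal hypothetical sketch (names are made up; this is not the
actual Hudi source):

{code:java}
import java.util.List;

// Hypothetical illustration of a redundant try-catch: no recovery happens
// and no real context is added, so the block can simply be removed and the
// exception allowed to propagate to the caller.
class BulkInsertPreppedExecutor {
  List<String> execute(List<String> records) {
    try {
      return doBulkInsertPrepped(records);
    } catch (RuntimeException e) {
      // Callers see an equivalent failure either way.
      throw new RuntimeException("bulk insert prepped failed", e);
    }
  }

  private List<String> doBulkInsertPrepped(List<String> records) {
    return records; // stand-in for the real write path
  }
}
{code}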



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2190) Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute

2021-07-17 Thread zhangminglei (Jira)
zhangminglei created HUDI-2190:
--

 Summary: Unnecessary exception catch in 
SparkBulkInsertPreppedCommitActionExecutor#execute
 Key: HUDI-2190
 URL: https://issues.apache.org/jira/browse/HUDI-2190
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Spark Integration
Reporter: zhangminglei


SparkBulkInsertPreppedCommitActionExecutor#execute has a try-catch block, but
it is unnecessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2187) Hive integration Improvement

2021-07-16 Thread zhangminglei (Jira)
zhangminglei created HUDI-2187:
--

 Summary: Hive integration Improvement
 Key: HUDI-2187
 URL: https://issues.apache.org/jira/browse/HUDI-2187
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Hive Integration
Reporter: zhangminglei
Assignee: zhangminglei


See the details in the RFC doc:



https://cwiki.apache.org/confluence/display/HUDI/RFC+-+31%3A+Hive+integration+Improvment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2181) Refine for FlinkCreateHandle

2021-07-15 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2181:
---
Summary: Refine for FlinkCreateHandle  (was: Refine doc for 
FlinkCreateHandle)

> Refine for FlinkCreateHandle
> 
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent
> mini-batches; instead, every insert batch creates a new file.
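A toy model of the behavior described above (made-up names; not the actual
FlinkCreateHandle source):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Each mini-batch flush creates a brand-new data file instead of appending
// to the file written by the previous batch, so small files accumulate
// across mini-batches.
class CreateHandleModel {
  private final Path baseDir;
  private int batchSeq = 0;

  CreateHandleModel(Path baseDir) {
    this.baseDir = baseDir;
  }

  void flushBatch(List<String> records) throws IOException {
    // CREATE_NEW: a fresh file per batch, never an append.
    Path file = baseDir.resolve("data-" + (batchSeq++) + ".log");
    Files.write(file, records, StandardOpenOption.CREATE_NEW);
  }
}
{code}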



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2181) Refine doc for FlinkCreateHandle

2021-07-14 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2181:
---
Summary: Refine doc for FlinkCreateHandle  (was: Refine the doc for 
FlinkCreateHandle)

> Refine doc for FlinkCreateHandle
> 
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: zhangminglei
>Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent
> mini-batches; instead, every insert batch creates a new file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2181) Refine doc for FlinkCreateHandle

2021-07-14 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei reassigned HUDI-2181:
--

Assignee: zhangminglei

> Refine doc for FlinkCreateHandle
> 
>
> Key: HUDI-2181
> URL: https://issues.apache.org/jira/browse/HUDI-2181
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Major
>
> FlinkCreateHandle does not append to the original file for subsequent
> mini-batches; instead, every insert batch creates a new file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2181) Refine the doc for FlinkCreateHandle

2021-07-14 Thread zhangminglei (Jira)
zhangminglei created HUDI-2181:
--

 Summary: Refine the doc for FlinkCreateHandle
 Key: HUDI-2181
 URL: https://issues.apache.org/jira/browse/HUDI-2181
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: zhangminglei


FlinkCreateHandle does not append to the original file for subsequent
mini-batches; instead, every insert batch creates a new file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-12 Thread zhangminglei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379078#comment-17379078
 ] 

zhangminglei commented on HUDI-2162:


Since it is for internal usage, it is confusing for users who do not use
HoodieTableSink there to have to set internal parameters. Other than that, it
is fine to set the timeout.

> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics; this kind of usage is too
> fragile in this context.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
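The waiting logic is roughly of the following shape (a hypothetical sketch
with made-up names; not the actual StreamWriteFunction source). With the ack
timeout defaulting to 0 ms, the wait gives up immediately whenever the commit
path has not yet published the instant:

{code:java}
// Hypothetical sketch: wait for the next instant, bounded by a timeout.
class InstantWaiter {
  private volatile String pendingInstant; // published asynchronously by the commit path

  String instantToWrite(long timeoutMs) throws InterruptedException {
    long start = System.currentTimeMillis();
    String instant = pendingInstant; // may be null: commit is asynchronous
    while (instant == null) {
      if (System.currentTimeMillis() - start >= timeoutMs) {
        // With timeoutMs = 0 this throws on the first pass, producing an
        // error like: Timeout(0ms) while waiting for instant null to commit
        throw new RuntimeException(
            "Timeout(" + timeoutMs + "ms) while waiting for instant null to commit");
      }
      Thread.sleep(100L);
      instant = pendingInstant;
    }
    return instant;
  }
}
{code}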



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Description: 
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics; this kind of usage is too fragile in
this context.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

  was:
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown. This kind of usage
is too fragile in this context.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)


> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics; this kind of usage is too
> fragile in this context.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Description: 
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown. This kind of usage
is too fragile in this context.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

  was:
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)


> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown. This kind of usage
> is too fragile in this context.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei reassigned HUDI-2162:
--

Assignee: zhangminglei

> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Description: 
Since committing the instant and getting the instant are asynchronous (there
is no ordering between them), the instant can be null; the default waiting
time is 0 ms, but it must be greater than ckpTimeout, otherwise the exception
below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

  was:
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)


> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous (there
> is no ordering between them), the instant can be null; the default waiting
> time is 0 ms, but it must be greater than ckpTimeout, otherwise the exception
> below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Description: 
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

  was:
Since committing the instant and getting the instant are asynchronous (there
is no ordering between them), the instant can be null; the default waiting
time is 0 ms, but it must be greater than ckpTimeout, otherwise the exception
below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)


> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei reassigned HUDI-2162:
--

Assignee: (was: zhangminglei)

> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Summary: Instant is null cause flushBuffer failed   (was: Instant is null 
cause flushBuffer failed)

> Instant is null cause flushBuffer failed 
> -
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed in casual

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Summary: Instant is null cause flushBuffer failed in casual  (was: Instant 
is null cause flushBuffer failed )

> Instant is null cause flushBuffer failed in casual
> --
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2162) Instant is null cause flushBuffer failed

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-2162:
---
Description: 
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Timeout(0ms) while waiting for instant null to commit
 at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
 at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
 at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
 at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

  was:
Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Caused by: org.apache.hudi.exception.HoodieException: Timeout(0ms) while 
waiting for instant null to commit
at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)


> Instant is null cause flushBuffer failed
> 
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Timeout(0ms) while waiting for instant null to commit
>  at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
>  at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
>  at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2162) Instant is null cause flushBuffer failed

2021-07-10 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei reassigned HUDI-2162:
--

Assignee: zhangminglei

> Instant is null cause flushBuffer failed
> 
>
> Key: HUDI-2162
> URL: https://issues.apache.org/jira/browse/HUDI-2162
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Blocker
>
> Since committing the instant and getting the instant are asynchronous, the
> instant can be null; the default waiting time is 0 ms, but it must be greater
> than ckpTimeout, otherwise the exception below is thrown.
> WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for
> Java API users under exactly-once semantics.
> Caused by: org.apache.hudi.exception.HoodieException: Timeout(0ms) while 
> waiting for instant null to commit
> at 
> org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
> at 
> org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
> at 
> org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
> at 
> org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
> at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2162) Instant is null cause flushBuffer failed

2021-07-10 Thread zhangminglei (Jira)
zhangminglei created HUDI-2162:
--

 Summary: Instant is null cause flushBuffer failed
 Key: HUDI-2162
 URL: https://issues.apache.org/jira/browse/HUDI-2162
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: zhangminglei


Since committing the instant and getting the instant are asynchronous, the
instant can be null; the default waiting time is 0 ms, but it must be greater
than ckpTimeout, otherwise the exception below is thrown.

WRITE_COMMIT_ACK_TIMEOUT is for internal usage, so it is not suitable for Java
API users under exactly-once semantics.

Caused by: org.apache.hudi.exception.HoodieException: Timeout(0ms) while 
waiting for instant null to commit
at 
org.apache.hudi.sink.StreamWriteFunction.instantToWrite(StreamWriteFunction.java:597)
at 
org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:618)
at 
org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:554)
at 
org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:236)
at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew

2021-05-26 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-1918:
---
Description: 
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse scenario, keyBy(HoodieRecord::getPartitionPath) would
cause serious data skew to some extent.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.

  was:
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse scenario, the partition path is mostly based on
log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause
serious data skew to some extent.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.


> Incorrect keyby field would cause serious data skew
> ---
>
> Key: HUDI-1918
> URL: https://issues.apache.org/jira/browse/HUDI-1918
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> In the code
> ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
> in a real data warehouse scenario, keyBy(HoodieRecord::getPartitionPath)
> would cause serious data skew to some extent.
> We can instead shuffle the data by record key here to avoid multiple subtasks
> writing to the same bucket at the same time, just like the pipeline in
> HoodieTableSink.
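A toy, self-contained illustration of the skew (plain Java, not Flink or Hudi
code): hashing a low-cardinality partition path such as a single log_date
sends every record to one bucket, while hashing the high-cardinality record
key spreads them evenly.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class KeyBySkewDemo {
  record Rec(String recordKey, String partitionPath) {}

  // Count how many records land in each of `subtasks` buckets for a given key.
  static Map<Integer, Long> bucketCounts(List<Rec> recs, Function<Rec, String> key, int subtasks) {
    return recs.stream().collect(Collectors.groupingBy(
        r -> Math.floorMod(key.apply(r).hashCode(), subtasks), Collectors.counting()));
  }

  public static void main(String[] args) {
    // 10,000 records that all share one hot log_date partition.
    List<Rec> recs = IntStream.range(0, 10_000)
        .mapToObj(i -> new Rec("id-" + i, "2021-05-20"))
        .collect(Collectors.toList());
    System.out.println("by partitionPath: " + bucketCounts(recs, Rec::partitionPath, 4)); // one bucket gets all
    System.out.println("by recordKey:     " + bucketCounts(recs, Rec::recordKey, 4));     // roughly even
  }
}
{code}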



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew

2021-05-20 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-1918:
---
Description: 
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse scenario, the partition path is mostly based on
log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause
serious data skew to some extent.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.

  was:
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse, the partition path is mostly based on log_date or
log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
skew to some extent.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.


> Incorrect keyby field would cause serious data skew
> ---
>
> Key: HUDI-1918
> URL: https://issues.apache.org/jira/browse/HUDI-1918
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Critical
>  Labels: pull-request-available
>
> In the code
> ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
> in a real data warehouse scenario, the partition path is mostly based on
> log_date or log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause
> serious data skew to some extent.
> We can instead shuffle the data by record key here to avoid multiple subtasks
> writing to the same bucket at the same time, just like the pipeline in
> HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew

2021-05-20 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-1918:
---
Description: 
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse, the partition path is mostly based on log_date or
log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
skew to some extent.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.

  was:
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse, the partition path is mostly based on log_date or
log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
skew.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.


> Incorrect keyby field would cause serious data skew
> ---
>
> Key: HUDI-1918
> URL: https://issues.apache.org/jira/browse/HUDI-1918
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Critical
>
> In the code
> ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
> in a real data warehouse, the partition path is mostly based on log_date or
> log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
> skew to some extent.
> We can instead shuffle the data by record key here to avoid multiple subtasks
> writing to the same bucket at the same time, just like the pipeline in
> HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1918) Incorrect keyby field would cause serious data skew

2021-05-20 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-1918:
---
Description: 
In the code
([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
in a real data warehouse, the partition path is mostly based on log_date or
log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
skew.

We can instead shuffle the data by record key here to avoid multiple subtasks
writing to the same bucket at the same time, just like the pipeline in
HoodieTableSink.

  was:
In the code
(https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92),
in a real data warehouse, the partition path is mostly based on log_date or
log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
skew.

We can instead shuffle the data by record key here, just like the pipeline in
HoodieTableSink.


> Incorrect keyby field would cause serious data skew
> ---
>
> Key: HUDI-1918
> URL: https://issues.apache.org/jira/browse/HUDI-1918
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: zhangminglei
>Assignee: zhangminglei
>Priority: Critical
>
> In the code
> ([https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92]),
> in a real data warehouse, the partition path is mostly based on log_date or
> log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
> skew.
> We can instead shuffle the data by record key here to avoid multiple subtasks
> writing to the same bucket at the same time, just like the pipeline in
> HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1918) Incorrect keyby field would cause serious data skew

2021-05-20 Thread zhangminglei (Jira)
zhangminglei created HUDI-1918:
--

 Summary: Incorrect keyby field would cause serious data skew
 Key: HUDI-1918
 URL: https://issues.apache.org/jira/browse/HUDI-1918
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: zhangminglei
Assignee: zhangminglei


In the code
(https://github.com/apache/hudi/blob/master/hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java#L92),
in a real data warehouse, the partition path is mostly based on log_date or
log_hour, so keyBy(HoodieRecord::getPartitionPath) would cause serious data
skew.

We can instead shuffle the data by record key here, just like the pipeline in
HoodieTableSink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1913) Using streams instead of loops for readline

2021-05-18 Thread zhangminglei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangminglei updated HUDI-1913:
---
Description: 
Using streams instead of loops improves the readability and makes the code more 
compact.

For example, we could use _BufferedReader.lines_ instead of
_BufferedReader.readLine_ with a bunch of for-loop code, which makes the code
look more compact.

  was:Using streams instead of loops improves the readability and makes the 
code more compact.


> Using streams instead of loops for readline
> ---
>
> Key: HUDI-1913
> URL: https://issues.apache.org/jira/browse/HUDI-1913
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: CLI, Common Core, Utilities
>Reporter: zhangminglei
>Priority: Minor
>
> Using streams instead of loops improves the readability and makes the code 
> more compact.
> For example, we could use _BufferedReader.lines_ instead of
> _BufferedReader.readLine_ with a bunch of for-loop code, which makes the
> code look more compact.
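A short sketch of the refactor on illustrative input (standard Java only; not
tied to any particular Hudi call site):

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class LinesDemo {
  public static void main(String[] args) throws IOException {
    String text = "alpha\nbeta\ngamma";

    // Loop style: read line by line until readLine() returns null.
    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = br.readLine()) != null) {
        System.out.println(line.toUpperCase());
      }
    }

    // Stream style: BufferedReader.lines() yields the same lines lazily
    // and composes with map/filter/collect.
    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      List<String> upper = br.lines().map(String::toUpperCase).collect(Collectors.toList());
      upper.forEach(System.out::println);
    }
  }
}
{code}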



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1913) Using streams instead of loops for readline

2021-05-18 Thread zhangminglei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346629#comment-17346629
 ] 

zhangminglei commented on HUDI-1913:


I will open a PR for this soon.

> Using streams instead of loops for readline
> ---
>
> Key: HUDI-1913
> URL: https://issues.apache.org/jira/browse/HUDI-1913
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: CLI, Common Core, Utilities
>Reporter: zhangminglei
>Priority: Minor
>
> Using streams instead of loops improves the readability and makes the code 
> more compact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1913) Using streams instead of loops for readline

2021-05-18 Thread zhangminglei (Jira)
zhangminglei created HUDI-1913:
--

 Summary: Using streams instead of loops for readline
 Key: HUDI-1913
 URL: https://issues.apache.org/jira/browse/HUDI-1913
 Project: Apache Hudi
  Issue Type: Improvement
  Components: CLI, Common Core, Utilities
Reporter: zhangminglei


Using streams instead of loops improves the readability and makes the code more 
compact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)