[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379662#comment-17379662
 ] 

ASF GitHub Bot commented on HUDI-2168:
--

hudi-bot edited a comment on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938


   
   ## CI report:
   
   * e8e5e310224eee469a19bcfe7af537154843c318 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AccessControlException for anonymous user
> -----------------------------------------
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
> Issue Type: Task
> Components: Testing
> Reporter: Vinay
> Assignee: Vinay
> Priority: Trivial
> Labels: pull-request-available
>
> Users hit the following exception when running test cases that depend on
> starting the Hive service:
>
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE
> {code}
> This happens specifically when the Hive database is cleared:
> {code:java}
> client.updateHiveSQL("drop database if exists " + hiveSyncConfig.databaseName);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938


   
   ## CI report:
   
   * e8e5e310224eee469a19bcfe7af537154843c318 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877)
 
   
   




[GitHub] [hudi] moranyuwen commented on pull request #3255: Update HoodieFlinkStreamer.java

2021-07-12 Thread GitBox


moranyuwen commented on pull request #3255:
URL: https://github.com/apache/hudi/pull/3255#issuecomment-878816543


   > @moranyuwen Thanks for your contribution. Please follow the official contribution guide: http://hudi.apache.org/contributing to refactor your PR.
   
   Please close this PR






[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379638#comment-17379638
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   * 9bbb4762bed91d2d4cef01c6e3d274c667235e2c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=879)
 
   
   


> BucketAssignFunction NullPointerException
> -----------------------------------------
>
> Key: HUDI-2153
> URL: https://issues.apache.org/jira/browse/HUDI-2153
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: moran
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> java.lang.NullPointerException
>   at org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198)
>   at org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159)
>   at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
>   at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
>   at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
>   at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
>   at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
>   at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
>   at java.lang.Thread.run(Thread.java:748)
>
> The error occurs at line 197 of the BucketAssignFunction class
> (this.context.setCurrentKey(recordKey)).
> Why is this context null?





[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379639#comment-17379639
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   
   


> Add support to disable meta column to BulkInsert Row Writer path
> ----------------------------------------------------------------
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> The objective here is to disable all meta columns so as to avoid their storage cost.
> The row writer path may also see lower write latency, since no
> special handling is required at the RowCreateHandle layer.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   * 9bbb4762bed91d2d4cef01c6e3d274c667235e2c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=879)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379633#comment-17379633
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

moranyuwen commented on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878814754


   Please close this PR




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379630#comment-17379630
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   
   


[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379632#comment-17379632
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   * 9bbb4762bed91d2d4cef01c6e3d274c667235e2c UNKNOWN
   
   


[GitHub] [hudi] moranyuwen commented on pull request #3261: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


moranyuwen commented on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878814754


   Please close this PR






[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   * 9bbb4762bed91d2d4cef01c6e3d274c667235e2c UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379628#comment-17379628
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

moranyuwen commented on a change in pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#discussion_r668461881



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java
##
@@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception {
 .transform(
 "bucket_assigner",
 TypeInformation.of(HoodieRecord.class),
-new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+new BucketAssignOperator<>(new BucketAssignFunction<>(conf)))
 .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS))

Review comment:
   yes
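
A minimal sketch of why swapping the operator, as in the diff above, could remove the NullPointerException reported in HUDI-2153. It assumes (this is not confirmed anywhere in this thread) that the function's context is injected by the dedicated operator when it opens, so a generic operator leaves the field null. Every name below (`Context`, `AssignFunction`, `GenericOperator`, `InjectingOperator`) is a hypothetical stand-in and not the real Hudi or Flink API.

```java
// Hypothetical, simplified stand-ins; these are NOT the real Hudi or Flink classes.
final class ContextInjectionSketch {

    /** Minimal stand-in for the context a function uses to switch the current key. */
    interface Context {
        void setCurrentKey(String key);
    }

    /** Stand-in for a function whose context must be injected before use. */
    static final class AssignFunction {
        private Context context; // stays null unless an operator injects it

        void setContext(Context context) {
            this.context = context;
        }

        void processRecord(String recordKey) {
            // Throws NullPointerException when no context was injected.
            context.setCurrentKey(recordKey);
        }
    }

    /** Generic operator: forwards records but never injects a context. */
    static class GenericOperator {
        protected final AssignFunction function;

        GenericOperator(AssignFunction function) {
            this.function = function;
        }

        void open() {
            // Nothing to set up in the generic case.
        }

        void processElement(String recordKey) {
            function.processRecord(recordKey);
        }
    }

    /** Dedicated operator: injects a context in open(), before any record arrives. */
    static final class InjectingOperator extends GenericOperator {
        InjectingOperator(AssignFunction function) {
            super(function);
        }

        @Override
        void open() {
            // A real context would switch keyed state here; this one is a no-op.
            function.setContext(key -> { });
        }
    }

    /** Returns true if pushing one record through the operator hits an NPE. */
    static boolean failsWithNpe(GenericOperator operator) {
        operator.open();
        try {
            operator.processElement("key-1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("generic operator NPE:   "
            + failsWithNpe(new GenericOperator(new AssignFunction())));
        System.out.println("injecting operator NPE: "
            + failsWithNpe(new InjectingOperator(new AssignFunction())));
    }
}
```

In this sketch only the generic wiring hits the exception. Whether the real `BucketAssignOperator` works the same way should be checked against the Hudi source.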






[GitHub] [hudi] moranyuwen commented on a change in pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


moranyuwen commented on a change in pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#discussion_r668461881



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java
##
@@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception {
 .transform(
 "bucket_assigner",
 TypeInformation.of(HoodieRecord.class),
-new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+new BucketAssignOperator<>(new BucketAssignFunction<>(conf)))
 .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS))

Review comment:
   yes








[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379627#comment-17379627
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

codecov-commenter edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878802290


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3261](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (afe140f) into 
[master](https://codecov.io/gh/apache/hudi/commit/c8a2033c275e21a752893fc89311e1f6846f5a78?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c8a2033) will **increase** coverage by `11.45%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3261/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
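   @@             Coverage Diff              @@
   ##             master    #3261       +/-   ##
   =============================================
   + Coverage     47.71%   59.16%   +11.45%
   + Complexity     5526     1212     -4314
   =============================================
     Files           934      169      -765
     Lines         41456     6553    -34903
     Branches       4167      685     -3482
   =============================================
   - Hits          19779     3877    -15902
   + Misses        19917     2397    -17520
   + Partials       1760      279     -1481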
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `66.15% <ø> (+31.70%)` | :arrow_up: |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `51.13% <ø> (-8.11%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_cont

[GitHub] [hudi] codecov-commenter edited a comment on pull request #3261: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


codecov-commenter edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878802290


 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 

[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379623#comment-17379623
 ] 

Vinoth Chandar commented on HUDI-2151:
--

Ensure the 1024 is not blindly used always?
{code:java}
 
public static final ConfigProperty<String> COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE = ConfigProperty
 .key("hoodie.copyonwrite.record.size.estimate")
 .defaultValue(String.valueOf(1024))
 .withDocumentation("The average record size. If specified, hudi will use this and not compute dynamically "
 + "based on the last 24 commit’s metadata. No value set as default. This is critical in computing "
 + "the insert parallelism and bin-packing inserts into small files.");
{code}
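
The question above is whether the 1024-byte default ends up being used even when commit metadata could give a better number. A minimal sketch of one way to prefer an observed average and fall back to the configured estimate follows. `RecordSizeEstimateSketch`, `CommitStats` and `estimateRecordSize` are hypothetical names made up for illustration; this is not Hudi's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: none of these types exist in Hudi.
final class RecordSizeEstimateSketch {

    /** Bytes and records written by one past commit (assumed shape). */
    static final class CommitStats {
        final long bytesWritten;
        final long recordsWritten;

        CommitStats(long bytesWritten, long recordsWritten) {
            this.bytesWritten = bytesWritten;
            this.recordsWritten = recordsWritten;
        }
    }

    /**
     * Returns the average record size of the most recent commit that wrote
     * data, or the configured estimate when no such commit exists.
     * The list is expected to be ordered newest first.
     */
    static long estimateRecordSize(List<CommitStats> newestFirst, long configuredEstimate) {
        for (CommitStats c : newestFirst) {
            if (c.recordsWritten > 0 && c.bytesWritten > 0) {
                return Math.max(1L, c.bytesWritten / c.recordsWritten);
            }
        }
        return configuredEstimate;
    }

    public static void main(String[] args) {
        long configured = 1024L; // default of hoodie.copyonwrite.record.size.estimate
        // No history yet: the configured value is the only option.
        System.out.println(estimateRecordSize(Collections.<CommitStats>emptyList(), configured));
        // With history: the empty commit is skipped, the next one supplies the average.
        System.out.println(estimateRecordSize(
            Arrays.asList(new CommitStats(0L, 0L), new CommitStats(2_000_000L, 10_000L)),
            configured));
    }
}
```

In this sketch the configured value only matters while there is no usable history, which is one reading of "not blindly used always".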
 

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We have quite a few configs which deliver better performance or usability, 
> but guarded by flags. 
>  This is to identify them, change them, test (functionally, perf) and make 
> them default
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2103) Add rebalance before index bootstrap

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379620#comment-17379620
 ] 

ASF GitHub Bot commented on HUDI-2103:
--

SEZ9 commented on pull request #3185:
URL: https://github.com/apache/hudi/pull/3185#issuecomment-878808290


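   Can parallelism be set separately for index_bootstrap? The resources used when running the initialization are much larger than those needed for the subsequent streaming data.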


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add rebalance before index bootstrap
> 
>
> Key: HUDI-2103
> URL: https://issues.apache.org/jira/browse/HUDI-2103
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: yuzhaojing
>Assignee: yuzhaojing
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> When using Flink SQL to upsert into Hudi, users always set a parallelism 
> larger than the Kafka partition count. Currently the bootstrap operator 
> needs at least one element to trigger loading.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] SEZ9 commented on pull request #3185: [HUDI-2103] Add rebalance before index bootstrap

2021-07-12 Thread GitBox


SEZ9 commented on pull request #3185:
URL: https://github.com/apache/hudi/pull/3185#issuecomment-878808290


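   Can parallelism be set separately for index_bootstrap? The resources used when running the initialization are much larger than those needed for the subsequent streaming data.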


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379619#comment-17379619
 ] 

Vinoth Chandar commented on HUDI-2151:
--

Make this lazy?
{code:java}
 
public static final ConfigProperty<String> FAILED_WRITES_CLEANER_POLICY_PROP = ConfigProperty
 .key("hoodie.cleaner.policy.failed.writes")
 .defaultValue(HoodieFailedWritesCleaningPolicy.EAGER.name())
 .withDocumentation("Cleaning policy for failed writes to be used. Hudi will delete any files written by "
 + "failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before "
 + "every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)");
{code}
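
The documentation string above states a constraint: eager rollback is only supported for a single writer, while multi-writer setups require the lazy policy. A minimal sketch of that rule follows, using a stand-in enum. `FailedWritesPolicySketch`, `CleaningPolicy` and `resolve` are hypothetical names for illustration and do not reflect how Hudi validates this config.

```java
import java.util.Locale;

// Hypothetical illustration only: a stand-in for the policy enum, not Hudi code.
final class FailedWritesPolicySketch {

    enum CleaningPolicy { EAGER, LAZY }

    /**
     * Parses the configured policy name and rejects the one combination the
     * documentation rules out: EAGER together with multiple writers.
     */
    static CleaningPolicy resolve(String configured, boolean multiWriter) {
        CleaningPolicy policy =
            CleaningPolicy.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        if (multiWriter && policy == CleaningPolicy.EAGER) {
            throw new IllegalArgumentException(
                "EAGER cleaning of failed writes is only supported for a single writer");
        }
        return policy;
    }

    public static void main(String[] args) {
        System.out.println(resolve("EAGER", false)); // single writer, eager rollback allowed
        System.out.println(resolve("lazy", true));   // multi-writer needs the lazy policy
    }
}
```

If the default were switched to lazy as the comment asks, a check like this would no longer trip for multi-writer users who leave the config unset.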
 

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We have quite a few configs which deliver better performance or usability, 
> but guarded by flags. 
>  This is to identify them, change them, test (functionally, perf) and make 
> them default
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2150) Rename/Restructure configs for better modularity

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379618#comment-17379618
 ] 

Vinoth Chandar commented on HUDI-2150:
--

Move to PayloadConfig?

 

 
{code:java}
public static final ConfigProperty<String> PAYLOAD_CLASS_PROP = ConfigProperty
 .key("hoodie.compaction.payload.class")
 .defaultValue(OverwriteWithLatestAvroPayload.class.getName())
 .withDocumentation("This needs to be same as class used during insert/upserts. Just like writing, compaction also uses "
 + "the record payload class to merge records in the log against each other, merge again with the base file and "
 + "produce the final record to be written after compaction.");
 
{code}
 

> Rename/Restructure configs for better modularity
> 
>
> Key: HUDI-2150
> URL: https://issues.apache.org/jira/browse/HUDI-2150
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Given we have a framework now, that can capture configs and even their 
> alternatives well, time to clean things up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379617#comment-17379617
 ] 

Vinoth Chandar commented on HUDI-2151:
--

Why would we even want to do lazyRead=false anymore?

 

 
{code:java}
public static final ConfigProperty<String> COMPACTION_LAZY_BLOCK_READ_ENABLED_PROP = ConfigProperty
 .key("hoodie.compaction.lazy.block.read")
 .defaultValue("false")
 .withDocumentation("When a CompactedLogScanner merges all log files, this config helps to choose whether the log blocks "
 + "should be read lazily or not. Choose true to use lazy block reading (low memory usage, but incurs seeks to each block"
 + " header) or false for immediate block read (higher memory usage)");
 
{code}
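
The trade-off described in the documentation string can be shown with a small stand-alone sketch: an immediate read holds the block content in memory from construction, while a lazy read defers the load (and its extra seek) until the content is first used. `LazyBlockReadSketch` and `Block` are hypothetical names for illustration and are not Hudi's log block classes.

```java
import java.util.function.Supplier;

// Hypothetical illustration only: not Hudi's log block implementation.
final class LazyBlockReadSketch {

    /** A log block whose content is loaded either up front or on first access. */
    static final class Block {
        private final Supplier<byte[]> loader;
        private byte[] content; // stays null until the block has been read

        Block(Supplier<byte[]> loader, boolean lazy) {
            this.loader = loader;
            if (!lazy) {
                // Immediate read: memory is held from construction onwards.
                this.content = loader.get();
            }
        }

        boolean isLoaded() {
            return content != null;
        }

        byte[] content() {
            if (content == null) {
                // Lazy read: the load (and its seek) happens only when needed.
                content = loader.get();
            }
            return content;
        }
    }

    public static void main(String[] args) {
        Block lazy = new Block(() -> new byte[16], true);
        Block eager = new Block(() -> new byte[16], false);
        System.out.println("lazy loaded before access:  " + lazy.isLoaded());
        System.out.println("eager loaded before access: " + eager.isLoaded());
        System.out.println("lazy content length: " + lazy.content().length);
    }
}
```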
 

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We have quite a few configs which deliver better performance or usability, 
> but guarded by flags. 
>  This is to identify them, change them, test (functionally, perf) and make 
> them default
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379615#comment-17379615
 ] 

Vinoth Chandar commented on HUDI-2151:
--

Switch to default payload?

 
{code:java}
public static final ConfigProperty<String> PAYLOAD_CLASS_PROP = ConfigProperty
 .key("hoodie.compaction.payload.class")
 .defaultValue(OverwriteWithLatestAvroPayload.class.getName())
 .withDocumentation("This needs to be same as class used during insert/upserts. Just like writing, compaction also uses "
 + "the record payload class to merge records in the log against each other, merge again with the base file and "
 + "produce the final record to be written after compaction.");{code}

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We have quite a few configs which deliver better performance or usability, 
> but guarded by flags. 
>  This is to identify them, change them, test (functionally, perf) and make 
> them default
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-commenter commented on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException

2021-07-12 Thread GitBox


codecov-commenter commented on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878802290


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#3261](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (afe140f) into 
[master](https://codecov.io/gh/apache/hudi/commit/c8a2033c275e21a752893fc89311e1f6846f5a78?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 (c8a2033) will **increase** coverage by `3.42%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/3261/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
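   @@             Coverage Diff              @@
   ##             master    #3261      +/-   ##
   ============================================
   + Coverage     47.71%   51.13%    +3.42%     
   + Complexity     5526      417     -5109     
   ============================================
     Files           934       67      -867     
     Lines         41456     3049    -38407     
     Branches       4167      330     -3837     
   ============================================
   - Hits          19779     1559    -18220     
   + Misses        19917     1350    -18567     
   + Partials       1760      140     -1620     
   ```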
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `?` | |
   | hudiclient | `?` | |
   | hudicommon | `?` | |
   | hudiflink | `?` | |
   | hudihadoopmr | `?` | |
   | hudisparkdatasource | `?` | |
   | hudisync | `?` | |
   | huditimelineservice | `?` | |
   | hudiutilities | `51.13% <ø> (-8.11%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
   | 
[...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.j

[jira] [Commented] (HUDI-2150) Rename/Restructure configs for better modularity

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379612#comment-17379612
 ] 

Vinoth Chandar commented on HUDI-2150:
--

This should be renamed to be consistent with base file terminology

 
{code:java}
public static final ConfigProperty<String> PARQUET_SMALL_FILE_LIMIT_BYTES = ConfigProperty
 .key("hoodie.parquet.small.file.limit")
 .defaultValue(String.valueOf(104857600))
 .withDocumentation("Upsert uses this file size to compact new data onto existing files. "
 + "By default, treat any file <= 100MB as a small file.");{code}
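
For reference, the default of 104857600 is 100 * 1024 * 1024 bytes. A minimal sketch of the threshold check and of how many more records could be packed into a small file follows. `SmallFileLimitSketch`, `isSmallFile`, `recordsThatFit` and the target file size parameter are hypothetical names and assumptions for illustration, not Hudi's code.

```java
// Hypothetical illustration only: not taken from Hudi.
final class SmallFileLimitSketch {

    /** Default of hoodie.parquet.small.file.limit: 100 MB expressed in bytes. */
    static final long SMALL_FILE_LIMIT_BYTES = 100L * 1024 * 1024;

    /** A file at or below the limit counts as small, matching the "<= 100MB" wording. */
    static boolean isSmallFile(long fileSizeBytes) {
        return fileSizeBytes <= SMALL_FILE_LIMIT_BYTES;
    }

    /**
     * Number of additional records that fit before the file reaches the
     * assumed target size, given an average record size in bytes.
     */
    static long recordsThatFit(long fileSizeBytes, long targetFileSizeBytes, long avgRecordSizeBytes) {
        if (avgRecordSizeBytes <= 0) {
            throw new IllegalArgumentException("average record size must be positive");
        }
        long remaining = Math.max(0L, targetFileSizeBytes - fileSizeBytes);
        return remaining / avgRecordSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 40L * 1024 * 1024;   // a 40 MB base file
        long targetSize = 120L * 1024 * 1024; // assumed target file size
        System.out.println(isSmallFile(fileSize));
        System.out.println(recordsThatFit(fileSize, targetSize, 1024L));
    }
}
```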

> Rename/Restructure configs for better modularity
> 
>
> Key: HUDI-2150
> URL: https://issues.apache.org/jira/browse/HUDI-2150
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Given we have a framework now, that can capture configs and even their 
> alternatives well, time to clean things up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-2150) Rename/Restructure configs for better modularity

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379561#comment-17379561
 ] 

Vinoth Chandar edited comment on HUDI-2150 at 7/13/21, 5:57 AM:


Cleaner related configs to be moved out of HoodieCompactionConfig into its own 
HoodieCleanConfig. 

 

Archival related configs to be moved out of HoodieCompactionConfig into its own 
HoodieArchivalConfig.


was (Author: vc):
Cleaner related configs to be moved out of HoodieCompactionConfig into its own 
HoodieCleanConfig

> Rename/Restructure configs for better modularity
> 
>
> Key: HUDI-2150
> URL: https://issues.apache.org/jira/browse/HUDI-2150
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Given we have a framework now, that can capture configs and even their 
> alternatives well, time to clean things up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379611#comment-17379611
 ] 

ASF GitHub Bot commented on HUDI-2168:
--

hudi-bot edited a comment on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938


   
   ## CI report:
   
   * e8e5e310224eee469a19bcfe7af537154843c318 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Trivial
>  Labels: pull-request-available
>
> Users are facing the following exception while executing test case dependent 
> on starting Hive service
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This is specifically happening at the time of clearing Hive DB
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938


   
   ## CI report:
   
   * e8e5e310224eee469a19bcfe7af537154843c318 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1548) Fix documentation around schema evolution

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379610#comment-17379610
 ] 

ASF GitHub Bot commented on HUDI-1548:
--

codope commented on pull request #3257:
URL: https://github.com/apache/hudi/pull/3257#issuecomment-878800281


   @vinothchandar @n3nash @nsivabalan Can you please review the doc? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fix documentation around schema evolution 
> --
>
> Key: HUDI-1548
> URL: https://issues.apache.org/jira/browse/HUDI-1548
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: Nishith Agarwal
>Priority: Blocker
>  Labels: ', pull-request-available, sev:high, user-support-issues
> Fix For: 0.9.0
>
>
> Clearly call out what kind of schema evolution is supported by Hudi in the 
> documentation.
> Context: https://github.com/apache/hudi/issues/2331



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codope commented on pull request #3257: [HUDI-1548] Add documentation for schema evolution

2021-07-12 Thread GitBox


codope commented on pull request #3257:
URL: https://github.com/apache/hudi/pull/3257#issuecomment-878800281


   @vinothchandar @n3nash @nsivabalan Can you please review the doc? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379609#comment-17379609
 ] 

ASF GitHub Bot commented on HUDI-2168:
--

hudi-bot commented on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938


   
   ## CI report:
   
   * e8e5e310224eee469a19bcfe7af537154843c318 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
>Reporter: Vinay
>Assignee: Vinay
>Priority: Trivial
>  Labels: pull-request-available
>
> Users are facing the following exception while executing test case dependent 
> on starting Hive service
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This is specifically happening at the time of clearing Hive DB
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379608#comment-17379608
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute one, through the --schedule or --instant-time 
> config.
> To trigger a clustering job, a user has to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the 
> --schedule config.
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute the created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied from the log file manually, so the process 
> can't be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m for short 
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, which means 
> --instant-time is needed here. This is the default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and 
> execute it at once using HoodieClusteringJob.
>  
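
The three --mode values in the table above can be sketched as a small dispatch. This is only an illustration of the described behaviour under assumed names: `ClusteringModeSketch`, `Mode`, `parse` and `plan` are hypothetical and are not the code in the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not the HoodieClusteringJob implementation.
final class ClusteringModeSketch {

    enum Mode { EXECUTE, SCHEDULE, SCHEDULE_AND_EXECUTE }

    /** Maps the --mode argument to a Mode, ignoring case. */
    static Mode parse(String value) {
        switch (value.toLowerCase(Locale.ROOT)) {
            case "execute":
                return Mode.EXECUTE;
            case "schedule":
                return Mode.SCHEDULE;
            case "scheduleandexecute":
                return Mode.SCHEDULE_AND_EXECUTE;
            default:
                throw new IllegalArgumentException("Unknown --mode: " + value);
        }
    }

    /** Returns the ordered steps a job would run for the given mode. */
    static List<String> plan(Mode mode, String instantTime) {
        List<String> steps = new ArrayList<>();
        switch (mode) {
            case SCHEDULE:
                steps.add("schedule");
                break;
            case EXECUTE:
                // Executing an existing plan needs the instant it was scheduled at.
                if (instantTime == null) {
                    throw new IllegalArgumentException("--instant-time is required for execute mode");
                }
                steps.add("execute");
                break;
            case SCHEDULE_AND_EXECUTE:
                // The instant produced by scheduling is handed straight to execution,
                // so nothing has to be copied from the logs by hand.
                steps.add("schedule");
                steps.add("execute");
                break;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(parse("scheduleAndExecute"), null));
        System.out.println(plan(parse("schedule"), null));
    }
}
```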



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot commented on pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user

2021-07-12 Thread GitBox


hudi-bot commented on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938


   
   ## CI report:
   
   * e8e5e310224eee469a19bcfe7af537154843c318 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379607#comment-17379607
 ] 

ASF GitHub Bot commented on HUDI-2168:
--

veenaypatil opened a new pull request #3264:
URL: https://github.com/apache/hudi/pull/3264


   ## What is the purpose of the pull request
   
   Fix the access control exception thrown while running test cases that involve 
starting the Hive service
   
   ## Brief change log
   
   Set config 
   ```
   config.setBoolean("dfs.permissions",false);
   ```
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   - Verified the tests are running locally after this change
   
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
> Reporter: Vinay
> Assignee: Vinay
> Priority: Trivial
>
> Users hit the following exception when running test cases that depend on
> starting the Hive service:
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This happens specifically when the Hive database is cleared:
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2168:
-
Labels: pull-request-available  (was: )

> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
> Reporter: Vinay
> Assignee: Vinay
> Priority: Trivial
>  Labels: pull-request-available
>
> Users hit the following exception when running test cases that depend on
> starting the Hive service:
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This happens specifically when the Hive database is cleared:
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}





[GitHub] [hudi] veenaypatil opened a new pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user

2021-07-12 Thread GitBox


veenaypatil opened a new pull request #3264:
URL: https://github.com/apache/hudi/pull/3264


   ## What is the purpose of the pull request
   
   To fix the access control exception raised while running test cases that involve
starting the Hive service.
   
   ## Brief change log
   
   Set the following config to disable HDFS permission checking:
   ```
   config.setBoolean("dfs.permissions",false);
   ```
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   - Verified the tests are running locally after this change
   
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379600#comment-17379600
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>  Labels: pull-request-available
>
> The objective is to disable all meta columns in order to avoid their storage
> cost. Write latency on the row writer path may also improve, since no special
> handling is required at the RowCreateHandle layer.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379599#comment-17379599
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>  Labels: pull-request-available
>
> The objective is to disable all meta columns in order to avoid their storage
> cost. Write latency on the row writer path may also improve, since no special
> handling is required at the RowCreateHandle layer.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379596#comment-17379596
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>  Labels: pull-request-available
>
> The objective is to disable all meta columns in order to avoid their storage
> cost. Write latency on the row writer path may also improve, since no special
> handling is required at the RowCreateHandle layer.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379595#comment-17379595
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>  Labels: pull-request-available
>
> The objective is to disable all meta columns in order to avoid their storage
> cost. Write latency on the row writer path may also improve, since no special
> handling is required at the RowCreateHandle layer.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   * e56bac615f087cec7817b846809c9f8fd0cc20a5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379593#comment-17379593
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> BucketAssignFunction NullPointerException
> -
>
> Key: HUDI-2153
> URL: https://issues.apache.org/jira/browse/HUDI-2153
> Project: Apache Hudi
>  Issue Type: Bug
> Reporter: moran
> Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> java.lang.NullPointerException
>   at org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198)
>   at org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159)
>   at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
>   at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
>   at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
>   at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
>   at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
>   at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
>   at java.lang.Thread.run(Thread.java:748)
> The error is raised at line 197 of the BucketAssignFunction class
> (this.context.setCurrentKey(recordKey)).
> Why is this context null?





[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379586#comment-17379586
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
> Reporter: Yue Zhang
> Priority: Major
>  Labels: pull-request-available
>
> Currently, Hudi lets users submit a HoodieClusteringJob either to build a
> clustering plan (--schedule) or to execute an existing clustering plan
> (--instant-time).
> To trigger a clustering job, a user has to:
>  # Submit a HoodieClusteringJob with the --schedule config to build a
> clustering plan.
>  # Copy the instant time of the created clustering plan from the log output.
>  # Submit the HoodieClusteringJob again with the --instant-time config to
> execute the created clustering plan.
> The pain point is that triggering a clustering takes too many steps, and the
> instant time has to be copied from the log file by hand, so the process
> cannot be automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute the clustering plan at the given instant, so --instant-time
> is required. This is the default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and
> execute it in one go using HoodieClusteringJob.
>  
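
The table above can be read as a small dispatch on the mode value. The following is a minimal sketch of that idea, not the actual HoodieClusteringJob implementation: the class `ClusteringModeSketch`, the enum constants, and the step strings are hypothetical and chosen only for illustration. It assumes that `execute` is the default when no mode is given and that `execute` requires an instant time, as the table states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a --mode dispatch; not the real HoodieClusteringJob code. */
final class ClusteringModeSketch {

    enum Mode { EXECUTE, SCHEDULE, SCHEDULE_AND_EXECUTE }

    /** Maps the --mode string to a Mode; a missing value falls back to EXECUTE. */
    static Mode parse(String value) {
        if (value == null) {
            return Mode.EXECUTE;
        }
        switch (value) {
            case "execute":
                return Mode.EXECUTE;
            case "schedule":
                return Mode.SCHEDULE;
            case "scheduleAndExecute":
                return Mode.SCHEDULE_AND_EXECUTE;
            default:
                throw new IllegalArgumentException("Unknown --mode: " + value);
        }
    }

    /** Lists the steps a run would perform, in order, for the given mode. */
    static List<String> steps(Mode mode, String instantTime) {
        List<String> result = new ArrayList<>();
        if (mode == Mode.EXECUTE) {
            if (instantTime == null) {
                throw new IllegalArgumentException("--instant-time is required in execute mode");
            }
            result.add("execute plan at " + instantTime);
        } else {
            result.add("schedule plan");
            if (mode == Mode.SCHEDULE_AND_EXECUTE) {
                // The instant comes from the plan just scheduled, so nothing is copied by hand.
                result.add("execute plan just scheduled");
            }
        }
        return result;
    }
}
```

In this sketch, `scheduleAndExecute` yields both steps in a single run, which is the point of the proposal: the instant time no longer has to be carried between two job submissions.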





[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread Vinay (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HUDI-2168:

Status: In Progress  (was: Open)

> AccessControlException for anonymous user
> -
>
> Key: HUDI-2168
> URL: https://issues.apache.org/jira/browse/HUDI-2168
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Testing
> Reporter: Vinay
> Assignee: Vinay
> Priority: Trivial
>
> Users hit the following exception when running test cases that depend on
> starting the Hive service:
>  
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission 
> denied: user=anonymous, access=WRITE
> {code}
> This happens specifically when the Hive database is cleared:
> {code:java}
> client.updateHiveSQL("drop database if exists " + 
> hiveSyncConfig.databaseName);
> {code}





[jira] [Created] (HUDI-2168) AccessControlException for anonymous user

2021-07-12 Thread Vinay (Jira)
Vinay created HUDI-2168:
---

 Summary: AccessControlException for anonymous user
 Key: HUDI-2168
 URL: https://issues.apache.org/jira/browse/HUDI-2168
 Project: Apache Hudi
  Issue Type: Task
  Components: Testing
Reporter: Vinay
Assignee: Vinay


Users hit the following exception when running test cases that depend on
starting the Hive service:

{code:java}
Got exception: org.apache.hadoop.security.AccessControlException Permission 
denied: user=anonymous, access=WRITE
{code}
This happens specifically when the Hive database is cleared:
{code:java}
client.updateHiveSQL("drop database if exists " + hiveSyncConfig.databaseName);
{code}





[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379584#comment-17379584
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
> Reporter: Yue Zhang
> Priority: Major
>  Labels: pull-request-available
>
> Currently, Hudi lets users submit a HoodieClusteringJob either to build a
> clustering plan (--schedule) or to execute an existing clustering plan
> (--instant-time).
> To trigger a clustering job, a user has to:
>  # Submit a HoodieClusteringJob with the --schedule config to build a
> clustering plan.
>  # Copy the instant time of the created clustering plan from the log output.
>  # Submit the HoodieClusteringJob again with the --instant-time config to
> execute the created clustering plan.
> The pain point is that triggering a clustering takes too many steps, and the
> instant time has to be copied from the log file by hand, so the process
> cannot be automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute the clustering plan at the given instant, so --instant-time
> is required. This is the default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and
> execute it in one go using HoodieClusteringJob.
>  





[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-07-12 Thread Vinoth Govindarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379583#comment-17379583
 ] 

Vinoth Govindarajan commented on HUDI-1985:
---

Hi [~xushiyan],
I have past experience building websites and can volunteer to work on
this re-design.

 

> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
> Reporter: Raymond Xu
> Priority: Blocker
>  Labels: documentation
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (log in to Figma to comment)
> The design is ready for implementation.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379580#comment-17379580
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Add support to disable meta column to BulkInsert Row Writer path
> 
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
>  Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>  Labels: pull-request-available
>
> The objective is to disable all meta columns in order to avoid their storage
> cost. Write latency on the row writer path may also improve, since no special
> handling is required at the RowCreateHandle layer.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379577#comment-17379577
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   
   


> BucketAssignFunction NullPointerException
> -----------------------------------------
>
> Key: HUDI-2153
> URL: https://issues.apache.org/jira/browse/HUDI-2153
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: moran
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> java.lang.NullPointerException
>   at org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198)
>   at org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159)
>   at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
>   at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
>   at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
>   at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
>   at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
>   at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
>   at java.lang.Thread.run(Thread.java:748)
>
> The error is at line 197 of the BucketAssignFunction class
> (this.context.setCurrentKey(recordKey)).
> Why is this context null?





[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379576#comment-17379576
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

danny0405 commented on a change in pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#discussion_r668418303



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java
##
@@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception {
 .transform(
 "bucket_assigner",
 TypeInformation.of(HoodieRecord.class),
-new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+new BucketAssignOperator<>(new BucketAssignFunction<>(conf)))
 .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS))

Review comment:
   Nice catch, can we fix the indentation? There is also another PR with the 
same change; can we close that one?
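
One plausible reading of the change under review is that the function relies on a context object that only the wrapping operator can hand to it, so a generic operator leaves that field null. The sketch below illustrates the pattern in plain Java without Flink. Every type in it (`KeyContext`, `AssignFunctionSketch`, `PlainOperatorSketch`, `InjectingOperatorSketch`) is hypothetical and is not the Hudi or Flink API, and it is an assumption that `BucketAssignOperator` works this way.

```java
// Hypothetical context that lets a function switch the key it works on.
interface KeyContext {
  void setCurrentKey(String key);

  String getCurrentKey();
}

// Hypothetical function: its context stays null until an operator injects one.
final class AssignFunctionSketch {
  private KeyContext context;

  void setContext(KeyContext context) {
    this.context = context;
  }

  String processRecord(String recordKey) {
    // Throws NullPointerException when no context was injected.
    context.setCurrentKey(recordKey);
    return "assigned:" + context.getCurrentKey();
  }
}

// Generic operator: forwards elements but never injects a context.
final class PlainOperatorSketch {
  private final AssignFunctionSketch function;

  PlainOperatorSketch(AssignFunctionSketch function) {
    this.function = function;
  }

  void open() {
    // Nothing is injected here, so the function's context remains null.
  }

  String processElement(String recordKey) {
    return function.processRecord(recordKey);
  }
}

// Specialised operator: injects a context into the function when opened.
final class InjectingOperatorSketch {
  private final AssignFunctionSketch function;

  InjectingOperatorSketch(AssignFunctionSketch function) {
    this.function = function;
  }

  void open() {
    function.setContext(new KeyContext() {
      private String currentKey;

      @Override
      public void setCurrentKey(String key) {
        this.currentKey = key;
      }

      @Override
      public String getCurrentKey() {
        return currentKey;
      }
    });
  }

  String processElement(String recordKey) {
    return function.processRecord(recordKey);
  }
}
```

With the plain operator, `processElement` fails with a NullPointerException at the `setCurrentKey` call, which resembles the reported trace. With the injecting operator, the same call succeeds once `open()` has run.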








[GitHub] [hudi] danny0405 commented on a change in pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


danny0405 commented on a change in pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#discussion_r668418303



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java
##
@@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception {
 .transform(
 "bucket_assigner",
 TypeInformation.of(HoodieRecord.class),
-new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+new BucketAssignOperator<>(new BucketAssignFunction<>(conf)))
 .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS))

Review comment:
   Nice catch, can we fix the indentation? There is also another PR with the 
same change; can we close that one?








[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379575#comment-17379575
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot commented on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


hudi-bot commented on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248


   
   ## CI report:
   
   * f1299ed52dcf90635d4f11fef040255cfda9f35b UNKNOWN
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379572#comment-17379572
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

moranyuwen opened a new pull request #3263:
URL: https://github.com/apache/hudi/pull/3263


   JIRA Issue: https://issues.apache.org/jira/browse/HUDI-2153
   
   When you run HoodieFlinkStreamer to write data, the context in the 
bucket assign function is null when the function is loaded. This update 
resolves the null context.






[GitHub] [hudi] moranyuwen opened a new pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException

2021-07-12 Thread GitBox


moranyuwen opened a new pull request #3263:
URL: https://github.com/apache/hudi/pull/3263


   JIRA Issue: https://issues.apache.org/jira/browse/HUDI-2153
   
   When you run HoodieFlinkStreamer to write data, the context in the 
bucket assign function is null when the function is loaded. This update 
resolves the null context.






[hudi] branch master updated: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config (#3250)

2021-07-12 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b0089b8  [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config (#3250)
b0089b8 is described below

commit b0089b894ad12da11fbd6a0fb08508c7adee68e6
Author: Sagar Sumit 
AuthorDate: Tue Jul 13 09:54:40 2021 +0530

[MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config (#3250)
---
 .../java/org/apache/hudi/config/HoodieWriteConfig.java |  3 ++-
 .../java/org/apache/hudi/config/TestHoodieWriteConfig.java | 14 --
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 20d2846..e2e295d 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -339,8 +339,9 @@ public class HoodieWriteConfig extends HoodieConfig {
       .withDocumentation("");
 
   public static final ConfigProperty EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = ConfigProperty
-      .key(AVRO_SCHEMA + ".externalTransformation")
+      .key(AVRO_SCHEMA.key() + ".external.transformation")
       .defaultValue("false")
+      .withAlternatives(AVRO_SCHEMA.key() + ".externalTransformation")
       .withDocumentation("");
 
   private ConsistencyGuardConfig consistencyGuardConfig;
diff --git a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
index 7661e1d..89f7a97 100644
--- a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
+++ b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
@@ -23,6 +23,8 @@ import org.apache.hudi.config.HoodieWriteConfig.Builder;
 
 import org.apache.hudi.index.HoodieIndex;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
 
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
@@ -33,16 +35,23 @@ import java.util.Map;
 import java.util.Properties;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
 
 public class TestHoodieWriteConfig {
 
-  @Test
-  public void testPropertyLoading() throws IOException {
+  @ParameterizedTest
+  @ValueSource(booleans = {true, false})
+  public void testPropertyLoading(boolean withAlternative) throws IOException {
     Builder builder = HoodieWriteConfig.newBuilder().withPath("/tmp");
     Map params = new HashMap<>(3);
     params.put(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP.key(), "1");
     params.put(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP_PROP.key(), "5");
     params.put(HoodieCompactionConfig.MIN_COMMITS_TO_KEEP_PROP.key(), "2");
+    if (withAlternative) {
+      params.put("hoodie.avro.schema.externalTransformation", "true");
+    } else {
+      params.put("hoodie.avro.schema.external.transformation", "true");
+    }
     ByteArrayOutputStream outStream = saveParamsIntoOutputStream(params);
     ByteArrayInputStream inputStream = new ByteArrayInputStream(outStream.toByteArray());
     try {
@@ -54,6 +63,7 @@ public class TestHoodieWriteConfig {
     HoodieWriteConfig config = builder.build();
     assertEquals(5, config.getMaxCommitsToKeep());
     assertEquals(2, config.getMinCommitsToKeep());
+    assertTrue(config.shouldUseExternalSchemaTransformation());
   }
 
   @Test
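
The `withAlternatives` change in this commit keeps the old `externalTransformation` key readable while the dotted key becomes the primary one. A minimal sketch of such a lookup follows, assuming the primary key wins, then each alternative in order, then the default. `ConfigKeySketch` and `resolve` are hypothetical names for illustration only; this is not Hudi's `ConfigProperty` implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a config property with a primary key and
// older alternative keys; this is NOT Hudi's ConfigProperty class.
final class ConfigKeySketch {
  private final String key;
  private final String defaultValue;
  private final List<String> alternatives;

  ConfigKeySketch(String key, String defaultValue, String... alternatives) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.alternatives = Collections.unmodifiableList(Arrays.asList(alternatives));
  }

  // Assumed lookup order: primary key, then each alternative, then the default.
  String resolve(Properties props) {
    String value = props.getProperty(key);
    if (value != null) {
      return value;
    }
    for (String alternative : alternatives) {
      value = props.getProperty(alternative);
      if (value != null) {
        return value;
      }
    }
    return defaultValue;
  }
}
```

Under these assumptions, a `Properties` object that sets only `hoodie.avro.schema.externalTransformation` to `"true"` still resolves to `"true"`, which is the behaviour the parameterized test in the commit exercises for both key spellings.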


[GitHub] [hudi] nsivabalan merged pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config

2021-07-12 Thread GitBox


nsivabalan merged pull request #3250:
URL: https://github.com/apache/hudi/pull/3250


   






[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379567#comment-17379567
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872)
 
   
   




[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379566#comment-17379566
 ] 

ASF GitHub Bot commented on HUDI-2161:
--

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   * 8a212fd77769cbf7e248e971f66109381ba80f71 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931


   
   ## CI report:
   
   * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868)
 
   * 8a212fd77769cbf7e248e971f66109381ba80f71 UNKNOWN
   
   




[jira] [Updated] (HUDI-2150) Rename/Restructure configs for better modularity

2021-07-12 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2150:
-
Description: Given we have a framework now, that can capture configs and 
even their alternatives well, time to clean things up.  (was: * Rename 
HoodieWriteConfig to HoodieClientConfig 
 * Move bunch of configs from  CompactionConfig to StorageConfig 
 * Introduce new HoodieCleanConfig
 * Should we consider lombok or something to automate the 
defaults/getters/setters
 * Consistent name of properties/defaults 
 * Enforce bounds more strictly)

> Rename/Restructure configs for better modularity
> 
>
> Key: HUDI-2150
> URL: https://issues.apache.org/jira/browse/HUDI-2150
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> Given we have a framework now, that can capture configs and even their 
> alternatives well, time to clean things up.





[jira] [Commented] (HUDI-2150) Rename/Restructure configs for better modularity

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379561#comment-17379561
 ] 

Vinoth Chandar commented on HUDI-2150:
--

Cleaner-related configs are to be moved out of HoodieCompactionConfig into their 
own HoodieCleanConfig.



[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379557#comment-17379557
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128


   
   ## CI report:
   
   * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128


   
   ## CI report:
   
   * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871)
 
   
   




[jira] [Created] (HUDI-2167) HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException

2021-07-12 Thread tsianglei (Jira)
tsianglei created HUDI-2167:
---

 Summary: HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException
 Key: HUDI-2167
 URL: https://issues.apache.org/jira/browse/HUDI-2167
 Project: Apache Hudi
  Issue Type: Bug
  Components: CLI, Flink Integration
Reporter: tsianglei


Caused by: java.lang.NullPointerException: Name is null
 at java.lang.Enum.valueOf(Enum.java:236) ~[?:1.8.0_221]
 at org.apache.hudi.common.model.HoodieCleaningPolicy.valueOf(HoodieCleaningPolicy.java:24) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
 at org.apache.hudi.config.HoodieCompactionConfig$Builder.build(HoodieCompactionConfig.java:368) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
 at org.apache.hudi.util.StreamerUtil.getHoodieClientConfig(StreamerUtil.java:155) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
 at org.apache.hudi.util.StreamerUtil.createWriteClient(StreamerUtil.java:277) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
 at org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:154) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
 at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:189) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
 at org.apache.flink.runtime.scheduler.SchedulerBase.startAllOperatorCoordinators(SchedulerBase.java:1253) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
 at org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:624) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
 at org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:1032) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
 at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:705) ~[?:1.8.0_221]
 ... 27 more
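
The "Name is null" message at the top of the trace comes from java.lang.Enum.valueOf, which rejects a null name. A plausible reading, offered here only as an assumption, is that the cleaning policy string was never set before HoodieCompactionConfig$Builder.build called HoodieCleaningPolicy.valueOf. The sketch below reproduces that failure mode with a stand-in enum and shows one null-safe alternative. The class CleaningPolicyLookupSketch, the enum CleaningPolicy, and the property key example.cleaner.policy are hypothetical and are not taken from Hudi.

```java
import java.util.Properties;

public class CleaningPolicyLookupSketch {

    // Stand-in for HoodieCleaningPolicy; hypothetical, for illustration only.
    public enum CleaningPolicy { KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS }

    // Hypothetical property key; not a real Hudi config name.
    public static final String POLICY_KEY = "example.cleaner.policy";

    // Mirrors the failing pattern: getProperty returns null for an unset key,
    // and the enum's valueOf throws NullPointerException for a null name.
    public static CleaningPolicy unsafeLookup(Properties props) {
        return CleaningPolicy.valueOf(props.getProperty(POLICY_KEY));
    }

    // Null-safe variant: use the caller-supplied fallback when the key is unset.
    public static CleaningPolicy safeLookup(Properties props, CleaningPolicy fallback) {
        String name = props.getProperty(POLICY_KEY);
        return name == null ? fallback : CleaningPolicy.valueOf(name);
    }

    public static void main(String[] args) {
        Properties props = new Properties(); // the policy key is deliberately left unset
        try {
            unsafeLookup(props);
        } catch (NullPointerException e) {
            System.out.println("unsafe lookup failed: " + e.getMessage());
        }
        System.out.println("safe lookup: " + safeLookup(props, CleaningPolicy.KEEP_LATEST_COMMITS));
    }
}
```

Whether the actual fix should supply a default or fail earlier with a clearer message is a design choice for the Hudi maintainers; the sketch only illustrates why a missing value surfaces as this particular exception.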





[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379546#comment-17379546
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   
   


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Hudi lets users submit a HoodieClusteringJob either to build a 
> clustering plan (via the --schedule config) or to execute an existing 
> clustering plan (via the --instant-time config).
> To trigger a clustering job, a user has to:
>  # Submit a HoodieClusteringJob with the --schedule config to build a 
> clustering plan.
>  # Copy the instant time of the created clustering plan from the log output.
>  # Submit the HoodieClusteringJob again with the --instant-time config to 
> execute the created clustering plan.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied from the log file by hand, so the process 
> cannot be automated.
>  
> I have just raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means 
> --instant-time is required. This is the default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can pass --mode scheduleAndExecute to build a clustering plan and 
> execute it in one go using HoodieClusteringJob.
>  
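
The three --mode values in the table above can be sketched as a simple dispatch. This is a minimal illustration under stated assumptions, not the code from the pull request: the class ClusteringModeSketch, its schedule and execute helpers, the placeholder instant time, and the exact-match handling of the mode string are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusteringModeSketch {

    // Hypothetical stand-in for building a clustering plan; returns its instant time.
    static String schedule(List<String> steps) {
        String instantTime = "20210712000000"; // placeholder value, not a real instant
        steps.add("schedule " + instantTime);
        return instantTime;
    }

    // Hypothetical stand-in for executing the plan created at the given instant.
    static void execute(String instantTime, List<String> steps) {
        steps.add("execute " + instantTime);
    }

    // Dispatches on the mode and returns the steps performed, in order.
    public static List<String> run(String mode, String instantTime) {
        List<String> steps = new ArrayList<>();
        switch (mode) {
            case "schedule":
                schedule(steps);
                break;
            case "execute":
                if (instantTime == null) {
                    throw new IllegalArgumentException("--instant-time is required in execute mode");
                }
                execute(instantTime, steps);
                break;
            case "scheduleAndExecute":
                // The instant time flows straight from scheduling to execution,
                // so nothing has to be copied from the logs by hand.
                execute(schedule(steps), steps);
                break;
            default:
                throw new IllegalArgumentException("Unsupported mode: " + mode);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(run("scheduleAndExecute", null));
    }
}
```

The point of the combined mode is visible in the third case: the value returned by the scheduling step is passed directly to the execution step, which removes the manual copy-and-paste described in the issue.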





[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379543#comment-17379543
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128


   
   ## CI report:
   
   * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871)
 
   
   


[GitHub] [hudi] hudi-bot edited a comment on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128


   
   ## CI report:
   
   * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871)
 
   
   




[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379542#comment-17379542
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

hudi-bot commented on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128


   
   ## CI report:
   
   * afe140f7b9169e5a6129a10a6a12f839658c7b08 UNKNOWN
   
   


[GitHub] [hudi] hudi-bot commented on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException

2021-07-12 Thread GitBox


hudi-bot commented on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128


   
   ## CI report:
   
   * afe140f7b9169e5a6129a10a6a12f839658c7b08 UNKNOWN
   
   




[GitHub] [hudi] izhangzhihao opened a new issue #3262: [SUPPORT] No successful commits under path

2021-07-12 Thread GitBox


izhangzhihao opened a new issue #3262:
URL: https://github.com/apache/hudi/issues/3262


   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   code https://github.com/izhangzhihao/Real-time-Data-Warehouse/tree/hudi
   
   ###  create table
   
   ```sql
   CREATE TABLE accident_claims
   (
       claim_id            BIGINT,
       claim_total         DOUBLE,
       claim_total_receipt VARCHAR(50),
       claim_currency      VARCHAR(3),
       member_id           INT,
       accident_date       DATE,
       accident_type       VARCHAR(20),
       accident_detail     VARCHAR(20),
       claim_date          DATE,
       claim_status        VARCHAR(10),
       ts_created          TIMESTAMP(3),
       ts_updated          TIMESTAMP(3),
       ds                  DATE,
   PRIMARY KEY (claim_id) NOT ENFORCED
   ) PARTITIONED BY (ds) WITH (
 'connector'='hudi',
 'path' = '/data/dwd/accident_claims',
 'table.type' = 'MERGE_ON_READ',
 'read.streaming.enabled' = 'true',
 'write.batch.size' = '1',
 'write.task.max.size' = '1',
 'write.tasks' = '1',
 'compaction.tasks' = '1',
 'compaction.delta_seconds' = '60',
 'write.precombine.field' = 'ts_updated',
 'read.tasks' = '1',
 'read.streaming.check-interval' = '5',
 'read.streaming.start-commit' = '20210712134429',
   );
   ```
   
   ### insert from CDC change stream
   
   ```sql
   INSERT INTO dwd.accident_claims
   SELECT claim_id,
  claim_total,
  claim_total_receipt,
  claim_currency,
  member_id,
  CAST (accident_date as DATE),
  accident_type,
  accident_detail,
  CAST (claim_date as DATE),
  claim_status,
  CAST (ts_created as TIMESTAMP),
  CAST (ts_updated as TIMESTAMP),
  CAST (SUBSTRING(claim_date, 0, 9) as DATE)
   FROM datasource.accident_claims;
   ```
   
   **Expected behavior**
   
   ```
   SELECT * FROM accident_claims;
   ```
   
   should return results
   
   But got:
   
   ```
   Flink SQL> SELECT * FROM accident_claims;
   [ERROR] Could not execute SQL statement. Reason:
   org.apache.hudi.exception.HoodieException: No successful commits under path /data/dwd/accident_claims
   ```
   
   But the sample code works:
   
   ```
   CREATE TABLE t1(
     uuid VARCHAR(20), -- you can use 'PRIMARY KEY NOT ENFORCED' syntax to mark the field as record key
     name VARCHAR(10),
     age INT,
     ts TIMESTAMP(3),
     `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   WITH (
     'connector' = 'hudi',
     'path' = '/data/t1',
     'write.tasks' = '1', -- default is 4, requires more resources
     'compaction.tasks' = '1', -- default is 10, requires more resources
     'table.type' = 'COPY_ON_WRITE', -- COPY_ON_WRITE is also the default; set MERGE_ON_READ to create a MERGE_ON_READ table
     'read.tasks' = '1', -- default is 4, requires more resources
     'read.streaming.enabled' = 'true',  -- this option enables the streaming read
     'read.streaming.start-commit' = '20210712134429', -- specifies the start commit instant time
     'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s.
   );
   
   -- insert data using values
   INSERT INTO t1 VALUES
 ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
 ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
 ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
 ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
 ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
 ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
 ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
 ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
   
   SELECT * FROM t1;
   ```
   
   So I didn't get what's wrong here...
   
   **Environment Description**
   
   * Hudi version : 0.9.0 SNAPSHOT
   
   * Flink version :  1.12.2
   
   * Hive version : none
   
   * Hadoop version : 2.8.3
   
   * Storage (HDFS/S3/GCS..) : local file system
   
   * Running on Docker? (yes/no) : yes
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   
![image](https://user-images.githubusercontent.com/12044174/125382900-20040c80-e3c9-11eb-8ab6-be9a7c3072f5.png)
   
   Taskmanager log: 
[taskmanager.log.zip](https://github.com/apache/hudi/files/6805564/taskmanager.log.zip)
   
   
   






[jira] [Updated] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2153:
-
Labels: pull-request-available  (was: )



[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379541#comment-17379541
 ] 

ASF GitHub Bot commented on HUDI-2153:
--

moranyuwen opened a new pull request #3261:
URL: https://github.com/apache/hudi/pull/3261


   Running HoodieFlinkStreamer throws a NullPointerException in the 
BucketAssignFunction class because the context is null.
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [hudi] moranyuwen opened a new pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException

2021-07-12 Thread GitBox


moranyuwen opened a new pull request #3261:
URL: https://github.com/apache/hudi/pull/3261






[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379522#comment-17379522
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   
   


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870)
 
   
   




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379520#comment-17379520
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878723946


   @hudi-bot run azure




> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a
> clustering plan or execute a clustering plan through the --schedule or
> --instant-time config.
> If a user wants to trigger a clustering job, they have to:
>  # Submit a HoodieClusteringJob to build a clustering plan through the
> --schedule config.
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute the created clustering
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the
> instant time has to be copied and pasted from the log file manually, so the
> process cannot be automated.
>  
> I have raised a PR that adds a new config named --mode (or -m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means --instant-time is required. This is the default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and
> execute it at once using HoodieClusteringJob.
>  
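
The three modes described in the table can be pictured with a small dispatch sketch. This is only an illustration under stated assumptions, not the code in the PR: the class `ClusteringModeSketch`, its methods, and the placeholder instant time are all hypothetical, and the real job would talk to a Hudi write client instead of appending to a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the --mode dispatch; not the actual HoodieClusteringJob code.
class ClusteringModeSketch {

    // Stand-in for building a clustering plan; returns the instant time of the new plan.
    static String schedule(List<String> actions) {
        String instantTime = "20210712000000"; // placeholder value, assumed for illustration
        actions.add("schedule " + instantTime);
        return instantTime;
    }

    // Stand-in for executing an existing plan; the instant time must be known.
    static void execute(String instantTime, List<String> actions) {
        if (instantTime == null || instantTime.isEmpty()) {
            throw new IllegalArgumentException("--instant-time is required in execute mode");
        }
        actions.add("execute " + instantTime);
    }

    // Runs the requested mode and returns the list of actions that were taken.
    static List<String> run(String mode, String instantTime) {
        List<String> actions = new ArrayList<>();
        switch (mode) {
            case "schedule":
                schedule(actions);
                break;
            case "execute":
                execute(instantTime, actions);
                break;
            case "scheduleAndExecute":
                // The instant time flows from the schedule step to the execute step,
                // so nothing has to be copied from the logs by hand.
                execute(schedule(actions), actions);
                break;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        return actions;
    }
}
```

The point of the combined mode is visible in the `scheduleAndExecute` branch: the value returned by the scheduling step is passed straight to the execution step.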





[GitHub] [hudi] zhangyue19921010 commented on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-12 Thread GitBox


zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878723946


   @hudi-bot run azure






[GitHub] [hudi] codope commented on a change in pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config

2021-07-12 Thread GitBox


codope commented on a change in pull request #3250:
URL: https://github.com/apache/hudi/pull/3250#discussion_r668367569



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
##
@@ -339,7 +339,7 @@
   .withDocumentation("");
 
   public static final ConfigProperty EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = ConfigProperty
-  .key(AVRO_SCHEMA + ".externalTransformation")
+  .key(AVRO_SCHEMA.key() + ".externalTransformation")

Review comment:
   Changed the config key to `hoodie.avro.schema.external.transformation`
and kept `hoodie.avro.schema.externalTransformation` as an alternative key for
backwards compatibility.
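
One way to picture how an alternative key preserves backwards compatibility is the lookup sketch below. It is a hypothetical helper written for illustration, not Hudi's `ConfigProperty` implementation: the class `ConfigKeySketch`, the `resolve` method, and the idea of a caller-supplied default are all assumptions.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not Hudi's ConfigProperty implementation.
class ConfigKeySketch {

    // Returns the value stored under the primary key if present, otherwise the value
    // under the first alternative key that is set, otherwise the supplied default.
    static String resolve(Properties props, String defaultValue, String key, String... alternatives) {
        String value = props.getProperty(key);
        if (value != null) {
            return value;
        }
        for (String alternative : alternatives) {
            value = props.getProperty(alternative);
            if (value != null) {
                return value;
            }
        }
        return defaultValue;
    }
}
```

With this lookup order, a job that still sets only `hoodie.avro.schema.externalTransformation` keeps working, while the new key wins whenever both are set.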








[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379496#comment-17379496
 ] 

Vinoth Chandar commented on HUDI-2151:
--

These need a default value. 

 
{code:java}
public static final ConfigProperty ZK_PORT_PROP = ConfigProperty
 .key(ZK_PORT_PROP_KEY)
 .noDefaultValue()
 .sinceVersion("0.8.0")
 .withDocumentation("Zookeeper port to connect to.");

public static final ConfigProperty ZK_LOCK_KEY_PROP = ConfigProperty
 .key(ZK_LOCK_KEY_PROP_KEY)
 .noDefaultValue()
 .sinceVersion("0.8.0")
 .withDocumentation("Key name under base_path at which to create a ZNode and acquire lock. "
 + "Final path on zk will look like base_path/lock_key. We recommend setting this to the table name");{code}

> Make performant out-of-box configs
> --
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup, Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> We have quite a few configs which deliver better performance or usability,
> but are guarded by flags.
>  This is to identify them, change them, test (functionally, perf) and make 
> them default
>  
> Need to ensure we also capture all the backwards compatibility issues that 
> can arise





[jira] [Issue Comment Deleted] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2151:
-
Comment: was deleted

(was: Is this correct? 5s?

 
{code:java}
public static final ConfigProperty LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP = ConfigProperty
 .key(LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP_KEY)
 .defaultValue(String.valueOf(5000L))
 .sinceVersion("0.8.0")
 .withDocumentation("Maximum amount of time to wait between retries by lock provider client. This bounds" +
 " the maximum delay from the exponential backoff.");{code})



[jira] [Commented] (HUDI-2144) Offline clustering(independent sparkJob) will cause insert action losing data

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379491#comment-17379491
 ] 

ASF GitHub Bot commented on HUDI-2144:
--

satishkotha merged pull request #3240:
URL: https://github.com/apache/hudi/pull/3240


   




> Offline clustering(independent sparkJob) will cause insert action losing data
> -
>
> Key: HUDI-2144
> URL: https://issues.apache.org/jira/browse/HUDI-2144
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-07-08-13-52-00-089.png
>
>
> For now we have two kinds of pipelines for Hudi using Spark:
>  # Streaming insert of data into a specific partition
>  # An offline clustering Spark job
> (`org.apache.hudi.utilities.HoodieClusteringJob`) to optimize the size of the
> files that pipeline 1 created
> But we hit a bug that loses data.
> These steps reproduce the problem reliably:
>  # Submit a Spark job to ingest data1 using insert mode.
>  # Schedule a clustering plan using
> `org.apache.hudi.utilities.HoodieClusteringJob`.
>  # Submit a Spark job again to ingest data2 using insert mode (ensure that a
> new file slice is created in the same file group, which means small file
> tuning for insert is working). Suppose this file group is called file group 1
> and the new file slice is called file slice 2.
>  # Execute the clustering plan scheduled in step 2.
>  # Query data1+data2; you will find new data for a  is lost compared with
> common ingestion without clustering.
>  
>   !image-2021-07-08-13-52-00-089.png|width=922,height=728!
> Here is the root cause:
> When ingesting data using insert mode, Hudi finds small files and tries to
> append new data to them, aiming to tune the data file size.
> [https://github.com/apache/hudi/blob/650c4455c600b0346fed8b5b6aa4cc0bf3452e8c/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L149]
> tries to filter out small files that are in clustering, but it only works
> when the user sets `hoodie.clustering.inline` to true, which is not good
> enough for users of offline clustering.
> I have raised a PR to fix it, and tested it.
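
The root cause above can be illustrated with a simplified filter that always excludes small files belonging to a file group with a pending clustering plan, independent of any clustering flag. This is a sketch under stated assumptions, not the code in the PR: the class `SmallFileFilterSketch` is hypothetical, and small files are represented by plain file group id strings instead of Hudi's own types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the small-file filter; file groups are plain ids here.
class SmallFileFilterSketch {

    // Keeps only the small files whose file group is NOT part of a pending clustering
    // plan, so that inserts never append to a file group that is about to be replaced.
    static List<String> filterSmallFilesInClustering(Set<String> pendingClusteringFileGroupIds,
                                                     List<String> smallFileIds) {
        List<String> safeToAppend = new ArrayList<>();
        for (String fileId : smallFileIds) {
            if (!pendingClusteringFileGroupIds.contains(fileId)) {
                safeToAppend.add(fileId);
            }
        }
        return safeToAppend;
    }
}
```

Because the check depends only on the set of pending file groups, it behaves the same for inline and offline clustering; when nothing is pending, every small file passes through unchanged.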





[hudi] branch master updated (ca440cc -> c8a2033)

2021-07-12 Thread satish
This is an automated email from the ASF dual-hosted git repository.

satish pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from ca440cc  [HUDI-2107] Support Read Log Only MOR Table For Spark (#3193)
 add c8a2033  [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) 
will cause insert action losing data (#3240)

No new revisions were added by this update.

Summary of changes:
 .../table/action/commit/UpsertPartitioner.java |  2 +-
 .../table/action/commit/TestUpsertPartitioner.java | 45 +-
 .../hudi/common/testutils/ClusteringTestUtils.java | 54 ++
 3 files changed, 99 insertions(+), 2 deletions(-)
 create mode 100644 
hudi-common/src/test/java/org/apache/hudi/common/testutils/ClusteringTestUtils.java


[GitHub] [hudi] satishkotha merged pull request #3240: [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data

2021-07-12 Thread GitBox


satishkotha merged pull request #3240:
URL: https://github.com/apache/hudi/pull/3240


   






[jira] [Commented] (HUDI-2144) Offline clustering(independent sparkJob) will cause insert action losing data

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379490#comment-17379490
 ] 

ASF GitHub Bot commented on HUDI-2144:
--

lw309637554 commented on a change in pull request #3240:
URL: https://github.com/apache/hudi/pull/3240#discussion_r668356708



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##
@@ -146,7 +146,7 @@ private int addUpdateBucket(String partitionPath, String fileIdHint) {
    * @return smallFiles not in clustering
    */
   private List<SmallFile> filterSmallFilesInClustering(final Set<String> pendingClusteringFileGroupsId, final List<SmallFile> smallFiles) {
-    if (this.config.isClusteringEnabled()) {

Review comment:
   @satishkotha @zhangyue19921010 
   Using "if (!pendingClusteringFileGroupsId.isEmpty())" would improve ease of
use.
   Another place needs to be modified as well. But will this bring a
performance loss? @satishkotha 
   
   "  private JavaRDD<HoodieRecord<T>> clusteringHandleUpdate(JavaRDD<HoodieRecord<T>> inputRecordsRDD) {
   if (config.isClusteringEnabled()) {"








[GitHub] [hudi] lw309637554 commented on a change in pull request #3240: [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data

2021-07-12 Thread GitBox


lw309637554 commented on a change in pull request #3240:
URL: https://github.com/apache/hudi/pull/3240#discussion_r668356708



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##
@@ -146,7 +146,7 @@ private int addUpdateBucket(String partitionPath, String fileIdHint) {
    * @return smallFiles not in clustering
    */
   private List<SmallFile> filterSmallFilesInClustering(final Set<String> pendingClusteringFileGroupsId, final List<SmallFile> smallFiles) {
-    if (this.config.isClusteringEnabled()) {

Review comment:
   @satishkotha @zhangyue19921010 
   Using "if (!pendingClusteringFileGroupsId.isEmpty())" would improve ease of
use.
   Another place needs to be modified as well. But will this bring a
performance loss? @satishkotha 
   
   "  private JavaRDD<HoodieRecord<T>> clusteringHandleUpdate(JavaRDD<HoodieRecord<T>> inputRecordsRDD) {
   if (config.isClusteringEnabled()) {"








[jira] [Commented] (HUDI-2151) Make performant out-of-box configs

2021-07-12 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379488#comment-17379488
 ] 

Vinoth Chandar commented on HUDI-2151:
--

Is this correct? 5s?

 
{code:java}
public static final ConfigProperty LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP = ConfigProperty
 .key(LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP_KEY)
 .defaultValue(String.valueOf(5000L))
 .sinceVersion("0.8.0")
 .withDocumentation("Maximum amount of time to wait between retries by lock provider client. This bounds" +
 " the maximum delay from the exponential backoff.");{code}
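
To make the question about the 5 s default concrete, here is a sketch of how a maximum wait time bounds an exponential backoff. It rests on assumptions: the class `BackoffSketch`, the doubling factor, and the 1000 ms initial wait used in the usage note are illustrative and not taken from the lock provider code; only the 5000 ms cap comes from the config quoted above.

```java
// Hypothetical sketch of a capped exponential backoff; not the lock provider's code.
class BackoffSketch {

    // Doubles the initial wait once per previous attempt, but never returns more
    // than maxMillis, which is how a "max wait" setting bounds the backoff.
    static long delayMillis(long initialMillis, long maxMillis, int attempt) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }
}
```

Under these assumptions, with an initial wait of 1000 ms and the 5000 ms cap, the waits for attempts 0 through 4 would be 1000, 2000, 4000, 5000 and 5000 ms, so the cap is reached on the fourth attempt.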



[jira] [Commented] (HUDI-2144) Offline clustering(independent sparkJob) will cause insert action losing data

2021-07-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379487#comment-17379487
 ] 

ASF GitHub Bot commented on HUDI-2144:
--

lw309637554 commented on a change in pull request #3240:
URL: https://github.com/apache/hudi/pull/3240#discussion_r668354266



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##
@@ -146,7 +146,7 @@ private int addUpdateBucket(String partitionPath, String fileIdHint) {
    * @return smallFiles not in clustering
    */
   private List<SmallFile> filterSmallFilesInClustering(final Set<String> pendingClusteringFileGroupsId, final List<SmallFile> smallFiles) {
-    if (this.config.isClusteringEnabled()) {

Review comment:
   @satishkotha @zhangyue19921010
   To begin with, we have two configs for clustering. If
ASYNC_CLUSTERING_ENABLE_OPT_KEY is set, it will be OK.
 public boolean isAsyncClusteringEnabled() {
   return Boolean.parseBoolean(props.getProperty(HoodieClusteringConfig.ASYNC_CLUSTERING_ENABLE_OPT_KEY));
 }
   
 public boolean isClusteringEnabled() {
   // TODO: future support async clustering
   return inlineClusteringEnabled() || isAsyncClusteringEnabled();
 }








