[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379662#comment-17379662 ]

ASF GitHub Bot commented on HUDI-2168:
--------------------------------------

hudi-bot edited a comment on pull request #3264:
URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938

## CI report:

* e8e5e310224eee469a19bcfe7af537154843c318 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> AccessControlException for anonymous user
> -----------------------------------------
>
>                 Key: HUDI-2168
>                 URL: https://issues.apache.org/jira/browse/HUDI-2168
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Testing
>            Reporter: Vinay
>            Assignee: Vinay
>            Priority: Trivial
>              Labels: pull-request-available
>
> Users are facing the following exception while executing test cases that depend on starting the Hive service:
>
> {code:java}
> Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE
> {code}
>
> This happens specifically while clearing the Hive DB:
>
> {code:java}
> client.updateHiveSQL("drop database if exists " + hiveSyncConfig.databaseName);
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
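The report above does not spell out the fix, but `user=anonymous` failures in embedded Hive/HDFS test setups typically stem from Hadoop resolving no login user. One common workaround is to pin the Hadoop user for the test JVM before any filesystem access. This is a minimal, hypothetical sketch, not the actual change in PR #3264; the class and method names are illustrative, while `HADOOP_USER_NAME` is Hadoop's standard simple-auth override:

```java
// Hypothetical test-setup sketch: when no Kerberos credentials are present,
// Hadoop's UserGroupInformation consults the HADOOP_USER_NAME system property
// (or environment variable), so pinning it avoids "user=anonymous" writes.
class HiveTestUserSetup {

    // Must run before the first Hadoop FileSystem/UGI call in the test JVM.
    static void configureTestUser(String user) {
        System.setProperty("HADOOP_USER_NAME", user);
    }

    public static void main(String[] args) {
        configureTestUser("test-user");
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
    }
}
```

In a real test suite this would live in a `@BeforeAll` hook so the property is set before the mini Hive/HDFS services start.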
[GitHub] [hudi] hudi-bot edited a comment on pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user
[GitHub] [hudi] moranyuwen commented on pull request #3255: Update HoodieFlinkStreamer.java
moranyuwen commented on pull request #3255:
URL: https://github.com/apache/hudi/pull/3255#issuecomment-878816543

> @moranyuwen Thanks for your contribution. Please follow the official contribution guide: http://hudi.apache.org/contributing to refactor your PR.

Please close this PR.
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379638#comment-17379638 ]

ASF GitHub Bot commented on HUDI-2153:
--------------------------------------

hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248

## CI report:

* f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
* 9bbb4762bed91d2d4cef01c6e3d274c667235e2c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=879)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

> BucketAssignFunction NullPointerException
> -----------------------------------------
>
>                 Key: HUDI-2153
>                 URL: https://issues.apache.org/jira/browse/HUDI-2153
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: moran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> {code:java}
> java.lang.NullPointerException
>     at org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198)
>     at org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159)
>     at org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83)
>     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191)
>     at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204)
>     at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174)
>     at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>
> The error occurs at line 197 of the BucketAssignFunction class
> (this.context.setCurrentKey(recordKey)). Why is this context null?
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379639#comment-17379639 ]

ASF GitHub Bot commented on HUDI-2161:
--------------------------------------

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

> Add support to disable meta column to BulkInsert Row Writer path
> ----------------------------------------------------------------
>
>                 Key: HUDI-2161
>                 URL: https://issues.apache.org/jira/browse/HUDI-2161
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>
> The objective here is to disable all meta columns so as to avoid their storage cost.
> Some benefit could also be seen in write latency on the row-writer path, since no
> special handling is required at the RowCreateHandle layer.
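For context, disabling meta columns on the bulk-insert row-writer path is driven by writer configuration. The sketch below is hypothetical: the `hoodie.populate.meta.fields` key is the one this work appears to introduce, and the other keys are existing Hudi write options, but all of them should be verified against PR #3247 before use:

```properties
# Sketch of a writer configuration for meta-field-free bulk insert.
# Keys are assumptions to verify against the PR, not confirmed settings.
hoodie.datasource.write.operation=bulk_insert
hoodie.datasource.write.row.writer.enable=true
# Skip populating the _hoodie_* meta columns to save storage:
hoodie.populate.meta.fields=false
```

Note that once a table is written without meta fields, readers and later writers must agree on that choice, since the `_hoodie_*` columns are part of the table schema.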
[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379633#comment-17379633 ]

ASF GitHub Bot commented on HUDI-2153:
--------------------------------------

moranyuwen commented on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878814754

Please close this PR.
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379630#comment-17379630 ]

ASF GitHub Bot commented on HUDI-2161:
--------------------------------------

hudi-bot edited a comment on pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931

## CI report:

* e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876)
* caffa0a76af64dddc658d15a1dd3a371f3a8bcda Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=878)

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379632#comment-17379632 ]

ASF GitHub Bot commented on HUDI-2153:
--------------------------------------

hudi-bot edited a comment on pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248

## CI report:

* f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873)
* 9bbb4762bed91d2d4cef01c6e3d274c667235e2c UNKNOWN

Bot commands: @hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] moranyuwen commented on pull request #3261: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379628#comment-17379628 ]

ASF GitHub Bot commented on HUDI-2153:
--------------------------------------

moranyuwen commented on a change in pull request #3263:
URL: https://github.com/apache/hudi/pull/3263#discussion_r668461881

## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java

## @@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception {
         .transform(
             "bucket_assigner",
             TypeInformation.of(HoodieRecord.class),
-            new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+            new BucketAssignOperator<>(new BucketAssignFunction<>(conf)))
         .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS))

Review comment: yes
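The one-line change in the diff above swaps Flink's stock KeyedProcessOperator for a custom BucketAssignOperator, which suggests the operator hands its runtime context to the function before any element is processed, so `this.context` is no longer null at `BucketAssignFunction.java:197`. Below is a framework-free sketch of that injection pattern; all class and method names are illustrative stand-ins, not Hudi's or Flink's actual types:

```java
// Sketch of the fix pattern: the wrapping operator injects its keyed context
// into the function during open(), so processElement() never dereferences a
// null context (the NPE the Jira stack trace reports).
interface KeyContext {
    void setCurrentKey(String key);
}

class RecordingContext implements KeyContext {
    String currentKey;

    @Override
    public void setCurrentKey(String key) {
        this.currentKey = key;
    }
}

class AssignFunction {
    private KeyContext context; // stays null unless the operator injects it

    void setContext(KeyContext context) {
        this.context = context;
    }

    void processElement(String recordKey) {
        // Throws NullPointerException if open() was never called on the operator.
        context.setCurrentKey(recordKey);
    }
}

class AssignOperator {
    private final AssignFunction function;

    AssignOperator(AssignFunction function) {
        this.function = function;
    }

    // Runs once before any element arrives, mirroring an operator lifecycle hook.
    void open(KeyContext context) {
        function.setContext(context);
    }

    void processElement(String recordKey) {
        function.processElement(recordKey);
    }
}
```

The stock operator never exposes its context to the wrapped function, which is why a custom operator is needed here rather than a change inside the function alone.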
[GitHub] [hudi] moranyuwen commented on a change in pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379627#comment-17379627 ]

ASF GitHub Bot commented on HUDI-2153:
--------------------------------------

codecov-commenter edited a comment on pull request #3261:
URL: https://github.com/apache/hudi/pull/3261#issuecomment-878802290

# [Codecov](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=h1) Report

> Merging [#3261](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=desc) (afe140f) into [master](https://codecov.io/gh/apache/hudi/commit/c8a2033c275e21a752893fc89311e1f6846f5a78?el=desc) (c8a2033) will **increase** coverage by `11.45%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3261/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree)

```diff
@@              Coverage Diff              @@
##             master    #3261       +/-   ##
=============================================
+ Coverage     47.71%   59.16%   +11.45%
+ Complexity     5526     1212     -4314
=============================================
  Files           934      169      -765
  Lines         41456     6553    -34903
  Branches       4167      685     -3482
=============================================
- Hits          19779     3877    -15902
+ Misses        19917     2397    -17520
+ Partials       1760      279     -1481
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `66.15% <ø> (+31.70%)` | :arrow_up: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `51.13% <ø> (-8.11%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | :arrow_down: |
| [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | :arrow_down: |
| [...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==) | `0.00% <0.00%> (-71.82%)` | :arrow_down: |
| [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | :arrow_down: |
[GitHub] [hudi] codecov-commenter edited a comment on pull request #3261: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
[jira] [Commented] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379623#comment-17379623 ] Vinoth Chandar commented on HUDI-2151: -- Ensure the 1024 is not blindly used always? {code:java} public static final ConfigProperty COPY_ON_WRITE_TABLE_RECORD_SIZE_ESTIMATE = ConfigProperty .key("hoodie.copyonwrite.record.size.estimate") .defaultValue(String.valueOf(1024)) .withDocumentation("The average record size. If specified, hudi will use this and not compute dynamically " + "based on the last 24 commit’s metadata. No value set as default. This is critical in computing " + "the insert parallelism and bin-packing inserts into small files."); {code} > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup, Docs >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > We have quite a few configs which deliver better performance or usability, > but guarded by flags. > This is to identify them, change them, test (functionally, perf) and make > them default > > Need to ensure we also capture all the backwards compatibility issues that > can arise -- This message was sent by Atlassian Jira (v8.3.4#803005)
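The concern in the comment above — that the 1024-byte default should not be used blindly — can be illustrated with a minimal, self-contained sketch of how an average-record-size estimate drives small-file bin-packing. The class and method names below are illustrative only, not Hudi's actual implementation; only the 100 MB small-file limit and the 1024-byte default come from the quoted configs.

```java
// Hypothetical sketch: how a static record-size estimate feeds the
// bin-packing of inserts into small files. An estimate that is far off
// the real record size over- or under-fills files proportionally.
public class RecordSizeSketch {
    static final long SMALL_FILE_LIMIT_BYTES = 104857600L; // 100 MB default

    // Number of new records that can be packed into a partially filled
    // file, given the configured average-record-size estimate.
    static long recordsThatFit(long currentFileSizeBytes, long avgRecordSizeBytes) {
        long remaining = SMALL_FILE_LIMIT_BYTES - currentFileSizeBytes;
        return remaining <= 0 ? 0 : remaining / avgRecordSizeBytes;
    }

    public static void main(String[] args) {
        // Half-full 100 MB file, blind 1024-byte default estimate:
        System.out.println(recordsThatFit(52428800L, 1024L)); // 51200
        // Same file, but records are actually 4 KB: 4x fewer fit.
        System.out.println(recordsThatFit(52428800L, 4096L)); // 12800
    }
}
```

This is why computing the estimate dynamically from recent commit metadata, rather than always trusting 1024, matters for insert parallelism.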
[jira] [Commented] (HUDI-2103) Add rebalance before index bootstrap
[ https://issues.apache.org/jira/browse/HUDI-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379620#comment-17379620 ] ASF GitHub Bot commented on HUDI-2103: -- SEZ9 commented on pull request #3185: URL: https://github.com/apache/hudi/pull/3185#issuecomment-878808290 Can parallelism be set separately for index_bootstrap? The resources needed during initialization are much larger than those for the subsequent streaming data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add rebalance before index bootstrap > > > Key: HUDI-2103 > URL: https://issues.apache.org/jira/browse/HUDI-2103 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: yuzhaojing >Assignee: yuzhaojing >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > When using Flink SQL upsert to Hudi, users often set parallelism larger than > the Kafka partition num. Now the bootstrap operator needs at least one element to > trigger loading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] SEZ9 commented on pull request #3185: [HUDI-2103] Add rebalance before index bootstrap
SEZ9 commented on pull request #3185: URL: https://github.com/apache/hudi/pull/3185#issuecomment-878808290 Can parallelism be set separately for index_bootstrap? The resources needed during initialization are much larger than those for the subsequent streaming data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379619#comment-17379619 ] Vinoth Chandar commented on HUDI-2151: -- Make this lazy? {code:java} public static final ConfigProperty FAILED_WRITES_CLEANER_POLICY_PROP = ConfigProperty .key("hoodie.cleaner.policy.failed.writes") .defaultValue(HoodieFailedWritesCleaningPolicy.EAGER.name()) .withDocumentation("Cleaning policy for failed writes to be used. Hudi will delete any files written by " + "failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before " + "every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)"); {code} > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup, Docs >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > We have quite a few configs which deliver better performance or usability, > but guarded by flags. > This is to identify them, change them, test (functionally, perf) and make > them default > > Need to ensure we also capture all the backwards compatibility issues that > can arise -- This message was sent by Atlassian Jira (v8.3.4#803005)
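The eager-vs-lazy trade-off in the quoted config docs can be sketched as a simple safety check. This is a hypothetical illustration (the enum and method below are not Hudi's actual classes) of the constraint stated in the documentation: eager rollback is only supported for a single writer, while lazy cleanup by the cleaner is required for multi-writers.

```java
// Hypothetical sketch of the failed-writes cleanup choice described in
// hoodie.cleaner.policy.failed.writes. Names are illustrative only.
public class CleaningPolicySketch {
    enum FailedWritesCleaningPolicy { EAGER, LAZY }

    // Returns true if the policy is safe for the given writer count.
    static boolean isPolicySafe(FailedWritesCleaningPolicy policy, int writerCount) {
        if (policy == FailedWritesCleaningPolicy.EAGER) {
            return writerCount == 1; // eager rollback assumes a single writer
        }
        return true; // lazy cleanup by the cleaner handles multi-writer setups
    }

    public static void main(String[] args) {
        System.out.println(isPolicySafe(FailedWritesCleaningPolicy.EAGER, 1)); // true
        System.out.println(isPolicySafe(FailedWritesCleaningPolicy.EAGER, 2)); // false
        System.out.println(isPolicySafe(FailedWritesCleaningPolicy.LAZY, 2));  // true
    }
}
```

Making LAZY the default, as the comment suggests, is the only choice that stays safe as writer count grows.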
[jira] [Commented] (HUDI-2150) Rename/Restructure configs for better modularity
[ https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379618#comment-17379618 ] Vinoth Chandar commented on HUDI-2150: -- Move to PayloadConfig? {code:java} public static final ConfigProperty PAYLOAD_CLASS_PROP = ConfigProperty .key("hoodie.compaction.payload.class") .defaultValue(OverwriteWithLatestAvroPayload.class.getName()) .withDocumentation("This needs to be same as class used during insert/upserts. Just like writing, compaction also uses " + "the record payload class to merge records in the log against each other, merge again with the base file and " + "produce the final record to be written after compaction."); {code} > Rename/Restructure configs for better modularity > > > Key: HUDI-2150 > URL: https://issues.apache.org/jira/browse/HUDI-2150 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > Given we have a framework now, that can capture configs and even their > alternatives well, time to clean things up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379617#comment-17379617 ] Vinoth Chandar commented on HUDI-2151: -- Why would we even want to do lazyRead=false anymore? {code:java} public static final ConfigProperty COMPACTION_LAZY_BLOCK_READ_ENABLED_PROP = ConfigProperty .key("hoodie.compaction.lazy.block.read") .defaultValue("false") .withDocumentation("When a CompactedLogScanner merges all log files, this config helps to choose whether the log blocks " + "should be read lazily or not. Choose true to use lazy block reading (low memory usage, but incurs seeks to each block" + " header) or false for immediate block read (higher memory usage)"); {code} > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup, Docs >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > We have quite a few configs which deliver better performance or usability, > but guarded by flags. > This is to identify them, change them, test (functionally, perf) and make > them default > > Need to ensure we also capture all the backwards compatibility issues that > can arise -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379615#comment-17379615 ] Vinoth Chandar commented on HUDI-2151: -- Switch to default payload? {code:java} public static final ConfigProperty PAYLOAD_CLASS_PROP = ConfigProperty .key("hoodie.compaction.payload.class") .defaultValue(OverwriteWithLatestAvroPayload.class.getName()) .withDocumentation("This needs to be same as class used during insert/upserts. Just like writing, compaction also uses " + "the record payload class to merge records in the log against each other, merge again with the base file and " + "produce the final record to be written after compaction.");{code} > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup, Docs >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > We have quite a few configs which deliver better performance or usability, > but guarded by flags. > This is to identify them, change them, test (functionally, perf) and make > them default > > Need to ensure we also capture all the backwards compatibility issues that > can arise -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] codecov-commenter commented on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException
codecov-commenter commented on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878802290 # [Codecov](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report > Merging [#3261](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (afe140f) into [master](https://codecov.io/gh/apache/hudi/commit/c8a2033c275e21a752893fc89311e1f6846f5a78?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8a2033) will **increase** coverage by `3.42%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/3261/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) ```diff @@ Coverage Diff @@ ## master#3261 +/- ## + Coverage 47.71% 51.13% +3.42% + Complexity 5526 417-5109 Files 934 67 -867 Lines 41456 3049 -38407 Branches 4167 330-3837 - Hits 19779 1559 -18220 + Misses19917 1350 -18567 + Partials 1760 140-1620 ``` | Flag | Coverage Δ | | |---|---|---| | hudicli | `?` | | | hudiclient | `?` | | | hudicommon | `?` | | | hudiflink | `?` | | | hudihadoopmr | `?` | | | hudisparkdatasource | `?` | | | hudisync | `?` | | | huditimelineservice | `?` | | | hudiutilities | `51.13% <ø> (-8.11%)` | :arrow_down: | Flags with carried forward coverage won't be shown. 
[Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3261?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | | |---|---|---| | [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | :arrow_down: | | [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | :arrow_down: | | 
[...org/apache/hudi/utilities/HDFSParquetImporter.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hERlNQYXJxdWV0SW1wb3J0ZXIuamF2YQ==) | `0.00% <0.00%> (-71.82%)` | :arrow_down: | | [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/3261/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | :arrow_down: | | [...in/java/org/apache/hudi/utilities/UtilHelpers.j
[jira] [Commented] (HUDI-2150) Rename/Restructure configs for better modularity
[ https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379612#comment-17379612 ] Vinoth Chandar commented on HUDI-2150: -- This should be renamed to be consistent with base file terminology {code:java} public static final ConfigProperty PARQUET_SMALL_FILE_LIMIT_BYTES = ConfigProperty .key("hoodie.parquet.small.file.limit") .defaultValue(String.valueOf(104857600)) .withDocumentation("Upsert uses this file size to compact new data onto existing files. " + "By default, treat any file <= 100MB as a small file.");{code} > Rename/Restructure configs for better modularity > > > Key: HUDI-2150 > URL: https://issues.apache.org/jira/browse/HUDI-2150 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > Given we have a framework now, that can capture configs and even their > alternatives well, time to clean things up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
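Whatever the config is ultimately renamed to, its behavior is a simple threshold rule, sketched below. This is a hedged illustration with made-up names — only the 100 MB default and the "<= limit counts as small" rule come from the quoted documentation.

```java
// Hypothetical sketch of the small-file rule behind
// hoodie.parquet.small.file.limit: any base file at or under the limit
// is a candidate for packing new upsert data onto it.
public class SmallFileSketch {
    static final long SMALL_FILE_LIMIT_BYTES = 104857600L; // 100 MB default

    static boolean isSmallFile(long fileSizeBytes) {
        return fileSizeBytes <= SMALL_FILE_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(isSmallFile(50L * 1024 * 1024));  // 50 MB  -> true
        System.out.println(isSmallFile(104857600L));         // 100 MB -> true (boundary)
        System.out.println(isSmallFile(200L * 1024 * 1024)); // 200 MB -> false
    }
}
```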
[jira] [Comment Edited] (HUDI-2150) Rename/Restructure configs for better modularity
[ https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379561#comment-17379561 ] Vinoth Chandar edited comment on HUDI-2150 at 7/13/21, 5:57 AM: Cleaner related configs to be moved out of HoodieCompactionConfig into its own HoodieCleanConfig. Archival related configs to be moved out of HoodieCompactionConfig into its own HoodieArchivalConfig. was (Author: vc): Cleaner related configs to be moved out of HoodieCompactionConfig into its own HoodieCleanConfig > Rename/Restructure configs for better modularity > > > Key: HUDI-2150 > URL: https://issues.apache.org/jira/browse/HUDI-2150 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > Given we have a framework now, that can capture configs and even their > alternatives well, time to clean things up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379611#comment-17379611 ] ASF GitHub Bot commented on HUDI-2168: -- hudi-bot edited a comment on pull request #3264: URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938 ## CI report: * e8e5e310224eee469a19bcfe7af537154843c318 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > Labels: pull-request-available > > Users are facing the following exception while executing test case dependent > on starting Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user
hudi-bot edited a comment on pull request #3264: URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938 ## CI report: * e8e5e310224eee469a19bcfe7af537154843c318 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=877) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1548) Fix documentation around schema evolution
[ https://issues.apache.org/jira/browse/HUDI-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379610#comment-17379610 ] ASF GitHub Bot commented on HUDI-1548: -- codope commented on pull request #3257: URL: https://github.com/apache/hudi/pull/3257#issuecomment-878800281 @vinothchandar @n3nash @nsivabalan Can you please review the doc? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Fix documentation around schema evolution > -- > > Key: HUDI-1548 > URL: https://issues.apache.org/jira/browse/HUDI-1548 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: sivabalan narayanan >Assignee: Nishith Agarwal >Priority: Blocker > Labels: ', pull-request-available, sev:high, user-support-issues > Fix For: 0.9.0 > > > Clearly call out what kind of schema evolution is supported by hudi in > documentation . > Context: https://github.com/apache/hudi/issues/2331 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] codope commented on pull request #3257: [HUDI-1548] Add documentation for schema evolution
codope commented on pull request #3257: URL: https://github.com/apache/hudi/pull/3257#issuecomment-878800281 @vinothchandar @n3nash @nsivabalan Can you please review the doc? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379609#comment-17379609 ] ASF GitHub Bot commented on HUDI-2168: -- hudi-bot commented on pull request #3264: URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938 ## CI report: * e8e5e310224eee469a19bcfe7af537154843c318 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > Labels: pull-request-available > > Users are facing the following exception while executing test case dependent > on starting Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379608#comment-17379608 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > > For now, Hudi lets users submit a HoodieClusteringJob to build a > clustering plan or execute a clustering plan through the --schedule or > --instant-time config. > If users want to trigger a clustering job, they have to: > # Submit a HoodieClusteringJob to build a clustering plan through the --schedule > config > # Copy the created clustering instant time from the log output. > # Submit the HoodieClusteringJob again to execute this created clustering > plan through the --instant-time config. > The pain point is that there are too many steps to trigger a clustering, and > the instant time has to be copied and pasted from the log file manually, so > the process can't be automated.
> > I just raised a PR to offer a new config named --mode (or -m for short) > ||--mode||remarks|| > |execute|Execute a cluster plan at a given instant, which means --instant-time > is needed here. Default value. | > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a cluster plan first and execute that plan > immediately| > Now users can use --mode scheduleAndExecute to build a cluster plan and execute > it at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot commented on pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user
hudi-bot commented on pull request #3264: URL: https://github.com/apache/hudi/pull/3264#issuecomment-878799938 ## CI report: * e8e5e310224eee469a19bcfe7af537154843c318 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379607#comment-17379607 ] ASF GitHub Bot commented on HUDI-2168: -- veenaypatil opened a new pull request #3264: URL: https://github.com/apache/hudi/pull/3264 ## What is the purpose of the pull request To fix access control exception while running the test cases which involves starting the Hive service ## Brief change log Set config ``` config.setBoolean("dfs.permissions",false); ``` ## Verify this pull request This pull request is a trivial rework / code cleanup without any test coverage. - Verified the tests are running locally after this change ## Committer checklist - [X] Has a corresponding JIRA in PR title & commit - [X] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > > Users are facing the following exception while executing test case dependent > on starting Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2168: - Labels: pull-request-available (was: ) > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > Labels: pull-request-available > > Users are facing the following exception while executing test case dependent > on starting Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] veenaypatil opened a new pull request #3264: [HUDI-2168] Fix for AccessControlException for anonymous user
veenaypatil opened a new pull request #3264: URL: https://github.com/apache/hudi/pull/3264 ## What is the purpose of the pull request To fix the access control exception while running the test cases that involve starting the Hive service ## Brief change log Set config ``` config.setBoolean("dfs.permissions",false); ``` ## Verify this pull request This pull request is a trivial rework / code cleanup without any test coverage. - Verified the tests are running locally after this change ## Committer checklist - [X] Has a corresponding JIRA in PR title & commit - [X] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379600#comment-17379600 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876) * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support to disable meta column to BulkInsert Row Writer path > > > Key: HUDI-2161 > URL: https://issues.apache.org/jira/browse/HUDI-2161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Objective here is to disable all meta columns so as to avoid storage cost. > Also, some benefits could be seen in write latency with row writer path as no > special handling is required at RowCreateHandle layer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876) * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379599#comment-17379599 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876) * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support to disable meta column to BulkInsert Row Writer path > > > Key: HUDI-2161 > URL: https://issues.apache.org/jira/browse/HUDI-2161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Objective here is to disable all meta columns so as to avoid storage cost. > Also, some benefits could be seen in write latency with row writer path as no > special handling is required at RowCreateHandle layer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876) * caffa0a76af64dddc658d15a1dd3a371f3a8bcda UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379596#comment-17379596 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support to disable meta column to BulkInsert Row Writer path > > > Key: HUDI-2161 > URL: https://issues.apache.org/jira/browse/HUDI-2161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Objective here is to disable all meta columns so as to avoid storage cost. > Also, some benefits could be seen in write latency with row writer path as no > special handling is required at RowCreateHandle layer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) * e56bac615f087cec7817b846809c9f8fd0cc20a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=876) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379595#comment-17379595 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) * e56bac615f087cec7817b846809c9f8fd0cc20a5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support to disable meta column to BulkInsert Row Writer path > > > Key: HUDI-2161 > URL: https://issues.apache.org/jira/browse/HUDI-2161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Objective here is to disable all meta columns so as to avoid storage cost. > Also, some benefits could be seen in write latency with row writer path as no > special handling is required at RowCreateHandle layer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) * e56bac615f087cec7817b846809c9f8fd0cc20a5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379593#comment-17379593 ] ASF GitHub Bot commented on HUDI-2153: -- hudi-bot edited a comment on pull request #3263: URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248 ## CI report: * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > BucketAssignFunction NullPointerException > - > > Key: HUDI-2153 > URL: https://issues.apache.org/jira/browse/HUDI-2153 > Project: Apache Hudi > Issue Type: Bug >Reporter: moran >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > java.lang.NullPointerException > at > org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198) > at > org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159) > at > org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191) > at > org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204) > at > org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174) > at > 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) > at java.lang.Thread.run(Thread.java:748) > ERROR at > Line 197 of the BucketAssignFunction class > (this.context.setCurrentKey(recordKey)) > Why is this context null -- This message was sent by Atlassian Jira (v8.3.4#803005)
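The failure mode behind the stack trace above can be reproduced in miniature without Flink or Hudi. The sketch below is illustrative only — `KeyContext` and `BucketAssignFn` are hypothetical stand-ins, not the real classes — but it shows why the context is null: the function's keyed context is only populated when a dedicated operator injects it, so wrapping the function in an operator that never performs the injection leaves the field null at the `setCurrentKey` call.

```java
import java.util.Objects;

// Hypothetical stand-in for the keyed-state context a Flink operator owns.
class KeyContext {
    String currentKey;
    void setCurrentKey(String key) { this.currentKey = key; }
}

// Hypothetical stand-in for BucketAssignFunction: it can only set the current
// key if the owning operator wired a context in first.
class BucketAssignFn {
    KeyContext context; // stays null unless the right operator injects it

    // Analogue of the operator's open()-time wiring.
    void setContext(KeyContext ctx) { this.context = ctx; }

    void processRecord(String recordKey) {
        // Analogue of BucketAssignFunction.java:197 -- throws
        // NullPointerException when no injection ever happened.
        Objects.requireNonNull(this.context, "context was never injected");
        this.context.setCurrentKey(recordKey);
    }
}
```

This is consistent with the fix in PR #3263, which swaps the generic `KeyedProcessOperator` for a `BucketAssignOperator` wrapping the same function.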
[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
hudi-bot edited a comment on pull request #3263: URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248 ## CI report: * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379586#comment-17379586 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > > For now, Hudi can let users submit a HoodieClusteringJob to build a > clustering plan or execute a clustering plan through --schedule or > --instant-time config. > If users want to trigger a clustering job, they have to > # Submit a HoodieClusteringJob to build a clustering plan through --schedule > config > # Copy the created clustering instant time from the log info. 
> # Submit the HoodieClusteringJob again to execute this created clustering > plan through --instant-time config. > The pain point is that there are too many steps when triggering a clustering, and > users need to copy and paste the instant time from the log file manually, so the flow > can't be automated. > > I just raised a PR to offer a new config named --mode, or -m in short > ||--mode||remarks|| > |execute|Execute a clustering plan at the given instant, which means --instant-time > is needed here. Default value.| > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a clustering plan first and execute that plan > immediately| > Now users can use --mode scheduleAndExecute to build a clustering plan and execute > this plan at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
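With the proposed mode, the three-step schedule/copy/execute flow described above collapses into a single submission. The command below is only a sketch: the bundle jar path, properties file, table name, and base path are placeholders, and every flag other than `--mode` is assumed from the existing `HoodieClusteringJob` options rather than confirmed by this thread.

```shell
# Illustrative submission only -- all paths and names below are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  /path/to/hudi-utilities-bundle.jar \
  --props /path/to/clusteringjob.properties \
  --base-path /tmp/hoodie/trips \
  --table-name trips \
  --mode scheduleAndExecute   # schedule a clustering plan, then run it immediately
```

In `execute` mode, by contrast, `--instant-time` is still required, as the mode table in the issue description notes.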
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=875) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2168) AccessControlException for anonymous user
[ https://issues.apache.org/jira/browse/HUDI-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HUDI-2168: Status: In Progress (was: Open) > AccessControlException for anonymous user > - > > Key: HUDI-2168 > URL: https://issues.apache.org/jira/browse/HUDI-2168 > Project: Apache Hudi > Issue Type: Task > Components: Testing >Reporter: Vinay >Assignee: Vinay >Priority: Trivial > > Users are facing the following exception while executing test cases dependent > on starting the Hive service > > {code:java} > Got exception: org.apache.hadoop.security.AccessControlException Permission > denied: user=anonymous, access=WRITE > {code} > This is specifically happening at the time of clearing the Hive DB > {code:java} > client.updateHiveSQL("drop database if exists " + > hiveSyncConfig.databaseName); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2168) AccessControlException for anonymous user
Vinay created HUDI-2168: --- Summary: AccessControlException for anonymous user Key: HUDI-2168 URL: https://issues.apache.org/jira/browse/HUDI-2168 Project: Apache Hudi Issue Type: Task Components: Testing Reporter: Vinay Assignee: Vinay Users are facing the following exception while executing test cases dependent on starting the Hive service {code:java} Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE {code} This is specifically happening at the time of clearing the Hive DB {code:java} client.updateHiveSQL("drop database if exists " + hiveSyncConfig.databaseName); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379584#comment-17379584 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > > For now, Hudi can let users submit a HoodieClusteringJob to build a > clustering plan or execute a clustering plan through --schedule or > --instant-time config. > If users want to trigger a clustering job, they have to > # Submit a HoodieClusteringJob to build a clustering plan through --schedule > config > # Copy the created clustering instant time from the log info. > # Submit the HoodieClusteringJob again to execute this created clustering > plan through --instant-time config. 
> The pain point is that there are too many steps when triggering a clustering, and > users need to copy and paste the instant time from the log file manually, so the flow > can't be automated. > > I just raised a PR to offer a new config named --mode, or -m in short > ||--mode||remarks|| > |execute|Execute a clustering plan at the given instant, which means --instant-time > is needed here. Default value.| > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a clustering plan first and execute that plan > immediately| > Now users can use --mode scheduleAndExecute to build a clustering plan and execute > this plan at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1985) Website re-design implementation
[ https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379583#comment-17379583 ] Vinoth Govindarajan commented on HUDI-1985: --- Hi [~xushiyan], I have past experience building websites, so I can volunteer to work on this re-design. > Website re-design implementation > > > Key: HUDI-1985 > URL: https://issues.apache.org/jira/browse/HUDI-1985 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Raymond Xu >Priority: Blocker > Labels: documentation > Fix For: 0.9.0 > > > To provide better navigation and organization of the Hudi website's info, we have > done a re-design of the web pages. > Previous discussion > [https://github.com/apache/hudi/issues/2905] > > See the wireframe and final design in > [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6] > (log in to Figma to comment) > The design is ready for implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) * 7ae050ed4b5ff0ce124a0ec580d51b3dfbb7f51a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379580#comment-17379580 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add support to disable meta column to BulkInsert Row Writer path > > > Key: HUDI-2161 > URL: https://issues.apache.org/jira/browse/HUDI-2161 > Project: Apache Hudi > Issue Type: Improvement >Reporter: sivabalan narayanan >Priority: Major > Labels: pull-request-available > > Objective here is to disable all meta columns so as to avoid storage cost. > Also, some benefits could be seen in write latency with row writer path as no > special handling is required at RowCreateHandle layer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379577#comment-17379577 ] ASF GitHub Bot commented on HUDI-2153: -- hudi-bot edited a comment on pull request #3263: URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248 ## CI report: * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > BucketAssignFunction NullPointerException > - > > Key: HUDI-2153 > URL: https://issues.apache.org/jira/browse/HUDI-2153 > Project: Apache Hudi > Issue Type: Bug >Reporter: moran >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > java.lang.NullPointerException > at > org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198) > at > org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159) > at > org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191) > at > org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204) > at > org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174) > at > 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) > at java.lang.Thread.run(Thread.java:748) > ERROR at > Line 197 of the BucketAssignFunction class > (this.context.setCurrentKey(recordKey)) > Why is this context null -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
hudi-bot edited a comment on pull request #3263: URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248 ## CI report: * f1299ed52dcf90635d4f11fef040255cfda9f35b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=873) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379576#comment-17379576 ] ASF GitHub Bot commented on HUDI-2153: -- danny0405 commented on a change in pull request #3263: URL: https://github.com/apache/hudi/pull/3263#discussion_r668418303 ## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java ## @@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception { .transform( "bucket_assigner", TypeInformation.of(HoodieRecord.class), -new KeyedProcessOperator<>(new BucketAssignFunction<>(conf))) +new BucketAssignOperator<>(new BucketAssignFunction<>(conf))) .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS)) Review comment: Nice catch, can we fix the indentation? And there is another PR that is the same as this one, can we close that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > BucketAssignFunction NullPointerException > - > > Key: HUDI-2153 > URL: https://issues.apache.org/jira/browse/HUDI-2153 > Project: Apache Hudi > Issue Type: Bug >Reporter: moran >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > java.lang.NullPointerException > at > org.apache.hudi.sink.partitioner.BucketAssignFunction.processRecord(BucketAssignFunction.java:198) > at > org.apache.hudi.sink.partitioner.BucketAssignFunction.processElement(BucketAssignFunction.java:159) > at > org.apache.flink.streaming.api.operators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:83) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:191) > at > org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:204) > at > org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:174) > at > org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:396) > at > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) > at java.lang.Thread.run(Thread.java:748) > ERROR at > Line 197 of the BucketAssignFunction class > (this.context.setCurrentKey(recordKey)) > Why is this context null -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] danny0405 commented on a change in pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
danny0405 commented on a change in pull request #3263: URL: https://github.com/apache/hudi/pull/3263#discussion_r668418303

## File path: hudi-flink/src/main/java/org/apache/hudi/streamer/HoodieFlinkStreamer.java

@@ -109,7 +109,7 @@ public static void main(String[] args) throws Exception {
 .transform(
 "bucket_assigner",
 TypeInformation.of(HoodieRecord.class),
-new KeyedProcessOperator<>(new BucketAssignFunction<>(conf)))
+new BucketAssignOperator<>(new BucketAssignFunction<>(conf)))
 .setParallelism(conf.getInteger(FlinkOptions.BUCKET_ASSIGN_TASKS))

Review comment: Nice catch, can we fix the indentation? And there is another PR same with this, can we close that?
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379575#comment-17379575 ] ASF GitHub Bot commented on HUDI-2153: -- hudi-bot commented on pull request #3263: URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248 ## CI report: * f1299ed52dcf90635d4f11fef040255cfda9f35b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
hudi-bot commented on pull request #3263: URL: https://github.com/apache/hudi/pull/3263#issuecomment-878768248 ## CI report: * f1299ed52dcf90635d4f11fef040255cfda9f35b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379572#comment-17379572 ] ASF GitHub Bot commented on HUDI-2153: -- moranyuwen opened a new pull request #3263: URL: https://github.com/apache/hudi/pull/3263 JIRA Issue: https://issues.apache.org/jira/browse/HUDI-2153 When you run HoodieFlinkStreamer to write data, the context in the BucketAssignFunction is null when the function is loaded, and this update resolves the case where the context is null.
[GitHub] [hudi] moranyuwen opened a new pull request #3263: [HUDI-2153] Fix BucketAssignFunction Context NullPointerException
moranyuwen opened a new pull request #3263: URL: https://github.com/apache/hudi/pull/3263 JIRA Issue: https://issues.apache.org/jira/browse/HUDI-2153 When you run HoodieFlinkStreamer to write data, the context in the BucketAssignFunction is null when the function is loaded, and this update resolves the case where the context is null.
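The failure mode described in this PR can be sketched without any Flink dependency: a function holds a context field that only its host operator initializes, so wrapping it in an operator that never wires the context leaves the field null at `processElement` time. The class and method names below (`MiniContext`, `BucketAssignFn`, `HostOperator`) are illustrative stand-ins, not the actual Hudi or Flink API.

```java
// Dependency-free model of the null-context NPE and of the fix:
// an operator that wires the function's context before forwarding elements.
public class NullContextDemo {

    // Stand-in for the keyed context the function dereferences.
    static final class MiniContext {
        Object currentKey;
        void setCurrentKey(Object key) { this.currentKey = key; }
    }

    // Stand-in for BucketAssignFunction: `context` is null until an operator sets it.
    static final class BucketAssignFn {
        MiniContext context;

        void setContext(MiniContext ctx) { this.context = ctx; }

        void processElement(String recordKey) {
            // Analogous to BucketAssignFunction.java:197 in the report:
            // throws NPE if no operator ever wired the context.
            context.setCurrentKey(recordKey);
        }
    }

    // Stand-in for the fixed operator, which wires the context up front.
    static final class HostOperator {
        final BucketAssignFn fn;
        HostOperator(BucketAssignFn fn) {
            this.fn = fn;
            fn.setContext(new MiniContext()); // the essential fix
        }
        void process(String recordKey) { fn.processElement(recordKey); }
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            new BucketAssignFn().processElement("key-1"); // no operator: NPE
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("bare function threw NPE: " + threw);

        HostOperator op = new HostOperator(new BucketAssignFn());
        op.process("key-1"); // operator wired the context first: succeeds
        System.out.println("operator-wrapped key: " + op.fn.context.currentKey);
    }
}
```

The point of the sketch is the ordering guarantee: the context must be set by whatever hosts the function before the first element arrives, which is what swapping in a purpose-built operator achieves.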
[hudi] branch master updated: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config (#3250)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new b0089b8  [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config (#3250)

b0089b8 is described below

commit b0089b894ad12da11fbd6a0fb08508c7adee68e6
Author: Sagar Sumit
AuthorDate: Tue Jul 13 09:54:40 2021 +0530

    [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config (#3250)
---
 .../java/org/apache/hudi/config/HoodieWriteConfig.java     |  3 ++-
 .../java/org/apache/hudi/config/TestHoodieWriteConfig.java | 14 --
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 20d2846..e2e295d 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -339,8 +339,9 @@ public class HoodieWriteConfig extends HoodieConfig {
       .withDocumentation("");

   public static final ConfigProperty EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = ConfigProperty
-      .key(AVRO_SCHEMA + ".externalTransformation")
+      .key(AVRO_SCHEMA.key() + ".external.transformation")
       .defaultValue("false")
+      .withAlternatives(AVRO_SCHEMA.key() + ".externalTransformation")
       .withDocumentation("");

   private ConsistencyGuardConfig consistencyGuardConfig;

diff --git a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
index 7661e1d..89f7a97 100644
--- a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
+++ b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/config/TestHoodieWriteConfig.java
@@ -23,6 +23,8 @@
 import org.apache.hudi.config.HoodieWriteConfig.Builder;
 import org.apache.hudi.index.HoodieIndex;
 import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;

 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
@@ -33,16 +35,23 @@
 import java.util.Map;
 import java.util.Properties;

 import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;

 public class TestHoodieWriteConfig {

-  @Test
-  public void testPropertyLoading() throws IOException {
+  @ParameterizedTest
+  @ValueSource(booleans = {true, false})
+  public void testPropertyLoading(boolean withAlternative) throws IOException {
     Builder builder = HoodieWriteConfig.newBuilder().withPath("/tmp");
     Map params = new HashMap<>(3);
     params.put(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP.key(), "1");
     params.put(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP_PROP.key(), "5");
     params.put(HoodieCompactionConfig.MIN_COMMITS_TO_KEEP_PROP.key(), "2");
+    if (withAlternative) {
+      params.put("hoodie.avro.schema.externalTransformation", "true");
+    } else {
+      params.put("hoodie.avro.schema.external.transformation", "true");
+    }
     ByteArrayOutputStream outStream = saveParamsIntoOutputStream(params);
     ByteArrayInputStream inputStream = new ByteArrayInputStream(outStream.toByteArray());
     try {
@@ -54,6 +63,7 @@
       HoodieWriteConfig config = builder.build();
       assertEquals(5, config.getMaxCommitsToKeep());
       assertEquals(2, config.getMinCommitsToKeep());
+      assertTrue(config.shouldUseExternalSchemaTransformation());
     }

   @Test
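The commit above renames the config key to `hoodie.avro.schema.external.transformation` while registering the old camel-case key as an alternative, so existing user configs keep working. The resolution order this enables (canonical key, then alternatives, then default) can be sketched as follows; this is an illustrative reimplementation, not Hudi's actual `ConfigProperty` code.

```java
import java.util.List;
import java.util.Properties;

public class AltKeyDemo {

    // Resolve a config value: canonical key wins, then any deprecated
    // alternative key, and finally the declared default.
    static String resolve(Properties props, String key,
                          List<String> alternatives, String defaultValue) {
        if (props.containsKey(key)) {
            return props.getProperty(key);
        }
        for (String alt : alternatives) {
            if (props.containsKey(alt)) {
                return props.getProperty(alt);
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        String key = "hoodie.avro.schema.external.transformation";
        List<String> alts = List.of("hoodie.avro.schema.externalTransformation");

        // A legacy config that still uses the old camel-case key...
        Properties legacy = new Properties();
        legacy.setProperty("hoodie.avro.schema.externalTransformation", "true");

        System.out.println(resolve(legacy, key, alts, "false"));           // true
        System.out.println(resolve(new Properties(), key, alts, "false")); // false
    }
}
```

This is exactly the behavior the parameterized test in the commit exercises: the same property loads whether the old or the new key is supplied.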
[GitHub] [hudi] nsivabalan merged pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config
nsivabalan merged pull request #3250: URL: https://github.com/apache/hudi/pull/3250
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379567#comment-17379567 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868) * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build

> Add support to disable meta column to BulkInsert Row Writer path
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> Objective here is to disable all meta columns so as to avoid storage cost. Also, some benefits could be seen in write latency with the row writer path, as no special handling is required at the RowCreateHandle layer.
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868) * 8a212fd77769cbf7e248e971f66109381ba80f71 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=872) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path
[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379566#comment-17379566 ] ASF GitHub Bot commented on HUDI-2161: -- hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868) * 8a212fd77769cbf7e248e971f66109381ba80f71 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot edited a comment on pull request #3247: [HUDI-2161] Adding support to disable meta columns with bulk insert operation
hudi-bot edited a comment on pull request #3247: URL: https://github.com/apache/hudi/pull/3247#issuecomment-876918931 ## CI report: * 860eabd8a3d02e8709874cb67788e61d0d43d9c5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=868) * 8a212fd77769cbf7e248e971f66109381ba80f71 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-2150) Rename/Restructure configs for better modularity
[ https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-2150:

Description: Given we have a framework now that can capture configs and even their alternatives well, time to clean things up.
(was:
* Rename HoodieWriteConfig to HoodieClientConfig
* Move bunch of configs from CompactionConfig to StorageConfig
* Introduce new HoodieCleanConfig
* Should we consider lombok or something to automate the defaults/getters/setters
* Consistent name of properties/defaults
* Enforce bounds more strictly)

> Rename/Restructure configs for better modularity
> Key: HUDI-2150
> URL: https://issues.apache.org/jira/browse/HUDI-2150
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Code Cleanup
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
>
> Given we have a framework now that can capture configs and even their alternatives well, time to clean things up.
[jira] [Commented] (HUDI-2150) Rename/Restructure configs for better modularity
[ https://issues.apache.org/jira/browse/HUDI-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379561#comment-17379561 ] Vinoth Chandar commented on HUDI-2150: -- Cleaner related configs to be moved out of HoodieCompactionConfig into its own HoodieCleanConfig
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379557#comment-17379557 ] ASF GitHub Bot commented on HUDI-2153: -- hudi-bot edited a comment on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128 ## CI report: * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot edited a comment on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException
hudi-bot edited a comment on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128 ## CI report: * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-2167) HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException
tsianglei created HUDI-2167:
Summary: HoodieCompactionConfig get HoodieCleaningPolicy NullPointerException
Key: HUDI-2167
URL: https://issues.apache.org/jira/browse/HUDI-2167
Project: Apache Hudi
Issue Type: Bug
Components: CLI, Flink Integration
Reporter: tsianglei

Caused by: java.lang.NullPointerException: Name is null
at java.lang.Enum.valueOf(Enum.java:236) ~[?:1.8.0_221]
at org.apache.hudi.common.model.HoodieCleaningPolicy.valueOf(HoodieCleaningPolicy.java:24) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
at org.apache.hudi.config.HoodieCompactionConfig$Builder.build(HoodieCompactionConfig.java:368) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
at org.apache.hudi.util.StreamerUtil.getHoodieClientConfig(StreamerUtil.java:155) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
at org.apache.hudi.util.StreamerUtil.createWriteClient(StreamerUtil.java:277) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
at org.apache.hudi.sink.StreamWriteOperatorCoordinator.start(StreamWriteOperatorCoordinator.java:154) ~[hudi-flink-bundle_2.11-0.9.0-SNAPSHOT.jar:0.9.0-SNAPSHOT]
at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.start(OperatorCoordinatorHolder.java:189) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.scheduler.SchedulerBase.startAllOperatorCoordinators(SchedulerBase.java:1253) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.scheduler.SchedulerBase.startScheduling(SchedulerBase.java:624) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.jobmaster.JobMaster.startScheduling(JobMaster.java:1032) ~[flink-dist_2.11-1.12.2.jar:1.12.2]
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:705) ~[?:1.8.0_221]
... 27 more
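The top frame of the trace above is `java.lang.Enum.valueOf` being handed a null name, which the JDK reports as `NullPointerException: Name is null`. The snippet below reproduces that call shape with a stand-in enum and shows a defensive default as one possible guard; the enum and `parsePolicy` helper are illustrative, not Hudi's actual code or fix.

```java
public class CleaningPolicyDemo {

    // Stand-in for HoodieCleaningPolicy; constant names are for illustration.
    enum CleaningPolicy { KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS }

    // One possible guard: fall back to a default instead of calling
    // valueOf(null), which always throws NullPointerException("Name is null").
    static CleaningPolicy parsePolicy(String name) {
        if (name == null || name.isEmpty()) {
            return CleaningPolicy.KEEP_LATEST_COMMITS;
        }
        return CleaningPolicy.valueOf(name);
    }

    public static void main(String[] args) {
        try {
            CleaningPolicy.valueOf(null); // same call shape as the stack trace
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage()); // "Name is null"
        }
        System.out.println(parsePolicy(null)); // falls back to the default
    }
}
```

In other words, the bug reduces to a cleaning-policy config value arriving as null at the point where the builder converts it to an enum constant.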
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379546#comment-17379546 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build

> Build cluster plan and execute this plan at once for HoodieClusteringJob
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Yue Zhang
> Priority: Major
> Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a clustering plan or execute a clustering plan through the --schedule or --instant-time config. If users want to trigger a clustering job, they have to:
> # Submit a HoodieClusteringJob to build a clustering plan through the --schedule config.
> # Copy the created clustering instant time from the log info.
> # Submit the HoodieClusteringJob again to execute this created clustering plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering job, and the instant time must be copied and pasted from the log file manually, so the process can't be automated.
> A PR has been raised to offer a new config named --mode, or -m for short:
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute that plan at once using HoodieClusteringJob.
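The three modes in the table above amount to a simple dispatch: `scheduleAndExecute` just feeds the instant time produced by the scheduling step straight into the execution step, removing the manual copy from the logs. The sketch below models that flow; `scheduleClustering` and `executeClustering` are placeholder names, not the actual HoodieClusteringJob API.

```java
import java.util.Optional;

public class ClusteringModeDemo {

    enum Mode { SCHEDULE, EXECUTE, SCHEDULE_AND_EXECUTE }

    // Placeholder scheduler: returns the instant time of the plan it creates.
    static Optional<String> scheduleClustering() {
        return Optional.of("20210713095440"); // illustrative instant time
    }

    static String executeClustering(String instantTime) {
        return "executed plan at " + instantTime;
    }

    static String run(Mode mode, String instantTime) {
        switch (mode) {
            case SCHEDULE:
                return "scheduled plan at " + scheduleClustering().orElseThrow();
            case EXECUTE:
                // Mirrors the old flow: --instant-time must be supplied by the user.
                return executeClustering(instantTime);
            case SCHEDULE_AND_EXECUTE:
                // The new mode: the scheduled instant flows directly into execution.
                return executeClustering(scheduleClustering().orElseThrow());
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Mode.SCHEDULE_AND_EXECUTE, null));
    }
}
```

The design point is that only `EXECUTE` depends on a caller-provided instant time; the combined mode is self-sufficient and therefore automatable.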
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379543#comment-17379543 ] ASF GitHub Bot commented on HUDI-2153: -- hudi-bot edited a comment on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128 ## CI report: * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot edited a comment on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException
hudi-bot edited a comment on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128 ## CI report: * afe140f7b9169e5a6129a10a6a12f839658c7b08 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=871) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379542#comment-17379542 ] ASF GitHub Bot commented on HUDI-2153: -- hudi-bot commented on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128 ## CI report: * afe140f7b9169e5a6129a10a6a12f839658c7b08 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot commented on pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException
hudi-bot commented on pull request #3261: URL: https://github.com/apache/hudi/pull/3261#issuecomment-878740128 ## CI report: * afe140f7b9169e5a6129a10a6a12f839658c7b08 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] izhangzhihao opened a new issue #3262: [SUPPORT] No successful commits under path
izhangzhihao opened a new issue #3262: URL: https://github.com/apache/hudi/issues/3262 **To Reproduce** Steps to reproduce the behavior: code https://github.com/izhangzhihao/Real-time-Data-Warehouse/tree/hudi ### create table ```sql CREATE TABLE accident_claims ( claim_id BIGINT, claim_total DOUBLE, claim_total_receipt VARCHAR(50), claim_currency VARCHAR(3), member_id INT, accident_date DATE, accident_type VARCHAR(20), accident_detail VARCHAR(20), claim_date DATE, claim_status VARCHAR(10), ts_created TIMESTAMP(3), ts_updated TIMESTAMP(3), ds DATE, PRIMARY KEY (claim_id) NOT ENFORCED ) PARTITIONED BY (ds) WITH ( 'connector'='hudi', 'path' = '/data/dwd/accident_claims', 'table.type' = 'MERGE_ON_READ', 'read.streaming.enabled' = 'true', 'write.batch.size' = '1', 'write.task.max.size' = '1', 'write.tasks' = '1', 'compaction.tasks' = '1', 'compaction.delta_seconds' = '60', 'write.precombine.field' = 'ts_updated', 'read.tasks' = '1', 'read.streaming.check-interval' = '5', 'read.streaming.start-commit' = '20210712134429', ); ``` ### insert from CDC change stream ```sql INSERT INTO dwd.accident_claims SELECT claim_id, claim_total, claim_total_receipt, claim_currency, member_id, CAST (accident_date as DATE), accident_type, accident_detail, CAST (claim_date as DATE), claim_status, CAST (ts_created as TIMESTAMP), CAST (ts_updated as TIMESTAMP), CAST (SUBSTRING(claim_date, 0, 9) as DATE) FROM datasource.accident_claims; ``` **Expected behavior** ``` SELECT * FROM accident_claims; ``` should return results But got: ``` Flink SQL> SELECT * FROM accident_claims; [ERROR] Could not execute SQL statement. 
Reason: org.apache.hudi.exception.HoodieException: No successful commits under path /data/dwd/accident_claims ``` But the sample code works: ``` CREATE TABLE t1( uuid VARCHAR(20), -- you can use 'PRIMARY KEY NOT ENFORCED' syntax to mark the field as record key name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = '/data/t1', 'write.tasks' = '1', -- default is 4 ,required more resource 'compaction.tasks' = '1', -- default is 10 ,required more resource 'table.type' = 'COPY_ON_WRITE', -- this creates a MERGE_ON_READ table, by default is COPY_ON_WRITE 'read.tasks' = '1', -- default is 4 ,required more resource 'read.streaming.enabled' = 'true', -- this option enable the streaming read 'read.streaming.start-commit' = '20210712134429', -- specifies the start commit instant time 'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s. ); -- insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); SELECT * FROM t1; ``` So I didn't get what's wrong here... **Environment Description** * Hudi version : 0.9.0 SNAPSHOT * Flink version : 1.12.2 * Hive version : none * Hadoop version : 2.8.3 * Storage (HDFS/S3/GCS..) : local file system * Running on Docker? (yes/no) : yes **Additional context** Add any other context about the problem here. 
![image](https://user-images.githubusercontent.com/12044174/125382900-20040c80-e3c9-11eb-8ab6-be9a7c3072f5.png) Taskmanager log: [taskmanager.log.zip](https://github.com/apache/hudi/files/6805564/taskmanager.log.zip) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
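The `HoodieException: No successful commits under path` error above is raised when the reader finds no completed instant on the table's timeline, i.e. the streaming insert has not yet finished a delta commit under that path. As a rough illustration of that check, here is a Python sketch (not Hudi's actual Java implementation; the file-suffix set is an assumption for illustration):

```python
import os
import tempfile

# Suffixes that mark a *completed* instant on the Hudi timeline; the exact
# set here is an assumption for illustration (MOR tables complete delta
# commits as ".deltacommit", COW tables as ".commit").
COMPLETED_SUFFIXES = (".commit", ".deltacommit", ".replacecommit")

def has_successful_commit(base_path):
    """True if the table's .hoodie folder holds at least one completed instant."""
    timeline_dir = os.path.join(base_path, ".hoodie")
    if not os.path.isdir(timeline_dir):
        return False
    return any(name.endswith(COMPLETED_SUFFIXES) for name in os.listdir(timeline_dir))

# Throwaway table dir: an inflight delta commit alone does not count.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, ".hoodie"))
open(os.path.join(base, ".hoodie", "20210712134429.deltacommit.inflight"), "w").close()
print(has_successful_commit(base))  # → False
open(os.path.join(base, ".hoodie", "20210712134429.deltacommit"), "w").close()
print(has_successful_commit(base))  # → True
```

Under this reading, the sample `t1` table works because the `INSERT INTO t1 VALUES ...` batch job completes a commit before the query runs, while the CDC pipeline may still be mid-flight.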
[jira] [Updated] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2153: - Labels: pull-request-available (was: ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2153) BucketAssignFunction NullPointerException
[ https://issues.apache.org/jira/browse/HUDI-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379541#comment-17379541 ] ASF GitHub Bot commented on HUDI-2153: -- moranyuwen opened a new pull request #3261: URL: https://github.com/apache/hudi/pull/3261 Running HoodieFlinkStreamer will encounter an exception in the bucketAssignFunction class where the context is null ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] moranyuwen opened a new pull request #3261: [HUDI-2153] Fix BucketAssignFunction NullPointerException
moranyuwen opened a new pull request #3261: URL: https://github.com/apache/hudi/pull/3261 Running HoodieFlinkStreamer will encounter an exception in the bucketAssignFunction class where the context is null ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379522#comment-17379522 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > > For now, Hudi lets users submit a HoodieClusteringJob to build a > clustering plan or execute a clustering plan through the --schedule or > --instant-time config. > If users want to trigger a clustering job, they have to: > # Submit a HoodieClusteringJob to build a clustering plan through the --schedule > config. > # Copy the created clustering instant time from the log info. > # Submit the HoodieClusteringJob again to execute this created clustering > plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, and > the instant time has to be copied and pasted from the log file manually, so the > process can't be automated. > > I just raised a PR to offer a new config named --mode (or -m for short): > ||--mode||remarks|| > |execute|Execute a clustering plan at a given instant, which means --instant-time > is needed here. Default value.| > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a clustering plan first and execute that plan > immediately| > Now users can use --mode scheduleAndExecute to build a clustering plan and execute > this plan at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
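The three-mode dispatch proposed above can be sketched as follows. This is a hypothetical Python illustration of the control flow only (the real HoodieClusteringJob is a Spark job written in Java; the function names and the returned instant time here are invented for the example):

```python
from typing import Optional

# Hypothetical sketch of the --mode dispatch from the table above; mode
# names mirror the proposal, but this is not the actual HoodieClusteringJob.
def run_clustering_job(mode: str, instant_time: Optional[str] = None) -> str:
    def schedule() -> str:
        # Pretend scheduling produced a new clustering instant time.
        return "20210713000000"

    def execute(instant: str) -> str:
        return f"executed clustering plan at {instant}"

    if mode == "schedule":
        return f"scheduled clustering plan at {schedule()}"
    if mode == "execute":
        if instant_time is None:
            raise ValueError("--instant-time is required in execute mode")
        return execute(instant_time)
    if mode == "scheduleAndExecute":
        # Plan and execute in one run -- no manual copy of the instant
        # time from the logs.
        return execute(schedule())
    raise ValueError(f"unsupported mode: {mode}")

print(run_clustering_job("scheduleAndExecute"))  # → executed clustering plan at 20210713000000
```

The point of `scheduleAndExecute` is visible in the last branch: the instant time flows directly from the scheduling step into the execution step, removing the manual copy/paste round trip.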
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * d369ea7aedc892c995c4cd0132e15b2bb29cfb65 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=862) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=870) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379520#comment-17379520 ] ASF GitHub Bot commented on HUDI-2164: -- zhangyue19921010 commented on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878723946 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] zhangyue19921010 commented on pull request #3259: [HUDI-2164] Build cluster plan and execute this plan at once for HoodieClusteringJob
zhangyue19921010 commented on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878723946 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a change in pull request #3250: [MINOR] Fix EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION config
codope commented on a change in pull request #3250: URL: https://github.com/apache/hudi/pull/3250#discussion_r668367569 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -339,7 +339,7 @@ .withDocumentation(""); public static final ConfigProperty EXTERNAL_RECORD_AND_SCHEMA_TRANSFORMATION = ConfigProperty - .key(AVRO_SCHEMA + ".externalTransformation") + .key(AVRO_SCHEMA.key() + ".externalTransformation") Review comment: Changed the config key to `hoodie.avro.schema.external.transformation` and also have `hoodie.avro.schema.externalTransformation` as alternative for backwards compatibility. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
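The backwards-compatibility scheme described in the review comment (new key `hoodie.avro.schema.external.transformation`, old spelling kept as an alternative) can be illustrated with a small lookup sketch. This is illustrative Python, not Hudi's actual ConfigProperty API:

```python
from typing import Dict, Optional, Tuple

# Illustrative "new key with backwards-compatible alternatives" resolution
# (not Hudi's actual ConfigProperty API): try the renamed key first, then
# fall back to the old spelling.
def resolve(props: Dict[str, str], key: str,
            alternatives: Tuple[str, ...] = ()) -> Optional[str]:
    if key in props:
        return props[key]
    for alt in alternatives:
        if alt in props:
            return props[alt]  # old key still honored after the rename
    return None

# A user's existing properties file that still uses the old spelling:
old_style = {"hoodie.avro.schema.externalTransformation": "true"}
value = resolve(
    old_style,
    "hoodie.avro.schema.external.transformation",
    alternatives=("hoodie.avro.schema.externalTransformation",),
)
print(value)  # → true
```

With this shape, existing deployments keep working unchanged while new documentation can advertise only the consistently-dotted key.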
[jira] [Commented] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379496#comment-17379496 ] Vinoth Chandar commented on HUDI-2151: -- These need a default value. {code:java} public static final ConfigProperty ZK_PORT_PROP = ConfigProperty .key(ZK_PORT_PROP_KEY) .noDefaultValue() .sinceVersion("0.8.0") .withDocumentation("Zookeeper port to connect to."); public static final ConfigProperty ZK_LOCK_KEY_PROP = ConfigProperty .key(ZK_LOCK_KEY_PROP_KEY) .noDefaultValue() .sinceVersion("0.8.0") .withDocumentation("Key name under base_path at which to create a ZNode and acquire lock. " + "Final path on zk will look like base_path/lock_key. We recommend setting this to the table name");{code} > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Sub-task > Components: Code Cleanup, Docs >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > > We have quite a few configs which deliver better performance or usability, > but guarded by flags. > This is to identify them, change them, test (functionally, perf) and make > them default > > Need to ensure we also capture all the backwards compatibility issues that > can arise -- This message was sent by Atlassian Jira (v8.3.4#803005)
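Why a missing default matters can be shown with a minimal Python stand-in for the builder pattern quoted in the comment. The real ConfigProperty class is Java inside Hudi; this sketch only illustrates that a `noDefaultValue()` property fails at read time unless the user explicitly sets it, which is why the ZK port and lock key are candidates for defaults:

```python
# Minimal Python stand-in for the ConfigProperty builder quoted above (the
# real class is Java in Hudi); it only illustrates why a property declared
# with noDefaultValue() fails at read time unless the user sets it.
class ConfigProperty:
    def __init__(self, key):
        self._key = key
        self._default = None
        self._has_default = False

    def default_value(self, value):
        self._default = value
        self._has_default = True
        return self

    def get(self, props):
        if self._key in props:
            return props[self._key]
        if not self._has_default:
            raise KeyError(f"no value set and no default for {self._key}")
        return self._default

# Key name is illustrative of the ZK port property under discussion.
zk_port = ConfigProperty("hoodie.write.lock.zookeeper.port")
try:
    zk_port.get({})  # noDefaultValue(): raises unless the user sets it
except KeyError as err:
    print(err)

zk_port.default_value("2181")  # a conventional ZK port as default avoids the failure
print(zk_port.get({}))  # → 2181
```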
[jira] [Issue Comment Deleted] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-2151: - Comment: was deleted (was: Is this correct? 5s? {code:java} public static final ConfigProperty LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP = ConfigProperty .key(LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP_KEY) .defaultValue(String.valueOf(5000L)) .sinceVersion("0.8.0") .withDocumentation("Maximum amount of time to wait between retries by lock provider client. This bounds" + " the maximum delay from the exponential backoff.");{code}) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2144) Offline clustering(independent sparkJob) will cause insert action losing data
[ https://issues.apache.org/jira/browse/HUDI-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379491#comment-17379491 ] ASF GitHub Bot commented on HUDI-2144: -- satishkotha merged pull request #3240: URL: https://github.com/apache/hudi/pull/3240 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Offline clustering(independent sparkJob) will cause insert action losing data > - > > Key: HUDI-2144 > URL: https://issues.apache.org/jira/browse/HUDI-2144 > Project: Apache Hudi > Issue Type: Bug >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > Attachments: image-2021-07-08-13-52-00-089.png > > > For now we have two kinds of pipeline for Hudi using spark: > # Streaming insert data to specific partition > # Offline clustering spark > job(`org.apache.hudi.utilities.HoodieClusteringJob`) to optimize file size > pipeline 1 created > But here is a bug we met that will lose data > These steps can make the problem reproduce stably : > # Submit a spark job to Ingest data1 using insert mode. > # Schedule a clustering plan using > `org.apache.hudi.utilities.HoodieClusteringJob` > # Submit a spark job again to Ingest data2 using insert mode(Ensure that > there is new file slice created in the same file group which means small file > tuning for insert is working). Suppose this file group is called file group 1 > and new file slice is called file slice 2. > # Execute that clustering job step2 planed. > # Query data1+data2 you will find new data for a is lost compared with > common ingestion without clustering > > !image-2021-07-08-13-52-00-089.png|width=922,height=728! 
> Here is the root cause: > When ingesting data using insert mode, Hudi will find small files and try to > append new data to them, aiming to tune the data file size. > [https://github.com/apache/hudi/blob/650c4455c600b0346fed8b5b6aa4cc0bf3452e8c/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L149] > tries to filter small files that are in clustering, but it only works when the user sets > `hoodie.clustering.inline` to true, which is not good enough when users run > offline clustering. > I just raised a PR that tries to fix it, and tested it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
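The intent of the fix, as described above and in the review thread, can be sketched in a few lines. This is an illustrative Python version of the filtering logic, not the actual Java UpsertPartitioner code: small files are excluded from insert bin-packing whenever their file group has a pending clustering plan, keyed off the pending set itself rather than the inline-clustering flag:

```python
# Sketch of the fix's intent (assumed from the discussion, not the actual
# UpsertPartitioner code): exclude small files from insert bin-packing
# whenever their file group has a pending clustering plan, regardless of
# whether inline clustering is enabled on the writer.
def filter_small_files_in_clustering(pending_clustering_group_ids: set,
                                     small_files: list) -> list:
    # Keying off the pending set itself (instead of a clustering-enabled
    # flag) also covers plans scheduled by an offline HoodieClusteringJob.
    if not pending_clustering_group_ids:
        return small_files
    return [f for f in small_files
            if f["file_group_id"] not in pending_clustering_group_ids]

small = [{"file_group_id": "fg-1", "size": 1_000_000},
         {"file_group_id": "fg-2", "size": 2_000_000}]
# fg-1 has a pending clustering plan, so only fg-2 remains a bin-packing target:
print(filter_small_files_in_clustering({"fg-1"}, small))
```

Appending to a file group that clustering is about to rewrite is what loses the new file slice, so excluding those groups up front avoids the data loss described in the reproduction steps.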
[hudi] branch master updated (ca440cc -> c8a2033)
This is an automated email from the ASF dual-hosted git repository. satish pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from ca440cc [HUDI-2107] Support Read Log Only MOR Table For Spark (#3193) add c8a2033 [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data (#3240) No new revisions were added by this update. Summary of changes: .../table/action/commit/UpsertPartitioner.java | 2 +- .../table/action/commit/TestUpsertPartitioner.java | 45 +- .../hudi/common/testutils/ClusteringTestUtils.java | 54 ++ 3 files changed, 99 insertions(+), 2 deletions(-) create mode 100644 hudi-common/src/test/java/org/apache/hudi/common/testutils/ClusteringTestUtils.java
[GitHub] [hudi] satishkotha merged pull request #3240: [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data
satishkotha merged pull request #3240: URL: https://github.com/apache/hudi/pull/3240 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2144) Offline clustering(independent sparkJob) will cause insert action losing data
[ https://issues.apache.org/jira/browse/HUDI-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379490#comment-17379490 ] ASF GitHub Bot commented on HUDI-2144: -- lw309637554 commented on a change in pull request #3240: URL: https://github.com/apache/hudi/pull/3240#discussion_r668356708 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java ## @@ -146,7 +146,7 @@ private int addUpdateBucket(String partitionPath, String fileIdHint) { * @return smallFiles not in clustering */ private List filterSmallFilesInClustering(final Set pendingClusteringFileGroupsId, final List smallFiles) { -if (this.config.isClusteringEnabled()) { Review comment: @satishkotha @zhangyue19921010 Use "if (!pendingClusteringFileGroupsId.isEmpty())" will improve ease of use. Another need to modify. But if this will bring performance loss? @satishkotha " private JavaRDD> clusteringHandleUpdate(JavaRDD> inputRecordsRDD) { if (config.isClusteringEnabled()) {" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Offline clustering(independent sparkJob) will cause insert action losing data > - > > Key: HUDI-2144 > URL: https://issues.apache.org/jira/browse/HUDI-2144 > Project: Apache Hudi > Issue Type: Bug >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > Attachments: image-2021-07-08-13-52-00-089.png > > > For now we have two kinds of pipeline for Hudi using spark: > # Streaming insert data to specific partition > # Offline clustering spark > job(`org.apache.hudi.utilities.HoodieClusteringJob`) to optimize file size > pipeline 1 created > But here is a bug we met that will lose data > These steps can make the problem reproduce stably : > # Submit a spark job to Ingest data1 using insert mode. > # Schedule a clustering plan using > `org.apache.hudi.utilities.HoodieClusteringJob` > # Submit a spark job again to Ingest data2 using insert mode(Ensure that > there is new file slice created in the same file group which means small file > tuning for insert is working). Suppose this file group is called file group 1 > and new file slice is called file slice 2. > # Execute that clustering job step2 planed. > # Query data1+data2 you will find new data for a is lost compared with > common ingestion without clustering > > !image-2021-07-08-13-52-00-089.png|width=922,height=728! > Here is the root cause: > When ingest data using insert mode, Hudi will find small files and try to > append new data to them ,aiming to tuning data file size. > [https://github.com/apache/hudi/blob/650c4455c600b0346fed8b5b6aa4cc0bf3452e8c/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L149] > is try to filter Small Files In Clustering but only works when user set > `hoodie.clustering.inline` true which is not good enough when users using > offline clustering. 
> I have raised a PR to fix it and tested it.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
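The reviewer's suggestion above (keying the small-file filter on the pending clustering set rather than on the inline-clustering flag, so that plans scheduled by an offline job are also respected) can be sketched as follows. This is a hypothetical simplified version for illustration: `SmallFile` is stubbed here and the method does not mirror Hudi's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of filtering small files whose file group has a pending clustering plan.
public class SmallFileFilterSketch {

    // Minimal stand-in for Hudi's SmallFile; only the file group id matters here.
    static class SmallFile {
        final String fileGroupId;
        SmallFile(String fileGroupId) { this.fileGroupId = fileGroupId; }
    }

    // Checking the pending set directly (instead of an inline-clustering config
    // flag) also covers clustering plans scheduled by a separate offline job.
    static List<SmallFile> filterSmallFilesInClustering(
            Set<String> pendingClusteringFileGroupsId, List<SmallFile> smallFiles) {
        if (!pendingClusteringFileGroupsId.isEmpty()) {
            List<SmallFile> result = new ArrayList<>();
            for (SmallFile f : smallFiles) {
                if (!pendingClusteringFileGroupsId.contains(f.fileGroupId)) {
                    result.add(f);
                }
            }
            return result;
        }
        return smallFiles;
    }

    public static void main(String[] args) {
        Set<String> pending = new HashSet<>();
        pending.add("fg-1");
        List<SmallFile> files = new ArrayList<>();
        files.add(new SmallFile("fg-1"));
        files.add(new SmallFile("fg-2"));
        // fg-1 is excluded because it has a pending clustering plan.
        System.out.println(filterSmallFilesInClustering(pending, files).size()); // prints 1
    }
}
```

With this shape, new inserts never land on a file group that clustering is about to rewrite, which is the data-loss scenario described in the issue.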
[GitHub] [hudi] lw309637554 commented on a change in pull request #3240: [HUDI-2144]Bug-Fix:Offline clustering(HoodieClusteringJob) will cause insert action losing data
[jira] [Commented] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379488#comment-17379488 ] Vinoth Chandar commented on HUDI-2151: -- Is this correct? 5s?

{code:java}
public static final ConfigProperty<String> LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP = ConfigProperty
    .key(LOCK_ACQUIRE_RETRY_MAX_WAIT_TIME_IN_MILLIS_PROP_KEY)
    .defaultValue(String.valueOf(5000L))
    .sinceVersion("0.8.0")
    .withDocumentation("Maximum amount of time to wait between retries by lock provider client. This bounds"
        + " the maximum delay from the exponential backoff.");
{code}

> Make performant out-of-box configs
> -
>
> Key: HUDI-2151
> URL: https://issues.apache.org/jira/browse/HUDI-2151
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Code Cleanup, Docs
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
>
> We have quite a few configs which deliver better performance or usability but are guarded by flags.
> This task is to identify them, change them, test them (functionally and for performance), and make them the default.
>
> We also need to ensure we capture all the backwards-compatibility issues that can arise.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
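The config quoted above caps the delay of the lock provider's exponential backoff at 5000 ms by default. A minimal sketch of such a capped backoff, with hypothetical helper names (this is not Hudi's code, only an illustration of the documented behavior):

```java
// Sketch of exponential backoff whose per-retry delay is bounded by a maximum,
// matching the "maximum delay from the exponential backoff" wording above.
public class BackoffSketch {

    // Delay doubles each attempt (base * 2^attempt) but never exceeds maxWaitMs.
    static long delayForAttempt(int attempt, long baseMs, long maxWaitMs) {
        long delay = baseMs * (1L << Math.min(attempt, 30)); // clamp shift to avoid overflow
        return Math.min(delay, maxWaitMs);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println(delayForAttempt(attempt, 1000L, 5000L));
        }
        // prints 1000, 2000, 4000, 5000, 5000, 5000
    }
}
```

So with a 5000 ms cap, the backoff stops growing after a few retries, which is why the default answers the "Is this correct? 5s?" question with a hard upper bound rather than an unbounded delay.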
[jira] [Commented] (HUDI-2144) Offline clustering(independent sparkJob) will cause insert action losing data
[ https://issues.apache.org/jira/browse/HUDI-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379487#comment-17379487 ] ASF GitHub Bot commented on HUDI-2144: -- lw309637554 commented on a change in pull request #3240: URL: https://github.com/apache/hudi/pull/3240#discussion_r668354266

## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java

## @@ -146,7 +146,7 @@ private int addUpdateBucket(String partitionPath, String fileIdHint) {
 * @return smallFiles not in clustering
 */
private List<SmallFile> filterSmallFilesInClustering(final Set<String> pendingClusteringFileGroupsId, final List<SmallFile> smallFiles) {
-    if (this.config.isClusteringEnabled()) {

Review comment: @satishkotha @zhangyue19921010 Currently we have two configs for clustering. Setting ASYNC_CLUSTERING_ENABLE_OPT_KEY would also make this work:

    public boolean isAsyncClusteringEnabled() {
      return Boolean.parseBoolean(props.getProperty(HoodieClusteringConfig.ASYNC_CLUSTERING_ENABLE_OPT_KEY));
    }

    public boolean isClusteringEnabled() {
      // TODO: future support async clustering
      return inlineClusteringEnabled() || isAsyncClusteringEnabled();
    }

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-- This message was sent by Atlassian Jira (v8.3.4#803005)
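The comment above quotes how the two clustering flags combine as "inline OR async". A minimal standalone sketch of that combination follows; note the property keys are illustrative stand-ins (only `hoodie.clustering.inline` appears in the issue text, and the async key here is a hypothetical name, not necessarily Hudi's exact config key):

```java
import java.util.Properties;

// Sketch of combining the inline and async clustering flags into one check,
// as quoted in the review comment above. Property names are illustrative.
public class ClusteringFlagsSketch {

    // Clustering is considered enabled if either inline or async clustering is on.
    static boolean isClusteringEnabled(Properties props) {
        boolean inline = Boolean.parseBoolean(
            props.getProperty("hoodie.clustering.inline", "false"));
        boolean async = Boolean.parseBoolean(
            props.getProperty("hoodie.clustering.async.enabled", "false"));
        return inline || async;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(isClusteringEnabled(props)); // prints false

        props.setProperty("hoodie.clustering.async.enabled", "true");
        System.out.println(isClusteringEnabled(props)); // prints true
    }
}
```

This is the behavior the reviewer points to: an offline clustering user who sets only the async flag would still pass `isClusteringEnabled()`, but a user who runs `HoodieClusteringJob` without setting either flag would not, which is why filtering on the pending clustering set directly is the more robust fix.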