[jira] [Created] (HUDI-4723) Add document about Hoodie Catalog
Danny Chen created HUDI-4723: Summary: Add document about Hoodie Catalog Key: HUDI-4723 URL: https://issues.apache.org/jira/browse/HUDI-4723 Project: Apache Hudi Issue Type: Task Components: docs Reporter: Danny Chen Fix For: 0.12.1, 0.12.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6493: [HUDI-4715] Needed To ReSync Hive In StreamWriteOperatorCoordinator's…
YuweiXiao commented on code in PR #6493:
URL: https://github.com/apache/hudi/pull/6493#discussion_r955729960

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java:
@@ -396,7 +396,11 @@ private void initInstant(String instant) {
       reset();
     } else {
       LOG.info("Recommit instant {}", instant);
-      commitInstant(instant);
+      boolean success = commitInstant(instant);
+      if (success) {
+        LOG.info("instant {} ReSync Hive", instant);
+        syncHive();
+      }

Review Comment:
Hi Danny, want to bring up another topic here. If the commit is not successful, e.g., when the last batch has no data, we will reuse the instant of the last batch while also starting a new instant. This then leads to an inconsistent ckp_meta. We encountered this issue in our internal branch. I could fire up a fix for this if necessary.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
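The guard in the diff above can be sketched as a standalone example. `commitInstant`, `syncHive`, and `recommit` here are simplified, hypothetical stand-ins for the coordinator's methods, not the real `StreamWriteOperatorCoordinator` API; the only behavior illustrated is that a Hive resync happens strictly after a successful recommit.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the recommit flow proposed in the diff: trigger a Hive
// resync only when the recommit actually succeeded. Names are illustrative.
public class RecommitSketch {
    final List<String> actions = new ArrayList<>();
    final boolean commitSucceeds;

    RecommitSketch(boolean commitSucceeds) {
        this.commitSucceeds = commitSucceeds;
    }

    boolean commitInstant(String instant) {
        actions.add("commit:" + instant);
        return commitSucceeds; // e.g. false when the last batch had no data
    }

    void syncHive() {
        actions.add("syncHive");
    }

    void recommit(String instant) {
        boolean success = commitInstant(instant);
        if (success) {
            // Re-sync Hive metadata only after a confirmed commit.
            syncHive();
        }
    }

    public static void main(String[] args) {
        RecommitSketch ok = new RecommitSketch(true);
        ok.recommit("20220826T0001");
        RecommitSketch empty = new RecommitSketch(false);
        empty.recommit("20220826T0002");
        System.out.println(ok.actions);    // [commit:20220826T0001, syncHive]
        System.out.println(empty.actions); // [commit:20220826T0002]
    }
}
```

Note that the case YuweiXiao raises (an unsuccessful commit whose instant is reused while a new instant is started) falls through the `if`, so nothing in this guard addresses the inconsistent ckp_meta he describes.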
[jira] [Updated] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-3983: - Description: I ran a spark job and encountered several ClassNotFoundExceptions. The spark version is 3.1 and the scala version is 2.12.
1.
{code:java}
java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier$Kind
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.translateException(RpcRetryingCallerImpl.java:222)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:195)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:395)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108)
{code}
Including org.apache.hbase:hbase-protocol in packaging/hudi-spark-bundle/pom.xml solves this error.
2.
{code:java}
java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
Including org.apache.hbase.thirdparty:hbase-shaded-gson in packaging/hudi-spark-bundle/pom.xml solves this error.
3.
{code:java}
java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
{code}
There is a configuration in hbase-site.xml:
{code:java}
<property>
  <name>hbase.status.listener.class</name>
  <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  <description>Implementation of the status listener with a multicast message.</description>
</property>
{code}
I set _*hbase.status.listener.class*_ to _*org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener*_ in the hbase configuration; the ClassNotFoundException was resolved, but I get another exception:
{code:java}
org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411)
	at
org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118) at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) a
[jira] [Commented] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585193#comment-17585193 ] xi chaomin commented on HUDI-3983: -- Are there known conflicts with the hbase dependencies? When I remove the hbase relocations, the job succeeds. Can we remove these relocations from the spark bundle?
{code:java}
<relocation>
  <pattern>org.apache.hadoop.hbase.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
  <excludes>
    <exclude>org.apache.hadoop.hbase.KeyValue$KeyComparator</exclude>
  </excludes>
</relocation>
<relocation>
  <pattern>org.apache.hbase.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.hbase.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.htrace.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.htrace.</shadedPattern>
</relocation>
{code}
> ClassNotFoundException when using hudi-spark-bundle to write table with hbase
> index
> ---
>
> Key: HUDI-3983
> URL: https://issues.apache.org/jira/browse/HUDI-3983
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: xi chaomin
> Priority: Critical
> Fix For: 0.12.1
>
>
> I ran a spark job and encountered several ClassNotFoundExceptions. The spark
> version is 3.1 and the scala version is 2.12.
> 1.
> {code:java}
> java.lang.NoClassDefFoundError:
> org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier$Kind
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.translateException(RpcRetryingCallerImpl.java:222)
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:195)
> at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:395)
> at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108)
> {code}
> Including org.apache.hbase:hbase-protocol in
> packaging/hudi-spark-bundle/pom.xml solves this error.
> 2.
> {code:java}
> java.lang.ClassNotFoundException:
> org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) {code}
> Including org.apache.hbase.thirdparty:hbase-shaded-gson in
> packaging/hudi-spark-bundle/pom.xml solves this error.
> 3.
> {code:java}
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not
> found {code}
> There is a configuration in hbase-site.xml:
> {code:java}
> <property>
>   <name>hbase.status.listener.class</name>
>   <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
>   <description>Implementation of the status listener with a multicast message.</description>
> </property>
> {code}
> I set _*hbase.status.listener.class*_ to
> _*org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener*_
> in the hbase configuration; the ClassNotFoundException was resolved, but I get
> another exception:
> {code:java}
> org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=1, exceptions:
> 2022-08-26T07:12:57.603Z,
> RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100,
> maxAttempts=1},
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException:
> Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local
> exception:
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException:
> Connection closed
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
> at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: > Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local > exception: > org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: > Connection closed > at > org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214) > at > org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384) > at > org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89) > at > org.apache.hudi.org.apache.hado
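The two bundle fixes described in the issue above (adding `hbase-protocol` and `hbase-shaded-gson` to the spark bundle pom) would look roughly like the following. This is a sketch only: `${hbase.version}` and `${hbase.thirdparty.version}` are placeholder properties and the exact coordinates must match the HBase version the Hudi build targets.

```xml
<!-- Sketch: bundle the missing HBase artifacts so the relocated (shaded)
     classes resolve at runtime. Version properties are illustrative
     placeholders, not taken from the actual Hudi build files. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-protocol</artifactId>
  <version>${hbase.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-gson</artifactId>
  <version>${hbase.thirdparty.version}</version>
</dependency>
```

For dependencies added this way, the bundle's shade relocations must also rewrite the packages consistently, otherwise the shaded references (e.g. `org.apache.hudi.org.apache.hbase.thirdparty.*`) still fail to resolve.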
[GitHub] [hudi] danny0405 opened a new pull request, #6508: [HUDI-4723] Add document about Hoodie Catalog
danny0405 opened a new pull request, #6508: URL: https://github.com/apache/hudi/pull/6508 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-4723) Add document about Hoodie Catalog
[ https://issues.apache.org/jira/browse/HUDI-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4723: - Labels: pull-request-available (was: ) > Add document about Hoodie Catalog > - > > Key: HUDI-4723 > URL: https://issues.apache.org/jira/browse/HUDI-4723 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0, 0.12.1 > >
[jira] [Updated] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-3983: - Description: I ran a spark job and encountered several ClassNotFoundExceptions. The spark version is 3.1 and the scala version is 2.12.
1.
{code:java}
java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier$Kind
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.translateException(RpcRetryingCallerImpl.java:222)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:195)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:395)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108)
{code}
Including org.apache.hbase:hbase-protocol in packaging/hudi-spark-bundle/pom.xml solves this error.
2.
{code:java}
java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
Including org.apache.hbase.thirdparty:hbase-shaded-gson in packaging/hudi-spark-bundle/pom.xml solves this error.
3.
{code:java}
java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
{code}
There is a configuration in hbase-site.xml:
{code:java}
<property>
  <name>hbase.status.listener.class</name>
  <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  <description>Implementation of the status listener with a multicast message.</description>
</property>
{code}
I set _*hbase.status.listener.class*_ to _*org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener*_ in the hbase configuration; the ClassNotFoundException was resolved, but I get another exception:
{code:java}
org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118)
at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.hudi.org.apa
[GitHub] [hudi] boneanxs commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
boneanxs commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r955765271 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -131,6 +161,53 @@ public abstract HoodieData performClusteringWithRecordsRDD(final Ho final Map strategyParams, final Schema schema, final List fileGroupIdList, final boolean preserveHoodieMetadata); + protected HoodieData performRowWrite(Dataset inputRecords, Map parameters) { +String uuid = UUID.randomUUID().toString(); +parameters.put(HoodieWriteConfig.BULKINSERT_ROW_IDENTIFY_ID.key(), uuid); +try { + inputRecords.write() + .format("hudi") + .options(JavaConverters.mapAsScalaMapConverter(parameters).asScala()) + .mode(SaveMode.Append) + .save(getWriteConfig().getBasePath()); Review Comment: I see, yeah, this is a good improvement, will change it
[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1228179181 ## CI report: * 473a8b74676e345ee91093a3fe9885e062ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10969) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
hudi-bot commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1228179560 ## CI report: * 9d687afca94b7bfcc592c69cfebd73eb846b3b70 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10967) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6500: [HUDI-4720] Fix HoodieInternalRow return wrong num of fields when sou…
hudi-bot commented on PR #6500: URL: https://github.com/apache/hudi/pull/6500#issuecomment-1228179685 ## CI report: * 2d75af2a075741142bbfd4b6f50e541661e55bdd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10968) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] danny0405 merged pull request #6508: [HUDI-4723] Add document about Hoodie Catalog
danny0405 merged PR #6508: URL: https://github.com/apache/hudi/pull/6508
[hudi] branch asf-site updated: [HUDI-4723] Add document about Hoodie Catalog (#6508)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f025f0b7af [HUDI-4723] Add document about Hoodie Catalog (#6508) f025f0b7af is described below commit f025f0b7af5531f478f91fdd7c37f804c507f9b3 Author: Danny Chan AuthorDate: Fri Aug 26 16:00:35 2022 +0800 [HUDI-4723] Add document about Hoodie Catalog (#6508) --- website/docs/flink-quick-start-guide.md| 13 +- website/docs/table_management.md | 28 +- .../version-0.12.0/flink-quick-start-guide.md | 13 +- .../version-0.12.0/table_management.md | 28 +- 4 files changed, 66 insertions(+), 16 deletions(-) diff --git a/website/docs/flink-quick-start-guide.md b/website/docs/flink-quick-start-guide.md index 4cf2b1042b..f4b9668178 100644 --- a/website/docs/flink-quick-start-guide.md +++ b/website/docs/flink-quick-start-guide.md @@ -24,14 +24,13 @@ quick start tool for SQL users. Step.1 download Flink jar -Hudi works with both Flink 1.13 and Flink 1.14. You can follow the +Hudi works with both Flink 1.13, Flink 1.14, Flink 1.15. You can follow the instructions [here](https://flink.apache.org/downloads) for setting up Flink. Then choose the desired Hudi-Flink bundle jar to work with different Flink and Scala versions: -- `hudi-flink1.13-bundle_2.11` -- `hudi-flink1.13-bundle_2.12` -- `hudi-flink1.14-bundle_2.11` -- `hudi-flink1.14-bundle_2.12` +- `hudi-flink1.13-bundle` +- `hudi-flink1.14-bundle` +- `hudi-flink1.15-bundle` Step.2 start Flink cluster Start a standalone Flink cluster within hadoop environment. @@ -117,8 +116,8 @@ INSERT INTO t1 VALUES select * from t1; ``` -This query provides snapshot querying of the ingested data. -Refer to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and query types supported. +This statement queries snapshot view of the dataset. 
+Refers to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and query types supported. ### Update Data diff --git a/website/docs/table_management.md b/website/docs/table_management.md index 6099476c31..7dbccd19ed 100644 --- a/website/docs/table_management.md +++ b/website/docs/table_management.md @@ -208,7 +208,33 @@ set hoodie.upsert.shuffle.parallelism = 100; set hoodie.delete.shuffle.parallelism = 100; ``` -## Flink +## Flink + +### Create Catalog + +The catalog helps to manage the SQL tables, the table can be shared among CLI sessions if the catalog persists the table DDLs. +For `hms` mode, the catalog also supplements the hive syncing options. + +HMS mode catalog SQL demo: +```sql +CREATE CATALOG hoodie_catalog + WITH ( +'type'='hudi', +'catalog.path' = '${catalog default root path}', +'hive.conf.dir' = '${directory where hive-site.xml is located}', +'mode'='hms' -- supports 'dfs' mode that uses the DFS backend for table DDLs persistence + ); +``` + + Options +| Option Name | Required | Default | Remarks | +| --- | --- | --- | --- | +| `catalog.path` | true | -- | Default root path for the catalog, the path is used to infer the table path automatically, the default table path: `${catalog.path}/${db_name}/${table_name}` | +| `default-database` | false | default | default database name | +| `hive.conf.dir` | false | -- | The directory where hive-site.xml is located, only valid in `hms` mode | +| `mode` | false | dfs | Supports `hms` mode that uses HMS to persist the table options | +| `table.external` | false | false | Whether to create the external table, only valid in `hms` mode | + ### Create Table The following is a Flink example to create a table. [Read the Flink Quick Start](/docs/flink-quick-start-guide) guide for more examples. 
diff --git a/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md b/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md index 4cf2b1042b..4a926aad04 100644 --- a/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md +++ b/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md @@ -24,14 +24,13 @@ quick start tool for SQL users. Step.1 download Flink jar -Hudi works with both Flink 1.13 and Flink 1.14. You can follow the +Hudi works with both Flink 1.13, Flink 1.14 and Flink 1.15. You can follow the instructions [here](https://flink.apache.org/downloads) for setting up Flink. Then choose the desired Hudi-Flink bundle jar to work with different Flink and Scala versions: -- `hudi-flink1.13-bundle_2.11` -- `hudi-flink1.13-bundle_2.12` -- `hudi-flink1.14-bundle_2.11` -- `hudi-flink1.14-bundle_2.12` +- `hudi-flink1.13-bundle` +- `hudi-flink1.14-bundle` +- `hudi-flink1.15-bundle` Step.2 start Flink clust
[jira] [Commented] (HUDI-4723) Add document about Hoodie Catalog
[ https://issues.apache.org/jira/browse/HUDI-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585211#comment-17585211 ] Danny Chen commented on HUDI-4723: -- Fixed via asf-site: f025f0b7af5531f478f91fdd7c37f804c507f9b3 > Add document about Hoodie Catalog > - > > Key: HUDI-4723 > URL: https://issues.apache.org/jira/browse/HUDI-4723 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0, 0.12.1 > >
[jira] [Resolved] (HUDI-4723) Add document about Hoodie Catalog
[ https://issues.apache.org/jira/browse/HUDI-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-4723. -- > Add document about Hoodie Catalog > - > > Key: HUDI-4723 > URL: https://issues.apache.org/jira/browse/HUDI-4723 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0, 0.12.1 > >
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1228184438 ## CI report: * 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN * 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN * a0e2f520a7f422bd396b984c3cec2c5653a41743 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10965) * ee8c930fdd2e713a5d220bd6bccc13cbc41ba6a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10973) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xicm opened a new issue, #6509: [SUPPORT]
xicm opened a new issue, #6509: URL: https://github.com/apache/hudi/issues/6509 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** As described in https://issues.apache.org/jira/browse/HUDI-3983, I get a connection-closed exception with the HBase index. We use relocation in the spark bundle; when I remove the relocations, the job succeeds. I have been debugging the differences between running with and without relocation for a long time, but found nothing. **To Reproduce** Steps to reproduce the behavior: 1. As we use relocation in the spark bundle, this conf will cause a ClassNotFoundException; comment out the listener class in hudi-common/src/main/resources/hbase-site.xml.
```
<property>
  <name>hbase.status.listener.class</name>
  <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  <description>Implementation of the status listener with a multicast message.</description>
</property>
```
Adding this conf, we can get the error message quickly:
```
<property>
  <name>hbase.client.retries.number</name>
  <value>0</value>
</property>
```
2. Add org.apache.hbase.thirdparty:hbase-shaded-gson in packaging/hudi-spark-bundle/pom.xml 3. Write data with the HBase index.
```
df.write.format("org.apache.hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD.key, "ts").
  option(RECORDKEY_FIELD.key, "uuid").
  option(PARTITIONPATH_FIELD.key, "partitionpath").
  option(TBL_NAME.key, tableName).
  option(TABLENAME.key(), tableName).
  option(INDEX_TYPE.key, "HBASE").
  option(ZKQUORUM.key, "${hbase.zookeeper.quorum}").
  option(ZKPORT.key, "2181").
  option(ZK_NODE_PATH.key, "${zooKeeper.znode.parent }").
  option("hoodie.metadata.index.column.stats.enable", "true").
  option("hoodie.embed.timeline.server", "false").
  mode(Overwrite).
save(tablePath)
``` **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : * Spark version : 3.1.1 * Hive version : 3.1.2 * Hadoop version : 3.3.0 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no **Additional context** Add any other context about the problem here. **Stacktrace**
```
org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
	at
org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411) at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118) at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211) at org.apache.hudi.org.apache.hbase.
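The stacktrace above comes down to whether the relocated (shaded) class names actually exist on the classpath. As a quick sanity check (a minimal sketch, not part of Hudi; the class and method names here are hypothetical), `Class.forName` on the relocated name reproduces the same missing-class condition when run with the bundle jar on the classpath:

```java
// Minimal check for whether a relocated (shaded) class made it into the
// classpath. Run with the hudi-spark-bundle jar on the classpath; if the
// shaded gson classes were not bundled, the relocated name will be missing.
public class ShadedClassCheck {
    static boolean isPresent(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Relocated name taken from the stacktrace above.
        String relocated = "org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder";
        System.out.println(relocated + " present: " + isPresent(relocated));
    }
}
```

An equivalent check from the shell is `jar tf hudi-spark-bundle*.jar | grep GsonBuilder`, which lists whether the shaded gson classes were packaged at all.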
[GitHub] [hudi] jsbali commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi
jsbali commented on PR #6502: URL: https://github.com/apache/hudi/pull/6502#issuecomment-1228203400 @nsivabalan Can you please review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linfey90 commented on a diff in pull request #6456: [HUDI-4674]Change the default value of inputFormat for the MOR table
linfey90 commented on code in PR #6456: URL: https://github.com/apache/hudi/pull/6456#discussion_r955794992 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/CreateHoodieTableCommand.scala: ## @@ -120,10 +119,8 @@ object CreateHoodieTableCommand { val tableType = tableConfig.getTableType.name() val inputFormat = tableType match { - case DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL => + case DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL | DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL => Review Comment: Yeah, I initially thought that Hive offline tasks are not sensitive to latency or data freshness, so it would be better to use read-optimized tables and keep the COW InputFormat as the default. But according to the latest data, reading snapshot data also works well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4724) add function of skip the _rt suffix for read snapshot
linfey.nie created HUDI-4724: Summary: add function of skip the _rt suffix for read snapshot Key: HUDI-4724 URL: https://issues.apache.org/jira/browse/HUDI-4724 Project: Apache Hudi Issue Type: Improvement Reporter: linfey.nie During Hive queries, we usually write SQL against the original table name. Therefore, we need to be able to skip the _rt suffix for the read-snapshot table, so that queries against the original name return the latest data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] linfey90 opened a new pull request, #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
linfey90 opened a new pull request, #6510: URL: https://github.com/apache/hudi/pull/6510 ### Change Logs Add an option to skip the _rt suffix for the read-snapshot table during Hive metadata sync. ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
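The existing `skip_ro_suffix` option works the same way for the read-optimized table, so the intended behavior of this PR can be sketched as follows (illustrative Java only; the class and method names are hypothetical, not the actual hive sync code):

```java
// Hypothetical sketch of how hive sync could derive the registered table
// name for the snapshot (realtime) view when skip_rt_suffix is enabled.
// This mirrors the existing skip_ro_suffix behavior for the read-optimized view.
public class TableNameResolver {
    static String snapshotTableName(String baseName, boolean skipRtSuffix) {
        return skipRtSuffix ? baseName : baseName + "_rt";
    }

    public static void main(String[] args) {
        System.out.println(snapshotTableName("orders", false)); // orders_rt
        System.out.println(snapshotTableName("orders", true));  // orders
    }
}
```

With the option enabled, Hive queries can use the original table name directly and still hit the snapshot view.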
[jira] [Updated] (HUDI-4724) add function of skip the _rt suffix for read snapshot
[ https://issues.apache.org/jira/browse/HUDI-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4724: - Labels: pull-request-available (was: ) > add function of skip the _rt suffix for read snapshot > - > > Key: HUDI-4724 > URL: https://issues.apache.org/jira/browse/HUDI-4724 > Project: Apache Hudi > Issue Type: Improvement >Reporter: linfey.nie >Priority: Major > Labels: pull-request-available > > During Hive query, we usually use the original table name to write SQL. > Therefore, we need to skip the _rt suffix for read snapshot, the latest data > for calculation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228256562 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liqiquan opened a new issue, #6511: [SUPPORT]
liqiquan opened a new issue, #6511: URL: https://github.com/apache/hudi/issues/6511 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** When the table is written in insert_overwrite_table mode, Presto reads and returns data from all versions of the parquet files. **To Reproduce** Steps to reproduce the behavior: 1. Write the Hudi table in insert_overwrite_table mode at least twice. 2. Read the table from step 1 with Presto. If the catalog is hudi, the table is read correctly. 3. Read the table from step 1 with Presto. If the catalog is hive, the versions cannot be distinguished and the data from all versions of the parquet files is read. For example, I write twice with 100 rows each time. A Presto read should return the 100 rows of the latest version, but it actually returns all 200 rows. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.11.1 * Spark version : 3.2.2 * Hive version : 2.7.3 * Hadoop version : 3.3.2 * Presto version : 0.275 * Storage (HDFS/S3/GCS..) : * Running on Docker? (yes/no) : **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
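The double-count happens because insert_overwrite_table does not immediately delete the old parquet files; it records the old file groups as replaced in the Hudi timeline (a replacecommit). A toy model of the two reader behaviors (illustrative only; this is not Presto or Hudi code, and the helper names are made up):

```java
import java.util.HashSet;
import java.util.Set;

public class OverwriteVisibilityDemo {
    // A plain parquet reader (hive catalog without Hudi awareness) sees every
    // file on storage; a Hudi-aware reader filters out the file groups that
    // the insert_overwrite_table replacecommit marked as replaced.
    static Set<String> visibleToHudiReader(Set<String> filesOnStorage, Set<String> replacedFiles) {
        Set<String> visible = new HashSet<>(filesOnStorage);
        visible.removeAll(replacedFiles);
        return visible;
    }

    public static void main(String[] args) {
        Set<String> onStorage = Set.of("fg1_commit1.parquet", "fg2_commit2.parquet");
        Set<String> replaced = Set.of("fg1_commit1.parquet"); // first write, later overwritten
        System.out.println("raw listing sees " + onStorage.size() + " files, "
            + "hudi-aware reader sees " + visibleToHudiReader(onStorage, replaced).size());
    }
}
```

This is why the hudi catalog (which consults the timeline) returns 100 rows while the hive catalog (which lists parquet files directly) returns 200.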
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228262156 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
YannByron commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228302173 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6139: [HUDI-4396] Add a boolean parameter to decide whether the partition is cascade or not when hive table columns changes
hudi-bot commented on PR #6139: URL: https://github.com/apache/hudi/pull/6139#issuecomment-1228320272 ## CI report: * 41c2a64f85fae05f3794412bb0bf668f5d1adc5c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10971) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1228320633 ## CI report: * 37785220f2d17a1a04d136521f10c3a0314fe448 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10970) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228320971 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228325589 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228325939 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6505: AwsglueSync Turn already exist error into warning
hudi-bot commented on PR #6505: URL: https://github.com/apache/hudi/pull/6505#issuecomment-1228326014 ## CI report: * 24c8b543afd26438898efff96c98c81130c9ca54 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10960) * 7f0d738cfb3460682a3690ee53ecd5d002bdd37e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
parisni commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955903772 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -799,6 +799,12 @@ private FlinkOptions() { .defaultValue(false) .withDescription("Skip the _ro suffix for Read optimized table when registering, default false"); + public static final ConfigOption HIVE_SYNC_SKIP_RT_SUFFIX = ConfigOptions + .key("hive_sync.skip_rt_suffix") + .booleanType() + .defaultValue(false) Review Comment: Could you add `sinceVersion("0.12.1")` so the docs track when this option became available? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6505: AwsglueSync Turn already exist error into warning
hudi-bot commented on PR #6505: URL: https://github.com/apache/hudi/pull/6505#issuecomment-1228330772 ## CI report: * 24c8b543afd26438898efff96c98c81130c9ca54 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10960) * 7f0d738cfb3460682a3690ee53ecd5d002bdd37e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10976) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
parisni commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955904736 ## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfigHolder.java: ## @@ -77,6 +77,10 @@ public class HiveSyncConfigHolder { .key("hoodie.datasource.hive_sync.skip_ro_suffix") .defaultValue("false") .withDocumentation("Skip the _ro suffix for Read optimized table, when registering"); + public static final ConfigProperty HIVE_SKIP_RT_SUFFIX_FOR_READ_SNAPSHOT_TABLE = ConfigProperty + .key("hoodie.datasource.hive_sync.skip_rt_suffix") + .defaultValue("false") Review Comment: same here (sinceVersion) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1228330635 ## CI report: * 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN * 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN * ee8c930fdd2e713a5d220bd6bccc13cbc41ba6a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10973) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228330379 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linfey90 commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
linfey90 commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955923516 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -799,6 +799,12 @@ private FlinkOptions() { .defaultValue(false) .withDescription("Skip the _ro suffix for Read optimized table when registering, default false"); + public static final ConfigOption HIVE_SYNC_SKIP_RT_SUFFIX = ConfigOptions + .key("hive_sync.skip_rt_suffix") + .booleanType() + .defaultValue(false) Review Comment: Yes, I'd love to, but there's no since method. Any other suggestions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linfey90 commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
linfey90 commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955923699 ## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfigHolder.java: ## @@ -77,6 +77,10 @@ public class HiveSyncConfigHolder { .key("hoodie.datasource.hive_sync.skip_ro_suffix") .defaultValue("false") .withDocumentation("Skip the _ro suffix for Read optimized table, when registering"); + public static final ConfigProperty HIVE_SKIP_RT_SUFFIX_FOR_READ_SNAPSHOT_TABLE = ConfigProperty + .key("hoodie.datasource.hive_sync.skip_rt_suffix") + .defaultValue("false") Review Comment: done it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
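The `sinceVersion` bookkeeping being requested in this thread can be modeled with a small immutable builder (this is only an illustrative sketch of the pattern, not Hudi's actual `ConfigProperty` class, and the names are made up):

```java
import java.util.Optional;

// Toy model of a config property that records the release it was introduced
// in, so documentation generators can show "since 0.12.1" next to the key.
public class ConfigSketch {
    final String key;
    final String defaultValue;
    final Optional<String> since;

    ConfigSketch(String key, String defaultValue, Optional<String> since) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.since = since;
    }

    static ConfigSketch key(String key, String defaultValue) {
        return new ConfigSketch(key, defaultValue, Optional.empty());
    }

    // Fluent setter mirroring the sinceVersion call discussed above.
    ConfigSketch sinceVersion(String version) {
        return new ConfigSketch(key, defaultValue, Optional.of(version));
    }

    public static void main(String[] args) {
        ConfigSketch skipRt = key("hoodie.datasource.hive_sync.skip_rt_suffix", "false")
            .sinceVersion("0.12.1");
        System.out.println(skipRt.key + " (since " + skipRt.since.orElse("n/a") + ")");
    }
}
```

On the Flink side, where no such method exists (as noted above), the introduction version could instead be recorded in the option's description text.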
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228382701 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228386984 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
hudi-bot commented on PR #6491: URL: https://github.com/apache/hudi/pull/6491#issuecomment-1228394692 ## CI report: * b5c6e2abaf1ada46e5a17f77934f52e9b5fd61a5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10972) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] awkwardd opened a new pull request, #6512: add in-order commit for multi-plan compaction
awkwardd opened a new pull request, #6512: URL: https://github.com/apache/hudi/pull/6512 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4725) Add in-order commit for multi compaction plan
谭亚君 created HUDI-4725: - Summary: Add in-order commit for multi compaction plan Key: HUDI-4725 URL: https://issues.apache.org/jira/browse/HUDI-4725 Project: Apache Hudi Issue Type: Improvement Components: compaction Reporter: 谭亚君 Assignee: 谭亚君 When we use multiple compaction plans, we may need the plans to commit in order, so I am trying to implement that. https://github.com/apache/hudi/pull/6512 -- This message was sent by Atlassian Jira (v8.20.10#820010)
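Because Hudi instant times are fixed-width timestamps, lexicographic order matches chronological order, so the "commit in order" idea can be sketched as sorting the pending compaction instants before committing them (an illustrative sketch only, not the actual PR code; the helper name is hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InOrderCommit {
    // Sort pending compaction instants so they are committed oldest-first.
    static List<String> commitOrder(List<String> pendingInstants) {
        List<String> ordered = new ArrayList<>(pendingInstants);
        Collections.sort(ordered); // fixed-width timestamps sort chronologically
        return ordered;
    }

    public static void main(String[] args) {
        List<String> pending = List.of("20220826103000", "20220826101500", "20220826102000");
        for (String instant : commitOrder(pending)) {
            System.out.println("commit " + instant);
        }
    }
}
```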
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228438808 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228443945 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6512: add in-order commit for multi-plan compaction
hudi-bot commented on PR #6512: URL: https://github.com/apache/hudi/pull/6512#issuecomment-1228444355 ## CI report: * 9949aa9d4a41bffa79c61bfb2869a7031279e894 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228449102 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6512: add in-order commit for multi-plan compaction
hudi-bot commented on PR #6512: URL: https://github.com/apache/hudi/pull/6512#issuecomment-1228449452 ## CI report: * 9949aa9d4a41bffa79c61bfb2869a7031279e894 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10979) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228518501 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228519040 ## CI report: * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6505: AwsglueSync Turn already exist error into warning
hudi-bot commented on PR #6505: URL: https://github.com/apache/hudi/pull/6505#issuecomment-1228605038 ## CI report: * 7f0d738cfb3460682a3690ee53ecd5d002bdd37e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10976)
[GitHub] [hudi] hudi-bot commented on pull request #6512: add inoder commit for multi plan compaction
hudi-bot commented on PR #6512: URL: https://github.com/apache/hudi/pull/6512#issuecomment-1228605133 ## CI report: * 9949aa9d4a41bffa79c61bfb2869a7031279e894 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10979)
[GitHub] [hudi] nsivabalan opened a new pull request, #6513: [HUDI-4695] Fix inline compaction flaky test
nsivabalan opened a new pull request, #6513: URL: https://github.com/apache/hudi/pull/6513 ### Change Logs Fixed the flaky InlineCompactionTest. The test had a dependency on the timer; the timeout has been bumped up so that the test is more deterministic. ### Impact Improves CI stability. **Risk level: medium** ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-4695) Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 expected: <4> but was: <5>
[ https://issues.apache.org/jira/browse/HUDI-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4695: - Labels: pull-request-available (was: ) > Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 > expected: <4> but was: <5> > -- > > Key: HUDI-4695 > URL: https://issues.apache.org/jira/browse/HUDI-4695 > Project: Apache Hudi > Issue Type: Task >Reporter: Raymond Xu >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.12.1 > > > https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=10841&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4695) Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 expected: <4> but was: <5>
[ https://issues.apache.org/jira/browse/HUDI-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4695: -- Status: Patch Available (was: In Progress)
[jira] [Updated] (HUDI-4327) TestHoodieDeltaStreamer#testCleanerDeleteReplacedDataWithArchive is flaky
[ https://issues.apache.org/jira/browse/HUDI-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4327: -- Status: In Progress (was: Open) > TestHoodieDeltaStreamer#testCleanerDeleteReplacedDataWithArchive is flaky > - > > Key: HUDI-4327 > URL: https://issues.apache.org/jira/browse/HUDI-4327 > Project: Apache Hudi > Issue Type: Task > Components: tests-ci, timeline-server > Reporter: Sagar Sumit > Assignee: sivabalan narayanan > Priority: Blocker > Fix For: 0.12.1
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228631575 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) * c06511ee9e0c8ef0e2973242e6aafd6c0ef4e59a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6513: [HUDI-4695] Fix inline compaction flaky test
hudi-bot commented on PR #6513: URL: https://github.com/apache/hudi/pull/6513#issuecomment-1228631676 ## CI report: * 0e51b201bfd85884cbdc1e90f2d794119d0eb66a UNKNOWN
[GitHub] [hudi] alexeykudinkin commented on pull request #6416: [Stacked on 6386] Fixing `DebeziumSource` to properly commit consumed offsets
alexeykudinkin commented on PR #6416: URL: https://github.com/apache/hudi/pull/6416#issuecomment-1228645231 CI is green (screenshot): https://user-images.githubusercontent.com/428277/186941224-298537df-e2f1-4e1f-98ec-ba769dff7177.png
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228683810 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978)
[GitHub] [hudi] hudi-bot commented on pull request #6513: [HUDI-4695] Fix inline compaction flaky test
hudi-bot commented on PR #6513: URL: https://github.com/apache/hudi/pull/6513#issuecomment-1228684267 ## CI report: * 0e51b201bfd85884cbdc1e90f2d794119d0eb66a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10981)
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228684161 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) * c06511ee9e0c8ef0e2973242e6aafd6c0ef4e59a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10980)
[jira] [Created] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
HunterHunter created HUDI-4726: -- Summary: When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed. Key: HUDI-4726 URL: https://issues.apache.org/jira/browse/HUDI-4726 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: HunterHunter
[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1228748950 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 59eacbed10467905643880e951b9f969a86747b9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9108) * f590033bff5a7140e68bcbeba2d48f0edcb79685 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228750506 ## CI report: * c06511ee9e0c8ef0e2973242e6aafd6c0ef4e59a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10980)
[jira] [Assigned] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterHunter reassigned HUDI-4726: -- Assignee: HunterHunter > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. > -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: HunterHunter > Assignee: HunterHunter > Priority: Major
[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1228752810 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 59eacbed10467905643880e951b9f969a86747b9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9108) * f590033bff5a7140e68bcbeba2d48f0edcb79685 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10982)
[jira] [Assigned] (HUDI-4600) Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HUDI-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4600: --- Assignee: HunterXHunter > Hive synchronization failure : Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > -- > > Key: HUDI-4600 > URL: https://issues.apache.org/jira/browse/HUDI-4600 > Project: Apache Hudi > Issue Type: Bug > Components: hive >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Blocker > > > {code:java} > 10:32:28.039 [pool-9-thread-1] ERROR > org.apache.hadoop.hive.metastore.RetryingHMSHandler - Retrying HMSHandler > after 2000 ms (attempt 1 of 10) with error: > javax.jdo.JDOFatalInternalException: Unexpected exception caught. > at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550) > at > org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:659) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:79) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:164) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133) > at > 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248) > at > org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsO
[jira] [Updated] (HUDI-3314) support merge into with no-pk condition
[ https://issues.apache.org/jira/browse/HUDI-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3314: - Epic Link: HUDI-4699 > support merge into with no-pk condition > --- > > Key: HUDI-3314 > URL: https://issues.apache.org/jira/browse/HUDI-3314 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql > Reporter: Yann Byron > Assignee: Yann Byron > Priority: Major > Fix For: 0.13.0
[jira] [Updated] (HUDI-1885) Support Delete/Update Non-Pk Table
[ https://issues.apache.org/jira/browse/HUDI-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1885: - Epic Link: HUDI-4699 > Support Delete/Update Non-Pk Table > -- > > Key: HUDI-1885 > URL: https://issues.apache.org/jira/browse/HUDI-1885 > Project: Apache Hudi > Issue Type: New Feature > Components: spark, spark-sql > Reporter: pengzhiwei > Assignee: Yann Byron > Priority: Critical > Fix For: 0.12.1 > > > Allow to delete/update a non-pk table. > {code:java} > create table h0 ( > id int, > name string, > price double > ) using hudi; > delete from h0 where id = 10; > update h0 set price = 10 where id = 12; > {code}
[jira] [Updated] (HUDI-2681) Make hoodie record_key and preCombine_key optional
[ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2681: - Component/s: spark-sql writer-core Epic Link: HUDI-4699 > Make hoodie record_key and preCombine_key optional > -- > > Key: HUDI-2681 > URL: https://issues.apache.org/jira/browse/HUDI-2681 > Project: Apache Hudi > Issue Type: New Feature > Components: Common Core, spark-sql, writer-core > Reporter: Vinoth Govindarajan > Assignee: Yann Byron > Priority: Major > > At present, Hudi needs a record key and a preCombine key to create a Hudi > dataset, which puts a restriction on the kinds of datasets we can create > using Hudi. > > In order to increase the adoption of the Hudi file format across all kinds of > derived datasets, similar to Parquet/ORC, we need to offer flexibility to > users. I understand that the record key is used for the upsert primitive and we need a > preCombine key to break ties and deduplicate, but there are event data and > other datasets without any primary key (append-only datasets), which can > benefit from Hudi since the Hudi ecosystem offers other features such as snapshot > isolation, indexes, clustering, delta streamer etc., which could be applied > to any dataset without a record key. > > The idea of this proposal is to make both the record key and the preCombine key > optional to allow a variety of new use cases on top of Hudi.
[jira] [Created] (HUDI-4727) Direct conversion from Proto Message to Row
Timothy Brown created HUDI-4727: --- Summary: Direct conversion from Proto Message to Row Key: HUDI-4727 URL: https://issues.apache.org/jira/browse/HUDI-4727 Project: Apache Hudi Issue Type: New Feature Reporter: Timothy Brown The initial implementation for the Proto source converts from Message to Avro to Row in the SourceFormatAdapter when the source needs to be read as a Dataset. Let's remove the intermediate Avro representation and convert directly from Message to Row.
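The idea in HUDI-4727 can be sketched abstractly. This is a minimal illustrative Python sketch, not Hudi's actual SourceFormatAdapter code; the `to_avro`, `avro_to_row`, and `message_to_row_direct` names are hypothetical, and the Message/Avro/Row types are modeled as plain dicts and tuples.

```python
# Hypothetical sketch of the HUDI-4727 proposal: skip the intermediate Avro hop.
# A proto Message is modeled as a dict; "Avro record" and "Row" are simplified.

def to_avro(message: dict) -> dict:
    # Current path, step 1: proto Message -> Avro record (modeled as a dict copy).
    return dict(message)

def avro_to_row(avro_record: dict) -> tuple:
    # Current path, step 2: Avro record -> Row (modeled as a tuple of field values).
    return tuple(avro_record[k] for k in sorted(avro_record))

def message_to_row_direct(message: dict) -> tuple:
    # Proposed path: convert the Message straight to a Row, no Avro in between.
    return tuple(message[k] for k in sorted(message))

msg = {"id": 1, "name": "hudi"}
# Both paths must yield the same Row; the direct path just does less work.
assert avro_to_row(to_avro(msg)) == message_to_row_direct(msg)
```

The point of the change is not the result (which must be identical) but dropping the cost of building and then discarding the intermediate Avro representation for every record.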
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected behavior should be like exp3. {code} The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file read ends up being t3.parquet's successor t6.parquet instead of t3.parquet. 
was: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. -- The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet.{code} > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: HunterXHunter > Assignee: HunterXHunter > Priority: Major
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. -- The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet.{code} > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: HunterXHunter > Assignee: HunterXHunter > Priority: Major
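The requested behavior above can be sketched in a few lines. This is a minimal Python sketch of the idea, not Hudi's `IncrementalInputSplits` code; commit ordering is modeled with plain string comparison and `commits_in_range` is a hypothetical helper.

```python
# Hypothetical sketch of the fix HUDI-4726 asks for: when read.start-commit
# falls before the earliest instant on the timeline, clamp it to the earliest
# instant instead of degrading to a full table scan (which reads only the
# latest file slice and thus returns the wrong snapshot).

def commits_in_range(timeline, start_commit, end_commit):
    if start_commit < timeline[0]:
        # Clamp the out-of-range start instead of triggering fullTableScan.
        start_commit = timeline[0]
    return [t for t in timeline if start_commit <= t <= end_commit]

timeline = ["t1", "t2", "t3", "t4", "t5", "t6"]
# start-commit "0" is out of range; the clamped query matches exp3 above.
assert commits_in_range(timeline, "0", "t3") == commits_in_range(timeline, "t1", "t3")
```

With the clamp, an out-of-range start commit behaves like "read from the beginning of the retained timeline", which is what exp3 in the reproduction expects.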
[GitHub] [hudi] xushiyan commented on pull request #6513: [HUDI-4695] Fix inline compaction flaky test
xushiyan commented on PR #6513: URL: https://github.com/apache/hudi/pull/6513#issuecomment-1228832977 can you please separate the cli feature from this pr?
[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1228857265 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * f590033bff5a7140e68bcbeba2d48f0edcb79685 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10982)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields
alexeykudinkin commented on code in PR #6017: URL: https://github.com/apache/hudi/pull/6017#discussion_r956400078 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala: ## @@ -162,11 +163,11 @@ object HoodieSparkUtils extends SparkAdapterSupport { if (rows.isEmpty) { Iterator.empty } else { +val readerAvroSchema = new Schema.Parser().parse(readerAvroSchemaStr) val transform: GenericRecord => GenericRecord = if (sameSchema) identity else { -val readerAvroSchema = new Schema.Parser().parse(readerAvroSchemaStr) -rewriteRecord(_, readerAvroSchema) Review Comment: BTW, one miss for the new API is that previously `rewriteRecord` was validating that the record adheres to the new schema while the new method doesn't do that (this obscures the issues when conversion is not following Avro evolution rules) Review Comment: @xiarixiaoyao since we're changing this, shall we also revisit all the other places that use `rewriteRecord` and consider rebasing them onto the new methods?
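The validation the first review comment says was lost can be sketched as follows. This is an illustrative Python sketch, not Hudi's Avro-based `rewriteRecord`; the schema is modeled as a simple name-to-nullability map, and the `rewrite_record` helper is hypothetical.

```python
# Hypothetical sketch of record rewriting that validates the result against the
# reader schema, so schema drift that violates Avro-style evolution rules fails
# loudly instead of silently producing a malformed record.

def rewrite_record(record: dict, reader_schema: dict) -> dict:
    # reader_schema: field name -> nullable? (True means a null value is legal)
    out = {name: record.get(name) for name in reader_schema}
    for name, nullable in reader_schema.items():
        if out[name] is None and not nullable:
            raise ValueError(f"field '{name}' missing and not nullable in reader schema")
    return out

schema = {"id": False, "note": True}
# A nullable field missing from the source is filled with null; that is legal.
assert rewrite_record({"id": 7}, schema) == {"id": 7, "note": None}
# A non-nullable field missing from the source must be rejected.
try:
    rewrite_record({"note": "x"}, schema)
    raise AssertionError("expected a validation failure")
except ValueError:
    pass
```

The design point matches the comment: performing the check at rewrite time surfaces incompatible conversions at the record where they occur, rather than later in downstream readers.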
[hudi] branch master updated (11f85d1efb -> 797e7a67a9)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 11f85d1efb Revert "[HUDI-3669] Add a remote request retry mechanism for 'Remotehoodietablefiles… (#5884)" (#6501) add 797e7a67a9 [Stacked on 6386] Fixing `DebeziumSource` to properly commit offsets; (#6416) No new revisions were added by this update. Summary of changes: .../org/apache/hudi/utilities/deltastreamer/DeltaSync.java | 1 + .../hudi/utilities/deltastreamer/HoodieDeltaStreamer.java | 2 +- .../apache/hudi/utilities/sources/debezium/DebeziumSource.java | 8 .../apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java | 10 +++--- 4 files changed, 17 insertions(+), 4 deletions(-)
[GitHub] [hudi] nsivabalan merged pull request #6416: [Stacked on 6386] Fixing `DebeziumSource` to properly commit consumed offsets
nsivabalan merged PR #6416: URL: https://github.com/apache/hudi/pull/6416
[GitHub] [hudi] hudi-bot commented on pull request #6472: [HUDI-4549] Remove Avro shading from hudi-hive-sync-bundle
hudi-bot commented on PR #6472: URL: https://github.com/apache/hudi/pull/6472#issuecomment-1228883212 ## CI report: * faecb216bdeb30a459040846bb9a5167556fd605 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10879) * 02329634ac100d362b6e9fa714faaad3e27298f4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6472: [HUDI-4549] Remove Avro shading from hudi-hive-sync-bundle
hudi-bot commented on PR #6472: URL: https://github.com/apache/hudi/pull/6472#issuecomment-1228886953 ## CI report: * faecb216bdeb30a459040846bb9a5167556fd605 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10879) * 02329634ac100d362b6e9fa714faaad3e27298f4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10984) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Assigned] (HUDI-4721) Fix thread safety w/ RemoteTableFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4721: - Assignee: sivabalan narayanan > Fix thread safety w/ RemoteTableFileSystemView > --- > > Key: HUDI-4721 > URL: https://issues.apache.org/jira/browse/HUDI-4721 > Project: Apache Hudi > Issue Type: Test > Components: reader-core, writer-core >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.12.1 > > > After retry mechanism was added to RemoteTableFileSystemView, looks like the > code is not thread safe. > > [https://github.com/apache/hudi/pull/5884/files#diff-0d301525ef388eb460372ea300c827728c954fdda799adfce7040158ec8b1d84R183|https://github.com/apache/hudi/pull/5884/files#r955363946] > > This might impact regular flows as well even if no retries are enabled. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-4721) Fix thread safety w/ RemoteTableFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4721. - Resolution: Fixed
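HUDI-4721 reports the hazard (retry bookkeeping in `RemoteHoodieTableFileSystemView` is not thread safe) but the digest does not include the fix. A generic, hypothetical sketch of the usual remedy, guarding shared mutable retry state with a lock — all names here are illustrative, not Hudi's:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GuardedRetryClient {
    private int attempts = 0;                 // shared mutable retry bookkeeping
    private final Object lock = new Object();

    // Without the lock, concurrent callers can interleave read-modify-write
    // on `attempts` and lose updates; with it, every increment is observed.
    public int request() {
        synchronized (lock) {
            attempts++;
            return attempts;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedRetryClient client = new GuardedRetryClient();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.submit(client::request);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(client.request() - 1); // 1000 with the lock; may be less without it
    }
}
```

The JIRA's note that "this might impact regular flows even if no retries are enabled" follows from the same point: once the shared state exists, any concurrent access path needs the guard.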
[jira] [Created] (HUDI-4728) Add support to skip larger log blocks with minor log compaction
sivabalan narayanan created HUDI-4728: - Summary: Add support to skip larger log blocks with minor log compaction Key: HUDI-4728 URL: https://issues.apache.org/jira/browse/HUDI-4728 Project: Apache Hudi Issue Type: Improvement Components: compaction Reporter: sivabalan narayanan Is there a size threshold to exclude big log blocks? Why do log compaction on log blocks that are already big enough? Thoughts: Good point. For the initial version we want to target all the blocks. In coming iterations I will include a block-size threshold as well; the current logic of AbstractHoodieLogRecordReader should be able to handle it. For streaming workloads this might be very heavy, so we need to support this.
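The block-size threshold floated in the JIRA amounts to a filter over candidate log blocks. A hypothetical, JDK-only sketch (real candidate selection would inspect log-file/block metadata rather than bare sizes):

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogCompactionCandidates {
    /**
     * Keep only blocks at or below the threshold for minor log compaction;
     * blocks already large enough are skipped and left for full compaction.
     */
    static List<Long> selectBlocks(List<Long> blockSizesBytes, long thresholdBytes) {
        return blockSizesBytes.stream()
                .filter(size -> size <= thresholdBytes)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical 1 MB threshold over a mix of small and large blocks.
        System.out.println(selectBlocks(List.of(200_000L, 5_000_000L, 800_000L), 1_000_000L)); // [200000, 800000]
    }
}
```

The point of the threshold is exactly the streaming concern raised above: rewriting already-large blocks buys little read-side benefit while costing significant write amplification.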
[GitHub] [hudi] hudi-bot commented on pull request #6472: [HUDI-4549] Remove Avro shading from hudi-hive-sync-bundle
hudi-bot commented on PR #6472: URL: https://github.com/apache/hudi/pull/6472#issuecomment-1229002706 ## CI report: * 02329634ac100d362b6e9fa714faaad3e27298f4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10984) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3217: - Due Date: 4/Sep/22 (was: 30/Sep/22) > RFC-46: Optimize Record Payload handling > > > Key: HUDI-3217 > URL: https://issues.apache.org/jira/browse/HUDI-3217 > Project: Apache Hudi > Issue Type: Epic > Components: storage-management, writer-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: hudi-umbrellas, pull-request-available > Fix For: 0.13.0 > > > Currently Hudi is biased t/w assumption of particular payload representation > (Avro), long-term we would like to steer away from this to keep the record > payload be completely opaque, so that > # We can keep record payload representation engine-specific > # Avoid unnecessary serde loops (Engine-specific > Avro > Engine-specific > > Binary) > h2. *Proposal* > > *Phase 2: Revisiting Record Handling* > {_}T-shirt{_}: 2-2.5 weeks > {_}Goal{_}: Avoid tight coupling with particular record representation on the > Read Path (currently Avro) and enable > * Revisit RecordPayload APIs > ** Deprecate {{getInsertValue}} and {{combineAndGetUpdateValue}} APIs > replacing w/ new “opaque” APIs (not returning Avro payloads) > ** Rebase RecordPayload hierarchy to be engine-specific: > *** Common engine-specific base abstracting common functionality (Spark, > Flink, Java) > *** Each feature-specific semantic will have to implement for all engines > ** Introduce new APIs > *** To access keys (record, partition) > *** To convert record to Avro (for BWC) > * Revisit RecordPayload handling > ** In WriteHandles > *** API will be accepting opaque RecordPayload (no Avro conversion) > *** Can do (opaque) record merging if necessary > *** Passes RP as is to FileWriter > ** In FileWriters > *** Will accept RecordPayload interface > *** Should be engine-specific (to handle internal record representation > ** In RecordReaders > *** API will be providing opaque 
RecordPayload (no Avro conversion) > > REF > [https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680] >
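The RFC's direction — deprecating `getInsertValue`/`combineAndGetUpdateValue` in favor of opaque, engine-specific record APIs — could look roughly like the sketch below. All names here are hypothetical; the proposal text only outlines the shape of the new APIs:

```java
public class OpaqueRecordSketch {
    // Hypothetical opaque-record API: keys are accessible, but the payload
    // stays in the engine's native representation and is never forced through Avro.
    interface OpaqueHoodieRecord<T> {
        String getRecordKey();
        String getPartitionPath();
        T getData();                                          // engine-native representation
        OpaqueHoodieRecord<T> merge(OpaqueHoodieRecord<T> older); // replaces combineAndGetUpdateValue
    }

    // Minimal engine-specific implementation; a plain String stands in for
    // a Spark/Flink/Java-native row type.
    static class StringRecord implements OpaqueHoodieRecord<String> {
        private final String key, partition, data;
        StringRecord(String key, String partition, String data) {
            this.key = key;
            this.partition = partition;
            this.data = data;
        }
        public String getRecordKey() { return key; }
        public String getPartitionPath() { return partition; }
        public String getData() { return data; }
        public OpaqueHoodieRecord<String> merge(OpaqueHoodieRecord<String> older) {
            return this; // newer record wins; real semantics would be feature-specific per the proposal
        }
    }

    public static void main(String[] args) {
        OpaqueHoodieRecord<String> merged =
            new StringRecord("k1", "p1", "new").merge(new StringRecord("k1", "p1", "old"));
        System.out.println(merged.getData()); // new
    }
}
```

This is what lets WriteHandles and FileWriters in Phase 2 pass the payload through "as is": merging and writing operate on the interface, so no `Engine-specific > Avro > Engine-specific` serde loop is required.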
[GitHub] [hudi] yihua commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries
yihua commented on issue #6511: URL: https://github.com/apache/hudi/issues/6511#issuecomment-1229030208 @liqiquan Did you use `INSERT OVERWRITE TABLE` in Spark SQL to write the Hudi table? How did you create the table? Is the Hudi table synced to Hive for Presto to query?
[GitHub] [hudi] yihua commented on issue #6509: [SUPPORT] HBase connection closed exception
yihua commented on issue #6509: URL: https://github.com/apache/hudi/issues/6509#issuecomment-1229038120 @xicm we have to shade the HBase classes to be compatible with the Hive query engine, which introduces HBase classes as well. Does changing all relevant class names to the shaded pattern in `hudi-common/src/main/resources/hbase-site.xml` work for you?
[GitHub] [hudi] yihua commented on issue #6504: [SUPPORT]
yihua commented on issue #6504: URL: https://github.com/apache/hudi/issues/6504#issuecomment-1229041737 @santoshraj123 Could you upload the complete Spark driver log? Do you see any error logs before `Commit 20220823151531894 failed and rolled-back !`, specifically something like `Delta Sync found errors when writing. Errors/Total=` and `Printing out the top 100 errors`?
[jira] [Updated] (HUDI-4582) Sync 11w partitions to hive by using HiveSyncTool with(--sync-mode="hms" and use-jdbc=false) with timeout
[ https://issues.apache.org/jira/browse/HUDI-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4582: - Reviewers: Raymond Xu (was: sivabalan narayanan) > Sync 11w partitions to hive by using HiveSyncTool with(--sync-mode="hms" and > use-jdbc=false) with timeout > - > > Key: HUDI-4582 > URL: https://issues.apache.org/jira/browse/HUDI-4582 > Project: Apache Hudi > Issue Type: Improvement > Components: meta-sync >Reporter: XixiHua >Assignee: XixiHua >Priority: Blocker > Labels: pull-request-available > Fix For: 0.12.1 > > > when we try to sync 11w partitions to hive by using > HiveSyncTool(--sync-mode="hms" and use-jdbc=false) with timeout error. > > With https://issues.apache.org/jira/browse/HUDI-2116, this only solved > --sync-mode = jdbc with the parameter: HIVE_BATCH_SYNC_PARTITION_NUM, and I > want to extend this to hms mode. >
[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1229042889 ## CI report: * 473a8b74676e345ee91093a3fe9885e062ca UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] yihua commented on issue #6503: [SUPPORT] Hudi Merge Into with larger volume
yihua commented on issue #6503: URL: https://github.com/apache/hudi/issues/6503#issuecomment-1229045695 @maduraitech could you provide your `MERGE INTO` SQL statement, assuming you're using Spark SQL?
[GitHub] [hudi] xushiyan commented on pull request #6240: [HUDI-4482] remove guava and use caffeine instead for cache
xushiyan commented on PR #6240: URL: https://github.com/apache/hudi/pull/6240#issuecomment-1229045983 hey @KnightChess a gentle reminder: 1) guava dependency cleanup from hadoop-mr and spark bundles as shown above. 2) a separate PR to fix the styles in integ test module. thanks!
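For context on the guava-to-Caffeine swap in PR #6240: Caffeine builds a bounded cache via its builder (e.g. `Caffeine.newBuilder().maximumSize(n).build()`). As a dependency-free illustration of the LRU semantics such a cache provides, here is a sketch on plain JDK `LinkedHashMap` (class name is illustrative, not from the PR):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, /* accessOrder = */ true);  // access order gives LRU semantics
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least-recently-used entry past capacity
    }

    public static void main(String[] args) {
        BoundedLruCache<String, Integer> cache = new BoundedLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // touch "a" so "b" becomes the eldest
        cache.put("c", 3);  // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Unlike this sketch, Caffeine adds thread safety, expiry, and size/weight policies out of the box, which is what makes it a drop-in improvement over a hand-rolled or guava-based cache.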
[GitHub] [hudi] yihua commented on issue #6479: [SUPPORT] How to query the previous SNAPSHOT in Hive
yihua commented on issue #6479: URL: https://github.com/apache/hudi/issues/6479#issuecomment-1229050279 @china-shang If I'm not wrong, time travel query is not supported on the Hive query engine. The incremental query is supported on Hive: https://hudi.apache.org/docs/querying_data#incremental-query-1. You may try setting `fromCommitTime=0` and `maxCommits=` to approximate what you need. cc @nsivabalan
[GitHub] [hudi] yihua commented on issue #6469: [SUPPORT] z-order is not working
yihua commented on issue #6469: URL: https://github.com/apache/hudi/issues/6469#issuecomment-1229053212 @sangeethsasidharan could you share the Hudi timeline, i.e., file listing under `mys3path/.hoodie`? Is clustering scheduled and executed?
[jira] [Assigned] (HUDI-4727) Direct conversion from Proto Message to Row
[ https://issues.apache.org/jira/browse/HUDI-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongkyun Lee reassigned HUDI-4727: -- Assignee: Yongkyun Lee > Direct conversion from Proto Message to Row > --- > > Key: HUDI-4727 > URL: https://issues.apache.org/jira/browse/HUDI-4727 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Timothy Brown >Assignee: Yongkyun Lee >Priority: Minor > > The initial implementation for the Proto source converts from Message to Avro > to Row in the SourceFormatAdapter when the source needs to be read as a > Dataset. Let's remove the intermediate Avro representation and convert > directly from Message to Row.
[GitHub] [hudi] functicons opened a new issue, #6514: [SUPPORT] Creating table with SparkSQL fails with FileNotFoundException
functicons opened a new issue, #6514: URL: https://github.com/apache/hudi/issues/6514 **Describe the problem you faced** I'm trying to create a new table with SparkSQL in spark-shell: ``` spark.sql("""create table test8(id int,name string) using hudi options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test8'""") ``` The error is really confusing to me, why does Hudi expect the path to exist in advance? ``` java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test8 at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1541) at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50) at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3369) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3368) at org.apache.spark.sql.Dataset.(Dataset.scala:194) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643) ... 45 elided ``` **To Reproduce** Steps to reproduce the behavior: Spark 2.4.8, Scala 2.12, Hudi 2.12:0.11.1 ``` $ spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.12:0.11.1 --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" scala> spark.sql("""create table test9(id int,name string) using hudi options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test9'""") ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test9 at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1528) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1521) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1536) at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50) at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194) at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369) at org.apache.spark.sql.Dataset.(Dataset.scala:194) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.Spa
[GitHub] [hudi] nsivabalan commented on a diff in pull request #5958: [HUDI-3900] [UBER] Support log compaction action for MOR tables
nsivabalan commented on code in PR #5958: URL: https://github.com/apache/hudi/pull/5958#discussion_r956461386 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java: ## @@ -163,6 +183,21 @@ public class HoodieCompactionConfig extends HoodieConfig { + "record size estimate compute dynamically based on commit metadata. " + " This is critical in computing the insert parallelism and bin-packing inserts into small files."); + public static final ConfigProperty ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES = ConfigProperty + .key("hoodie.archive.merge.small.file.limit.bytes") Review Comment: these are already in HoodieArchivalConfig right? did you move it here or added new ones ? ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -240,8 +245,9 @@ private Pair> getFilesToCleanKeepingLatestVersions( int keepVersions = config.getCleanerFileVersionsRetained(); // do not cleanup slice required for pending compaction Iterator fileSliceIterator = - fileGroup.getAllFileSlices().filter(fs -> !isFileSliceNeededForPendingCompaction(fs)).iterator(); - if (isFileGroupInPendingCompaction(fileGroup)) { + fileGroup.getAllFileSlices().filter(fs -> !isFileSliceNeededForPendingCompaction(fs) + && !isFileSliceNeededForPendingLogCompaction(fs)).iterator(); + if (isFileGroupInPendingCompaction(fileGroup) || isFileGroupInPendingLogCompaction(fileGroup)) { Review Comment: same here. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/HoodieCompactionPlanGenerator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.table.action.compact.plan.generators; + +import org.apache.hudi.avro.model.HoodieCompactionOperation; +import org.apache.hudi.avro.model.HoodieCompactionPlan; +import org.apache.hudi.common.data.HoodieAccumulator; +import org.apache.hudi.common.engine.HoodieEngineContext; +import org.apache.hudi.common.fs.FSUtils; +import org.apache.hudi.common.model.CompactionOperation; +import org.apache.hudi.common.model.HoodieBaseFile; +import org.apache.hudi.common.model.HoodieFileGroupId; +import org.apache.hudi.common.model.HoodieLogFile; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.common.table.view.SyncableFileSystemView; +import org.apache.hudi.common.util.CompactionUtils; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.ValidationUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.table.HoodieTable; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; + +import java.io.IOException; +import java.util.List; +import java.util.Set; +import java.util.stream.Collectors; + +import static java.util.stream.Collectors.toList; + +public class HoodieCompactionPlanGenerator extends BaseHoodieCompactionPlanGenerator { + + private static final Logger LOG = 
LogManager.getLogger(HoodieCompactionPlanGenerator.class); + + public HoodieCompactionPlanGenerator(HoodieTable table, HoodieEngineContext engineContext, HoodieWriteConfig writeConfig) { +super(table, engineContext, writeConfig); + } + + /** + * Generate a new compaction plan for scheduling. + * @return Compaction Plan + * @throws java.io.IOException when encountering errors + */ + @Override + public HoodieCompactionPlan generateCompactionPlan() throws IOException { Review Comment: I assume this is just moved w/o any changes. let me know if you had changed anything in these code blocks. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -456,6 +491,23 @@ public List close() { } } + public void write(Map> recordMap) { +Iterator keyIterator = recordMap.keySet().stream().iterator(); Review Comment: can't we iterate the entries only rather than just keys?
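On the last review comment about `HoodieAppendHandle.write(Map<...>)`: iterating `keySet()` forces an extra `get` lookup per element, while iterating `entrySet()` reads key and value together in one pass. A minimal JDK illustration (names hypothetical, not from the PR):

```java
import java.util.HashMap;
import java.util.Map;

public class EntryIteration {
    // Key iteration: one extra hash lookup per element.
    static long sumViaKeys(Map<String, Long> sizes) {
        long total = 0;
        for (String key : sizes.keySet()) {
            total += sizes.get(key);   // redundant lookup the entry set avoids
        }
        return total;
    }

    // Entry iteration: key and value come from the same traversal, as the review suggests.
    static long sumViaEntries(Map<String, Long> sizes) {
        long total = 0;
        for (Map.Entry<String, Long> entry : sizes.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>(Map.of("f1", 10L, "f2", 32L));
        // Same result either way, but entrySet() skips the per-key get().
        System.out.println(sumViaKeys(sizes) == sumViaEntries(sizes)); // true
    }
}
```

Both loops are O(n), but the keySet variant pays an avoidable constant factor per element, which adds up when the map holds one record per key in a write path.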
[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1229065946 ## CI report: * Unknown: [CANCELED](TBD) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build