[jira] [Created] (HUDI-4723) Add document about Hoodie Catalog
Danny Chen created HUDI-4723: Summary: Add document about Hoodie Catalog Key: HUDI-4723 URL: https://issues.apache.org/jira/browse/HUDI-4723 Project: Apache Hudi Issue Type: Task Components: docs Reporter: Danny Chen Fix For: 0.12.1, 0.12.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] YuweiXiao commented on a diff in pull request #6493: [HUDI-4715] Needed To ReSync Hive In StreamWriteOperatorCoordinator's…
YuweiXiao commented on code in PR #6493:
URL: https://github.com/apache/hudi/pull/6493#discussion_r955729960

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java:
@@ -396,7 +396,11 @@ private void initInstant(String instant) {
       reset();
     } else {
       LOG.info("Recommit instant {}", instant);
-      commitInstant(instant);
+      boolean success = commitInstant(instant);
+      if (success) {
+        LOG.info("instant {} ReSync Hive", instant);
+        syncHive();
+      }

Review Comment:
Hi Danny, want to bring up another topic here. If the commit is not successful, e.g., when the last batch has no data, we will reuse the instant of the last batch while also starting a new instant. This then leads to an inconsistent ckp_meta. We encountered this issue in our internal branch. I could fire up a fix for this if necessary.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
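The guard in the diff above can be sketched as a standalone example. `commitInstant`, `syncHive`, and `recommit` here are simplified, hypothetical stand-ins for the coordinator's methods, not the real `StreamWriteOperatorCoordinator` API; the only behavior illustrated is that a Hive resync happens strictly after a successful recommit.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the recommit flow proposed in the diff: trigger a Hive
// resync only when the recommit actually succeeded. Names are illustrative.
public class RecommitSketch {
    final List<String> actions = new ArrayList<>();
    final boolean commitSucceeds;

    RecommitSketch(boolean commitSucceeds) {
        this.commitSucceeds = commitSucceeds;
    }

    boolean commitInstant(String instant) {
        actions.add("commit:" + instant);
        return commitSucceeds; // e.g. false when the last batch had no data
    }

    void syncHive() {
        actions.add("syncHive");
    }

    void recommit(String instant) {
        boolean success = commitInstant(instant);
        if (success) {
            // Re-sync Hive metadata only after a confirmed commit.
            syncHive();
        }
    }

    public static void main(String[] args) {
        RecommitSketch ok = new RecommitSketch(true);
        ok.recommit("20220826T0001");
        RecommitSketch empty = new RecommitSketch(false);
        empty.recommit("20220826T0002");
        System.out.println(ok.actions);    // [commit:20220826T0001, syncHive]
        System.out.println(empty.actions); // [commit:20220826T0002]
    }
}
```

Note that the case YuweiXiao raises (an unsuccessful commit whose instant is reused while a new instant is started) falls through the `if`, so nothing in this guard addresses the inconsistent ckp_meta he describes.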
[jira] [Updated] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-3983: - Description: I ran a spark job and encountered several ClassNotFoundExceptions. The spark version is 3.1 and the scala version is 2.12.
1.
{code:java}
java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier$Kind
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.translateException(RpcRetryingCallerImpl.java:222)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:195)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:395)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108)
{code}
Including org.apache.hbase:hbase-protocol in packaging/hudi-spark-bundle/pom.xml solves this error.
2.
{code:java}
java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
Including org.apache.hbase.thirdparty:hbase-shaded-gson in packaging/hudi-spark-bundle/pom.xml solves this error.
3.
{code:java}
java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
{code}
There is a configuration in hbase-site.xml:
{code:java}
<property>
  <name>hbase.status.listener.class</name>
  <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  <description>Implementation of the status listener with a multicast message.</description>
</property>
{code}
I set _*hbase.status.listener.class*_ to _*org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener*_ in the hbase configuration; the ClassNotFoundException was resolved, but I get another exception:
{code:java}
org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411)
	at
org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118) at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) a
[jira] [Commented] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585193#comment-17585193 ] xi chaomin commented on HUDI-3983: -- Are there known conflicts with the hbase dependencies? When I remove the hbase relocations, the job succeeds. Can we remove these relocations from the spark bundle?
{code:java}
<relocation>
  <pattern>org.apache.hadoop.hbase.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.hadoop.hbase.</shadedPattern>
  <excludes>
    <exclude>org.apache.hadoop.hbase.KeyValue$KeyComparator</exclude>
  </excludes>
</relocation>
<relocation>
  <pattern>org.apache.hbase.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.hbase.</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.htrace.</pattern>
  <shadedPattern>org.apache.hudi.org.apache.htrace.</shadedPattern>
</relocation>
{code}
> ClassNotFoundException when using hudi-spark-bundle to write table with hbase
> index
> ---
>
> Key: HUDI-3983
> URL: https://issues.apache.org/jira/browse/HUDI-3983
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: xi chaomin
> Priority: Critical
> Fix For: 0.12.1
>
>
> I ran a spark job and encountered several ClassNotFoundExceptions. The spark
> version is 3.1 and the scala version is 2.12.
> 1.
> {code:java}
> java.lang.NoClassDefFoundError:
> org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier$Kind
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.translateException(RpcRetryingCallerImpl.java:222)
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:195)
> at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:395)
> at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108)
> {code}
> Including org.apache.hbase:hbase-protocol in
> packaging/hudi-spark-bundle/pom.xml solves this error.
> 2.
> {code:java}
> java.lang.ClassNotFoundException:
> org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) {code}
> Including org.apache.hbase.thirdparty:hbase-shaded-gson in
> packaging/hudi-spark-bundle/pom.xml solves this error.
> 3.
> {code:java}
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not
> found {code}
> There is a configuration in hbase-site.xml:
> {code:java}
> <property>
>   <name>hbase.status.listener.class</name>
>   <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
>   <description>Implementation of the status listener with a multicast message.</description>
> </property>
> {code}
> I set _*hbase.status.listener.class*_ to
> _*org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener*_
> in the hbase configuration; the ClassNotFoundException was resolved, but I get
> another exception:
> {code:java}
> org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=1, exceptions:
> 2022-08-26T07:12:57.603Z,
> RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100,
> maxAttempts=1},
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException:
> Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local
> exception:
> org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException:
> Connection closed
> at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
> at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: > Call to address=host-10-19-37-172/10.19.37.172:16020 failed on local > exception: > org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: > Connection closed > at > org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214) > at > org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384) > at > org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89) > at > org.apache.hudi.org.apache.hado
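The two bundle fixes described in the issue above (adding `hbase-protocol` and `hbase-shaded-gson` to the spark bundle pom) would look roughly like the following. This is a sketch only: `${hbase.version}` and `${hbase.thirdparty.version}` are placeholder properties and the exact coordinates must match the HBase version the Hudi build targets.

```xml
<!-- Sketch: bundle the missing HBase artifacts so the relocated (shaded)
     classes resolve at runtime. Version properties are illustrative
     placeholders, not taken from the actual Hudi build files. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-protocol</artifactId>
  <version>${hbase.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase.thirdparty</groupId>
  <artifactId>hbase-shaded-gson</artifactId>
  <version>${hbase.thirdparty.version}</version>
</dependency>
```

For dependencies added this way, the bundle's shade relocations must also rewrite the packages consistently, otherwise the shaded references (e.g. `org.apache.hudi.org.apache.hbase.thirdparty.*`) still fail to resolve.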
[GitHub] [hudi] danny0405 opened a new pull request, #6508: [HUDI-4723] Add document about Hoodie Catalog
danny0405 opened a new pull request, #6508: URL: https://github.com/apache/hudi/pull/6508 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-4723) Add document about Hoodie Catalog
[ https://issues.apache.org/jira/browse/HUDI-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4723: - Labels: pull-request-available (was: ) > Add document about Hoodie Catalog > - > > Key: HUDI-4723 > URL: https://issues.apache.org/jira/browse/HUDI-4723 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0, 0.12.1 > >
[jira] [Updated] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-3983: - Description: I ran a spark job and encountered several ClassNotFoundExceptions. The spark version is 3.1 and the scala version is 2.12.
1.
{code:java}
java.lang.NoClassDefFoundError: org/apache/hudi/org/apache/hadoop/hbase/protobuf/generated/AuthenticationProtos$TokenIdentifier$Kind
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.translateException(RpcRetryingCallerImpl.java:222)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:195)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:395)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:108)
{code}
Including org.apache.hbase:hbase-protocol in packaging/hudi-spark-bundle/pom.xml solves this error.
2.
{code:java}
java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
Including org.apache.hbase.thirdparty:hbase-shaded-gson in packaging/hudi-spark-bundle/pom.xml solves this error.
3.
{code:java}
java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not found
{code}
There is a configuration in hbase-site.xml:
{code:java}
<property>
  <name>hbase.status.listener.class</name>
  <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  <description>Implementation of the status listener with a multicast message.</description>
</property>
{code}
I set _*hbase.status.listener.class*_ to _*org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener*_ in the hbase configuration; the ClassNotFoundException was resolved, but I get another exception:
{code:java}
org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118)
at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) at org.apache.hudi.org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) at org.apache.hudi.org.apa
[GitHub] [hudi] boneanxs commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
boneanxs commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r955765271 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -131,6 +161,53 @@ public abstract HoodieData performClusteringWithRecordsRDD(final Ho final Map strategyParams, final Schema schema, final List fileGroupIdList, final boolean preserveHoodieMetadata); + protected HoodieData performRowWrite(Dataset inputRecords, Map parameters) { +String uuid = UUID.randomUUID().toString(); +parameters.put(HoodieWriteConfig.BULKINSERT_ROW_IDENTIFY_ID.key(), uuid); +try { + inputRecords.write() + .format("hudi") + .options(JavaConverters.mapAsScalaMapConverter(parameters).asScala()) + .mode(SaveMode.Append) + .save(getWriteConfig().getBasePath()); Review Comment: I see, yeah, this is a good improvement, will change it
[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1228179181 ## CI report: * 473a8b74676e345ee91093a3fe9885e062ca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10969) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6486: [HUDI-4706] Fix InternalSchemaChangeApplier#applyAddChange error to add nest type
hudi-bot commented on PR #6486: URL: https://github.com/apache/hudi/pull/6486#issuecomment-1228179560 ## CI report: * 9d687afca94b7bfcc592c69cfebd73eb846b3b70 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10967) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6500: [HUDI-4720] Fix HoodieInternalRow return wrong num of fields when sou…
hudi-bot commented on PR #6500: URL: https://github.com/apache/hudi/pull/6500#issuecomment-1228179685 ## CI report: * 2d75af2a075741142bbfd4b6f50e541661e55bdd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10968) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] danny0405 merged pull request #6508: [HUDI-4723] Add document about Hoodie Catalog
danny0405 merged PR #6508: URL: https://github.com/apache/hudi/pull/6508
[hudi] branch asf-site updated: [HUDI-4723] Add document about Hoodie Catalog (#6508)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f025f0b7af [HUDI-4723] Add document about Hoodie Catalog (#6508) f025f0b7af is described below commit f025f0b7af5531f478f91fdd7c37f804c507f9b3 Author: Danny Chan AuthorDate: Fri Aug 26 16:00:35 2022 +0800 [HUDI-4723] Add document about Hoodie Catalog (#6508) --- website/docs/flink-quick-start-guide.md| 13 +- website/docs/table_management.md | 28 +- .../version-0.12.0/flink-quick-start-guide.md | 13 +- .../version-0.12.0/table_management.md | 28 +- 4 files changed, 66 insertions(+), 16 deletions(-) diff --git a/website/docs/flink-quick-start-guide.md b/website/docs/flink-quick-start-guide.md index 4cf2b1042b..f4b9668178 100644 --- a/website/docs/flink-quick-start-guide.md +++ b/website/docs/flink-quick-start-guide.md @@ -24,14 +24,13 @@ quick start tool for SQL users. Step.1 download Flink jar -Hudi works with both Flink 1.13 and Flink 1.14. You can follow the +Hudi works with both Flink 1.13, Flink 1.14, Flink 1.15. You can follow the instructions [here](https://flink.apache.org/downloads) for setting up Flink. Then choose the desired Hudi-Flink bundle jar to work with different Flink and Scala versions: -- `hudi-flink1.13-bundle_2.11` -- `hudi-flink1.13-bundle_2.12` -- `hudi-flink1.14-bundle_2.11` -- `hudi-flink1.14-bundle_2.12` +- `hudi-flink1.13-bundle` +- `hudi-flink1.14-bundle` +- `hudi-flink1.15-bundle` Step.2 start Flink cluster Start a standalone Flink cluster within hadoop environment. @@ -117,8 +116,8 @@ INSERT INTO t1 VALUES select * from t1; ``` -This query provides snapshot querying of the ingested data. -Refer to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and query types supported. +This statement queries snapshot view of the dataset. 
+Refers to [Table types and queries](/docs/concepts#table-types--queries) for more info on all table types and query types supported. ### Update Data diff --git a/website/docs/table_management.md b/website/docs/table_management.md index 6099476c31..7dbccd19ed 100644 --- a/website/docs/table_management.md +++ b/website/docs/table_management.md @@ -208,7 +208,33 @@ set hoodie.upsert.shuffle.parallelism = 100; set hoodie.delete.shuffle.parallelism = 100; ``` -## Flink +## Flink + +### Create Catalog + +The catalog helps to manage the SQL tables, the table can be shared among CLI sessions if the catalog persists the table DDLs. +For `hms` mode, the catalog also supplements the hive syncing options. + +HMS mode catalog SQL demo: +```sql +CREATE CATALOG hoodie_catalog + WITH ( +'type'='hudi', +'catalog.path' = '${catalog default root path}', +'hive.conf.dir' = '${directory where hive-site.xml is located}', +'mode'='hms' -- supports 'dfs' mode that uses the DFS backend for table DDLs persistence + ); +``` + + Options +| Option Name | Required | Default | Remarks | +| --- | --- | --- | --- | +| `catalog.path` | true | -- | Default root path for the catalog, the path is used to infer the table path automatically, the default table path: `${catalog.path}/${db_name}/${table_name}` | +| `default-database` | false | default | default database name | +| `hive.conf.dir` | false | -- | The directory where hive-site.xml is located, only valid in `hms` mode | +| `mode` | false | dfs | Supports `hms` mode that uses HMS to persist the table options | +| `table.external` | false | false | Whether to create the external table, only valid in `hms` mode | + ### Create Table The following is a Flink example to create a table. [Read the Flink Quick Start](/docs/flink-quick-start-guide) guide for more examples. 
diff --git a/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md b/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md index 4cf2b1042b..4a926aad04 100644 --- a/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md +++ b/website/versioned_docs/version-0.12.0/flink-quick-start-guide.md @@ -24,14 +24,13 @@ quick start tool for SQL users. Step.1 download Flink jar -Hudi works with both Flink 1.13 and Flink 1.14. You can follow the +Hudi works with both Flink 1.13, Flink 1.14 and Flink 1.15. You can follow the instructions [here](https://flink.apache.org/downloads) for setting up Flink. Then choose the desired Hudi-Flink bundle jar to work with different Flink and Scala versions: -- `hudi-flink1.13-bundle_2.11` -- `hudi-flink1.13-bundle_2.12` -- `hudi-flink1.14-bundle_2.11` -- `hudi-flink1.14-bundle_2.12` +- `hudi-flink1.13-bundle` +- `hudi-flink1.14-bundle` +- `hudi-flink1.15-bundle` Step.2 start Flink clust
[jira] [Commented] (HUDI-4723) Add document about Hoodie Catalog
[ https://issues.apache.org/jira/browse/HUDI-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585211#comment-17585211 ] Danny Chen commented on HUDI-4723: -- Fixed via asf-site: f025f0b7af5531f478f91fdd7c37f804c507f9b3 > Add document about Hoodie Catalog > - > > Key: HUDI-4723 > URL: https://issues.apache.org/jira/browse/HUDI-4723 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0, 0.12.1 > >
[jira] [Resolved] (HUDI-4723) Add document about Hoodie Catalog
[ https://issues.apache.org/jira/browse/HUDI-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-4723. -- > Add document about Hoodie Catalog > - > > Key: HUDI-4723 > URL: https://issues.apache.org/jira/browse/HUDI-4723 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0, 0.12.1 > >
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1228184438 ## CI report: * 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN * 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN * a0e2f520a7f422bd396b984c3cec2c5653a41743 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10965) * ee8c930fdd2e713a5d220bd6bccc13cbc41ba6a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10973) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] xicm opened a new issue, #6509: [SUPPORT]
xicm opened a new issue, #6509: URL: https://github.com/apache/hudi/issues/6509 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** As described in https://issues.apache.org/jira/browse/HUDI-3983, I get a connection-closed exception with the HBase index. We use relocation in the spark bundle; when I remove the relocations, the job succeeds. I have been debugging the differences between running with and without relocation for a long time, but found nothing. **To Reproduce** Steps to reproduce the behavior: 1. As we use relocation in the spark bundle, this conf will cause a ClassNotFoundException; comment out the listener class in hudi-common/src/main/resources/hbase-site.xml.
```
<property>
  <name>hbase.status.listener.class</name>
  <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
  <description>Implementation of the status listener with a multicast message.</description>
</property>
```
Adding this conf, we can get the error message quickly:
```
<property>
  <name>hbase.client.retries.number</name>
  <value>0</value>
</property>
```
2. Add org.apache.hbase.thirdparty:hbase-shaded-gson in packaging/hudi-spark-bundle/pom.xml 3. Write data with the HBase index.
```
df.write.format("org.apache.hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD.key, "ts").
  option(RECORDKEY_FIELD.key, "uuid").
  option(PARTITIONPATH_FIELD.key, "partitionpath").
  option(TBL_NAME.key, tableName).
  option(TABLENAME.key(), tableName).
  option(INDEX_TYPE.key, "HBASE").
  option(ZKQUORUM.key, "${hbase.zookeeper.quorum}").
  option(ZKPORT.key, "2181").
  option(ZK_NODE_PATH.key, "${zooKeeper.znode.parent }").
  option("hoodie.metadata.index.column.stats.enable", "true").
  option("hoodie.embed.timeline.server", "false").
  mode(Overwrite).
save(tablePath)
``` **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : * Spark version : 3.1.1 * Hive version : 3.1.2 * Hadoop version : 3.3.0 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no **Additional context** Add any other context about the problem here. **Stacktrace**
```
org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=1, exceptions:
2022-08-26T07:12:57.603Z, RpcRetryingCaller{globalStartTime=2022-08-26T07:12:56.651Z, pause=100, maxAttempts=1}, org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:146)
	at org.apache.hudi.org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=x.x.x.x/x.x.x.x:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:214)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:384)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
	at org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:415)
	at
org.apache.hudi.org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:411) at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:118) at org.apache.hudi.org.apache.hadoop.hbase.ipc.Call.setException(Call.java:133) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:203) at org.apache.hudi.org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:211) at org.apache.hudi.org.apache.hbase.
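The stacktrace above comes down to whether the relocated (shaded) class names actually exist on the classpath. As a quick sanity check (a minimal sketch, not part of Hudi; the class and method names here are hypothetical), `Class.forName` on the relocated name reproduces the same missing-class condition when run with the bundle jar on the classpath:

```java
// Minimal check for whether a relocated (shaded) class made it into the
// classpath. Run with the hudi-spark-bundle jar on the classpath; if the
// shaded gson classes were not bundled, the relocated name will be missing.
public class ShadedClassCheck {
    static boolean isPresent(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Relocated name taken from the stacktrace above.
        String relocated = "org.apache.hudi.org.apache.hbase.thirdparty.com.google.gson.GsonBuilder";
        System.out.println(relocated + " present: " + isPresent(relocated));
    }
}
```

An equivalent check from the shell is `jar tf hudi-spark-bundle*.jar | grep GsonBuilder`, which lists whether the shaded gson classes were packaged at all.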
[GitHub] [hudi] jsbali commented on pull request #6502: HUDI-4722 Added locking metrics for Hudi
jsbali commented on PR #6502: URL: https://github.com/apache/hudi/pull/6502#issuecomment-1228203400 @nsivabalan Can you please review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linfey90 commented on a diff in pull request #6456: [HUDI-4674]Change the default value of inputFormat for the MOR table
linfey90 commented on code in PR #6456: URL: https://github.com/apache/hudi/pull/6456#discussion_r955794992 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/CreateHoodieTableCommand.scala: ## @@ -120,10 +119,8 @@ object CreateHoodieTableCommand { val tableType = tableConfig.getTableType.name() val inputFormat = tableType match { - case DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL => + case DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL | DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL => Review Comment: Yeah, I initially thought that Hive offline tasks are not sensitive to latency or data freshness, so it would be better to use read-optimized tables and keep the COW InputFormat as the default. But according to the latest data, reading snapshot data also works well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4724) add function of skip the _rt suffix for read snapshot
linfey.nie created HUDI-4724: Summary: add function of skip the _rt suffix for read snapshot Key: HUDI-4724 URL: https://issues.apache.org/jira/browse/HUDI-4724 Project: Apache Hudi Issue Type: Improvement Reporter: linfey.nie During Hive queries, we usually write SQL against the original table name. Therefore, we need to be able to skip the _rt suffix for the read-snapshot table, so that queries against the original name return the latest data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] linfey90 opened a new pull request, #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
linfey90 opened a new pull request, #6510: URL: https://github.com/apache/hudi/pull/6510 ### Change Logs Add an option to skip the _rt suffix for the read-snapshot table during Hive metadata sync. ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
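The existing `skip_ro_suffix` option works the same way for the read-optimized table, so the intended behavior of this PR can be sketched as follows (illustrative Java only; the class and method names are hypothetical, not the actual hive sync code):

```java
// Hypothetical sketch of how hive sync could derive the registered table
// name for the snapshot (realtime) view when skip_rt_suffix is enabled.
// This mirrors the existing skip_ro_suffix behavior for the read-optimized view.
public class TableNameResolver {
    static String snapshotTableName(String baseName, boolean skipRtSuffix) {
        return skipRtSuffix ? baseName : baseName + "_rt";
    }

    public static void main(String[] args) {
        System.out.println(snapshotTableName("orders", false)); // orders_rt
        System.out.println(snapshotTableName("orders", true));  // orders
    }
}
```

With the option enabled, Hive queries can use the original table name directly and still hit the snapshot view.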
[jira] [Updated] (HUDI-4724) add function of skip the _rt suffix for read snapshot
[ https://issues.apache.org/jira/browse/HUDI-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4724: - Labels: pull-request-available (was: ) > add function of skip the _rt suffix for read snapshot > - > > Key: HUDI-4724 > URL: https://issues.apache.org/jira/browse/HUDI-4724 > Project: Apache Hudi > Issue Type: Improvement >Reporter: linfey.nie >Priority: Major > Labels: pull-request-available > > During Hive query, we usually use the original table name to write SQL. > Therefore, we need to skip the _rt suffix for read snapshot, the latest data > for calculation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228256562 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liqiquan opened a new issue, #6511: [SUPPORT]
liqiquan opened a new issue, #6511: URL: https://github.com/apache/hudi/issues/6511 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** When the table is written in insert_overwrite_table mode, Presto reads and returns data from all versions of the parquet files. **To Reproduce** Steps to reproduce the behavior: 1. Write the Hudi table in insert_overwrite_table mode at least twice. 2. Read the table from step 1 with Presto. If the catalog is hudi, the table is read correctly. 3. Read the table from step 1 with Presto. If the catalog is hive, the versions cannot be distinguished and the data from all versions of the parquet files is read. For example, I write twice with 100 rows each time. A Presto read should return the 100 rows of the latest version, but it actually returns all 200 rows. **Expected behavior** A clear and concise description of what you expected to happen. **Environment Description** * Hudi version : 0.11.1 * Spark version : 3.2.2 * Hive version : 2.7.3 * Hadoop version : 3.3.2 * Presto version : 0.275 * Storage (HDFS/S3/GCS..) : * Running on Docker? (yes/no) : **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
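The double-count happens because insert_overwrite_table does not immediately delete the old parquet files; it records the old file groups as replaced in the Hudi timeline (a replacecommit). A toy model of the two reader behaviors (illustrative only; this is not Presto or Hudi code, and the helper names are made up):

```java
import java.util.HashSet;
import java.util.Set;

public class OverwriteVisibilityDemo {
    // A plain parquet reader (hive catalog without Hudi awareness) sees every
    // file on storage; a Hudi-aware reader filters out the file groups that
    // the insert_overwrite_table replacecommit marked as replaced.
    static Set<String> visibleToHudiReader(Set<String> filesOnStorage, Set<String> replacedFiles) {
        Set<String> visible = new HashSet<>(filesOnStorage);
        visible.removeAll(replacedFiles);
        return visible;
    }

    public static void main(String[] args) {
        Set<String> onStorage = Set.of("fg1_commit1.parquet", "fg2_commit2.parquet");
        Set<String> replaced = Set.of("fg1_commit1.parquet"); // first write, later overwritten
        System.out.println("raw listing sees " + onStorage.size() + " files, "
            + "hudi-aware reader sees " + visibleToHudiReader(onStorage, replaced).size());
    }
}
```

This is why the hudi catalog (which consults the timeline) returns 100 rows while the hive catalog (which lists parquet files directly) returns 200.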
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228262156 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] YannByron commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
YannByron commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228302173 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6139: [HUDI-4396] Add a boolean parameter to decide whether the partition is cascade or not when hive table columns changes
hudi-bot commented on PR #6139: URL: https://github.com/apache/hudi/pull/6139#issuecomment-1228320272 ## CI report: * 41c2a64f85fae05f3794412bb0bf668f5d1adc5c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10971) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1228320633 ## CI report: * 37785220f2d17a1a04d136521f10c3a0314fe448 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10970) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228320971 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228325589 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228325939 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6505: AwsglueSync Turn already exist error into warning
hudi-bot commented on PR #6505: URL: https://github.com/apache/hudi/pull/6505#issuecomment-1228326014 ## CI report: * 24c8b543afd26438898efff96c98c81130c9ca54 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10960) * 7f0d738cfb3460682a3690ee53ecd5d002bdd37e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
parisni commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955903772 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -799,6 +799,12 @@ private FlinkOptions() { .defaultValue(false) .withDescription("Skip the _ro suffix for Read optimized table when registering, default false"); + public static final ConfigOption HIVE_SYNC_SKIP_RT_SUFFIX = ConfigOptions + .key("hive_sync.skip_rt_suffix") + .booleanType() + .defaultValue(false) Review Comment: Could you add `sinceVersion("0.12.1")` so the docs track when this option became available? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6505: AwsglueSync Turn already exist error into warning
hudi-bot commented on PR #6505: URL: https://github.com/apache/hudi/pull/6505#issuecomment-1228330772 ## CI report: * 24c8b543afd26438898efff96c98c81130c9ca54 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10960) * 7f0d738cfb3460682a3690ee53ecd5d002bdd37e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10976) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
parisni commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955904736 ## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfigHolder.java: ## @@ -77,6 +77,10 @@ public class HiveSyncConfigHolder { .key("hoodie.datasource.hive_sync.skip_ro_suffix") .defaultValue("false") .withDocumentation("Skip the _ro suffix for Read optimized table, when registering"); + public static final ConfigProperty HIVE_SKIP_RT_SUFFIX_FOR_READ_SNAPSHOT_TABLE = ConfigProperty + .key("hoodie.datasource.hive_sync.skip_rt_suffix") + .defaultValue("false") Review Comment: same here (sinceVersion) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1228330635 ## CI report: * 47680402da599615de30c13a1f22f79f3573ee30 UNKNOWN * 5613f14b3d5f1c8aaf8de1730e2f21b78a657150 UNKNOWN * ee8c930fdd2e713a5d220bd6bccc13cbc41ba6a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10973) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228330379 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linfey90 commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
linfey90 commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955923516 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java: ## @@ -799,6 +799,12 @@ private FlinkOptions() { .defaultValue(false) .withDescription("Skip the _ro suffix for Read optimized table when registering, default false"); + public static final ConfigOption HIVE_SYNC_SKIP_RT_SUFFIX = ConfigOptions + .key("hive_sync.skip_rt_suffix") + .booleanType() + .defaultValue(false) Review Comment: Yes, I'd love to, but there's no since method. Any other suggestions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] linfey90 commented on a diff in pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
linfey90 commented on code in PR #6510: URL: https://github.com/apache/hudi/pull/6510#discussion_r955923699 ## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfigHolder.java: ## @@ -77,6 +77,10 @@ public class HiveSyncConfigHolder { .key("hoodie.datasource.hive_sync.skip_ro_suffix") .defaultValue("false") .withDocumentation("Skip the _ro suffix for Read optimized table, when registering"); + public static final ConfigProperty HIVE_SKIP_RT_SUFFIX_FOR_READ_SNAPSHOT_TABLE = ConfigProperty + .key("hoodie.datasource.hive_sync.skip_rt_suffix") + .defaultValue("false") Review Comment: done it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
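The `sinceVersion` bookkeeping being requested in this thread can be modeled with a small immutable builder (this is only an illustrative sketch of the pattern, not Hudi's actual `ConfigProperty` class, and the names are made up):

```java
import java.util.Optional;

// Toy model of a config property that records the release it was introduced
// in, so documentation generators can show "since 0.12.1" next to the key.
public class ConfigSketch {
    final String key;
    final String defaultValue;
    final Optional<String> since;

    ConfigSketch(String key, String defaultValue, Optional<String> since) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.since = since;
    }

    static ConfigSketch key(String key, String defaultValue) {
        return new ConfigSketch(key, defaultValue, Optional.empty());
    }

    // Fluent setter mirroring the sinceVersion call discussed above.
    ConfigSketch sinceVersion(String version) {
        return new ConfigSketch(key, defaultValue, Optional.of(version));
    }

    public static void main(String[] args) {
        ConfigSketch skipRt = key("hoodie.datasource.hive_sync.skip_rt_suffix", "false")
            .sinceVersion("0.12.1");
        System.out.println(skipRt.key + " (since " + skipRt.since.orElse("n/a") + ")");
    }
}
```

On the Flink side, where no such method exists (as noted above), the introduction version could instead be recorded in the option's description text.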
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228382701 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228386984 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6491: [HUDI-4714] HoodieFlinkWriteClient can't load callback config to Hood…
hudi-bot commented on PR #6491: URL: https://github.com/apache/hudi/pull/6491#issuecomment-1228394692 ## CI report: * b5c6e2abaf1ada46e5a17f77934f52e9b5fd61a5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10972) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] awkwardd opened a new pull request, #6512: add in-order commit for multi-plan compaction
awkwardd opened a new pull request, #6512: URL: https://github.com/apache/hudi/pull/6512 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ **Risk level: none | low | medium | high** _Choose one. If medium or high, explain what verification was done to mitigate the risks._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4725) Add in-order commit for multi compaction plan
谭亚君 created HUDI-4725: - Summary: Add in-order commit for multi compaction plan Key: HUDI-4725 URL: https://issues.apache.org/jira/browse/HUDI-4725 Project: Apache Hudi Issue Type: Improvement Components: compaction Reporter: 谭亚君 Assignee: 谭亚君 When we use multiple compaction plans, we may need the plans to commit in order, so I am trying to implement that. https://github.com/apache/hudi/pull/6512 -- This message was sent by Atlassian Jira (v8.20.10#820010)
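Because Hudi instant times are fixed-width timestamps, lexicographic order matches chronological order, so the "commit in order" idea can be sketched as sorting the pending compaction instants before committing them (an illustrative sketch only, not the actual PR code; the helper name is hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InOrderCommit {
    // Sort pending compaction instants so they are committed oldest-first.
    static List<String> commitOrder(List<String> pendingInstants) {
        List<String> ordered = new ArrayList<>(pendingInstants);
        Collections.sort(ordered); // fixed-width timestamps sort chronologically
        return ordered;
    }

    public static void main(String[] args) {
        List<String> pending = List.of("20220826103000", "20220826101500", "20220826102000");
        for (String instant : commitOrder(pending)) {
            System.out.println("commit " + instant);
        }
    }
}
```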
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228438808 ## CI report: * 0d5c20a7e3f8a113b278d16a528978aa8428c71a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10974) * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228443945 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6512: add in-order commit for multi-plan compaction
hudi-bot commented on PR #6512: URL: https://github.com/apache/hudi/pull/6512#issuecomment-1228444355 ## CI report: * 9949aa9d4a41bffa79c61bfb2869a7031279e894 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228449102 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * fc88fa16b2fd11583d30ee3aa11e028c2cbf5709 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10897) * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6512: add in-order commit for multi-plan compaction
hudi-bot commented on PR #6512: URL: https://github.com/apache/hudi/pull/6512#issuecomment-1228449452 ## CI report: * 9949aa9d4a41bffa79c61bfb2869a7031279e894 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10979) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228518501 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * 55b7efe48726b7e39e55a00ae85f0bf5c52c40e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10975) * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6510: [HUDI-4724]Add function of skip the _rt suffix for read snapshot
hudi-bot commented on PR #6510: URL: https://github.com/apache/hudi/pull/6510#issuecomment-1228519040 ## CI report: * 0a68191e7e6a7b6a08154810ebbf7d6a048c837f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10977) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6505: AwsglueSync Turn already exist error into warning
hudi-bot commented on PR #6505: URL: https://github.com/apache/hudi/pull/6505#issuecomment-1228605038 ## CI report: * 7f0d738cfb3460682a3690ee53ecd5d002bdd37e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10976)
[GitHub] [hudi] hudi-bot commented on pull request #6512: add inoder commit for multi plan compaction
hudi-bot commented on PR #6512: URL: https://github.com/apache/hudi/pull/6512#issuecomment-1228605133 ## CI report: * 9949aa9d4a41bffa79c61bfb2869a7031279e894 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10979)
[GitHub] [hudi] nsivabalan opened a new pull request, #6513: [HUDI-4695] Fix inline compaction flaky test
nsivabalan opened a new pull request, #6513: URL: https://github.com/apache/hudi/pull/6513 ### Change Logs Fixed the flaky InlineCompactionTest. The test had a dependency on the timer; the timeout has been bumped up so that the test is more deterministic. ### Impact Improves CI stability. **Risk level: medium** ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-4695) Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 expected: <4> but was: <5>
[ https://issues.apache.org/jira/browse/HUDI-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4695: - Labels: pull-request-available (was: ) > Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 > expected: <4> but was: <5> > -- > > Key: HUDI-4695 > URL: https://issues.apache.org/jira/browse/HUDI-4695 > Project: Apache Hudi > Issue Type: Task >Reporter: Raymond Xu >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.12.1 > > > https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=10841&view=logs&j=600e7de6-e133-5e69-e615-50ee129b3c08&t=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4695) Flaky: TestInlineCompaction.testCompactionRetryOnFailureBasedOnTime:308 expected: <4> but was: <5>
[ https://issues.apache.org/jira/browse/HUDI-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4695: -- Status: Patch Available (was: In Progress)
[jira] [Updated] (HUDI-4327) TestHoodieDeltaStreamer#testCleanerDeleteReplacedDataWithArchive is flaky
[ https://issues.apache.org/jira/browse/HUDI-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4327: -- Status: In Progress (was: Open) > TestHoodieDeltaStreamer#testCleanerDeleteReplacedDataWithArchive is flaky > - > > Key: HUDI-4327 > URL: https://issues.apache.org/jira/browse/HUDI-4327 > Project: Apache Hudi > Issue Type: Task > Components: tests-ci, timeline-server > Reporter: Sagar Sumit > Assignee: sivabalan narayanan > Priority: Blocker > Fix For: 0.12.1
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228631575 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) * c06511ee9e0c8ef0e2973242e6aafd6c0ef4e59a UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6513: [HUDI-4695] Fix inline compaction flaky test
hudi-bot commented on PR #6513: URL: https://github.com/apache/hudi/pull/6513#issuecomment-1228631676 ## CI report: * 0e51b201bfd85884cbdc1e90f2d794119d0eb66a UNKNOWN
[GitHub] [hudi] alexeykudinkin commented on pull request #6416: [Stacked on 6386] Fixing `DebeziumSource` to properly commit consumed offsets
alexeykudinkin commented on PR #6416: URL: https://github.com/apache/hudi/pull/6416#issuecomment-1228645231 CI is green (screenshot): https://user-images.githubusercontent.com/428277/186941224-298537df-e2f1-4e1f-98ec-ba769dff7177.png
[GitHub] [hudi] hudi-bot commented on pull request #6393: [HUDI-4619] Fix The retry mechanism of remotehoodietablefilesystemvie…
hudi-bot commented on PR #6393: URL: https://github.com/apache/hudi/pull/6393#issuecomment-1228683810 ## CI report: * 09f49abeeca229df307426ba79bd77ed0392b79f UNKNOWN * 0dd2a468fb99ca57ccf6da47dd6baa79b20f7f9d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10978)
[GitHub] [hudi] hudi-bot commented on pull request #6513: [HUDI-4695] Fix inline compaction flaky test
hudi-bot commented on PR #6513: URL: https://github.com/apache/hudi/pull/6513#issuecomment-1228684267 ## CI report: * 0e51b201bfd85884cbdc1e90f2d794119d0eb66a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10981)
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228684161 ## CI report: * 91e047073b4ff4389bf1e3e4f5ce59342756ebd1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10951) * c06511ee9e0c8ef0e2973242e6aafd6c0ef4e59a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10980)
[jira] [Created] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
HunterHunter created HUDI-4726: -- Summary: When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed. Key: HUDI-4726 URL: https://issues.apache.org/jira/browse/HUDI-4726 Project: Apache Hudi Issue Type: Improvement Components: flink Reporter: HunterHunter
[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1228748950 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 59eacbed10467905643880e951b9f969a86747b9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9108) * f590033bff5a7140e68bcbeba2d48f0edcb79685 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6499: [HUDI-4703] use the historical schema to response time travel query
hudi-bot commented on PR #6499: URL: https://github.com/apache/hudi/pull/6499#issuecomment-1228750506 ## CI report: * c06511ee9e0c8ef0e2973242e6aafd6c0ef4e59a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10980)
[jira] [Assigned] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterHunter reassigned HUDI-4726: -- Assignee: HunterHunter > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. > -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: HunterHunter > Assignee: HunterHunter > Priority: Major
[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1228752810 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 59eacbed10467905643880e951b9f969a86747b9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9108) * f590033bff5a7140e68bcbeba2d48f0edcb79685 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10982)
[jira] [Assigned] (HUDI-4600) Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
[ https://issues.apache.org/jira/browse/HUDI-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter reassigned HUDI-4600: --- Assignee: HunterXHunter > Hive synchronization failure : Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > -- > > Key: HUDI-4600 > URL: https://issues.apache.org/jira/browse/HUDI-4600 > Project: Apache Hudi > Issue Type: Bug > Components: hive >Reporter: HunterXHunter >Assignee: HunterXHunter >Priority: Blocker > > > {code:java} > 10:32:28.039 [pool-9-thread-1] ERROR > org.apache.hadoop.hive.metastore.RetryingHMSHandler - Retrying HMSHandler > after 2000 ms (attempt 1 of 10) with error: > javax.jdo.JDOFatalInternalException: Unexpected exception caught. > at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550) > at > org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594) > at > 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:659) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:79) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:164) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:70) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:83) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133) > at > 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248) > at > org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsO
[jira] [Updated] (HUDI-3314) support merge into with no-pk condition
[ https://issues.apache.org/jira/browse/HUDI-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3314: - Epic Link: HUDI-4699 > support merge into with no-pk condition > --- > > Key: HUDI-3314 > URL: https://issues.apache.org/jira/browse/HUDI-3314 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql > Reporter: Yann Byron > Assignee: Yann Byron > Priority: Major > Fix For: 0.13.0
[jira] [Updated] (HUDI-1885) Support Delete/Update Non-Pk Table
[ https://issues.apache.org/jira/browse/HUDI-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1885: - Epic Link: HUDI-4699 > Support Delete/Update Non-Pk Table > -- > > Key: HUDI-1885 > URL: https://issues.apache.org/jira/browse/HUDI-1885 > Project: Apache Hudi > Issue Type: New Feature > Components: spark, spark-sql > Reporter: pengzhiwei > Assignee: Yann Byron > Priority: Critical > Fix For: 0.12.1 > > > Allow to delete/update a non-pk table. > {code:java} > create table h0 ( > id int, > name string, > price double > ) using hudi; > delete from h0 where id = 10; > update h0 set price = 10 where id = 12; > {code}
[jira] [Updated] (HUDI-2681) Make hoodie record_key and preCombine_key optional
[ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2681: - Component/s: spark-sql writer-core Epic Link: HUDI-4699 > Make hoodie record_key and preCombine_key optional > -- > > Key: HUDI-2681 > URL: https://issues.apache.org/jira/browse/HUDI-2681 > Project: Apache Hudi > Issue Type: New Feature > Components: Common Core, spark-sql, writer-core > Reporter: Vinoth Govindarajan > Assignee: Yann Byron > Priority: Major > > At present, Hudi needs a record key and a preCombine key to create a Hudi > dataset, which puts a restriction on the kinds of datasets we can create > using Hudi. > > In order to increase the adoption of the Hudi file format across all kinds of > derived datasets, similar to Parquet/ORC, we need to offer flexibility to > users. I understand that the record key is used for the upsert primitive and we need a > preCombine key to break ties and deduplicate, but there are event data and > other datasets without any primary key (append-only datasets), which can > benefit from Hudi since the Hudi ecosystem offers other features such as snapshot > isolation, indexes, clustering, delta streamer etc., which could be applied > to any dataset without a record key. > > The idea of this proposal is to make both the record key and the preCombine key > optional to allow a variety of new use cases on top of Hudi.
[jira] [Created] (HUDI-4727) Direct conversion from Proto Message to Row
Timothy Brown created HUDI-4727: --- Summary: Direct conversion from Proto Message to Row Key: HUDI-4727 URL: https://issues.apache.org/jira/browse/HUDI-4727 Project: Apache Hudi Issue Type: New Feature Reporter: Timothy Brown The initial implementation for the Proto source converts from Message to Avro to Row in the SourceFormatAdapter when the source needs to be read as a Dataset. Let's remove the intermediate Avro representation and convert directly from Message to Row.
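The idea in HUDI-4727 can be sketched abstractly. This is a minimal illustrative Python sketch, not Hudi's actual SourceFormatAdapter code; the `to_avro`, `avro_to_row`, and `message_to_row_direct` names are hypothetical, and the Message/Avro/Row types are modeled as plain dicts and tuples.

```python
# Hypothetical sketch of the HUDI-4727 proposal: skip the intermediate Avro hop.
# A proto Message is modeled as a dict; "Avro record" and "Row" are simplified.

def to_avro(message: dict) -> dict:
    # Current path, step 1: proto Message -> Avro record (modeled as a dict copy).
    return dict(message)

def avro_to_row(avro_record: dict) -> tuple:
    # Current path, step 2: Avro record -> Row (modeled as a tuple of field values).
    return tuple(avro_record[k] for k in sorted(avro_record))

def message_to_row_direct(message: dict) -> tuple:
    # Proposed path: convert the Message straight to a Row, no Avro in between.
    return tuple(message[k] for k in sorted(message))

msg = {"id": 1, "name": "hudi"}
# Both paths must yield the same Row; the direct path just does less work.
assert avro_to_row(to_avro(msg)) == message_to_row_direct(msg)
```

The point of the change is not the result (which must be identical) but dropping the cost of building and then discarding the intermediate Avro representation for every record.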
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expected behavior should be like exp3. {code} The root of the problem is in `IncrementalInputSplits.inputSplits`: because `startCommit` is out of range, `fullTableScan` becomes `true`, and the file read ends up being t3.parquet's successor t6.parquet instead of t3.parquet. 
was: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. -- The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet.{code} > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: HunterXHunter > Assignee: HunterXHunter > Priority: Major
[jira] [Updated] (HUDI-4726) When using Flink for incremental query, when `read.start-commit is out of range`, full table scanning should not be performed.
[ https://issues.apache.org/jira/browse/HUDI-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterXHunter updated HUDI-4726: Description: {code:java} -- create CREATE TABLE hudi_4726( id string, msg string, `partition` STRING, PRIMARY KEY(id) NOT ENFORCED )PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'write.operation'='upsert', 'path' = 'hudi_4726', 'index.type' = 'BUCKET', 'hoodie.bucket.index.num.buckets' = '2', 'compaction.delta_commits' = '2', 'table.type' = 'MERGE_ON_READ', 'compaction.async.enabled'='true') -- insert INSERT INTO hudi_4726 values ('id1','t1','par1') INSERT INTO hudi_4726 values ('id1','t2','par1') INSERT INTO hudi_4726 values ('id1','t3','par1') INSERT INTO hudi_4726 values ('id1','t4','par1') -- .hoodie t1.deltacommit (t1) t2.deltacommit (t2) t3.commit (t2) t4.deltacommit (t3) t5.deltacommit (t4) t6.commit (t4) t3.parquet t6.parquet -- read exp1 : 'read.start-commit'='t1', 'read.end-commit'='t1' -- (true,+I[id1, t1, par1]) exp2 : 'read.start-commit'='t1', 'read.end-commit'='t2' -- (true,+I[id1, t2, par1]) exp3 : 'read.start-commit'='t1', 'read.end-commit'='t3' -- (true,+I[id1, t2, par1]) -- but 'read.start-commit'='0', 'read.end-commit'='t3' -- (nothing) -- expect should be like exp3. -- The root of the problem is `IncrementalInputSplits.inputSplits`, because `startCommit` is out of range, `fullTableScan` is `true`, finally, the file read is t6..parquet instead of t3.parquet.{code} > When using Flink for incremental query, when `read.start-commit is out of > range`, full table scanning should not be performed. 
> -- > > Key: HUDI-4726 > URL: https://issues.apache.org/jira/browse/HUDI-4726 > Project: Apache Hudi > Issue Type: Improvement > Components: flink > Reporter: HunterXHunter > Assignee: HunterXHunter > Priority: Major
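The requested behavior above can be sketched in a few lines. This is a minimal Python sketch of the idea, not Hudi's `IncrementalInputSplits` code; commit ordering is modeled with plain string comparison and `commits_in_range` is a hypothetical helper.

```python
# Hypothetical sketch of the fix HUDI-4726 asks for: when read.start-commit
# falls before the earliest instant on the timeline, clamp it to the earliest
# instant instead of degrading to a full table scan (which reads only the
# latest file slice and thus returns the wrong snapshot).

def commits_in_range(timeline, start_commit, end_commit):
    if start_commit < timeline[0]:
        # Clamp the out-of-range start instead of triggering fullTableScan.
        start_commit = timeline[0]
    return [t for t in timeline if start_commit <= t <= end_commit]

timeline = ["t1", "t2", "t3", "t4", "t5", "t6"]
# start-commit "0" is out of range; the clamped query matches exp3 above.
assert commits_in_range(timeline, "0", "t3") == commits_in_range(timeline, "t1", "t3")
```

With the clamp, an out-of-range start commit behaves like "read from the beginning of the retained timeline", which is what exp3 in the reproduction expects.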
[GitHub] [hudi] xushiyan commented on pull request #6513: [HUDI-4695] Fix inline compaction flaky test
xushiyan commented on PR #6513: URL: https://github.com/apache/hudi/pull/6513#issuecomment-1228832977 can you please separate the cli feature from this pr?
[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1228857265 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * f590033bff5a7140e68bcbeba2d48f0edcb79685 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10982)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6017: [HUDI-4276] Reconcile schema-inject null values for missing fields and add new fields
alexeykudinkin commented on code in PR #6017: URL: https://github.com/apache/hudi/pull/6017#discussion_r956400078 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala: ## @@ -162,11 +163,11 @@ object HoodieSparkUtils extends SparkAdapterSupport { if (rows.isEmpty) { Iterator.empty } else { +val readerAvroSchema = new Schema.Parser().parse(readerAvroSchemaStr) val transform: GenericRecord => GenericRecord = if (sameSchema) identity else { -val readerAvroSchema = new Schema.Parser().parse(readerAvroSchemaStr) -rewriteRecord(_, readerAvroSchema) Review Comment: BTW, one miss for the new API is that previously `rewriteRecord` was validating that the record adheres to the new schema while the new method doesn't do that (this obscures the issues when conversion is not following Avro evolution rules) Review Comment: @xiarixiaoyao since we're changing this, shall we also revisit all the other places that use `rewriteRecord` and consider rebasing them onto the new methods?
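The validation the first review comment says was lost can be sketched as follows. This is an illustrative Python sketch, not Hudi's Avro-based `rewriteRecord`; the schema is modeled as a simple name-to-nullability map, and the `rewrite_record` helper is hypothetical.

```python
# Hypothetical sketch of record rewriting that validates the result against the
# reader schema, so schema drift that violates Avro-style evolution rules fails
# loudly instead of silently producing a malformed record.

def rewrite_record(record: dict, reader_schema: dict) -> dict:
    # reader_schema: field name -> nullable? (True means a null value is legal)
    out = {name: record.get(name) for name in reader_schema}
    for name, nullable in reader_schema.items():
        if out[name] is None and not nullable:
            raise ValueError(f"field '{name}' missing and not nullable in reader schema")
    return out

schema = {"id": False, "note": True}
# A nullable field missing from the source is filled with null; that is legal.
assert rewrite_record({"id": 7}, schema) == {"id": 7, "note": None}
# A non-nullable field missing from the source must be rejected.
try:
    rewrite_record({"note": "x"}, schema)
    raise AssertionError("expected a validation failure")
except ValueError:
    pass
```

The design point matches the comment: performing the check at rewrite time surfaces incompatible conversions at the record where they occur, rather than later in downstream readers.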
[hudi] branch master updated (11f85d1efb -> 797e7a67a9)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 11f85d1efb Revert "[HUDI-3669] Add a remote request retry mechanism for 'Remotehoodietablefiles… (#5884)" (#6501) add 797e7a67a9 [Stacked on 6386] Fixing `DebeziumSource` to properly commit offsets; (#6416) No new revisions were added by this update. Summary of changes: .../org/apache/hudi/utilities/deltastreamer/DeltaSync.java | 1 + .../hudi/utilities/deltastreamer/HoodieDeltaStreamer.java | 2 +- .../apache/hudi/utilities/sources/debezium/DebeziumSource.java | 8 .../apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java | 10 +++--- 4 files changed, 17 insertions(+), 4 deletions(-)
[GitHub] [hudi] nsivabalan merged pull request #6416: [Stacked on 6386] Fixing `DebeziumSource` to properly commit consumed offsets
nsivabalan merged PR #6416: URL: https://github.com/apache/hudi/pull/6416
[GitHub] [hudi] hudi-bot commented on pull request #6472: [HUDI-4549] Remove Avro shading from hudi-hive-sync-bundle
hudi-bot commented on PR #6472: URL: https://github.com/apache/hudi/pull/6472#issuecomment-1228883212 ## CI report: * faecb216bdeb30a459040846bb9a5167556fd605 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10879) * 02329634ac100d362b6e9fa714faaad3e27298f4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #6472: [HUDI-4549] Remove Avro shading from hudi-hive-sync-bundle
hudi-bot commented on PR #6472: URL: https://github.com/apache/hudi/pull/6472#issuecomment-1228886953 ## CI report: * faecb216bdeb30a459040846bb9a5167556fd605 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10879) * 02329634ac100d362b6e9fa714faaad3e27298f4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10984) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Assigned] (HUDI-4721) Fix thread safety w/ RemoteTableFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4721: - Assignee: sivabalan narayanan > Fix thread safety w/ RemoteTableFileSystemView > --- > > Key: HUDI-4721 > URL: https://issues.apache.org/jira/browse/HUDI-4721 > Project: Apache Hudi > Issue Type: Test > Components: reader-core, writer-core >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.12.1 > > > After retry mechanism was added to RemoteTableFileSystemView, looks like the > code is not thread safe. > > [https://github.com/apache/hudi/pull/5884/files#diff-0d301525ef388eb460372ea300c827728c954fdda799adfce7040158ec8b1d84R183|https://github.com/apache/hudi/pull/5884/files#r955363946] > > This might impact regular flows as well even if no retries are enabled. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (HUDI-4721) Fix thread safety w/ RemoteTableFileSystemView
[ https://issues.apache.org/jira/browse/HUDI-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4721. - Resolution: Fixed
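HUDI-4721 reports the hazard (retry bookkeeping in `RemoteHoodieTableFileSystemView` is not thread safe) but the digest does not include the fix. A generic, hypothetical sketch of the usual remedy, guarding shared mutable retry state with a lock — all names here are illustrative, not Hudi's:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GuardedRetryClient {
    private int attempts = 0;                 // shared mutable retry bookkeeping
    private final Object lock = new Object();

    // Without the lock, concurrent callers can interleave read-modify-write
    // on `attempts` and lose updates; with it, every increment is observed.
    public int request() {
        synchronized (lock) {
            attempts++;
            return attempts;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedRetryClient client = new GuardedRetryClient();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.submit(client::request);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(client.request() - 1); // 1000 with the lock; may be less without it
    }
}
```

The JIRA's note that "this might impact regular flows even if no retries are enabled" follows from the same point: once the shared state exists, any concurrent access path needs the guard.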
[jira] [Created] (HUDI-4728) Add support to skip larger log blocks with minor log compaction
sivabalan narayanan created HUDI-4728: - Summary: Add support to skip larger log blocks with minor log compaction Key: HUDI-4728 URL: https://issues.apache.org/jira/browse/HUDI-4728 Project: Apache Hudi Issue Type: Improvement Components: compaction Reporter: sivabalan narayanan Is there a size threshold to exclude big log blocks? Why do log compaction on log blocks that are already big enough? Thoughts: Good point. For the initial version we want to target all the blocks. In coming iterations I will include a block-size threshold as well; the current logic of AbstractHoodieLogRecordReader should be able to handle it. For streaming workloads this might be very heavy, so we need to support this.
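The block-size threshold floated in the JIRA amounts to a filter over candidate log blocks. A hypothetical, JDK-only sketch (real candidate selection would inspect log-file/block metadata rather than bare sizes):

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogCompactionCandidates {
    /**
     * Keep only blocks at or below the threshold for minor log compaction;
     * blocks already large enough are skipped and left for full compaction.
     */
    static List<Long> selectBlocks(List<Long> blockSizesBytes, long thresholdBytes) {
        return blockSizesBytes.stream()
                .filter(size -> size <= thresholdBytes)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical 1 MB threshold over a mix of small and large blocks.
        System.out.println(selectBlocks(List.of(200_000L, 5_000_000L, 800_000L), 1_000_000L)); // [200000, 800000]
    }
}
```

The point of the threshold is exactly the streaming concern raised above: rewriting already-large blocks buys little read-side benefit while costing significant write amplification.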
[GitHub] [hudi] hudi-bot commented on pull request #6472: [HUDI-4549] Remove Avro shading from hudi-hive-sync-bundle
hudi-bot commented on PR #6472: URL: https://github.com/apache/hudi/pull/6472#issuecomment-1229002706 ## CI report: * 02329634ac100d362b6e9fa714faaad3e27298f4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10984) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3217: - Due Date: 4/Sep/22 (was: 30/Sep/22) > RFC-46: Optimize Record Payload handling > > > Key: HUDI-3217 > URL: https://issues.apache.org/jira/browse/HUDI-3217 > Project: Apache Hudi > Issue Type: Epic > Components: storage-management, writer-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: hudi-umbrellas, pull-request-available > Fix For: 0.13.0 > > > Currently Hudi is biased t/w assumption of particular payload representation > (Avro), long-term we would like to steer away from this to keep the record > payload be completely opaque, so that > # We can keep record payload representation engine-specific > # Avoid unnecessary serde loops (Engine-specific > Avro > Engine-specific > > Binary) > h2. *Proposal* > > *Phase 2: Revisiting Record Handling* > {_}T-shirt{_}: 2-2.5 weeks > {_}Goal{_}: Avoid tight coupling with particular record representation on the > Read Path (currently Avro) and enable > * Revisit RecordPayload APIs > ** Deprecate {{getInsertValue}} and {{combineAndGetUpdateValue}} APIs > replacing w/ new “opaque” APIs (not returning Avro payloads) > ** Rebase RecordPayload hierarchy to be engine-specific: > *** Common engine-specific base abstracting common functionality (Spark, > Flink, Java) > *** Each feature-specific semantic will have to implement for all engines > ** Introduce new APIs > *** To access keys (record, partition) > *** To convert record to Avro (for BWC) > * Revisit RecordPayload handling > ** In WriteHandles > *** API will be accepting opaque RecordPayload (no Avro conversion) > *** Can do (opaque) record merging if necessary > *** Passes RP as is to FileWriter > ** In FileWriters > *** Will accept RecordPayload interface > *** Should be engine-specific (to handle internal record representation > ** In RecordReaders > *** API will be providing opaque 
RecordPayload (no Avro conversion) > > REF > [https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680] >
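The RFC's direction — deprecating `getInsertValue`/`combineAndGetUpdateValue` in favor of opaque, engine-specific record APIs — could look roughly like the sketch below. All names here are hypothetical; the proposal text only outlines the shape of the new APIs:

```java
public class OpaqueRecordSketch {
    // Hypothetical opaque-record API: keys are accessible, but the payload
    // stays in the engine's native representation and is never forced through Avro.
    interface OpaqueHoodieRecord<T> {
        String getRecordKey();
        String getPartitionPath();
        T getData();                                          // engine-native representation
        OpaqueHoodieRecord<T> merge(OpaqueHoodieRecord<T> older); // replaces combineAndGetUpdateValue
    }

    // Minimal engine-specific implementation; a plain String stands in for
    // a Spark/Flink/Java-native row type.
    static class StringRecord implements OpaqueHoodieRecord<String> {
        private final String key, partition, data;
        StringRecord(String key, String partition, String data) {
            this.key = key;
            this.partition = partition;
            this.data = data;
        }
        public String getRecordKey() { return key; }
        public String getPartitionPath() { return partition; }
        public String getData() { return data; }
        public OpaqueHoodieRecord<String> merge(OpaqueHoodieRecord<String> older) {
            return this; // newer record wins; real semantics would be feature-specific per the proposal
        }
    }

    public static void main(String[] args) {
        OpaqueHoodieRecord<String> merged =
            new StringRecord("k1", "p1", "new").merge(new StringRecord("k1", "p1", "old"));
        System.out.println(merged.getData()); // new
    }
}
```

This is what lets WriteHandles and FileWriters in Phase 2 pass the payload through "as is": merging and writing operate on the interface, so no `Engine-specific > Avro > Engine-specific` serde loop is required.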
[GitHub] [hudi] yihua commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries
yihua commented on issue #6511: URL: https://github.com/apache/hudi/issues/6511#issuecomment-1229030208 @liqiquan Did you use `INSERT OVERWRITE TABLE` in Spark SQL to write the Hudi table? How did you create the table? Is the Hudi table synced to Hive for Presto to query?
[GitHub] [hudi] yihua commented on issue #6509: [SUPPORT] HBase connection closed exception
yihua commented on issue #6509: URL: https://github.com/apache/hudi/issues/6509#issuecomment-1229038120 @xicm we have to shade the HBase classes to be compatible with the Hive query engine, which introduces HBase classes as well. Does changing all relevant class names to the shaded pattern in `hudi-common/src/main/resources/hbase-site.xml` work for you?
[GitHub] [hudi] yihua commented on issue #6504: [SUPPORT]
yihua commented on issue #6504: URL: https://github.com/apache/hudi/issues/6504#issuecomment-1229041737 @santoshraj123 Could you upload the complete Spark driver log? Do you see any error logs before `Commit 20220823151531894 failed and rolled-back !`, specifically something like `Delta Sync found errors when writing. Errors/Total=` and `Printing out the top 100 errors`?
[jira] [Updated] (HUDI-4582) Sync 11w partitions to hive by using HiveSyncTool with(--sync-mode="hms" and use-jdbc=false) with timeout
[ https://issues.apache.org/jira/browse/HUDI-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4582: - Reviewers: Raymond Xu (was: sivabalan narayanan) > Sync 11w partitions to hive by using HiveSyncTool with(--sync-mode="hms" and > use-jdbc=false) with timeout > - > > Key: HUDI-4582 > URL: https://issues.apache.org/jira/browse/HUDI-4582 > Project: Apache Hudi > Issue Type: Improvement > Components: meta-sync >Reporter: XixiHua >Assignee: XixiHua >Priority: Blocker > Labels: pull-request-available > Fix For: 0.12.1 > > > when we try to sync 11w partitions to hive by using > HiveSyncTool(--sync-mode="hms" and use-jdbc=false) with timeout error. > > With https://issues.apache.org/jira/browse/HUDI-2116, this only solved > --sync-mode = jdbc with the parameter: HIVE_BATCH_SYNC_PARTITION_NUM, and I > want to extend this to hms mode. >
[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1229042889 ## CI report: * 473a8b74676e345ee91093a3fe9885e062ca UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] yihua commented on issue #6503: [SUPPORT] Hudi Merge Into with larger volume
yihua commented on issue #6503: URL: https://github.com/apache/hudi/issues/6503#issuecomment-1229045695 @maduraitech could you provide your `MERGE INTO` SQL statement, assuming you're using Spark SQL?
[GitHub] [hudi] xushiyan commented on pull request #6240: [HUDI-4482] remove guava and use caffeine instead for cache
xushiyan commented on PR #6240: URL: https://github.com/apache/hudi/pull/6240#issuecomment-1229045983 hey @KnightChess a gentle reminder: 1) guava dependency cleanup from hadoop-mr and spark bundles as shown above. 2) a separate PR to fix the styles in integ test module. thanks!
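For context on the guava-to-Caffeine swap in PR #6240: Caffeine builds a bounded cache via its builder (e.g. `Caffeine.newBuilder().maximumSize(n).build()`). As a dependency-free illustration of the LRU semantics such a cache provides, here is a sketch on plain JDK `LinkedHashMap` (class name is illustrative, not from the PR):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        super(16, 0.75f, /* accessOrder = */ true);  // access order gives LRU semantics
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the least-recently-used entry past capacity
    }

    public static void main(String[] args) {
        BoundedLruCache<String, Integer> cache = new BoundedLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // touch "a" so "b" becomes the eldest
        cache.put("c", 3);  // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Unlike this sketch, Caffeine adds thread safety, expiry, and size/weight policies out of the box, which is what makes it a drop-in improvement over a hand-rolled or guava-based cache.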
[GitHub] [hudi] yihua commented on issue #6479: [SUPPORT] How to query the previous SNAPSHOT in Hive
yihua commented on issue #6479: URL: https://github.com/apache/hudi/issues/6479#issuecomment-1229050279 @china-shang If I'm not wrong, time travel query is not supported on the Hive query engine. The incremental query is supported on Hive: https://hudi.apache.org/docs/querying_data#incremental-query-1. You may try setting `fromCommitTime=0` and `maxCommits=` to approximate what you need. cc @nsivabalan
[GitHub] [hudi] yihua commented on issue #6469: [SUPPORT] z-order is not working
yihua commented on issue #6469: URL: https://github.com/apache/hudi/issues/6469#issuecomment-1229053212 @sangeethsasidharan could you share the Hudi timeline, i.e., file listing under `mys3path/.hoodie`? Is clustering scheduled and executed?
[jira] [Assigned] (HUDI-4727) Direct conversion from Proto Message to Row
[ https://issues.apache.org/jira/browse/HUDI-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongkyun Lee reassigned HUDI-4727: -- Assignee: Yongkyun Lee > Direct conversion from Proto Message to Row > --- > > Key: HUDI-4727 > URL: https://issues.apache.org/jira/browse/HUDI-4727 > Project: Apache Hudi > Issue Type: New Feature >Reporter: Timothy Brown >Assignee: Yongkyun Lee >Priority: Minor > > The initial implementation for the Proto source converts from Message to Avro > to Row in the SourceFormatAdapter when the source needs to be read as a > Dataset. Let's remove the intermediate Avro representation and convert > directly from Message to Row.
[GitHub] [hudi] functicons opened a new issue, #6514: [SUPPORT] Creating table with SparkSQL fails with FileNotFoundException
functicons opened a new issue, #6514: URL: https://github.com/apache/hudi/issues/6514 **Describe the problem you faced** I'm trying to create a new table with SparkSQL in spark-shell: ``` spark.sql("""create table test8(id int,name string) using hudi options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test8'""") ``` The error is really confusing to me, why does Hudi expect the path to exist in advance? ``` java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test8 at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1541) at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50) at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3369) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3368) at org.apache.spark.sql.Dataset.(Dataset.scala:194) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643) ... 45 elided ``` **To Reproduce** Steps to reproduce the behavior: Spark 2.4.8, Scala 2.12, Hudi 2.12:0.11.1 ``` $ spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.12:0.11.1 --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" scala> spark.sql("""create table test9(id int,name string) using hudi options (primaryKey='id', type='cow') LOCATION 'hdfs:///hudi/test9'""") ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used java.io.FileNotFoundException: File does not exist: hdfs:/hudi/test9 at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1528) at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1521) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1536) at org.apache.hudi.common.util.TablePathUtils.getTablePath(TablePathUtils.java:50) at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:79) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194) at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369) at org.apache.spark.sql.Dataset.(Dataset.scala:194) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.Spa
[GitHub] [hudi] nsivabalan commented on a diff in pull request #5958: [HUDI-3900] [UBER] Support log compaction action for MOR tables
nsivabalan commented on code in PR #5958: URL: https://github.com/apache/hudi/pull/5958#discussion_r956461386 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java: ## @@ -163,6 +183,21 @@ public class HoodieCompactionConfig extends HoodieConfig { + "record size estimate compute dynamically based on commit metadata. " + " This is critical in computing the insert parallelism and bin-packing inserts into small files."); + public static final ConfigProperty ARCHIVE_MERGE_SMALL_FILE_LIMIT_BYTES = ConfigProperty + .key("hoodie.archive.merge.small.file.limit.bytes") Review Comment: these are already in HoodieArchivalConfig right? did you move it here or added new ones ? ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -240,8 +245,9 @@ private Pair> getFilesToCleanKeepingLatestVersions( int keepVersions = config.getCleanerFileVersionsRetained(); // do not cleanup slice required for pending compaction Iterator fileSliceIterator = - fileGroup.getAllFileSlices().filter(fs -> !isFileSliceNeededForPendingCompaction(fs)).iterator(); - if (isFileGroupInPendingCompaction(fileGroup)) { + fileGroup.getAllFileSlices().filter(fs -> !isFileSliceNeededForPendingCompaction(fs) + && !isFileSliceNeededForPendingLogCompaction(fs)).iterator(); + if (isFileGroupInPendingCompaction(fileGroup) || isFileGroupInPendingLogCompaction(fileGroup)) { Review Comment: same here. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/HoodieCompactionPlanGenerator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.table.action.compact.plan.generators; + +import org.apache.hudi.avro.model.HoodieCompactionOperation; +import org.apache.hudi.avro.model.HoodieCompactionPlan; +import org.apache.hudi.common.data.HoodieAccumulator; +import org.apache.hudi.common.engine.HoodieEngineContext; +import org.apache.hudi.common.fs.FSUtils; +import org.apache.hudi.common.model.CompactionOperation; +import org.apache.hudi.common.model.HoodieBaseFile; +import org.apache.hudi.common.model.HoodieFileGroupId; +import org.apache.hudi.common.model.HoodieLogFile; +import org.apache.hudi.common.model.HoodieRecordPayload; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.common.table.view.SyncableFileSystemView; +import org.apache.hudi.common.util.CompactionUtils; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.ValidationUtils; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.table.HoodieTable; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; + +import java.io.IOException; +import java.util.List; +import java.util.Set; +import java.util.stream.Collectors; + +import static java.util.stream.Collectors.toList; + +public class HoodieCompactionPlanGenerator extends BaseHoodieCompactionPlanGenerator { + + private static final Logger LOG = 
LogManager.getLogger(HoodieCompactionPlanGenerator.class); + + public HoodieCompactionPlanGenerator(HoodieTable table, HoodieEngineContext engineContext, HoodieWriteConfig writeConfig) { +super(table, engineContext, writeConfig); + } + + /** + * Generate a new compaction plan for scheduling. + * @return Compaction Plan + * @throws java.io.IOException when encountering errors + */ + @Override + public HoodieCompactionPlan generateCompactionPlan() throws IOException { Review Comment: I assume this is just moved w/o any changes. let me know if you had changed anything in these code blocks. ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -456,6 +491,23 @@ public List close() { } } + public void write(Map> recordMap) { +Iterator keyIterator = recordMap.keySet().stream().iterator(); Review Comment: can't we iterate the entries only rather than just keys?
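On the last review comment about `HoodieAppendHandle.write(Map<...>)`: iterating `keySet()` forces an extra `get` lookup per element, while iterating `entrySet()` reads key and value together in one pass. A minimal JDK illustration (names hypothetical, not from the PR):

```java
import java.util.HashMap;
import java.util.Map;

public class EntryIteration {
    // Key iteration: one extra hash lookup per element.
    static long sumViaKeys(Map<String, Long> sizes) {
        long total = 0;
        for (String key : sizes.keySet()) {
            total += sizes.get(key);   // redundant lookup the entry set avoids
        }
        return total;
    }

    // Entry iteration: key and value come from the same traversal, as the review suggests.
    static long sumViaEntries(Map<String, Long> sizes) {
        long total = 0;
        for (Map.Entry<String, Long> entry : sizes.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>(Map.of("f1", 10L, "f2", 32L));
        // Same result either way, but entrySet() skips the per-key get().
        System.out.println(sumViaKeys(sizes) == sumViaEntries(sizes)); // true
    }
}
```

Both loops are O(n), but the keySet variant pays an avoidable constant factor per element, which adds up when the map holds one record per key in a write path.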
[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1229065946 ## CI report: * Unknown: [CANCELED](TBD) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build