Re: [PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11256:
URL: https://github.com/apache/hudi/pull/11256#issuecomment-2118658240

   
   ## CI report:
   
   * 89005916c14107710828a1a76d68cfa58e80bf88 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23991)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11256:
URL: https://github.com/apache/hudi/pull/11256#issuecomment-2118644661

   
   ## CI report:
   
   * 89005916c14107710828a1a76d68cfa58e80bf88 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23991)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118642593

   
   ## CI report:
   
   * 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11256:
URL: https://github.com/apache/hudi/pull/11256#issuecomment-2118642605

   
   ## CI report:
   
   * 89005916c14107710828a1a76d68cfa58e80bf88 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7778) Duplicate Key exception with RLI

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7778:
-
Labels: pull-request-available  (was: )

> Duplicate Key exception with RLI 
> -
>
> Key: HUDI-7778
> URL: https://issues.apache.org/jira/browse/HUDI-7778
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> We are occasionally hitting an exception like the one below, meaning two records 
> are ingested into RLI for the same record key from the data table. This is not 
> expected to happen. 
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
> appending records to 
> file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
>  at 
> org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
>  at 
> org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)  
> at 
> org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
>  at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
>   ... 28 more
> Caused by: org.apache.hudi.exception.HoodieException: Writing multiple records 
> with same key 1 not supported for 
> org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
>   at 
> org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
>   at 
> org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
>   at 
> org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
>   at 
> org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
>   ... 31 more
> Driver stacktrace:
> 51301 [main] INFO  org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
> 51303 [main] INFO  org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
> 51303 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
> 51303 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
> 51321 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
> 51321 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server
> org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
> time 197001012
>   at 
> org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
>at 
> org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
>   at 
> org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
> at 
> org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
> at 
> org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
>   at 
> org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
> at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>  at

[PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]

2024-05-17 Thread via GitHub


nsivabalan opened a new pull request, #11256:
URL: https://github.com/apache/hudi/pull/11256

   ### Change Logs
   
   We occasionally see duplicate keys being ingested into the RLI partition in MDT. 
This patch fixes the root cause. 
   
   Root cause:
   After fetching record locations from the RLI partition in MDT, and before doing 
a snapshot read to honor the payload merge and ordering field, we fetch unique 
partition and fileId pairs. Instead of fetching unique {partition, fileId} pairs, 
we were using 
[HoodieRecordGlobalLocation](https://github.com/apache/hudi/blob/e4b56b090fdcb76416c60bd7ddd4247f0955c152/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java#L298),
 which also contains the "instantTime" in addition to the partition path and 
fileId, so the "unique" locations were not actually unique per file slice. As a 
result, one incoming record could turn into 2, 3, or N records after the join. 
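   
   A minimal, self-contained Java sketch of that difference (illustrative only, 
not Hudi's actual code; `IndexLocation` is a hypothetical stand-in for the global 
index location): deduplicating on the full location keeps one entry per instant, 
whereas deduplicating on the (partition, fileId) pair keeps one entry per file 
slice.
   ```
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for a global index location entry: the point is only
// that it carries an instantTime on top of the partition path and fileId.
final class IndexLocation {
  final String partitionPath;
  final String fileId;
  final String instantTime;

  IndexLocation(String partitionPath, String fileId, String instantTime) {
    this.partitionPath = partitionPath;
    this.fileId = fileId;
    this.instantTime = instantTime;
  }
}

public class DedupByPartitionAndFileId {
  public static void main(String[] args) {
    // Two index entries pointing at the same file slice, written at different instants.
    List<IndexLocation> locations = Arrays.asList(
        new IndexLocation("2024/05/17", "file-1", "20240517100000"),
        new IndexLocation("2024/05/17", "file-1", "20240517110000"));

    // Deduplicating on the full location keeps both entries because instantTime
    // differs, which is what multiplied records in the join.
    Set<List<String>> byFullLocation = locations.stream()
        .map(l -> Arrays.asList(l.partitionPath, l.fileId, l.instantTime))
        .collect(Collectors.toCollection(LinkedHashSet::new));

    // Deduplicating on the (partitionPath, fileId) pair collapses them to one,
    // so each incoming record joins against a single file slice.
    Set<SimpleEntry<String, String>> byPartitionAndFileId = locations.stream()
        .map(l -> new SimpleEntry<String, String>(l.partitionPath, l.fileId))
        .collect(Collectors.toCollection(LinkedHashSet::new));

    System.out.println(byFullLocation.size());        // 2
    System.out.println(byPartitionAndFileId.size());  // 1
  }
}
   ```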
   
   I have written tests to reproduce the issue. Without the fix, we encounter the 
exception below: 
   ```
   
   Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
appending records to 
file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
at 
org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)
at 
org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
... 28 more
   Caused by: org.apache.hudi.exception.HoodieException: Writing multiple 
records with same key 1 not supported for 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
at 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
at 
org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
... 31 more
   
   Driver stacktrace:
   51301 [main] INFO  org.apache.spark.scheduler.DAGScheduler [] - Job 78 
failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
   51303 [main] INFO  org.apache.hudi.client.BaseHoodieClient [] - Stopping 
Timeline service !!
   51303 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService 
[] - Closing Timeline server
   51303 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - 
Closing Timeline Service
   51321 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - 
Closed Timeline Service
   51321 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService 
[] - Closed Timeline server
   
   org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time 197001012
   
at 
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
at 
org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
at 
org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$of

Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118630713

   
   ## CI report:
   
   * 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118628451

   
   ## CI report:
   
   * 3b2ee376708bc3e71e9b310ad4f862a26c4da627 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7778) Duplicate Key exception with RLI

2024-05-17 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7778:
-

 Summary: Duplicate Key exception with RLI 
 Key: HUDI-7778
 URL: https://issues.apache.org/jira/browse/HUDI-7778
 Project: Apache Hudi
  Issue Type: Bug
  Components: metadata
Reporter: sivabalan narayanan


We are occasionally hitting an exception like the one below, meaning two records 
are ingested into RLI for the same record key from the data table. This is not 
expected to happen. 

 
{code:java}
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
appending records to 
file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
   at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
 at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439) 
 at 
org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
 at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
  ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: Writing multiple records 
with same key 1 not supported for 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
  at 
org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
  at 
org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
  at 
org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
  at 
org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
  ... 31 more
Driver stacktrace:
51301 [main] INFO  org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
51303 [main] INFO  org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
51303 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
51303 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
51321 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
51321 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
time 197001012
at 
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
   at 
org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
  at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
at 
org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
at 
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156) 
 at 
org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)   
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
   at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
  at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
   at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
at 
org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
 at 
org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke

[jira] [Assigned] (HUDI-7778) Duplicate Key exception with RLI

2024-05-17 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-7778:
-

Assignee: sivabalan narayanan

> Duplicate Key exception with RLI 
> -
>
> Key: HUDI-7778
> URL: https://issues.apache.org/jira/browse/HUDI-7778
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> We are occasionally hitting an exception like the one below, meaning two records 
> are ingested into RLI for the same record key from the data table. This is not 
> expected to happen. 
>  
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while 
> appending records to 
> file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
>  at 
> org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
>  at 
> org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)  
> at 
> org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
>  at 
> org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
>   ... 28 more
> Caused by: org.apache.hudi.exception.HoodieException: Writing multiple records 
> with same key 1 not supported for 
> org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
>   at 
> org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
>   at 
> org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
>   at 
> org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
>   at 
> org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
>   ... 31 more
> Driver stacktrace:
> 51301 [main] INFO  org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
> 51303 [main] INFO  org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
> 51303 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
> 51303 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
> 51321 [main] INFO  org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
> 51321 [main] INFO  org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server
> org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit 
> time 197001012
>   at 
> org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
>at 
> org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
>   at 
> org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
> at 
> org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
> at 
> org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
>   at 
> org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
>at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
> at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>  at 
> org.junit.jupiter.engine.execution.

Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118613223

   
   ## CI report:
   
   * 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118613207

   
   ## CI report:
   
   * 6d49988d2438be5710fd46e7e41af5008d4054eb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23989)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118602623

   
   ## CI report:
   
   * 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118600636

   
   ## CI report:
   
   * 3b2ee376708bc3e71e9b310ad4f862a26c4da627 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7777) Add function of instantiating HoodieStorage instance to meta client

2024-05-17 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7777:
---

 Summary: Add function of instantiating HoodieStorage instance to 
meta client
 Key: HUDI-7777
 URL: https://issues.apache.org/jira/browse/HUDI-7777
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7776) Simplify HoodieStorage instance fetching

2024-05-17 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7776:
---

 Summary: Simplify HoodieStorage instance fetching
 Key: HUDI-7776
 URL: https://issues.apache.org/jira/browse/HUDI-7776
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7775:
-
Labels: pull-request-available  (was: )

> Remove unused APIs in HoodieStorage
> ---
>
> Key: HUDI-7775
> URL: https://issues.apache.org/jira/browse/HUDI-7775
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]

2024-05-17 Thread via GitHub


yihua opened a new pull request, #11255:
URL: https://github.com/apache/hudi/pull/11255

   ### Change Logs
   
   As above.
   
   ### Impact
   
   Simplifies `HoodieStorage` APIs.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage

2024-05-17 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7775:

Story Points: 0.5

> Remove unused APIs in HoodieStorage
> ---
>
> Key: HUDI-7775
> URL: https://issues.apache.org/jira/browse/HUDI-7775
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7775) Remove unused APIs in HoodieStorage

2024-05-17 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7775:
---

 Summary: Remove unused APIs in HoodieStorage
 Key: HUDI-7775
 URL: https://issues.apache.org/jira/browse/HUDI-7775
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage

2024-05-17 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7775:

Fix Version/s: 0.15.0
   1.0.0

> Remove unused APIs in HoodieStorage
> ---
>
> Key: HUDI-7775
> URL: https://issues.apache.org/jira/browse/HUDI-7775
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7775) Remove unused APIs in HoodieStorage

2024-05-17 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7775:
---

Assignee: Ethan Guo

> Remove unused APIs in HoodieStorage
> ---
>
> Key: HUDI-7775
> URL: https://issues.apache.org/jira/browse/HUDI-7775
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118545243

   
   ## CI report:
   
   * b035079e68c0392ec6061b31dcbba85f238bc66a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
 
   * 6d49988d2438be5710fd46e7e41af5008d4054eb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23989)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118542723

   
   ## CI report:
   
   * b035079e68c0392ec6061b31dcbba85f238bc66a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
 
   * 6d49988d2438be5710fd46e7e41af5008d4054eb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6207) Files pruning for bucket index table pk filtering queries using Spark SQL

2024-05-17 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6207:
-
Sprint: Sprint 2023-04-26

>  Files pruning for bucket index table pk filtering queries using Spark SQL
> --
>
> Key: HUDI-6207
> URL: https://issues.apache.org/jira/browse/HUDI-6207
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jing Zhang
>Assignee: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
>
> HUDI-6070 already supports files pruning for bucket index table pk filtering 
> queries using Flink SQL. This JIRA would add this improvement to Spark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6207) Files pruning for bucket index table pk filtering queries using Spark SQL

2024-05-17 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6207:
-
Reviewers: Danny Chen

>  Files pruning for bucket index table pk filtering queries using Spark SQL
> --
>
> Key: HUDI-6207
> URL: https://issues.apache.org/jira/browse/HUDI-6207
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jing Zhang
>Assignee: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
>
> HUDI-6070 already supports files pruning for bucket index table pk filtering 
> queries using Flink SQL. This JIRA would add this improvement to Spark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118374697

   
   ## CI report:
   
   * b035079e68c0392ec6061b31dcbba85f238bc66a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7774) MercifulJsonConvertor should support Avro logical type

2024-05-17 Thread Davis Zhang (Jira)
Davis Zhang created HUDI-7774:
-

 Summary: MercifulJsonConvertor should support Avro logical type
 Key: HUDI-7774
 URL: https://issues.apache.org/jira/browse/HUDI-7774
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Davis Zhang


MercifulJsonConverter should be able to convert raw JSON string entries to an Avro 
GenericRecord whose format is compliant with the required Avro schema.

 

The list of conversions we should support, with the accepted input formats (a 
rough coercion sketch follows the list):
 * UUID: String
 * Decimal: Number, or Number in String representation
 * Date: Number / String Number, or a human-readable date in 
DateTimeFormatter.ISO_LOCAL_DATE format
 * Time (milli/micro sec): Number / String Number, or a human-readable time in 
DateTimeFormatter.ISO_LOCAL_TIME format
 * Timestamp (milli/micro second): Number / String Number, or a human-readable 
timestamp in DateTimeFormatter.ISO_INSTANT format
 * Local Timestamp (milli/micro second): Number / String Number, or a human-readable 
timestamp in DateTimeFormatter.ISO_LOCAL_DATE_TIME format
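
As a rough illustration of the coercions involved (the helper names below are 
hypothetical and do not reflect MercifulJsonConverter's actual API), turning JSON 
number or string inputs into Avro's date and timestamp-micros representations 
could look like this:
{code:java}
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Illustrative helpers only; names and structure are hypothetical.
public class LogicalTypeCoercions {

  // Avro's `date` logical type is stored as an int: days since the Unix epoch.
  static int toEpochDays(Object input) {
    if (input instanceof Number) {
      return ((Number) input).intValue();              // already epoch days
    }
    String s = input.toString();
    if (s.chars().allMatch(Character::isDigit)) {
      return Integer.parseInt(s);                      // "19860" -> 19860
    }
    return (int) LocalDate.parse(s, DateTimeFormatter.ISO_LOCAL_DATE).toEpochDay();
  }

  // Avro's `timestamp-micros` logical type is stored as a long: microseconds since the epoch.
  static long toEpochMicros(Object input) {
    if (input instanceof Number) {
      return ((Number) input).longValue();
    }
    String s = input.toString();
    if (s.chars().allMatch(Character::isDigit)) {
      return Long.parseLong(s);
    }
    Instant ts = Instant.parse(s);                     // ISO_INSTANT, e.g. "2024-05-17T10:00:00Z"
    return ChronoUnit.MICROS.between(Instant.EPOCH, ts);
  }

  public static void main(String[] args) {
    System.out.println(toEpochDays("2024-05-17"));             // 19860
    System.out.println(toEpochMicros("2024-05-17T10:00:00Z")); // 1715940000000000
  }
}
{code}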



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118228216

   
   ## CI report:
   
   * b035079e68c0392ec6061b31dcbba85f238bc66a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118219071

   
   ## CI report:
   
   * b035079e68c0392ec6061b31dcbba85f238bc66a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [SUPPORT] Fails to create a `_ro` table hive when writing table [hudi]

2024-05-17 Thread via GitHub


shubhamn21 opened a new issue, #11254:
URL: https://github.com/apache/hudi/issues/11254

   **Describe the problem you faced**
   
   Unable to write a Hudi table on an AWS Hadoop EMR setup.
   From the error it seems that it is failing while creating a metadata table 
(with suffix `_ro`) with Hive/Glue. Am I missing a Hive setting to allow it to 
create Null type tables? Are there alternative solutions?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. While writing data:
   ```
   df.write.format("hudi") \
       .mode('append') \
       .options(**options) \
       .partitionBy("kafka_topic", "event_dt") \
       .saveAsTable('db_name.snimbalkar_test_table')
   ```
   
   
   **Expected behavior**
   
   Creates and stores table.
   
   **Environment Description**
   
   * Hudi version : 0.13.1
   
   * Spark version : 3.30
   
   * Hive version : 
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : EMRFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Py4JJavaError: An error occurred while calling o1204.saveAsTable.
   : org.apache.hudi.exception.HoodieException: Could not sync using the meta 
sync class org.apache.hudi.hive.HiveSyncTool
at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
at 
org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:888)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:886)
at 
org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:826)
at 
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:322)
at 
org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:107)
at 
org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:106)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
at 
org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala

[jira] [Created] (HUDI-7773) Allow Users to extend S3/GCS HoodieIncrSource to bring in additional columns from upstream

2024-05-17 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-7773:


 Summary: Allow Users to extend S3/GCS HoodieIncrSource to bring in 
additional columns from upstream
 Key: HUDI-7773
 URL: https://issues.apache.org/jira/browse/HUDI-7773
 Project: Apache Hudi
  Issue Type: Improvement
  Components: deltastreamer
Reporter: Balaji Varadarajan
Assignee: Balaji Varadarajan


The current S3/GCS HoodieIncrSource reads file paths from upstream tables and 
ingests them into downstream tables. We need the ability to extend this 
functionality by joining additional columns from the upstream table before 
writing to the downstream table.
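
A rough Spark sketch of the idea (the table path and the extra columns 
"source_bucket" and "ingestion_tag" are hypothetical, and this is not the actual 
HoodieIncrSource code): select the file paths the source reads today, then join 
the additional upstream columns onto them before the downstream write.
{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EnrichIncrSourceSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("incr-source-enrichment-sketch")
        .master("local[*]")
        .getOrCreate();

    // Hypothetical upstream metadata table path.
    Dataset<Row> upstream = spark.read().format("hudi").load("/tmp/upstream_metadata_table");

    // What the source effectively produces today: just the file paths to ingest.
    Dataset<Row> filePaths = upstream.select("file_path");

    // Additional upstream columns we would like to carry into the downstream write.
    Dataset<Row> extraColumns = upstream.select("file_path", "source_bucket", "ingestion_tag");

    // Join the extra columns onto the file-path records before they are written downstream.
    Dataset<Row> enriched = filePaths.join(extraColumns, "file_path");
    enriched.show(false);

    spark.stop();
  }
}
{code}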



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7761] Changes to make Manifest Writer extendable [hudi]

2024-05-17 Thread via GitHub


csivaguru opened a new pull request, #11253:
URL: https://github.com/apache/hudi/pull/11253

   ### Change Logs
   
   - Change the visibility of the private constructor to make it possible to extend 
and plug in custom manifest writer classes (see the sketch after this list).
   - Make the fetchLatestFilesForAllPartitions method in ManifestWriter non-static 
to avoid sharing multiple local variables with the inherited class.
   - Change the visibility of BigQuerySchemaResolver so that it can be 
instantiated outside the hudi repository.
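   
   A minimal sketch of the extensibility pattern (hypothetical, simplified class 
names; not the actual ManifestFileWriter API): the protected constructor plus the 
non-static method is what lets a client subclass plug in custom listing behavior.
   ```
import java.util.List;
import java.util.stream.Collectors;

// Simplified, hypothetical sketch of the pattern described above.
class ExtensibleManifestWriter {

  // A protected (rather than private) constructor lets clients subclass it.
  protected ExtensibleManifestWriter() {
  }

  // A non-static method can be overridden, and the subclass does not have to
  // share the base class's static helper state.
  protected List<String> fetchLatestFilesForAllPartitions(List<String> partitions) {
    return partitions.stream()
        .map(partition -> partition + "/latest-file.parquet")
        .collect(Collectors.toList());
  }
}

// A custom manifest writer plugged in by a client.
class CustomManifestWriter extends ExtensibleManifestWriter {
  @Override
  protected List<String> fetchLatestFilesForAllPartitions(List<String> partitions) {
    // Example customization: drop files the client does not want listed.
    return super.fetchLatestFilesForAllPartitions(partitions).stream()
        .filter(path -> !path.contains(".tmp"))
        .collect(Collectors.toList());
  }
}
   ```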
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7761) Make the manifest Writer Extendable

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7761:
-
Labels: pull-request-available  (was: )

> Make the manifest Writer Extendable
> ---
>
> Key: HUDI-7761
> URL: https://issues.apache.org/jira/browse/HUDI-7761
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sivaguru Kannan
>Priority: Major
>  Labels: pull-request-available
>
> * Make the manifest writer extendable such that clients can plug in a 
> custom instance of the manifest writer for their syncs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[I] [SUPPORT] What Class Name to use for hoodie.errortable.write.class [hudi]

2024-05-17 Thread via GitHub


soumilshah1995 opened a new issue, #11252:
URL: https://github.com/apache/hudi/issues/11252

   I'm trying out Hudi error tables, but I'm having trouble finding the 
documentation for the hoodie.errortable.write.class value. Could you please 
assist me?
   
   # sample config 
   ```
   
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
   hoodie.datasource.write.recordkey.field=invoiceid
   hoodie.datasource.write.partitionpath.field=destinationstate
   
hoodie.streamer.source.dfs.root=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/sampledata/
   hoodie.datasource.write.precombine.field=replicadmstimestamp
   hoodie.streamer.transformer.sql=SELECT * FROM  a where sas
   
hoodie.errortable.base.path=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/error/
   hoodie.errortable.target.table.name=error_invoice
   hoodie.errortable.enable=true
   hoodie.errortable.write.class=
   
   
   ```
   
   # Job
   ```
   
   spark-submit \
 --class org.apache.hudi.utilities.streamer.HoodieStreamer \
 --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 \
 --properties-file spark-config.properties \
 --master 'local[*]' \
 --executor-memory 1g \
  
/Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar
 \
 --table-type COPY_ON_WRITE \
 --op UPSERT \
 --transformer-class 
org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
 --source-ordering-field replicadmstimestamp \
 --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
 --target-base-path 
file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/
 \
 --target-table invoice \
 --props hudi_tbl.props
   ```
   
   I want to purposely fail the job and see the error table being created. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7769) Fix Hudi CDC read with legacy parquet file format on Spark

2024-05-17 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7769:

Summary: Fix Hudi CDC read with legacy parquet file format on Spark  (was: 
Fix Hudi CDC read on Spark 3.3.4 and 3.4.3)

> Fix Hudi CDC read with legacy parquet file format on Spark
> --
>
> Key: HUDI-7769
> URL: https://issues.apache.org/jira/browse/HUDI-7769
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7769) Fix Hudi CDC read with legacy parquet file format on Spark

2024-05-17 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7769:

Fix Version/s: 0.15.0
   1.0.0

> Fix Hudi CDC read with legacy parquet file format on Spark
> --
>
> Key: HUDI-7769
> URL: https://issues.apache.org/jira/browse/HUDI-7769
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch branch-0.x updated: [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store (#11247)

2024-05-17 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch branch-0.x
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/branch-0.x by this push:
 new e0cf1ce147a [MINOR] [BRANCH-0.x] Added condition to check default 
value to fix extracting password from credential store (#11247)
e0cf1ce147a is described below

commit e0cf1ce147a52feba7db766ca73e7221d2be616b
Author: Aditya Goenka <63430370+ad1happy...@users.noreply.github.com>
AuthorDate: Fri May 17 21:18:08 2024 +0530

[MINOR] [BRANCH-0.x] Added condition to check default value to fix 
extracting password from credential store (#11247)
---
 .../src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index 853dd1ac97c..41657377753 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -994,7 +994,7 @@ class HoodieSparkSqlWriterInternal {
   properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, 
SPARK_VERSION)
   
properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, 
hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
   if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || 
fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase(HiveSyncConfigHolder.HIVE_PASS.defaultValue()))){
 try {
   val passwd = 
ShimLoader.getHadoopShims.getPassword(spark.sparkContext.hadoopConfiguration, 
HiveConf.ConfVars.METASTOREPWD.varname)
   if (passwd != null && !passwd.isEmpty) {



Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


yihua commented on PR #11247:
URL: https://github.com/apache/hudi/pull/11247#issuecomment-2117886061

   The CI failure is unrelated.  Merging this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


yihua merged PR #11247:
URL: https://github.com/apache/hudi/pull/11247


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated: [MINOR] Added condition to check default value to fix extracting password from credential store (#11246)

2024-05-17 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e4b56b090fd [MINOR] Added condition to check default value to fix 
extracting password from credential store (#11246)
e4b56b090fd is described below

commit e4b56b090fdcb76416c60bd7ddd4247f0955c152
Author: Aditya Goenka <63430370+ad1happy...@users.noreply.github.com>
AuthorDate: Fri May 17 21:17:07 2024 +0530

[MINOR] Added condition to check default value to fix extracting password 
from credential store (#11246)
---
 .../src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index e852445283c..3c28b1a2e0a 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -878,7 +878,7 @@ class HoodieSparkSqlWriterInternal {
   properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, 
SPARK_VERSION)
   
properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, 
hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
   if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || 
fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase(HiveSyncConfigHolder.HIVE_PASS.defaultValue()))){
 try {
   val passwd = 
ShimLoader.getHadoopShims.getPassword(spark.sparkContext.hadoopConfiguration, 
HiveConf.ConfVars.METASTOREPWD.varname)
   if (passwd != null && !passwd.isEmpty) {



Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


yihua merged PR #11246:
URL: https://github.com/apache/hudi/pull/11246


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11246:
URL: https://github.com/apache/hudi/pull/11246#issuecomment-2117792852

   
   ## CI report:
   
   * f965f6a09d5e3d70693061314b035bd93dec687b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23985)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #10191:
URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117790278

   
   ## CI report:
   
   * e3223a6ef0dd865dcbd672cca9f5fb979f80ddc5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23984)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Intermittent stall of S3 PUT request for about 17 minutes [hudi]

2024-05-17 Thread via GitHub


hgudladona commented on issue #11203:
URL: https://github.com/apache/hudi/issues/11203#issuecomment-2117663830

   We are mostly certain this is not due to S3 throttling but to a bad socket 
state and its handling in JDK 11. If you look at the debug log, you will notice 
that the socket write fails and a retry succeeds. We are tuning some network 
settings on the container to fail fast in this situation and let the retry 
handle the failure. Will keep you posted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [BUG] Spark3.3 overwrite partitioned mor table failed with hudi 0.14.1 [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on issue #10831:
URL: https://github.com/apache/hudi/issues/10831#issuecomment-2117629792

   @Xuehai-Chen Are you good with this? Please let us know in case you still 
face errors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on code in PR #11246:
URL: https://github.com/apache/hudi/pull/11246#discussion_r1604994780


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -887,7 +887,7 @@ class HoodieSparkSqlWriterInternal {
   properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, 
SPARK_VERSION)
   
properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, 
hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
   if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || 
fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase("hive"))){

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11247:
URL: https://github.com/apache/hudi/pull/11247#issuecomment-2117580619

   
   ## CI report:
   
   * c25bdceefc761b15f50eec65b47e941e3b676916 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23973)
 
   * 8d4842e47fabc05a0e9ebf63d311bfcb386ed9cf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23986)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11246:
URL: https://github.com/apache/hudi/pull/11246#issuecomment-2117580492

   
   ## CI report:
   
   * 2b979fee4a605e06c01a3a80eab2ae4aa2f4f599 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23972)
 
   * f965f6a09d5e3d70693061314b035bd93dec687b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23985)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #10191:
URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117577092

   
   ## CI report:
   
   * ef29826c5973ac624100b38717c685d3a1059fe2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23976)
 
   * e3223a6ef0dd865dcbd672cca9f5fb979f80ddc5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23984)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11247:
URL: https://github.com/apache/hudi/pull/11247#issuecomment-2117563448

   
   ## CI report:
   
   * c25bdceefc761b15f50eec65b47e941e3b676916 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23973)
 
   * 8d4842e47fabc05a0e9ebf63d311bfcb386ed9cf UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11246:
URL: https://github.com/apache/hudi/pull/11246#issuecomment-2117563341

   
   ## CI report:
   
   * 2b979fee4a605e06c01a3a80eab2ae4aa2f4f599 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23972)
 
   * f965f6a09d5e3d70693061314b035bd93dec687b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #10191:
URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117560864

   
   ## CI report:
   
   * ef29826c5973ac624100b38717c685d3a1059fe2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23976)
 
   * e3223a6ef0dd865dcbd672cca9f5fb979f80ddc5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-05-17 Thread via GitHub


KnightChess commented on code in PR #10191:
URL: https://github.com/apache/hudi/pull/10191#discussion_r1604930648


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala:
##
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.generic.GenericData
+import org.apache.hadoop.fs.FileStatus
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.table.HoodieTableConfig
+import org.apache.hudi.config.HoodieIndexConfig
+import org.apache.hudi.index.HoodieIndex
+import org.apache.hudi.index.HoodieIndex.IndexType
+import org.apache.hudi.index.bucket.BucketIdentifier
+import org.apache.hudi.keygen.KeyGenerator
+import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory
+import org.apache.spark.sql.catalyst.expressions
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, 
Expression, Literal}
+import org.apache.hudi.common.util.collection.Pair
+import org.apache.spark.sql.types.{DoubleType, FloatType, StructType}
+import org.apache.spark.util.collection.BitSet
+import org.slf4j.LoggerFactory
+
+import scala.collection.{JavaConverters, mutable}
+
+class BucketIndexSupport(metadataConfig: HoodieMetadataConfig, schema: 
StructType) {
+
+  private val log = LoggerFactory.getLogger(getClass)
+
+  private val keyGenerator =
+HoodieSparkKeyGeneratorFactory.createKeyGenerator(metadataConfig.getProps)
+
+  private lazy val avroSchema = 
AvroConversionUtils.convertStructTypeToAvroSchema(schema, "record", "")

Review Comment:
   good catch, fix it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-05-17 Thread via GitHub


KnightChess commented on PR #10191:
URL: https://github.com/apache/hudi/pull/10191#issuecomment-211750

   @danny0405 yes, is ready for review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on code in PR #11247:
URL: https://github.com/apache/hudi/pull/11247#discussion_r1604925546


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -1003,7 +1003,7 @@ class HoodieSparkSqlWriterInternal {
   properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, 
SPARK_VERSION)
   
properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, 
hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
   if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || 
fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+(properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || 
properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase("hive"))){

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] RLI index slowing down [hudi]

2024-05-17 Thread via GitHub


manishgaurav84 commented on issue #11243:
URL: https://github.com/apache/hudi/issues/11243#issuecomment-2117462211

   @ad1happy2go I have provided the logs in a Slack message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117439453

   
   ## CI report:
   
   * 3cef36f9284541a6cad8974b2e2e9984673c6627 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23983)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]

2024-05-17 Thread via GitHub


jai20242 commented on issue #11249:
URL: https://github.com/apache/hudi/issues/11249#issuecomment-2117392043

   I tried adding the configuration to both compaction schedule and compaction 
run, but it didn't work.
   
   hudi->connect --path /tmp/dep_hudi2
   2024-05-17 13:25:30.737  INFO 21882 --- [   main] 
o.a.h.c.t.HoodieTableMetaClient  : Loading HoodieTableMetaClient from 
/tmp/dep_hudi2
   2024-05-17 13:25:30.906  WARN 21882 --- [   main] 
o.a.h.u.NativeCodeLoader : Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
   2024-05-17 13:25:31.243  WARN 21882 --- [   main] o.a.h.f.FileSystem 
  : Cannot load filesystem: 
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider 
com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem could not be instantiated
   2024-05-17 13:25:31.243  WARN 21882 --- [   main] o.a.h.f.FileSystem 
  : java.lang.NoSuchMethodError: 
com.google.common.base.Preconditions.checkState(ZLjava/lang/String;J)V
   2024-05-17 13:25:31.442  INFO 21882 --- [   main] 
o.a.h.c.t.HoodieTableConfig  : Loading table properties from 
/tmp/dep_hudi2/.hoodie/hoodie.properties
   2024-05-17 13:25:31.457  INFO 21882 --- [   main] 
o.a.h.c.t.HoodieTableMetaClient  : Finished Loading Table of type 
MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /tmp/dep_hudi2
   Metadata for table prueba loaded
   hudi:prueba->compaction schedule —hoodieConfigs 
"hoodie.compact.inline.max.delta.commits=1"
   Attempted to schedule compaction for 20240517132533480
   hudi:prueba->compaction run --tableName prueba
   2024-05-17 13:25:40.853  INFO 21882 --- [   main] 
o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : 
Option{val=[20240517082810322__deltacommit__COMPLETED__20240517083132000]}
   NO PENDING COMPACTION TO RUN
   hudi:prueba->compactions show all
   ╔═╤═══╤═══╗
   ║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
   ╠═╧═══╧═══╣
   ║ (empty) ║
   ╚═╝
   
   hudi:prueba->compaction run --tableName prueba —hoodieConfigs 
"hoodie.compact.inline.max.delta.commits=1"
   2024-05-17 13:26:17.293  INFO 21882 --- [   main] 
o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : 
Option{val=[20240517082810322__deltacommit__COMPLETED__20240517083132000]}
   NO PENDING COMPACTION TO RUN
   hudi:prueba->compaction run --tableName prueba —hoodieConfigs 
"hoodie.compact.inline.max.delta.commits=1"
   2024-05-17 13:26:30.318  INFO 21882 --- [   main] 
o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : 
Option{val=[20240517082810322__deltacommit__COMPLETED__20240517083132000]}
   NO PENDING COMPACTION TO RUN
   hudi:prueba->compactions show all
   ╔═╤═══╤═══╗
   ║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
   ╠═╧═══╧═══╣
   ║ (empty) ║
   ╚═╝
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117319785

   
   ## CI report:
   
   * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980)
 
   * 3cef36f9284541a6cad8974b2e2e9984673c6627 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23983)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] RLI index slowing down [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on issue #11243:
URL: https://github.com/apache/hudi/issues/11243#issuecomment-2117254871

   @manishgaurav84 Not sure why I couldn't download the event logs. Can you ping me 
on Slack and share them there as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on issue #11249:
URL: https://github.com/apache/hudi/issues/11249#issuecomment-2117253103

   @jai20242 That is a writer configuration; Hudi doesn't persist it. When you run 
compaction from the CLI, you need to pass it there too.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi fails ACID verification test [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on issue #11170:
URL: https://github.com/apache/hudi/issues/11170#issuecomment-2117238692

   Thanks @matthijseikelenboom for the update


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117213033

   
   ## CI report:
   
   * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980)
 
   * 3cef36f9284541a6cad8974b2e2e9984673c6627 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23983)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117194321

   
   ## CI report:
   
   * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980)
 
   * 3cef36f9284541a6cad8974b2e2e9984673c6627 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11031:
URL: https://github.com/apache/hudi/pull/11031#issuecomment-2117193597

   
   ## CI report:
   
   * e159472757b2475611e99dc4afd8fe2def6967f4 UNKNOWN
   * c4a9e9a0debe32518a84877c79c4831740b95caa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23979)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]

2024-05-17 Thread via GitHub


jai20242 commented on issue #11249:
URL: https://github.com/apache/hudi/issues/11249#issuecomment-2117157686

   I set the param hoodie.compact.inline.max.delta.commits to 1 (you can see it 
in the first comment).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Exceptions with Partition TTL [hudi]

2024-05-17 Thread via GitHub


xicm closed issue #11223: [SUPPORT] Exceptions with Partition TTL
URL: https://github.com/apache/hudi/issues/11223


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-7652) Add new MergeKey API to support simple and composite keys

2024-05-17 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7652.
-
Resolution: Done

> Add new MergeKey API to support simple and composite keys
> -
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys (#11077)

2024-05-17 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e0ca6dd0d52 [HUDI-7652] Add new `HoodieMergeKey` API to support simple 
and composite keys (#11077)
e0ca6dd0d52 is described below

commit e0ca6dd0d52c4171d5b4ee83cbc7ef684cc471dc
Author: Sagar Sumit 
AuthorDate: Fri May 17 14:57:29 2024 +0530

[HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite 
keys (#11077)

Introduce a new abstract class `BaseHoodieMergedLogRecordScanner` which
subclasses from `AbstractHoodieLogRecordReader`. The new abstract class
holds the `records` map as `ExternalSpillableMap` and 
exposes
`public abstract Map getRecords()` API. The existing
`HoodieMergedLogRecordScanner` now derives from the new abstract class
(instead of `AbstractHoodieLogRecordReader`) and uses String keys.
---
 .../hudi/client/TestJavaHoodieBackedMetadata.java  |   2 +-
 .../functional/TestHoodieBackedMetadata.java   |   2 +-
 .../functional/TestHoodieBackedTableMetadata.java  |   2 +-
 .../common/model/HoodieMetadataRecordMerger.java   |  43 
 .../hudi/common/model/HoodieRecordMerger.java  |   8 +
 .../table/log/AbstractHoodieLogRecordReader.java   |  25 ++
 .../log/BaseHoodieMergedLogRecordScanner.java  | 260 
 .../table/log/HoodieMergedLogRecordScanner.java| 265 +++-
 .../log/HoodieMetadataMergedLogRecordScanner.java  | 270 +
 .../hudi/metadata/HoodieBackedTableMetadata.java   |   2 +-
 .../metadata/HoodieMetadataLogRecordReader.java|  41 +++-
 .../hudi/metadata/HoodieTableMetadataUtil.java |   3 +-
 .../model/TestHoodieMetadataRecordMerger.java  |  65 +
 .../apache/hudi/functional/TestMORDataSource.scala |   2 +-
 14 files changed, 740 insertions(+), 250 deletions(-)

diff --git 
a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
 
b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
index ae17a34da3e..c2b85bd70b0 100644
--- 
a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
+++ 
b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
@@ -946,7 +946,7 @@ public class TestJavaHoodieBackedMetadata extends 
TestHoodieMetadataBase {
 if (enableMetaFields) {
   schema = HoodieAvroUtils.addMetadataFields(schema);
 }
-HoodieMetadataLogRecordReader logRecordReader = 
HoodieMetadataLogRecordReader.newBuilder()
+HoodieMetadataLogRecordReader logRecordReader = 
HoodieMetadataLogRecordReader.newBuilder(FILES.getPartitionPath())
 .withStorage(metadataMetaClient.getStorage())
 .withBasePath(metadataMetaClient.getBasePath())
 .withLogFilePaths(logFilePaths)
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
index 700b9f1cd24..4da78d84980 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
@@ -1413,7 +1413,7 @@ public class TestHoodieBackedMetadata extends 
TestHoodieMetadataBase {
 if (enableMetaFields) {
   schema = HoodieAvroUtils.addMetadataFields(schema);
 }
-HoodieMetadataLogRecordReader logRecordReader = 
HoodieMetadataLogRecordReader.newBuilder()
+HoodieMetadataLogRecordReader logRecordReader = 
HoodieMetadataLogRecordReader.newBuilder(FILES.getPartitionPath())
 .withStorage(metadataMetaClient.getStorage())
 .withBasePath(metadataMetaClient.getBasePath())
 .withLogFilePaths(logFilePaths)
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
index d3dcf94f641..cec201ee754 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
@@ -491,7 +491,7 @@ public class TestHoodieBackedTableMetadata extends 
TestHoodieMetadataBase {
*/
   private void verifyMetadataMergedRecords(HoodieTableMetaClient 
metadataMetaClient, List logFilePaths, String latestCommitTimestamp) {
 Schema schema = 
HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-HoodieMetadataLogRecordReader logRecordReader = 
Hoo

Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-17 Thread via GitHub


codope merged PR #11077:
URL: https://github.com/apache/hudi/pull/11077


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]

2024-05-17 Thread via GitHub


zhuanshenbsj1 commented on code in PR #11031:
URL: https://github.com/apache/hudi/pull/11031#discussion_r1604626804


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/SanityChecks.java:
##
@@ -41,23 +42,22 @@
 /**
  * Utilities for HoodieTableFactory sanity check.
  */
-public class SanityChecksUtil {
+public class SanityChecks {
 
-  private static final Logger LOG = 
LoggerFactory.getLogger(SanityChecksUtil.class);
+  private static final Logger LOG = 
LoggerFactory.getLogger(SanityChecks.class);
 
   /**
* The sanity check.
-   * If the metaClient is not null, it means that this is a table source 
sanity check and the source table has
-   * already been initialized.
*
-   * @param conf   The table options
-   * @param schema The table schema
-   * @param metaClient  The table meta client
+   * @param conf  The table options
+   * @param schema  The table schema
+   * @param checkMetaData  Weather to check metadata
*/
-  public static void sanitCheck(Configuration conf, ResolvedSchema schema, 
HoodieTableMetaClient metaClient) {
+  public static void sanitCheck(Configuration conf, ResolvedSchema schema, 
Boolean checkMetaData) {
 checkTableType(conf);
 List schemaFields = schema.getColumnNames();
-if (metaClient != null) {
+if (checkMetaData) {
+  HoodieTableMetaClient metaClient = 
StreamerUtil.metaClientForReader(conf, 
HadoopConfigurations.getHadoopConf(conf));
   List latestTablefields = 
StreamerUtil.getLatestTableFields(metaClient);
   if (latestTablefields != null) {

Review Comment:
   I put this logic into the checkRecordKey function, and both the source and sink 
need to be checked.
   
   ```
 public static void checkRecordKey(Configuration conf,List 
existingFields) {
   if (OptionsResolver.isAppendMode(conf)) {
 return;
   }
  
 }
   ```
   And also do this in function checkIndexType.
   
   ```
   public static void checkIndexType(Configuration conf) {
   if (OptionsResolver.isAppendMode(conf)) {
 return;
   }
   
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117105740

   
   ## CI report:
   
   * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-05-17 Thread via GitHub


danny0405 commented on PR #10191:
URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117101176

   @KnightChess Is this patch ready for review again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


danny0405 commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117099130

   Looks reasonable, cc @nsivabalan for another look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]

2024-05-17 Thread via GitHub


danny0405 commented on code in PR #11031:
URL: https://github.com/apache/hudi/pull/11031#discussion_r1604603189


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/SanityChecks.java:
##
@@ -41,23 +42,22 @@
 /**
  * Utilities for HoodieTableFactory sanity check.
  */
-public class SanityChecksUtil {
+public class SanityChecks {
 
-  private static final Logger LOG = 
LoggerFactory.getLogger(SanityChecksUtil.class);
+  private static final Logger LOG = 
LoggerFactory.getLogger(SanityChecks.class);
 
   /**
* The sanity check.
-   * If the metaClient is not null, it means that this is a table source 
sanity check and the source table has
-   * already been initialized.
*
-   * @param conf   The table options
-   * @param schema The table schema
-   * @param metaClient  The table meta client
+   * @param conf  The table options
+   * @param schema  The table schema
+   * @param checkMetaData  Weather to check metadata
*/
-  public static void sanitCheck(Configuration conf, ResolvedSchema schema, 
HoodieTableMetaClient metaClient) {
+  public static void sanitCheck(Configuration conf, ResolvedSchema schema, 
Boolean checkMetaData) {
 checkTableType(conf);
 List schemaFields = schema.getColumnNames();
-if (metaClient != null) {
+if (checkMetaData) {
+  HoodieTableMetaClient metaClient = 
StreamerUtil.metaClientForReader(conf, 
HadoopConfigurations.getHadoopConf(conf));
   List latestTablefields = 
StreamerUtil.getLatestTableFields(metaClient);
   if (latestTablefields != null) {

Review Comment:
   The logic for sink has been changed with this patch. We have this code for 
the original sink:
   
   ```java
   if (!OptionsResolver.isAppendMode(conf)) {
 checkRecordKey(conf, schema);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11251:
URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117092824

   
   ## CI report:
   
   * d92e58eeaecc8b8835b317b269386fa715ca92e7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi) branch master updated (7fc5adad7aa -> d93e4eb9d70)

2024-05-17 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 7fc5adad7aa [HUDI-7717] Disable row writer for bulk insert if 
combining before insert is set (#11216)
 add d93e4eb9d70 [MINOR] Remove legacy code and add try catch to listStatus 
of partition (#11250)

No new revisions were added by this update.

Summary of changes:
 .../hudi/common/table/view/AbstractTableFileSystemView.java   |  4 
 .../apache/hudi/metadata/FileSystemBackedTableMetadata.java   | 11 +--
 2 files changed, 9 insertions(+), 6 deletions(-)



Re: [PR] [MINOR] Remove legacy code and add try catch to listStatus of partition. [hudi]

2024-05-17 Thread via GitHub


danny0405 merged PR #11250:
URL: https://github.com/apache/hudi/pull/11250


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7772] HoodieTimelineArchiver##getCommitInstantsToArchive need skip limiting archiving of instants [hudi]

2024-05-17 Thread via GitHub


danny0405 commented on code in PR #11245:
URL: https://github.com/apache/hudi/pull/11245#discussion_r1604598008


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##
@@ -217,6 +217,10 @@ private List getCommitInstantsToArchive() 
throws IOException {
   earliestInstantToRetainCandidates.add(
   
completedCommitsTimeline.findInstantsModifiedAfterByCompletionTime(latestCompactionTime.get()).firstInstant());
 }
+  } catch (UnsupportedOperationException unsupportedOperationException) {
+// If tableMetadata is FileSystemBackedTableMetadata would throw 
UnsupportedOperationException, should skip it to
+// confirm next operation success
+LOG.warn("tableMetadata is FileSystemBackedTableMetadata and skip 
limiting archiving of instants.");

Review Comment:
   The `isMetadataTableAvailable` check looks at `hoodie.properties` for the `FILES` 
metadata partition instead; it should be empty if the MDT did not initialize 
successfully.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi fails ACID verification test [hudi]

2024-05-17 Thread via GitHub


matthijseikelenboom closed issue #11170: [SUPPORT] Hudi fails ACID verification 
test
URL: https://github.com/apache/hudi/issues/11170


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi fails ACID verification test [hudi]

2024-05-17 Thread via GitHub


matthijseikelenboom commented on issue #11170:
URL: https://github.com/apache/hudi/issues/11170#issuecomment-2117016086

   Tested and verified. Closing the issue.
   
    More info
   Solution has been tested on:
   - Java 8 ✅
   - Java 11 ✅
   - Java 17 ❌ (As of this moment, Hudi doesn't support this version)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5505:
-
Labels: pull-request-available  (was: )

> Compaction NUM_COMMITS policy should only judge completed deltacommit
> -
>
> Key: HUDI-5505
> URL: https://issues.apache.org/jira/browse/HUDI-5505
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: compaction, table-service
>Reporter: HunterXHunter
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-01-05-13-10-57-918.png
>
>
> `compaction.delta_commits =1`
>  
> {code:java}
> 20230105115229301.deltacommit
> 20230105115229301.deltacommit.inflight
> 20230105115229301.deltacommit.requested
> 20230105115253118.commit
> 20230105115253118.compaction.inflight
> 20230105115253118.compaction.requested
> 20230105115330994.deltacommit.inflight
> 20230105115330994.deltacommit.requested{code}
> The return result of `ScheduleCompactionActionExecutor.needCompact ` is 
> `true`, 
> This should not be expected.
>  
> And In the `Occ` or `lazy clean` mode,this will cause compaction trigger 
> early.
> `compaction.delta_commits =3`
>  
> {code:java}
> 20230105125650541.deltacommit.inflight
> 20230105125650541.deltacommit.requested
> 20230105125715081.deltacommit
> 20230105125715081.deltacommit.inflight
> 20230105125715081.deltacommit.requested
> 20230105130018070.deltacommit.inflight
> 20230105130018070.deltacommit.requested {code}
>  
> And compaction will be trigger, this should not be expected.
> !image-2023-01-05-13-10-57-918.png|width=699,height=158!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]

2024-05-17 Thread via GitHub


a-erofeev opened a new pull request, #11251:
URL: https://github.com/apache/hudi/pull/11251

   …heduleCompactionActionExecutor.getLatestDeltaCommitInfo
   
   ### Change Logs
   
   Fixed the incorrect counting of delta commits since the last compaction when 
deciding whether to schedule a new compaction (see the sketch below).
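   
   For readers following the linked JIRA, here is a self-contained sketch of the 
rule this change is after (illustration only; the `Instant` model below is 
simplified and is not the code in the patch): only COMPLETED deltacommits that 
landed after the last completed compaction should count toward 
`compaction.delta_commits`.
   
   ```scala
   object DeltaCommitCountSketch {
     final case class Instant(ts: String, action: String, completed: Boolean)
   
     def deltaCommitsSinceLastCompaction(timeline: Seq[Instant]): Int = {
       // On a MOR table a finished compaction materializes as a completed "commit".
       val lastCompactionTs = timeline
         .filter(i => i.action == "commit" && i.completed)
         .map(_.ts)
         .sorted
         .lastOption
       timeline.count { i =>
         i.action == "deltacommit" && i.completed && lastCompactionTs.forall(i.ts > _)
       }
     }
   
     def main(args: Array[String]): Unit = {
       // Timeline from the HUDI-5505 example (compaction.delta_commits = 1).
       val timeline = Seq(
         Instant("20230105115229301", "deltacommit", completed = true),
         Instant("20230105115253118", "commit", completed = true),
         Instant("20230105115330994", "deltacommit", completed = false))
       // The only deltacommit after the compaction is still inflight, so the
       // count is 0 and no new compaction should be scheduled yet.
       println(deltaCommitsSinceLastCompaction(timeline)) // prints 0
     }
   }
   ```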
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11031:
URL: https://github.com/apache/hudi/pull/11031#issuecomment-2116997198

   
   ## CI report:
   
   * e159472757b2475611e99dc4afd8fe2def6967f4 UNKNOWN
   * 30f50eb580ec3dec52ca87eab5a39ce027910344 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23971)
 
   * c4a9e9a0debe32518a84877c79c4831740b95caa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23979)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]

2024-05-17 Thread via GitHub


hudi-bot commented on PR #11031:
URL: https://github.com/apache/hudi/pull/11031#issuecomment-2116984096

   
   ## CI report:
   
   * e159472757b2475611e99dc4afd8fe2def6967f4 UNKNOWN
   * 30f50eb580ec3dec52ca87eab5a39ce027910344 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23971)
 
   * c4a9e9a0debe32518a84877c79c4831740b95caa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Intermittent stall of S3 PUT request for about 17 minutes [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on issue #11203:
URL: https://github.com/apache/hudi/issues/11203#issuecomment-2116965831

   @gudladona Looks like S3 throttling is happening. Did you check whether you 
have a lot of small file groups in your data?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] After upgrading hudi version 0.9.0 -> 0.13.1, it is slower and had mermory issue. [hudi]

2024-05-17 Thread via GitHub


codope closed issue #11241: [SUPPORT] After upgrading hudi version 0.9.0 -> 
0.13.1, it is slower and had mermory issue.
URL: https://github.com/apache/hudi/issues/11241


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]

2024-05-17 Thread via GitHub


ad1happy2go commented on issue #11249:
URL: https://github.com/apache/hudi/issues/11249#issuecomment-2116948074

   @jai20242 If you have only 2 delta commits then there will be nothing to 
compact, since the default for 
[hoodie.compact.inline.max.delta.commits](https://hudi.apache.org/docs/configurations/#hoodiecompactinlinemaxdeltacommits)
 is 5. Set this config to 1 if you want compaction scheduled after every delta 
commit; a hedged example is sketched below.
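   
   A minimal writer-side sketch, assuming inline compaction is wanted. This is 
illustration only: `df`, the table name, and the path are placeholders borrowed 
from this thread, and only the option keys are Hudi config names (inline 
compaction additionally needs `hoodie.compact.inline=true`). When scheduling 
from hudi-cli instead, the same key is what gets passed through the CLI's 
`--hoodieConfigs` option used elsewhere in this thread.
   
   ```scala
   import org.apache.spark.sql.DataFrame
   
   // Hedged sketch, not the reporter's actual job: `df` is whatever DataFrame is
   // being upserted; table name and path are placeholders from this thread.
   def writeWithInlineCompaction(df: DataFrame): Unit =
     df.write.format("hudi")
       .option("hoodie.table.name", "prueba")
       .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
       .option("hoodie.compact.inline", "true")                 // enable inline compaction
       .option("hoodie.compact.inline.max.delta.commits", "1")  // compact after every delta commit
       .mode("append")
       .save("/tmp/dep_hudi2")
   ```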
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org