Re: [PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]
hudi-bot commented on PR #11256:
URL: https://github.com/apache/hudi/pull/11256#issuecomment-2118658240

## CI report:

* 89005916c14107710828a1a76d68cfa58e80bf88 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23991)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]
hudi-bot commented on PR #11256:
URL: https://github.com/apache/hudi/pull/11256#issuecomment-2118644661

## CI report:

* 89005916c14107710828a1a76d68cfa58e80bf88 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23991)
Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118642593

## CI report:

* 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
Re: [PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]
hudi-bot commented on PR #11256:
URL: https://github.com/apache/hudi/pull/11256#issuecomment-2118642605

## CI report:

* 89005916c14107710828a1a76d68cfa58e80bf88 UNKNOWN
[jira] [Updated] (HUDI-7778) Duplicate Key exception with RLI
[ https://issues.apache.org/jira/browse/HUDI-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7778:
---------------------------------
    Labels: pull-request-available  (was: )

> Duplicate Key exception with RLI
> --------------------------------
>
>                 Key: HUDI-7778
>                 URL: https://issues.apache.org/jira/browse/HUDI-7778
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>
> We are occasionally hitting an exception like the one below, meaning two records were ingested into RLI for the same record key from the data table. This is not expected to happen.

{code:java}
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: Writing multiple records with same key 1 not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
    ... 31 more

Driver stacktrace:
51301 [main] INFO org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
51303 [main] INFO org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
51303 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
51303 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
51321 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
51321 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server

org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 197001012
    at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
    at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
    at org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
    at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at
{code}
[PR] [HUDI-7778] Fixing global index for duplicate updates [hudi]
nsivabalan opened a new pull request, #11256:
URL: https://github.com/apache/hudi/pull/11256

### Change Logs

We occasionally see duplicate keys being ingested into the RLI partition in the MDT. This patch fixes the root cause.

Root cause: after fetching record locations from the RLI partition in the MDT, and before doing a snapshot read to honor the payload merge and ordering field, we are supposed to fetch unique (partition, fileId) pairs. Instead of deduplicating on the (partition, fileId) pair, we were deduplicating on [HoodieRecordGlobalLocation](https://github.com/apache/hudi/blob/e4b56b090fdcb76416c60bd7ddd4247f0955c152/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java#L298), which also contains an "instantTime" in addition to the partition path and fileId. As a result, a single incoming record could produce 2, 3, or N records after the join.

I have written tests to reproduce the issue. Without the fix, we encounter the exception below:

```
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: Writing multiple records with same key 1 not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
    ... 31 more

Driver stacktrace:
51301 [main] INFO org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
51303 [main] INFO org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
51303 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
51303 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
51321 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
51321 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server

org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 197001012
    at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
    at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
    at org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
    at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$of
```
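The root cause described in the PR above can be illustrated with a minimal sketch. This is not the actual patch: the `GlobalLocation` class below is a hypothetical simplification of `HoodieRecordGlobalLocation`, shown only to demonstrate why deduplicating on an object that carries `instantTime` keeps two entries for the same (partition, fileId) file group, while deduplicating on the pair alone collapses them.

```java
import java.util.*;

// Hypothetical stand-in for HoodieRecordGlobalLocation (simplified for illustration).
class GlobalLocation {
    final String partitionPath;
    final String fileId;
    final String instantTime; // extra field that defeats (partition, fileId) dedup

    GlobalLocation(String partitionPath, String fileId, String instantTime) {
        this.partitionPath = partitionPath;
        this.fileId = fileId;
        this.instantTime = instantTime;
    }
}

public class DedupSketch {
    // Buggy shape: distinct instantTimes keep both entries for the same file group,
    // so the later join fans one incoming record out into N records.
    static Set<GlobalLocation> dedupByLocation(List<GlobalLocation> locs) {
        return new HashSet<>(locs); // identity semantics: nothing is collapsed
    }

    // Fixed shape: collapse to unique (partitionPath, fileId) pairs
    // before the snapshot read, regardless of instantTime.
    static Set<List<String>> dedupByPartitionAndFileId(List<GlobalLocation> locs) {
        Set<List<String>> unique = new HashSet<>();
        for (GlobalLocation loc : locs) {
            unique.add(Arrays.asList(loc.partitionPath, loc.fileId));
        }
        return unique;
    }

    public static void main(String[] args) {
        // Same file group seen at two instants (e.g. after a log file is appended).
        List<GlobalLocation> locs = Arrays.asList(
            new GlobalLocation("2024/05/01", "file-1", "t10"),
            new GlobalLocation("2024/05/01", "file-1", "t20"));
        System.out.println(dedupByLocation(locs).size());           // 2: duplicates survive
        System.out.println(dedupByPartitionAndFileId(locs).size()); // 1: one read per file group
    }
}
```

With the pair-based dedup, each file group is snapshot-read exactly once, so each record key resolves to a single RLI entry.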
Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118630713

## CI report:

* 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118628451

## CI report:

* 3b2ee376708bc3e71e9b310ad4f862a26c4da627 UNKNOWN
[jira] [Created] (HUDI-7778) Duplicate Key exception with RLI
sivabalan narayanan created HUDI-7778:
-----------------------------------------
             Summary: Duplicate Key exception with RLI
                 Key: HUDI-7778
                 URL: https://issues.apache.org/jira/browse/HUDI-7778
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan

We are occasionally hitting an exception like the one below, meaning two records were ingested into RLI for the same record key from the data table. This is not expected to happen.

{code:java}
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to file:/var/folders/ym/8yjkm3n90kq8tk4gfmvk7y14gn/T/junit2792173348364470678/.hoodie/metadata/record_index/.record-index-0009-0_00011.log.3_3-275-476
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:475)
    at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:439)
    at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:90)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:355)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: Writing multiple records with same key 1 not supported for org.apache.hudi.common.table.log.block.HoodieHFileDataBlock
    at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.serializeRecords(HoodieHFileDataBlock.java:146)
    at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:121)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:166)
    at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:467)
    ... 31 more

Driver stacktrace:
51301 [main] INFO org.apache.spark.scheduler.DAGScheduler [] - Job 78 failed: collect at HoodieJavaRDD.java:177, took 0.245313 s
51303 [main] INFO org.apache.hudi.client.BaseHoodieClient [] - Stopping Timeline service !!
51303 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closing Timeline server
51303 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closing Timeline Service
51321 [main] INFO org.apache.hudi.timeline.service.TimelineService [] - Closed Timeline Service
51321 [main] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService [] - Closed Timeline server

org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 197001012
    at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:80)
    at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:47)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:98)
    at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:88)
    at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:156)
    at org.apache.hudi.functional.TestGlobalIndexEnableUpdatePartitions.testUdpateSubsetOfRecUpdates(TestGlobalIndexEnableUpdatePartitions.java:225)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
    at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke
{code}
[jira] [Assigned] (HUDI-7778) Duplicate Key exception with RLI
[ https://issues.apache.org/jira/browse/HUDI-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-7778:
-----------------------------------------
    Assignee: sivabalan narayanan

> Duplicate Key exception with RLI
> --------------------------------
>
>                 Key: HUDI-7778
>                 URL: https://issues.apache.org/jira/browse/HUDI-7778
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>
> We are occasionally hitting an exception like the one below, meaning two records were ingested into RLI for the same record key from the data table. This is not expected to happen.
Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118613223

## CI report:

* 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]
hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118613207

## CI report:

* 6d49988d2438be5710fd46e7e41af5008d4054eb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23989)
Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118602623

## CI report:

* 3b2ee376708bc3e71e9b310ad4f862a26c4da627 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23990)
Re: [PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
hudi-bot commented on PR #11255:
URL: https://github.com/apache/hudi/pull/11255#issuecomment-2118600636

## CI report:

* 3b2ee376708bc3e71e9b310ad4f862a26c4da627 UNKNOWN
[jira] [Created] (HUDI-7777) Add function of instantiating HoodieStorage instance to meta client
Ethan Guo created HUDI-7777:
-------------------------------
             Summary: Add function of instantiating HoodieStorage instance to meta client
                 Key: HUDI-7777
                 URL: https://issues.apache.org/jira/browse/HUDI-7777
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Ethan Guo

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HUDI-7776) Simplify HoodieStorage instance fetching
Ethan Guo created HUDI-7776:
-------------------------------
             Summary: Simplify HoodieStorage instance fetching
                 Key: HUDI-7776
                 URL: https://issues.apache.org/jira/browse/HUDI-7776
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Ethan Guo

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage
[ https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7775:
---------------------------------
    Labels: pull-request-available  (was: )

> Remove unused APIs in HoodieStorage
> -----------------------------------
>
>                 Key: HUDI-7775
>                 URL: https://issues.apache.org/jira/browse/HUDI-7775
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0, 1.0.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[PR] [HUDI-7775] Remove unused APIs in HoodieStorage [hudi]
yihua opened a new pull request, #11255:
URL: https://github.com/apache/hudi/pull/11255

### Change Logs

As above.

### Impact

Simplifies `HoodieStorage` APIs.

### Risk level

none

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage
[ https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7775:
----------------------------
    Story Points: 0.5

> Remove unused APIs in HoodieStorage
> -----------------------------------
>
>                 Key: HUDI-7775
>                 URL: https://issues.apache.org/jira/browse/HUDI-7775
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>             Fix For: 0.15.0, 1.0.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HUDI-7775) Remove unused APIs in HoodieStorage
Ethan Guo created HUDI-7775:
-------------------------------
             Summary: Remove unused APIs in HoodieStorage
                 Key: HUDI-7775
                 URL: https://issues.apache.org/jira/browse/HUDI-7775
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Ethan Guo

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HUDI-7775) Remove unused APIs in HoodieStorage
[ https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7775:
----------------------------
    Fix Version/s: 0.15.0
                   1.0.0

> Remove unused APIs in HoodieStorage
> -----------------------------------
>
>                 Key: HUDI-7775
>                 URL: https://issues.apache.org/jira/browse/HUDI-7775
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Priority: Major
>             Fix For: 0.15.0, 1.0.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Assigned] (HUDI-7775) Remove unused APIs in HoodieStorage
[ https://issues.apache.org/jira/browse/HUDI-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo reassigned HUDI-7775:
-------------------------------
    Assignee: Ethan Guo

> Remove unused APIs in HoodieStorage
> -----------------------------------
>
>                 Key: HUDI-7775
>                 URL: https://issues.apache.org/jira/browse/HUDI-7775
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>             Fix For: 0.15.0, 1.0.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]
hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118545243

## CI report:

* b035079e68c0392ec6061b31dcbba85f238bc66a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
* 6d49988d2438be5710fd46e7e41af5008d4054eb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23989)
Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]
hudi-bot commented on PR #11253:
URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118542723

## CI report:

* b035079e68c0392ec6061b31dcbba85f238bc66a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
* 6d49988d2438be5710fd46e7e41af5008d4054eb UNKNOWN
[jira] [Updated] (HUDI-6207) Files pruning for bucket index table pk filtering queries using Spark SQL
[ https://issues.apache.org/jira/browse/HUDI-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6207:

Sprint: Sprint 2023-04-26

> Files pruning for bucket index table pk filtering queries using Spark SQL
> Key: HUDI-6207
> URL: https://issues.apache.org/jira/browse/HUDI-6207
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Jing Zhang
> Assignee: Jing Zhang
> Priority: Major
> Labels: pull-request-available
>
> HUDI-6070 already supports files pruning for bucket index table pk filtering
> queries using Flink SQL. This JIRA would add this improvement to Spark.
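[Editor's sketch] The file pruning described in this ticket can be illustrated with a minimal stand-alone sketch: under a bucket index, a primary-key equality predicate determines a single bucket, so only file groups in that bucket need scanning. This is not Hudi's actual `BucketIdentifier` hashing — `String.hashCode` and the zero-padded file-name prefix here are simplifying assumptions for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BucketPruningSketch {
    // Simplified stand-in for bucket hashing: map a primary-key value to a bucket id.
    // (Hudi's real BucketIdentifier hashes the record key differently; String.hashCode
    // is used here only for illustration.)
    static int bucketId(String pkValue, int numBuckets) {
        return (pkValue.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // With a pk equality filter, only file groups whose bucket id matches need scanning.
    // Assumes (for this sketch) that file names carry a zero-padded bucket-id prefix,
    // e.g. "00000001-<uuid>.parquet".
    static List<String> pruneFiles(List<String> fileNames, String pkValue, int numBuckets) {
        String prefix = String.format("%08d", bucketId(pkValue, numBuckets));
        return fileNames.stream()
                .filter(f -> f.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

With 2 buckets, a query like `WHERE pk = 'pk-123'` would scan only one of the two file groups instead of both — the same idea HUDI-6070 applied on the Flink side.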
[jira] [Updated] (HUDI-6207) Files pruning for bucket index table pk filtering queries using Spark SQL
[ https://issues.apache.org/jira/browse/HUDI-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6207:

Reviewers: Danny Chen

> Files pruning for bucket index table pk filtering queries using Spark SQL
> Key: HUDI-6207
> URL: https://issues.apache.org/jira/browse/HUDI-6207
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Jing Zhang
> Assignee: Jing Zhang
> Priority: Major
> Labels: pull-request-available
>
> HUDI-6070 already supports files pruning for bucket index table pk filtering
> queries using Flink SQL. This JIRA would add this improvement to Spark.
Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]
hudi-bot commented on PR #11253: URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118374697

## CI report:

* b035079e68c0392ec6061b31dcbba85f238bc66a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
[jira] [Created] (HUDI-7774) MercifulJsonConvertor should support Avro logical type
Davis Zhang created HUDI-7774:

Summary: MercifulJsonConvertor should support Avro logical type
Key: HUDI-7774
URL: https://issues.apache.org/jira/browse/HUDI-7774
Project: Apache Hudi
Issue Type: Improvement
Reporter: Davis Zhang

MercifulJsonConverter should be able to convert raw JSON string entries to Avro GenericRecords whose format is compliant with the required Avro schema. The conversions we should support, with their accepted inputs:

* UUID: String
* Decimal: Number, or Number in String representation
* Date: Number / String Number, or a human-readable date in DateTimeFormatter.ISO_LOCAL_DATE format
* Time (milli/micro sec): Number / String Number, or a human-readable time in DateTimeFormatter.ISO_LOCAL_TIME format
* Timestamp (milli/micro second): Number / String Number, or a human-readable timestamp in DateTimeFormatter.ISO_INSTANT format
* Local Timestamp (milli/micro second): Number / String Number, or a human-readable timestamp in DateTimeFormatter.ISO_LOCAL_DATE_TIME format
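[Editor's sketch] The "Date" case above — accept either a number (days since epoch, possibly String-wrapped) or an ISO_LOCAL_DATE string — can be sketched with stdlib Java. This is independent of Hudi's actual MercifulJsonConverter implementation; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateFieldConverter {
    // Convert a raw JSON value (Number, numeric String, or ISO date String) into the
    // Avro 'date' logical type representation: days since the Unix epoch (int).
    static int toAvroDate(Object jsonValue) {
        String s = String.valueOf(jsonValue).trim();
        try {
            // Number, or String-wrapped number: already days since epoch.
            return Integer.parseInt(s);
        } catch (NumberFormatException ignored) {
            // Human-readable date, e.g. "2024-05-17", per DateTimeFormatter.ISO_LOCAL_DATE.
            LocalDate d = LocalDate.parse(s, DateTimeFormatter.ISO_LOCAL_DATE);
            return (int) d.toEpochDay();
        }
    }
}
```

The time/timestamp variants in the list would follow the same try-number-then-parse-ISO shape, targeting millis or micros instead of days.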
Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]
hudi-bot commented on PR #11253: URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118228216

## CI report:

* b035079e68c0392ec6061b31dcbba85f238bc66a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23988)
Re: [PR] [HUDI-7761] Make the ManifestWriter Extendable [hudi]
hudi-bot commented on PR #11253: URL: https://github.com/apache/hudi/pull/11253#issuecomment-2118219071

## CI report:

* b035079e68c0392ec6061b31dcbba85f238bc66a UNKNOWN
[I] [SUPPORT] Fails to create a `_ro` table hive when writing table [hudi]
shubhamn21 opened a new issue, #11254: URL: https://github.com/apache/hudi/issues/11254

**Describe the problem you faced**

Unable to write a Hudi table on an AWS EMR Hadoop setup. From the error, it seems to fail while creating the `_ro` (read-optimized) table in Hive/Glue. Am I missing a Hive setting to allow it to create Null-type tables? Are there alternative solutions?

**To Reproduce**

Steps to reproduce the behavior:
1. While writing data:
```
df.write.format("hudi") \
    .mode('append') \
    .options(**options) \
    .partitionBy("kafka_topic", "event_dt") \
    .saveAsTable('db_name.snimbalkar_test_table')
```

**Expected behavior**

Creates and stores the table.

**Environment Description**

* Hudi version : 0.13.1
* Spark version : 3.30
* Hive version :
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : EMRFS
* Running on Docker? (yes/no) : no

**Stacktrace**

```
Py4JJavaError: An error occurred while calling o1204.saveAsTable.
: org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:61)
	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:888)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:886)
	at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:826)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:322)
	at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:107)
	at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:106)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:103)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:114)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$7(SQLExecution.scala:139)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:224)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:139)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:245)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:138)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:100)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:96)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:615)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:177)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:615)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala
```
[jira] [Created] (HUDI-7773) Allow Users to extend S3/GCS HoodieIncrSource to bring in additional columns from upstream
Balaji Varadarajan created HUDI-7773:

Summary: Allow Users to extend S3/GCS HoodieIncrSource to bring in additional columns from upstream
Key: HUDI-7773
URL: https://issues.apache.org/jira/browse/HUDI-7773
Project: Apache Hudi
Issue Type: Improvement
Components: deltastreamer
Reporter: Balaji Varadarajan
Assignee: Balaji Varadarajan

The current S3/GCS HoodieIncrSource reads file paths from upstream tables and ingests them into downstream tables. We need the ability to extend this functionality by joining additional columns from the upstream table before writing to the downstream table.
[PR] [HUDI-7761] Changes to make Manifest Writer extendable [hudi]
csivaguru opened a new pull request, #11253: URL: https://github.com/apache/hudi/pull/11253

### Change Logs

- Change the visibility of the private constructor to make it possible to extend and plug in custom manifest writer classes.
- Make the fetchLatestFilesForAllPartitions method in ManifestWriter non-static to avoid sharing multiple local variables with the inheriting class.
- Change the visibility of BigQuerySchemaResolver so that it can be instantiated outside the hudi repository.

### Impact

None

### Risk level (write none, low medium or high below)

low

### Documentation Update

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [x] CI passed
[jira] [Updated] (HUDI-7761) Make the manifest Writer Extendable
[ https://issues.apache.org/jira/browse/HUDI-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7761:

Labels: pull-request-available (was: )

> Make the manifest Writer Extendable
> Key: HUDI-7761
> URL: https://issues.apache.org/jira/browse/HUDI-7761
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Sivaguru Kannan
> Priority: Major
> Labels: pull-request-available
>
> * Make the manifest writer extendable such that clients can plug in a custom
> instance of manifest writer for their syncs
[I] [SUPPORT] What Class Name to use for hoodie.errortable.write.class [hudi]
soumilshah1995 opened a new issue, #11252: URL: https://github.com/apache/hudi/issues/11252

I'm trying out Hudi error tables, but I'm having trouble finding documentation for the `hoodie.errortable.write.class` value. Could you please assist me?

# sample config
```
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.datasource.write.recordkey.field=invoiceid
hoodie.datasource.write.partitionpath.field=destinationstate
hoodie.streamer.source.dfs.root=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/sampledata/
hoodie.datasource.write.precombine.field=replicadmstimestamp
hoodie.streamer.transformer.sql=SELECT * FROM a where sas
hoodie.errortable.base.path=file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/error/
hoodie.errortable.target.table.name=error_invoice
hoodie.errortable.enable=true
hoodie.errortable.write.class=
```

# Job
```
spark-submit \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 \
  --properties-file spark-config.properties \
  --master 'local[*]' \
  --executor-memory 1g \
  /Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
  --source-ordering-field replicadmstimestamp \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --target-base-path file:///Users/soumilshah/IdeaProjects/SparkProject/apache-hudi-delta-streamer-labs/E1/silver/ \
  --target-table invoice \
  --props hudi_tbl.props
```

I want to purposely fail the job so I can see the error table being created.
[jira] [Updated] (HUDI-7769) Fix Hudi CDC read with legacy parquet file format on Spark
[ https://issues.apache.org/jira/browse/HUDI-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7769:

Summary: Fix Hudi CDC read with legacy parquet file format on Spark (was: Fix Hudi CDC read on Spark 3.3.4 and 3.4.3)

> Fix Hudi CDC read with legacy parquet file format on Spark
> Key: HUDI-7769
> URL: https://issues.apache.org/jira/browse/HUDI-7769
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HUDI-7769) Fix Hudi CDC read with legacy parquet file format on Spark
[ https://issues.apache.org/jira/browse/HUDI-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7769:

Fix Version/s: 0.15.0
               1.0.0

> Fix Hudi CDC read with legacy parquet file format on Spark
> Key: HUDI-7769
> URL: https://issues.apache.org/jira/browse/HUDI-7769
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
(hudi) branch branch-0.x updated: [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store (#11247)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch branch-0.x in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/branch-0.x by this push:
     new e0cf1ce147a [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store (#11247)

e0cf1ce147a is described below

commit e0cf1ce147a52feba7db766ca73e7221d2be616b
Author: Aditya Goenka <63430370+ad1happy...@users.noreply.github.com>
AuthorDate: Fri May 17 21:18:08 2024 +0530

    [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store (#11247)
---
 .../src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index 853dd1ac97c..41657377753 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -994,7 +994,7 @@ class HoodieSparkSqlWriterInternal {
     properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, SPARK_VERSION)
     properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
     if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-      (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+      (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase(HiveSyncConfigHolder.HIVE_PASS.defaultValue()))){
       try {
         val passwd = ShimLoader.getHadoopShims.getPassword(spark.sparkContext.hadoopConfiguration, HiveConf.ConfVars.METASTOREPWD.varname)
         if (passwd != null && !passwd.isEmpty) {
Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]
yihua commented on PR #11247: URL: https://github.com/apache/hudi/pull/11247#issuecomment-2117886061

The CI failure is unrelated. Merging this one.
Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]
yihua merged PR #11247: URL: https://github.com/apache/hudi/pull/11247
(hudi) branch master updated: [MINOR] Added condition to check default value to fix extracting password from credential store (#11246)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new e4b56b090fd [MINOR] Added condition to check default value to fix extracting password from credential store (#11246)

e4b56b090fd is described below

commit e4b56b090fdcb76416c60bd7ddd4247f0955c152
Author: Aditya Goenka <63430370+ad1happy...@users.noreply.github.com>
AuthorDate: Fri May 17 21:17:07 2024 +0530

    [MINOR] Added condition to check default value to fix extracting password from credential store (#11246)
---
 .../src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index e852445283c..3c28b1a2e0a 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -878,7 +878,7 @@ class HoodieSparkSqlWriterInternal {
     properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, SPARK_VERSION)
     properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
     if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-      (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+      (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase(HiveSyncConfigHolder.HIVE_PASS.defaultValue()))){
       try {
         val passwd = ShimLoader.getHadoopShims.getPassword(spark.sparkContext.hadoopConfiguration, HiveConf.ConfVars.METASTOREPWD.varname)
         if (passwd != null && !passwd.isEmpty) {
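[Editor's sketch] The patched condition is a three-way "is the Hive password effectively unset" check: null, empty, or still the shipped default, in which case the credential store is consulted. A minimal stand-alone sketch of that logic — the class and method names here are illustrative, and `"hive"` being the default of `HiveSyncConfigHolder.HIVE_PASS` is taken from the review thread on PR #11246, where the hardcoded literal was replaced with `defaultValue()`:

```java
public class HivePassCheck {
    // Assumed default of HiveSyncConfigHolder.HIVE_PASS (per the PR review thread).
    static final String HIVE_PASS_DEFAULT = "hive";

    // Mirror of the patched condition: treat the configured password as "unset"
    // when it is null, empty, or still the shipped default, so the Hadoop
    // credential store is consulted instead.
    static boolean shouldReadFromCredentialStore(String configuredPass) {
        return configuredPass == null
            || configuredPass.isEmpty()
            || configuredPass.equalsIgnoreCase(HIVE_PASS_DEFAULT);
    }
}
```

Before this fix, a password left at the default would short-circuit the credential-store lookup; the added `equalsIgnoreCase(defaultValue())` clause closes that gap.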
Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]
yihua merged PR #11246: URL: https://github.com/apache/hudi/pull/11246
Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]
hudi-bot commented on PR #11246: URL: https://github.com/apache/hudi/pull/11246#issuecomment-2117792852

## CI report:

* f965f6a09d5e3d70693061314b035bd93dec687b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23985)
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
hudi-bot commented on PR #10191: URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117790278

## CI report:

* e3223a6ef0dd865dcbd672cca9f5fb979f80ddc5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23984)
Re: [I] Intermittent stall of S3 PUT request for about 17 minutes [hudi]
hgudladona commented on issue #11203: URL: https://github.com/apache/hudi/issues/11203#issuecomment-2117663830

We are mostly certain this is not due to S3 throttling but to a bad socket state and its handling in JDK 11. If you look at the debug log, you will notice that the socket write fails and a retry succeeds. We are tuning some network settings on the container to fail fast in this situation and let the retry handle the failure. Will keep you posted.
Re: [I] [BUG] Spark3.3 overwrite partitioned mor table failed with hudi 0.14.1 [hudi]
ad1happy2go commented on issue #10831: URL: https://github.com/apache/hudi/issues/10831#issuecomment-2117629792

@Xuehai-Chen Are you good with this? Please let us know in case you still face errors.
Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]
ad1happy2go commented on code in PR #11246: URL: https://github.com/apache/hudi/pull/11246#discussion_r1604994780

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:

```scala
@@ -887,7 +887,7 @@ class HoodieSparkSqlWriterInternal {
     properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, SPARK_VERSION)
     properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
     if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-      (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+      (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase("hive"))){
```

Review Comment: Fixed
Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]
hudi-bot commented on PR #11247: URL: https://github.com/apache/hudi/pull/11247#issuecomment-2117580619

## CI report:

* c25bdceefc761b15f50eec65b47e941e3b676916 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23973)
* 8d4842e47fabc05a0e9ebf63d311bfcb386ed9cf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23986)
Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]
hudi-bot commented on PR #11246: URL: https://github.com/apache/hudi/pull/11246#issuecomment-2117580492

## CI report:

* 2b979fee4a605e06c01a3a80eab2ae4aa2f4f599 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23972)
* f965f6a09d5e3d70693061314b035bd93dec687b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23985)
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
hudi-bot commented on PR #10191: URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117577092

## CI report:

* ef29826c5973ac624100b38717c685d3a1059fe2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23976)
* e3223a6ef0dd865dcbd672cca9f5fb979f80ddc5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23984)
Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]
hudi-bot commented on PR #11247: URL: https://github.com/apache/hudi/pull/11247#issuecomment-2117563448

## CI report:

* c25bdceefc761b15f50eec65b47e941e3b676916 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23973)
* 8d4842e47fabc05a0e9ebf63d311bfcb386ed9cf UNKNOWN
Re: [PR] [MINOR] Added condition to check default value to fix extracting password from credential store [hudi]
hudi-bot commented on PR #11246: URL: https://github.com/apache/hudi/pull/11246#issuecomment-2117563341

## CI report:

* 2b979fee4a605e06c01a3a80eab2ae4aa2f4f599 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23972)
* f965f6a09d5e3d70693061314b035bd93dec687b UNKNOWN
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
hudi-bot commented on PR #10191: URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117560864

## CI report:

* ef29826c5973ac624100b38717c685d3a1059fe2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23976)
* e3223a6ef0dd865dcbd672cca9f5fb979f80ddc5 UNKNOWN
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
KnightChess commented on code in PR #10191: URL: https://github.com/apache/hudi/pull/10191#discussion_r1604930648

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BucketIndexSupport.scala:

@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.generic.GenericData
+import org.apache.hadoop.fs.FileStatus
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.table.HoodieTableConfig
+import org.apache.hudi.config.HoodieIndexConfig
+import org.apache.hudi.index.HoodieIndex
+import org.apache.hudi.index.HoodieIndex.IndexType
+import org.apache.hudi.index.bucket.BucketIdentifier
+import org.apache.hudi.keygen.KeyGenerator
+import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory
+import org.apache.spark.sql.catalyst.expressions
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, EmptyRow, Expression, Literal}
+import org.apache.hudi.common.util.collection.Pair
+import org.apache.spark.sql.types.{DoubleType, FloatType, StructType}
+import org.apache.spark.util.collection.BitSet
+import org.slf4j.LoggerFactory
+
+import scala.collection.{JavaConverters, mutable}
+
+class BucketIndexSupport(metadataConfig: HoodieMetadataConfig, schema: StructType) {
+
+  private val log = LoggerFactory.getLogger(getClass)
+
+  private val keyGenerator =
+    HoodieSparkKeyGeneratorFactory.createKeyGenerator(metadataConfig.getProps)
+
+  private lazy val avroSchema = AvroConversionUtils.convertStructTypeToAvroSchema(schema, "record", "")

Review Comment: good catch, fix it
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
KnightChess commented on PR #10191: URL: https://github.com/apache/hudi/pull/10191#issuecomment-211750 @danny0405 Yes, it is ready for review.
Re: [PR] [MINOR] [BRANCH-0.x] Added condition to check default value to fix extracting password from credential store [hudi]
ad1happy2go commented on code in PR #11247: URL: https://github.com/apache/hudi/pull/11247#discussion_r1604925546

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:

@@ -1003,7 +1003,7 @@
     properties.put(HoodieSyncConfig.META_SYNC_SPARK_VERSION.key, SPARK_VERSION)
     properties.put(HoodieSyncConfig.META_SYNC_USE_FILE_LISTING_FROM_METADATA.key, hoodieConfig.getBoolean(HoodieMetadataConfig.ENABLE))
     if ((fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname) == null || fs.getConf.get(HiveConf.ConfVars.METASTOREPWD.varname).isEmpty) &&
-        (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty)){
+        (properties.get(HiveSyncConfigHolder.HIVE_PASS.key()) == null || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.isEmpty || properties.get(HiveSyncConfigHolder.HIVE_PASS.key()).toString.equalsIgnoreCase("hive"))){

Review Comment: fixed
Re: [I] [SUPPORT] RLI index slowing down [hudi]
manishgaurav84 commented on issue #11243: URL: https://github.com/apache/hudi/issues/11243#issuecomment-2117462211 @ad1happy2go I have provided the logs in a Slack message.
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
hudi-bot commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117439453 ## CI report: * 3cef36f9284541a6cad8974b2e2e9984673c6627 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23983)
Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]
jai20242 commented on issue #11249: URL: https://github.com/apache/hudi/issues/11249#issuecomment-2117392043

I tried adding the configuration using compaction schedule and compaction run, but it didn't work.

hudi->connect --path /tmp/dep_hudi2
2024-05-17 13:25:30.737 INFO 21882 --- [ main] o.a.h.c.t.HoodieTableMetaClient : Loading HoodieTableMetaClient from /tmp/dep_hudi2
2024-05-17 13:25:30.906 WARN 21882 --- [ main] o.a.h.u.NativeCodeLoader : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-17 13:25:31.243 WARN 21882 --- [ main] o.a.h.f.FileSystem : Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem could not be instantiated
2024-05-17 13:25:31.243 WARN 21882 --- [ main] o.a.h.f.FileSystem : java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkState(ZLjava/lang/String;J)V
2024-05-17 13:25:31.442 INFO 21882 --- [ main] o.a.h.c.t.HoodieTableConfig : Loading table properties from /tmp/dep_hudi2/.hoodie/hoodie.properties
2024-05-17 13:25:31.457 INFO 21882 --- [ main] o.a.h.c.t.HoodieTableMetaClient : Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /tmp/dep_hudi2
Metadata for table prueba loaded

hudi:prueba->compaction schedule --hoodieConfigs "hoodie.compact.inline.max.delta.commits=1"
Attempted to schedule compaction for 20240517132533480

hudi:prueba->compaction run --tableName prueba
2024-05-17 13:25:40.853 INFO 21882 --- [ main] o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : Option{val=[20240517082810322__deltacommit__COMPLETED__20240517083132000]}
NO PENDING COMPACTION TO RUN

hudi:prueba->compactions show all
╔═════════════════════════╤═══════╤═══════════════════════════════╗
║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
╠═════════════════════════╧═══════╧═══════════════════════════════╣
║ (empty)                                                         ║
╚═════════════════════════════════════════════════════════════════╝

hudi:prueba->compaction run --tableName prueba --hoodieConfigs "hoodie.compact.inline.max.delta.commits=1"
2024-05-17 13:26:17.293 INFO 21882 --- [ main] o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : Option{val=[20240517082810322__deltacommit__COMPLETED__20240517083132000]}
NO PENDING COMPACTION TO RUN

hudi:prueba->compaction run --tableName prueba --hoodieConfigs "hoodie.compact.inline.max.delta.commits=1"
2024-05-17 13:26:30.318 INFO 21882 --- [ main] o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : Option{val=[20240517082810322__deltacommit__COMPLETED__20240517083132000]}
NO PENDING COMPACTION TO RUN

hudi:prueba->compactions show all
╔═════════════════════════╤═══════╤═══════════════════════════════╗
║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
╠═════════════════════════╧═══════╧═══════════════════════════════╣
║ (empty)                                                         ║
╚═════════════════════════════════════════════════════════════════╝
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
hudi-bot commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117319785 ## CI report: * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980) * 3cef36f9284541a6cad8974b2e2e9984673c6627 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23983)
Re: [I] [SUPPORT] RLI index slowing down [hudi]
ad1happy2go commented on issue #11243: URL: https://github.com/apache/hudi/issues/11243#issuecomment-2117254871 @manishgaurav84 I'm not sure why I couldn't download the event logs. Can you ping me on Slack and share them there as well?
Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]
ad1happy2go commented on issue #11249: URL: https://github.com/apache/hudi/issues/11249#issuecomment-2117253103 @jai20242 That is a writer configuration; Hudi does not persist it. When you run compaction from the CLI, you need to pass it there too.
Re: [I] [SUPPORT] Hudi fails ACID verification test [hudi]
ad1happy2go commented on issue #11170: URL: https://github.com/apache/hudi/issues/11170#issuecomment-2117238692 Thanks @matthijseikelenboom for the update
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
hudi-bot commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117213033 ## CI report: * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980) * 3cef36f9284541a6cad8974b2e2e9984673c6627 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23983)
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
hudi-bot commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117194321 ## CI report: * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980) * 3cef36f9284541a6cad8974b2e2e9984673c6627 UNKNOWN
Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]
hudi-bot commented on PR #11031: URL: https://github.com/apache/hudi/pull/11031#issuecomment-2117193597 ## CI report: * e159472757b2475611e99dc4afd8fe2def6967f4 UNKNOWN * c4a9e9a0debe32518a84877c79c4831740b95caa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23979)
Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]
jai20242 commented on issue #11249: URL: https://github.com/apache/hudi/issues/11249#issuecomment-2117157686 I set the param `hoodie.compact.inline.max.delta.commits` to 1 (you can see it in the first comment).
Re: [I] [SUPPORT] Exceptions with Partition TTL [hudi]
xicm closed issue #11223: [SUPPORT] Exceptions with Partition TTL URL: https://github.com/apache/hudi/issues/11223
[jira] [Closed] (HUDI-7652) Add new MergeKey API to support simple and composite keys
[ https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-7652.
-----------------------------
    Resolution: Done

> Add new MergeKey API to support simple and composite keys
> ---------------------------------------------------------
>
>                 Key: HUDI-7652
>                 URL: https://issues.apache.org/jira/browse/HUDI-7652
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Major
>              Labels: hudi-1.0.0-beta2, pull-request-available
>             Fix For: 1.0.0
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
(hudi) branch master updated: [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys (#11077)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new e0ca6dd0d52  [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys (#11077)

e0ca6dd0d52 is described below

commit e0ca6dd0d52c4171d5b4ee83cbc7ef684cc471dc
Author: Sagar Sumit
AuthorDate: Fri May 17 14:57:29 2024 +0530

    [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys (#11077)

    Introduce a new abstract class `BaseHoodieMergedLogRecordScanner` which subclasses from
    `AbstractHoodieLogRecordReader`. The new abstract class holds the `records` map as
    `ExternalSpillableMap` and exposes `public abstract Map getRecords()` API. The existing
    `HoodieMergedLogRecordScanner` now derives from the new abstract class (instead of
    `AbstractHoodieLogRecordReader`) and uses String keys.
---
 .../hudi/client/TestJavaHoodieBackedMetadata.java  | 2 +-
 .../functional/TestHoodieBackedMetadata.java       | 2 +-
 .../functional/TestHoodieBackedTableMetadata.java  | 2 +-
 .../common/model/HoodieMetadataRecordMerger.java   | 43
 .../hudi/common/model/HoodieRecordMerger.java      | 8 +
 .../table/log/AbstractHoodieLogRecordReader.java   | 25 ++
 .../log/BaseHoodieMergedLogRecordScanner.java      | 260
 .../table/log/HoodieMergedLogRecordScanner.java    | 265 +++-
 .../log/HoodieMetadataMergedLogRecordScanner.java  | 270 +
 .../hudi/metadata/HoodieBackedTableMetadata.java   | 2 +-
 .../metadata/HoodieMetadataLogRecordReader.java    | 41 +++-
 .../hudi/metadata/HoodieTableMetadataUtil.java     | 3 +-
 .../model/TestHoodieMetadataRecordMerger.java      | 65 +
 .../apache/hudi/functional/TestMORDataSource.scala | 2 +-
 14 files changed, 740 insertions(+), 250 deletions(-)

diff --git a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
index ae17a34da3e..c2b85bd70b0 100644
--- a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
+++ b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java
@@ -946,7 +946,7 @@ public class TestJavaHoodieBackedMetadata extends TestHoodieMetadataBase {
     if (enableMetaFields) {
       schema = HoodieAvroUtils.addMetadataFields(schema);
     }
-    HoodieMetadataLogRecordReader logRecordReader = HoodieMetadataLogRecordReader.newBuilder()
+    HoodieMetadataLogRecordReader logRecordReader = HoodieMetadataLogRecordReader.newBuilder(FILES.getPartitionPath())
         .withStorage(metadataMetaClient.getStorage())
         .withBasePath(metadataMetaClient.getBasePath())
         .withLogFilePaths(logFilePaths)
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
index 700b9f1cd24..4da78d84980 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
@@ -1413,7 +1413,7 @@ public class TestHoodieBackedMetadata extends TestHoodieMetadataBase {
     if (enableMetaFields) {
       schema = HoodieAvroUtils.addMetadataFields(schema);
     }
-    HoodieMetadataLogRecordReader logRecordReader = HoodieMetadataLogRecordReader.newBuilder()
+    HoodieMetadataLogRecordReader logRecordReader = HoodieMetadataLogRecordReader.newBuilder(FILES.getPartitionPath())
         .withStorage(metadataMetaClient.getStorage())
         .withBasePath(metadataMetaClient.getBasePath())
         .withLogFilePaths(logFilePaths)
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
index d3dcf94f641..cec201ee754 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedTableMetadata.java
@@ -491,7 +491,7 @@ public class TestHoodieBackedTableMetadata extends TestHoodieMetadataBase {
   */
   private void verifyMetadataMergedRecords(HoodieTableMetaClient metadataMetaClient, List logFilePaths, String latestCommitTimestamp) {
     Schema schema = HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-    HoodieMetadataLogRecordReader logRecordReader = Hoo
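The refactor described in the commit message — a shared abstract base that owns the merged-records map and exposes `getRecords()`, with concrete scanners choosing their key type — is a standard template-method split. The following is a minimal, hypothetical Python sketch of that structure only; the class and field names below are illustrative stand-ins, not Hudi's actual classes or APIs (Hudi's base class uses an `ExternalSpillableMap` and far richer merge logic):

```python
# Hypothetical sketch of the class-hierarchy refactor described above:
# a shared abstract base owns the merged-records map and exposes
# get_records(); concrete scanners decide how the merge key is derived.
from abc import ABC, abstractmethod


class BaseMergedLogRecordScanner(ABC):
    def __init__(self):
        # Plain dict standing in for Hudi's spillable map of merged records.
        self.records = {}

    @abstractmethod
    def merge_key(self, record):
        """Derive the map key (simple or composite) for a record."""

    def process(self, record):
        # Later records with the same merge key overwrite earlier ones,
        # mimicking last-writer-wins merging for this sketch.
        self.records[self.merge_key(record)] = record

    def get_records(self):
        return self.records


class StringKeyScanner(BaseMergedLogRecordScanner):
    # Analogous to a scanner that keys records by a simple String key.
    def merge_key(self, record):
        return str(record["record_key"])


class CompositeKeyScanner(BaseMergedLogRecordScanner):
    # Analogous to a scanner using a composite (partition, key) merge key.
    def merge_key(self, record):
        return (record["partition"], record["record_key"])
```

Subclasses only override key derivation; the merging container and accessor live once in the base, which is the point of the refactor.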
Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]
codope merged PR #11077: URL: https://github.com/apache/hudi/pull/11077
Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]
zhuanshenbsj1 commented on code in PR #11031: URL: https://github.com/apache/hudi/pull/11031#discussion_r1604626804

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/SanityChecks.java:

@@ -41,23 +42,22 @@
 /**
  * Utilities for HoodieTableFactory sanity check.
  */
-public class SanityChecksUtil {
+public class SanityChecks {

-  private static final Logger LOG = LoggerFactory.getLogger(SanityChecksUtil.class);
+  private static final Logger LOG = LoggerFactory.getLogger(SanityChecks.class);

   /**
    * The sanity check.
-   * If the metaClient is not null, it means that this is a table source sanity check and the source table has
-   * already been initialized.
    *
-   * @param conf       The table options
-   * @param schema     The table schema
-   * @param metaClient The table meta client
+   * @param conf          The table options
+   * @param schema        The table schema
+   * @param checkMetaData Whether to check metadata
    */
-  public static void sanitCheck(Configuration conf, ResolvedSchema schema, HoodieTableMetaClient metaClient) {
+  public static void sanitCheck(Configuration conf, ResolvedSchema schema, Boolean checkMetaData) {
     checkTableType(conf);
     List schemaFields = schema.getColumnNames();
-    if (metaClient != null) {
+    if (checkMetaData) {
+      HoodieTableMetaClient metaClient = StreamerUtil.metaClientForReader(conf, HadoopConfigurations.getHadoopConf(conf));
       List latestTablefields = StreamerUtil.getLatestTableFields(metaClient);
       if (latestTablefields != null) {

Review Comment: I put this logic into the function checkRecordKey, and both the source and the sink need to be checked.

```
public static void checkRecordKey(Configuration conf, List existingFields) {
  if (OptionsResolver.isAppendMode(conf)) {
    return;
  }
}
```

And also do this in the function checkIndexType.

```
public static void checkIndexType(Configuration conf) {
  if (OptionsResolver.isAppendMode(conf)) {
    return;
  }
}
```
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
hudi-bot commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117105740 ## CI report: * d92e58eeaecc8b8835b317b269386fa715ca92e7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23980)
Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]
danny0405 commented on PR #10191: URL: https://github.com/apache/hudi/pull/10191#issuecomment-2117101176 @KnightChess Is this patch ready for review again?
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
danny0405 commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117099130 Looks reasonable, cc @nsivabalan for another look.
Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]
danny0405 commented on code in PR #11031: URL: https://github.com/apache/hudi/pull/11031#discussion_r1604603189

## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/SanityChecks.java:

@@ -41,23 +42,22 @@
 /**
  * Utilities for HoodieTableFactory sanity check.
  */
-public class SanityChecksUtil {
+public class SanityChecks {

-  private static final Logger LOG = LoggerFactory.getLogger(SanityChecksUtil.class);
+  private static final Logger LOG = LoggerFactory.getLogger(SanityChecks.class);

   /**
    * The sanity check.
-   * If the metaClient is not null, it means that this is a table source sanity check and the source table has
-   * already been initialized.
    *
-   * @param conf       The table options
-   * @param schema     The table schema
-   * @param metaClient The table meta client
+   * @param conf          The table options
+   * @param schema        The table schema
+   * @param checkMetaData Whether to check metadata
    */
-  public static void sanitCheck(Configuration conf, ResolvedSchema schema, HoodieTableMetaClient metaClient) {
+  public static void sanitCheck(Configuration conf, ResolvedSchema schema, Boolean checkMetaData) {
     checkTableType(conf);
     List schemaFields = schema.getColumnNames();
-    if (metaClient != null) {
+    if (checkMetaData) {
+      HoodieTableMetaClient metaClient = StreamerUtil.metaClientForReader(conf, HadoopConfigurations.getHadoopConf(conf));
       List latestTablefields = StreamerUtil.getLatestTableFields(metaClient);
       if (latestTablefields != null) {

Review Comment: The logic for sink has been changed with this patch. We have this code for the original sink:

```java
if (!OptionsResolver.isAppendMode(conf)) {
  checkRecordKey(conf, schema);
}
```
Re: [PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
hudi-bot commented on PR #11251: URL: https://github.com/apache/hudi/pull/11251#issuecomment-2117092824 ## CI report: * d92e58eeaecc8b8835b317b269386fa715ca92e7 UNKNOWN
(hudi) branch master updated (7fc5adad7aa -> d93e4eb9d70)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 7fc5adad7aa [HUDI-7717] Disable row writer for bulk insert if combining before insert is set (#11216)
     add d93e4eb9d70 [MINOR] Remove legacy code and add try catch to listStatus of partition (#11250)

No new revisions were added by this update.

Summary of changes:
 .../hudi/common/table/view/AbstractTableFileSystemView.java | 4
 .../apache/hudi/metadata/FileSystemBackedTableMetadata.java | 11 +--
 2 files changed, 9 insertions(+), 6 deletions(-)
Re: [PR] [MINOR] Remove legacy code and add try catch to listStatus of partition. [hudi]
danny0405 merged PR #11250: URL: https://github.com/apache/hudi/pull/11250
Re: [PR] [HUDI-7772] HoodieTimelineArchiver##getCommitInstantsToArchive need skip limiting archiving of instants [hudi]
danny0405 commented on code in PR #11245: URL: https://github.com/apache/hudi/pull/11245#discussion_r1604598008

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:

@@ -217,6 +217,10 @@ private List getCommitInstantsToArchive() throws IOException {
         earliestInstantToRetainCandidates.add(
             completedCommitsTimeline.findInstantsModifiedAfterByCompletionTime(latestCompactionTime.get()).firstInstant());
       }
+    } catch (UnsupportedOperationException unsupportedOperationException) {
+      // If tableMetadata is FileSystemBackedTableMetadata would throw UnsupportedOperationException, should skip it to
+      // confirm next operation success
+      LOG.warn("tableMetadata is FileSystemBackedTableMetadata and skip limiting archiving of instants.");

Review Comment: The `isMetadataTableAvailable` checks the `hoodie.properties` for `FILES` metadata partition instead; it should be empty if the MDT does not initialize successfully.
Re: [I] [SUPPORT] Hudi fails ACID verification test [hudi]
matthijseikelenboom closed issue #11170: [SUPPORT] Hudi fails ACID verification test URL: https://github.com/apache/hudi/issues/11170
Re: [I] [SUPPORT] Hudi fails ACID verification test [hudi]
matthijseikelenboom commented on issue #11170: URL: https://github.com/apache/hudi/issues/11170#issuecomment-2117016086

Tested and verified. Closing issue.

More info: the solution has been tested on:
- Java 8 ✅
- Java 11 ✅
- Java 17 ❌ (As of this moment, Hudi doesn't support this version)
[jira] [Updated] (HUDI-5505) Compaction NUM_COMMITS policy should only judge completed deltacommit
[ https://issues.apache.org/jira/browse/HUDI-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5505: Labels: pull-request-available (was: )

> Compaction NUM_COMMITS policy should only judge completed deltacommit
> Key: HUDI-5505
> URL: https://issues.apache.org/jira/browse/HUDI-5505
> Project: Apache Hudi
> Issue Type: Bug
> Components: compaction, table-service
> Reporter: HunterXHunter
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2023-01-05-13-10-57-918.png
>
> With `compaction.delta_commits = 1` and the timeline
> {code:java}
> 20230105115229301.deltacommit
> 20230105115229301.deltacommit.inflight
> 20230105115229301.deltacommit.requested
> 20230105115253118.commit
> 20230105115253118.compaction.inflight
> 20230105115253118.compaction.requested
> 20230105115330994.deltacommit.inflight
> 20230105115330994.deltacommit.requested{code}
> `ScheduleCompactionActionExecutor.needCompact` returns `true`, which is not the expected behavior.
>
> In OCC or lazy-clean mode, this causes compaction to trigger early. With `compaction.delta_commits = 3` and the timeline
> {code:java}
> 20230105125650541.deltacommit.inflight
> 20230105125650541.deltacommit.requested
> 20230105125715081.deltacommit
> 20230105125715081.deltacommit.inflight
> 20230105125715081.deltacommit.requested
> 20230105130018070.deltacommit.inflight
> 20230105130018070.deltacommit.requested{code}
> compaction will still be triggered, which is also not expected.
> !image-2023-01-05-13-10-57-918.png|width=699,height=158!

-- This message was sent by Atlassian Jira (v8.20.10#820010)
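The expected counting rule can be sketched directly from the timelines above: only the plain `.deltacommit` file marks a completed delta commit, while `.inflight` and `.requested` suffixes mark pending states. This is a simplified, hypothetical stand-in for the check in `ScheduleCompactionActionExecutor`, not Hudi's real implementation.

```java
import java.util.List;

public class DeltaCommitCounter {

  // Count only completed delta commits: in the timeline layout a completed
  // deltacommit is the plain ".deltacommit" file; ".deltacommit.inflight" and
  // ".deltacommit.requested" are pending states and must not be counted.
  static long completedDeltaCommits(List<String> timelineFiles) {
    return timelineFiles.stream()
        .filter(f -> f.endsWith(".deltacommit"))
        .count();
  }

  // NUM_COMMITS policy: schedule compaction once enough completed
  // delta commits have accumulated.
  static boolean needCompact(List<String> timelineFiles, int deltaCommitsThreshold) {
    return completedDeltaCommits(timelineFiles) >= deltaCommitsThreshold;
  }
}
```

On the second timeline in the ticket, only `20230105125715081.deltacommit` is complete, so with `compaction.delta_commits = 3` no compaction should be scheduled.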
[PR] [HUDI-5505] Fix counting of delta commits since last compaction in Sc… [hudi]
a-erofeev opened a new pull request, #11251: URL: https://github.com/apache/hudi/pull/11251

…heduleCompactionActionExecutor.getLatestDeltaCommitInfo

### Change Logs

Fixed the incorrect calculation of the number of delta commits when determining whether to schedule compaction.

### Impact

None

### Risk level (write none, low medium or high below)

None

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]
hudi-bot commented on PR #11031: URL: https://github.com/apache/hudi/pull/11031#issuecomment-2116997198 ## CI report: * e159472757b2475611e99dc4afd8fe2def6967f4 UNKNOWN * 30f50eb580ec3dec52ca87eab5a39ce027910344 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23971) * c4a9e9a0debe32518a84877c79c4831740b95caa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23979) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7622] Optimize HoodieTableSource's sanity check [hudi]
hudi-bot commented on PR #11031: URL: https://github.com/apache/hudi/pull/11031#issuecomment-2116984096 ## CI report: * e159472757b2475611e99dc4afd8fe2def6967f4 UNKNOWN * 30f50eb580ec3dec52ca87eab5a39ce027910344 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23971) * c4a9e9a0debe32518a84877c79c4831740b95caa UNKNOWN
Re: [I] Intermittent stall of S3 PUT request for about 17 minutes [hudi]
ad1happy2go commented on issue #11203: URL: https://github.com/apache/hudi/issues/11203#issuecomment-2116965831 @gudladona It looks like S3 throttling is happening. Did you check whether you have a lot of small file groups in your data?
Re: [I] [SUPPORT] After upgrading hudi version 0.9.0 -> 0.13.1, it is slower and had memory issues. [hudi]
codope closed issue #11241: [SUPPORT] After upgrading hudi version 0.9.0 -> 0.13.1, it is slower and had memory issues. URL: https://github.com/apache/hudi/issues/11241
Re: [I] [SUPPORT] Run Merge On Read Compactions [hudi]
ad1happy2go commented on issue #11249: URL: https://github.com/apache/hudi/issues/11249#issuecomment-2116948074 @jai20242 If you have only 2 delta commits, there will be nothing to compact, since the default of [`hoodie.compact.inline.max.delta.commits`](https://hudi.apache.org/docs/configurations/#hoodiecompactinlinemaxdeltacommits) is 5. Set this config to 1 if you want compaction after every delta commit.
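The advice above can be expressed as a writer-config fragment. This is an illustrative sketch, assuming inline compaction on a Merge-On-Read table; the surrounding write options (table name, record key, etc.) are omitted.

```properties
# With the default threshold of 5, two deltacommits trigger no compaction.
# Lowering it to 1 makes inline compaction eligible after every deltacommit.
hoodie.compact.inline=true
hoodie.compact.inline.max.delta.commits=1
```

Note that a threshold of 1 trades read amplification for much more frequent (and costly) compaction runs.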