Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156280208

## CI report:

* 04e8b0a67a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]
beyond1920 commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156280046

@danny0405 Thanks for your attention. I checked [#11343](https://github.com/apache/hudi/pull/11343); it does not cover the current issue. The issue should be fixed in `HoodieTable#deleteInvalidFilesByPartitions` so that it does not fail to delete the invalid files, while [#11343](https://github.com/apache/hudi/pull/11343) aims to fix the clean service.
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156278203

## CI report:

* 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24307)
* 04e8b0a67a UNKNOWN
(hudi) branch master updated: [HUDI-6787] Implement the HoodieFileGroupReader API for Hive (#10422)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 0abc00df841 [HUDI-6787] Implement the HoodieFileGroupReader API for Hive (#10422)

0abc00df841 is described below

commit 0abc00df8412c5ea3d15ab50d5074d8e8bccebcb
Author: Jon Vexler
AuthorDate: Sat Jun 8 22:22:46 2024 -0400

    [HUDI-6787] Implement the HoodieFileGroupReader API for Hive (#10422)
---
 .../hudi/client/TestPartitionTTLManagement.java | 2 +-
 .../hudi/table/TestHoodieMergeOnReadTable.java | 2 +-
 .../TestHoodieSparkMergeOnReadTableCompaction.java | 80 +++---
 .../hudi/common/engine/HoodieReaderContext.java | 29 ++-
 .../org/apache/hudi/common/model/HoodieRecord.java | 2 +-
 .../org/apache/hudi/hadoop/fs/HadoopFSUtils.java | 15 +-
 .../hudi/hadoop/HiveHoodieReaderContext.java | 273
 .../HoodieFileGroupReaderBasedRecordReader.java | 281 +
 .../org/apache/hudi/hadoop/HoodieHiveRecord.java | 221
 .../apache/hudi/hadoop/HoodieHiveRecordMerger.java | 71 ++
 .../hudi/hadoop/HoodieParquetInputFormat.java | 48 +++-
 .../hudi/hadoop/RecordReaderValueIterator.java | 13 +-
 .../HoodieCombineRealtimeRecordReader.java | 51 +++-
 .../realtime/HoodieParquetRealtimeInputFormat.java | 15 +-
 .../hadoop/utils/HoodieArrayWritableAvroUtils.java | 110
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java | 36 +++
 .../hudi/hadoop/utils/ObjectInspectorCache.java | 103
 .../hudi/hadoop/TestHoodieParquetInputFormat.java | 122 -
 .../hive/TestHoodieCombineHiveInputFormat.java | 14 +-
 .../TestHoodieMergeOnReadSnapshotReader.java | 2 +
 .../realtime/TestHoodieRealtimeRecordReader.java | 2 +
 .../utils/TestHoodieArrayWritableAvroUtils.java | 88 +++
 .../org/apache/hudi/functional/TestBootstrap.java | 1 +
 .../functional/TestHiveTableSchemaEvolution.java | 2 +
 .../TestSparkConsistentBucketClustering.java | 2 +-
 .../streamer/TestHoodieStreamerUtils.java | 13 +-
 26 files changed, 1470 insertions(+), 128 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
index cda76154ca6..f4e9d206f06 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
@@ -182,7 +182,7 @@ public class TestPartitionTTLManagement extends HoodieClientTestBase {
   private List readRecords(String[] partitions) {
     return HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(storageConf,
         Arrays.stream(partitions).map(p -> Paths.get(basePath, p).toString()).collect(Collectors.toList()),
-        basePath, new JobConf(storageConf.unwrap()), true, false);
+        basePath, new JobConf(storageConf.unwrap()), true, true);
   }
 }
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
index b0876d06103..ae81a310190 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
@@ -213,7 +213,7 @@ public class TestHoodieMergeOnReadTable extends SparkClientFunctionalTestHarness
         .map(baseFile -> new Path(baseFile.getPath()).getParent().toString())
         .collect(Collectors.toList());
     List recordsRead = HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(storageConf(), inputPaths,
-        basePath(), new JobConf(storageConf().unwrap()), true, false);
+        basePath(), new JobConf(storageConf().unwrap()), true, populateMetaFields);
     // Wrote 20 records in 2 batches
     assertEquals(40, recordsRead.size(), "Must contain 40 records");
   }
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
index e2ba56f94a3..ef28980d9cf 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
@@ -22,6 +22,7 @@ package org.apache.hudi.table.functional;
 import org.apache.hudi.client.SparkRDDWriteClient;
 import
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
yihua merged PR #10422:
URL: https://github.com/apache/hudi/pull/10422
Re: [I] [SUPPORT] hoodie.datasource.write.precombine.field is invalid [hudi]
yangZhengW commented on issue #11421:
URL: https://github.com/apache/hudi/issues/11421#issuecomment-2156274391

> did you try the `DefaultAvroPayload` ?

It works, thanks.
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
Zouxxyy commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156267038

> @Zouxxyy nice contribution, do you think we should update the site doc too?

Yeah, will update soon.
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156259327

## CI report:

* 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24307)
Re: [I] [SUPPORT] The clean service can't clean historical version files after the savepoint instant when i set `hoodie.archive.beyond.savepoint=true` [hudi]
danny0405 commented on issue #11405:
URL: https://github.com/apache/hudi/issues/11405#issuecomment-2156256748

@nsivabalan can you give some insights here?
[jira] [Closed] (HUDI-7390) [Regression] HoodieStreamer no longer works without --props being supplied
[ https://issues.apache.org/jira/browse/HUDI-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7390.
    Resolution: Fixed

Fixed via master branch: 9f9064761bac766cc7884027432568c06817ddd7

> [Regression] HoodieStreamer no longer works without --props being supplied
>
> Key: HUDI-7390
> URL: https://issues.apache.org/jira/browse/HUDI-7390
> Project: Apache Hudi
> Issue Type: Bug
> Components: deltastreamer
> Affects Versions: 1.0.0-beta1, 0.14.1
> Reporter: Brandon Dahler
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
> Attachments: spark.log
>
> h2. Problem
> When attempting to run HoodieStreamer without a props file, specifying all required extra configuration via {{--hoodie-conf}} parameters, the execution fails and an exception is thrown:
> {code:java}
> 24/02/06 22:15:13 INFO SparkContext: Successfully stopped SparkContext
> Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:/private/tmp/hudi-props-repro/src/test/resources/streamer-config/dfs-source.properties
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:166)
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:85)
>     at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:232)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:437)
>     at org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:656)
>     at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:632)
>     at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:525)
>     at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:498)
>     at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:404)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:850)
>     at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>     at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
>     at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.FileNotFoundException: File file:/private/tmp/hudi-props-repro/src/test/resources/streamer-config/dfs-source.properties does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
>     at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:161)
>     ... 25 more
> {code}
> h2.
[jira] [Updated] (HUDI-7390) [Regression] HoodieStreamer no longer works without --props being supplied
[ https://issues.apache.org/jira/browse/HUDI-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-7390:
    Fix Version/s: 0.16.0, 1.0.0 (was: 0.15.0)
[jira] [Updated] (HUDI-7390) [Regression] HoodieStreamer no longer works without --props being supplied
[ https://issues.apache.org/jira/browse/HUDI-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-7390:
    Status: Open (was: In Progress)

> [Regression] HoodieStreamer no longer works without --props being supplied
> [...]
> h2. Reproduction Steps
> 1. Setup clean spark install
(hudi) branch master updated: [HUDI-7390] HoodieStreamer no longer works without --props being supplied (#11414)
danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 9f9064761ba [HUDI-7390] HoodieStreamer no longer works without --props being supplied (#11414)

9f9064761ba is described below

commit 9f9064761bac766cc7884027432568c06817ddd7
Author: Vova Kolmakov
AuthorDate: Sun Jun 9 08:17:55 2024 +0700

    [HUDI-7390] HoodieStreamer no longer works without --props being supplied (#11414)

    Co-authored-by: Vova Kolmakov
---
 .../main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
index 1905cfe6f31..27db59ab7cd 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
@@ -446,7 +446,7 @@ public class HoodieStreamer implements Serializable {
   }
   public static TypedProperties getProps(Configuration conf, Config cfg) {
-    return cfg.propsFilePath.isEmpty()
+    return cfg.propsFilePath.isEmpty() || cfg.propsFilePath.equals(DEFAULT_DFS_SOURCE_PROPERTIES)
         ? buildProperties(cfg.configs)
         : readConfig(conf, new Path(cfg.propsFilePath), cfg.configs).getProps();
   }
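The one-line fix in `HoodieStreamer.getProps` treats a props path equal to `DEFAULT_DFS_SOURCE_PROPERTIES` the same as an empty one, so a run that never passes `--props` (and therefore still carries the default path) builds its properties from the `--hoodie-conf` entries instead of reading a file that, as the HUDI-7390 stack trace shows, may not exist. The sketch below models only that branch in plain Java. It is not Hudi code: `PropsResolutionSketch`, `usePropsFile`, and `buildProperties` are hypothetical names, the default path is a placeholder (the diff does not show the constant's value), and the `key=value` parsing only approximates how `--hoodie-conf` entries are handled.

```java
import java.util.List;
import java.util.Properties;

/** Hypothetical model of the props-source decision fixed in HUDI-7390; not Hudi's implementation. */
public class PropsResolutionSketch {

  // Placeholder for HoodieStreamer's DEFAULT_DFS_SOURCE_PROPERTIES; the real value is not shown in the diff.
  static final String DEFAULT_DFS_SOURCE_PROPERTIES =
      "file:/tmp/example/src/test/resources/streamer-config/dfs-source.properties";

  /** Mirrors the fixed condition: an empty or default path means "no props file was supplied". */
  static boolean usePropsFile(String propsFilePath) {
    return !(propsFilePath.isEmpty() || propsFilePath.equals(DEFAULT_DFS_SOURCE_PROPERTIES));
  }

  /** Hypothetical stand-in for building properties from --hoodie-conf style "key=value" entries. */
  static Properties buildProperties(List<String> configs) {
    Properties props = new Properties();
    for (String entry : configs) {
      int eq = entry.indexOf('=');
      if (eq > 0) { // skip entries that have no key
        props.setProperty(entry.substring(0, eq), entry.substring(eq + 1));
      }
    }
    return props;
  }

  public static void main(String[] args) {
    List<String> configs = List.of("hoodie.datasource.write.recordkey.field=id");
    // No --props given, so the path is still the default: build from the --hoodie-conf entries only.
    if (!usePropsFile(DEFAULT_DFS_SOURCE_PROPERTIES)) {
      Properties props = buildProperties(configs);
      System.out.println(props.getProperty("hoodie.datasource.write.recordkey.field")); // prints: id
    }
  }
}
```

Comparing against the default path, rather than changing the default itself, keeps the fix to a single condition; one consequence is that explicitly passing the default path as `--props` also skips reading that file.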
Re: [PR] [HUDI-7390] fix: HoodieStreamer no longer works without --props being supplied [hudi]
danny0405 merged PR #11414:
URL: https://github.com/apache/hudi/pull/11414
Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]
danny0405 commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156255421

You are right, we already got a fix recently: https://github.com/apache/hudi/pull/11343
[jira] [Updated] (HUDI-7845) Call show_fsview_latest Procedure support path_regex
[ https://issues.apache.org/jira/browse/HUDI-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-7845:
    Fix Version/s: 1.0.0

> Call show_fsview_latest Procedure support path_regex
>
> Key: HUDI-7845
> URL: https://issues.apache.org/jira/browse/HUDI-7845
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Xinyu Zou
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Closed] (HUDI-7845) Call show_fsview_latest Procedure support path_regex
[ https://issues.apache.org/jira/browse/HUDI-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7845.
    Resolution: Fixed

Fixed via master branch: 37564b4fd68777fd0b1f553237066a07060aa1d6
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
danny0405 commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156254529

@Zouxxyy nice contribution, do you think we should update the site doc too?
(hudi) branch master updated: [HUDI-7845] Call show_fsview_latest procedure support path_regex (#11418)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 37564b4fd68 [HUDI-7845] Call show_fsview_latest procedure support path_regex (#11418) 37564b4fd68 is described below commit 37564b4fd68777fd0b1f553237066a07060aa1d6 Author: Zouxxyy AuthorDate: Sun Jun 9 09:11:46 2024 +0800 [HUDI-7845] Call show_fsview_latest procedure support path_regex (#11418) --- .../table/view/AbstractTableFileSystemView.java| 13 +++ .../hudi/command/procedures/BaseProcedure.scala| 5 + .../procedures/ShowFileSystemViewProcedure.scala | 105 - .../sql/hudi/procedure/TestFsViewProcedure.scala | 86 - 4 files changed, 164 insertions(+), 45 deletions(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java b/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java index 550082b0aa1..90f48b660c3 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java @@ -672,6 +672,19 @@ public abstract class AbstractTableFileSystemView implements SyncableFileSystemV } } + public final List getPartitionNames() { +try { + readLock.lock(); + return fetchAllStoredFileGroups() + .filter(fg -> !isFileGroupReplaced(fg)) + .map(HoodieFileGroup::getPartitionPath) + .distinct() + .collect(Collectors.toList()); +} finally { + readLock.unlock(); +} + } + @Override public final Stream> getPendingLogCompactionOperations() { try { diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala index b0ffc0cb64e..777d1937c98 100644 --- 
a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala
@@ -76,6 +76,11 @@ abstract class BaseProcedure extends Procedure {
     }
   }
 
+  protected def isArgDefined(args: ProcedureArgs, parameter: ProcedureParameter): Boolean = {
+    val paramKey = getParamKey(parameter, args.isNamedArgs)
+    args.map.containsKey(paramKey)
+  }
+
   protected def getInternalRowValue(row: InternalRow, index: Int, dataType: DataType): Any = {
     dataType match {
       case StringType => row.getString(index)
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala
index c7d11f4c091..f19cd105c81 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala
@@ -22,17 +22,23 @@ import org.apache.hudi.common.model.{FileSlice, HoodieLogFile}
 import org.apache.hudi.common.table.timeline.{CompletionTimeQueryView, HoodieDefaultTimeline, HoodieInstant, HoodieTimeline}
 import org.apache.hudi.common.table.view.HoodieTableFileSystemView
 import org.apache.hudi.common.util
+import org.apache.hudi.exception.HoodieException
+import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.hudi.storage.StoragePath
 
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
 
 import java.util.function.{Function, Supplier}
-import java.util.stream.Collectors
+import java.util.stream.{Collectors, Stream => JStream}
+import java.util.{ArrayList => JArrayList, List => JList}
 
 import scala.collection.JavaConverters._
 
 class ShowFileSystemViewProcedure(showLatest: Boolean) extends BaseProcedure with ProcedureBuilder {
+
+  private val ALL_PARTITIONS = "ALL_PARTITIONS"
+
   private val PARAMETERS_ALL: Array[ProcedureParameter] = Array[ProcedureParameter](
     ProcedureParameter.required(0, "table", DataTypes.StringType),
     ProcedureParameter.optional(1, "max_instant", DataTypes.StringType, ""),
@@ -40,7 +46,7 @@ class ShowFileSystemViewProcedure(showLatest: Boolean) extends BaseProcedure wit
     ProcedureParameter.optional(3, "include_in_flight", DataTypes.BooleanType, false),
     ProcedureParameter.optional(4, "exclude_compaction", DataTypes.BooleanType, false),
     ProcedureParameter.optional(5, "limit",
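The `isArgDefined` helper added to `BaseProcedure` checks whether the caller actually supplied an argument, which is different from checking whether its value equals the default. The Java model below is a minimal sketch of that idea. `Param`, `Args` and `ArgLookup` are hypothetical stand-ins for Hudi's Scala `ProcedureParameter`/`ProcedureArgs`. The keying rule (parameter name for named calls, index for positional calls) is an assumption about what `getParamKey` does, and the `path_regex` default used in `main` is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a procedure parameter (not Hudi's actual class).
final class Param {
  final int index;
  final String name;
  final Object defaultValue;

  Param(int index, String name, Object defaultValue) {
    this.index = index;
    this.name = name;
    this.defaultValue = defaultValue;
  }
}

// Hypothetical stand-in for the parsed call arguments.
final class Args {
  final boolean isNamedArgs;
  final Map<String, Object> map;

  Args(boolean isNamedArgs, Map<String, Object> map) {
    this.isNamedArgs = isNamedArgs;
    this.map = map;
  }
}

final class ArgLookup {
  // Assumption: named calls are keyed by parameter name, positional calls by index.
  static String paramKey(Param p, boolean isNamedArgs) {
    return isNamedArgs ? p.name : String.valueOf(p.index);
  }

  // True only if the caller supplied the argument, whatever its value.
  static boolean isArgDefined(Args args, Param p) {
    return args.map.containsKey(paramKey(p, args.isNamedArgs));
  }

  // Supplied value if present, otherwise the parameter's default.
  static Object argOrDefault(Args args, Param p) {
    return isArgDefined(args, p) ? args.map.get(paramKey(p, args.isNamedArgs)) : p.defaultValue;
  }

  public static void main(String[] argv) {
    Param pathRegex = new Param(1, "path_regex", ""); // hypothetical parameter and default
    Map<String, Object> named = new HashMap<>();
    named.put("table", "t1");
    Args call = new Args(true, named);
    System.out.println(isArgDefined(call, pathRegex)); // false: omitted
    named.put("path_regex", "");
    System.out.println(isArgDefined(call, pathRegex)); // true: passed, even though equal to the default
  }
}
```

Checking key presence rather than comparing against the default is what lets a procedure treat "not given" as a distinct case, for example falling back to scanning every partition.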
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
danny0405 merged PR #11418: URL: https://github.com/apache/hudi/pull/11418 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] hoodie.datasource.write.precombine.field is invalid [hudi]
danny0405 commented on issue #11421: URL: https://github.com/apache/hudi/issues/11421#issuecomment-2156254116 Did you try the `DefaultAvroPayload`?
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156246947 ## CI report: * f5503b5c92 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305) * 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156244799 ## CI report: * f5503b5c92 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305) * 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156242265 ## CI report: * f5503b5c92aa9899ee55447cd415467a255caa82 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24303) * f5503b5c92 UNKNOWN
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156242370 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN * e95bcb80e4b729677ef65be41abc30e8c4ce5c03 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24306)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156226762 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279) * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN * e95bcb80e4b729677ef65be41abc30e8c4ce5c03 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24306)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156224970 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279) * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN * e95bcb80e4b729677ef65be41abc30e8c4ce5c03 UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156224903 ## CI report: * f5503b5c92aa9899ee55447cd415467a255caa82 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24303)
Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]
hudi-bot commented on PR #10422: URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156214446 ## CI report: * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN * 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279) * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156214340 ## CI report: * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302) * f5503b5c92aa9899ee55447cd415467a255caa82 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156212175 ## CI report: * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302) * f5503b5c92aa9899ee55447cd415467a255caa82 UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156210177 ## CI report: * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302)
(hudi) branch master updated (a31fda59555 -> 90011bf6314)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

    from a31fda59555 [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure (#11416)
     add 90011bf6314 [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build (#11420)

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
yihua merged PR #11420: URL: https://github.com/apache/hudi/pull/11420
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1372348212
## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ##
@@ -1276,5 +1291,35 @@ public HoodieTableMetaClient initTable(Configuration configuration, String baseP
         throws IOException {
       return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
     }
+
+    private void validateMergeConfigs() {
+      boolean payloadClassNameSet = null != payloadClassName;
+      boolean payloadTypeSet = null != payloadType;
+      boolean recordMergerStrategySet = null != recordMergerStrategy;
+      boolean recordMergeModeSet = null != recordMergeMode;
+
+      checkArgument(recordMergeModeSet,
+          "Record merge mode " + HoodieTableConfig.RECORD_MERGE_MODE.key() + " should be set");
Review Comment: This is mandatory in the table config. During table upgrade, the merge mode should be inferred from either the payload class name/type or the record merger strategy.
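The comment above says the merge mode should be inferred when it is not set explicitly. The Java sketch below covers only the payload-class branch of that inference; payload type and record merger strategy are left out. The local `MergeMode` enum is modelled loosely on the modes discussed in this PR, and the class-name-to-mode mapping is an assumption for illustration, not Hudi's actual upgrade logic.

```java
import java.util.Optional;

// Hypothetical local enum; Hudi's RecordMergeMode constants may be named differently.
enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

final class MergeModeInference {
  // Assumed mapping sources; verify against the Hudi release in use.
  static final String OVERWRITE_PAYLOAD =
      "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload";
  static final String DEFAULT_PAYLOAD =
      "org.apache.hudi.common.model.DefaultHoodieRecordPayload";

  // Returns empty when there is nothing to infer from, so the caller can fail validation.
  static Optional<MergeMode> inferFromPayloadClass(String payloadClassName) {
    if (payloadClassName == null) {
      return Optional.empty();
    }
    switch (payloadClassName) {
      case OVERWRITE_PAYLOAD:
        return Optional.of(MergeMode.OVERWRITE_WITH_LATEST);
      case DEFAULT_PAYLOAD:
        return Optional.of(MergeMode.EVENT_TIME_ORDERING);
      default:
        // Assumption: any other payload class carries its own merge semantics.
        return Optional.of(MergeMode.CUSTOM);
    }
  }

  public static void main(String[] args) {
    System.out.println(inferFromPayloadClass(DEFAULT_PAYLOAD)); // Optional[EVENT_TIME_ORDERING]
    System.out.println(inferFromPayloadClass(null)); // Optional.empty
  }
}
```

Returning `Optional.empty()` instead of a guessed default keeps the "should be set" check meaningful: a config with neither a mode nor a payload class still fails.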
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632119102
## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ##
@@ -1276,5 +1291,35 @@ public HoodieTableMetaClient initTable(Configuration configuration, String baseP
         throws IOException {
       return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
     }
+
+    private void validateMergeConfigs() {
+      boolean payloadClassNameSet = null != payloadClassName;
+      boolean payloadTypeSet = null != payloadType;
+      boolean recordMergerStrategySet = null != recordMergerStrategy;
+      boolean recordMergeModeSet = null != recordMergeMode;
+
+      checkArgument(recordMergeModeSet,
+          "Record merge mode " + HoodieTableConfig.RECORD_MERGE_MODE.key() + " should be set");
Review Comment: The PR is updated and this is done in `HoodieTableMetaClient$PropertyBuilder#build`.
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632119003
## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -242,6 +249,11 @@ public HoodieFileGroupReaderIterator getClosableIterator() {
     return new HoodieFileGroupReaderIterator<>(this);
   }
+  public static RecordMergeMode getRecordMergeMode(Properties props) {
+    String mergeMode = getStringWithAltKeys(props, HoodieCommonConfig.RECORD_MERGE_MODE, true).toUpperCase();
Review Comment: Right now, since there are only placeholder upgrade and downgrade methods between table versions 6 and 8, I added the inference of the record merge mode inside `HoodieTableMetaClient$PropertyBuilder#build`.
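`getRecordMergeMode` in the snippet upper-cases the configured string before turning it into an enum. The sketch below shows that parsing step with plain `java.util.Properties`. The property key, the `ParsedMode` enum and the `fallback` parameter are hypothetical; the reviewed code reads `HoodieCommonConfig.RECORD_MERGE_MODE` through `getStringWithAltKeys`. Passing `Locale.ROOT` to `toUpperCase` is a choice made in this sketch, not something the snippet shows.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical enum standing in for Hudi's RecordMergeMode.
enum ParsedMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

final class MergeModeConfig {
  // Hypothetical key used only for this sketch.
  static final String KEY = "example.record.merge.mode";

  static ParsedMode getRecordMergeMode(Properties props, ParsedMode fallback) {
    String raw = props.getProperty(KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return fallback; // plays the role of the config's default value
    }
    // Locale.ROOT keeps upper-casing stable regardless of the JVM's default locale;
    // valueOf throws IllegalArgumentException for unknown mode names.
    return ParsedMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(KEY, "event_time_ordering");
    System.out.println(getRecordMergeMode(props, ParsedMode.CUSTOM)); // EVENT_TIME_ORDERING
  }
}
```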
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632118953
## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ##
@@ -1382,5 +1398,35 @@ public HoodieTableMetaClient initTable(StorageConfiguration configuration, St
         throws IOException {
       return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
     }
+
+    private void validateMergeConfigs() {
Review Comment: I invoke this method after inferring the record merge mode in `HoodieTableMetaClient$PropertyBuilder#build`.
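The point of this reply is ordering: inference runs first inside `build`, and `validateMergeConfigs` runs afterwards, so a config that only carries a payload class still passes the "merge mode should be set" check. The Java sketch below shows that ordering. `TablePropsBuilder`, `Mode` and the suffix-based inference rule are hypothetical and much simpler than Hudi's `HoodieTableMetaClient$PropertyBuilder`.

```java
// Hypothetical local enum for this sketch.
enum Mode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

final class TablePropsBuilder {
  private String payloadClassName;
  private Mode recordMergeMode;

  TablePropsBuilder setPayloadClassName(String payloadClassName) {
    this.payloadClassName = payloadClassName;
    return this;
  }

  TablePropsBuilder setRecordMergeMode(Mode recordMergeMode) {
    this.recordMergeMode = recordMergeMode;
    return this;
  }

  Mode build() {
    // 1. Infer only when the mode was not set explicitly.
    if (recordMergeMode == null && payloadClassName != null) {
      recordMergeMode = inferFromPayload(payloadClassName);
    }
    // 2. Validate after inference, so configs that only carry a payload class pass.
    validateMergeConfigs();
    return recordMergeMode;
  }

  // Hypothetical, simplified inference rule keyed on the class's simple name.
  private static Mode inferFromPayload(String className) {
    if (className.endsWith("OverwriteWithLatestAvroPayload")) {
      return Mode.OVERWRITE_WITH_LATEST;
    }
    if (className.endsWith("DefaultHoodieRecordPayload")) {
      return Mode.EVENT_TIME_ORDERING;
    }
    return Mode.CUSTOM;
  }

  private void validateMergeConfigs() {
    if (recordMergeMode == null) {
      throw new IllegalArgumentException("Record merge mode should be set");
    }
  }

  public static void main(String[] args) {
    System.out.println(new TablePropsBuilder()
        .setPayloadClassName("org.apache.hudi.common.model.DefaultHoodieRecordPayload")
        .build()); // EVENT_TIME_ORDERING
  }
}
```

Running validation before inference would reject every upgraded table that has a payload class but no explicit mode, which is why the order matters.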
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156197076 ## CI report: * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299) * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156193662 ## CI report: * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299) * 79b7c1f744fe13e094f245b38e131c63d801ea1a UNKNOWN
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2156173131 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * 22d1bdc6320ddbd1232bb7d9edaf8162f33e2081 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24301)
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
hudi-bot commented on PR #11420: URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156173137 ## CI report: * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
yihua commented on code in PR #9894: URL: https://github.com/apache/hudi/pull/9894#discussion_r1632102563
## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -242,6 +249,11 @@ public HoodieFileGroupReaderIterator getClosableIterator() {
     return new HoodieFileGroupReaderIterator<>(this);
   }
+  public static RecordMergeMode getRecordMergeMode(Properties props) {
+    String mergeMode = getStringWithAltKeys(props, HoodieCommonConfig.RECORD_MERGE_MODE, true).toUpperCase();
Review Comment: Sounds good. The record merge mode is required to dictate the merging behavior in release 1.x, playing the same role as the payload class config in release 0.x. During table upgrade, we need to infer the record merge mode from the payload class so that it is set correctly. HUDI-7847 tracks this work.
[jira] [Updated] (HUDI-7847) Infer record merge mode during table upgrade
[ https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7847:
    Description: Record merge mode is required to dictate the merging behavior in release 1.x, playing the same role as the payload class config in release 0.x. During table upgrade, we need to infer the record merge mode based on the payload class so it's correctly set.

> Infer record merge mode during table upgrade
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Priority: Major
>
> Record merge mode is required to dictate the merging behavior in release 1.x,
> playing the same role as the payload class config in release 0.x. During
> table upgrade, we need to infer the record merge mode based on the payload
> class so it's correctly set.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7847) Infer record merge mode during table upgrade
[ https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7847:
    Fix Version/s: 1.0.0

> Infer record merge mode during table upgrade
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Priority: Major
> Fix For: 1.0.0
>
> Record merge mode is required to dictate the merging behavior in release 1.x,
> playing the same role as the payload class config in release 0.x. During
> table upgrade, we need to infer the record merge mode based on the payload
> class so it's correctly set.
[jira] [Created] (HUDI-7847) Infer record merge mode during table upgrade
Ethan Guo created HUDI-7847:
    Summary: Infer record merge mode during table upgrade
    Key: HUDI-7847
    URL: https://issues.apache.org/jira/browse/HUDI-7847
    Project: Apache Hudi
    Issue Type: Improvement
    Reporter: Ethan Guo
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
hudi-bot commented on PR #11420: URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156153418 ## CI report: * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2156153405 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * 795b0473b4abca7626de895e81f6750863fa67d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24293) * 22d1bdc6320ddbd1232bb7d9edaf8162f33e2081 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24301)
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
hudi-bot commented on PR #11420: URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156151888 ## CI report: * 7e43c7ad60b8390e5a6020d72c18378848544f1f UNKNOWN
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2156151877 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * 795b0473b4abca7626de895e81f6750863fa67d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24293) * 22d1bdc6320ddbd1232bb7d9edaf8162f33e2081 UNKNOWN
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
hudi-bot commented on PR #11420: URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156150066 ## CI report: * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7846:
    Description: The following warning is thrown when doing maven parallel build with `mvn -T 1C ...`
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this *
[WARNING] * project contains the following plugin(s) that have goals not *
[WARNING] * marked as thread-safe to support parallel execution. *
[WARNING] * While this /may/ work fine, please look for plugin updates *
[WARNING] * and/or request plugins be made thread-safe. *
[WARNING] * If reporting an issue, report it against the plugin in *
[WARNING] * question, not against Apache Maven. *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING] org.apache.rat:apache-rat-plugin:0.13
{code}
    was: The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this *
[WARNING] * project contains the following plugin(s) that have goals not *
[WARNING] * marked as thread-safe to support parallel execution. *
[WARNING] * While this /may/ work fine, please look for plugin updates *
[WARNING] * and/or request plugins be made thread-safe. *
[WARNING] * If reporting an issue, report it against the plugin in *
[WARNING] * question, not against Apache Maven. *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING] org.apache.rat:apache-rat-plugin:0.13
{code}

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven
> parallel build
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
> The following warning is thrown when doing maven parallel build with `mvn -T 1C ...`
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this *
> [WARNING] * project contains the following plugin(s) that have goals not *
> [WARNING] * marked as thread-safe to support parallel execution. *
> [WARNING] * While this /may/ work fine, please look for plugin updates *
> [WARNING] * and/or request plugins be made thread-safe. *
> [WARNING] * If reporting an issue, report it against the plugin in *
> [WARNING] * question, not against Apache Maven. *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
> [WARNING] org.apache.rat:apache-rat-plugin:0.13
> {code}
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
jonvex commented on code in PR #11415:
URL: https://github.com/apache/hudi/pull/11415#discussion_r1632094178

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedFileGroupRecordBuffer.java:
##
@@ -123,46 +142,42 @@ public void processDataBlock(HoodieDataBlock dataBlock, Option keySpecO
       }
     }
 
-  @Override
-  public void processNextDataRecord(T record, Map metadata, Serializable recordPosition) throws IOException {
-    Pair, Map> existingRecordMetadataPair = records.get(recordPosition);
-    Option>> mergedRecordAndMetadata =
-        doProcessNextDataRecord(record, metadata, existingRecordMetadataPair);
-    if (mergedRecordAndMetadata.isPresent()) {
-      records.put(recordPosition, Pair.of(
-          Option.ofNullable(readerContext.seal(mergedRecordAndMetadata.get().getLeft())),
-          mergedRecordAndMetadata.get().getRight()));
+  private void fallbackToKeyBasedBuffer() {
+    readerContext.setShouldMergeUseRecordPosition(false);
+    //need to make a copy of the keys to avoid concurrent modification exception
+    ArrayList positions = new ArrayList<>(records.keySet());

Review Comment:
No, those are positions. The map is record position -> record; after we fall back, it becomes record key -> record.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
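Below is a sketch of the fallback the reviewer describes. The buffer starts out keyed by record position and is then re-keyed by record key, with the keys copied first. This is a minimal illustration under stated assumptions, not Hudi's `fallbackToKeyBasedBuffer()`:
- The class `PositionToKeyFallback`, the method `rekeyInPlace`, and the `keyOf` extractor are all hypothetical.
- Records are plain objects rather than Hudi's record/metadata pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of re-keying a record buffer from position to key. */
final class PositionToKeyFallback {

  private PositionToKeyFallback() {
  }

  /**
   * Re-keys {@code records} in place: each entry stored under a record position
   * is moved under the record key returned by {@code keyOf}.
   * If two positions yield the same key, one simply overwrites the other here;
   * a real record buffer would merge them instead.
   */
  static <R> void rekeyInPlace(Map<Object, R> records, Function<R, String> keyOf) {
    // Snapshot the keys first: removing and inserting entries while iterating
    // the live keySet() would typically throw ConcurrentModificationException.
    List<Object> positions = new ArrayList<>(records.keySet());
    for (Object position : positions) {
      R record = records.remove(position);
      records.put(keyOf.apply(record), record);
    }
  }
}
```

Take `{0L -> "key1|a", 5L -> "key2|b"}` with `keyOf = r -> r.substring(0, r.indexOf('|'))`. It ends up as `{"key1" -> "key1|a", "key2" -> "key2|b"}`, and no position keys remain.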
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
jonvex commented on code in PR #11415:
URL: https://github.com/apache/hudi/pull/11415#discussion_r1632093807

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -319,60 +311,6 @@ protected Option merge(Option older, Map olderInfoMap,
     return Option.empty();
   }
 
-  /**

Review Comment:
Moved these out of the base record buffer. `extractRecordPositions` is specific to the position-based buffer, and `shouldSkip` is only used there as well.
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156135926 ## CI report: * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
hudi-bot commented on PR #11418: URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156134125 ## CI report: * 6ba3bdf0b1736220996fdf21da6a449ae0049b47 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24298)
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
hudi-bot commented on PR #11420: URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156131606 ## CI report: * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156131130 ## CI report: * 84698763395160c09cac2c1529615a900cbb4625 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296) * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299)
Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
hudi-bot commented on PR #11420: URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156117676 ## CI report: * 7e43c7ad60b8390e5a6020d72c18378848544f1f UNKNOWN
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156117209 ## CI report: * 84698763395160c09cac2c1529615a900cbb4625 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296) * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af UNKNOWN
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
hudi-bot commented on PR #11418: URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156115266 ## CI report: * be777f818218f93cddbdf760cc1a93d581f7b9d5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24297) * 6ba3bdf0b1736220996fdf21da6a449ae0049b47 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24298)
[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7846:
Labels: pull-request-available (was: )

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
> The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this *
> [WARNING] * project contains the following plugin(s) that have goals not *
> [WARNING] * marked as thread-safe to support parallel execution. *
> [WARNING] * While this /may/ work fine, please look for plugin updates *
> [WARNING] * and/or request plugins be made thread-safe. *
> [WARNING] * If reporting an issue, report it against the plugin in *
> [WARNING] * question, not against Apache Maven. *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
> [WARNING] org.apache.rat:apache-rat-plugin:0.13 {code}
[PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]
yihua opened a new pull request, #11420:
URL: https://github.com/apache/hudi/pull/11420

### Change Logs

The following warning is thrown when doing a Maven parallel build with `mvn -T 1C ...`. This PR bumps `apache-rat-plugin` to 0.16.1 to eliminate the thread-safety warning in Maven parallel builds.

```
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this *
[WARNING] * project contains the following plugin(s) that have goals not *
[WARNING] * marked as thread-safe to support parallel execution. *
[WARNING] * While this /may/ work fine, please look for plugin updates *
[WARNING] * and/or request plugins be made thread-safe. *
[WARNING] * If reporting an issue, report it against the plugin in *
[WARNING] * question, not against Apache Maven. *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING] org.apache.rat:apache-rat-plugin:0.13
```

### Impact

Eliminates a build warning.

### Risk level

none

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7846:
Fix Version/s: 0.16.0
               1.0.0

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Priority: Major
> Fix For: 0.16.0, 1.0.0
[jira] [Created] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
Ethan Guo created HUDI-7846:

Summary: Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
Key: HUDI-7846
URL: https://issues.apache.org/jira/browse/HUDI-7846
Project: Apache Hudi
Issue Type: Improvement
Reporter: Ethan Guo
[jira] [Assigned] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo reassigned HUDI-7846:
Assignee: Ethan Guo

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Fix For: 0.16.0, 1.0.0
[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
[ https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7846:
Description:
The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this *
[WARNING] * project contains the following plugin(s) that have goals not *
[WARNING] * marked as thread-safe to support parallel execution. *
[WARNING] * While this /may/ work fine, please look for plugin updates *
[WARNING] * and/or request plugins be made thread-safe. *
[WARNING] * If reporting an issue, report it against the plugin in *
[WARNING] * question, not against Apache Maven. *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING] org.apache.rat:apache-rat-plugin:0.13 {code}

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Fix For: 0.16.0, 1.0.0
>
> The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this *
> [WARNING] * project contains the following plugin(s) that have goals not *
> [WARNING] * marked as thread-safe to support parallel execution. *
> [WARNING] * While this /may/ work fine, please look for plugin updates *
> [WARNING] * and/or request plugins be made thread-safe. *
> [WARNING] * If reporting an issue, report it against the plugin in *
> [WARNING] * question, not against Apache Maven. *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
> [WARNING] org.apache.rat:apache-rat-plugin:0.13 {code}
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
hudi-bot commented on PR #11418: URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156112815 ## CI report: * be777f818218f93cddbdf760cc1a93d581f7b9d5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24297) * 6ba3bdf0b1736220996fdf21da6a449ae0049b47 UNKNOWN
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
hudi-bot commented on PR #11418: URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156099204 ## CI report: * be777f818218f93cddbdf760cc1a93d581f7b9d5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24297)
Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]
hudi-bot commented on PR #11418: URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156097039 ## CI report: * be777f818218f93cddbdf760cc1a93d581f7b9d5 UNKNOWN
[I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]
beyond1920 opened a new issue, #11419:
URL: https://github.com/apache/hudi/issues/11419

Dear community,

One of our users reported that after today's run of their daily job, which writes to a Hudi COW table, the downstream reading jobs found many duplicate records. The daily job has been running in production for a long time, and this is the first time it has produced such a wrong result. The user provided a concrete duplicated record as an example to help debugging: the record appeared in 3 base files belonging to different file groups.

(screenshot: https://github.com/apache/hudi/assets/1525333/60b95dc4-91d6-4b40-8bca-c877a4407ae0)

I checked today's writer job: the Spark application finished successfully. In the driver log, two of those files were marked as invalid files to be deleted, and only one file was valid.

(screenshot: https://github.com/apache/hudi/assets/1525333/8e19e170-e38f-4725-82a5-84ed55750db9)

In the clean stage task log, those two files were also marked for deletion, and the task did not report any exception either.

(screenshot: https://github.com/apache/hudi/assets/1525333/1a819bd0-2dbe-4236-a0ed-e5f4576cfa38)

Those two files already existed on HDFS before the clean stage began, and they still existed after it finished. I finally found that the root cause was a corner case in HDFS: when HDFS fails to delete a file, `fs.delete` does not throw an exception; it only returns `false`.

(screenshot: https://github.com/apache/hudi/assets/1525333/4a1f46d8-0b6b-4089-bed1-7d6a2e72ac28)

I checked the `fs.delete` API, and this behavior is reasonable: the API contract allows it to return `false` instead of throwing.

(screenshot: https://github.com/apache/hudi/assets/1525333/20b7e237-18d4-480a-aedc-6c5a57b24062)

I think we should check the return value of `fs.delete` in `HoodieTable#deleteInvalidFilesByPartitions` to avoid wrong results. We should also review every place that calls `fs.delete`. Any suggestions?
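Below is a minimal sketch of the check the issue proposes. It assumes a Hadoop-style `delete(path, recursive)` that returns `false` instead of throwing.
- `SimpleFs` and `InvalidFileCleaner` are hypothetical stand-ins, not Hudi or Hadoop classes.
- `SimpleFs` mirrors the `FileSystem.delete`/`exists` pair so the example runs without Hadoop on the classpath.
- A `false` return alone is not treated as a failure, because the file may already be gone. Only a file that still exists after the call is reported.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two FileSystem calls the check needs. */
interface SimpleFs {
  boolean delete(String path, boolean recursive) throws IOException;

  boolean exists(String path) throws IOException;
}

/** Hypothetical helper: deletes invalid files and fails loudly if any survive. */
final class InvalidFileCleaner {

  private InvalidFileCleaner() {
  }

  static void deleteOrThrow(SimpleFs fs, List<String> paths) throws IOException {
    List<String> failed = new ArrayList<>();
    for (String path : paths) {
      // delete() returning false is not proof of failure on its own:
      // the file may already have been removed, so confirm with exists().
      if (!fs.delete(path, false) && fs.exists(path)) {
        failed.add(path);
      }
    }
    if (!failed.isEmpty()) {
      // Surfacing the error keeps stale base files from silently
      // remaining visible to readers.
      throw new IOException("Failed to delete invalid files: " + failed);
    }
  }
}
```

Throwing, rather than logging and continuing, means the commit fails instead of leaving duplicate file groups in place. Whether to retry before failing is a separate design choice this sketch leaves open.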
[jira] [Updated] (HUDI-7845) Call show_fsview_latest Procedure support path_regex
[ https://issues.apache.org/jira/browse/HUDI-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7845:
Labels: pull-request-available (was: )

> Call show_fsview_latest Procedure support path_regex
>
> Key: HUDI-7845
> URL: https://issues.apache.org/jira/browse/HUDI-7845
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Xinyu Zou
> Priority: Major
> Labels: pull-request-available
[PR] [HUDI-7845] Call show_fsview_latest Procedure support path_regex [hudi]
Zouxxyy opened a new pull request, #11418:
URL: https://github.com/apache/hudi/pull/11418

### Change Logs

Currently `show_fsview_all` supports setting `path_regex`, e.g.

```sql
call show_fsview_all(table => '$tableName', path_regex => 'day=d1/hh=h2')
call show_fsview_all(table => '$tableName', path_regex => 'day=d1/*/')
```

while `show_fsview_latest` only supports setting `partition_path`, e.g.

```sql
call show_fsview_latest(table => '$tableName', partition_path => 'day=d1/hh=h2')
```

This PR makes `show_fsview_latest` support `path_regex` too. In fact, `path_regex` can completely replace `partition_path`, but we keep `partition_path` for compatibility with older versions.

Other fixes:
- change `partition_path` from required to optional
- fix `call show_fsview_latest` when there are no commits in the timeline

### Impact

`show_fsview_latest` supports `path_regex`.

### Risk level (write none, low, medium or high below)

low

### Documentation Update

Will update the doc of `call show_fsview_latest`.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
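The `path_regex` examples above use `*` wildcards, which suggests glob-style matching against partition paths rather than a Java regular expression. The sketch below assumes exactly that and filters relative partition paths with the JDK's glob `PathMatcher`. Note two limits of the illustration:
- `PartitionGlobFilter` is hypothetical.
- The real procedure resolves the pattern against the table's base path on storage, not against in-memory strings.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical glob filter over relative partition paths such as "day=d1/hh=h2". */
final class PartitionGlobFilter {

  private PartitionGlobFilter() {
  }

  static List<String> filter(List<String> partitionPaths, String pathGlob) {
    // A Path never ends with a separator, so drop trailing slashes from the pattern.
    String pattern = pathGlob.replaceAll("/+$", "");
    // In glob syntax, '*' matches within one path segment and never crosses '/'.
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    return partitionPaths.stream()
        .filter(p -> matcher.matches(Paths.get(p)))
        .collect(Collectors.toList());
  }
}
```

Consider the partitions `day=d1/hh=h1`, `day=d1/hh=h2`, and `day=d2/hh=h1`:
- `'day=d1/*/'` selects the two `day=d1` partitions.
- `'day=d1/hh=h2'` selects exactly one, matching what `partition_path` would have selected.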
[jira] [Created] (HUDI-7845) Call show_fsview_latest Procedure support path_regex
Xinyu Zou created HUDI-7845:

Summary: Call show_fsview_latest Procedure support path_regex
Key: HUDI-7845
URL: https://issues.apache.org/jira/browse/HUDI-7845
Project: Apache Hudi
Issue Type: Improvement
Reporter: Xinyu Zou
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156058771 ## CI report: * 84698763395160c09cac2c1529615a900cbb4625 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296)
[I] [SUPPORT] Multi Writer DeltaStreamer (W1 and W2) Writing into Partition IN and US One of them failing [hudi]
soumilshah1995 opened a new issue, #11417:
URL: https://github.com/apache/hudi/issues/11417

When running the Hoodie DeltaStreamer with two writers simultaneously, one for the US partition and the other for the IN partition, one of the writers fails with a NullPointerException. This issue occurs during the offset fetching process from Kafka.

![image](https://github.com/apache/hudi/assets/39345855/9997c228-ff87-4650-9e9b-55e8bf215ce0)

# Steps

### spin up stack

```
version: "3"

services:
  trino-coordinator:
    image: 'trinodb/trino:400'
    hostname: trino-coordinator
    ports:
      - '8080:8080'
    volumes:
      - ./trino/etc:/etc/trino

  metastore_db:
    image: postgres:11
    hostname: metastore_db
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: hive
      POSTGRES_PASSWORD: hive
      POSTGRES_DB: metastore

  hive-metastore:
    hostname: hive-metastore
    image: 'starburstdata/hive:3.1.2-e.18'
    ports:
      - '9083:9083' # Metastore Thrift
    environment:
      HIVE_METASTORE_DRIVER: org.postgresql.Driver
      HIVE_METASTORE_JDBC_URL: jdbc:postgresql://metastore_db:5432/metastore
      HIVE_METASTORE_USER: hive
      HIVE_METASTORE_PASSWORD: hive
      HIVE_METASTORE_WAREHOUSE_DIR: s3://datalake/
      S3_ENDPOINT: http://minio:9000
      S3_ACCESS_KEY: admin
      S3_SECRET_KEY: password
      S3_PATH_STYLE_ACCESS: "true"
      REGION: ""
      GOOGLE_CLOUD_KEY_FILE_PATH: ""
      AZURE_ADL_CLIENT_ID: ""
      AZURE_ADL_CREDENTIAL: ""
      AZURE_ADL_REFRESH_URL: ""
      AZURE_ABFS_STORAGE_ACCOUNT: ""
      AZURE_ABFS_ACCESS_KEY: ""
      AZURE_WASB_STORAGE_ACCOUNT: ""
      AZURE_ABFS_OAUTH: ""
      AZURE_ABFS_OAUTH_TOKEN_PROVIDER: ""
      AZURE_ABFS_OAUTH_CLIENT_ID: ""
      AZURE_ABFS_OAUTH_SECRET: ""
      AZURE_ABFS_OAUTH_ENDPOINT: ""
      AZURE_WASB_ACCESS_KEY: ""
      HIVE_METASTORE_USERS_IN_ADMIN_ROLE: "admin"
    depends_on:
      - metastore_db
    healthcheck:
      test: bash -c "exec 6<> /dev/tcp/localhost/9083"

  fast-data-dev:
    image: dougdonohoe/fast-data-dev
    ports:
      - "3181:3181"
      - "3040:3040"
      - "7081:7081"
      - "7082:7082"
      - "7083:7083"
      - "7092:7092"
      - "8081:8081"
    environment:
      - ZK_PORT=3181
      - WEB_PORT=3040
      - REGISTRY_PORT=8081
      - REST_PORT=7082
      - CONNECT_PORT=7083
      - BROKER_PORT=7092
      - ADV_HOST=127.0.0.1

volumes:
  hive-metastore-postgresql:

networks:
  default:
    name: hudi
```

# publish some data

```
from faker import Faker
from time import sleep
import random
import uuid
from datetime import datetime
from kafka_schema_registry import prepare_producer

# Configuration
KAFKA_BOOTSTRAP_SERVERS = ['localhost:7092']
SCHEMA_REGISTRY_URL = 'http://localhost:8081'
NUM_MESSAGES = 20
SLEEP_INTERVAL = 1
TOPIC_NAME = 'orders'
NUM_PARTITIONS = 1
REPLICATION_FACTOR = 1

# Avro Schema
SAMPLE_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "order_value", "type": "string"},
        {"name": "priority", "type": "string"},
        {"name": "order_date", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "ts", "type": "string"},
        {"name": "country", "type": "string"}
    ]
}

# Kafka Producer
producer = prepare_producer(
    KAFKA_BOOTSTRAP_SERVERS,
    SCHEMA_REGISTRY_URL,
    TOPIC_NAME,
    NUM_PARTITIONS,
    REPLICATION_FACTOR,
    value_schema=SAMPLE_SCHEMA
)

# Faker instance
faker = Faker()


class DataGenerator:
    @staticmethod
    def get_orders_data():
        """
        Generate and return a dictionary with mock order data.
        """
        country = random.choice(['US', 'IN'])  # Define country variable
        return {
            "order_id": str(uuid.uuid4()),
            "name": faker.text(max_nb_chars=20),
            "order_value": str(random.randint(10, 1000)),
            "priority": random.choice(["LOW", "MEDIUM", "HIGH"]),
            "order_date": faker.date_between(start_date='-30d', end_date='today').strftime('%Y-%m-%d'),
            "customer_id": str(uuid.uuid4()),
            "ts": str(datetime.now().timestamp()),
            "country": country
        }

    @staticmethod
    def produce_avro_message(producer, data):
        """
        Produce an Avro message and send it to the appropriate Kafka topic based on the country.
        """
        topic = 'orders_in'
Re: [I] [SUPPORT] NoClassDefFoundError for org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile [hudi]
michael1991 commented on issue #8507:
URL: https://github.com/apache/hudi/issues/8507#issuecomment-2156047802

I used Hudi 0.14.1 on Dataproc 2.1 (Spark 3.3.2, Hadoop 3.3.6) to upsert a COW table with PartialUpdateAvroPayload. I sometimes got the same warning messages, but the job eventually succeeded. I'm not sure whether some jars are missing. How can I avoid this warning?
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156041218 ## CI report: * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24291) * 84698763395160c09cac2c1529615a900cbb4625 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296)
Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]
hudi-bot commented on PR #9894: URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156038868 ## CI report: * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24291) * 84698763395160c09cac2c1529615a900cbb4625 UNKNOWN
[jira] [Closed] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure
[ https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-7844.
Resolution: Fixed

> Fix HoodieSparkSqlTestBase to throw error upon test failure
>
> Key: HUDI-7844
> URL: https://issues.apache.org/jira/browse/HUDI-7844
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: Screenshot 2024-06-07 at 22.27.21.png
>
> This PR ([https://github.com/apache/hudi/pull/11162]) introduces the following changes that make `HoodieSparkSqlTestBase` swallow test failures.
>
> !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397!
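The actual change in PR #11162 appears only in the attached screenshot. The following is therefore a generic illustration of how a test wrapper can swallow failures, not a reconstruction of `HoodieSparkSqlTestBase`. `SqlTestRunner` and its two methods are hypothetical, and they behave as follows:
- Catching `Throwable` just to log it lets a failing test pass silently.
- A plain `try`/`finally` still runs cleanup but lets the failure reach the test framework.

```java
/** Hypothetical wrappers contrasting a failure-swallowing test helper with a correct one. */
final class SqlTestRunner {

  private SqlTestRunner() {
  }

  /** Anti-pattern: the failure is only logged, so the test framework records a pass. */
  static void runSwallowing(Runnable testBody, Runnable cleanup) {
    try {
      testBody.run();
    } catch (Throwable t) {
      System.err.println("Test failed: " + t);
    } finally {
      cleanup.run();
    }
  }

  /** Cleanup still runs, but the failure propagates to the caller. */
  static void runPropagating(Runnable testBody, Runnable cleanup) {
    try {
      testBody.run();
    } finally {
      cleanup.run();
    }
  }
}
```

Both variants run cleanup exactly once. Only `runPropagating` lets an assertion error or exception from the test body fail the test.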
(hudi) branch master updated (a33b2a5e03f -> a31fda59555)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

 from a33b2a5e03f [HUDI-7834] Create placeholder table versions and introduce new hoodie table property to track initial table version (#11406)
  add a31fda59555 [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure (#11416)

No new revisions were added by this update.

Summary of changes:
 .../sql/hudi/command/index}/TestFunctionalIndex.scala  | 14 +++---
 .../spark/sql/hudi/common/HoodieSparkSqlTestBase.scala | 16
 2 files changed, 11 insertions(+), 19 deletions(-)
 rename hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/{hudi/functional => spark/sql/hudi/command/index}/TestFunctionalIndex.scala (98%)
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
codope merged PR #11416: URL: https://github.com/apache/hudi/pull/11416
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
hudi-bot commented on PR #11416: URL: https://github.com/apache/hudi/pull/11416#issuecomment-2156018029 ## CI report: * 8235d366753038068a1145757fe539e8db4298e3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24294)
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
hudi-bot commented on PR #11416: URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155996582 ## CI report: * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292) * 8235d366753038068a1145757fe539e8db4298e3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24294)
Re: [PR] [HUDI-7384] Secondary index support [hudi]
skyshineb commented on PR #10625: URL: https://github.com/apache/hudi/pull/10625#issuecomment-2155988087 Hi @bhat-vinay! Is this secondary index design (built on the MDT) the only one planned, or are there plans for other index types? As I remember, there was an RFC for a Lucene index, and maybe other types will follow in the future?
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
hudi-bot commented on PR #11416: URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155983361 ## CI report: * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292) * 8235d366753038068a1145757fe539e8db4298e3 UNKNOWN
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
codope commented on code in PR #11416: URL: https://github.com/apache/hudi/pull/11416#discussion_r1632020201
## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala: ##
@@ -35,9 +35,9 @@ import org.apache.spark.sql.catalyst.parser.ParserInterface
 import org.apache.spark.sql.hudi.command.{CreateIndexCommand, ShowIndexesCommand}
 import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
 import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
-import org.junit.jupiter.api.Tag
+import org.scalatest.Ignore
-@Tag("functional")
+@Ignore
Review Comment:
note: moved this class out of the functional package and disabled it temporarily. More details in HUDI-7835. TL;DR: need to check why this test is attempted twice.
[jira] [Commented] (HUDI-7835) Spark context not stopped properly if the test had some exception that was ignored
[ https://issues.apache.org/jira/browse/HUDI-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853358#comment-17853358 ]

Sagar Sumit commented on HUDI-7835:
---
I observed one more thing: somehow TestFunctionalIndex is attempted again after TestSecondaryIndex. To verify, I disabled TestFunctionalIndex and confirmed from the logs that it is attempted twice (though it does not run, because it is disabled). We need to find the root cause of why TestFunctionalIndex is being attempted twice. Note that the test always succeeds from the IDE.

To reproduce, run the following Maven command from the hudi repo root and direct the logs to a file. In that file you can grep for any TestFunctionalIndex test name, e.g. `Test Create Functional Index`:
{code:java}
mvn test -Pwarn-log -Dscala-2.12 -Dspark3.2 -Dflink1.18 -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true -ntp -B -V -Pwarn-log -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.shade=warn -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.dependency=warn -Punit-tests -Dtest=skipJavaTests -DfailIfNoTests=false -DwildcardSuites=org.apache.hudi,org.apache.spark.hudi,org.apache.spark.sql.avro,org.apache.spark.sql.execution,org.apache.spark.sql.hudi.analysis,org.apache.spark.sql.hudi.command,org.apache.spark.sql.hudi.common,org.apache.spark.sql.hudi.dml -pl hudi-spark-datasource,hudi-spark-datasource/hudi-spark,hudi-spark-datasource/hudi-spark3.2.x,hudi-spark-datasource/hudi-spark3.2plus-common,hudi-spark-datasource/hudi-spark3-common,hudi-spark-datasource/hudi-spark-common
{code}

> Spark context not stopped properly if the test had some exception that was ignored
> --
>
> Key: HUDI-7835
> URL: https://issues.apache.org/jira/browse/HUDI-7835
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Sagar Sumit
> Priority: Major
> Fix For: 1.0.0
>
> Found two tests that don't fail but throw an exception while running. The test succeeds, but the Spark context is not stopped in time due to the exception.
> This causes issues for other tests. For example, `test("bucket index query")` in `TestDataSkippingQuery` succeeds, but when we check the [logs|https://github.com/apache/hudi/actions/runs/9391927778/job/25865161535#step:6:5799] of the test, we find an error like the one below:
> {code:java}
> 74954 [ScalaTest-run-running-TestDataSkippingQuery] ERROR org.apache.hudi.HoodieFileIndex [] - Failed to lookup candidate files in File Index
> java.lang.IllegalArgumentException: Property _hoodie.record.key.gen.partition.id not found
>     at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:69) ~[classes/:?]
>     at org.apache.hudi.common.config.TypedProperties.getInteger(TypedProperties.java:94) ~[classes/:?]
>     at org.apache.hudi.keygen.AutoRecordGenWrapperKeyGenerator.generateSequenceId(AutoRecordGenWrapperKeyGenerator.java:115) ~[classes/:?]
>     at org.apache.hudi.keygen.AutoRecordGenWrapperKeyGenerator.getRecordKey(AutoRecordGenWrapperKeyGenerator.java:67) ~[classes/:?]
>     at org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:70) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.getBucketNumber$1(BucketIndexSupport.scala:154) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.getBucketSetFromValue$1(BucketIndexSupport.scala:168) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.getBucketsBySingleHashFields(BucketIndexSupport.scala:174) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.filterQueriesWithBucketHashField(BucketIndexSupport.scala:107) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.computeCandidateFileNames(BucketIndexSupport.scala:78) ~[classes/:?]
>     at org.apache.hudi.HoodieFileIndex.$anonfun$lookupCandidateFilesInMetadataTable$3(HoodieFileIndex.scala:354) ~[classes/:?]
>     at org.apache.hudi.HoodieFileIndex.$anonfun$lookupCandidateFilesInMetadataTable$3$adapted(HoodieFileIndex.scala:351) ~[classes/:?]
>     at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877) ~[scala-library-2.12.10.jar:?]
>     at scala.collection.immutable.List.foreach(List.scala:392) ~[scala-library-2.12.10.jar:?]
>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876) ~[scala-library-2.12.10.jar:?]
>     at org.apache.hudi.HoodieFileIndex.$anonfun$lookupCandidateFilesInMetadataTable$1(HoodieFileIndex.scala:351) ~[classes/:?]
>     at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.10.jar:?]
>     at org.apache.hudi.HoodieFileIndex.lookupCandidateFilesInMetadataTable(HoodieFileIndex.scala:338) ~[classes/:?]
>     at
Re: [I] RLI Spark Hudi Error occurs when executing map [hudi]
michael1991 commented on issue #10609: URL: https://github.com/apache/hudi/issues/10609#issuecomment-2155893143
Hi @ad1happy2go, I can reproduce this error with the following environment and Scala code. I hope it helps.
Environment: Dataproc 2.1 (Spark 3.3.2) with Hudi 0.14.x / Dataproc 2.2 (Spark 3.5.0) with Hudi 0.15.x
Scala code in `spark-shell`:
```scala
import org.apache.spark.sql._
import spark.implicits._

val path = "gs://bucket/test/hudi-test"
val data = Seq((1, "Alice", 29), (2, "Bob", 35), (3, "Catherine", 23)).toDF("id", "name", "age")
val config = Map(
  "hoodie.table.name" -> "hudi-test",
  "hoodie.metadata.enable" -> "true",
  "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE",
  "hoodie.datasource.write.payload.class" -> "org.apache.hudi.common.model.PartialUpdateAvroPayload",
  "hoodie.datasource.write.recordkey.field" -> "id",
  "hoodie.datasource.write.reconcile.schema" -> "true",
  "hoodie.datasource.write.new.columns.nullable" -> "true",
  "hoodie.combine.before.insert" -> "false",
  "hoodie.combine.before.upsert" -> "false",
  "hoodie.cleaner.commits.retained" -> "6",
  "hoodie.parquet.compression.codec" -> "snappy"
)
data.write.format("hudi").options(config).option("hoodie.datasource.write.operation", "insert").mode(SaveMode.Append).save(path)
spark.read.format("hudi").option("hoodie.metadata.enable", "true").load(path).show(10, false)
val upsertData = Seq((1, 30)).toDF("id", "age")
upsertData.write.format("hudi").options(config).option("hoodie.datasource.write.operation", "upsert").mode(SaveMode.Append).save(path)
```
Please try it if possible and let me know whether you can reproduce this error. Thanks in advance!
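For reference, the second write in this repro upserts a record that carries only `id` and `age`. The sketch below is a simplified Java illustration of the merge that write is meant to perform. It rests on these assumptions:

- `PartialUpdateAvroPayload` keeps the stored value for any incoming field that is null.
- The missing `name` column arrives as null after schema reconciliation.

`PartialUpdateSketch` and `mergePartial` are hypothetical, and records are modeled as `Map`s. The sketch ignores any ordering-field comparison the real payload may also perform.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "partial update" merging; not Hudi's PartialUpdateAvroPayload code.
class PartialUpdateSketch {
    // Start from the stored record and overwrite only fields that carry a non-null value.
    static Map<String, Object> mergePartial(Map<String, Object> stored, Map<String, Object> incoming) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> field : incoming.entrySet()) {
            if (field.getValue() != null) {
                merged.put(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", 1);
        stored.put("name", "Alice");
        stored.put("age", 29);

        // Assumption: the upsert row (id, age) reaches the merge with "name" set to null.
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", 1);
        incoming.put("age", 30);
        incoming.put("name", null);

        System.out.println(mergePartial(stored, incoming)); // {id=1, name=Alice, age=30}
    }
}
```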
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
hudi-bot commented on PR #11416: URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155855408 ## CI report: * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292)
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155855404 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * 795b0473b4abca7626de895e81f6750863fa67d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24293)
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
hudi-bot commented on PR #11415: URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155853054 ## CI report: * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN * bfea0d3a2dd9e6ba2d96c1d7d20a07e085883da6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24278) * 795b0473b4abca7626de895e81f6750863fa67d3 UNKNOWN
Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]
codope commented on code in PR #11415: URL: https://github.com/apache/hudi/pull/11415#discussion_r1631913348
## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPositionBasedMergingFallback.scala: ##
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hudi.DataSourceWriteOptions
+import org.apache.hudi.DataSourceWriteOptions.{OPERATION, PRECOMBINE_FIELD, RECORDKEY_FIELD, TABLE_TYPE}
+import org.apache.hudi.HoodieConversionUtils.toJavaOption
+import org.apache.hudi.common.config.{HoodieReaderConfig, HoodieStorageConfig}
+import org.apache.hudi.common.model.HoodieRecordMerger
+import org.apache.hudi.common.util
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.HoodieSparkClientTestBase
+import org.apache.hudi.util.JFunction
+import org.apache.spark.sql.SaveMode.{Append, Overwrite}
+import org.apache.spark.sql.SparkSessionExtensions
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.apache.spark.sql.internal.SQLConf
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.{Arguments, MethodSource}
+
+import java.util.function.Consumer
+
+class TestPositionBasedMergingFallback extends HoodieSparkClientTestBase {
+  override def getSparkSessionExtensionsInjector: util.Option[Consumer[SparkSessionExtensions]] =
+    toJavaOption(
+      Some(
+        JFunction.toJavaConsumer((receiver: SparkSessionExtensions) => new HoodieSparkSessionExtension().apply(receiver)))
+    )
+
+  @BeforeEach override def setUp(): Unit = {
+    initPath()
+    initSparkContexts()
+    sparkSession.conf.set(SQLConf.PARQUET_RECORD_FILTER_ENABLED.key, "true")
+    initTestDataGenerator()
+    initHoodieStorage()
+  }
+
+  @AfterEach override def tearDown(): Unit = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+    FileSystem.closeAll()
+    System.gc()
Review Comment:
let's avoid System.gc()

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedFileGroupRecordBuffer.java: ##
@@ -174,20 +189,97 @@ public boolean containsLogRecord(String recordKey) {
   }
 
   @Override
-  protected boolean doHasNext() throws IOException {
-    ValidationUtils.checkState(baseFileIterator != null, "Base file iterator has not been set yet");
-
-    // Handle merging.
-    while (baseFileIterator.hasNext()) {
-      T baseRecord = baseFileIterator.next();
-      nextRecordPosition = readerContext.extractRecordPosition(baseRecord, readerSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME, nextRecordPosition);
-      Pair, Map> logRecordInfo = records.remove(nextRecordPosition++);
-      if (hasNextBaseRecord(baseRecord, logRecordInfo)) {
-        return true;
+  protected boolean hasNextBaseRecord(T baseRecord) throws IOException {
+    if (!readerContext.getShouldMergeUseRecordPosition()) {
+      return doHasNextFallbackBaseRecord(baseRecord);
+    }
+
+    nextRecordPosition = readerContext.extractRecordPosition(baseRecord, readerSchema,
+        ROW_INDEX_COLUMN_NAME, nextRecordPosition);
+    Pair, Map> logRecordInfo = records.remove(nextRecordPosition++);
+
+    Map metadata = readerContext.generateMetadataForRecord(
+        baseRecord, readerSchema);
+
+    Option resultRecord = logRecordInfo != null
+        ? merge(Option.of(baseRecord), metadata, logRecordInfo.getLeft(), logRecordInfo.getRight())
+        : merge(Option.empty(), Collections.emptyMap(), Option.of(baseRecord), metadata);
+    if (resultRecord.isPresent()) {
+      nextRecord = readerContext.seal(resultRecord.get());
+      return true;
+    }
+    return false;
+  }
+
+  private boolean doHasNextFallbackBaseRecord(T baseRecord) throws IOException {
+    if (needToDoHybridStrategy) {
Review Comment:
let's test this logic as well.

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedFileGroupRecordBuffer.java: ##
@@ -123,46
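The change under review makes `hasNextBaseRecord` fall back to `doHasNextFallbackBaseRecord` when `readerContext.getShouldMergeUseRecordPosition()` is false. Per the PR title, that presumably covers log blocks that lack record positions. The sketch below is a simplified Java illustration of that idea, under these assumptions:

- The names `FallbackMergeSketch`, `Row` and `merge` are hypothetical.
- Rows are plain (key, value) pairs, and a log update simply replaces the base row.
- Deletes, ordering values, log-only inserts and the hybrid strategy (`needToDoHybridStrategy`) are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of position-based merging with a key-based fallback.
// Not the HoodiePositionBasedFileGroupRecordBuffer implementation.
class FallbackMergeSketch {
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // usePositions stands in for "the log block carries record positions".
    static List<Row> merge(List<Row> baseRows, Map<Long, Row> logByPosition,
                           Map<String, Row> logByKey, boolean usePositions) {
        List<Row> out = new ArrayList<>();
        long position = 0; // row index within the base file
        for (Row base : baseRows) {
            Row update = usePositions
                ? logByPosition.get(position) // position-based lookup
                : logByKey.get(base.key);     // fallback: key-based lookup
            position++;
            out.add(update != null ? update : base); // latest value wins in this sketch
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> base = Arrays.asList(new Row("k1", "a"), new Row("k2", "b"), new Row("k3", "c"));
        Map<Long, Row> byPosition = new HashMap<>();
        byPosition.put(1L, new Row("k2", "B"));
        Map<String, Row> byKey = new HashMap<>();
        byKey.put("k2", new Row("k2", "B"));

        System.out.println(merge(base, byPosition, byKey, true));       // [k1=a, k2=B, k3=c]
        System.out.println(merge(base, new HashMap<>(), byKey, false)); // [k1=a, k2=B, k3=c]
    }
}
```

`position` is a `long` so that `logByPosition.get(position)` boxes to `Long` and matches the map's keys. Boxing an `int` to `Integer` would compile but silently never match.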
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
hudi-bot commented on PR #11416: URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155836211 ## CI report: * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292)
Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]
hudi-bot commented on PR #11416: URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155833908 ## CI report: * 70f8a79214fcd176dd275faf56bb789d4db60f7f UNKNOWN