Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156280208

   
   ## CI report:
   
   * 04e8b0a67a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]

2024-06-08 Thread via GitHub


beyond1920 commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156280046

   @danny0405 Thanks for your attention.
   I checked [#11343](https://github.com/apache/hudi/pull/11343), but it does not cover the current issue. The issue should be fixed in `HoodieTable#deleteInvalidFilesByPartitions` so that deleting the invalid files does not fail, whereas [#11343](https://github.com/apache/hudi/pull/11343) aims to fix the clean service.





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156278203

   
   ## CI report:
   
   * 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24307)
 
   * 04e8b0a67a UNKNOWN
   
   
   





(hudi) branch master updated: [HUDI-6787] Implement the HoodieFileGroupReader API for Hive (#10422)

2024-06-08 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0abc00df841 [HUDI-6787] Implement the HoodieFileGroupReader API for Hive (#10422)
0abc00df841 is described below

commit 0abc00df8412c5ea3d15ab50d5074d8e8bccebcb
Author: Jon Vexler 
AuthorDate: Sat Jun 8 22:22:46 2024 -0400

[HUDI-6787] Implement the HoodieFileGroupReader API for Hive (#10422)
---
 .../hudi/client/TestPartitionTTLManagement.java|   2 +-
 .../hudi/table/TestHoodieMergeOnReadTable.java |   2 +-
 .../TestHoodieSparkMergeOnReadTableCompaction.java |  80 +++---
 .../hudi/common/engine/HoodieReaderContext.java|  29 ++-
 .../org/apache/hudi/common/model/HoodieRecord.java |   2 +-
 .../org/apache/hudi/hadoop/fs/HadoopFSUtils.java   |  15 +-
 .../hudi/hadoop/HiveHoodieReaderContext.java   | 273 
 .../HoodieFileGroupReaderBasedRecordReader.java| 281 +
 .../org/apache/hudi/hadoop/HoodieHiveRecord.java   | 221 
 .../apache/hudi/hadoop/HoodieHiveRecordMerger.java |  71 ++
 .../hudi/hadoop/HoodieParquetInputFormat.java  |  48 +++-
 .../hudi/hadoop/RecordReaderValueIterator.java |  13 +-
 .../HoodieCombineRealtimeRecordReader.java |  51 +++-
 .../realtime/HoodieParquetRealtimeInputFormat.java |  15 +-
 .../hadoop/utils/HoodieArrayWritableAvroUtils.java | 110 
 .../hudi/hadoop/utils/HoodieInputFormatUtils.java  |  36 +++
 .../hudi/hadoop/utils/ObjectInspectorCache.java| 103 
 .../hudi/hadoop/TestHoodieParquetInputFormat.java  | 122 -
 .../hive/TestHoodieCombineHiveInputFormat.java |  14 +-
 .../TestHoodieMergeOnReadSnapshotReader.java   |   2 +
 .../realtime/TestHoodieRealtimeRecordReader.java   |   2 +
 .../utils/TestHoodieArrayWritableAvroUtils.java|  88 +++
 .../org/apache/hudi/functional/TestBootstrap.java  |   1 +
 .../functional/TestHiveTableSchemaEvolution.java   |   2 +
 .../TestSparkConsistentBucketClustering.java   |   2 +-
 .../streamer/TestHoodieStreamerUtils.java  |  13 +-
 26 files changed, 1470 insertions(+), 128 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
index cda76154ca6..f4e9d206f06 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestPartitionTTLManagement.java
@@ -182,7 +182,7 @@ public class TestPartitionTTLManagement extends HoodieClientTestBase {
   private List readRecords(String[] partitions) {
 return HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(storageConf,
 Arrays.stream(partitions).map(p -> Paths.get(basePath, p).toString()).collect(Collectors.toList()),
-basePath, new JobConf(storageConf.unwrap()), true, false);
+basePath, new JobConf(storageConf.unwrap()), true, true);
   }
 
 }
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
index b0876d06103..ae81a310190 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java
@@ -213,7 +213,7 @@ public class TestHoodieMergeOnReadTable extends SparkClientFunctionalTestHarness
   .map(baseFile -> new Path(baseFile.getPath()).getParent().toString())
   .collect(Collectors.toList());
   List recordsRead = HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(storageConf(), inputPaths,
-  basePath(), new JobConf(storageConf().unwrap()), true, false);
+  basePath(), new JobConf(storageConf().unwrap()), true, populateMetaFields);
   // Wrote 20 records in 2 batches
   assertEquals(40, recordsRead.size(), "Must contain 40 records");
 }
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
index e2ba56f94a3..ef28980d9cf 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java
@@ -22,6 +22,7 @@ package org.apache.hudi.table.functional;
 import org.apache.hudi.client.SparkRDDWriteClient;
 import 

Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-06-08 Thread via GitHub


yihua merged PR #10422:
URL: https://github.com/apache/hudi/pull/10422





Re: [I] [SUPPORT] hoodie.datasource.write.precombine.field is invalid [hudi]

2024-06-08 Thread via GitHub


yangZhengW commented on issue #11421:
URL: https://github.com/apache/hudi/issues/11421#issuecomment-2156274391

   > did you try the `DefaultAvroPayload`?
   
   It works, thanks.





Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


Zouxxyy commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156267038

   > @Zouxxyy Nice contribution! Do you think we should update the site docs too?
   
   Yes, I will update it soon.





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156259327

   
   ## CI report:
   
   * 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24307)
 
   
   
   





Re: [I] [SUPPORT] The clean service can't clean historical version files after the savepoint instant when i set `hoodie.archive.beyond.savepoint=true` [hudi]

2024-06-08 Thread via GitHub


danny0405 commented on issue #11405:
URL: https://github.com/apache/hudi/issues/11405#issuecomment-2156256748

   @nsivabalan, could you share some insights here?





[jira] [Closed] (HUDI-7390) [Regression] HoodieStreamer no longer works without --props being supplied

2024-06-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7390.

Resolution: Fixed

Fixed via master branch: 9f9064761bac766cc7884027432568c06817ddd7

> [Regression] HoodieStreamer no longer works without --props being supplied
> --
>
> Key: HUDI-7390
> URL: https://issues.apache.org/jira/browse/HUDI-7390
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: deltastreamer
>Affects Versions: 1.0.0-beta1, 0.14.1
>Reporter: Brandon Dahler
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
> Attachments: spark.log
>
>
> h2. Problem
> When HoodieStreamer is run without a props file, with all required extra configuration specified via {{--hoodie-conf}} parameters, execution fails with the following exception:
> {code:java}
> 24/02/06 22:15:13 INFO SparkContext: Successfully stopped SparkContext
> Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs from file file:/private/tmp/hudi-props-repro/src/test/resources/streamer-config/dfs-source.properties
>         at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:166)
>         at org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:85)
>         at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:232)
>         at org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:437)
>         at org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:656)
>         at org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:632)
>         at org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:525)
>         at org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:498)
>         at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:404)
>         at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:850)
>         at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
>         at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
>         at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.FileNotFoundException: File file:/private/tmp/hudi-props-repro/src/test/resources/streamer-config/dfs-source.properties does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:160)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
>         at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:161)
>         ... 25 more {code}
> h2. 

[jira] [Updated] (HUDI-7390) [Regression] HoodieStreamer no longer works without --props being supplied

2024-06-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7390:
-
Fix Version/s: 0.16.0
   1.0.0
   (was: 0.15.0)


[jira] [Updated] (HUDI-7390) [Regression] HoodieStreamer no longer works without --props being supplied

2024-06-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7390:
-
Status: Open  (was: In Progress)

> h2. Reproduction Steps
> 1. Setup clean spark install
> 

(hudi) branch master updated: [HUDI-7390] HoodieStreamer no longer works without --props being supplied (#11414)

2024-06-08 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9f9064761ba [HUDI-7390] HoodieStreamer no longer works without --props being supplied (#11414)
9f9064761ba is described below

commit 9f9064761bac766cc7884027432568c06817ddd7
Author: Vova Kolmakov 
AuthorDate: Sun Jun 9 08:17:55 2024 +0700

[HUDI-7390] HoodieStreamer no longer works without --props being supplied (#11414)

Co-authored-by: Vova Kolmakov 
---
 .../main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
index 1905cfe6f31..27db59ab7cd 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
@@ -446,7 +446,7 @@ public class HoodieStreamer implements Serializable {
 }
 
 public static TypedProperties getProps(Configuration conf, Config cfg) {
-  return cfg.propsFilePath.isEmpty()
+  return cfg.propsFilePath.isEmpty() || cfg.propsFilePath.equals(DEFAULT_DFS_SOURCE_PROPERTIES)
   ? buildProperties(cfg.configs)
   : readConfig(conf, new Path(cfg.propsFilePath), cfg.configs).getProps();
 }
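The fix treats an unchanged default `--props` value the same as an empty one, so HoodieStreamer builds its properties only from the `--hoodie-conf` entries instead of trying to open a default file that does not exist. The self-contained Java sketch below mimics that decision with `java.util.Properties`. `DEFAULT_PROPS_PATH`, the local-file loading and the override order are simplified, hypothetical stand-ins for Hudi's `DEFAULT_DFS_SOURCE_PROPERTIES`, `buildProperties` and `readConfig`, not their real implementations.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

final class PropsResolutionSketch {

  // Hypothetical stand-in for HoodieStreamer's DEFAULT_DFS_SOURCE_PROPERTIES;
  // the real constant's value is not shown in this commit.
  static final String DEFAULT_PROPS_PATH =
      "file:/tmp/src/test/resources/streamer-config/dfs-source.properties";

  // Turns "key=value" strings (the shape of --hoodie-conf entries) into Properties.
  static Properties buildProperties(List<String> configs) {
    Properties props = new Properties();
    for (String kv : configs) {
      int idx = kv.indexOf('=');
      if (idx <= 0) {
        throw new IllegalArgumentException("Expected key=value, got: " + kv);
      }
      props.setProperty(kv.substring(0, idx), kv.substring(idx + 1));
    }
    return props;
  }

  // Same condition as the fix: an empty path or the untouched default path means
  // no props file was supplied, so only the inline configs are used.
  static Properties getProps(String propsFilePath, List<String> configs) {
    if (propsFilePath.isEmpty() || propsFilePath.equals(DEFAULT_PROPS_PATH)) {
      return buildProperties(configs);
    }
    Properties props = new Properties();
    // Simplification: any other value is treated as a local file path.
    try (InputStream in = Files.newInputStream(Paths.get(propsFilePath))) {
      props.load(in);
    } catch (IOException e) {
      throw new UncheckedIOException("Cannot read properties from " + propsFilePath, e);
    }
    // Assumption: inline configs override values loaded from the file.
    props.putAll(buildProperties(configs));
    return props;
  }

  public static void main(String[] args) {
    // The default path is never opened, so this succeeds even though the file is absent.
    Properties p = getProps(DEFAULT_PROPS_PATH, List.of("hoodie.datasource.write.recordkey.field=id"));
    System.out.println(p.getProperty("hoodie.datasource.write.recordkey.field")); // id
  }
}
```

One consequence of comparing against the default value, in the sketch as in the fix: explicitly passing the default path is indistinguishable from omitting `--props`.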



Re: [PR] [HUDI-7390] fix: HoodieStreamer no longer works without --props being supplied [hudi]

2024-06-08 Thread via GitHub


danny0405 merged PR #11414:
URL: https://github.com/apache/hudi/pull/11414





Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]

2024-06-08 Thread via GitHub


danny0405 commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2156255421

   You are right; there is already a recent fix: https://github.com/apache/hudi/pull/11343





[jira] [Updated] (HUDI-7845) Call show_fsview_latest Procedure support path_regex

2024-06-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7845:
-
Fix Version/s: 1.0.0

> Call show_fsview_latest Procedure support path_regex
> 
>
> Key: HUDI-7845
> URL: https://issues.apache.org/jira/browse/HUDI-7845
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7845) Call show_fsview_latest Procedure support path_regex

2024-06-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7845.

Resolution: Fixed

Fixed via master branch: 37564b4fd68777fd0b1f553237066a07060aa1d6

> Call show_fsview_latest Procedure support path_regex
> 
>
> Key: HUDI-7845
> URL: https://issues.apache.org/jira/browse/HUDI-7845
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


danny0405 commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156254529

   @Zouxxyy Nice contribution! Do you think we should update the site docs too?





(hudi) branch master updated: [HUDI-7845] Call show_fsview_latest procedure support path_regex (#11418)

2024-06-08 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 37564b4fd68 [HUDI-7845] Call show_fsview_latest procedure support path_regex (#11418)
37564b4fd68 is described below

commit 37564b4fd68777fd0b1f553237066a07060aa1d6
Author: Zouxxyy 
AuthorDate: Sun Jun 9 09:11:46 2024 +0800

[HUDI-7845] Call show_fsview_latest procedure support path_regex (#11418)
---
 .../table/view/AbstractTableFileSystemView.java|  13 +++
 .../hudi/command/procedures/BaseProcedure.scala|   5 +
 .../procedures/ShowFileSystemViewProcedure.scala   | 105 -
 .../sql/hudi/procedure/TestFsViewProcedure.scala   |  86 -
 4 files changed, 164 insertions(+), 45 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java b/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
index 550082b0aa1..90f48b660c3 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
@@ -672,6 +672,19 @@ public abstract class AbstractTableFileSystemView 
implements SyncableFileSystemV
 }
   }
 
+  public final List getPartitionNames() {
+try {
+  readLock.lock();
+  return fetchAllStoredFileGroups()
+  .filter(fg -> !isFileGroupReplaced(fg))
+  .map(HoodieFileGroup::getPartitionPath)
+  .distinct()
+  .collect(Collectors.toList());
+} finally {
+  readLock.unlock();
+}
+  }
+
   @Override
   public final Stream> 
getPendingLogCompactionOperations() {
 try {
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala
index b0ffc0cb64e..777d1937c98 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/BaseProcedure.scala
@@ -76,6 +76,11 @@ abstract class BaseProcedure extends Procedure {
     }
   }
 
+  protected def isArgDefined(args: ProcedureArgs, parameter: ProcedureParameter): Boolean = {
+    val paramKey = getParamKey(parameter, args.isNamedArgs)
+    args.map.containsKey(paramKey)
+  }
+
   protected def getInternalRowValue(row: InternalRow, index: Int, dataType: DataType): Any = {
     dataType match {
       case StringType => row.getString(index)
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala
index c7d11f4c091..f19cd105c81 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowFileSystemViewProcedure.scala
@@ -22,17 +22,23 @@ import org.apache.hudi.common.model.{FileSlice, HoodieLogFile}
 import org.apache.hudi.common.table.timeline.{CompletionTimeQueryView, HoodieDefaultTimeline, HoodieInstant, HoodieTimeline}
 import org.apache.hudi.common.table.view.HoodieTableFileSystemView
 import org.apache.hudi.common.util
+import org.apache.hudi.exception.HoodieException
+import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.hudi.storage.StoragePath
 
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
 
 import java.util.function.{Function, Supplier}
-import java.util.stream.Collectors
+import java.util.stream.{Collectors, Stream => JStream}
+import java.util.{ArrayList => JArrayList, List => JList}
 
 import scala.collection.JavaConverters._
 
 class ShowFileSystemViewProcedure(showLatest: Boolean) extends BaseProcedure with ProcedureBuilder {
+
+  private val ALL_PARTITIONS = "ALL_PARTITIONS"
+
   private val PARAMETERS_ALL: Array[ProcedureParameter] = Array[ProcedureParameter](
     ProcedureParameter.required(0, "table", DataTypes.StringType),
     ProcedureParameter.optional(1, "max_instant", DataTypes.StringType, ""),
@@ -40,7 +46,7 @@ class ShowFileSystemViewProcedure(showLatest: Boolean) extends BaseProcedure wit
     ProcedureParameter.optional(3, "include_in_flight", DataTypes.BooleanType, false),
     ProcedureParameter.optional(4, "exclude_compaction", DataTypes.BooleanType, false),
     ProcedureParameter.optional(5, "limit", 

Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


danny0405 merged PR #11418:
URL: https://github.com/apache/hudi/pull/11418





Re: [I] [SUPPORT] hoodie.datasource.write.precombine.field is invalid [hudi]

2024-06-08 Thread via GitHub


danny0405 commented on issue #11421:
URL: https://github.com/apache/hudi/issues/11421#issuecomment-2156254116

   Did you try the `DefaultAvroPayload`?





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156246947

   
   ## CI report:
   
   * f5503b5c92 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305)
 
   * 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24308)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156244799

   
   ## CI report:
   
   * f5503b5c92 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305)
 
   * 04e8b0a67a675dc34bede7fb3c8f72c3b137cd60 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156242265

   
   ## CI report:
   
   * f5503b5c92aa9899ee55447cd415467a255caa82 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24303)
 
   * f5503b5c92 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156242370

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
   * e95bcb80e4b729677ef65be41abc30e8c4ce5c03 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24306)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156226762

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279)
 
   * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
   * e95bcb80e4b729677ef65be41abc30e8c4ce5c03 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24306)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156224970

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279)
 
   * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
   * e95bcb80e4b729677ef65be41abc30e8c4ce5c03 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156224903

   
   ## CI report:
   
   * f5503b5c92aa9899ee55447cd415467a255caa82 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24303)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-2156214446

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * 33249cc712c6dcdde12efe8536579d3c9c5f8575 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24279)
 
   * 89a4a8c0e3add3f192b7fe8fa01f0f5c7d3be71e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156214340

   
   ## CI report:
   
   * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302)
 
   * f5503b5c92aa9899ee55447cd415467a255caa82 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24305)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156212175

   
   ## CI report:
   
   * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302)
 
   * f5503b5c92aa9899ee55447cd415467a255caa82 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156210177

   
   ## CI report:
   
   * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch master updated (a31fda59555 -> 90011bf6314)

2024-06-08 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from a31fda59555 [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure (#11416)
 add 90011bf6314 [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build (#11420)

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


yihua merged PR #11420:
URL: https://github.com/apache/hudi/pull/11420





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1372348212


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -1276,5 +1291,35 @@ public HoodieTableMetaClient initTable(Configuration configuration, String baseP
         throws IOException {
       return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
     }
+
+    private void validateMergeConfigs() {
+      boolean payloadClassNameSet = null != payloadClassName;
+      boolean payloadTypeSet = null != payloadType;
+      boolean recordMergerStrategySet = null != recordMergerStrategy;
+      boolean recordMergeModeSet = null != recordMergeMode;
+
+      checkArgument(recordMergeModeSet,
+          "Record merge mode " + HoodieTableConfig.RECORD_MERGE_MODE.key() + " should be set");

Review Comment:
   This is mandatory in the table config, and during table upgrade the merge mode should be inferred from either the payload class name/type or the record merger strategy.
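
The sketch below illustrates the inference this comment asks for. It keeps an explicitly configured merge mode and otherwise derives one from the payload class name. Several parts are assumptions for illustration, not the logic merged in this PR:

- the local `MergeMode` enum;
- the mapping of `OverwriteWithLatestAvroPayload` to overwrite-with-latest and of `DefaultHoodieRecordPayload` to event-time ordering;
- the fallback of every other payload to `CUSTOM`.

Inference from the payload type or the record merger strategy is left out.

```java
class MergeModeInferenceSketch {
  // Hypothetical enum standing in for Hudi's RecordMergeMode.
  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

  static final String OVERWRITE_PAYLOAD = "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload";
  static final String DEFAULT_PAYLOAD = "org.apache.hudi.common.model.DefaultHoodieRecordPayload";

  // Assumed mapping from a (non-null) payload class name to a merge mode.
  static MergeMode inferFromPayloadClass(String payloadClassName) {
    switch (payloadClassName) {
      case OVERWRITE_PAYLOAD:
        return MergeMode.OVERWRITE_WITH_LATEST;
      case DEFAULT_PAYLOAD:
        return MergeMode.EVENT_TIME_ORDERING;
      default:
        // Any other payload is treated as carrying user-defined merge logic.
        return MergeMode.CUSTOM;
    }
  }

  // An explicitly configured mode always wins; inference only fills the gap.
  static MergeMode resolve(MergeMode configured, String payloadClassName) {
    return configured != null ? configured : inferFromPayloadClass(payloadClassName);
  }
}
```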






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632119102


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -1276,5 +1291,35 @@ public HoodieTableMetaClient initTable(Configuration configuration, String baseP
         throws IOException {
       return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
     }
+
+    private void validateMergeConfigs() {
+      boolean payloadClassNameSet = null != payloadClassName;
+      boolean payloadTypeSet = null != payloadType;
+      boolean recordMergerStrategySet = null != recordMergerStrategy;
+      boolean recordMergeModeSet = null != recordMergeMode;
+
+      checkArgument(recordMergeModeSet,
+          "Record merge mode " + HoodieTableConfig.RECORD_MERGE_MODE.key() + " should be set");

Review Comment:
   The PR has been updated, and this is now done in `HoodieTableMetaClient$PropertyBuilder#build`.






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632119003


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -242,6 +249,11 @@ public HoodieFileGroupReaderIterator getClosableIterator() {
     return new HoodieFileGroupReaderIterator<>(this);
   }
 
+  public static RecordMergeMode getRecordMergeMode(Properties props) {
+    String mergeMode = getStringWithAltKeys(props, HoodieCommonConfig.RECORD_MERGE_MODE, true).toUpperCase();

Review Comment:
   Right now, since there are only placeholder upgrade and downgrade methods between table versions 6 and 8, I added the inference of the record merge mode inside `HoodieTableMetaClient$PropertyBuilder#build`.
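
The `getRecordMergeMode` helper quoted in this thread reads the configured mode as a string and upper-cases it. Since it returns a `RecordMergeMode`, it presumably then resolves that string to the enum. The sketch below shows that parsing with plain `java.util.Properties`. The key `hoodie.record.merge.mode`, the `EVENT_TIME_ORDERING` default, and the local `MergeMode` enum are assumptions for illustration, not the exact Hudi config definition.

```java
import java.util.Locale;
import java.util.Properties;

class MergeModeParsingSketch {
  // Hypothetical enum standing in for Hudi's RecordMergeMode.
  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

  // Assumed key and default, for illustration only.
  static final String MERGE_MODE_KEY = "hoodie.record.merge.mode";
  static final MergeMode DEFAULT_MODE = MergeMode.EVENT_TIME_ORDERING;

  static MergeMode getRecordMergeMode(Properties props) {
    String raw = props.getProperty(MERGE_MODE_KEY, DEFAULT_MODE.name());
    // Upper-case with a fixed locale so lower-case input resolves the same on every JVM.
    return MergeMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }
}
```

An unrecognized value makes `Enum.valueOf` throw `IllegalArgumentException`. A typo in the config therefore fails fast instead of silently falling back to the default.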






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632118953


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -1382,5 +1398,35 @@ public HoodieTableMetaClient initTable(StorageConfiguration configuration, St
         throws IOException {
       return HoodieTableMetaClient.initTableAndGetMetaClient(configuration, basePath, build());
     }
+
+    private void validateMergeConfigs() {

Review Comment:
   I invoke this method after inferring the record merge mode in 
`HoodieTableMetaClient$PropertyBuilder#build`.
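
The sequencing described here is to fill in a missing merge mode first and then validate. It can be sketched as a toy builder. `TablePropsBuilderSketch`, its two fields, its simplified inference rule, and the local `checkArgument` helper are all hypothetical simplifications of `HoodieTableMetaClient$PropertyBuilder`. They only show why validation has to run after inference.

```java
class TablePropsBuilderSketch {
  // Hypothetical enum standing in for Hudi's RecordMergeMode.
  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

  private MergeMode recordMergeMode;
  private String payloadClassName;

  TablePropsBuilderSketch setRecordMergeMode(MergeMode mode) {
    this.recordMergeMode = mode;
    return this;
  }

  TablePropsBuilderSketch setPayloadClassName(String name) {
    this.payloadClassName = name;
    return this;
  }

  // Stand-in for a Guava-style checkArgument(condition, message).
  private static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  // Step 1: infer only when no mode was set explicitly (toy rule, not Hudi's).
  private void inferMergeMode() {
    if (recordMergeMode == null && payloadClassName != null) {
      recordMergeMode = payloadClassName.endsWith("OverwriteWithLatestAvroPayload")
          ? MergeMode.OVERWRITE_WITH_LATEST
          : MergeMode.CUSTOM;
    }
  }

  // Step 2: the mandatory check, which now also accepts an inferred mode.
  private void validateMergeConfigs() {
    checkArgument(recordMergeMode != null, "Record merge mode should be set");
  }

  // Returns only the resolved mode to keep the sketch small.
  MergeMode build() {
    inferMergeMode();
    validateMergeConfigs();
    return recordMergeMode;
  }
}
```

If the two steps ran in the opposite order, the check would reject a table that sets only a payload class, even though its merge mode can be derived.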






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156197076

   
   ## CI report:
   
   * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299)
 
   * 79b7c1f744fe13e094f245b38e131c63d801ea1a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24302)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156193662

   
   ## CI report:
   
   * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299)
 
   * 79b7c1f744fe13e094f245b38e131c63d801ea1a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11415:
URL: https://github.com/apache/hudi/pull/11415#issuecomment-2156173131

   
   ## CI report:
   
   * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN
   * 22d1bdc6320ddbd1232bb7d9edaf8162f33e2081 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24301)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11420:
URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156173137

   
   ## CI report:
   
   * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


yihua commented on code in PR #9894:
URL: https://github.com/apache/hudi/pull/9894#discussion_r1632102563


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -242,6 +249,11 @@ public HoodieFileGroupReaderIterator getClosableIterator() {
     return new HoodieFileGroupReaderIterator<>(this);
   }
 
+  public static RecordMergeMode getRecordMergeMode(Properties props) {
+    String mergeMode = getStringWithAltKeys(props, HoodieCommonConfig.RECORD_MERGE_MODE, true).toUpperCase();

Review Comment:
   Sounds good. The record merge mode is required to dictate the merging behavior in release 1.x, playing the same role as the payload class config in release 0.x. During table upgrade, we need to infer the record merge mode from the payload class so that it is set correctly. HUDI-7847 tracks the work.
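
The sketch below makes "dictate the merging behavior" concrete by contrasting two modes for a base record and a log record that share a key. Overwrite-with-latest keeps whichever record was written later. Event-time ordering keeps the one with the larger ordering (precombine) value. The `Rec` type, the `long` ordering field, and the tie-break toward the newer write are assumptions for illustration. Custom mode, which delegates to user code, is left out.

```java
class MergeBehaviorSketch {
  // Hypothetical subset of merge modes for this illustration.
  enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING }

  // Hypothetical record: key, ordering (precombine) value, and payload.
  record Rec(String key, long orderingValue, String value) {}

  // 'older' was written first (e.g. base file), 'newer' later (e.g. log block).
  static Rec merge(MergeMode mode, Rec older, Rec newer) {
    switch (mode) {
      case OVERWRITE_WITH_LATEST:
        // Write order wins; the ordering field is ignored.
        return newer;
      case EVENT_TIME_ORDERING:
        // Larger ordering value wins; ties go to the newer write.
        return newer.orderingValue() >= older.orderingValue() ? newer : older;
      default:
        throw new IllegalArgumentException("Unsupported mode: " + mode);
    }
  }

  public static void main(String[] args) {
    Rec base = new Rec("id-1", 20L, "from-base");
    Rec log = new Rec("id-1", 10L, "late-arriving");
    System.out.println(merge(MergeMode.OVERWRITE_WITH_LATEST, base, log).value()); // late-arriving
    System.out.println(merge(MergeMode.EVENT_TIME_ORDERING, base, log).value());   // from-base
  }
}
```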






[jira] [Updated] (HUDI-7847) Infer record merge mode during table upgrade

2024-06-08 Thread Ethan Guo (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7847:

Description: Record merge mode is required to dictate the merging behavior in release 1.x, playing the same role as the payload class config in release 0.x. During table upgrade, we need to infer the record merge mode based on the payload class so that it is set correctly.

> Infer record merge mode during table upgrade
> 
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>
> Record merge mode is required to dictate the merging behavior in release 1.x, 
> playing the same role as the payload class config in release 0.x. During 
> table upgrade, we need to infer the record merge mode based on the payload 
> class so that it is set correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7847) Infer record merge mode during table upgrade

2024-06-08 Thread Ethan Guo (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7847:

Fix Version/s: 1.0.0

> Infer record merge mode during table upgrade
> 
>
> Key: HUDI-7847
> URL: https://issues.apache.org/jira/browse/HUDI-7847
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 1.0.0
>
>
> Record merge mode is required to dictate the merging behavior in release 1.x, 
> playing the same role as the payload class config in release 0.x. During 
> table upgrade, we need to infer the record merge mode based on the payload 
> class so that it is set correctly.





[jira] [Created] (HUDI-7847) Infer record merge mode during table upgrade

2024-06-08 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7847:
---

 Summary: Infer record merge mode during table upgrade
 Key: HUDI-7847
 URL: https://issues.apache.org/jira/browse/HUDI-7847
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11420:
URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156153418

   
   ## CI report:
   
   * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11415:
URL: https://github.com/apache/hudi/pull/11415#issuecomment-2156153405

   
   ## CI report:
   
   * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN
   * 795b0473b4abca7626de895e81f6750863fa67d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24293)
 
   * 22d1bdc6320ddbd1232bb7d9edaf8162f33e2081 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24301)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11420:
URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156151888

   
   ## CI report:
   
   * 7e43c7ad60b8390e5a6020d72c18378848544f1f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11415:
URL: https://github.com/apache/hudi/pull/11415#issuecomment-2156151877

   
   ## CI report:
   
   * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN
   * 795b0473b4abca7626de895e81f6750863fa67d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24293)
 
   * 22d1bdc6320ddbd1232bb7d9edaf8162f33e2081 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11420:
URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156150066

   
   ## CI report:
   
   * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7846:

Description: 
The following warning is thrown when doing maven parallel build with `mvn -T 1C ...`
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this         *
[WARNING] * project contains the following plugin(s) that have goals not  *
[WARNING] * marked as thread-safe to support parallel execution.          *
[WARNING] * While this /may/ work fine, please look for plugin updates    *
[WARNING] * and/or request plugins be made thread-safe.                   *
[WARNING] * If reporting an issue, report it against the plugin in        *
[WARNING] * question, not against Apache Maven.                           *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}

  was:
The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this         *
[WARNING] * project contains the following plugin(s) that have goals not  *
[WARNING] * marked as thread-safe to support parallel execution.          *
[WARNING] * While this /may/ work fine, please look for plugin updates    *
[WARNING] * and/or request plugins be made thread-safe.                   *
[WARNING] * If reporting an issue, report it against the plugin in        *
[WARNING] * question, not against Apache Maven.                           *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}


> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> The following warning is thrown when doing maven parallel build with `mvn -T 1C ...`
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this         *
> [WARNING] * project contains the following plugin(s) that have goals not  *
> [WARNING] * marked as thread-safe to support parallel execution.          *
> [WARNING] * While this /may/ work fine, please look for plugin updates    *
> [WARNING] * and/or request plugins be made thread-safe.                   *
> [WARNING] * If reporting an issue, report it against the plugin in        *
> [WARNING] * question, not against Apache Maven.                           *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
> [WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


jonvex commented on code in PR #11415:
URL: https://github.com/apache/hudi/pull/11415#discussion_r1632094178


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedFileGroupRecordBuffer.java:
##
@@ -123,46 +142,42 @@ public void processDataBlock(HoodieDataBlock dataBlock, Option keySpecO
 }
   }
 
-  @Override
-  public void processNextDataRecord(T record, Map metadata, Serializable recordPosition) throws IOException {
-Pair, Map> existingRecordMetadataPair = records.get(recordPosition);
-Option>> mergedRecordAndMetadata =
-doProcessNextDataRecord(record, metadata, existingRecordMetadataPair);
-if (mergedRecordAndMetadata.isPresent()) {
-  records.put(recordPosition, Pair.of(
-  Option.ofNullable(readerContext.seal(mergedRecordAndMetadata.get().getLeft())),
-  mergedRecordAndMetadata.get().getRight()));
+  private void fallbackToKeyBasedBuffer() {
+readerContext.setShouldMergeUseRecordPosition(false);
+//need to make a copy of the keys to avoid concurrent modification exception
+ArrayList positions = new ArrayList<>(records.keySet());

Review Comment:
   No, those are positions. The map is record position -> record. After we fall back, it becomes record key -> record.
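The re-keying described in this comment (record position -> record before the fallback, record key -> record after it), together with the diff's note about copying the keys first, can be sketched as follows. This is a simplified, hypothetical illustration rather than Hudi's `fallbackToKeyBasedBuffer`: `FallbackSketch`, `Rec` and `fallbackToKeyBased` are invented names, and the real buffer stores record/metadata pairs instead of a bare record.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FallbackSketch {
  /** Hypothetical stand-in for a buffered record: a record key plus a payload. */
  static final class Rec {
    final String key;
    final String payload;

    Rec(String key, String payload) {
      this.key = key;
      this.payload = payload;
    }
  }

  /**
   * Re-keys the buffer in place, from record position -> record to record key -> record.
   * The key set is copied first: removing and inserting entries while iterating over
   * records.keySet() directly would typically throw a ConcurrentModificationException.
   */
  static void fallbackToKeyBased(Map<Serializable, Rec> records) {
    List<Serializable> positions = new ArrayList<>(records.keySet());
    for (Serializable position : positions) {
      Rec rec = records.remove(position);
      // Simplification: a later record with the same key overwrites an earlier one;
      // a real implementation would need to merge the two records instead.
      records.put(rec.key, rec);
    }
  }

  public static void main(String[] args) {
    Map<Serializable, Rec> records = new HashMap<>();
    records.put(0L, new Rec("id-1", "a"));
    records.put(7L, new Rec("id-2", "b"));
    fallbackToKeyBased(records);
    // Keys are now record keys ("id-1", "id-2") instead of positions; HashMap order is unspecified.
    System.out.println(records.keySet());
  }
}
```

Iterating over a snapshot of the keys keeps the loop independent of the map being mutated, which is the same reason the diff copies `records.keySet()` into an `ArrayList`.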






Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


jonvex commented on code in PR #11415:
URL: https://github.com/apache/hudi/pull/11415#discussion_r1632093807


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieBaseFileGroupRecordBuffer.java:
##
@@ -319,60 +311,6 @@ protected Option merge(Option older, Map olderInfoMap,
 return Option.empty();
   }
 
-  /**

Review Comment:
   Moved these out of the base record buffer. `extractRecordPositions` is specific to the position-based buffer, and `shouldSkip` is only used there as well.






Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156135926

   
   ## CI report:
   
   * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156134125

   
   ## CI report:
   
   * 6ba3bdf0b1736220996fdf21da6a449ae0049b47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24298)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11420:
URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156131606

   
   ## CI report:
   
   * 7e43c7ad60b8390e5a6020d72c18378848544f1f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24300)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156131130

   
   ## CI report:
   
   * 84698763395160c09cac2c1529615a900cbb4625 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296)
 
   * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24299)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11420:
URL: https://github.com/apache/hudi/pull/11420#issuecomment-2156117676

   
   ## CI report:
   
   * 7e43c7ad60b8390e5a6020d72c18378848544f1f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156117209

   
   ## CI report:
   
   * 84698763395160c09cac2c1529615a900cbb4625 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296)
 
   * 54ba973a7ee35d5daf2ccc802aac5ec699cdf4af UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156115266

   
   ## CI report:
   
   * be777f818218f93cddbdf760cc1a93d581f7b9d5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24297)
 
   * 6ba3bdf0b1736220996fdf21da6a449ae0049b47 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24298)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7846:
-
Labels: pull-request-available  (was: )

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this         *
> [WARNING] * project contains the following plugin(s) that have goals not  *
> [WARNING] * marked as thread-safe to support parallel execution.          *
> [WARNING] * While this /may/ work fine, please look for plugin updates    *
> [WARNING] * and/or request plugins be made thread-safe.                   *
> [WARNING] * If reporting an issue, report it against the plugin in        *
> [WARNING] * question, not against Apache Maven.                           *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
> [WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}





[PR] [HUDI-7846] Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build [hudi]

2024-06-08 Thread via GitHub


yihua opened a new pull request, #11420:
URL: https://github.com/apache/hudi/pull/11420

   ### Change Logs
   
   The following warning is printed when running a Maven parallel build with `mvn -T 1C ...`. This PR bumps `apache-rat-plugin` to 0.16.1 to eliminate the thread-safety warning in Maven parallel builds.
   ```
   [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
   [WARNING] *
   [WARNING] * Your build is requesting parallel execution, but this         *
   [WARNING] * project contains the following plugin(s) that have goals not  *
   [WARNING] * marked as thread-safe to support parallel execution.          *
   [WARNING] * While this /may/ work fine, please look for plugin updates    *
   [WARNING] * and/or request plugins be made thread-safe.                   *
   [WARNING] * If reporting an issue, report it against the plugin in        *
   [WARNING] * question, not against Apache Maven.                           *
   [WARNING] *
   [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
   [WARNING]   org.apache.rat:apache-rat-plugin:0.13 
   ```
   ### Impact
   
   Eliminates the build warning.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7846:

Fix Version/s: 0.16.0
   1.0.0

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>






[jira] [Created] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-08 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7846:
---

 Summary: Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe 
warning in maven parallel build
 Key: HUDI-7846
 URL: https://issues.apache.org/jira/browse/HUDI-7846
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








[jira] [Assigned] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7846:
---

Assignee: Ethan Guo

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7846) Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven parallel build

2024-06-08 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7846:

Description: 
The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
{code:java}
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
[WARNING] *
[WARNING] * Your build is requesting parallel execution, but this         *
[WARNING] * project contains the following plugin(s) that have goals not  *
[WARNING] * marked as thread-safe to support parallel execution.          *
[WARNING] * While this /may/ work fine, please look for plugin updates    *
[WARNING] * and/or request plugins be made thread-safe.                   *
[WARNING] * If reporting an issue, report it against the plugin in        *
[WARNING] * question, not against Apache Maven.                           *
[WARNING] *
[WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
[WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}

> Bump apache-rat-plugin to 0.16.1 to eliminate thread-safe warning in maven 
> parallel build
> -
>
> Key: HUDI-7846
> URL: https://issues.apache.org/jira/browse/HUDI-7846
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> The following error is thrown when doing maven parallel build with `mvn -T 1C ...`
> {code:java}
> [WARNING] Enable debug to see precisely which goals are not marked as thread-safe.
> [WARNING] *
> [WARNING] * Your build is requesting parallel execution, but this         *
> [WARNING] * project contains the following plugin(s) that have goals not  *
> [WARNING] * marked as thread-safe to support parallel execution.          *
> [WARNING] * While this /may/ work fine, please look for plugin updates    *
> [WARNING] * and/or request plugins be made thread-safe.                   *
> [WARNING] * If reporting an issue, report it against the plugin in        *
> [WARNING] * question, not against Apache Maven.                           *
> [WARNING] *
> [WARNING] The following plugins are not marked as thread-safe in hudi-hadoop-mr:
> [WARNING]   org.apache.rat:apache-rat-plugin:0.13 {code}





Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156112815

   
   ## CI report:
   
   * be777f818218f93cddbdf760cc1a93d581f7b9d5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24297)
 
   * 6ba3bdf0b1736220996fdf21da6a449ae0049b47 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156099204

   
   ## CI report:
   
   * be777f818218f93cddbdf760cc1a93d581f7b9d5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24297)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7845] Call show_fsview_latest procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11418:
URL: https://github.com/apache/hudi/pull/11418#issuecomment-2156097039

   
   ## CI report:
   
   * be777f818218f93cddbdf760cc1a93d581f7b9d5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]

2024-06-08 Thread via GitHub


beyond1920 opened a new issue, #11419:
URL: https://github.com/apache/hudi/issues/11419

   Dear community,
   One of our users reported that after their daily job, which writes to a Hudi COW table, finished today, the downstream reading jobs found many duplicate records. The daily job has been running in production for a long time, and this is the first time it has produced wrong results like this.
   The user provided a concrete duplicated record as an example to help debug. The record appears in 3 base files that belong to different file groups.
   ![image](https://github.com/apache/hudi/assets/1525333/60b95dc4-91d6-4b40-8bca-c877a4407ae0)
   I checked today's writer job: the Spark application finished successfully. In the driver log, two of those files are marked as invalid files to be deleted, and only one file is valid.
   ![image](https://github.com/apache/hudi/assets/1525333/8e19e170-e38f-4725-82a5-84ed55750db9)
   The clean stage task log also marks those two files for deletion, and the task reports no exception either.
   ![image](https://github.com/apache/hudi/assets/1525333/1a819bd0-2dbe-4236-a0ed-e5f4576cfa38)
   Those two files already existed on HDFS before the clean stage began, but they still existed after the clean stage finished.
   
   The root cause turned out to be a corner case in HDFS: `fs.delete` does not throw any exception; it only returns `false` if HDFS does not delete the file successfully.
   ![image](https://github.com/apache/hudi/assets/1525333/4a1f46d8-0b6b-4089-bed1-7d6a2e72ac28)
   I checked the `fs.delete` API, and this behavior is reasonable.
   ![image](https://github.com/apache/hudi/assets/1525333/20b7e237-18d4-480a-aedc-6c5a57b24062)
   
   I think we should check the return value of `fs.delete` in `HoodieTable#deleteInvalidFilesByPartitions` to avoid wrong results. Besides that, we should check every place that calls `fs.delete`.
   Any suggestions?
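A minimal sketch of the proposed check, assuming a hypothetical `Storage` interface whose `delete` behaves the way the issue describes `fs.delete`: failure is reported through a `false` return value rather than an exception. It is not Hudi's `HoodieTable#deleteInvalidFilesByPartitions`. Since a `false` return can also mean the path was already gone, the sketch only treats a file as undeleted if it still exists afterwards.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CheckedDeleteSketch {
  /** Hypothetical storage abstraction: delete() reports failure by returning false, not by throwing. */
  interface Storage {
    boolean delete(String path) throws IOException;

    boolean exists(String path) throws IOException;
  }

  /** Deletes every invalid file and throws if any of them is still present afterwards. */
  static void deleteInvalidFiles(Storage storage, List<String> invalidFiles) throws IOException {
    List<String> notDeleted = new ArrayList<>();
    for (String file : invalidFiles) {
      // Do not ignore the return value: false means this call did not delete the file.
      if (!storage.delete(file) && storage.exists(file)) {
        notDeleted.add(file);
      }
    }
    if (!notDeleted.isEmpty()) {
      throw new IOException("Failed to delete invalid files: " + notDeleted);
    }
  }

  public static void main(String[] args) {
    Set<String> files = new HashSet<>(List.of("fg1.parquet", "fg2.parquet"));
    // Fake storage whose delete silently fails for one file, mimicking the HDFS corner case.
    Storage flaky = new Storage() {
      @Override
      public boolean delete(String path) {
        return !path.equals("fg2.parquet") && files.remove(path);
      }

      @Override
      public boolean exists(String path) {
        return files.contains(path);
      }
    };
    try {
      deleteInvalidFiles(flaky, List.of("fg1.parquet", "fg2.parquet"));
    } catch (IOException e) {
      System.out.println(e.getMessage()); // Failed to delete invalid files: [fg2.parquet]
    }
  }
}
```

Throwing here makes the write fail instead of committing while stale base files remain visible; whether the real fix should fail, retry, or only log is a separate design decision.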
   





[jira] [Updated] (HUDI-7845) Call show_fsview_latest Procedure support path_regex

2024-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7845:
-
Labels: pull-request-available  (was: )

> Call show_fsview_latest Procedure support path_regex
> 
>
> Key: HUDI-7845
> URL: https://issues.apache.org/jira/browse/HUDI-7845
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Priority: Major
>  Labels: pull-request-available
>






[PR] [HUDI-7845] Call show_fsview_latest Procedure support path_regex [hudi]

2024-06-08 Thread via GitHub


Zouxxyy opened a new pull request, #11418:
URL: https://github.com/apache/hudi/pull/11418

   ### Change Logs
   
   Currently, `show_fsview_all` supports setting `path_regex`, e.g.:
   ```sql
   call show_fsview_all(table => '$tableName', path_regex => 'day=d1/hh=h2')
   call show_fsview_all(table => '$tableName', path_regex => 'day=d1/*/')
   ```
   
   while `show_fsview_latest` only supports setting `partition_path`, e.g.:
   ```sql
   call show_fsview_latest(table => '$tableName', partition_path => 'day=d1/hh=h2')
   ```
   
   This PR makes `show_fsview_latest` support `path_regex` too. In fact, `partition_path` could be replaced entirely by `path_regex`, but we keep it for compatibility with older versions.
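To illustrate how a glob-style filter such as `day=d1/*` selects partition paths, here is a sketch using `java.nio.file.PathMatcher`. It assumes `path_regex` behaves like a glob over partition paths relative to the table base path, as the examples above suggest; Hudi's procedure may resolve the pattern differently (for example through Hadoop's own glob support), and `PathRegexSketch.matchPartitions` is a hypothetical helper, not part of Hudi.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class PathRegexSketch {
  /** Hypothetical helper: keeps the relative partition paths that match a glob pattern. */
  static List<String> matchPartitions(List<String> partitionPaths, String glob) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    return partitionPaths.stream()
        .filter(p -> matcher.matches(Paths.get(p)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> partitions = List.of("day=d1/hh=h1", "day=d1/hh=h2", "day=d2/hh=h1");
    // '*' matches within a single path segment, so this selects every hour under day=d1.
    System.out.println(matchPartitions(partitions, "day=d1/*")); // [day=d1/hh=h1, day=d1/hh=h2]
    // An exact path behaves like the older partition_path argument.
    System.out.println(matchPartitions(partitions, "day=d1/hh=h2")); // [day=d1/hh=h2]
  }
}
```

Unlike the `'day=d1/*/'` example above, the sketch omits the trailing slash, because `java.nio` paths never end with one and the pattern would not match.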
   
   Other fixes:
   
   - change `partition_path` from required to optional
   - fix `call show_fsview_latest` when there are no commits in the timeline
   
   ### Impact
   
   `show_fsview_latest` supports `path_regex`.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   Will update the docs for `call show_fsview_latest`.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7845) Call show_fsview_latest Procedure support path_regex

2024-06-08 Thread Xinyu Zou (Jira)
Xinyu Zou created HUDI-7845:
---

 Summary: Call show_fsview_latest Procedure support path_regex
 Key: HUDI-7845
 URL: https://issues.apache.org/jira/browse/HUDI-7845
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Xinyu Zou








Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156058771

   
   ## CI report:
   
   * 84698763395160c09cac2c1529615a900cbb4625 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[I] [SUPPORT] Multi Writer DeltaStreamer (W1 and W2) Writing into Partition IN and US One of them failing [hudi]

2024-06-08 Thread via GitHub


soumilshah1995 opened a new issue, #11417:
URL: https://github.com/apache/hudi/issues/11417

   When running the Hoodie DeltaStreamer with two writers simultaneously, one 
for the US partition and the other for the IN partition, one of the writers 
fails with a NullPointerException. This issue occurs during the offset fetching 
process from Kafka.
   
   
![image](https://github.com/apache/hudi/assets/39345855/9997c228-ff87-4650-9e9b-55e8bf215ce0)
   
   # Steps
   ### Spin up the stack
   ```yaml
   version: "3"

   services:
     trino-coordinator:
       image: 'trinodb/trino:400'
       hostname: trino-coordinator
       ports:
         - '8080:8080'
       volumes:
         - ./trino/etc:/etc/trino

     metastore_db:
       image: postgres:11
       hostname: metastore_db
       ports:
         - 5432:5432
       environment:
         POSTGRES_USER: hive
         POSTGRES_PASSWORD: hive
         POSTGRES_DB: metastore

     hive-metastore:
       hostname: hive-metastore
       image: 'starburstdata/hive:3.1.2-e.18'
       ports:
         - '9083:9083' # Metastore Thrift
       environment:
         HIVE_METASTORE_DRIVER: org.postgresql.Driver
         HIVE_METASTORE_JDBC_URL: jdbc:postgresql://metastore_db:5432/metastore
         HIVE_METASTORE_USER: hive
         HIVE_METASTORE_PASSWORD: hive
         HIVE_METASTORE_WAREHOUSE_DIR: s3://datalake/
         S3_ENDPOINT: http://minio:9000
         S3_ACCESS_KEY: admin
         S3_SECRET_KEY: password
         S3_PATH_STYLE_ACCESS: "true"
         REGION: ""
         GOOGLE_CLOUD_KEY_FILE_PATH: ""
         AZURE_ADL_CLIENT_ID: ""
         AZURE_ADL_CREDENTIAL: ""
         AZURE_ADL_REFRESH_URL: ""
         AZURE_ABFS_STORAGE_ACCOUNT: ""
         AZURE_ABFS_ACCESS_KEY: ""
         AZURE_WASB_STORAGE_ACCOUNT: ""
         AZURE_ABFS_OAUTH: ""
         AZURE_ABFS_OAUTH_TOKEN_PROVIDER: ""
         AZURE_ABFS_OAUTH_CLIENT_ID: ""
         AZURE_ABFS_OAUTH_SECRET: ""
         AZURE_ABFS_OAUTH_ENDPOINT: ""
         AZURE_WASB_ACCESS_KEY: ""
         HIVE_METASTORE_USERS_IN_ADMIN_ROLE: "admin"
       depends_on:
         - metastore_db
       healthcheck:
         test: bash -c "exec 6<> /dev/tcp/localhost/9083"

     fast-data-dev:
       image: dougdonohoe/fast-data-dev
       ports:
         - "3181:3181"
         - "3040:3040"
         - "7081:7081"
         - "7082:7082"
         - "7083:7083"
         - "7092:7092"
         - "8081:8081"
       environment:
         - ZK_PORT=3181
         - WEB_PORT=3040
         - REGISTRY_PORT=8081
         - REST_PORT=7082
         - CONNECT_PORT=7083
         - BROKER_PORT=7092
         - ADV_HOST=127.0.0.1

   volumes:
     hive-metastore-postgresql:

   networks:
     default:
       name: hudi
   ```
   
   # Publish some data
   ```python
   from faker import Faker
   from time import sleep
   import random
   import uuid
   from datetime import datetime
   from kafka_schema_registry import prepare_producer

   # Configuration
   KAFKA_BOOTSTRAP_SERVERS = ['localhost:7092']
   SCHEMA_REGISTRY_URL = 'http://localhost:8081'
   NUM_MESSAGES = 20
   SLEEP_INTERVAL = 1
   TOPIC_NAME = 'orders'
   NUM_PARTITIONS = 1
   REPLICATION_FACTOR = 1

   # Avro Schema
   SAMPLE_SCHEMA = {
       "type": "record",
       "name": "Order",
       "fields": [
           {"name": "order_id", "type": "string"},
           {"name": "name", "type": "string"},
           {"name": "order_value", "type": "string"},
           {"name": "priority", "type": "string"},
           {"name": "order_date", "type": "string"},
           {"name": "customer_id", "type": "string"},
           {"name": "ts", "type": "string"},
           {"name": "country", "type": "string"}
       ]
   }

   # Kafka Producer
   producer = prepare_producer(
       KAFKA_BOOTSTRAP_SERVERS,
       SCHEMA_REGISTRY_URL,
       TOPIC_NAME,
       NUM_PARTITIONS,
       REPLICATION_FACTOR,
       value_schema=SAMPLE_SCHEMA
   )

   # Faker instance
   faker = Faker()


   class DataGenerator:
       @staticmethod
       def get_orders_data():
           """
           Generate and return a dictionary with mock order data.
           """
           country = random.choice(['US', 'IN'])  # Define country variable

           return {
               "order_id": str(uuid.uuid4()),
               "name": faker.text(max_nb_chars=20),
               "order_value": str(random.randint(10, 1000)),
               "priority": random.choice(["LOW", "MEDIUM", "HIGH"]),
               "order_date": faker.date_between(start_date='-30d', end_date='today').strftime('%Y-%m-%d'),
               "customer_id": str(uuid.uuid4()),
               "ts": str(datetime.now().timestamp()),
               "country": country
           }

       @staticmethod
       def produce_avro_message(producer, data):
           """
           Produce an Avro message and send it to the appropriate Kafka topic based on the country.
           """
           topic = 'orders_in'
   ```

Re: [I] [SUPPORT] NoClassDefFoundError for org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile [hudi]

2024-06-08 Thread via GitHub


michael1991 commented on issue #8507:
URL: https://github.com/apache/hudi/issues/8507#issuecomment-2156047802

   I used Hudi 0.14.1 on Dataproc 2.1 (Spark 3.3.2, Hadoop 3.3.6) to upsert a COW table 
with PartialUpdateAvroPayload and sometimes got the same warning messages, but the job 
eventually succeeded. I'm not sure whether some jars are missing. How can I avoid this 
warning?





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156041218

   
   ## CI report:
   
   * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24291)
 
   * 84698763395160c09cac2c1529615a900cbb4625 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24296)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6798] Add record merging mode and implement event-time ordering in the new file group reader [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #9894:
URL: https://github.com/apache/hudi/pull/9894#issuecomment-2156038868

   
   ## CI report:
   
   * a70cfc6db41a781bb3b6c9c8a9138892f7a12687 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24291)
 
   * 84698763395160c09cac2c1529615a900cbb4625 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Closed] (HUDI-7844) Fix HoodieSparkSqlTestBase to throw error upon test failure

2024-06-08 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7844.
-
Resolution: Fixed

> Fix HoodieSparkSqlTestBase to throw error upon test failure
> ---
>
> Key: HUDI-7844
> URL: https://issues.apache.org/jira/browse/HUDI-7844
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: Screenshot 2024-06-07 at 22.27.21.png
>
>
> This PR ([https://github.com/apache/hudi/pull/11162]) introduces the 
> following changes, which make `HoodieSparkSqlTestBase` swallow test failures.
>  
> !Screenshot 2024-06-07 at 22.27.21.png|width=873,height=397!
>  
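> The screenshot is not reproduced here, so the following is only a minimal Python analogue of the general failure mode in the issue title, not the actual `HoodieSparkSqlTestBase` Scala code; the function names are hypothetical. A wrapper that catches and logs an exception hides the failure from the test runner, while one that does its cleanup in `finally` and lets the exception propagate still reports it.

```python
import logging

logger = logging.getLogger("test-harness")


def run_test_swallowing(body, cleanup):
    """Anti-pattern: the failure is logged and then discarded."""
    try:
        body()
    except Exception as exc:
        logger.error("test failed: %s", exc)  # the test runner never sees this
    finally:
        cleanup()


def run_test_propagating(body, cleanup):
    """Cleanup always runs, and any failure still reaches the test runner."""
    try:
        body()
    finally:
        cleanup()  # e.g. stop the Spark session even if body() raised


def failing_body():
    raise AssertionError("expected 2 rows but got 3")


if __name__ == "__main__":
    run_test_swallowing(failing_body, lambda: print("cleanup done"))
    print("swallowing wrapper returned normally")
    try:
        run_test_propagating(failing_body, lambda: print("cleanup done"))
    except AssertionError as err:
        print("propagating wrapper reported:", err)
```

> Both wrappers run the cleanup step; only the second one lets the runner mark the test as failed.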





(hudi) branch master updated (a33b2a5e03f -> a31fda59555)

2024-06-08 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from a33b2a5e03f [HUDI-7834] Create placeholder table versions and 
introduce new hoodie table property to track initial table version (#11406)
 add a31fda59555 [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon 
test failure (#11416)

No new revisions were added by this update.

Summary of changes:
 .../sql/hudi/command/index}/TestFunctionalIndex.scala| 14 +++---
 .../spark/sql/hudi/common/HoodieSparkSqlTestBase.scala   | 16 
 2 files changed, 11 insertions(+), 19 deletions(-)
 rename hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/{hudi/functional => spark/sql/hudi/command/index}/TestFunctionalIndex.scala (98%)



Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


codope merged PR #11416:
URL: https://github.com/apache/hudi/pull/11416





Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11416:
URL: https://github.com/apache/hudi/pull/11416#issuecomment-2156018029

   
   ## CI report:
   
   * 8235d366753038068a1145757fe539e8db4298e3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24294)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11416:
URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155996582

   
   ## CI report:
   
   * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292)
 
   * 8235d366753038068a1145757fe539e8db4298e3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24294)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7384] Secondary index support [hudi]

2024-06-08 Thread via GitHub


skyshineb commented on PR #10625:
URL: https://github.com/apache/hudi/pull/10625#issuecomment-2155988087

   Hi @bhat-vinay! Is this design of a secondary index through the MDT the only one 
planned, or are there plans for other index types? As I remember, there was an RFC 
for a Lucene index, and maybe other types will follow in the future?





Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11416:
URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155983361

   
   ## CI report:
   
   * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292)
 
   * 8235d366753038068a1145757fe539e8db4298e3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


codope commented on code in PR #11416:
URL: https://github.com/apache/hudi/pull/11416#discussion_r1632020201


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/command/index/TestFunctionalIndex.scala:
##
@@ -35,9 +35,9 @@ import org.apache.spark.sql.catalyst.parser.ParserInterface
 import org.apache.spark.sql.hudi.command.{CreateIndexCommand, ShowIndexesCommand}
 import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
 import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
-import org.junit.jupiter.api.Tag
+import org.scalatest.Ignore
 
-@Tag("functional")
+@Ignore

Review Comment:
   Note: moved this class out of the functional package and disabled it temporarily. 
More details in HUDI-7835. TL;DR: we need to check why this test is attempted twice.






[jira] [Commented] (HUDI-7835) Spark context not stopped properly if the test had some exception that was ignored

2024-06-08 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853358#comment-17853358
 ] 

Sagar Sumit commented on HUDI-7835:
---

I observed one more thing: somehow TestFunctionalIndex is attempted again 
after TestSecondaryIndex. To verify, I disabled TestFunctionalIndex and 
confirmed from the logs that it is attempted twice (though it does not run because 
it is disabled). We need to find the root cause of why TestFunctionalIndex is being 
attempted twice. Note that the test always succeeds from the IDE.

To reproduce, run the following Maven command from the Hudi repo root and redirect 
the logs to a file, where you can grep for any TestFunctionalIndex test name, e.g. 
`Test Create Functional Index`:
{code:java}
mvn test -Pwarn-log -Dscala-2.12 -Dspark3.2 -Dflink1.18 -Dcheckstyle.skip=true 
-Drat.skip=true -Djacoco.skip=true -ntp -B -V -Pwarn-log 
-Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.shade=warn 
-Dorg.slf4j.simpleLogger.log.org.apache.maven.plugins.dependency=warn 
-Punit-tests -Dtest=skipJavaTests -DfailIfNoTests=false 
-DwildcardSuites=org.apache.hudi,org.apache.spark.hudi,org.apache.spark.sql.avro,org.apache.spark.sql.execution,org.apache.spark.sql.hudi.analysis,org.apache.spark.sql.hudi.command,org.apache.spark.sql.hudi.common,org.apache.spark.sql.hudi.dml
 -pl 
hudi-spark-datasource,hudi-spark-datasource/hudi-spark,hudi-spark-datasource/hudi-spark3.2.x,hudi-spark-datasource/hudi-spark3.2plus-common,hudi-spark-datasource/hudi-spark3-common,hudi-spark-datasource/hudi-spark-common
 {code}
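As a complement to grepping, a small Python helper like the one below can count how many log lines mention each test name and flag the names that appear more than once. It is a sketch under assumptions: the log is plain text, and the helper names are hypothetical. A single attempt may also log its name more than once, so the counts are only a hint to check against the log itself.

```python
from typing import Dict, Iterable, List


def count_test_mentions(log_lines: Iterable[str], test_names: Iterable[str]) -> Dict[str, int]:
    """Count the log lines that mention each test name (plain substring match)."""
    names = list(test_names)
    counts = {name: 0 for name in names}
    for line in log_lines:
        for name in names:
            if name in line:
                counts[name] += 1
    return counts


def repeated_tests(counts: Dict[str, int]) -> List[str]:
    """Return the test names that were mentioned more than once."""
    return [name for name, n in counts.items() if n > 1]
```

Since iterating over an open text file yields its lines, the file object of the redirected Maven log can be passed directly as `log_lines`, together with names such as `Test Create Functional Index`.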

> Spark context not stopped properly if the test had some exception that was 
> ignored
> --
>
> Key: HUDI-7835
> URL: https://issues.apache.org/jira/browse/HUDI-7835
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Sagar Sumit
>Priority: Major
> Fix For: 1.0.0
>
>
> Found two tests that don't fail but throw an exception while running. The 
> tests succeed, but the Spark context is not stopped in time because of the exception. 
> This causes issues for other tests. For example, `test("bucket index query")` 
> in `TestDataSkippingQuery` succeeds, but when we check the 
> [logs|https://github.com/apache/hudi/actions/runs/9391927778/job/25865161535#step:6:5799]
>  of the test, we find the error below:
> {code:java}
> 74954 [ScalaTest-run-running-TestDataSkippingQuery] ERROR org.apache.hudi.HoodieFileIndex [] - Failed to lookup candidate files in File Index
> java.lang.IllegalArgumentException: Property _hoodie.record.key.gen.partition.id not found
>     at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:69) ~[classes/:?]
>     at org.apache.hudi.common.config.TypedProperties.getInteger(TypedProperties.java:94) ~[classes/:?]
>     at org.apache.hudi.keygen.AutoRecordGenWrapperKeyGenerator.generateSequenceId(AutoRecordGenWrapperKeyGenerator.java:115) ~[classes/:?]
>     at org.apache.hudi.keygen.AutoRecordGenWrapperKeyGenerator.getRecordKey(AutoRecordGenWrapperKeyGenerator.java:67) ~[classes/:?]
>     at org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:70) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.getBucketNumber$1(BucketIndexSupport.scala:154) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.getBucketSetFromValue$1(BucketIndexSupport.scala:168) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.getBucketsBySingleHashFields(BucketIndexSupport.scala:174) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.filterQueriesWithBucketHashField(BucketIndexSupport.scala:107) ~[classes/:?]
>     at org.apache.hudi.BucketIndexSupport.computeCandidateFileNames(BucketIndexSupport.scala:78) ~[classes/:?]
>     at org.apache.hudi.HoodieFileIndex.$anonfun$lookupCandidateFilesInMetadataTable$3(HoodieFileIndex.scala:354) ~[classes/:?]
>     at org.apache.hudi.HoodieFileIndex.$anonfun$lookupCandidateFilesInMetadataTable$3$adapted(HoodieFileIndex.scala:351) ~[classes/:?]
>     at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877) ~[scala-library-2.12.10.jar:?]
>     at scala.collection.immutable.List.foreach(List.scala:392) ~[scala-library-2.12.10.jar:?]
>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876) ~[scala-library-2.12.10.jar:?]
>     at org.apache.hudi.HoodieFileIndex.$anonfun$lookupCandidateFilesInMetadataTable$1(HoodieFileIndex.scala:351) ~[classes/:?]
>     at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.10.jar:?]
>     at org.apache.hudi.HoodieFileIndex.lookupCandidateFilesInMetadataTable(HoodieFileIndex.scala:338) ~[classes/:?]
>     at 
> 

Re: [I] RLI Spark Hudi Error occurs when executing map [hudi]

2024-06-08 Thread via GitHub


michael1991 commented on issue #10609:
URL: https://github.com/apache/hudi/issues/10609#issuecomment-2155893143

   Hi @ad1happy2go, I can reproduce this error with the following environment and Scala 
code; I hope it helps.
   Environment: Dataproc 2.1 (Spark 3.3.2) with Hudi 0.14.x / Dataproc 2.2 (Spark 
3.5.0) with Hudi 0.15.x
   Scala code in `spark-shell`:
   ```scala
   import org.apache.spark.sql._
   import spark.implicits._
   val path = "gs://bucket/test/hudi-test"
   val data = Seq((1, "Alice", 29), (2, "Bob", 35), (3, "Catherine", 23)).toDF("id", "name", "age")
   val config = Map(
     "hoodie.table.name" -> "hudi-test",
     "hoodie.metadata.enable" -> "true",
     "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE",
     "hoodie.datasource.write.payload.class" -> "org.apache.hudi.common.model.PartialUpdateAvroPayload",
     "hoodie.datasource.write.recordkey.field" -> "id",
     "hoodie.datasource.write.reconcile.schema" -> "true",
     "hoodie.datasource.write.new.columns.nullable" -> "true",
     "hoodie.combine.before.insert" -> "false",
     "hoodie.combine.before.upsert" -> "false",
     "hoodie.cleaner.commits.retained" -> "6",
     "hoodie.parquet.compression.codec" -> "snappy"
   )

   data.write.format("hudi").options(config).option("hoodie.datasource.write.operation", "insert").mode(SaveMode.Append).save(path)
   spark.read.format("hudi").option("hoodie.metadata.enable", "true").load(path).show(10, false)

   val upsertData = Seq((1, 30)).toDF("id", "age")
   upsertData.write.format("hudi").options(config).option("hoodie.datasource.write.operation", "upsert").mode(SaveMode.Append).save(path)
   ```
   
   Please try it if possible and let me know whether you can reproduce this error. Thanks in 
advance!
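   To make the expected outcome of this repro concrete, here is a simplified Python model of a partial update. It is not Hudi code. It assumes that `PartialUpdateAvroPayload` keeps the stored value for any field that is null in the incoming record, and that `hoodie.datasource.write.reconcile.schema` fills the missing `name` column of `upsertData` with null. It also ignores ordering/precombine handling. Under those assumptions, the upsert should leave `(1, "Alice", 30)` in the table.

```python
from typing import Any, Dict, Optional


def partial_update(stored: Dict[str, Any], incoming: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a new row: non-null incoming fields win, null ones keep the stored value."""
    merged = dict(stored)  # copy so the stored row is not mutated
    for field, value in incoming.items():
        if value is not None:
            merged[field] = value
    return merged


stored_row = {"id": 1, "name": "Alice", "age": 29}
# upsertData only has (id, age); after schema reconciliation `name` is assumed to be null
incoming_row = {"id": 1, "name": None, "age": 30}
print(partial_update(stored_row, incoming_row))
# prints {'id': 1, 'name': 'Alice', 'age': 30}
```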





Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11416:
URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155855408

   
   ## CI report:
   
   * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11415:
URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155855404

   
   ## CI report:
   
   * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN
   * 795b0473b4abca7626de895e81f6750863fa67d3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24293)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11415:
URL: https://github.com/apache/hudi/pull/11415#issuecomment-2155853054

   
   ## CI report:
   
   * 644a1d216307d8660ff7654c5273f2356974bcb8 UNKNOWN
   * bfea0d3a2dd9e6ba2d96c1d7d20a07e085883da6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24278)
 
   * 795b0473b4abca7626de895e81f6750863fa67d3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7269] Fallback to key based merge if positions are missing from log block [hudi]

2024-06-08 Thread via GitHub


codope commented on code in PR #11415:
URL: https://github.com/apache/hudi/pull/11415#discussion_r1631913348


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPositionBasedMergingFallback.scala:
##
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hadoop.fs.FileSystem
+import org.apache.hudi.DataSourceWriteOptions
+import org.apache.hudi.DataSourceWriteOptions.{OPERATION, PRECOMBINE_FIELD, RECORDKEY_FIELD, TABLE_TYPE}
+import org.apache.hudi.HoodieConversionUtils.toJavaOption
+import org.apache.hudi.common.config.{HoodieReaderConfig, HoodieStorageConfig}
+import org.apache.hudi.common.model.HoodieRecordMerger
+import org.apache.hudi.common.util
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.HoodieSparkClientTestBase
+import org.apache.hudi.util.JFunction
+import org.apache.spark.sql.SaveMode.{Append, Overwrite}
+import org.apache.spark.sql.SparkSessionExtensions
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.apache.spark.sql.internal.SQLConf
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.{Arguments, MethodSource}
+
+import java.util.function.Consumer
+
+class TestPositionBasedMergingFallback extends HoodieSparkClientTestBase {
+  override def getSparkSessionExtensionsInjector: util.Option[Consumer[SparkSessionExtensions]] =
+    toJavaOption(
+      Some(
+        JFunction.toJavaConsumer((receiver: SparkSessionExtensions) => new HoodieSparkSessionExtension().apply(receiver)))
+    )
+
+  @BeforeEach override def setUp(): Unit = {
+    initPath()
+    initSparkContexts()
+    sparkSession.conf.set(SQLConf.PARQUET_RECORD_FILTER_ENABLED.key, "true")
+    initTestDataGenerator()
+    initHoodieStorage()
+  }
+
+  @AfterEach override def tearDown(): Unit = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+    FileSystem.closeAll()
+    System.gc()
Review Comment:
   let's avoid System.gc()



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedFileGroupRecordBuffer.java:
##
@@ -174,20 +189,97 @@ public boolean containsLogRecord(String recordKey) {
   }
 
   @Override
-  protected boolean doHasNext() throws IOException {
-    ValidationUtils.checkState(baseFileIterator != null, "Base file iterator has not been set yet");
-
-    // Handle merging.
-    while (baseFileIterator.hasNext()) {
-      T baseRecord = baseFileIterator.next();
-      nextRecordPosition = readerContext.extractRecordPosition(baseRecord, readerSchema, ROW_INDEX_TEMPORARY_COLUMN_NAME, nextRecordPosition);
-      Pair<Option<T>, Map<String, Object>> logRecordInfo = records.remove(nextRecordPosition++);
-      if (hasNextBaseRecord(baseRecord, logRecordInfo)) {
-        return true;
+  protected boolean hasNextBaseRecord(T baseRecord) throws IOException {
+    if (!readerContext.getShouldMergeUseRecordPosition()) {
+      return doHasNextFallbackBaseRecord(baseRecord);
+    }
+
+    nextRecordPosition = readerContext.extractRecordPosition(baseRecord, readerSchema,
+        ROW_INDEX_COLUMN_NAME, nextRecordPosition);
+    Pair<Option<T>, Map<String, Object>> logRecordInfo = records.remove(nextRecordPosition++);
+
+    Map<String, Object> metadata = readerContext.generateMetadataForRecord(
+        baseRecord, readerSchema);
+
+    Option<T> resultRecord = logRecordInfo != null
+        ? merge(Option.of(baseRecord), metadata, logRecordInfo.getLeft(), logRecordInfo.getRight())
+        : merge(Option.empty(), Collections.emptyMap(), Option.of(baseRecord), metadata);
+    if (resultRecord.isPresent()) {
+      nextRecord = readerContext.seal(resultRecord.get());
+      return true;
+    }
+    return false;
+  }
+
+  private boolean doHasNextFallbackBaseRecord(T baseRecord) throws IOException {
+    if (needToDoHybridStrategy) {

Review Comment:
   let's test this logic as well.
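   The fallback described in the PR title can be pictured with a toy Python model. Log records that carry a row position are matched to base-file rows by position; if any log record lacks a position, the merge falls back to matching on the record key. This is a simplified sketch with hypothetical names, not the logic of `HoodiePositionBasedFileGroupRecordBuffer`. It ignores ordering fields, record mergers and the hybrid strategy, and it treats a matched log record as a plain overwrite or delete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Record:
    key: str
    value: str
    position: Optional[int] = None  # row index in the base file, if the log block carries it
    is_delete: bool = False


def merge_file_group(base: List[Record], log: List[Record]) -> List[Record]:
    # Use positions only when every log record has one; otherwise fall back to keys.
    by_position = all(r.position is not None for r in log)
    # Later log records for the same position/key replace earlier ones.
    pending: Dict[Union[int, str], Record] = {
        (r.position if by_position else r.key): r for r in log
    }
    merged: List[Record] = []
    for pos, base_rec in enumerate(base):
        update = pending.pop(pos if by_position else base_rec.key, None)
        if update is None:
            merged.append(base_rec)  # no log record for this row: keep the base version
        elif not update.is_delete:
            merged.append(update)  # log record overwrites the base row
        # a matched delete simply drops the row
    # Log records that matched no base row are emitted as inserts.
    merged.extend(r for r in pending.values() if not r.is_delete)
    return merged
```

   For example, `[Record("b", "20", position=1)]` updates the second base row by position, whereas adding a record without a position, such as `Record("a", "10")`, switches the whole merge to key-based matching.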



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodiePositionBasedFileGroupRecordBuffer.java:
##
@@ -123,46 

Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11416:
URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155836211

   
   ## CI report:
   
   * 70f8a79214fcd176dd275faf56bb789d4db60f7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24292)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7844] Fix HoodieSparkSqlTestBase to throw error upon test failure [hudi]

2024-06-08 Thread via GitHub


hudi-bot commented on PR #11416:
URL: https://github.com/apache/hudi/pull/11416#issuecomment-2155833908

   
   ## CI report:
   
   * 70f8a79214fcd176dd275faf56bb789d4db60f7f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   

