Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105577307

   
   ## CI report:
   
   * 81806555cd6c82297f2ff34b81466e653b483a61 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23850)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11070:
URL: https://github.com/apache/hudi/pull/11070#issuecomment-2105577215

   
   ## CI report:
   
   * 4e15df959494651d74e92f6a998d3310bfc91247 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23790)
   * 86265f6be7c6fbacf53ed76a7b60b2b64d484409 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23851)
   
   



Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11070:
URL: https://github.com/apache/hudi/pull/11070#issuecomment-2105574435

   
   ## CI report:
   
   * 4e15df959494651d74e92f6a998d3310bfc91247 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23790)
   * 86265f6be7c6fbacf53ed76a7b60b2b64d484409 UNKNOWN
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105571932

   
   ## CI report:
   
   * 3cba812f7db9eabdec2472351f74c91e14ee3767 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23848)
   * 81806555cd6c82297f2ff34b81466e653b483a61 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23850)
   
   



Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2105571920

   
   ## CI report:
   
   * 975a7d92617080bb4c32e832796e8d13cd8d9857 UNKNOWN
   * 76ee9ca6a701a2fcaa70fce9aae46864486c8c45 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23849)
   
   



Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11070:
URL: https://github.com/apache/hudi/pull/11070#discussion_r1597357603


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/ProtoKafkaSource.java:
##
@@ -63,11 +67,18 @@ public ProtoKafkaSource(TypedProperties props, JavaSparkContext sparkContext, Sp
   public ProtoKafkaSource(TypedProperties properties, JavaSparkContext sparkContext, SparkSession sparkSession, HoodieIngestionMetrics metrics, StreamContext streamContext) {
     super(properties, sparkContext, sparkSession, SourceType.PROTO, metrics,
         new DefaultStreamContext(UtilHelpers.getSchemaProviderForKafkaSource(streamContext.getSchemaProvider(), properties, sparkContext), streamContext.getSourceProfileSupplier()));
-    checkRequiredConfigProperties(props, Collections.singletonList(
-        ProtoClassBasedSchemaProviderConfig.PROTO_SCHEMA_CLASS_NAME));
-    props.put(NATIVE_KAFKA_KEY_DESERIALIZER_PROP, StringDeserializer.class);
-    props.put(NATIVE_KAFKA_VALUE_DESERIALIZER_PROP, ByteArrayDeserializer.class);
-    className = getStringWithAltKeys(props, ProtoClassBasedSchemaProviderConfig.PROTO_SCHEMA_CLASS_NAME);
+    this.deserializerName = ConfigUtils.getStringWithAltKeys(props, KafkaSourceConfig.KAFKA_PROTO_VALUE_DESERIALIZER_CLASS, true);
+    if (!deserializerName.equals(ByteArrayDeserializer.class.getName()) && !deserializerName.equals(KafkaProtobufDeserializer.class.getName())) {
+      throw new HoodieReadFromSourceException("Only ByteArrayDeserializer and KafkaProtobufDeserializer are supported for ProtoKafkaSource");
+    }
+    if (deserializerName.equals(ByteArrayDeserializer.class.getName())) {
+      checkRequiredConfigProperties(props, Collections.singletonList(ProtoClassBasedSchemaProviderConfig.PROTO_SCHEMA_CLASS_NAME));
+      className = getStringWithAltKeys(props, ProtoClassBasedSchemaProviderConfig.PROTO_SCHEMA_CLASS_NAME);
+    } else {
+      className = null;

Review Comment:
   Avoid using `null`
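
One possible way to act on this comment, as a minimal sketch and not the PR's actual change: return an `Optional` instead of assigning `null` when no proto class name applies. The class `DeserializerConfigSketch`, the method `resolveProtoClassName`, and the property key `proto.schema.class.name` are hypothetical names for illustration; the two deserializer class names are plain strings so the sketch compiles without the Kafka or Confluent jars. Hudi has its own `Option` type, which would presumably be used in place of `java.util.Optional` in the real code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

final class DeserializerConfigSketch {
  // Fully qualified names of the two deserializers the diff compares against.
  static final String BYTE_ARRAY = "org.apache.kafka.common.serialization.ByteArrayDeserializer";
  static final String KAFKA_PROTOBUF = "io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer";
  // Hypothetical property key, used only in this sketch.
  static final String PROTO_CLASS_KEY = "proto.schema.class.name";

  /** Returns the proto class name only when the byte-array deserializer is configured. */
  static Optional<String> resolveProtoClassName(String deserializerName, Map<String, String> props) {
    if (!BYTE_ARRAY.equals(deserializerName) && !KAFKA_PROTOBUF.equals(deserializerName)) {
      throw new IllegalArgumentException(
          "Only ByteArrayDeserializer and KafkaProtobufDeserializer are supported");
    }
    if (KAFKA_PROTOBUF.equals(deserializerName)) {
      // No class name is needed in this branch, so return an empty value instead of null.
      return Optional.empty();
    }
    String className = props.get(PROTO_CLASS_KEY);
    if (className == null) {
      throw new IllegalArgumentException("Missing required property " + PROTO_CLASS_KEY);
    }
    return Optional.of(className);
  }

  public static void main(String[] args) {
    Map<String, String> props = Collections.singletonMap(PROTO_CLASS_KEY, "com.example.Sample");
    System.out.println(resolveProtoClassName(BYTE_ARRAY, props));
    System.out.println(resolveProtoClassName(KAFKA_PROTOBUF, Collections.emptyMap()));
  }
}
```

Callers then have to handle the empty case explicitly (for example with `ifPresent` or `orElseThrow`) rather than risk a `NullPointerException` later.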



##
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestProtoKafkaSource.java:
##
@@ -64,21 +69,24 @@
 import java.util.stream.IntStream;
 
 import static org.apache.hudi.common.util.StringUtils.getUTF8Bytes;
+import static org.apache.hudi.utilities.config.KafkaSourceConfig.KAFKA_PROTO_VALUE_DESERIALIZER_CLASS;
 import static org.junit.jupiter.api.Assertions.assertEquals;
 
 /**
  * Tests against {@link ProtoKafkaSource}.
  */
 public class TestProtoKafkaSource extends BaseTestKafkaSource {
+  private static final JsonFormat.Printer PRINTER = JsonFormat.printer().omittingInsignificantWhitespace();
   private static final Random RANDOM = new Random();
+  private static final String MOCK_REGISTRY_URL = "mock://127.0.0.1:8081";
 
   protected TypedProperties createPropsForKafkaSource(String topic, Long maxEventsToReadFromKafkaSource, String resetStrategy) {
     TypedProperties props = new TypedProperties();
-    props.setProperty("hoodie.streamer.source.kafka.topic", topic);
+    props.setProperty("hoodie.deltastreamer.source.kafka.topic", topic);

Review Comment:
   nit: should not be changed.



##
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestProtoKafkaSource.java:
##
@@ -158,7 +187,7 @@ private static List createSampleMessages(int count) {
           .setPrimitiveFixedSignedLong(RANDOM.nextLong())
           .setPrimitiveBoolean(RANDOM.nextBoolean())
           .setPrimitiveString(UUID.randomUUID().toString())
-          .setPrimitiveBytes(ByteString.copyFrom(getUTF8Bytes(UUID.randomUUID().toString())));
+          .setPrimitiveBytes(ByteString.copyFrom(UUID.randomUUID().toString().getBytes()));

Review Comment:
   similar here
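
The reviewer does not say why this line should stay as it was. One plausible reason, stated here as an assumption, is that a no-argument `getBytes()` encodes with the JVM's default charset, while a helper such as `getUTF8Bytes` pins the encoding. A minimal sketch of the difference, using a hypothetical local helper `utf8Bytes` rather than Hudi's own utility:

```java
import java.nio.charset.StandardCharsets;

final class Utf8BytesSketch {
  /** Hypothetical helper: always encodes as UTF-8, independent of the JVM default charset. */
  static byte[] utf8Bytes(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String text = "caf\u00e9"; // contains one non-ASCII character
    // Explicit charset: the byte count is the same on every platform.
    System.out.println(utf8Bytes(text).length);
    // Default charset: the byte count can differ depending on JVM settings.
    System.out.println(text.getBytes().length);
  }
}
```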






Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11070:
URL: https://github.com/apache/hudi/pull/11070#discussion_r1597356380


##
packaging/hudi-utilities-bundle/pom.xml:
##
@@ -133,6 +133,7 @@
   io.confluent:common-config
   io.confluent:common-utils
   io.confluent:kafka-schema-registry-client
+  io.confluent:kafka-protobuf-serializer

Review Comment:
   I think if we're not using it in the integ tests, we should not add it.






Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105545276

   
   ## CI report:
   
   * 5739605de6bb73d0e3982a335e243fdb356a6031 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23844)
   * 3cba812f7db9eabdec2472351f74c91e14ee3767 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23848)
   * 81806555cd6c82297f2ff34b81466e653b483a61 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23850)
   
   



Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2105545250

   
   ## CI report:
   
   * 975a7d92617080bb4c32e832796e8d13cd8d9857 UNKNOWN
   * 51a199199691df091162a3d8cb71f9ee448b079a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23829)
   * 76ee9ca6a701a2fcaa70fce9aae46864486c8c45 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23849)
   
   



Re: [PR] [HUDI-7745] Move Hadoop-dependent util methods to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11193:
URL: https://github.com/apache/hudi/pull/11193#issuecomment-2105540676

   
   ## CI report:
   
   * bbbc714d35283e5e743883ae945cfec50f99b226 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23846)
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105540664

   
   ## CI report:
   
   * 5739605de6bb73d0e3982a335e243fdb356a6031 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23844)
   * 3cba812f7db9eabdec2472351f74c91e14ee3767 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23848)
   * 81806555cd6c82297f2ff34b81466e653b483a61 UNKNOWN
   
   



Re: [PR] [HUDI-7743] Improve StoragePath usages [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2105540642

   
   ## CI report:
   
   * 975a7d92617080bb4c32e832796e8d13cd8d9857 UNKNOWN
   * 51a199199691df091162a3d8cb71f9ee448b079a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23829)
   * 76ee9ca6a701a2fcaa70fce9aae46864486c8c45 UNKNOWN
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2105540571

   
   ## CI report:
   
   * a5daf71906886e6d8da62abdf2decae1e20b09ef Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23845)
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105518790

   
   ## CI report:
   
   * 5739605de6bb73d0e3982a335e243fdb356a6031 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23844)
   * 3cba812f7db9eabdec2472351f74c91e14ee3767 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23848)
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


jonvex commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597345837


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##
@@ -853,7 +852,7 @@ object HoodieBaseRelation extends SparkAdapterSupport {
       val hoodieConfig = new HoodieConfig()
       hoodieConfig.setValue(USE_NATIVE_HFILE_READER,
         options.getOrElse(USE_NATIVE_HFILE_READER.key(), USE_NATIVE_HFILE_READER.defaultValue().toString))
-      val reader = HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
+      val reader = (new HoodieSparkIOFactory).getReaderFactory(HoodieRecordType.AVRO)

Review Comment:
   Yeah, that would achieve the same thing. Do you think that would be better?






Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


jonvex commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597345901


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkConfUtils.scala:
##
@@ -48,4 +50,10 @@ object HoodieSparkConfUtils {
       .map(HollowCommitHandling.valueOf)
       .getOrElse(HollowCommitHandling.valueOf(INCREMENTAL_READ_HANDLE_HOLLOW_COMMIT.defaultValue))
   }
+
+  def getSparkReaderConfig(): HoodieConfig = {

Review Comment:
   No, good catch.






Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105512736

   
   ## CI report:
   
   * 5739605de6bb73d0e3982a335e243fdb356a6031 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23844)
   * 3cba812f7db9eabdec2472351f74c91e14ee3767 UNKNOWN
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


jonvex commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597344058


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##
@@ -64,6 +65,9 @@ class DefaultSource extends RelationProvider
       // Enable "passPartitionByAsOptions" to support "write.partitionBy(...)"
       spark.conf.set("spark.sql.legacy.sources.write.passPartitionByAsOptions", "true")
     }
+    //always use spark io factory
+    spark.sparkContext.hadoopConfiguration.set(HoodieStorageConfig.HOODIE_IO_FACTORY_CLASS.key(),
+      classOf[HoodieSparkIOFactory].getName)

Review Comment:
   SparkSQLWriter also enters here, so I think we may need to add the config in deltastreamer as well.






(hudi) branch master updated: [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions (#10872)

2024-05-10 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 49072d1e2e7 [HUDI-7508] Avoid collecting records in 
HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions 
(#10872)
49072d1e2e7 is described below

commit 49072d1e2e721f27623dba840ad6ea41a252fd15
Author: Vinish Reddy 
AuthorDate: Sat May 11 08:50:59 2024 +0530

[HUDI-7508] Avoid collecting records in 
HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions 
(#10872)

Co-authored-by: Y Ethan Guo 
---
 .../hudi/utilities/sources/JsonKafkaSource.java  | 18 --
 .../hudi/utilities/streamer/HoodieStreamerUtils.java | 20 
 2 files changed, 16 insertions(+), 22 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
index 71f0c4db3f1..a8f70e7c854 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
@@ -21,6 +21,8 @@ package org.apache.hudi.utilities.sources;
 import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+import org.apache.hudi.common.util.collection.CloseableMappingIterator;
 import org.apache.hudi.utilities.UtilHelpers;
 import org.apache.hudi.utilities.config.JsonKafkaPostProcessorConfig;
 import org.apache.hudi.utilities.exception.HoodieSourcePostProcessException;
@@ -43,8 +45,6 @@ import org.apache.spark.streaming.kafka010.LocationStrategies;
 import org.apache.spark.streaming.kafka010.OffsetRange;
 
 import java.io.IOException;
-import java.util.LinkedList;
-import java.util.List;
 
 import static org.apache.hudi.common.util.ConfigUtils.getStringWithAltKeys;
 import static 
org.apache.hudi.utilities.schema.KafkaOffsetPostProcessor.KAFKA_SOURCE_KEY_COLUMN;
@@ -80,28 +80,26 @@ public class JsonKafkaSource extends 
KafkaSource> {
 return postProcess(maybeAppendKafkaOffsets(kafkaRDD));
   }
 
-  protected  JavaRDD 
maybeAppendKafkaOffsets(JavaRDD> kafkaRDD) {
+  protected JavaRDD 
maybeAppendKafkaOffsets(JavaRDD> kafkaRDD) {
 if (this.shouldAddOffsets) {
   return kafkaRDD.mapPartitions(partitionIterator -> {
-List stringList = new LinkedList<>();
-ObjectMapper om = new ObjectMapper();
-partitionIterator.forEachRemaining(consumerRecord -> {
+ObjectMapper objectMapper = new ObjectMapper();
+return new 
CloseableMappingIterator<>(ClosableIterator.wrap(partitionIterator), 
consumerRecord -> {
   String recordValue = consumerRecord.value().toString();
   String recordKey = StringUtils.objToString(consumerRecord.key());
   try {
-ObjectNode jsonNode = (ObjectNode) om.readTree(recordValue);
+ObjectNode jsonNode = (ObjectNode) 
objectMapper.readTree(recordValue);
 jsonNode.put(KAFKA_SOURCE_OFFSET_COLUMN, consumerRecord.offset());
 jsonNode.put(KAFKA_SOURCE_PARTITION_COLUMN, 
consumerRecord.partition());
 jsonNode.put(KAFKA_SOURCE_TIMESTAMP_COLUMN, 
consumerRecord.timestamp());
 if (recordKey != null) {
   jsonNode.put(KAFKA_SOURCE_KEY_COLUMN, recordKey);
 }
-stringList.add(om.writeValueAsString(jsonNode));
+return objectMapper.writeValueAsString(jsonNode);
   } catch (Throwable e) {
-stringList.add(recordValue);
+return recordValue;
   }
 });
-return stringList.iterator();
   });
 }
 return kafkaRDD.map(consumerRecord -> (String) consumerRecord.value());
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java
index 2ecf0b02fb6..3be64fefbb3 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamerUtils.java
@@ -31,6 +31,7 @@ import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieSparkRecord;
 import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.common.util.ConfigUtils;
 import org.apache.hudi.common.util.Either;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.collection.ClosableIterator;
@@ -55,10 +56,8 @@ import org.apache.spark.sql.avro.HoodieAvroDeserializer;
 import 

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub


yihua merged PR #10872:
URL: https://github.com/apache/hudi/pull/10872
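
The merged change replaces a per-partition `LinkedList` buffer with a mapping iterator, so records are transformed as they are pulled and not collected first. The sketch below illustrates that general idea with plain `java.util` types only. `LazyMappingSketch` and `MappingIterator` are hypothetical names, not Hudi's `CloseableMappingIterator`, and the sketch omits the close handling that the Hudi class provides.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class LazyMappingSketch {
  /** Maps elements one at a time as the consumer pulls them; nothing is buffered. */
  static final class MappingIterator<I, O> implements Iterator<O> {
    private final Iterator<I> source;
    private final Function<I, O> mapper;

    MappingIterator(Iterator<I> source, Function<I, O> mapper) {
      this.source = source;
      this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
      return source.hasNext();
    }

    @Override
    public O next() {
      // The mapper runs only here, once per requested element.
      return mapper.apply(source.next());
    }
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    Iterator<String> out = new MappingIterator<String, String>(
        Arrays.asList("a", "b", "c").iterator(),
        value -> {
          calls.incrementAndGet();
          return value.toUpperCase();
        });
    // Nothing has been mapped yet at this point.
    System.out.println("mapper calls before next(): " + calls.get());
    System.out.println(out.next() + ", mapper calls so far: " + calls.get());
  }
}
```

With a buffering approach, memory use grows with the partition size; with the iterator, only one record is in flight per `next()` call.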





Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


jonvex commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597341038


##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java:
##
@@ -43,39 +40,18 @@
 
 public class HoodieFileWriterFactory {
 
-  private static HoodieFileWriterFactory 
getWriterFactory(HoodieRecord.HoodieRecordType recordType) {

Review Comment:
   I'm not exactly sure what you want here.






Re: [I] [SUPPORT] Hudi could override users' configurations [hudi]

2024-05-10 Thread via GitHub


boneanxs commented on issue #11188:
URL: https://github.com/apache/hudi/issues/11188#issuecomment-2105500024

   > > I actually see that Hudi can set many Spark-related configurations in SparkConf; most of them are related to the Parquet reader/writer.
   > 
   > Are these options configurable?
   
   Yes, these configurations can be set by users.





Re: [PR] [HUDI-7745] Move Hadoop-dependent util methods to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11193:
URL: https://github.com/apache/hudi/pull/11193#issuecomment-2105497625

   
   ## CI report:
   
   * bbbc714d35283e5e743883ae945cfec50f99b226 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23846)
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2105497588

   
   ## CI report:
   
   * 3de4b581b5acf16cc256b7d2cce1a43cbd166b28 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23831)
   * a5daf71906886e6d8da62abdf2decae1e20b09ef Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23845)
   
   



Re: [PR] [HUDI-7745] Move Hadoop-dependent util methods to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11193:
URL: https://github.com/apache/hudi/pull/11193#issuecomment-2105473886

   
   ## CI report:
   
   * bbbc714d35283e5e743883ae945cfec50f99b226 UNKNOWN
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2105472543

   
   ## CI report:
   
   * 3de4b581b5acf16cc256b7d2cce1a43cbd166b28 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23831)
   * a5daf71906886e6d8da62abdf2decae1e20b09ef UNKNOWN
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105473597

   
   ## CI report:
   
   * 68d1d5f75238863f544937e050f0f3015f8d7df8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23837)
   * 5739605de6bb73d0e3982a335e243fdb356a6031 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23844)
   
   



[jira] [Created] (HUDI-7746) HadoopConf loses set values when HoodieStorage.getConf is called

2024-05-10 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7746:
-

 Summary: HadoopConf loses set values when HoodieStorage.getConf is called
 Key: HUDI-7746
 URL: https://issues.apache.org/jira/browse/HUDI-7746
 Project: Apache Hudi
  Issue Type: Improvement
  Components: reader-core, writer-core
Reporter: Jonathan Vexler


We use StorageConf to hold hoodie.io.factory.class, which is the IOFactory class that should be used for the file reader and writer factories. For now, we have added reflection into HoodieHadoopIOFactory to get around this issue, but ideally we should not need to do this.
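
For illustration only, the general pattern of loading a factory class named by a config key through reflection can be sketched as below. This is a minimal, self-contained sketch and not Hudi's actual code: the `IOFactory` interface, the `DefaultIOFactory` and `CustomIOFactory` classes, and the plain `Map` standing in for the storage configuration are all hypothetical. Only the key name `hoodie.io.factory.class` comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class IOFactoryLoaderSketch {
    // Config key named in the issue description.
    public static final String IO_FACTORY_CLASS_KEY = "hoodie.io.factory.class";

    // Hypothetical stand-in for the real factory abstraction.
    public interface IOFactory {
        String describe();
    }

    // Hypothetical default, used when the key is absent from the config.
    public static class DefaultIOFactory implements IOFactory {
        public DefaultIOFactory() {}

        @Override
        public String describe() {
            return "default";
        }
    }

    // Hypothetical user-supplied implementation selected through the config key.
    public static class CustomIOFactory implements IOFactory {
        public CustomIOFactory() {}

        @Override
        public String describe() {
            return "custom";
        }
    }

    // Resolves the class name from the config (a plain Map here, as an assumption)
    // and instantiates it through its public no-argument constructor.
    public static IOFactory loadFactory(Map<String, String> conf) {
        String className = conf.getOrDefault(IO_FACTORY_CLASS_KEY, DefaultIOFactory.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            return (IOFactory) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unable to instantiate IO factory: " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key not set: falls back to the default factory.
        System.out.println(loadFactory(conf).describe());
        // Key set: the named class is loaded reflectively.
        conf.put(IO_FACTORY_CLASS_KEY, CustomIOFactory.class.getName());
        System.out.println(loadFactory(conf).describe());
    }
}
```

In a sketch like this, the lookup only works if the value set under the key is still present in the configuration object that reaches the loader, which is the kind of dependency the issue title points at.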



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105463029

   
   ## CI report:
   
   * 68d1d5f75238863f544937e050f0f3015f8d7df8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23837)
   * 5739605de6bb73d0e3982a335e243fdb356a6031 UNKNOWN
   
   



Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #10872:
URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105462852

   
   ## CI report:
   
   * acbabdc64da321e77aaabd03bcd9d5f3c322c0ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23841)
   
   



[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Status: Patch Available  (was: In Progress)

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7745:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[PR] [HUDI-7745] Move Hadoop-dependent util methods to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


yihua opened a new pull request, #11193:
URL: https://github.com/apache/hudi/pull/11193

   ### Change Logs
   
   This PR moves Hadoop-dependent util methods from the `hudi-common` module to the `hudi-hadoop-common` module:
   - Util methods in `FSUtils` class are moved to `HadoopFSUtils` class
   - Util methods in `FileStatusUtils` class are moved to `HadoopFSUtils` class
   - Util methods in `ConfigUtils` class are moved to `HadoopConfigUtils` class
   
   ### Impact
   
   A step towards making the `hudi-common` module Hadoop-independent.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7429] Fixing average record size estimation for delta commits [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #10763:
URL: https://github.com/apache/hudi/pull/10763#issuecomment-2105440971

   
   ## CI report:
   
   * 6a4b8370ab41ce9060dcd8c7c4ee80786cc086b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23840)
   
   



[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Story Points: 2  (was: 0.5)

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Status: In Progress  (was: Open)

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Sprint: Sprint 2023-04-26

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7739) Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy

2024-05-10 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7739.

Fix Version/s: 1.0.0
 Assignee: Danny Chen
   Resolution: Fixed

Fixed via master branch: 86f7a6554df17ba558428be7c8db6316160a0c82

> Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy
> --
>
> Key: HUDI-7739
> URL: https://issues.apache.org/jira/browse/HUDI-7739
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Xinyu Zou
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






Re: [PR] [HUDI-7739] Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy [hudi]

2024-05-10 Thread via GitHub


danny0405 merged PR #11182:
URL: https://github.com/apache/hudi/pull/11182





(hudi) branch master updated (c7c636c2d18 -> 86f7a6554df)

2024-05-10 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from c7c636c2d18 [HUDI-7731] Fix usage of new Configuration() in production code (#11191)
 add 86f7a6554df [HUDI-7739] Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy (#11182)

No new revisions were added by this update.

Summary of changes:
 .../conflict/detection/TimelineServerBasedDetectionStrategy.java | 2 ++
 .../java/org/apache/hudi/timeline/service/RequestHandler.java| 9 +++--
 .../org/apache/hudi/timeline/service/handlers/MarkerHandler.java | 3 +++
 .../marker/AsyncTimelineServerBasedDetectionStrategy.java| 6 ++
 4 files changed, 18 insertions(+), 2 deletions(-)



[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Epic Link: HUDI-6243

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Assigned] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7745:
---

Assignee: Ethan Guo

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Fix Version/s: 0.15.0
   1.0.0

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Story Points: 0.5

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7745:

Labels: hoodie-storage  (was: )

> Move Hadoop-dependent util methods to hudi-hadoop-common
> 
>
> Key: HUDI-7745
> URL: https://issues.apache.org/jira/browse/HUDI-7745
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7686) Add util methods for type cast of configuration instances

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7686.
---
Resolution: Fixed

> Add util methods for type cast of configuration instances
> -
>
> Key: HUDI-7686
> URL: https://issues.apache.org/jira/browse/HUDI-7686
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-7686) Add util methods for type cast of configuration instances

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7686:

Remaining Estimate: 0h
 Original Estimate: 0h

> Add util methods for type cast of configuration instances
> -
>
> Key: HUDI-7686
> URL: https://issues.apache.org/jira/browse/HUDI-7686
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HUDI-7592) Remove remaining hadoop usage in hudi-common module

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7592:
---

Assignee: Ethan Guo  (was: Jonathan Vexler)

> Remove remaining hadoop usage in hudi-common module
> ---
>
> Key: HUDI-7592
> URL: https://issues.apache.org/jira/browse/HUDI-7592
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Closed] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7731.
---
Resolution: Fixed

> Fix usage of new Configuration() in production code
> ---
>
> Key: HUDI-7731
> URL: https://issues.apache.org/jira/browse/HUDI-7731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> new Configuration() is used in non-test code in several places:
> HoodieParquetDataBlock.java
> Metrics.java
>  





[jira] [Updated] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7731:

Story Points: 2

> Fix usage of new Configuration() in production code
> ---
>
> Key: HUDI-7731
> URL: https://issues.apache.org/jira/browse/HUDI-7731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> new Configuration() is used in non-test code in several places:
> HoodieParquetDataBlock.java
> Metrics.java
>  





[jira] [Closed] (HUDI-7726) Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7726.
---
Resolution: Fixed

> Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils
> --
>
> Key: HUDI-7726
> URL: https://issues.apache.org/jira/browse/HUDI-7726
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #10872:
URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105414177

   
   ## CI report:
   
   * ac7713c64afa1d2406463c8563a065362c95ecda Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23640)
   * acbabdc64da321e77aaabd03bcd9d5f3c322c0ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23841)
   
   



Re: [PR] [HUDI-7429] Fixing average record size estimation for delta commits [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #10763:
URL: https://github.com/apache/hudi/pull/10763#issuecomment-2105414125

   
   ## CI report:
   
   * 34ffbbc913fab393871b866160ea2a7e1b38c53f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23804)
   * 6a4b8370ab41ce9060dcd8c7c4ee80786cc086b0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23840)
   
   



(hudi) branch master updated: [HUDI-7731] Fix usage of new Configuration() in production code (#11191)

2024-05-10 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c7c636c2d18 [HUDI-7731] Fix usage of new Configuration() in production code (#11191)
c7c636c2d18 is described below

commit c7c636c2d18673a41aa0e656b6c7746808d4a001
Author: Jon Vexler 
AuthorDate: Fri May 10 20:47:33 2024 -0400

[HUDI-7731] Fix usage of new Configuration() in production code (#11191)

Co-authored-by: Jonathan Vexler <=>
---
 .../main/java/org/apache/hudi/client/BaseHoodieClient.java  |  2 +-
 .../apache/hudi/client/timeline/HoodieTimelineArchiver.java |  2 +-
 .../apache/hudi/client/transaction/lock/LockManager.java|  2 +-
 .../client/transaction/lock/metrics/HoodieLockMetrics.java  |  5 +++--
 .../main/java/org/apache/hudi/metrics/HoodieMetrics.java|  5 +++--
 .../table/action/compact/RunCompactionActionExecutor.java   |  2 +-
 .../hudi/table/action/index/RunIndexActionExecutor.java |  2 +-
 .../org/apache/hudi/metrics/TestHoodieConsoleMetrics.java   |  5 -
 .../org/apache/hudi/metrics/TestHoodieGraphiteMetrics.java  |  5 -
 .../java/org/apache/hudi/metrics/TestHoodieJmxMetrics.java  |  5 -
 .../java/org/apache/hudi/metrics/TestHoodieMetrics.java |  5 -
 .../hudi/metrics/datadog/TestDatadogMetricsReporter.java|  9 ++---
 .../test/java/org/apache/hudi/metrics/m3/TestM3Metrics.java | 10 +++---
 .../hudi/metrics/prometheus/TestPrometheusReporter.java |  7 +--
 .../hudi/metrics/prometheus/TestPushGateWayReporter.java| 13 -
 .../hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java |  2 +-
 .../hudi/metadata/JavaHoodieBackedTableMetadataWriter.java  |  2 +-
 .../apache/hudi/client/TestJavaHoodieBackedMetadata.java|  2 +-
 .../hudi/client/validator/SparkPreCommitValidator.java  |  2 +-
 .../hudi/metadata/SparkHoodieBackedTableMetadataWriter.java |  2 +-
 .../hudi/client/functional/TestHoodieBackedMetadata.java|  2 +-
 .../java/org/apache/hudi/io/TestHoodieTimelineArchiver.java |  2 +-
 .../apache/hudi/common/table/log/HoodieLogFormatWriter.java |  2 +-
 .../hudi/common/table/log/block/HoodieAvroDataBlock.java|  3 ++-
 .../hudi/common/table/log/block/HoodieCommandBlock.java |  3 ++-
 .../hudi/common/table/log/block/HoodieCorruptBlock.java |  3 ++-
 .../apache/hudi/common/table/log/block/HoodieDataBlock.java |  7 ---
 .../hudi/common/table/log/block/HoodieDeleteBlock.java  |  3 ++-
 .../hudi/common/table/log/block/HoodieHFileDataBlock.java   |  4 ++--
 .../apache/hudi/common/table/log/block/HoodieLogBlock.java  |  2 +-
 .../hudi/common/table/log/block/HoodieParquetDataBlock.java |  7 ++-
 .../java/org/apache/hudi/metadata/BaseTableMetadata.java|  3 ++-
 .../org/apache/hudi/metadata/HoodieMetadataMetrics.java |  5 +++--
 .../src/main/java/org/apache/hudi/metrics/Metrics.java  | 12 +++-
 .../apache/hudi/common/functional/TestHoodieLogFormat.java  |  2 +-
 .../hudi/common/table/log/block/TestHoodieDeleteBlock.java  |  3 ++-
 .../procedures/RepairOverwriteHoodiePropsProcedure.scala|  2 +-
 .../marker/MarkerBasedEarlyConflictDetectionRunnable.java   |  6 ++
 .../utilities/deltastreamer/HoodieDeltaStreamerMetrics.java |  9 +
 .../hudi/utilities/ingestion/HoodieIngestionMetrics.java| 10 +++---
 .../hudi/utilities/streamer/HoodieStreamerMetrics.java  | 11 ++-
 .../java/org/apache/hudi/utilities/streamer/StreamSync.java |  8 ++--
 42 files changed, 120 insertions(+), 78 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java
index fe964db6862..f982a0e4e22 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java
@@ -102,7 +102,7 @@ public abstract class BaseHoodieClient implements Serializable, AutoCloseable {
 this.heartbeatClient = new HoodieHeartbeatClient(storage, this.basePath,
 clientConfig.getHoodieClientHeartbeatIntervalInMs(),
 clientConfig.getHoodieClientHeartbeatTolerableMisses());
-this.metrics = new HoodieMetrics(config);
+this.metrics = new HoodieMetrics(config, context.getStorageConf());
 this.txnManager = new TransactionManager(config, storage);
 this.timeGenerator = TimeGenerators.getTimeGenerator(
 config.getTimeGeneratorConfig(), HadoopFSUtils.getStorageConf(hadoopConf));
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java
index 175ac5607f4..f4ab6c76e13 

Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


yihua merged PR #11191:
URL: https://github.com/apache/hudi/pull/11191





Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #10872:
URL: https://github.com/apache/hudi/pull/10872#issuecomment-2105411384

   
   ## CI report:
   
   * ac7713c64afa1d2406463c8563a065362c95ecda Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23640)
   * acbabdc64da321e77aaabd03bcd9d5f3c322c0ec UNKNOWN
   
   



Re: [PR] [HUDI-7429] Fixing average record size estimation for delta commits [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #10763:
URL: https://github.com/apache/hudi/pull/10763#issuecomment-2105411335

   
   ## CI report:
   
   * 34ffbbc913fab393871b866160ea2a7e1b38c53f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23804)
   * 6a4b8370ab41ce9060dcd8c7c4ee80786cc086b0 UNKNOWN
   
   



Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105408982

   
   ## CI report:
   
   * 3a85e2b008420a061db66f6946e37234f67dd7ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23838)
   
   



Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #10872:
URL: https://github.com/apache/hudi/pull/10872#discussion_r1597306796


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java:
##
@@ -57,6 +57,8 @@
  */
 public class JsonKafkaSource extends KafkaSource {
 
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

Review Comment:
   Good point. I think we should revert this change: the `ObjectMapper` is serialized and deserialized from the Spark driver to the executors anyway when the static object is used, so there's not much gain.






(hudi) branch master updated (badcca2ebe8 -> 0d0e27e2b9b)

2024-05-10 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from badcca2ebe8 [HUDI-7742] Move Hadoop-dependent reader util classes to hudi-hadoop-common module (#11190)
 add 0d0e27e2b9b [HUDI-7673] Fixing false positive validation failure for RLI with MDT validation tool (#11098)

No new revisions were added by this update.

Summary of changes:
 .../utilities/HoodieMetadataTableValidator.java| 118 ++---
 .../TestHoodieMetadataTableValidator.java  | 117 +++-
 2 files changed, 194 insertions(+), 41 deletions(-)



Re: [PR] [HUDI-7673] Fixing false positive validation failure for RLI with MDT validation tool [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11098:
URL: https://github.com/apache/hudi/pull/11098#discussion_r1597304231


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java:
##
@@ -1034,6 +1018,60 @@ private void validateRecordIndexContent(HoodieSparkEngineContext sparkEngineCont
 }
   }
 
+  @VisibleForTesting
+  JavaPairRDD> getRecordLocationsFromFSBasedListing(HoodieSparkEngineContext sparkEngineContext,
+      String basePath,
+      String latestCompletedCommit) {
+return sparkEngineContext.getSqlContext().read().format("hudi")
+.option(DataSourceReadOptions.TIME_TRAVEL_AS_OF_INSTANT().key(), latestCompletedCommit)
+.load(basePath)
+.select(RECORD_KEY_METADATA_FIELD, PARTITION_PATH_METADATA_FIELD, FILENAME_METADATA_FIELD)
+.toJavaRDD()
+.mapToPair(row -> new Tuple2<>(row.getString(row.fieldIndex(RECORD_KEY_METADATA_FIELD)),
+Pair.of(row.getString(row.fieldIndex(PARTITION_PATH_METADATA_FIELD)),
+FSUtils.getFileId(row.getString(row.fieldIndex(FILENAME_METADATA_FIELD))
+.cache();
+  }
+
+  @VisibleForTesting
+  JavaPairRDD> getRecordLocationsFromRLI(HoodieSparkEngineContext sparkEngineContext,
+      String basePath,
+      String latestCompletedCommit) {
+return sparkEngineContext.getSqlContext().read().format("hudi")
+.load(getMetadataTableBasePath(basePath))
Review Comment:
   @nsivabalan one thing we can consider as a follow-up is to use the 
time-travel query on MDT as well (this might not be supported but would be good 
to have for the validation).






Re: [PR] [HUDI-7673] Fixing false positive validation failure for RLI with MDT validation tool [hudi]

2024-05-10 Thread via GitHub


yihua merged PR #11098:
URL: https://github.com/apache/hudi/pull/11098





Re: [I] [SUPPORT] Hudi could override users' configurations [hudi]

2024-05-10 Thread via GitHub


danny0405 commented on issue #11188:
URL: https://github.com/apache/hudi/issues/11188#issuecomment-2105384268

   > I actually see that Hudi can set many Spark-related configurations in SparkConf; most of them are related to the Parquet reader/writer.
   
   Are these options configurable?





[jira] [Created] (HUDI-7745) Move Hadoop-dependent util methods to hudi-hadoop-common

2024-05-10 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7745:
---

 Summary: Move Hadoop-dependent util methods to hudi-hadoop-common
 Key: HUDI-7745
 URL: https://issues.apache.org/jira/browse/HUDI-7745
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7742) Move Hadoop-dependent reader util classes to hudi-hadoop-common module

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7742.
---
Resolution: Fixed

> Move Hadoop-dependent reader util classes to hudi-hadoop-common module
> --
>
> Key: HUDI-7742
> URL: https://issues.apache.org/jira/browse/HUDI-7742
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7742) Move Hadoop-dependent reader util classes to hudi-hadoop-common module

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7742:

Sprint: Sprint 2023-04-26

> Move Hadoop-dependent reader util classes to hudi-hadoop-common module
> --
>
> Key: HUDI-7742
> URL: https://issues.apache.org/jira/browse/HUDI-7742
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7744) Create HoodieIOFactory and config to set it

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7744:

Status: In Progress  (was: Open)

> Create HoodieIOFactory and config to set it
> ---
>
> Key: HUDI-7744
> URL: https://issues.apache.org/jira/browse/HUDI-7744
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core, writer-core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Create HoodieIOFactory that will give the appropriate reader and writer 
> factories based on a config.





[jira] [Updated] (HUDI-7743) Fix simple mistakes with StoragePath in production code.

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7743:

Sprint: Sprint 2023-04-26

> Fix simple mistakes with StoragePath in production code.
> 
>
> Key: HUDI-7743
> URL: https://issues.apache.org/jira/browse/HUDI-7743
> Project: Apache Hudi
>  Issue Type: Task
>  Components: code-quality
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Fix many simple mistakes with StoragePath such as doing extra conversions, 
> not using util methods etc.
> Don't fix any mistakes in tests for now.





[jira] [Updated] (HUDI-7742) Move Hadoop-dependent reader util classes to hudi-hadoop-common module

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7742:

Status: Patch Available  (was: In Progress)

> Move Hadoop-dependent reader util classes to hudi-hadoop-common module
> --
>
> Key: HUDI-7742
> URL: https://issues.apache.org/jira/browse/HUDI-7742
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7744) Create HoodieIOFactory and config to set it

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7744:

Sprint: Sprint 2023-04-26

> Create HoodieIOFactory and config to set it
> ---
>
> Key: HUDI-7744
> URL: https://issues.apache.org/jira/browse/HUDI-7744
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core, writer-core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Create HoodieIOFactory that will give the appropriate reader and writer 
> factories based on a config.





[jira] [Updated] (HUDI-7744) Create HoodieIOFactory and config to set it

2024-05-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7744:

Status: Patch Available  (was: In Progress)

> Create HoodieIOFactory and config to set it
> ---
>
> Key: HUDI-7744
> URL: https://issues.apache.org/jira/browse/HUDI-7744
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core, writer-core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Create HoodieIOFactory that will give the appropriate reader and writer 
> factories based on a config.





Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597287731


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkConfUtils.scala:
##
@@ -48,4 +50,10 @@ object HoodieSparkConfUtils {
   .map(HollowCommitHandling.valueOf)
   
.getOrElse(HollowCommitHandling.valueOf(INCREMENTAL_READ_HANDLE_HOLLOW_COMMIT.defaultValue))
   }
+
+  def getSparkReaderConfig(): HoodieConfig = {

Review Comment:
   Is this still needed?



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##
@@ -853,7 +852,7 @@ object HoodieBaseRelation extends SparkAdapterSupport {
   val hoodieConfig = new HoodieConfig()
   hoodieConfig.setValue(USE_NATIVE_HFILE_READER,
 options.getOrElse(USE_NATIVE_HFILE_READER.key(), 
USE_NATIVE_HFILE_READER.defaultValue().toString))
-  val reader = 
HoodieFileReaderFactory.getReaderFactory(HoodieRecordType.AVRO)
+  val reader = (new 
HoodieSparkIOFactory).getReaderFactory(HoodieRecordType.AVRO)

Review Comment:
   Similar here.
   
   If the IO factory class name is already set in the storage config, could we use reflection, i.e., `HoodieIOFactory.getIOFactory(conf)`, to load the `HoodieSparkIOFactory`? That would achieve the same behavior.
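
For illustration, the config-driven lookup suggested here could look roughly like the sketch below. It is not Hudi code: `IoFactory`, `HadoopIoFactory`, `SparkIoFactory`, and `IoFactoryLoader` are hypothetical stand-ins, and a plain `Map` stands in for the storage config. Only the config key string is taken from the PR.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not Hudi's classes.
abstract class IoFactory {
  public abstract String name();
}

class HadoopIoFactory extends IoFactory {
  @Override
  public String name() {
    return "hadoop";
  }
}

class SparkIoFactory extends HadoopIoFactory {
  @Override
  public String name() {
    return "spark";
  }
}

public class IoFactoryLoader {
  // Key string as proposed in the PR; the default below is an assumption of this sketch.
  static final String FACTORY_CLASS_KEY = "hoodie.io.factory.class";
  static final String DEFAULT_FACTORY_CLASS = HadoopIoFactory.class.getName();

  // Read the class name from the config, fall back to the default, and
  // instantiate the class through its no-arg constructor.
  static IoFactory getIoFactory(Map<String, String> conf) {
    String className = conf.getOrDefault(FACTORY_CLASS_KEY, DEFAULT_FACTORY_CLASS);
    try {
      Class<?> clazz = Class.forName(className);
      return (IoFactory) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Unable to create " + className, e);
    }
  }

  public static void main(String[] args) {
    // No override: the default factory is loaded.
    System.out.println(getIoFactory(Map.of()).name());
    // Override set in the config: the caller never references the Spark class directly.
    Map<String, String> conf = Map.of(FACTORY_CLASS_KEY, SparkIoFactory.class.getName());
    System.out.println(getIoFactory(conf).name());
  }
}
```

With this shape, a call site only needs the config object, so an engine-specific module can swap in its own factory without the common module depending on it.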






Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597285676


##
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/HoodieHadoopIOFactory.java:
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.io.hadoop.HoodieAvroFileReaderFactory;
+import org.apache.hudi.io.hadoop.HoodieAvroFileWriterFactory;
+
+public class HoodieHadoopIOFactory extends HoodieIOFactory {

Review Comment:
   Please add Javadocs describing what is returned and what the Avro vs. Spark record type means.



##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieIOFactory.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.common.config.HoodieStorageConfig;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.ReflectionUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.storage.StorageConfiguration;
+
+/**
+ * Base class to get HoodieFileReaderFactory and HoodieFileWriterFactory
+ */
+public abstract class HoodieIOFactory {
+
+  public static HoodieIOFactory getIOFactory(StorageConfiguration<?> storageConf) {
+    String ioFactoryClass = storageConf.getString(HoodieStorageConfig.HOODIE_IO_FACTORY_CLASS.key())
+        .orElse(HoodieStorageConfig.HOODIE_IO_FACTORY_CLASS.defaultValue());
+    return getIOFactory(ioFactoryClass);
+  }
+
+  private static HoodieIOFactory getIOFactory(String ioFactoryClass) {
+    try {
+      Class<?> clazz =
+          ReflectionUtils.getClass(ioFactoryClass);
+      return (HoodieIOFactory) clazz.newInstance();
+    } catch (IllegalArgumentException | IllegalAccessException | InstantiationException e) {
+      throw new HoodieException("Unable to create " + ioFactoryClass, e);
+    }
+  }
+
+  public HoodieFileReaderFactory getReaderFactory(HoodieRecord.HoodieRecordType recordType) {
Review Comment:
   I'm wondering whether different record types should use two different `HoodieIOFactory` implementations, instead of one implementation class redirecting to different reader/writer factories internally?
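
To make the two options in this comment concrete, here is a minimal sketch of both shapes. Everything in it is hypothetical: `RecordType`, `ReaderFactory`, `DispatchingIoFactory`, and the per-type factories are illustrative names, not Hudi's classes, and the string descriptions only mark which path was taken.

```java
import java.util.EnumMap;
import java.util.Map;

public class RecordTypeDispatch {
  // Hypothetical record types, loosely following the Avro vs. Spark split in the PR.
  enum RecordType { AVRO, SPARK }

  interface ReaderFactory {
    String describe();
  }

  // Option A: one factory class that redirects internally based on the record type.
  static class DispatchingIoFactory {
    ReaderFactory getReaderFactory(RecordType type) {
      switch (type) {
        case AVRO:
          return () -> "avro reader factory";
        case SPARK:
          return () -> "spark reader factory";
        default:
          throw new UnsupportedOperationException(type + " record type not supported");
      }
    }
  }

  // Option B: one factory implementation per record type, chosen once up front.
  interface IoFactory {
    ReaderFactory getReaderFactory();
  }

  static class AvroIoFactory implements IoFactory {
    @Override
    public ReaderFactory getReaderFactory() {
      return () -> "avro reader factory";
    }
  }

  static class SparkIoFactory implements IoFactory {
    @Override
    public ReaderFactory getReaderFactory() {
      return () -> "spark reader factory";
    }
  }

  static final Map<RecordType, IoFactory> FACTORIES = new EnumMap<>(RecordType.class);

  static {
    FACTORIES.put(RecordType.AVRO, new AvroIoFactory());
    FACTORIES.put(RecordType.SPARK, new SparkIoFactory());
  }

  public static void main(String[] args) {
    // Option A: the record type is passed on every call.
    System.out.println(new DispatchingIoFactory().getReaderFactory(RecordType.SPARK).describe());
    // Option B: the record type selects the factory, and later calls take no type argument.
    System.out.println(FACTORIES.get(RecordType.AVRO).getReaderFactory().describe());
  }
}
```

Option A keeps a single configurable class but puts a branch inside every accessor; option B moves the branch to the selection step, at the cost of one class per record type.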



##
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/HoodieHadoopIOFactory.java:
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.io.hadoop.HoodieAvroFileReaderFactory;
+import 

Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11192:
URL: https://github.com/apache/hudi/pull/11192#discussion_r1597281157


##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java:
##
@@ -43,39 +40,18 @@
 
 public class HoodieFileWriterFactory {
 
-  private static HoodieFileWriterFactory 
getWriterFactory(HoodieRecord.HoodieRecordType recordType) {

Review Comment:
   Let's create a JIRA ticket to make `HoodieFileReaderFactory` and `HoodieFileWriterFactory` interfaces or abstract classes, so that the APIs for creating new readers and writers are abstract.



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkIOFactory.java:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.common.model.HoodieRecord;
+
+public class HoodieSparkIOFactory extends HoodieHadoopIOFactory {
+

Review Comment:
   nit: empty line



##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieStorageConfig.java:
##
@@ -235,6 +235,13 @@ public class HoodieStorageConfig extends HoodieConfig {
   + "and it is loaded at runtime. This is only required when trying to 
"
   + "override the existing write context when 
`hoodie.datasource.write.row.writer.enable=true`.");
 
+  public static final ConfigProperty<String> HOODIE_IO_FACTORY_CLASS = ConfigProperty
+      .key("hoodie.io.factory.class")
+      .defaultValue("org.apache.hudi.io.storage.HoodieHadoopIOFactory")
+      .markAdvanced()
+      .sinceVersion("0.15.0")
+      .withDocumentation("Provided class should implement `org.apache.hudi.io.storage.HoodieIOFactory`");

Review Comment:
   ```suggestion
         .withDocumentation("The fully-qualified class name of the factory class to return readers and writers of files used by Hudi. The provided class should implement `org.apache.hudi.io.storage.HoodieIOFactory`");
   ```



##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieIOFactory.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.io.storage;
+
+import org.apache.hudi.common.config.HoodieStorageConfig;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.ReflectionUtils;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.storage.StorageConfiguration;
+
+/**
+ * Base class to get HoodieFileReaderFactory and HoodieFileWriterFactory
+ */
+public abstract class HoodieIOFactory {
+
+  public static HoodieIOFactory getIOFactory(StorageConfiguration<?> storageConf) {
+    String ioFactoryClass = storageConf.getString(HoodieStorageConfig.HOODIE_IO_FACTORY_CLASS.key())
+        .orElse(HoodieStorageConfig.HOODIE_IO_FACTORY_CLASS.defaultValue());
+    return getIOFactory(ioFactoryClass);
+  }
+
+  private static HoodieIOFactory getIOFactory(String ioFactoryClass) {
+    try {
+      Class<?> clazz =
+          ReflectionUtils.getClass(ioFactoryClass);
+      return (HoodieIOFactory) clazz.newInstance();
+    } catch (IllegalArgumentException | IllegalAccessException | InstantiationException e) {
+      throw new HoodieException("Unable to create " + ioFactoryClass, e);
+    }

Review Comment:
   Use `ReflectionUtils#loadClass`



##

Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105351756

   
   ## CI report:
   
   * ab418b95d057737b34fe1314e550bee213e1d2b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23835)
   * 3a85e2b008420a061db66f6946e37234f67dd7ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23838)
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105347053

   
   ## CI report:
   
   * 68d1d5f75238863f544937e050f0f3015f8d7df8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23837)
   
   



Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105347026

   
   ## CI report:
   
   * ab418b95d057737b34fe1314e550bee213e1d2b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23835)
   * 3a85e2b008420a061db66f6946e37234f67dd7ec UNKNOWN
   
   



Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105310136

   
   ## CI report:
   
   * 68d1d5f75238863f544937e050f0f3015f8d7df8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23837)
   
   



Re: [PR] [HUDI-7743] Fix Simple Mistakes with StoragePath [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11189:
URL: https://github.com/apache/hudi/pull/11189#discussion_r1597244371


##
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##
@@ -123,7 +121,7 @@ public String addPartitionMeta(
 
         client.getActiveTimeline().getCommitTimeline().lastInstant().get().getTimestamp();
     List<String> partitionPaths =
         FSUtils.getAllPartitionFoldersThreeLevelsDown(HoodieCLI.storage, client.getBasePath());
-StoragePath basePath = new StoragePath(client.getBasePath());
+StoragePath basePath = client.getBasePathV2();

Review Comment:
   Could you create a JIRA to remove `getBasePathV2()` and return `StoragePath` 
from `getBasePath()` as a follow-up?
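
As a rough picture of this follow-up, the sketch below contrasts a transitional class that carries both a String accessor and a typed "V2" accessor with a class that exposes a single typed accessor. The names `TypedPath`, `MetaClientBefore`, and `MetaClientAfter` are hypothetical and only mirror the idea; the deprecation marker on the String accessor is an assumption of the sketch, not a statement about the current Hudi code.

```java
import java.util.Objects;

public class BasePathMigration {
  // Hypothetical typed path, standing in for a class such as StoragePath.
  static final class TypedPath {
    private final String value;

    TypedPath(String value) {
      this.value = Objects.requireNonNull(value, "path must not be null");
    }

    @Override
    public String toString() {
      return value;
    }
  }

  // Transitional shape: a String accessor next to a typed "V2" accessor.
  static class MetaClientBefore {
    private final TypedPath basePath;

    MetaClientBefore(String basePath) {
      this.basePath = new TypedPath(basePath);
    }

    /** @deprecated use {@link #getBasePathV2()} */
    @Deprecated
    String getBasePath() {
      return basePath.toString();
    }

    TypedPath getBasePathV2() {
      return basePath;
    }
  }

  // Follow-up shape: one accessor that returns the typed path directly.
  static class MetaClientAfter {
    private final TypedPath basePath;

    MetaClientAfter(String basePath) {
      this.basePath = new TypedPath(basePath);
    }

    TypedPath getBasePath() {
      return basePath;
    }
  }

  public static void main(String[] args) {
    MetaClientBefore before = new MetaClientBefore("/tmp/hudi_table");
    // Callers that want a typed path have to pick the V2 accessor.
    TypedPath viaV2 = before.getBasePathV2();

    MetaClientAfter after = new MetaClientAfter("/tmp/hudi_table");
    // After the follow-up, the plain accessor is already typed.
    TypedPath direct = after.getBasePath();

    System.out.println(viaV2 + " " + direct);
  }
}
```

Collapsing to one accessor removes the chance of callers re-wrapping a String into a new path object, which is the pattern the diff above replaces.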



##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -294,11 +294,20 @@ public HoodieTableType getTableType() {
 
   /**
* @return Meta path
+   * @deprecated please use {@link #getMetaPathV2()}
*/
+  @Deprecated

Review Comment:
   Can we directly change this method to return `StoragePath` and `metaPath`?






Re: [PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11192:
URL: https://github.com/apache/hudi/pull/11192#issuecomment-2105303945

   
   ## CI report:
   
   * 68d1d5f75238863f544937e050f0f3015f8d7df8 UNKNOWN
   
   



Re: [PR] [HUDI-7742] Move Hadoop-dependent reader util classes to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


yihua merged PR #11190:
URL: https://github.com/apache/hudi/pull/11190





Re: [PR] [HUDI-7742] Move Hadoop-dependent reader util classes to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11190:
URL: https://github.com/apache/hudi/pull/11190#discussion_r1597239746


##
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java:
##
@@ -121,7 +122,7 @@ protected byte[] serializeRecords(List 
records) throws IOException
 HFileContext context = new HFileContextBuilder()
 .withBlockSize(DEFAULT_BLOCK_SIZE)
 .withCompression(compressionAlgorithm.get())
-.withCellComparator(new HoodieHBaseKVComparator())
+
.withCellComparator(ReflectionUtils.loadClass(KV_COMPARATOR_CLASS_NAME))

Review Comment:
   Yes.






(hudi) branch master updated: [HUDI-7742] Move Hadoop-dependent reader util classes to hudi-hadoop-common module (#11190)

2024-05-10 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new badcca2ebe8 [HUDI-7742] Move Hadoop-dependent reader util classes to 
hudi-hadoop-common module (#11190)
badcca2ebe8 is described below

commit badcca2ebe8c30efa3fc13cad4c3f0114101874a
Author: Y Ethan Guo 
AuthorDate: Fri May 10 14:20:00 2024 -0700

[HUDI-7742] Move Hadoop-dependent reader util classes to hudi-hadoop-common 
module (#11190)
---
 .../action/bootstrap/OrcBootstrapMetadataHandler.java   |  2 +-
 .../common/table/log/block/HoodieHFileDataBlock.java|  5 +++--
 .../hudi/common/testutils/HoodieTestDataGenerator.java  |  4 
 .../java/org/apache/hudi/common/util/AvroOrcUtils.java  |  0
 .../main/java/org/apache/hudi/common/util/OrcUtils.java |  1 +
 .../org/apache/hudi/io/hadoop/HoodieAvroOrcReader.java  |  1 -
 .../org/apache/hudi/io/hadoop}/OrcReaderIterator.java   | 17 ++---
 .../apache/hudi/io/storage/HoodieHBaseKVComparator.java |  0
 .../parquet/avro/HoodieAvroParquetReaderBuilder.java|  0
 .../org/apache/parquet/avro/HoodieAvroReadSupport.java  |  0
 .../org/apache/hudi/common/util/TestAvroOrcUtils.java   |  4 
 .../apache/hudi/io/hadoop}/TestOrcReaderIterator.java   | 17 ++---
 .../org/apache/hudi/functional/TestOrcBootstrap.java|  2 +-
 .../deltastreamer/HoodieDeltaStreamerTestBase.java  |  3 ++-
 .../hudi/utilities/testutils/UtilitiesTestBase.java |  3 ++-
 15 files changed, 34 insertions(+), 25 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/OrcBootstrapMetadataHandler.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/OrcBootstrapMetadataHandler.java
index 2d4457d575b..86944ae3f5b 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/OrcBootstrapMetadataHandler.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/bootstrap/OrcBootstrapMetadataHandler.java
@@ -25,11 +25,11 @@ import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType;
 import org.apache.hudi.common.util.AvroOrcUtils;
-import org.apache.hudi.common.util.OrcReaderIterator;
 import org.apache.hudi.common.util.queue.HoodieExecutor;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.io.HoodieBootstrapHandle;
+import org.apache.hudi.io.hadoop.OrcReaderIterator;
 import org.apache.hudi.keygen.KeyGeneratorInterface;
 import org.apache.hudi.storage.StoragePath;
 import org.apache.hudi.table.HoodieTable;
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
index a379e305d0e..0893637b956 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java
@@ -26,6 +26,7 @@ import org.apache.hudi.common.model.HoodieFileFormat;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType;
 import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.collection.ClosableIterator;
 import org.apache.hudi.common.util.collection.CloseableMappingIterator;
 import org.apache.hudi.exception.HoodieIOException;
@@ -33,7 +34,6 @@ import org.apache.hudi.io.SeekableDataInputStream;
 import org.apache.hudi.io.storage.HoodieAvroHFileReaderImplBase;
 import org.apache.hudi.io.storage.HoodieFileReader;
 import org.apache.hudi.io.storage.HoodieFileReaderFactory;
-import org.apache.hudi.io.storage.HoodieHBaseKVComparator;
 import org.apache.hudi.storage.HoodieStorage;
 import org.apache.hudi.storage.HoodieStorageUtils;
 import org.apache.hudi.storage.StorageConfiguration;
@@ -76,6 +76,7 @@ import static 
org.apache.hudi.common.util.ValidationUtils.checkState;
 public class HoodieHFileDataBlock extends HoodieDataBlock {
   private static final Logger LOG = 
LoggerFactory.getLogger(HoodieHFileDataBlock.class);
   private static final int DEFAULT_BLOCK_SIZE = 1024 * 1024;
+  private static final String KV_COMPARATOR_CLASS_NAME = 
"org.apache.hudi.io.storage.HoodieHBaseKVComparator";
 
   private final Option compressionAlgorithm;
   // This path is used for constructing HFile reader context, which should not 
be
@@ -121,7 +122,7 @@ public class HoodieHFileDataBlock extends HoodieDataBlock {
 HFileContext context = new HFileContextBuilder()
 .withBlockSize(DEFAULT_BLOCK_SIZE)
 

(hudi) branch master updated: [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils (#11185)

2024-05-10 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 23b283acf3e [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils (#11185)
23b283acf3e is described below

commit 23b283acf3e4c30e26652edf9c710e17e47951c5
Author: Jon Vexler 
AuthorDate: Fri May 10 17:19:23 2024 -0400

[HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils (#11185)

Co-authored-by: Jonathan Vexler <=>
Co-authored-by: Y Ethan Guo 
---
 .../hudi/cli/commands/HoodieLogFileCommand.java|  15 +--
 .../hudi/io/HoodieKeyLocationFetchHandle.java  |   7 +-
 .../hudi/client/TestJavaHoodieBackedMetadata.java  |  12 +-
 .../testutils/HoodieJavaClientTestHarness.java |  10 +-
 .../functional/TestHoodieBackedMetadata.java   |  12 +-
 .../functional/TestHoodieBackedTableMetadata.java  |   7 +-
 .../hudi/common/model/HoodiePartitionMetadata.java |   2 +-
 .../hudi/common/table/TableSchemaResolver.java | 122 +++
 .../org/apache/hudi/common/util/BaseFileUtils.java |  11 +-
 .../hudi/table/catalog/TableOptionProperties.java  |   4 +-
 .../common/table/ParquetTableSchemaResolver.java   |  66 +++
 .../org/apache/hudi/common/util/HFileUtils.java| 130 +
 .../hudi/common/table/TestTableSchemaResolver.java |   7 +-
 .../ShowHoodieLogFileMetadataProcedure.scala   |   3 +-
 .../ShowHoodieLogFileRecordsProcedure.scala|   9 +-
 .../apache/hudi/sync/common/HoodieSyncClient.java  |   6 +-
 .../utilities/HoodieMetadataTableValidator.java|   8 +-
 17 files changed, 259 insertions(+), 172 deletions(-)

diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java
index 367dc2302ee..d3c30143072 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java
@@ -49,8 +49,6 @@ import org.apache.hudi.storage.StoragePath;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import org.apache.avro.Schema;
 import org.apache.avro.generic.IndexedRecord;
-import org.apache.parquet.avro.AvroSchemaConverter;
-import org.apache.parquet.schema.MessageType;
 import org.springframework.shell.standard.ShellComponent;
 import org.springframework.shell.standard.ShellMethod;
 import org.springframework.shell.standard.ShellOption;
@@ -109,9 +107,7 @@ public class HoodieLogFileCommand {
   } else {
 fileName = path.getName();
   }
-  MessageType schema = TableSchemaResolver.readSchemaFromLogFile(storage, path);
-  Schema writerSchema = schema != null
-  ? new AvroSchemaConverter().convert(Objects.requireNonNull(schema)) : null;
+  Schema writerSchema = TableSchemaResolver.readSchemaFromLogFile(storage, path);
   try (Reader reader = HoodieLogFormat.newReader(storage, new HoodieLogFile(path), writerSchema)) {
 
 // read the avro blocks
@@ -213,14 +209,13 @@ public class HoodieLogFileCommand {
 checkArgument(logFilePaths.size() > 0, "There is no log file");
 
 // TODO : readerSchema can change across blocks/log files, fix this inside Scanner
-AvroSchemaConverter converter = new AvroSchemaConverter();
 Schema readerSchema = null;
 // get schema from last log file
 for (int i = logFilePaths.size() - 1; i >= 0; i--) {
-  MessageType schema = TableSchemaResolver.readSchemaFromLogFile(
+  Schema schema = TableSchemaResolver.readSchemaFromLogFile(
   storage, new StoragePath(logFilePaths.get(i)));
   if (schema != null) {
-readerSchema = converter.convert(schema);
+readerSchema = schema;
 break;
   }
 }
@@ -257,10 +252,8 @@ public class HoodieLogFileCommand {
   }
 } else {
   for (String logFile : logFilePaths) {
-MessageType schema = TableSchemaResolver.readSchemaFromLogFile(
+Schema writerSchema = TableSchemaResolver.readSchemaFromLogFile(
 client.getStorage(), new StoragePath(logFile));
-Schema writerSchema = schema != null
-? new AvroSchemaConverter().convert(Objects.requireNonNull(schema)) : null;
 try (HoodieLogFormat.Reader reader =
  HoodieLogFormat.newReader(storage, new HoodieLogFile(new StoragePath(logFile)), writerSchema)) {
   // read the avro blocks
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLocationFetchHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLocationFetchHandle.java
index 30e2437485e..f05a0af3449 100644
--- 

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-10 Thread via GitHub


yihua merged PR #11185:
URL: https://github.com/apache/hudi/pull/11185





Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-10 Thread via GitHub


yihua commented on code in PR #11185:
URL: https://github.com/apache/hudi/pull/11185#discussion_r1597236327


##
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java:
##
@@ -300,21 +273,6 @@ private Option getTableParquetSchemaFromDataFile() {
 }
   }
 
-  public static MessageType convertAvroSchemaToParquet(Schema schema, Configuration hadoopConf) {

Review Comment:
   Our functional tests cover a few schema evolution cases that execute the 
logic in `TableSchemaResolver`.  Still, we should do more testing before the 
release to make sure everything still works.






[jira] [Updated] (HUDI-7744) Create HoodieIOFactory and config to set it

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7744:
-
Labels: pull-request-available  (was: )

> Create HoodieIOFactory and config to set it
> ---
>
> Key: HUDI-7744
> URL: https://issues.apache.org/jira/browse/HUDI-7744
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core, writer-core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Create HoodieIOFactory that will give the appropriate reader and writer 
> factories based on a config.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7744] Introduce IOFactory and a config to set the factory [hudi]

2024-05-10 Thread via GitHub


jonvex opened a new pull request, #11192:
URL: https://github.com/apache/hudi/pull/11192

   ### Change Logs
   
   Remove the static methods on the base reader and writer factory classes that were used to create factory instances. Reader and writer factories are now obtained from `HoodieIOFactory.getIOFactory()`.
   
   In Spark modules, we use the Spark IO factory directly.
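
A minimal sketch of the idea described above, assuming the factory implementation is chosen by a class name held in a config and instantiated reflectively. Everything in it is hypothetical and for illustration only: the class names (`IOFactorySketch`, `IOFactory`, `DefaultIOFactory`, `CustomIOFactory`), the config key `example.io.factory.class`, the `Map`-based config, and the use of plain strings in place of real reader and writer factory objects. It is not the code from this PR, and the actual `HoodieIOFactory.getIOFactory()` signature may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class IOFactorySketch {

  // Hypothetical config key, used only in this illustration.
  public static final String IO_FACTORY_CLASS_KEY = "example.io.factory.class";

  // Stand-in for a factory that hands out reader and writer factories.
  // Strings replace the real factory objects to keep the sketch small.
  public abstract static class IOFactory {
    public abstract String readerFactory();

    public abstract String writerFactory();

    // Instantiates the implementation named in the config, or the default one
    // when the key is absent.
    public static IOFactory getIOFactory(Map<String, String> config) {
      String className =
          config.getOrDefault(IO_FACTORY_CLASS_KEY, DefaultIOFactory.class.getName());
      try {
        return (IOFactory) Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalArgumentException("Cannot create IO factory: " + className, e);
      }
    }
  }

  // Implementation used when no class is configured.
  public static class DefaultIOFactory extends IOFactory {
    @Override
    public String readerFactory() {
      return "default-reader";
    }

    @Override
    public String writerFactory() {
      return "default-writer";
    }
  }

  // Alternative implementation, e.g. for a different file system or engine.
  public static class CustomIOFactory extends IOFactory {
    @Override
    public String readerFactory() {
      return "custom-reader";
    }

    @Override
    public String writerFactory() {
      return "custom-writer";
    }
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    // No key set: the default implementation is used.
    System.out.println(IOFactory.getIOFactory(config).readerFactory());

    // Key set: the configured implementation is loaded by name.
    config.put(IO_FACTORY_CLASS_KEY, CustomIOFactory.class.getName());
    System.out.println(IOFactory.getIOFactory(config).writerFactory());
  }
}
```

With this shape, callers ask one entry point for both factories instead of calling static creation methods on each factory class, and a module that always wants a specific implementation can construct it directly rather than going through the config lookup.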
   
   ### Impact
   
   IO factories for different file systems are now possible.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7744) Create HoodieIOFactory and config to set it

2024-05-10 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7744:
-

 Summary: Create HoodieIOFactory and config to set it
 Key: HUDI-7744
 URL: https://issues.apache.org/jira/browse/HUDI-7744
 Project: Apache Hudi
  Issue Type: Improvement
  Components: reader-core, writer-core
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler
 Fix For: 0.15.0, 1.0.0


Create HoodieIOFactory that will give the appropriate reader and writer 
factories based on a config.





Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105237449

   
   ## CI report:
   
   * ab418b95d057737b34fe1314e550bee213e1d2b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23835)
   
   



Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105174009

   
   ## CI report:
   
   * b8b11fa3ea8a47d89621b3c130e1bbb8066d7c4c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23834)
   * ab418b95d057737b34fe1314e550bee213e1d2b0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23835)
   
   



Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105162966

   
   ## CI report:
   
   * b8b11fa3ea8a47d89621b3c130e1bbb8066d7c4c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23834)
   * ab418b95d057737b34fe1314e550bee213e1d2b0 UNKNOWN
   
   



Re: [PR] [HUDI-7742] Move Hadoop-dependent reader util classes to hudi-hadoop-common module [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11190:
URL: https://github.com/apache/hudi/pull/11190#issuecomment-2105162884

   
   ## CI report:
   
   * 2132f3e951ec684176e7ce6aefa3c8c467849dab Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23833)
   
   



Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11185:
URL: https://github.com/apache/hudi/pull/11185#issuecomment-2105162785

   
   ## CI report:
   
   * 76dc076a65432684c5217f12c264edb7cd50d9e9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23832)
   
   



Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11152:
URL: https://github.com/apache/hudi/pull/11152#issuecomment-2105162615

   
   ## CI report:
   
   * 3de4b581b5acf16cc256b7d2cce1a43cbd166b28 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23831)
   
   



Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105105997

   
   ## CI report:
   
   * b8b11fa3ea8a47d89621b3c130e1bbb8066d7c4c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23834)
   
   



Re: [PR] [HUDI-7731] Fix usage of new Configuration() in production code [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11191:
URL: https://github.com/apache/hudi/pull/11191#issuecomment-2105096635

   
   ## CI report:
   
   * b8b11fa3ea8a47d89621b3c130e1bbb8066d7c4c UNKNOWN
   
   



Re: [PR] [HUDI-7743] Fix Simple Mistakes with StoragePath [hudi]

2024-05-10 Thread via GitHub


hudi-bot commented on PR #11189:
URL: https://github.com/apache/hudi/pull/11189#issuecomment-2105087264

   
   ## CI report:
   
   * 975a7d92617080bb4c32e832796e8d13cd8d9857 UNKNOWN
   * 51a199199691df091162a3d8cb71f9ee448b079a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23829)
   
   



[jira] [Updated] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-10 Thread Jonathan Vexler (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler updated HUDI-7731:
--
Status: Patch Available  (was: In Progress)

> Fix usage of new Configuration() in production code
> ---
>
> Key: HUDI-7731
> URL: https://issues.apache.org/jira/browse/HUDI-7731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> new Configuration() is used in non-test code in several places:
> HoodieParquetDataBlock.java
> Metrics.java
>  





[jira] [Updated] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7731:
-
Labels: pull-request-available  (was: )

> Fix usage of new Configuration() in production code
> ---
>
> Key: HUDI-7731
> URL: https://issues.apache.org/jira/browse/HUDI-7731
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> new Configuration() is used in non-test code in several places:
> HoodieParquetDataBlock.java
> Metrics.java
>  




