[jira] [Created] (HUDI-6009) Let the jetty server in TimelineService create daemon threads
dzcxzl created HUDI-6009: Summary: Let the jetty server in TimelineService create daemon threads Key: HUDI-6009 URL: https://issues.apache.org/jira/browse/HUDI-6009 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl -- This message was sent by Atlassian Jira (v8.20.10#820010)
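Background for this change: a daemon thread does not keep the JVM alive, so marking the embedded timeline server's worker threads as daemon lets the host process exit cleanly. Jetty's QueuedThreadPool exposes a daemon flag for exactly this; the general pattern, sketched here with only the JDK (names are illustrative, this is not Hudi's actual code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonThreads {
    // ThreadFactory that marks every created thread as daemon, so a pool
    // built from it never keeps the JVM alive after main exits.
    public static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2, daemonFactory("TimelineService"));
        pool.submit(() -> {});
        pool.shutdown();
    }
}
```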
[jira] [Created] (HUDI-5731) Add guava dependency to Spark and MR bundle
dzcxzl created HUDI-5731: Summary: Add guava dependency to Spark and MR bundle Key: HUDI-5731 URL: https://issues.apache.org/jira/browse/HUDI-5731 Project: Apache Hudi Issue Type: Bug Affects Versions: 0.12.1 Reporter: dzcxzl Guava relocation is configured in the Spark and MR bundle pom.xml, but guava is not declared as a dependency, so loading of the relocated guava classes fails. -- This message was sent by Atlassian Jira (v8.20.10#820010)
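For reference, a Maven Shade Plugin relocation only rewrites classes that are actually bundled; if guava is not declared as a dependency of the bundle module, the relocated packages simply do not exist at runtime. A sketch of the two pieces that must both be present (coordinates and shaded pattern below are illustrative, not copied from Hudi's pom):

```xml
<!-- 1) The bundle must actually depend on guava... -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
</dependency>

<!-- 2) ...so that this shade-plugin relocation has classes to rewrite. -->
<relocation>
  <pattern>com.google.common.</pattern>
  <shadedPattern>org.apache.hudi.com.google.common.</shadedPattern>
</relocation>
```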
[jira] [Created] (HUDI-5487) Reduce duplicate Logs in ExternalSpillableMap
dzcxzl created HUDI-5487: Summary: Reduce duplicate Logs in ExternalSpillableMap Key: HUDI-5487 URL: https://issues.apache.org/jira/browse/HUDI-5487 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl We see hundreds of thousands of duplicate logs in the executor log. {code:java} 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 22/12/26 21:13:40,864 [Executor task launch worker for task 0.0 in stage 480.0 (TID 211376)] INFO ExternalSpillableMap: Update Estimated Payload size to => 4567 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
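A minimal way to cut this noise (a stdlib sketch of the idea, not Hudi's actual fix) is to emit the log line only when the estimate changes:

```java
import java.util.ArrayList;
import java.util.List;

public class PayloadSizeEstimator {
    private long estimatedPayloadSize = -1;
    private final List<String> logLines = new ArrayList<>();

    // Emit a log line only when the estimate actually changes; repeated
    // updates to the same value stay silent instead of flooding the log.
    public void update(long newSize) {
        if (newSize != estimatedPayloadSize) {
            estimatedPayloadSize = newSize;
            logLines.add("Update Estimated Payload size to => " + newSize);
        }
    }

    public List<String> logLines() {
        return logLines;
    }
}
```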
[jira] [Updated] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata
[ https://issues.apache.org/jira/browse/HUDI-5484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-5484: - Description: {code:java} org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195) at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54) at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257) at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68) at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129) at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125){code}
[jira] [Created] (HUDI-5484) Avoid using GenericRecord in ColumnStatMetadata
dzcxzl created HUDI-5484: Summary: Avoid using GenericRecord in ColumnStatMetadata Key: HUDI-5484 URL: https://issues.apache.org/jira/browse/HUDI-5484 Project: Apache Hudi Issue Type: Bug Reporter: dzcxzl {code:java} org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap (org.apache.avro.Schema$RecordSchema) schema (org.apache.avro.generic.GenericData$Record) maxValue (org.apache.hudi.avro.model.HoodieMetadataColumnStats) columnStatMetadata (org.apache.hudi.metadata.HoodieMetadataPayload) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:232) at org.apache.hudi.common.model.HoodieAvroRecord.readRecordPayload(HoodieAvroRecord.java:45) at org.apache.hudi.common.model.HoodieRecord.read(HoodieRecord.java:339) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:520) at org.apache.hudi.com.esotericsoftware.kryo.serializers.DefaultSerializers$KryoSerializableSerializer.read(DefaultSerializers.java:512) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:101) at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:75) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:195) at 
org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:54) at org.apache.hudi.io.HoodieCreateHandle.write(HoodieCreateHandle.java:188) at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleInsert(HoodieSparkCopyOnWriteTable.java:257) at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:68) at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129) at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134) at org.apache.hudi.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40) at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
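The trace shows Kryo failing while rebuilding an Avro Schema whose field list is an unmodifiable collection (CollectionSerializer calls add on it). One generic way around this class of failure, sketched here with the JDK only and not Hudi's actual change, is to carry the value in serialized byte[] form and decode lazily, so the outer serializer only ever sees a byte array:

```java
import java.util.function.Function;

// Carries a value as bytes so that whatever serializer handles the
// enclosing object (Kryo, Java serialization, ...) only ever sees a
// byte[], never a problematic object graph such as an Avro Schema.
public class LazyBytes<T> {
    private final byte[] encoded;
    private final Function<byte[], T> decoder;
    private transient T cached;

    public LazyBytes(byte[] encoded, Function<byte[], T> decoder) {
        this.encoded = encoded;
        this.decoder = decoder;
    }

    // Decode on first access and cache the result.
    public T get() {
        if (cached == null) {
            cached = decoder.apply(encoded);
        }
        return cached;
    }
}
```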
[jira] [Created] (HUDI-5378) Remove minlog.Log
dzcxzl created HUDI-5378: Summary: Remove minlog.Log Key: HUDI-5378 URL: https://issues.apache.org/jira/browse/HUDI-5378 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5356) Call close on SparkRDDWriteClient in several places
dzcxzl created HUDI-5356: Summary: Call close on SparkRDDWriteClient in several places Key: HUDI-5356 URL: https://issues.apache.org/jira/browse/HUDI-5356 Project: Apache Hudi Issue Type: Bug Reporter: dzcxzl -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-5332) HiveSyncTool can avoid initializing all permanent custom functions of Hive
dzcxzl created HUDI-5332: Summary: HiveSyncTool can avoid initializing all permanent custom functions of Hive Key: HUDI-5332 URL: https://issues.apache.org/jira/browse/HUDI-5332 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl Currently, initializing the Hive client loads all permanent custom functions from the HMS, which HiveSyncTool does not need. -- This message was sent by Atlassian Jira (v8.20.10#820010)
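The common shape for this kind of fix (an illustrative stdlib sketch; HiveSyncTool's actual change may differ) is to defer the expensive initialization until the resource is first needed, so a sync run that never touches functions never pays for loading them:

```java
import java.util.function.Supplier;

// Memoizing supplier: the expensive step (for example, a Hive client
// init that pulls every permanent function from the metastore) runs at
// most once, and only when the value is first needed.
public class Lazy<T> implements Supplier<T> {
    private final Supplier<T> init;
    private T value;

    public Lazy(Supplier<T> init) {
        this.init = init;
    }

    @Override
    public synchronized T get() {
        if (value == null) {
            value = init.get();
        }
        return value;
    }
}
```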
[jira] [Created] (HUDI-4316) Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner
dzcxzl created HUDI-4316: Summary: Support for spillable diskmap configuration when constructing HoodieMergedLogRecordScanner Key: HUDI-4316 URL: https://issues.apache.org/jira/browse/HUDI-4316 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl Several PRs have added support for using the hoodie.common.spillable.diskmap.type configuration to decide which disk map to use when constructing the HoodieMergedLogRecordScanner, but there are still several places where this configuration is not honored. https://issues.apache.org/jira/browse/HUDI-2044 https://issues.apache.org/jira/browse/HUDI-4151 -- This message was sent by Atlassian Jira (v8.20.7#820007)
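The intent of hoodie.common.spillable.diskmap.type is to pick the spill implementation from configuration rather than hardcoding one; roughly like this (a stdlib sketch with an illustrative default, not Hudi's real classes):

```java
import java.util.Locale;
import java.util.Properties;

public class DiskMapSelector {
    // Mirrors the values hoodie.common.spillable.diskmap.type accepts:
    // BITCASK (append-only spill file) or ROCKS_DB (embedded KV store).
    public enum DiskMapType { BITCASK, ROCKS_DB }

    // Read the type from configuration, defaulting to BITCASK when the
    // key is absent, instead of hardcoding one implementation.
    public static DiskMapType fromProps(Properties props) {
        String value = props.getProperty("hoodie.common.spillable.diskmap.type", "BITCASK");
        return DiskMapType.valueOf(value.toUpperCase(Locale.ROOT));
    }
}
```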
[jira] [Created] (HUDI-4309) Spark3.2 custom parser should not throw exception
dzcxzl created HUDI-4309: Summary: Spark3.2 custom parser should not throw exception Key: HUDI-4309 URL: https://issues.apache.org/jira/browse/HUDI-4309 Project: Apache Hudi Issue Type: Bug Reporter: dzcxzl In HoodieSpark3_2ExtendedSqlAstBuilder, three methods throw for unsupported syntax, which causes SQL statements to fail in the parse stage: visitInsertOverwriteDir, visitInsertOverwriteHiveDir, and withRepartitionByExpression. -- This message was sent by Atlassian Jira (v8.20.7#820007)
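The usual pattern for an extended SQL parser (the real builder is Scala and the types below are stand-ins, so treat this as a sketch of the idea only) is to try the extension grammar and fall back to the underlying parser for constructs the extension does not handle, rather than surfacing a parse error:

```java
import java.util.function.Function;

public class DelegatingParser {
    private final Function<String, Object> extension;  // may throw on unsupported syntax
    private final Function<String, Object> delegate;   // the underlying engine's parser

    public DelegatingParser(Function<String, Object> extension, Function<String, Object> delegate) {
        this.extension = extension;
        this.delegate = delegate;
    }

    // Try the extension grammar first; if it cannot handle the statement,
    // hand it to the underlying parser instead of failing the query.
    public Object parse(String sql) {
        try {
            return extension.apply(sql);
        } catch (RuntimeException unsupported) {
            return delegate.apply(sql);
        }
    }
}
```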
[jira] [Updated] (HUDI-4111) Bump ANTLR runtime version in Spark 3.x
[ https://issues.apache.org/jira/browse/HUDI-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-4111: - Summary: Bump ANTLR runtime version in Spark 3.x (was: Bump ANTLR runtime version to 4.8 in Spark 3.2) > Bump ANTLR runtime version in Spark 3.x > --- > > Key: HUDI-4111 > URL: https://issues.apache.org/jira/browse/HUDI-4111 > Project: Apache Hudi > Issue Type: Improvement >Reporter: dzcxzl >Priority: Trivial > Labels: pull-request-available > > Spark 3.2 uses ANTLR runtime 4.8 while Hudi uses 4.7; using the same version avoids > ANTLR version-check warnings in the log. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-4111) Bump ANTLR runtime version to 4.8 in Spark 3.2
dzcxzl created HUDI-4111: Summary: Bump ANTLR runtime version to 4.8 in Spark 3.2 Key: HUDI-4111 URL: https://issues.apache.org/jira/browse/HUDI-4111 Project: Apache Hudi Issue Type: Improvement Reporter: dzcxzl Spark 3.2 uses ANTLR runtime 4.8 while Hudi uses 4.7; using the same version avoids ANTLR version-check warnings in the log. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (HUDI-3849) AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration
dzcxzl created HUDI-3849: Summary: AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration Key: HUDI-3849 URL: https://issues.apache.org/jira/browse/HUDI-3849 Project: Apache Hudi Issue Type: Improvement Components: spark Reporter: dzcxzl Currently the datetimeRebaseMode of AvroDeserializer is hardcoded to "EXCEPTION". -- This message was sent by Atlassian Jira (v8.20.1#820001)
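Making the mode configurable rather than hardcoded looks roughly like this (a stdlib sketch; the config key and surrounding types are illustrative, while EXCEPTION/CORRECTED/LEGACY are Spark's actual rebase mode values):

```java
import java.util.Properties;

public class RebaseModeConfig {
    // Spark's datetime rebase modes for Avro reads.
    public enum RebaseMode { EXCEPTION, CORRECTED, LEGACY }

    // Read the mode from configuration with EXCEPTION as the default,
    // preserving the old hardcoded behavior when nothing is set.
    public static RebaseMode datetimeRebaseModeInRead(Properties conf, String key) {
        return RebaseMode.valueOf(conf.getProperty(key, "EXCEPTION"));
    }
}
```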
[jira] [Created] (HUDI-973) RemoteHoodieTableFileSystemView supports non-partitioned table queries
dzcxzl created HUDI-973: --- Summary: RemoteHoodieTableFileSystemView supports non-partitioned table queries Key: HUDI-973 URL: https://issues.apache.org/jira/browse/HUDI-973 Project: Apache Hudi Issue Type: Bug Reporter: dzcxzl When hoodie.embed.timeline.server = true and the written table is non-partitioned, queries fail with the following exception: {code:java} io.javalin.BadRequestResponse: Query parameter 'partition' with value '' cannot be null or empty at io.javalin.validation.TypedValidator.getOrThrow(Validator.kt:25) at org.apache.hudi.timeline.service.FileSystemViewHandler.lambda$registerDataFilesAPI$3(FileSystemViewHandler.java:172) {code} This happens because the API checks whether the value of 'partition' is null or empty. -- This message was sent by Atlassian Jira (v8.3.4#803005)
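A fix in the direction the report points at (an illustrative stdlib sketch, not the real FileSystemViewHandler code) is to accept an absent or empty 'partition' parameter and treat it as the non-partitioned table's empty relative path instead of rejecting it:

```java
public class PartitionParam {
    // A non-partitioned table legitimately sends partition="", so map
    // null/empty to the empty relative path instead of rejecting it the
    // way a non-null/non-empty validator would.
    public static String normalize(String partition) {
        return partition == null ? "" : partition;
    }

    public static boolean isNonPartitioned(String partition) {
        return normalize(partition).isEmpty();
    }
}
```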
[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled
[ https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-889: Status: In Progress (was: Open) > Writer supports useJdbc configuration when hive synchronization is enabled > -- > > Key: HUDI-889 > URL: https://issues.apache.org/jira/browse/HUDI-889 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Writer Core >Reporter: dzcxzl >Priority: Trivial > > hudi-hive-sync supports the useJdbc = false configuration, but the writer > does not provide this configuration at this stage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled
[ https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-889: Status: Closed (was: Patch Available) > Writer supports useJdbc configuration when hive synchronization is enabled > -- > > Key: HUDI-889 > URL: https://issues.apache.org/jira/browse/HUDI-889 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Writer Core >Reporter: dzcxzl >Priority: Trivial > > hudi-hive-sync supports the useJdbc = false configuration, but the writer > does not provide this configuration at this stage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled
[ https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-889: Status: Patch Available (was: In Progress) > Writer supports useJdbc configuration when hive synchronization is enabled > -- > > Key: HUDI-889 > URL: https://issues.apache.org/jira/browse/HUDI-889 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Writer Core >Reporter: dzcxzl >Priority: Trivial > > hudi-hive-sync supports the useJdbc = false configuration, but the writer > does not provide this configuration at this stage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled
[ https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HUDI-889: Status: Open (was: New) > Writer supports useJdbc configuration when hive synchronization is enabled > -- > > Key: HUDI-889 > URL: https://issues.apache.org/jira/browse/HUDI-889 > Project: Apache Hudi (incubating) > Issue Type: Improvement > Components: Writer Core >Reporter: dzcxzl >Priority: Trivial > > hudi-hive-sync supports the useJdbc = false configuration, but the writer > does not provide this configuration at this stage -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled
dzcxzl created HUDI-889: --- Summary: Writer supports useJdbc configuration when hive synchronization is enabled Key: HUDI-889 URL: https://issues.apache.org/jira/browse/HUDI-889 Project: Apache Hudi (incubating) Issue Type: Improvement Components: Writer Core Reporter: dzcxzl hudi-hive-sync supports the useJdbc = false configuration, but the writer does not currently expose this configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)