[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc delete records
[ https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-5798:
----------------------
    Summary: spark sql query fail on mor table after flink cdc delete records
    (was: spark sql query fail on mor table after flink cdc application delete records)

Key: HUDI-5798
URL: https://issues.apache.org/jira/browse/HUDI-5798
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Priority: Major
Labels: pull-request-available

After a Flink CDC application deletes records from a MOR table, Spark SQL queries on that table fail with the exception below:

Serialization trace:
orderingVal (org.apache.hudi.common.model.DeleteRecord)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
    at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104)
    at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78)
    at org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106)
    at org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
    ... 23 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.avro.util.Utf8
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
    ... 37 more

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc application delete records
[ https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-5798:
----------------------
    Summary: spark sql query fail on mor table after flink cdc application delete records
    (was: spark-sql query fail on mor table after flink cdc application delete records)

(The quoted issue description and stack trace are identical to those in the update notification above.)
[jira] [Commented] (HUDI-5798) spark-sql query fail on mor table after flink cdc application delete records
[ https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689110#comment-17689110 ]

lrz commented on HUDI-5798:
---------------------------
I worked around this issue by adding a specially shaded Avro jar under spark/jars, but that does not seem like a good approach to introduce into the Hudi project itself.

(The quoted issue description and stack trace are identical to those in the update notification above.)
[jira] [Created] (HUDI-5805) hive query on mor get empty result before compaction
lrz created HUDI-5805:
----------------------
Summary: hive query on mor get empty result before compaction
Key: HUDI-5805
URL: https://issues.apache.org/jira/browse/HUDI-5805
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Attachments: image-2023-02-15-20-48-08-819.png, image-2023-02-15-20-48-21-988.png

When a MOR table is written by Flink CDC only, its partitions contain only log files (no base files) until compaction runs. Before compaction, a Hive query on the table always returns an empty result. The cause: when Hive computes splits for a native table, it ignores partitions whose files all start with '.', and because Hudi does not set a storage handler when syncing the Hive metadata, Hive treats the table as a native table.

!image-2023-02-15-20-48-08-819.png!
!image-2023-02-15-20-48-21-988.png!
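The hidden-file behavior described above can be sketched as follows. This is a minimal illustration of the convention (files starting with '.' or '_' are treated as hidden, as in Hadoop's FileInputFormat); the file names are hypothetical, not Hudi's exact naming scheme.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class HiddenFileFilterDemo {
    // Mirrors the convention used by Hadoop's FileInputFormat:
    // names starting with '.' or '_' are hidden and skipped during split computation.
    static boolean isHidden(String fileName) {
        return fileName.startsWith(".") || fileName.startsWith("_");
    }

    public static void main(String[] args) {
        // Hypothetical names: MOR log files are dot-prefixed, so before
        // compaction a partition contains only hidden files.
        List<String> beforeCompaction = Arrays.asList(".f1d2-0001.log.1_0-1-0");

        List<String> visible = beforeCompaction.stream()
                .filter(f -> !isHidden(f))
                .collect(Collectors.toList());

        // With no visible files, the partition looks empty and the query returns no rows.
        System.out.println(visible.isEmpty()); // prints true
    }
}
```

Once compaction produces a base file (e.g. `f1d2-0001.parquet`), the partition has a visible file again and the query sees data, which matches the observed behavior.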
[jira] [Created] (HUDI-5798) spark-sql query fail on mor table after flink cdc application delete records
lrz created HUDI-5798:
----------------------
Summary: spark-sql query fail on mor table after flink cdc application delete records
Key: HUDI-5798
URL: https://issues.apache.org/jira/browse/HUDI-5798
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz

After a Flink CDC application deletes records from a MOR table, Spark SQL queries on that table fail: Kryo deserialization of the delete block's DeleteRecord orderingVal ends in java.lang.ClassNotFoundException for the relocated class org.apache.hudi.org.apache.avro.util.Utf8. (The full stack trace is identical to the one quoted in the update notifications above.)
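The failure mode in the trace above comes from Kryo recording the writer-side, fully-qualified class name in the serialized bytes: if the writer ran with a bundle that relocated Avro to `org.apache.hudi.org.apache.avro`, the reader must resolve that exact name. A minimal sketch of what the reader-side resolution hits (the class name is taken from the trace; whether it is present depends on which bundle jars are on the classpath):

```java
public class RelocatedClassDemo {
    public static void main(String[] args) {
        // Kryo's DefaultClassResolver.readName ultimately calls Class.forName
        // on the name that the *writer* serialized.
        String relocated = "org.apache.hudi.org.apache.avro.util.Utf8";
        try {
            Class.forName(relocated);
            System.out.println("resolved: reader classpath contains the relocated Avro classes");
        } catch (ClassNotFoundException e) {
            // This is the situation the Spark reader is in when its bundle
            // does not ship the relocated Avro classes.
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```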
[jira] [Resolved] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
[ https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz resolved HUDI-1759.
-----------------------
    Resolution: Fixed

Key: HUDI-1759
URL: https://issues.apache.org/jira/browse/HUDI-1759
Project: Apache Hudi
Issue Type: Improvement
Reporter: lrz
Priority: Major
Labels: pull-request-available
Fix For: 0.9.0
Attachments: image-2021-04-02-15-43-15-854.png, image-2021-04-02-15-48-42-895.png

When syncing metadata to Hive with useJdbc=false, there are two problems.

First, if the Hive server has Kerberos enabled, the synced metadata is missing the owner; see the metadata here (tested with Hive 3.1.1):
!image-2021-04-02-15-43-15-854.png!

Second, every syncToHive call performs an unnecessary connection retry against the Hive metastore; the same exception also shows up in the UT "TestHiveSyncTool.testBasicSync":
!image-2021-04-02-15-48-42-895.png!

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1744) [Rollback] rollback fail on mor table when the partition path hasn't any files
[ https://issues.apache.org/jira/browse/HUDI-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz resolved HUDI-1744.
-----------------------
    Resolution: Fixed

Key: HUDI-1744
URL: https://issues.apache.org/jira/browse/HUDI-1744
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Priority: Major
Labels: pull-request-available
Fix For: 0.9.0

When rolling back a MOR table whose partition path contains no files, the rollback throws an exception because rdd.flatMap is called with 0 as the number of partitions.
[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column
[ https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1779:
----------------------
    Attachment: upsertFail.png

Key: HUDI-1779
URL: https://issues.apache.org/jira/browse/HUDI-1779
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Priority: Major
Fix For: 0.9.0
Attachments: unsupportInt96.png, upsertFail.png, upsertFail2.png

Currently, when Hudi bootstraps a parquet file, or upserts into a parquet file that contains a timestamp column, it fails for these reasons:

1) During bootstrap, if the original parquet file was written by a Spark application, Spark saves timestamps as int96 by default (see spark.sql.parquet.int96AsTimestamp), and the bootstrap fails because Hudi cannot read the Int96 type yet. (This can be solved by upgrading parquet to 1.12.0 and setting parquet.avro.readInt96AsFixed=true; please check https://github.com/apache/parquet-mr/pull/831/files)

2) After bootstrap, upsert fails because the Hudi schema is used to read the original parquet file, and the schemas do not match: the Hudi schema treats the timestamp as long, while the original file stores it as Int96.

3) After bootstrap, a partial update of a parquet file fails because the old record is copied and saved with the Hudi schema (a convertFixedToLong step, like the one Spark performs, is missing).
[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column
[ https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1779:
----------------------
    Attachment: unsupportInt96.png

(The quoted issue description is identical to the one in the update notification above.)
[jira] [Updated] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column
[ https://issues.apache.org/jira/browse/HUDI-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1779:
----------------------
    Attachment: upsertFail2.png

(The quoted issue description is identical to the one in the update notification above.)
[jira] [Created] (HUDI-1779) Fail to bootstrap/upsert a table which contains timestamp column
lrz created HUDI-1779:
----------------------
Summary: Fail to bootstrap/upsert a table which contains timestamp column
Key: HUDI-1779
URL: https://issues.apache.org/jira/browse/HUDI-1779
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Fix For: 0.9.0

(The issue description is identical to the one quoted in the update notifications above.)
[jira] [Resolved] (HUDI-1750) Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath
[ https://issues.apache.org/jira/browse/HUDI-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz resolved HUDI-1750.
-----------------------
    Resolution: Fixed

Key: HUDI-1750
URL: https://issues.apache.org/jira/browse/HUDI-1750
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Priority: Major
Labels: pull-request-available
Fix For: 0.9.0
Attachments: image-2021-04-01-10-55-43-760.png

Hudi uses Class.forName(clazzName) to load the user's classes, which resolves against the caller's classloader; see here:
!image-2021-04-01-10-55-43-760.png!

If the user moves the hudi-spark-bundle jar into the Spark classpath and adds custom jars with --jars, the caller's classloader is the AppClassLoader, while the custom jars are loaded by Spark's MutableURLClassLoader, which leads to a ClassNotFoundException.
[jira] [Resolved] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer
[ https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz resolved HUDI-1751.
-----------------------
    Resolution: Fixed

Key: HUDI-1751
URL: https://issues.apache.org/jira/browse/HUDI-1751
Project: Apache Hudi
Issue Type: Improvement
Reporter: lrz
Priority: Minor
Labels: pull-request-available
Fix For: 0.9.0

Because Kafka parameters and Hudi configs are placed in the same properties file (such as kafka-source.properties), the kafkaParams object built from it also picks up Hudi configs, which leads to the warning logs:
!https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/76572ba9f4094fb29b018db91fbf1450/image.png!
[jira] [Resolved] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail
[ https://issues.apache.org/jira/browse/HUDI-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz resolved HUDI-1749.
-----------------------
    Resolution: Fixed

Key: HUDI-1749
URL: https://issues.apache.org/jira/browse/HUDI-1749
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Priority: Major
Labels: pull-request-available
Fix For: 0.9.0

There are two issues:

1) After a Clean/Compaction/Rollback command finishes, the YARN application always shows as failed, because the command exits directly without waiting for the SparkContext to stop.

2) When a Clean/Compaction/Rollback command fails with an exception, the command never exits, because the SparkContext is not stopped. The Spark UI uses Jetty, which introduces non-daemon threads, and sparkContext.stop() stops the UI and thereby those non-daemon threads.
[jira] [Updated] (HUDI-1751) DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer
[ https://issues.apache.org/jira/browse/HUDI-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1751:
----------------------
    Summary: DeltaStream print many unnecessary warn log because of passing hoodie config to kafka consumer
    (was: DeltaStream print many unnecessary warn log)

(The quoted issue description is identical to the one in the resolved notification above.)
[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
[ https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1759:
----------------------
    Description: updated to append the second screenshot, !image-2021-04-02-15-48-42-895.png!, to the existing description.

(The quoted issue description is otherwise identical to the one in the resolved notification above.)
[jira] [Updated] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
[ https://issues.apache.org/jira/browse/HUDI-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1759:
----------------------
    Attachment: image-2021-04-02-15-48-42-895.png

(The quoted issue description is identical to the one in the resolved notification above.)
[jira] [Created] (HUDI-1759) Save one connection retry when hiveSyncTool run with useJdbc=false
lrz created HUDI-1759:
----------------------
Summary: Save one connection retry when hiveSyncTool run with useJdbc=false
Key: HUDI-1759
URL: https://issues.apache.org/jira/browse/HUDI-1759
Project: Apache Hudi
Issue Type: Improvement
Reporter: lrz
Fix For: 0.9.0
Attachments: image-2021-04-02-15-43-15-854.png

(The issue description is identical to the one quoted in the resolved notification above.)
[jira] [Created] (HUDI-1751) DeltaStream print many unnecessary warn log
lrz created HUDI-1751:
----------------------
Summary: DeltaStream print many unnecessary warn log
Key: HUDI-1751
URL: https://issues.apache.org/jira/browse/HUDI-1751
Project: Apache Hudi
Issue Type: Improvement
Reporter: lrz
Fix For: 0.9.0

(The issue description is identical to the one quoted in the resolved notification above.)
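One way to avoid the warnings described in HUDI-1751 (a sketch of the general pattern, not necessarily the fix that was merged) is to filter the `hoodie.`-prefixed keys out of the shared properties file before handing the map to the Kafka consumer:

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class KafkaParamsFilter {
    // Keep only non-Hudi entries, so the Kafka consumer never sees
    // hoodie.* keys and stops warning about unknown configs.
    static Map<String, Object> kafkaOnly(Properties props) {
        return props.stringPropertyNames().stream()
                .filter(key -> !key.startsWith("hoodie."))
                .collect(Collectors.toMap(key -> key, props::getProperty));
    }

    public static void main(String[] args) {
        // A mixed properties file like kafka-source.properties.
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("group.id", "deltastreamer");
        p.setProperty("hoodie.datasource.write.recordkey.field", "id"); // would trigger the warning

        System.out.println(kafkaOnly(p).keySet()); // hoodie.* key is gone
    }
}
```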
[jira] [Created] (HUDI-1750) Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath
lrz created HUDI-1750:
----------------------
Summary: Fail to load user's class if user move hudi-spark-bundle_2.11-0.7.0.jar into spark classpath
Key: HUDI-1750
URL: https://issues.apache.org/jira/browse/HUDI-1750
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Fix For: 0.9.0
Attachments: image-2021-04-01-10-55-43-760.png

(The issue description is identical to the one quoted in the resolved notification above.)
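A common remedy for this kind of classloader mismatch (a sketch of the general pattern, not necessarily the patch merged for HUDI-1750) is to try the thread context classloader first, which Spark points at the loader that knows about `--jars`, before falling back to plain Class.forName:

```java
public class ClassLoadingUtil {
    // Resolve a class via the thread context classloader when available,
    // falling back to the caller's defining classloader otherwise.
    static Class<?> loadClass(String name) throws ClassNotFoundException {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        if (ctx != null) {
            try {
                return Class.forName(name, true, ctx);
            } catch (ClassNotFoundException ignored) {
                // fall through to the defining classloader
            }
        }
        return Class.forName(name);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadClass("java.lang.String").getName()); // prints java.lang.String
    }
}
```

With this pattern, a bundle jar sitting on the AppClassLoader can still load user classes that only Spark's MutableURLClassLoader knows about, because that loader is installed as the context classloader on Spark task threads.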
[jira] [Created] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail
lrz created HUDI-1749:
----------------------
Summary: Clean/Compaction/Rollback command maybe never exit when operation fail
Key: HUDI-1749
URL: https://issues.apache.org/jira/browse/HUDI-1749
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz

(The issue description is identical to the one quoted in the resolved notification above.)
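The hang mechanism behind HUDI-1749 is standard JVM behavior: the JVM only exits once every non-daemon thread has finished, so Jetty's non-daemon UI threads keep the process alive until SparkContext.stop() shuts them down. A minimal illustration of the daemon/non-daemon distinction (plain Java, no Spark or Jetty involved):

```java
public class DaemonThreadDemo {
    public static void main(String[] args) {
        // Threads created from a non-daemon thread (like main) are
        // non-daemon by default: while one is alive, the JVM will not exit.
        Thread jettyLike = new Thread(() -> {});
        System.out.println("non-daemon by default: " + !jettyLike.isDaemon()); // prints true

        // A daemon thread, by contrast, never blocks JVM exit.
        jettyLike.setDaemon(true);
        System.out.println("daemon now: " + jettyLike.isDaemon()); // prints true
    }
}
```

This is why the fix direction described in the issue is to always stop the SparkContext (success or failure) before the command returns, e.g. in a finally block.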
[jira] [Updated] (HUDI-1749) Clean/Compaction/Rollback command maybe never exit when operation fail
[ https://issues.apache.org/jira/browse/HUDI-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lrz updated HUDI-1749:
----------------------
    Fix Version/s: 0.9.0

(The quoted issue description is identical to the one in the resolved notification above.)
[jira] [Created] (HUDI-1748) Read operation may fail on mor table rt view when a write operation is running concurrently
lrz created HUDI-1748:
----------------------
Summary: Read operation may fail on mor table rt view when a write operation is running concurrently
Key: HUDI-1748
URL: https://issues.apache.org/jira/browse/HUDI-1748
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Fix For: 0.9.0

During a read operation, a new base file may be produced by a concurrent write operation; the read may then hit an NPE in getSplit. Here is the exception stack:
!https://wa.vision.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/2/15/qwx352829/7bacca8042104499b0991d50b4bc3f2a/image.png!
[jira] [Created] (HUDI-1744) [Rollback] rollback fail on mor table when the partition path hasn't any files
lrz created HUDI-1744:
----------------------
Summary: [Rollback] rollback fail on mor table when the partition path hasn't any files
Key: HUDI-1744
URL: https://issues.apache.org/jira/browse/HUDI-1744
Project: Apache Hudi
Issue Type: Bug
Reporter: lrz
Fix For: 0.9.0

(The issue description is identical to the one quoted in the resolved notification above.)
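The zero-partition failure in HUDI-1744 suggests a simple guard before the distributed step. This is a hedged sketch of that guard in plain Java (the method name and shape are illustrative, not Hudi's actual API): skip the rollback work entirely when there is nothing to roll back, and otherwise never request fewer than one partition.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RollbackPartitionGuard {
    // Never ask Spark for zero partitions: clamp between 1 and the
    // configured maximum, bounded by the number of files to process.
    static int safeParallelism(List<String> filesToRollback, int maxParallelism) {
        return Math.max(1, Math.min(filesToRollback.size(), maxParallelism));
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();
        if (empty.isEmpty()) {
            // Short-circuit: nothing to distribute, so skip the flatMap call.
            System.out.println("nothing to roll back, skipping flatMap");
        }
        System.out.println(safeParallelism(empty, 100));                    // prints 1, never 0
        System.out.println(safeParallelism(Arrays.asList("f1", "f2"), 100)); // prints 2
    }
}
```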
[jira] [Commented] (HUDI-57) [UMBRELLA] Support ORC Storage
[ https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235994#comment-17235994 ]

lrz commented on HUDI-57:
-------------------------
Hi [~vinoth], we are eager to use this feature. Could you share any updates when you are free? Also, if you could help break the work down into sub-tasks, we would be glad to pick some of them up. Thank you very much.

Key: HUDI-57
URL: https://issues.apache.org/jira/browse/HUDI-57
Project: Apache Hudi
Issue Type: Improvement
Components: Hive Integration, Writer Core
Reporter: Vinoth Chandar
Assignee: Mani Jindal
Priority: Major
Labels: pull-request-available
Time Spent: 20m
Remaining Estimate: 0h

https://github.com/uber/hudi/issues/68
https://github.com/uber/hudi/issues/155