[jira] [Closed] (HUDI-3857) NoSuchMethodError: Continuous deltastreamer test with async compaction fails on EMR spark
[ https://issues.apache.org/jira/browse/HUDI-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-3857. - Resolution: Cannot Reproduce This happens with EMR Spark; it does not reproduce with OSS Spark. Either use OSS Spark or replace the EMR-specific spark-sql-amzn*.jar with the OSS spark-sql jar: https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.12/3.1.2/spark-sql_2.12-3.1.2.jar
> NoSuchMethodError: Continuous deltastreamer test with async compaction fails
> on EMR spark
> --
>
> Key: HUDI-3857
> URL: https://issues.apache.org/jira/browse/HUDI-3857
> Project: Apache Hudi
> Issue Type: Test
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 0.11.0
>
> EMR 6.5, Spark 3.1.2
> While running continuous deltastreamer with async compaction enabled, I hit
> this exception:
> {code:java}
> Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
>     at org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:130)
>     at scala.Option.map(Option.scala:230)
>     at org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:128)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>     at scala.collection.immutable.List.map(List.scala:298)
>     at org.apache.hudi.MergeOnReadSnapshotRelation.buildSplits(MergeOnReadSnapshotRelation.scala:124)
>     at org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:108)
>     at org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:44)
>     at org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:221)
> {code}
-- This message was sent by Atlassian Jira (v8.20.1#820001)
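The resolution above amounts to swapping one jar on the cluster. A minimal Python sketch of that swap, under assumptions not stated in the thread: the helper name is hypothetical, and EMR's jar directory is assumed to be /usr/lib/spark/jars.

```python
# Sketch of the suggested workaround: move the EMR-specific
# spark-sql-amzn*.jar aside so the OSS spark-sql jar can be dropped in.
# The default jars_dir is an assumption (EMR typically uses
# /usr/lib/spark/jars); nothing here is a Hudi or EMR API.
import glob
import os

OSS_SPARK_SQL_JAR = (
    "https://repo1.maven.org/maven2/org/apache/spark/"
    "spark-sql_2.12/3.1.2/spark-sql_2.12-3.1.2.jar"
)

def sideline_emr_spark_sql_jars(jars_dir="/usr/lib/spark/jars"):
    """Rename spark-sql-amzn*.jar to *.bak and return the renamed paths."""
    moved = []
    for jar in glob.glob(os.path.join(jars_dir, "spark-sql-amzn*.jar")):
        os.rename(jar, jar + ".bak")
        moved.append(jar + ".bak")
    return moved
```

After sidelining the EMR jar, the OSS jar at `OSS_SPARK_SQL_JAR` would be downloaded into the same directory and the job restarted.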
[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`
hudi-bot commented on PR #5296: URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096197206 ## CI report: * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002) * 9458d847182b0628d228211d010310ade743d431 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-3843) Make Flink 1.13.x 1.14.x build with scala 2.11
[ https://issues.apache.org/jira/browse/HUDI-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3843: - Reviewers: Ethan Guo > Make Flink 1.13.x 1.14.x build with scala 2.11 > -- > > Key: HUDI-3843 > URL: https://issues.apache.org/jira/browse/HUDI-3843 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Status: Patch Available (was: In Progress) > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
[ https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3855: - Reviewers: Raymond Xu, sivabalan narayanan > Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle > > > Key: HUDI-3855 > URL: https://issues.apache.org/jira/browse/HUDI-3855 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > As was reported by the user here: > [https://github.com/apache/hudi/issues/5231] > > Quoting: > So i was able to reproduce behavior that you're seeing and it turns out to be > that {{_hoodie_file_name}} is simply not updated during Commit 3, meaning > that during C3, all records are copied from latest base-file of the > file-group into new latest base-file (in your most recent experiment it's > {{{}c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet{}}}) > but it doesn't update the {{_hoodie_file_name}} field which is kept pointing > at the old file. -- This message was sent by Atlassian Jira (v8.20.1#820001)
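The quoted diagnosis boils down to one missing step: when an unchanged record is copied into a new base file during a merge, its `_hoodie_file_name` metadata field must be rewritten to the new file. A toy Python sketch of that step (dict-based records standing in for illustration only; this is not Hudi's actual Java `HoodieMergeHandle` code):

```python
# Illustrative sketch of the bug described above: a record carried over
# into a new base file must have its _hoodie_file_name refreshed,
# otherwise it keeps pointing at the old base file.
FILENAME_METADATA_FIELD = "_hoodie_file_name"

def carry_over_record(record, new_base_file):
    """Copy a record into a new base file, refreshing its filename field."""
    updated = dict(record)
    updated[FILENAME_METADATA_FIELD] = new_base_file  # the step that was missing
    return updated
```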
[jira] [Updated] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized
[ https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3738: - Reviewers: sivabalan narayanan > Perf comparison between parquet and hudi for COW snapshot and MOR read > optimized > > > Key: HUDI-3738 > URL: https://issues.apache.org/jira/browse/HUDI-3738 > Project: Apache Hudi > Issue Type: Task > Components: performance >Reporter: sivabalan narayanan >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3839) List of MT partitions to be updated is selected incorrectly
[ https://issues.apache.org/jira/browse/HUDI-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3839: - Reviewers: Sagar Sumit > List of MT partitions to be updated is selected incorrectly > --- > > Key: HUDI-3839 > URL: https://issues.apache.org/jira/browse/HUDI-3839 > Project: Apache Hudi > Issue Type: Bug >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
[ https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3855: - Status: Patch Available (was: In Progress)
[GitHub] [hudi] hudi-bot commented on pull request #5279: [HUDI-3843] Make flink profiles build with scala-2.11
hudi-bot commented on PR #5279: URL: https://github.com/apache/hudi/pull/5279#issuecomment-1096192323
## CI report:
* a6ad82e1a6d7c392f1f1b53937f4c5395b620c05 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8000)
* 66bc1d1b54b7d5d7fbbc4db8e29b4ced675c2c8d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8004)
[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
[ https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3855: - Status: In Progress (was: Open)
[GitHub] [hudi] easonwood commented on issue #5290: [SUPPORT] Problems in handling column deletions in Hudi
easonwood commented on issue #5290: URL: https://github.com/apache/hudi/issues/5290#issuecomment-1096191600 It seems this error does not affect the result; the data was loaded into Hudi successfully.
[GitHub] [hudi] hudi-bot commented on pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…
hudi-bot commented on PR #4724: URL: https://github.com/apache/hudi/pull/4724#issuecomment-1096191258
## CI report:
* 20b1ee41afcb4cc5328bfb30e51e5a37bf0d46c7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7999)
[GitHub] [hudi] hudi-bot commented on pull request #5279: [HUDI-3843] Make flink profiles build with scala-2.11
hudi-bot commented on PR #5279: URL: https://github.com/apache/hudi/pull/5279#issuecomment-1096187448
## CI report:
* 9a6be184e3833e071054afc5d0db55bb2336dd5c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7997)
* a6ad82e1a6d7c392f1f1b53937f4c5395b620c05 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8000)
* 66bc1d1b54b7d5d7fbbc4db8e29b4ced675c2c8d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #5297: [HUDI-3859] Fix spark profiles and utilities-slim dep
hudi-bot commented on PR #5297: URL: https://github.com/apache/hudi/pull/5297#issuecomment-1096182625
## CI report:
* cc81ebb8f2b84b9ada13927de9c30a1b69864f2f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8003)
[GitHub] [hudi] zhilinli123 commented on issue #4881: Full incremental Enable index loading to discover duplicate data(index.bootstrap.enabled)
zhilinli123 commented on issue #4881: URL: https://github.com/apache/hudi/issues/4881#issuecomment-1096179551 @nsivabalan Will the current issue be fixed when the next version is released?
[GitHub] [hudi] hudi-bot commented on pull request #5297: [HUDI-3859] Fix spark profiles and utilities-slim dep
hudi-bot commented on PR #5297: URL: https://github.com/apache/hudi/pull/5297#issuecomment-1096178036
## CI report:
* cc81ebb8f2b84b9ada13927de9c30a1b69864f2f UNKNOWN
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3859: - Labels: pull-request-available (was: )
[GitHub] [hudi] xushiyan opened a new pull request, #5297: [HUDI-3859] Fix spark profiles and utilities-slim dep
xushiyan opened a new pull request, #5297: URL: https://github.com/apache/hudi/pull/5297
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`
hudi-bot commented on PR #5296: URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096165047
## CI report:
* e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002)
[jira] [Updated] (HUDI-3845) Fix delete mor table's partition with urlencode's error
[ https://issues.apache.org/jira/browse/HUDI-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3845: - Priority: Critical (was: Major)
> Fix delete mor table's partition with urlencode's error
> --
>
> Key: HUDI-3845
> URL: https://issues.apache.org/jira/browse/HUDI-3845
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark-sql
> Reporter: Forward Xu
> Assignee: Forward Xu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.0
>
> {code:java}
> // code placeholder
> 4604 [ScalaTest-run-running-TestDeleteTable] WARN org.apache.hudi.common.config.DFSPropertiesConfiguration - Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
> 5412 [ScalaTest-run-running-TestDeleteTable] WARN org.apache.hudi.metadata.HoodieBackedTableMetadata - Metadata table was not found at path /private/var/folders/9d/qtc20f6x197431jvthgs58zrgn/T/spark-ce5c11fc-3e1d-4967-aaba-7c814f503224/h0/.hoodie/metadata
> 7658 [Executor task launch worker for task 8] WARN org.apache.hadoop.metrics2.impl.MetricsConfig - Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 20677 [Executor task launch worker for task 106] ERROR org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Got IOException when reading log file
> java.io.FileNotFoundException: File file:/private/var/folders/9d/qtc20f6x197431jvthgs58zrgn/T/spark-ce5c11fc-3e1d-4967-aaba-7c814f503224/h0/2021%252F10%252F01/.a2c7f463-7f0e-48b9-aa66-e56d62847b05-0_20220410194408951.log.1_0-83-88 does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:634)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:860)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:624)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:446)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>     at org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:472)
>     at org.apache.hudi.common.table.log.HoodieLogFileReader.<init>(HoodieLogFileReader.java:111)
>     at org.apache.hudi.common.table.log.HoodieLogFormatReader.<init>(HoodieLogFormatReader.java:70)
>     at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:218)
>     at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:191)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:106)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:99)
>     at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:317)
>     at org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:370)
>     at org.apache.hudi.HoodieMergeOnReadRDD$LogFileIterator.<init>(HoodieMergeOnReadRDD.scala:172)
>     at org.apache.hudi.HoodieMergeOnReadRDD$RecordMergingFileIterator.<init>(HoodieMergeOnReadRDD.scala:252)
> {code}
> The partition path is URL-encoded, so the file cannot be found.
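The FileNotFoundException above shows the partition rendered as 2021%252F10%252F01, i.e. the partition path 2021/10/01 URL-encoded twice ("%2F" re-encoded as "%252F"). The double encoding behind the failure is easy to reproduce:

```python
# Reproduce the double URL-encoding seen in the log-file path above.
from urllib.parse import quote

partition = "2021/10/01"
once = quote(partition, safe="")   # encodes "/" as "%2F"
twice = quote(once, safe="")       # re-encodes "%" as "%25", yielding "%252F"
```

A path encoded once would still resolve if decoded once; encoding it a second time produces a literal directory name that never exists on disk.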
[jira] [Updated] (HUDI-3845) Fix delete mor table's partition with urlencode's error
[ https://issues.apache.org/jira/browse/HUDI-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3845: - Fix Version/s: 0.12.0
[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`
hudi-bot commented on PR #5296: URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096159804
## CI report:
* e5d566882c6f3ed58a65a01065f0ae99dfb420b2 UNKNOWN
[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
[ https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3855: - Labels: pull-request-available (was: )
[GitHub] [hudi] alexeykudinkin opened a new pull request, #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`
alexeykudinkin opened a new pull request, #5296: URL: https://github.com/apache/hudi/pull/5296
## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
## What is the purpose of the pull request
Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle` in cases when the old record is carried over from the existing file as is.
## Brief change log
- Revisited the HoodieFileWriter API to accept `HoodieKey` instead of `HoodieRecord`
- Fixed `FILENAME_METADATA_FIELD` not being overridden in cases when the old record is simply carried over
- Exposed standard JVM debugger ports in the Docker setup
## Verify this pull request
This pull request is already covered by existing tests, such as *(please describe tests)*. This change added tests and can be verified as follows:
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Assigned] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3819: Assignee: Sagar Sumit (was: Raymond Xu) > upgrade spring cve-2022-22965 > - > > Key: HUDI-3819 > URL: https://issues.apache.org/jira/browse/HUDI-3819 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.9.0, 0.10.1 >Reporter: Jason-Morries Adam >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 0.11.0 > > > We should upgrade the Spring Framework version at Hudi CLI because of > cve-2022-22965. The Qualys Scanner finds these packages and raises a warning > because of the existence of these files on the system. > The found files are: > /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar > /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar > More Information: > Spring Framework: https://spring.io/projects/spring-framework > Spring project spring-framework release notes: > https://github.com/spring-projects/spring-framework/releases > CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Reviewers: Raymond Xu (was: Sagar Sumit)
[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Status: In Progress (was: Open)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Status: In Progress (was: Open)
[jira] [Updated] (HUDI-3746) CI ignored test failure in TestDataSkippingUtils
[ https://issues.apache.org/jira/browse/HUDI-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3746: - Priority: Blocker (was: Major) > CI ignored test failure in TestDataSkippingUtils > > > Key: HUDI-3746 > URL: https://issues.apache.org/jira/browse/HUDI-3746 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: ci.log.zip > > > A failure in > TestDataSkippingUtils > was ignored. Something to do with JUnit in Scala, maybe? > See the attached CI logs and search for `TestDataSkippingUtils` -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3746) CI ignored test failure in TestDataSkippingUtils
[ https://issues.apache.org/jira/browse/HUDI-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3746: - Fix Version/s: 0.12.0 (was: 0.11.0) > CI ignored test failure in TestDataSkippingUtils > > > Key: HUDI-3746 > URL: https://issues.apache.org/jira/browse/HUDI-3746 > Project: Apache Hudi > Issue Type: Bug > Components: tests-ci >Reporter: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.12.0 > > Attachments: ci.log.zip > > > A failure in > TestDataSkippingUtils > was ignored. Something to do with JUnit in Scala, maybe? > See the attached CI logs and search for `TestDataSkippingUtils` -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1976: - Component/s: dependencies > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies, hive >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Critical > Labels: pull-request-available > Fix For: 0.12.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824] > [https://github.com/apache/hudi/issues/2823] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1976: - Issue Type: Improvement (was: Task) > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Improvement > Components: hive >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Critical > Labels: pull-request-available > Fix For: 0.12.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824] > [https://github.com/apache/hudi/issues/2823] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1976: - Fix Version/s: 0.12.0 (was: 0.11.0) > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > > > Key: HUDI-1976 > URL: https://issues.apache.org/jira/browse/HUDI-1976 > Project: Apache Hudi > Issue Type: Task > Components: hive >Reporter: Nishith Agarwal >Assignee: Vinay >Priority: Critical > Labels: pull-request-available > Fix For: 0.12.0 > > > [https://github.com/apache/hudi/issues/2827] > [https://github.com/apache/hudi/issues/2826] > [https://github.com/apache/hudi/issues/2824] > [https://github.com/apache/hudi/issues/2823] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3096) fixed the bug that the cow table(contains decimalType) write by flink cannot be read by spark
[ https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3096: Assignee: Tao Meng > fixed the bug that the cow table(contains decimalType) write by flink cannot > be read by spark > -- > > Key: HUDI-3096 > URL: https://issues.apache.org/jira/browse/HUDI-3096 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.10.0 > Environment: flink 1.13.1 > spark 3.1.1 >Reporter: Tao Meng >Assignee: Tao Meng >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > Currently, Flink writes decimalType as byte[]. > When Spark reads that decimal type, if Spark finds the precision of the > decimal is small, it treats it as int/long, which causes the following error: > > Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet > column cannot be converted in file > hdfs://x/tmp/hudi/hudi_x/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet. > Column: [c7], Expected: decimal(10,4), Found: BINARY > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown > Source) -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3096) fixed the bug that the cow table(contains decimalType) write by flink cannot be read by spark
[ https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3096. Resolution: Fixed > fixed the bug that the cow table(contains decimalType) write by flink cannot > be read by spark > -- > > Key: HUDI-3096 > URL: https://issues.apache.org/jira/browse/HUDI-3096 > Project: Apache Hudi > Issue Type: Bug > Components: flink >Affects Versions: 0.10.0 > Environment: flink 1.13.1 > spark 3.1.1 >Reporter: Tao Meng >Assignee: Tao Meng >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > Currently, Flink writes decimalType as byte[]. > When Spark reads that decimal type, if Spark finds the precision of the > decimal is small, it treats it as int/long, which causes the following error: > > Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet > column cannot be converted in file > hdfs://x/tmp/hudi/hudi_x/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet. > Column: [c7], Expected: decimal(10,4), Found: BINARY > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93) > at > org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown > Source) -- This message was sent by Atlassian Jira (v8.20.1#820001)
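[Editor's note] The HUDI-3096 decimal mismatch above comes down to the Parquet format's physical-type rule for DECIMAL: precisions up to 9 fit in INT32, up to 18 fit in INT64, and larger ones need a fixed/binary encoding. Spark's vectorized Parquet reader applies this rule, so a decimal(10,4) column stored as BINARY by the Flink writer fails to convert, as in the stack trace. A minimal sketch of the rule in illustrative Python (the function name is ours, not a Hudi or Spark API):

```python
# Sketch of the Parquet physical-type rule that Spark's vectorized reader
# applies to DECIMAL columns (per the Parquet format spec).

def expected_physical_type(precision: int) -> str:
    """Return the Parquet physical type expected for a decimal of this precision."""
    if precision <= 9:
        return "INT32"   # value fits in 4 bytes
    if precision <= 18:
        return "INT64"   # value fits in 8 bytes
    return "FIXED_LEN_BYTE_ARRAY"

# decimal(10,4) from the stack trace: an INT64-backed encoding is expected,
# but the writer produced BINARY, hence "Expected: decimal(10,4), Found: BINARY".
print(expected_physical_type(10))  # INT64
```

The fix in the ticket is on the write side, so that both engines agree on the encoding; the sketch only explains why the reader rejected the column.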
[jira] [Updated] (HUDI-3680) Update docs to reflect new Bundles Spark compatibility
[ https://issues.apache.org/jira/browse/HUDI-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3680: - Story Points: 1 (was: 2) > Update docs to reflect new Bundles Spark compatibility > --- > > Key: HUDI-3680 > URL: https://issues.apache.org/jira/browse/HUDI-3680 > Project: Apache Hudi > Issue Type: Task >Reporter: Alexey Kudinkin >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > We need to make sure that we reflect the new Spark compatibility approach for > Hudi bundles (pledging to stay compatible w/in Spark minor version branch) > Channels to update: > # Dev-list > # Docs on the website > # Docs in README > # Slack? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #5295: [HUDI-3858]Shade javax.servlet for hudi-spark-bundle
hudi-bot commented on PR #5295: URL: https://github.com/apache/hudi/pull/5295#issuecomment-1096115125 ## CI report: * c900ecc8741fdcab9c1a4c156e410d6e8462a457 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7998) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks
[ https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520924#comment-17520924 ] Raymond Xu commented on HUDI-3749: -- [~shivnarayan] what are the done criteria for this ticket? Can you please put them down in the description? > Run latest hudi w/ EMR spark and report to aws folks > > > Key: HUDI-3749 > URL: https://issues.apache.org/jira/browse/HUDI-3749 > Project: Apache Hudi > Issue Type: Task > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2606) Ensure query engines not access MDT if disabled
[ https://issues.apache.org/jira/browse/HUDI-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2606: - Reviewers: Ethan Guo (was: Ethan Guo, sivabalan narayanan) > Ensure query engines not access MDT if disabled > --- > > Key: HUDI-2606 > URL: https://issues.apache.org/jira/browse/HUDI-2606 > Project: Apache Hudi > Issue Type: Task > Components: metadata, reader-core >Reporter: sivabalan narayanan >Assignee: Tao Meng >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > This is to visit all the read code paths and ensure when metadata is > disabled, query engines won't read from metadata table. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3826) Commands deleting partitions do so incorrectly
[ https://issues.apache.org/jira/browse/HUDI-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3826: - Reviewers: Alexey Kudinkin > Commands deleting partitions do so incorrectly > -- > > Key: HUDI-3826 > URL: https://issues.apache.org/jira/browse/HUDI-3826 > Project: Apache Hudi > Issue Type: Bug >Reporter: Alexey Kudinkin >Assignee: Forward Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > Currently, `TruncateHoodieTableCommand` as well as > `AlterHoodieTableDropPartitionCommand` delete partitions from the Hudi table by > simply removing the corresponding partition folders w/o committing any changes > (and correspondingly updating the MT, for example). > Instead they should go through WriteClient's `deletePartitions` API, similar to what the > Spark DS does when it gets Hudi's DELETE command. > You can see this when enabling the Column Stats Index by default and running our CI > (setting "hoodie.metadata.index.column.stats.enable" > and "hoodie.metadata.enable" to true) > https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=7926&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled
[ https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3707: Assignee: Sagar Sumit > Fix deltastreamer test with schema provider and transformer enabled > --- > > Key: HUDI-3707 > URL: https://issues.apache.org/jira/browse/HUDI-3707 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 0.11.0, 0.12.0 > > > Fix cases like this > @Disabled("To investigate problem with schema provider and transformer") > in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Reviewers: Sagar Sumit > upgrade spring cve-2022-22965 > - > > Key: HUDI-3819 > URL: https://issues.apache.org/jira/browse/HUDI-3819 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.9.0, 0.10.1 >Reporter: Jason-Morries Adam >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > We should upgrade the Spring Framework version at Hudi CLI because of > cve-2022-22965. The Qualys Scanner finds these packages and raises a warning > because of the existence of these files on the system. > The found files are: > /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar > /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar > More Information: > Spring Framework: https://spring.io/projects/spring-framework > Spring project spring-framework release notes: > https://github.com/spring-projects/spring-framework/releases > CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-1602: Assignee: Sagar Sumit > Corrupted Avro schema extracted from parquet file > - > > Key: HUDI-1602 > URL: https://issues.apache.org/jira/browse/HUDI-1602 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Alexander Filipchik >Assignee: Sagar Sumit >Priority: Blocker > Labels: core-flow-ds, pull-request-available, sev:critical > Fix For: 0.11.0 > > > We are running a HUDI deltastreamer on a very complex stream. The schema is > deeply nested, with several levels of hierarchy (the Avro schema is around 6600 > LOC). > > The version of HUDI that writes the dataset is 0.5-SNAPSHOT, and we recently > started attempting to upgrade to the latest. However, the latest HUDI can't read > the provided dataset. The exception I get: > > > {code:java} > Got exception while parsing the arguments:Got exception while parsing the > arguments:Found recursive reference in Avro schema, which can not be > processed by Spark:{ "type" : "record", "name" : "array", "fields" : [ { > "name" : "id", "type" : [ "null", "string" ], "default" : null }, { > "name" : "type", "type" : [ "null", "string" ], "default" : null }, { > "name" : "exist", "type" : [ "null", "boolean" ], "default" : null > } ]} Stack > trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive > reference in Avro schema, which can not be processed by Spark:{ "type" : > "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ > "null", "string" ], "default" : null }, { "name" : "type", "type" : > [ "null", "string" ], "default" : null }, { "name" : "exist", > "type" : [ "null", "boolean" ], "default" : null } ]} > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.
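[Editor's note] For context on the `IncompatibleSchemaException` in HUDI-1602 above: Spark's `SchemaConverters.toSqlTypeHelper` walks the Avro schema while tracking the record names already seen on the current path, and fails as soon as a name repeats. If an extracted schema reuses the same record name (such as "array") for distinct nested records, the check trips even without true recursion. A minimal sketch of that walk, using plain dicts in place of real Avro schema objects (illustrative only, not the Spark implementation):

```python
# Sketch of a recursive-reference check over an Avro-like schema, carrying
# the set of record names on the current path, in the spirit of Spark's
# SchemaConverters. Schemas are plain dicts/lists/strings here.

def find_recursive_record(schema, seen=frozenset()):
    """Return the name of the first record referenced within itself, else None."""
    if isinstance(schema, list):              # union, e.g. ["null", "node"]
        for branch in schema:
            hit = find_recursive_record(branch, seen)
            if hit:
                return hit
        return None
    if isinstance(schema, str):               # primitive or named-type reference
        return schema if schema in seen else None
    if isinstance(schema, dict):
        if schema.get("type") == "record":
            name = schema["name"]
            if name in seen:                  # record name repeats on this path
                return name
            seen = seen | {name}
            for field in schema["fields"]:
                hit = find_recursive_record(field["type"], seen)
                if hit:
                    return hit
        elif schema.get("type") == "array":
            return find_recursive_record(schema["items"], seen)
    return None

# A record whose field refers back to its own name, as the error describes:
recursive = {"type": "record", "name": "node",
             "fields": [{"name": "child", "type": ["null", "node"]}]}
print(find_recursive_record(recursive))  # node
```

The ticket's point is that the schema extracted from the Parquet file was corrupted so that this check fired spuriously; the sketch only shows why the exception is raised.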
[jira] [Assigned] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3819: Assignee: Raymond Xu > upgrade spring cve-2022-22965 > - > > Key: HUDI-3819 > URL: https://issues.apache.org/jira/browse/HUDI-3819 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.9.0, 0.10.1 >Reporter: Jason-Morries Adam >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > We should upgrade the Spring Framework version at Hudi CLI because of > cve-2022-22965. The Qualys Scanner finds these packages and raises a warning > because of the existence of these files on the system. > The found files are: > /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar > /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar > More Information: > Spring Framework: https://spring.io/projects/spring-framework > Spring project spring-framework release notes: > https://github.com/spring-projects/spring-framework/releases > CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Reviewers: Ethan Guo > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3752) Update website content based on 0.11 new features
[ https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3752: Assignee: Raymond Xu > Update website content based on 0.11 new features > - > > Key: HUDI-3752 > URL: https://issues.apache.org/jira/browse/HUDI-3752 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > content to update > - utilities slim bundle https://github.com/apache/hudi/pull/5184/files -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3859: Assignee: Raymond Xu > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Sprint: Hudi-Sprint-Apr-12 > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Story Points: 0.5 > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Component/s: dependencies > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3859) Remove parquet-avro from utilities-slim
Raymond Xu created HUDI-3859: Summary: Remove parquet-avro from utilities-slim Key: HUDI-3859 URL: https://issues.apache.org/jira/browse/HUDI-3859 Project: Apache Hudi Issue Type: Improvement Reporter: Raymond Xu -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim
[ https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3859: - Fix Version/s: 0.11.0 > Remove parquet-avro from utilities-slim > --- > > Key: HUDI-3859 > URL: https://issues.apache.org/jira/browse/HUDI-3859 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Story Points: 0.5 > Corrupted Avro schema extracted from parquet file > - > > Key: HUDI-1602 > URL: https://issues.apache.org/jira/browse/HUDI-1602 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Alexander Filipchik >Priority: Blocker > Labels: core-flow-ds, pull-request-available, sev:critical > Fix For: 0.11.0 > > > We are running a HUDI deltastreamer on a very complex stream. The schema is > deeply nested, with several levels of hierarchy (the Avro schema is around 6600 > LOC). > > The version of HUDI that writes the dataset is 0.5-SNAPSHOT, and we recently > started attempting to upgrade to the latest. However, the latest HUDI can't read > the provided dataset. The exception I get: > > > {code:java} > Got exception while parsing the arguments:Got exception while parsing the > arguments:Found recursive reference in Avro schema, which can not be > processed by Spark:{ "type" : "record", "name" : "array", "fields" : [ { > "name" : "id", "type" : [ "null", "string" ], "default" : null }, { > "name" : "type", "type" : [ "null", "string" ], "default" : null }, { > "name" : "exist", "type" : [ "null", "boolean" ], "default" : null > } ]} Stack > trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive > reference in Avro schema, which can not be processed by Spark:{ "type" : > "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ > "null", "string" ], "default" : null }, { "name" : "type", "type" : > [ "null", "string" ], "default" : null }, { "name" : "exist", > "type" : [ "null", "boolean" ], "default" : null } ]} > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scal
[jira] [Updated] (HUDI-3838) Make Drop partition column config work with deltastreamer
[ https://issues.apache.org/jira/browse/HUDI-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3838: - Story Points: 1 > Make Drop partition column config work with deltastreamer > - > > Key: HUDI-3838 > URL: https://issues.apache.org/jira/browse/HUDI-3838 > Project: Apache Hudi > Issue Type: Improvement > Components: meta-sync >Reporter: Raymond Xu >Assignee: Vinoth Govindarajan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > hoodie.datasource.write.drop.partition.columns only works for datasource > writer. HoodieDeltaStreamer is not using it. We need it for deltastreamer -> > bigquery sync flow -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3724) Too many open files w/ COW spark long running tests
[ https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3724: - Story Points: 1 > Too many open files w/ COW spark long running tests > --- > > Key: HUDI-3724 > URL: https://issues.apache.org/jira/browse/HUDI-3724 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > We run integ tests against hudi and recently our spark long running tests are > failing for COW table with "too many open files". Maybe we have some leaks; > we need to chase them down and close them out. > {code:java} > ... 6 more > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in > stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor > driver): java.io.FileNotFoundException: > /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3 > (Too many open files) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133) > at > org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:131) > at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
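The "Too many open files" failure above means the JVM process hit its file-descriptor limit, which is consistent with a descriptor leak. A minimal way to watch for such a leak on the driver or executor host is sketched below; this is plain Python, Linux-only (it reads /proc), and is diagnostic tooling around the problem, not part of Hudi or Spark.

```python
import os
import resource

def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def fd_headroom():
    """Return (open_fds, soft_limit) so a periodic monitor can alert before exhaustion."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft

if __name__ == "__main__":
    # Sampling this in a loop while the long-running test executes shows whether
    # the count grows without bound (a leak) or plateaus (limit simply too low).
    used, limit = fd_headroom()
    print(f"{used} of {limit} file descriptors in use")
```

If the count climbs steadily, diffing two snapshots of the /proc/PID/fd symlink targets usually points at the leaking files or sockets.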
[jira] [Updated] (HUDI-3843) Make Flink 1.13.x 1.14.x build with scala 2.11
[ https://issues.apache.org/jira/browse/HUDI-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3843: - Story Points: 0.5 > Make Flink 1.13.x 1.14.x build with scala 2.11 > -- > > Key: HUDI-3843 > URL: https://issues.apache.org/jira/browse/HUDI-3843 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Raymond Xu >Assignee: Raymond Xu >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized
[ https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3738: - Story Points: 1 > Perf comparison between parquet and hudi for COW snapshot and MOR read > optimized > > > Key: HUDI-3738 > URL: https://issues.apache.org/jira/browse/HUDI-3738 > Project: Apache Hudi > Issue Type: Task > Components: performance >Reporter: sivabalan narayanan >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features
[ https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3752: - Sprint: Hudi-Sprint-Apr-12 > Update website content based on 0.11 new features > - > > Key: HUDI-3752 > URL: https://issues.apache.org/jira/browse/HUDI-3752 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > content to update > - utilities slim bundle https://github.com/apache/hudi/pull/5184/files -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features
[ https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3752: - Story Points: 2 > Update website content based on 0.11 new features > - > > Key: HUDI-3752 > URL: https://issues.apache.org/jira/browse/HUDI-3752 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > content to update > - utilities slim bundle https://github.com/apache/hudi/pull/5184/files -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Story Points: 0.5 > upgrade spring cve-2022-22965 > - > > Key: HUDI-3819 > URL: https://issues.apache.org/jira/browse/HUDI-3819 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.9.0, 0.10.1 >Reporter: Jason-Morries Adam >Priority: Blocker > Fix For: 0.11.0 > > > We should upgrade the Spring Framework version in the Hudi CLI because of > cve-2022-22965. The Qualys scanner finds these packages and raises a warning > because these files exist on the system. > The found files are: > /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar > /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar > More Information: > Spring Framework: https://spring.io/projects/spring-framework > Spring project spring-framework release notes: > https://github.com/spring-projects/spring-framework/releases > CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks
[ https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3749: - Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-12 (was: Hudi-Sprint-Mar-22) > Run latest hudi w/ EMR spark and report to aws folks > > > Key: HUDI-3749 > URL: https://issues.apache.org/jira/browse/HUDI-3749 > Project: Apache Hudi > Issue Type: Task > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks
[ https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3749: - Story Points: 1 > Run latest hudi w/ EMR spark and report to aws folks > > > Key: HUDI-3749 > URL: https://issues.apache.org/jira/browse/HUDI-3749 > Project: Apache Hudi > Issue Type: Task > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs
[ https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1605: - Sprint: Hudi-Sprint-Apr-12 > Add more documentation around archival process and configs > -- > > Key: HUDI-1605 > URL: https://issues.apache.org/jira/browse/HUDI-1605 > Project: Apache Hudi > Issue Type: Task > Components: docs >Affects Versions: 0.9.0 >Reporter: sivabalan narayanan >Assignee: Kyle Weller >Priority: Blocker > Labels: user-support-issues > Fix For: 0.11.0 > > > Reference: > What is the trade-off in lowering {{hoodie.keep.max.commits}} and > {{hoodie.keep.min.commits}}? > https://github.com/apache/hudi/issues/2408#issuecomment-758360941 -- This message was sent by Atlassian Jira (v8.20.1#820001)
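For the trade-off the HUDI-1605 docs request is about: lower {{hoodie.keep.min.commits}}/{{hoodie.keep.max.commits}} values archive the active timeline more aggressively (less timeline metadata to scan, but less retained history for incremental queries), and Hudi expects {{hoodie.cleaner.commits.retained}} to be smaller than {{hoodie.keep.min.commits}} so archival never removes commits the cleaner still needs. A small illustrative sketch of these writer options as a plain map follows; the numeric values are examples only, not recommendations.

```python
# Illustrative archival-related Hudi writer options (example values, not recommendations).
# Expected ordering: cleaner.commits.retained < keep.min.commits <= keep.max.commits.
archival_opts = {
    "hoodie.cleaner.commits.retained": 24,  # commits whose older file versions the cleaner keeps
    "hoodie.keep.min.commits": 25,          # archival trims the active timeline down to this many...
    "hoodie.keep.max.commits": 30,          # ...once it grows beyond this many commits
}

def validate(opts):
    """Sanity-check the ordering constraint before handing the options to a writer."""
    retained = opts["hoodie.cleaner.commits.retained"]
    lo = opts["hoodie.keep.min.commits"]
    hi = opts["hoodie.keep.max.commits"]
    if not (retained < lo <= hi):
        raise ValueError(
            "expected cleaner.commits.retained < keep.min.commits <= keep.max.commits"
        )
    return opts
```

Misordered values (e.g. keep.min.commits below cleaner.commits.retained) are exactly the configuration mistake the linked GitHub discussion warns about.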
[jira] [Updated] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled
[ https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3707: - Story Points: 2 > Fix deltastreamer test with schema provider and transformer enabled > --- > > Key: HUDI-3707 > URL: https://issues.apache.org/jira/browse/HUDI-3707 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0, 0.12.0 > > > Fix cases like this > @Disabled("To investigate problem with schema provider and transformer") > in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs
[ https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3036: - Sprint: Hudi-Sprint-Apr-12 > Enhance Cleaner Docs > > > Key: HUDI-3036 > URL: https://issues.apache.org/jira/browse/HUDI-3036 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > > This blog has rich info that should be in the docs: > [https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/] > Slack disc mention: > https://apache-hudi.slack.com/archives/C4D716NPQ/p1639497026391400 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
[ https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3855: - Story Points: 1 > Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle > > > Key: HUDI-3855 > URL: https://issues.apache.org/jira/browse/HUDI-3855 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.11.0 > > > As was reported by the user here: > [https://github.com/apache/hudi/issues/5231] > > Quoting: > So i was able to reproduce behavior that you're seeing and it turns out to be > that {{_hoodie_file_name}} is simply not updated during Commit 3, meaning > that during C3, all records are copied from latest base-file of the > file-group into new latest base-file (in your most recent experiment it's > {{{}c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet{}}}) > but it doesn't update the {{_hoodie_file_name}} field which is kept pointing > at the old file. -- This message was sent by Atlassian Jira (v8.20.1#820001)
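The HUDI-3855 behavior can be illustrated with a tiny stand-alone sketch: plain Python over hypothetical record dicts, not Hudi's actual MergeHandle code. Copying records into a new base file without rewriting the `_hoodie_file_name` metadata field leaves it pointing at the old file, which is what the report observed after commit C3.

```python
def merge_copy_buggy(records, new_file_name):
    """Copy records into a new base file WITHOUT refreshing _hoodie_file_name (the reported bug)."""
    return [dict(r) for r in records]  # metadata field still points at the old base file

def merge_copy_fixed(records, new_file_name):
    """Copy records and stamp _hoodie_file_name with the file actually being written."""
    return [{**r, "_hoodie_file_name": new_file_name} for r in records]

old_base = [{"_hoodie_record_key": "k1",
             "_hoodie_file_name": "old_base.parquet",
             "val": 1}]
stale = merge_copy_buggy(old_base, "new_base.parquet")
fresh = merge_copy_fixed(old_base, "new_base.parquet")
```

In the buggy path the field keeps naming the superseded file, so joining on `_hoodie_file_name` against the files actually on storage produces the mismatch seen in the GitHub issue.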
[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs
[ https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3036: - Story Points: 0 > Enhance Cleaner Docs > > > Key: HUDI-3036 > URL: https://issues.apache.org/jira/browse/HUDI-3036 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > > This blog has rich info that should be in the docs: > [https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/] > Slack disc mention: > https://apache-hudi.slack.com/archives/C4D716NPQ/p1639497026391400 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs
[ https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1605: - Story Points: 0 > Add more documentation around archival process and configs > -- > > Key: HUDI-1605 > URL: https://issues.apache.org/jira/browse/HUDI-1605 > Project: Apache Hudi > Issue Type: Task > Components: docs >Affects Versions: 0.9.0 >Reporter: sivabalan narayanan >Assignee: Kyle Weller >Priority: Blocker > Labels: user-support-issues > Fix For: 0.11.0 > > > Reference: > What is the trade-off in lowering {{hoodie.keep.max.commits}} and > {{hoodie.keep.min.commits}}? > https://github.com/apache/hudi/issues/2408#issuecomment-758360941 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks
[ https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3749: - Priority: Blocker (was: Critical) > Run latest hudi w/ EMR spark and report to aws folks > > > Key: HUDI-3749 > URL: https://issues.apache.org/jira/browse/HUDI-3749 > Project: Apache Hudi > Issue Type: Task > Components: tests-ci >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions
[ https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2946: - Priority: Critical (was: Major) > Upgrade maven plugin to make Hudi be compatible with higher Java versions > - > > Key: HUDI-2946 > URL: https://issues.apache.org/jira/browse/HUDI-2946 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Wenning Ding >Priority: Critical > Labels: pull-request-available > Fix For: 0.12.0 > > > I saw several issues while building Hudi w/ Java 11: > > {{[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project > hudi-common: Execution default of goal > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: > java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project > hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR > /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar > entry org/apache/hudi/hadoop/bundle/Main.class: > java.lang.IllegalArgumentException -> [Help 1]}} > > We need to upgrade maven plugin versions to make it be compatible with Java > 11. > Also upgrade dockerfile-maven-plugin to latest versions to support Java 11 > [https://github.com/spotify/dockerfile-maven/pull/230] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions
[ https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2946: - Component/s: dependencies > Upgrade maven plugin to make Hudi be compatible with higher Java versions > - > > Key: HUDI-2946 > URL: https://issues.apache.org/jira/browse/HUDI-2946 > Project: Apache Hudi > Issue Type: Improvement > Components: dependencies >Reporter: Wenning Ding >Priority: Critical > Labels: pull-request-available > Fix For: 0.12.0 > > > I saw several issues while building Hudi w/ Java 11: > > {{[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project > hudi-common: Execution default of goal > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: > java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project > hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR > /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar > entry org/apache/hudi/hadoop/bundle/Main.class: > java.lang.IllegalArgumentException -> [Help 1]}} > > We need to upgrade maven plugin versions to make it be compatible with Java > 11. > Also upgrade dockerfile-maven-plugin to latest versions to support Java 11 > [https://github.com/spotify/dockerfile-maven/pull/230] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions
[ https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2946: - Fix Version/s: 0.12.0 (was: 0.11.0) > Upgrade maven plugin to make Hudi be compatible with higher Java versions > - > > Key: HUDI-2946 > URL: https://issues.apache.org/jira/browse/HUDI-2946 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Wenning Ding >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > I saw several issues while building Hudi w/ Java 11: > > {{[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project > hudi-common: Execution default of goal > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: > java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal > org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project > hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR > /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar > entry org/apache/hudi/hadoop/bundle/Main.class: > java.lang.IllegalArgumentException -> [Help 1]}} > > We need to upgrade maven plugin versions to make it be compatible with Java > 11. > Also upgrade dockerfile-maven-plugin to latest versions to support Java 11 > [https://github.com/spotify/dockerfile-maven/pull/230] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs
[ https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3036: - Priority: Blocker (was: Major) > Enhance Cleaner Docs > > > Key: HUDI-3036 > URL: https://issues.apache.org/jira/browse/HUDI-3036 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > > This blog has rich info that should be in the docs: > [https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/] > Slack disc mention: > https://apache-hudi.slack.com/archives/C4D716NPQ/p1639497026391400 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb
[ https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3067: - Fix Version/s: 0.12.0 (was: 0.11.0) > "Table already exists" error with multiple writers and dynamodb > --- > > Key: HUDI-3067 > URL: https://issues.apache.org/jira/browse/HUDI-3067 > Project: Apache Hudi > Issue Type: Bug >Reporter: Nikita Sheremet >Assignee: Wenning Ding >Priority: Critical > Fix For: 0.12.0 > > > How to reproduce: > # Set up multi-writer concurrency control > [https://hudi.apache.org/docs/concurrency_control/] for DynamoDB (do not > forget to set _hoodie.write.lock.dynamodb.region_ and > {_}hoodie.write.lock.dynamodb.billing_mode{_}). Do not create any DynamoDB > table. > # Run multiple writers to the table > (Tested on AWS EMR, so the multiple writers are EMR steps) > Expected result: all steps completed. > Actual result: some steps failed with exception > {code:java} > Caused by: com.amazonaws.services.dynamodbv2.model.ResourceInUseException: > Table already exists: truedata_detections (Service: AmazonDynamoDBv2; Status > Code: 400; Error Code: ResourceInUseException; Request ID:; Proxy: null) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) > at > 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:6214) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:6181) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeCreateTable(AmazonDynamoDBClient.java:1160) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.createTable(AmazonDynamoDBClient.java:1124) > at > org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.createLockTableInDynamoDB(DynamoDBBasedLockProvider.java:188) > at > org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:99) > at > org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:77) > ... 54 more > 21/12/19 13:42:06 INFO Yar {code} > This happens because all steps tried to create the table at the same time. > > Suggested solution: > A catch statement for the _Table already exists_ exception should be added to > the DynamoDB table creation code, possibly with a delay and an additional > check that the table is present. -- This message was sent by Atlassian Jira (v8.20.1#820001)
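The suggested solution (catch the "already exists" error, optionally wait, then verify the table) can be sketched as follows. This is plain Python with a stand-in client object, not the real AWS SDK or Hudi's DynamoDBBasedLockProvider; `ResourceInUseError` and `FakeDynamo` are illustrative names introduced here.

```python
import time

class ResourceInUseError(Exception):
    """Stand-in for DynamoDB's ResourceInUseException ('Table already exists')."""

def ensure_lock_table(client, table_name, timeout_s=5.0, poll_s=0.01):
    """Create the lock table, tolerating the race where another writer created it first."""
    try:
        client.create_table(table_name)
    except ResourceInUseError:
        pass  # another writer won the race; fall through and wait for ACTIVE
    deadline = time.monotonic() + timeout_s
    while client.describe_table(table_name) != "ACTIVE":
        if time.monotonic() > deadline:
            raise TimeoutError(f"table {table_name} not ACTIVE within {timeout_s}s")
        time.sleep(poll_s)
    return table_name

class FakeDynamo:
    """Minimal in-memory stand-in used to exercise the race handling."""
    def __init__(self):
        self.tables = {}
    def create_table(self, name):
        if name in self.tables:
            raise ResourceInUseError(name)
        self.tables[name] = "ACTIVE"  # real DynamoDB would transition CREATING -> ACTIVE
    def describe_table(self, name):
        return self.tables.get(name, "MISSING")
```

With this shape, every concurrent writer converges on the same ACTIVE table instead of failing the whole step on the create-table race.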
[jira] [Updated] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb
[ https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3067: - Priority: Critical (was: Major) > "Table already exists" error with multiple writers and dynamodb > --- > > Key: HUDI-3067 > URL: https://issues.apache.org/jira/browse/HUDI-3067 > Project: Apache Hudi > Issue Type: Bug >Reporter: Nikita Sheremet >Assignee: Wenning Ding >Priority: Critical > Fix For: 0.11.0 > > > How to reproduce: > # Set up multi-writer concurrency control > [https://hudi.apache.org/docs/concurrency_control/] for DynamoDB (do not > forget to set _hoodie.write.lock.dynamodb.region_ and > {_}hoodie.write.lock.dynamodb.billing_mode{_}). Do not create any DynamoDB > table. > # Run multiple writers to the table > (Tested on AWS EMR, so the multiple writers are EMR steps) > Expected result: all steps completed. > Actual result: some steps failed with exception > {code:java} > Caused by: com.amazonaws.services.dynamodbv2.model.ResourceInUseException: > Table already exists: truedata_detections (Service: AmazonDynamoDBv2; Status > Code: 400; Error Code: ResourceInUseException; Request ID:; Proxy: null) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) > at > 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:6214) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:6181) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeCreateTable(AmazonDynamoDBClient.java:1160) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.createTable(AmazonDynamoDBClient.java:1124) > at > org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.createLockTableInDynamoDB(DynamoDBBasedLockProvider.java:188) > at > org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:99) > at > org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:77) > ... 54 more > 21/12/19 13:42:06 INFO Yar {code} > This happens because all steps tried to create the table at the same time. > > Suggested solution: > A catch statement for the _Table already exists_ exception should be added to > the DynamoDB table creation code, possibly with a delay and an additional > check that the table is present. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features
[ https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3752: - Priority: Blocker (was: Major) > Update website content based on 0.11 new features > - > > Key: HUDI-3752 > URL: https://issues.apache.org/jira/browse/HUDI-3752 > Project: Apache Hudi > Issue Type: Task > Components: docs >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0 > > > content to update > - utilities slim bundle https://github.com/apache/hudi/pull/5184/files -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3344. Reviewers: sivabalan narayanan Resolution: Done > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement > Components: code-quality >Reporter: qian >Assignee: qian >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3344: Assignee: qian > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement >Reporter: qian >Assignee: qian >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3344: - Component/s: code-quality > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement > Components: code-quality >Reporter: qian >Assignee: qian >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3344: - Priority: Minor (was: Major) > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement >Reporter: qian >Assignee: qian >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3344: - Priority: Trivial (was: Minor) > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement > Components: code-quality >Reporter: qian >Assignee: qian >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3344: - Fix Version/s: 0.12.0 (was: 0.11.0) > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement >Reporter: qian >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala
[ https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3344: - Fix Version/s: 0.11.0 (was: 0.12.0) > Standard code format for HoodieDataSourceExample.scala > --- > > Key: HUDI-3344 > URL: https://issues.apache.org/jira/browse/HUDI-3344 > Project: Apache Hudi > Issue Type: Improvement >Reporter: qian >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs
[ https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1605: - Priority: Blocker (was: Minor) > Add more documentation around archival process and configs > -- > > Key: HUDI-1605 > URL: https://issues.apache.org/jira/browse/HUDI-1605 > Project: Apache Hudi > Issue Type: Task > Components: docs >Affects Versions: 0.9.0 >Reporter: sivabalan narayanan >Assignee: Kyle Weller >Priority: Blocker > Labels: user-support-issues > Fix For: 0.11.0 > > > Reference: > What is the trade-off in lowering {{hoodie.keep.max.commits}} and > {{hoodie.keep.min.commits}}? > https://github.com/apache/hudi/issues/2408#issuecomment-758360941 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3577) NPE in HoodieTimelineArchiver
[ https://issues.apache.org/jira/browse/HUDI-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3577: - Story Points: 0.5 > NPE in HoodieTimelineArchiver > - > > Key: HUDI-3577 > URL: https://issues.apache.org/jira/browse/HUDI-3577 > Project: Apache Hudi > Issue Type: Bug > Components: archiving >Reporter: Alexey Kudinkin >Assignee: sivabalan narayanan >Priority: Blocker > Fix For: 0.11.0 > > > `testUpsertsContinuousModeWithMultipleWritersWithoutConflicts` does fail > periodically with NPE w/in HoodieTimelineArchiver > > {code:java} > 2022-03-05T22:51:18.0857636Z [ERROR] Tests run: 27, Failures: 0, Errors: 1, > Skipped: 9, Time elapsed: 423.786 s <<< FAILURE! - in JUnit Vintage > 2022-03-05T22:51:18.0858433Z [ERROR] HoodieTableType).[2] > MERGE_ON_READ(testUpsertsContinuousModeWithMultipleWritersWithoutConflicts > Time elapsed: 119.717 s <<< ERROR! > 2022-03-05T22:51:18.0859018Z java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.lang.NullPointerException > 2022-03-05T22:51:18.0859509Z at > java.util.concurrent.FutureTask.report(FutureTask.java:122) > 2022-03-05T22:51:18.0859935Z at > java.util.concurrent.FutureTask.get(FutureTask.java:192) > 2022-03-05T22:51:18.0860572Z at > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:394) > 2022-03-05T22:51:18.0861650Z at > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWritersWithoutConflicts(TestHoodieDeltaStreamerWithMultiWriter.java:204) > 2022-03-05T22:51:18.0862339Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2022-03-05T22:51:18.0862781Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2022-03-05T22:51:18.0863316Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2022-03-05T22:51:18.0863791Z at > 
java.lang.reflect.Method.invoke(Method.java:498) > 2022-03-05T22:51:18.0864248Z at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688) > 2022-03-05T22:51:18.0864801Z at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2022-03-05T22:51:18.0865438Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2022-03-05T22:51:18.0866071Z at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2022-03-05T22:51:18.081Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140) > 2022-03-05T22:51:18.0867290Z at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92) > 2022-03-05T22:51:18.0867968Z at > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2022-03-05T22:51:18.0868613Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2022-03-05T22:51:18.0869275Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2022-03-05T22:51:18.0870081Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2022-03-05T22:51:18.0870716Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2022-03-05T22:51:18.0871365Z at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2022-03-05T22:51:18.0871953Z at > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2022-03-05T22:51:18.0872494Z at > 
org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 2022-03-05T22:51:18.0873118Z at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212) > 2022-03-05T22:51:18.0873777Z at > org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > 2022-03-05T22:51:18.0874400Z at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208) > 2022-03-05T22:51:18.0875044Z at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137) > 2022-03-05T22:51:18.0875666Z at > org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71) > 2022-03-05T22:51:18.08762
[jira] [Updated] (HUDI-3804) Partition metadata is not properly created for Column Stats
[ https://issues.apache.org/jira/browse/HUDI-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3804: - Sprint: (was: Cont' improve - 2022/03/7) > Partition metadata is not properly created for Column Stats > --- > > Key: HUDI-3804 > URL: https://issues.apache.org/jira/browse/HUDI-3804 > Project: Apache Hudi > Issue Type: Bug >Reporter: Alexey Kudinkin >Assignee: sivabalan narayanan >Priority: Blocker > Fix For: 0.11.0 > > > Currently, when enabling the Column Stats partition along with the Files partition, > `AppendHandle` inserts records for both of them during MT updates. > However, AppendHandle creates the partition metadata file only for the Files > partition, which leads to validation failures for the MT. > Steps to reproduce: > # Enable MT and Column Stats > # Run `TestHoodieBackedMetadata.testTurnOffMetadataTableAfterEnable` test -- This message was sent by Atlassian Jira (v8.20.1#820001)
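The validation failure described above boils down to a partition directory missing its partition metadata marker. A minimal sketch of that check, assuming records are laid out under per-partition directories and using the real marker file name `.hoodie_partition_metadata` (the helper function and directory layout here are hypothetical, not Hudi code):

```python
import os
import tempfile

# Marker file that the write handle is expected to create per partition.
MARKER = ".hoodie_partition_metadata"

def partitions_missing_marker(mt_base_path):
    """Return the metadata-table partitions that lack the marker file."""
    missing = []
    for name in sorted(os.listdir(mt_base_path)):
        part_dir = os.path.join(mt_base_path, name)
        if os.path.isdir(part_dir) and MARKER not in os.listdir(part_dir):
            missing.append(name)
    return missing

# Reproduce the reported state: marker present under "files",
# absent under "column_stats".
base = tempfile.mkdtemp()
for part in ("files", "column_stats"):
    os.makedirs(os.path.join(base, part))
open(os.path.join(base, "files", MARKER), "w").close()

print(partitions_missing_marker(base))
```

Running this flags only the `column_stats` partition, mirroring the reported asymmetry between the two MT partitions.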
[jira] [Updated] (HUDI-3810) Enabling point look ups does an extra full scan in addition to point look up for log readers with metadata
[ https://issues.apache.org/jira/browse/HUDI-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3810: - Sprint: (was: Cont' improve - 2022/03/7) > Enabling point look ups does an extra full scan in addition to point look up > for log readers with metadata > - > > Key: HUDI-3810 > URL: https://issues.apache.org/jira/browse/HUDI-3810 > Project: Apache Hudi > Issue Type: Bug > Components: metadata >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3647) Ignore errors if metadata table has not been initialized fully
[ https://issues.apache.org/jira/browse/HUDI-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3647: - Sprint: Hudi-Sprint-Apr-12 > Ignore errors if metadata table has not been initialized fully > -- > > Key: HUDI-3647 > URL: https://issues.apache.org/jira/browse/HUDI-3647 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > HoodieMetadataTableValidator throws the following exceptions when the > metadata table is not fully initialized. These can be ignored and there > could be a fallback mechanism if metadata table is not ready for read. > {code:java} > org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties > from > file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_b_single_writer_async_services/b2_ds_mor_010nomt_011mt_conf/test_table/.hoodie/metadata/.hoodie/hoodie.properties > at > org.apache.hudi.common.table.HoodieTableConfig.(HoodieTableConfig.java:226) > at > org.apache.hudi.common.table.HoodieTableMetaClient.(HoodieTableMetaClient.java:120) > at > org.apache.hudi.common.table.HoodieTableMetaClient.(HoodieTableMetaClient.java:77) > at > org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:657) > at > org.apache.hudi.metadata.HoodieBackedTableMetadata.initIfNeeded(HoodieBackedTableMetadata.java:108) > at > org.apache.hudi.metadata.HoodieBackedTableMetadata.(HoodieBackedTableMetadata.java:97) > at > org.apache.hudi.metadata.HoodieTableMetadata.create(HoodieTableMetadata.java:111) > at > org.apache.hudi.metadata.HoodieTableMetadata.create(HoodieTableMetadata.java:105) > at > org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:296) > at > org.apache.hudi.utilities.HoodieMetadataTableValidator.validatePartitions(HoodieMetadataTableValidator.java:386) > at > 
org.apache.hudi.utilities.HoodieMetadataTableValidator.doMetadataTableValidation(HoodieMetadataTableValidator.java:349) > at > org.apache.hudi.utilities.HoodieMetadataTableValidator.doHoodieMetadataTableValidationOnce(HoodieMetadataTableValidator.java:324) > at > org.apache.hudi.utilities.HoodieMetadataTableValidator.run(HoodieMetadataTableValidator.java:310) > at > org.apache.hudi.utilities.HoodieMetadataTableValidator.main(HoodieMetadataTableValidator.java:294) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.FileNotFoundException: File > file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_b_single_writer_async_services/b2_ds_mor_010nomt_011mt_conf/test_table/.hoodie/metadata/.hoodie/hoodie.properties.backup > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769) > at > 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976) > at > org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:460) > at > org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:287) > at > org.apache.hudi.common.table.HoodieTableConfig.(HoodieTableConfig.java:216) > ... 25 more {code} > {code:java} > org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files > in partition >
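The fallback mechanism HUDI-3647 asks for can be sketched as a try/except around the metadata-table-backed listing: if the half-initialized MT surfaces an I/O error like the `HoodieIOException` above, fall back to a direct file-system listing instead of failing the validator. Both listing functions below are hypothetical stand-ins, not Hudi APIs:

```python
def list_partitions_with_fallback(list_via_metadata, list_via_filesystem):
    """Prefer the MT-backed listing; fall back to fs listing if MT is not ready."""
    try:
        return list_via_metadata()
    except IOError as e:
        # The MT is not fully initialized; ignore and use the fs listing.
        print(f"metadata table not ready ({e}); falling back to fs listing")
        return list_via_filesystem()

def broken_mt_listing():
    # Mimics "Could not load Hoodie properties" from a half-built MT.
    raise IOError("Could not load Hoodie properties")

result = list_partitions_with_fallback(broken_mt_listing,
                                       lambda: ["2022/04/01", "2022/04/02"])
print(result)
```

The validator keeps running on the fs-derived partition list rather than aborting with the stack trace shown above.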
[jira] [Updated] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled
[ https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3707: - Sprint: Hudi-Sprint-Apr-12 > Fix deltastreamer test with schema provider and transformer enabled > --- > > Key: HUDI-3707 > URL: https://issues.apache.org/jira/browse/HUDI-3707 > Project: Apache Hudi > Issue Type: Test > Components: tests-ci >Reporter: Raymond Xu >Priority: Blocker > Fix For: 0.11.0, 0.12.0 > > > Fix cases like this > @Disabled("To investigate problem with schema provider and transformer") > in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Sprint: Hudi-Sprint-Apr-12 > Corrupted Avro schema extracted from parquet file > - > > Key: HUDI-1602 > URL: https://issues.apache.org/jira/browse/HUDI-1602 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Alexander Filipchik >Priority: Blocker > Labels: core-flow-ds, pull-request-available, sev:critical > Fix For: 0.11.0 > > > we are running a HUDI deltastreamer on a very complex stream. Schema is > deeply nested, with several levels of hierarchy (avro schema is around 6600 > LOC). > > The version of HUDI that writes the dataset is 0.5-SNAPSHOT and we recently > started attempts to upgrade to the latest. However, the latest HUDI can't read > the provided dataset. Exception I get: > > > {code:java} > Got exception while parsing the arguments:Got exception while parsing the > arguments:Found recursive reference in Avro schema, which can not be > processed by Spark:{ "type" : "record", "name" : "array", "fields" : [ { > "name" : "id", "type" : [ "null", "string" ], "default" : null }, { > "name" : "type", "type" : [ "null", "string" ], "default" : null }, { > "name" : "exist", "type" : [ "null", "boolean" ], "default" : null > } ]} Stack > trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive > reference in Avro schema, which can not be processed by Spark:{ "type" : > "record", "name" : "array", "fields" : [ { "name" : "id", "type" : [ > "null", "string" ], "default" : null }, { "name" : "type", "type" : > [ "null", "string" ], "default" : null }, { "name" : "exist", > "type" : [ "null", "boolean" ], "default" : null } ]} > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at > scala.collection.AbstractTraversable.map(Traversable.scala:104) at > org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81) > at > 
org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82) > at > org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) at > scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at > scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at > scala.collection.AbstractIterable.foreach(Iterable.scala:54) at > scala.collection.TraversableLike$class.map(Traversable
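The `IncompatibleSchemaException` above fires because Spark's `SchemaConverters` refuses any Avro schema in which a record name is referenced again inside its own definition, as with the `array` record quoted in the report. A stdlib-only sketch of that detection (it handles just the schema shapes needed for illustration, and the `Node` schema is a hypothetical minimal reproduction, not the reporter's 6600-line schema):

```python
import json

def has_recursive_reference(schema, enclosing=frozenset()):
    """True if a record's name is referenced within its own definition."""
    if isinstance(schema, str):
        return schema in enclosing          # named type referring back to an ancestor
    if isinstance(schema, list):            # union, e.g. ["null", "Node"]
        return any(has_recursive_reference(s, enclosing) for s in schema)
    if isinstance(schema, dict):
        t = schema.get("type")
        if t == "record":
            inner = enclosing | {schema["name"]}
            return any(has_recursive_reference(f["type"], inner)
                       for f in schema["fields"])
        if t == "array":
            return has_recursive_reference(schema["items"], enclosing)
    return False

# Hypothetical minimal reproduction: "Node" refers back to itself.
recursive = json.loads("""
{"type": "record", "name": "Node", "fields": [
  {"name": "id", "type": ["null", "string"], "default": null},
  {"name": "child", "type": ["null", "Node"], "default": null}]}
""")
print(has_recursive_reference(recursive))
```

Avro itself accepts such self-references; the failure only surfaces when the schema is handed to Spark's Avro-to-SQL conversion, which is why the dataset was writable by the old pipeline yet unreadable after the upgrade.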
[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Sprint: Hudi-Sprint-Apr-12 > upgrade spring cve-2022-22965 > - > > Key: HUDI-3819 > URL: https://issues.apache.org/jira/browse/HUDI-3819 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.9.0, 0.10.1 >Reporter: Jason-Morries Adam >Priority: Blocker > Fix For: 0.11.0 > > > We should upgrade the Spring Framework version at Hudi CLI because of > cve-2022-22965. The Qualys Scanner finds these packages and raises a warning > because of the existence of these files on the system. > The found files are: > /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar > /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar > More Information: > Spring Framework: https://spring.io/projects/spring-framework > Spring project spring-framework release notes: > https://github.com/spring-projects/spring-framework/releases > CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
[ https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3855: - Sprint: Hudi-Sprint-Apr-12 > Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle > > > Key: HUDI-3855 > URL: https://issues.apache.org/jira/browse/HUDI-3855 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Alexey Kudinkin >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.11.0 > > > As was reported by the user here: > [https://github.com/apache/hudi/issues/5231] > > Quoting: > So i was able to reproduce behavior that you're seeing and it turns out to be > that {{_hoodie_file_name}} is simply not updated during Commit 3, meaning > that during C3, all records are copied from latest base-file of the > file-group into new latest base-file (in your most recent experiment it's > {{{}c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet{}}}) > but it doesn't update the {{_hoodie_file_name}} field which is kept pointing > at the old file. -- This message was sent by Atlassian Jira (v8.20.1#820001)
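The staleness described in HUDI-3855 can be expressed as a simple invariant: after a merge, every record's `_hoodie_file_name` metadata field should name the base file that now contains it. A sketch of that check, with records modeled as plain dicts and the file names shaped like the ones quoted in the issue (the helper and the old file name are hypothetical; only `_hoodie_file_name` is a real Hudi metadata field):

```python
import os

def stale_file_name_records(records, current_base_file_path):
    """Return records whose _hoodie_file_name does not match the current base file."""
    current = os.path.basename(current_base_file_path)
    return [r for r in records if r["_hoodie_file_name"] != current]

# Hypothetical older base file vs. the new one produced by commit 3.
old = "c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-1-1_20220405.parquet"
new = "c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet"
records = [
    {"_hoodie_record_key": "k1", "_hoodie_file_name": old},  # stale: kept old name
    {"_hoodie_record_key": "k2", "_hoodie_file_name": new},
]
print(stale_file_name_records(records, "/base/path/" + new))
```

Per the bug, MergeHandle copies records into the new base file without rewriting this field, so the check above would flag every carried-over record.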
[jira] [Updated] (HUDI-3013) Docs for Presto and Hudi
[ https://issues.apache.org/jira/browse/HUDI-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3013: - Sprint: Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05) > Docs for Presto and Hudi > > > Key: HUDI-3013 > URL: https://issues.apache.org/jira/browse/HUDI-3013 > Project: Apache Hudi > Issue Type: Task > Components: trino-presto >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3724) Too many open files w/ COW spark long running tests
[ https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3724: - Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05) > Too many open files w/ COW spark long running tests > --- > > Key: HUDI-3724 > URL: https://issues.apache.org/jira/browse/HUDI-3724 > Project: Apache Hudi > Issue Type: Bug >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > We run integ tests against hudi and recently our spark long running tests are > failing for COW table with "too many open files". Maybe we have some leaks; > we need to chase them down and close them out. > {code:java} > ... 6 more > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in > stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor > driver): java.io.FileNotFoundException: > /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3 > (Too many open files) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.<init>(FileOutputStream.java:213) > at > org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133) > at > org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) > at org.apache.spark.scheduler.Task.run(Task.scala:131) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
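"Too many open files" means the process exhausted its file-descriptor limit, typically through handles that are opened but never closed. One way to chase such leaks, sketched minimally here (a hypothetical tracking wrapper, not anything in Hudi or Spark), is to record every open and report what was never closed:

```python
import os
import tempfile

class TrackingOpener:
    """Wrap file opens so unclosed handles can be enumerated later."""
    def __init__(self):
        self.open_files = {}

    def open(self, path, mode="w"):
        f = open(path, mode)
        self.open_files[f.fileno()] = path
        return f

    def close(self, f):
        self.open_files.pop(f.fileno(), None)  # remove before the fd is invalidated
        f.close()

    def leaked(self):
        return list(self.open_files.values())

opener = TrackingOpener()
d = tempfile.mkdtemp()
a = opener.open(os.path.join(d, "a.tmp"))
b = opener.open(os.path.join(d, "b.tmp"))
opener.close(a)          # "b" is never closed: that is the leak
print(opener.leaked())
```

In a JVM the same idea shows up as leak-tracking around streams or as inspecting `/proc/<pid>/fd`; either way, the fix is finding the code path that opens without a matching close.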
[jira] [Updated] (HUDI-3857) NoSuchMethodError: Continuous deltastreamer test with async compaction fails on EMR spark
[ https://issues.apache.org/jira/browse/HUDI-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3857: - Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Apr-05) > NoSuchMethodError: Continuous deltastreamer test with async compaction fails > on EMR spark > -- > > Key: HUDI-3857 > URL: https://issues.apache.org/jira/browse/HUDI-3857 > Project: Apache Hudi > Issue Type: Test >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 0.11.0 > > > EMR 6.5, Spark 3.1.2 > While running continuous deltastreamer with async compaction enabled, I hit > this exception > {code:java} > Caused by: java.lang.NoSuchMethodError: > org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V > at > org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:130) > at scala.Option.map(Option.scala:230) > at > org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:128) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.hudi.MergeOnReadSnapshotRelation.buildSplits(MergeOnReadSnapshotRelation.scala:124) > at > org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:108) > at > org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:44) > at > org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:221) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
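The `NoSuchMethodError` here is a binary incompatibility: Hudi is compiled against the OSS `PartitionedFile` constructor, while EMR ships a patched spark-sql jar with a different signature. Per the resolution recorded on this ticket, swapping the EMR `spark-sql-amzn*` jar for the OSS `spark-sql_2.12-3.1.2` jar fixes it. A sketch of spotting the jars to replace (the helper and the mocked jars directory are hypothetical; the file-name patterns follow EMR's naming):

```python
import os
import tempfile

def emr_patched_spark_sql_jars(jars_dir):
    """Return the EMR-patched spark-sql jars that should be swapped for OSS builds."""
    return sorted(
        name for name in os.listdir(jars_dir)
        if name.startswith("spark-sql") and "amzn" in name
    )

# Hypothetical jars directory mimicking an EMR 6.5 Spark 3.1.2 layout.
jars = tempfile.mkdtemp()
for jar in ("spark-sql_2.12-3.1.2-amzn-1.jar", "spark-core_2.12-3.1.2-amzn-1.jar"):
    open(os.path.join(jars, jar), "w").close()
print(emr_patched_spark_sql_jars(jars))
```

Only the spark-sql jar is flagged, matching the ticket's guidance to replace it with https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.12/3.1.2/spark-sql_2.12-3.1.2.jar.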
[jira] [Updated] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized
[ https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3738: - Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05) > Perf comparison between parquet and hudi for COW snapshot and MOR read > optimized > > > Key: HUDI-3738 > URL: https://issues.apache.org/jira/browse/HUDI-3738 > Project: Apache Hudi > Issue Type: Task > Components: performance >Reporter: sivabalan narayanan >Assignee: Alexey Kudinkin >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3799) Understand reason behind "Not an avro data file" with hudi
[ https://issues.apache.org/jira/browse/HUDI-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3799: - Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Apr-05) > Understand reason behind "Not an avro data file" with hudi > -- > > Key: HUDI-3799 > URL: https://issues.apache.org/jira/browse/HUDI-3799 > Project: Apache Hudi > Issue Type: Task >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > We merged [https://github.com/apache/hudi/pull/4016] to tackle "Not an avro > data file" exception while cleaning or archiving. We need to understand why > and when such an exception happens, and try to mitigate it before it happens, if > feasible. > > At least we should have a good understanding of the conditions under > which this is expected. > > Ref: https://github.com/apache/hudi/pull/4016#pullrequestreview-841692564 -- This message was sent by Atlassian Jira (v8.20.1#820001)
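For context on when this exception fires: per the Avro specification, object container files begin with the 4-byte magic `Obj` followed by `0x01`, and Avro's reader raises "Not an avro data file" when that magic is absent, for example on a zero-length or partially written file. A minimal sketch of the check (the commit file names are hypothetical):

```python
import os
import tempfile

# Avro object container file magic, per the Avro spec: 'O','b','j',1.
AVRO_MAGIC = b"Obj\x01"

def looks_like_avro_data_file(path):
    """True if the file starts with the Avro container-file magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC

d = tempfile.mkdtemp()
ok = os.path.join(d, "clean.commit")
with open(ok, "wb") as f:
    f.write(AVRO_MAGIC + b"...rest of container...")
truncated = os.path.join(d, "crashed.commit")
open(truncated, "wb").close()  # zero-length, e.g. writer died mid-write

print(looks_like_avro_data_file(ok), looks_like_avro_data_file(truncated))
```

A guard like this before deserializing timeline files is one plausible mitigation shape; whether partially written timeline files are in fact the cause here is exactly what the ticket sets out to establish.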
[jira] [Updated] (HUDI-3806) Improve HoodieBloomIndex using bloom_filter and col_stats in MDT
[ https://issues.apache.org/jira/browse/HUDI-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3806: - Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Apr-05) > Improve HoodieBloomIndex using bloom_filter and col_stats in MDT > > > Key: HUDI-3806 > URL: https://issues.apache.org/jira/browse/HUDI-3806 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3207) Hudi Trino connector PR review
[ https://issues.apache.org/jira/browse/HUDI-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3207: - Sprint: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7, Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7, Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05) > Hudi Trino connector PR review > -- > > Key: HUDI-3207 > URL: https://issues.apache.org/jira/browse/HUDI-3207 > Project: Apache Hudi > Issue Type: Task > Components: trino-presto >Reporter: Ethan Guo >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 0.11.0 > > > https://github.com/trinodb/trino/pull/10228 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2606) Ensure query engines not access MDT if disabled
[ https://issues.apache.org/jira/browse/HUDI-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2606: - Sprint: Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6 (was: Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05) > Ensure query engines not access MDT if disabled > --- > > Key: HUDI-2606 > URL: https://issues.apache.org/jira/browse/HUDI-2606 > Project: Apache Hudi > Issue Type: Task > Components: metadata, reader-core >Reporter: sivabalan narayanan >Assignee: Tao Meng >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > This is to visit all the read code paths and ensure when metadata is > disabled, query engines won't read from metadata table. -- This message was sent by Atlassian Jira (v8.20.1#820001)
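The guard this last ticket asks for can be sketched as a single decision point: every read path consults the table config and takes the file-system listing when the metadata table is disabled. The config flag mirrors Hudi's `hoodie.metadata.enable`; the listing functions are hypothetical stand-ins:

```python
def get_partition_lister(metadata_enabled, mdt_lister, fs_lister):
    """Route listing through the MDT only when it is enabled."""
    return mdt_lister if metadata_enabled else fs_lister

# Record which path actually gets exercised.
calls = []
mdt = lambda: calls.append("mdt") or ["p1"]
fs = lambda: calls.append("fs") or ["p1"]

result = get_partition_lister(False, mdt, fs)()
print(result, calls)
```

With metadata disabled, only the fs lister runs; auditing the read code paths amounts to verifying that no path reaches the MDT lister when the flag is off.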