[jira] [Closed] (HUDI-3857) NoSuchMethodError: Continuous deltastreamer test with async compaction fails on EMR spark

2022-04-11 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-3857.
-
Resolution: Cannot Reproduce

This happens with EMR Spark. It does not reproduce with OSS Spark. Either use 
OSS Spark or replace the EMR-specific spark-sql-amzn*.jar with the OSS 
spark-sql jar: 
https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.12/3.1.2/spark-sql_2.12-3.1.2.jar
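
For reference, a minimal sketch of the jar swap on an EMR node (the
/usr/lib/spark/jars path and the use of sudo are assumptions; adjust to the
actual cluster layout):
{code:bash}
# Assumed Spark jars directory on an EMR node; verify the real location first.
SPARK_JARS_DIR=/usr/lib/spark/jars

# Move the EMR-specific spark-sql jar out of the classpath.
sudo mkdir -p /tmp/emr-jars-backup
sudo mv ${SPARK_JARS_DIR}/spark-sql-amzn*.jar /tmp/emr-jars-backup/

# Drop in the OSS spark-sql jar matching the cluster's Spark version (3.1.2 here).
sudo curl -fL -o ${SPARK_JARS_DIR}/spark-sql_2.12-3.1.2.jar \
  https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.12/3.1.2/spark-sql_2.12-3.1.2.jar
{code}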

> NoSuchMethodError: Continuous deltastreamer test with async compaction fails 
> on EMR spark 
> --
>
> Key: HUDI-3857
> URL: https://issues.apache.org/jira/browse/HUDI-3857
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.11.0
>
>
> EMR 6.5, Spark 3.1.2
> While running continuous deltastreamer with async compaction enabled, I hit 
> this exception
> {code:java}
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:130)
>     at scala.Option.map(Option.scala:230)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:128)
>     at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>     at scala.collection.immutable.List.map(List.scala:298)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.buildSplits(MergeOnReadSnapshotRelation.scala:124)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:108)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:44)
>     at 
> org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:221) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

2022-04-11 Thread GitBox


hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096197206

   
   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002)
 
   * 9458d847182b0628d228211d010310ade743d431 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3843) Make Flink 1.13.x 1.14.x build with scala 2.11

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3843:
-
Reviewers: Ethan Guo

> Make Flink 1.13.x 1.14.x build with scala 2.11
> --
>
> Key: HUDI-3843
> URL: https://issues.apache.org/jira/browse/HUDI-3843
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Status: Patch Available  (was: In Progress)

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3855:
-
Reviewers: Raymond Xu, sivabalan narayanan

> Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
> 
>
> Key: HUDI-3855
> URL: https://issues.apache.org/jira/browse/HUDI-3855
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> As was reported by the user here: 
> [https://github.com/apache/hudi/issues/5231]
>  
> Quoting:
> So I was able to reproduce the behavior you're seeing, and it turns out that 
> {{_hoodie_file_name}} is simply not updated during Commit 3, meaning that 
> during C3, all records are copied from the latest base-file of the 
> file-group into the new latest base-file (in your most recent experiment it's 
> {{c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet}}),
> but it doesn't update the {{_hoodie_file_name}} field, which keeps pointing 
> at the old file.
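> 
> A quick way to observe the stale field from spark-shell is to compare 
> {{_hoodie_file_name}} against the physical file each row is actually read from; 
> a minimal sketch (the table path is a placeholder):
> {code:scala}
> import org.apache.spark.sql.functions.{col, input_file_name}
> 
> // Placeholder path; point this at the table from the report.
> val df = spark.read.format("hudi").load("/tmp/hudi/test_table")
> 
> // Rows whose metadata field still points at an older base file show a mismatch
> // between the recorded file name and the file the row was actually read from.
> df.select(col("_hoodie_record_key"), col("_hoodie_file_name"), input_file_name().as("actual_file"))
>   .where(!input_file_name().contains(col("_hoodie_file_name")))
>   .show(false)
> {code}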



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3738:
-
Reviewers: sivabalan narayanan

> Perf comparison between parquet and hudi for COW snapshot and MOR read 
> optimized
> 
>
> Key: HUDI-3738
> URL: https://issues.apache.org/jira/browse/HUDI-3738
> Project: Apache Hudi
>  Issue Type: Task
>  Components: performance
>Reporter: sivabalan narayanan
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3839) List of MT partitions to be updated is selected incorrectly

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3839:
-
Reviewers: Sagar Sumit

> List of MT partitions to be updated is selected incorrectly
> ---
>
> Key: HUDI-3839
> URL: https://issues.apache.org/jira/browse/HUDI-3839
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3855:
-
Status: Patch Available  (was: In Progress)

> Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
> 
>
> Key: HUDI-3855
> URL: https://issues.apache.org/jira/browse/HUDI-3855
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> As was reported by the user here: 
> [https://github.com/apache/hudi/issues/5231]
>  
> Quoting:
> So I was able to reproduce the behavior you're seeing, and it turns out that 
> {{_hoodie_file_name}} is simply not updated during Commit 3, meaning that 
> during C3, all records are copied from the latest base-file of the 
> file-group into the new latest base-file (in your most recent experiment it's 
> {{c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet}}),
> but it doesn't update the {{_hoodie_file_name}} field, which keeps pointing 
> at the old file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #5279: [HUDI-3843] Make flink profiles build with scala-2.11

2022-04-11 Thread GitBox


hudi-bot commented on PR #5279:
URL: https://github.com/apache/hudi/pull/5279#issuecomment-1096192323

   
   ## CI report:
   
   * a6ad82e1a6d7c392f1f1b53937f4c5395b620c05 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8000)
 
   * 66bc1d1b54b7d5d7fbbc4db8e29b4ced675c2c8d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8004)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3855:
-
Status: In Progress  (was: Open)

> Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
> 
>
> Key: HUDI-3855
> URL: https://issues.apache.org/jira/browse/HUDI-3855
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> As was reported by the user here: 
> [https://github.com/apache/hudi/issues/5231]
>  
> Quoting:
> So I was able to reproduce the behavior you're seeing, and it turns out that 
> {{_hoodie_file_name}} is simply not updated during Commit 3, meaning that 
> during C3, all records are copied from the latest base-file of the 
> file-group into the new latest base-file (in your most recent experiment it's 
> {{c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet}}),
> but it doesn't update the {{_hoodie_file_name}} field, which keeps pointing 
> at the old file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] easonwood commented on issue #5290: [SUPPORT] Problems in handling column deletions in Hudi

2022-04-11 Thread GitBox


easonwood commented on issue #5290:
URL: https://github.com/apache/hudi/issues/5290#issuecomment-1096191600

   It seems this error does not influence the result. Data loaded to Hudi 
successfully. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…

2022-04-11 Thread GitBox


hudi-bot commented on PR #4724:
URL: https://github.com/apache/hudi/pull/4724#issuecomment-1096191258

   
   ## CI report:
   
   * 20b1ee41afcb4cc5328bfb30e51e5a37bf0d46c7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7999)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5279: [HUDI-3843] Make flink profiles build with scala-2.11

2022-04-11 Thread GitBox


hudi-bot commented on PR #5279:
URL: https://github.com/apache/hudi/pull/5279#issuecomment-1096187448

   
   ## CI report:
   
   * 9a6be184e3833e071054afc5d0db55bb2336dd5c Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7997)
 
   * a6ad82e1a6d7c392f1f1b53937f4c5395b620c05 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8000)
 
   * 66bc1d1b54b7d5d7fbbc4db8e29b4ced675c2c8d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5297: [HUDI-3859] Fix spark profiles and utilities-slim dep

2022-04-11 Thread GitBox


hudi-bot commented on PR #5297:
URL: https://github.com/apache/hudi/pull/5297#issuecomment-1096182625

   
   ## CI report:
   
   * cc81ebb8f2b84b9ada13927de9c30a1b69864f2f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8003)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] zhilinli123 commented on issue #4881: Full incremental Enable index loading to discover duplicate data(index.bootstrap.enabled)

2022-04-11 Thread GitBox


zhilinli123 commented on issue #4881:
URL: https://github.com/apache/hudi/issues/4881#issuecomment-1096179551

   @nsivabalan  Will the current issue be fixed when the next version is 
released?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5297: [HUDI-3859] Fix spark profiles and utilities-slim dep

2022-04-11 Thread GitBox


hudi-bot commented on PR #5297:
URL: https://github.com/apache/hudi/pull/5297#issuecomment-1096178036

   
   ## CI report:
   
   * cc81ebb8f2b84b9ada13927de9c30a1b69864f2f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3859:
-
Labels: pull-request-available  (was: )

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] xushiyan opened a new pull request, #5297: [HUDI-3859] Fix spark profiles and utilities-slim dep

2022-04-11 Thread GitBox


xushiyan opened a new pull request, #5297:
URL: https://github.com/apache/hudi/pull/5297

   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

2022-04-11 Thread GitBox


hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096165047

   
   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8002)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3845) Fix delete mor table's partition with urlencode's error

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3845:
-
Priority: Critical  (was: Major)

> Fix delete mor table's partition with urlencode's error
> ---
>
> Key: HUDI-3845
> URL: https://issues.apache.org/jira/browse/HUDI-3845
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> {code:java}
> // code placeholder
> 4604 [ScalaTest-run-running-TestDeleteTable] WARN  
> org.apache.hudi.common.config.DFSPropertiesConfiguration  - Properties file 
> file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
> 5412 [ScalaTest-run-running-TestDeleteTable] WARN  
> org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not 
> found at path 
> /private/var/folders/9d/qtc20f6x197431jvthgs58zrgn/T/spark-ce5c11fc-3e1d-4967-aaba-7c814f503224/h0/.hoodie/metadata
> 7658 [Executor task launch worker for task 8] WARN  
> org.apache.hadoop.metrics2.impl.MetricsConfig  - Cannot locate configuration: 
> tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 20677 [Executor task launch worker for task 106] ERROR 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Got 
> IOException when reading log file
> java.io.FileNotFoundException: File 
> file:/private/var/folders/9d/qtc20f6x197431jvthgs58zrgn/T/spark-ce5c11fc-3e1d-4967-aaba-7c814f503224/h0/2021%252F10%252F01/.a2c7f463-7f0e-48b9-aa66-e56d62847b05-0_20220410194408951.log.1_0-83-88
>  does not exist
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:634)
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:860)
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:624)
>     at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:446)
>     at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>     at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>     at 
> org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:472)
>     at 
> org.apache.hudi.common.table.log.HoodieLogFileReader.<init>(HoodieLogFileReader.java:111)
>     at 
> org.apache.hudi.common.table.log.HoodieLogFormatReader.<init>(HoodieLogFormatReader.java:70)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:218)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:191)
>     at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:106)
>     at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:99)
>     at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:317)
>     at 
> org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:370)
>     at 
> org.apache.hudi.HoodieMergeOnReadRDD$LogFileIterator.<init>(HoodieMergeOnReadRDD.scala:172)
>     at 
> org.apache.hudi.HoodieMergeOnReadRDD$RecordMergingFileIterator.<init>(HoodieMergeOnReadRDD.scala:252)
>  {code}
> The partition path is URL-encoded, so it cannot be found.
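> 
> The {{2021%252F10%252F01}} segment in the failing path above is the tell-tale 
> sign of double URL-encoding ({{%25}} is an encoded {{%}}); a small illustration:
> {code:scala}
> import java.net.URLEncoder
> import java.nio.charset.StandardCharsets
> 
> val partition = "2021/10/01"
> // Encoding once yields 2021%2F10%2F01; encoding the already-encoded value
> // again yields 2021%252F10%252F01, the form seen in the failing path.
> val once  = URLEncoder.encode(partition, StandardCharsets.UTF_8.name())
> val twice = URLEncoder.encode(once, StandardCharsets.UTF_8.name())
> println(s"$once -> $twice")
> {code}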



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3845) Fix delete mor table's partition with urlencode's error

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3845:
-
Fix Version/s: 0.12.0

> Fix delete mor table's partition with urlencode's error
> ---
>
> Key: HUDI-3845
> URL: https://issues.apache.org/jira/browse/HUDI-3845
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: Forward Xu
>Assignee: Forward Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> {code:java}
> // code placeholder
> 4604 [ScalaTest-run-running-TestDeleteTable] WARN  
> org.apache.hudi.common.config.DFSPropertiesConfiguration  - Properties file 
> file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
> 5412 [ScalaTest-run-running-TestDeleteTable] WARN  
> org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not 
> found at path 
> /private/var/folders/9d/qtc20f6x197431jvthgs58zrgn/T/spark-ce5c11fc-3e1d-4967-aaba-7c814f503224/h0/.hoodie/metadata
> 7658 [Executor task launch worker for task 8] WARN  
> org.apache.hadoop.metrics2.impl.MetricsConfig  - Cannot locate configuration: 
> tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 20677 [Executor task launch worker for task 106] ERROR 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Got 
> IOException when reading log file
> java.io.FileNotFoundException: File 
> file:/private/var/folders/9d/qtc20f6x197431jvthgs58zrgn/T/spark-ce5c11fc-3e1d-4967-aaba-7c814f503224/h0/2021%252F10%252F01/.a2c7f463-7f0e-48b9-aa66-e56d62847b05-0_20220410194408951.log.1_0-83-88
>  does not exist
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:634)
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:860)
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:624)
>     at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:446)
>     at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
>     at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
>     at 
> org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:472)
>     at 
> org.apache.hudi.common.table.log.HoodieLogFileReader.<init>(HoodieLogFileReader.java:111)
>     at 
> org.apache.hudi.common.table.log.HoodieLogFormatReader.<init>(HoodieLogFormatReader.java:70)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:218)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:191)
>     at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:106)
>     at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:99)
>     at 
> org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:317)
>     at 
> org.apache.hudi.HoodieMergeOnReadRDD$.scanLog(HoodieMergeOnReadRDD.scala:370)
>     at 
> org.apache.hudi.HoodieMergeOnReadRDD$LogFileIterator.<init>(HoodieMergeOnReadRDD.scala:172)
>     at 
> org.apache.hudi.HoodieMergeOnReadRDD$RecordMergingFileIterator.<init>(HoodieMergeOnReadRDD.scala:252)
>  {code}
> The partition path is URL-encoded, so it cannot be found.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

2022-04-11 Thread GitBox


hudi-bot commented on PR #5296:
URL: https://github.com/apache/hudi/pull/5296#issuecomment-1096159804

   
   ## CI report:
   
   * e5d566882c6f3ed58a65a01065f0ae99dfb420b2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle

2022-04-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3855:
-
Labels: pull-request-available  (was: )

> Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
> 
>
> Key: HUDI-3855
> URL: https://issues.apache.org/jira/browse/HUDI-3855
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> As was reported by the user here: 
> [https://github.com/apache/hudi/issues/5231]
>  
> Quoting:
> So I was able to reproduce the behavior you're seeing, and it turns out that 
> {{_hoodie_file_name}} is simply not updated during Commit 3, meaning that 
> during C3, all records are copied from the latest base-file of the 
> file-group into the new latest base-file (in your most recent experiment it's 
> {{c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet}}),
> but it doesn't update the {{_hoodie_file_name}} field, which keeps pointing 
> at the old file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] alexeykudinkin opened a new pull request, #5296: [HUDI-3855] Fixing `FILENAME_METADATA_FIELD` not being correctly updated in `HoodieMergeHandle`

2022-04-11 Thread GitBox


alexeykudinkin opened a new pull request, #5296:
URL: https://github.com/apache/hudi/pull/5296

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Fixing `FILENAME_METADATA_FIELD` not being correctly updated in 
`HoodieMergeHandle` in cases when the old record is carried over from the 
existing file as-is.
   
   ## Brief change log
   
- Revisited HoodieFileWriter API to accept HoodieKey instead of 
`HoodieRecord`
- Fixed FILENAME_METADATA_FIELD not being overridden in cases when simply 
old record is carried over
- Exposing standard JVM's debugger ports in Docker setup
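
   The last item refers to the standard JDWP agent; a hedged sketch of what 
   exposing it typically looks like (the environment variable and port 5005 are 
   assumptions, not taken from this PR):

   ```bash
   # Hypothetical JVM options for the containerized service; the agent string itself
   # is standard JDWP syntax (the *: wildcard bind is the JDK 9+ form). The same
   # port must also be published by the container, e.g. "-p 5005:5005" on docker run.
   export JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
   ```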
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   This change added tests and can be verified as follows:
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3819:


Assignee: Sagar Sumit  (was: Raymond Xu)

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner finds these packages and raises a warning 
> because these files exist on the system. 
> The found files are:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3819:
-
Reviewers: Raymond Xu  (was: Sagar Sumit)

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner finds these packages and raises a warning 
> because these files exist on the system. 
> The found files are:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3819:
-
Status: In Progress  (was: Open)

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner finds these packages and raises a warning 
> because these files exist on the system. 
> The found files are:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Status: In Progress  (was: Open)

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3746) CI ignored test failure in TestDataSkippingUtils

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3746:
-
Priority: Blocker  (was: Major)

> CI ignored test failure in TestDataSkippingUtils
> 
>
> Key: HUDI-3746
> URL: https://issues.apache.org/jira/browse/HUDI-3746
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: ci.log.zip
>
>
> A failure in 
> TestDataSkippingUtils
> was ignored. Something to do with JUnit in Scala, maybe?
> See the attached CI logs and search for `TestDataSkippingUtils`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3746) CI ignored test failure in TestDataSkippingUtils

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3746:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> CI ignored test failure in TestDataSkippingUtils
> 
>
> Key: HUDI-3746
> URL: https://issues.apache.org/jira/browse/HUDI-3746
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.0
>
> Attachments: ci.log.zip
>
>
> A failure in 
> TestDataSkippingUtils
> was ignored. Something to do with JUnit in Scala, maybe?
> See the attached CI logs and search for `TestDataSkippingUtils`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1976:
-
Component/s: dependencies

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies, hive
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1976:
-
Issue Type: Improvement  (was: Task)

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1976:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> Upgrade hive, jackson, log4j, hadoop to remove vulnerability
> 
>
> Key: HUDI-1976
> URL: https://issues.apache.org/jira/browse/HUDI-1976
> Project: Apache Hudi
>  Issue Type: Task
>  Components: hive
>Reporter: Nishith Agarwal
>Assignee: Vinay
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> [https://github.com/apache/hudi/issues/2827]
> [https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2824|https://github.com/apache/hudi/issues/2826]
> [https://github.com/apache/hudi/issues/2823|https://github.com/apache/hudi/issues/2826]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3096) fixed the bug that the cow table(contains decimalType) write by flink cannot be read by spark

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3096:


Assignee: Tao Meng

> fixed the bug that  the cow table(contains decimalType) write by flink cannot 
> be read by spark
> --
>
> Key: HUDI-3096
> URL: https://issues.apache.org/jira/browse/HUDI-3096
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.10.0
> Environment: flink  1.13.1
> spark 3.1.1
>Reporter: Tao Meng
>Assignee: Tao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Currently, Flink writes DecimalType as byte[].
> When Spark reads that decimal type, if Spark finds that the precision of the 
> decimal is small, it treats it as int/long, which causes the following error:
>  
> Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet 
> column cannot be converted in file 
> hdfs://x/tmp/hudi/hudi_x/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet.
>  Column: [c7], Expected: decimal(10,4), Found: BINARY
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)
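> 
> For background, Parquet's decimal logical type is commonly stored as INT32 for 
> precision <= 9, INT64 for precision <= 18, and a (fixed-length) byte array 
> beyond that, which is roughly what Spark's vectorized reader expects for the 
> declared precision; a decimal(10,4) materialized as plain BINARY therefore 
> trips the check above. A rough illustrative sketch of that rule (not actual 
> Hudi/Flink/Spark source):
> {code:scala}
> // Illustrative mapping only: the physical Parquet type typically expected for
> // a decimal of the given precision when reading with Spark's vectorized reader.
> def expectedParquetPhysicalType(precision: Int): String =
>   if (precision <= 9) "INT32"
>   else if (precision <= 18) "INT64"
>   else "FIXED_LEN_BYTE_ARRAY"
> 
> println(expectedParquetPhysicalType(10)) // INT64, but the file holds BINARY
> {code}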



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3096) fixed the bug that the cow table(contains decimalType) write by flink cannot be read by spark

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3096.

Resolution: Fixed

> fixed the bug that  the cow table(contains decimalType) write by flink cannot 
> be read by spark
> --
>
> Key: HUDI-3096
> URL: https://issues.apache.org/jira/browse/HUDI-3096
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: flink
>Affects Versions: 0.10.0
> Environment: flink  1.13.1
> spark 3.1.1
>Reporter: Tao Meng
>Assignee: Tao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Currently, Flink writes DecimalType as byte[].
> When Spark reads that decimal type, if Spark finds that the precision of the 
> decimal is small, it treats it as int/long, which causes the following error:
>  
> Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet 
> column cannot be converted in file 
> hdfs://x/tmp/hudi/hudi_x/46d44c57-aa43-41e2-a8aa-76dcc9dac7e4_0-4-0_20211221201230.parquet.
>  Column: [c7], Expected: decimal(10,4), Found: BINARY
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:517)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>  Source)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3680) Update docs to reflect new Bundles Spark compatibility

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3680:
-
Story Points: 1  (was: 2)

> Update docs to reflect new Bundles Spark compatibility 
> ---
>
> Key: HUDI-3680
> URL: https://issues.apache.org/jira/browse/HUDI-3680
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Alexey Kudinkin
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We need to make sure that we reflect the new Spark compatibility approach for 
> Hudi bundles (pledging to stay compatible w/in Spark minor version branch)
> Channels to update:
>  # Dev-list
>  # Docs on the website
>  # Docs in README
>  # Slack?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #5295: [HUDI-3858]Shade javax.servlet for hudi-spark-bundle

2022-04-11 Thread GitBox


hudi-bot commented on PR #5295:
URL: https://github.com/apache/hudi/pull/5295#issuecomment-1096115125

   
   ## CI report:
   
   * c900ecc8741fdcab9c1a4c156e410d6e8462a457 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7998)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-11 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520924#comment-17520924
 ] 

Raymond Xu commented on HUDI-3749:
--

[~shivnarayan] What are the done criteria for this ticket? Can you please put 
them down in the description?

> Run latest hudi w/ EMR spark and report to aws folks
> 
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2606) Ensure query engines not access MDT if disabled

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2606:
-
Reviewers: Ethan Guo  (was: Ethan Guo, sivabalan narayanan)

> Ensure query engines not access MDT if disabled
> ---
>
> Key: HUDI-2606
> URL: https://issues.apache.org/jira/browse/HUDI-2606
> Project: Apache Hudi
>  Issue Type: Task
>  Components: metadata, reader-core
>Reporter: sivabalan narayanan
>Assignee: Tao Meng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> This is to visit all the read code paths and ensure when metadata is 
> disabled, query engines won't read from metadata table.
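> 
> For context, a read can be steered away from the metadata table with the 
> standard datasource option; a minimal spark-shell sketch (the table path is a 
> placeholder):
> {code:scala}
> // "hoodie.metadata.enable" is the switch this ticket is about; with it set to
> // false the reader should fall back to plain file-system listing.
> val df = spark.read
>   .format("hudi")
>   .option("hoodie.metadata.enable", "false")
>   .load("/path/to/hudi_table")
> 
> df.count()
> {code}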



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3826) Commands deleting partitions do so incorrectly

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3826:
-
Reviewers: Alexey Kudinkin

> Commands deleting partitions do so incorrectly
> --
>
> Key: HUDI-3826
> URL: https://issues.apache.org/jira/browse/HUDI-3826
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: Forward Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Currently, `TruncateHoodieTableCommand` as well as 
> `AlterHoodieTableDropPartitionCommand` delete partitions from a Hudi table by 
> simply removing the corresponding partition folders without committing any 
> changes (and, for example, without correspondingly updating the metadata table).
> Instead they should go through the WriteClient's `deletePartitions` API, similar 
> to what the Spark datasource does when it gets Hudi's DELETE command.
> You can see this when enabling the Column Stats Index by default and running our 
> CI (setting "hoodie.metadata.index.column.stats.enable"
> and "hoodie.metadata.enable" to true)
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=7926&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934
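> 
> For reproducing locally, the two flags mentioned above can be passed as write 
> options; a minimal spark-shell sketch (the toy input, table name, and path are 
> placeholders):
> {code:scala}
> import spark.implicits._
> 
> // Toy input; the two metadata option keys are the ones named in the issue,
> // the rest are the usual minimal Hudi write options.
> val inputDf = Seq((1, "a", 1L), (2, "b", 2L)).toDF("id", "value", "ts")
> 
> inputDf.write
>   .format("hudi")
>   .option("hoodie.table.name", "test_table")
>   .option("hoodie.datasource.write.recordkey.field", "id")
>   .option("hoodie.datasource.write.partitionpath.field", "value")
>   .option("hoodie.datasource.write.precombine.field", "ts")
>   .option("hoodie.metadata.enable", "true")
>   .option("hoodie.metadata.index.column.stats.enable", "true")
>   .mode("overwrite") // initial write
>   .save("/tmp/hudi/test_table")
> {code}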



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3707:


Assignee: Sagar Sumit

> Fix deltastreamer test with schema provider and transformer enabled
> ---
>
> Key: HUDI-3707
> URL: https://issues.apache.org/jira/browse/HUDI-3707
> Project: Apache Hudi
>  Issue Type: Test
>  Components: tests-ci
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.11.0, 0.12.0
>
>
> Fix cases like this
> @Disabled("To investigate problem with schema provider and transformer")
> in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3819:
-
Reviewers: Sagar Sumit

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner finds these packages and raises a warning 
> because these files exist on the system. 
> The found files are:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-1602:


Assignee: Sagar Sumit

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> We are running a Hudi deltastreamer on a very complex stream. The schema is 
> deeply nested, with several levels of hierarchy (the Avro schema is around 6600 
> LOC).
>  
> The version of Hudi that writes the dataset is 0.5-SNAPSHOT, and we recently 
> started attempting to upgrade to the latest. However, the latest Hudi can't read 
> the provided dataset. The exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:Got exception while parsing the 
> arguments:Found recursive reference in Avro schema, which can not be 
> processed by Spark:{  "type" : "record",  "name" : "array",  "fields" : [ {   
>  "name" : "id",    "type" : [ "null", "string" ],    "default" : null  }, {   
>  "name" : "type",    "type" : [ "null", "string" ],    "default" : null  }, { 
>    "name" : "exist",    "type" : [ "null", "boolean" ],    "default" : null  
> } ]}          Stack 
> trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive 
> reference in Avro schema, which can not be processed by Spark:{  "type" : 
> "record",  "name" : "array",  "fields" : [ {    "name" : "id",    "type" : [ 
> "null", "string" ],    "default" : null  }, {    "name" : "type",    "type" : 
> [ "null", "string" ],    "default" : null  }, {    "name" : "exist",    
> "type" : [ "null", "boolean" ],    "default" : null  } ]}
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.

[jira] [Assigned] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3819:


Assignee: Raymond Xu

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner finds these packages and raises a warning 
> because these files exist on the system. 
> The found files are:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Reviewers: Ethan Guo

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3752) Update website content based on 0.11 new features

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3752:


Assignee: Raymond Xu

> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> content to update
> - utilities slim bundle https://github.com/apache/hudi/pull/5184/files



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3859:


Assignee: Raymond Xu

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Sprint: Hudi-Sprint-Apr-12

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Story Points: 0.5

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Component/s: dependencies

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3859:


 Summary: Remove parquet-avro from utilities-slim
 Key: HUDI-3859
 URL: https://issues.apache.org/jira/browse/HUDI-3859
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Raymond Xu






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3859) Remove parquet-avro from utilities-slim

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3859:
-
Fix Version/s: 0.11.0

> Remove parquet-avro from utilities-slim
> ---
>
> Key: HUDI-3859
> URL: https://issues.apache.org/jira/browse/HUDI-3859
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1602:
-
Story Points: 0.5

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Priority: Blocker
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> We are running a HUDI deltastreamer on a very complex stream. The schema is 
> deeply nested, with several levels of hierarchy (the Avro schema is around 6600 
> LOC).
>  
> The version of HUDI that writes the dataset is 0.5-SNAPSHOT, and we recently 
> started attempting to upgrade to the latest. However, the latest HUDI can't read 
> the provided dataset. The exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:Got exception while parsing the 
> arguments:Found recursive reference in Avro schema, which can not be 
> processed by Spark:{  "type" : "record",  "name" : "array",  "fields" : [ {   
>  "name" : "id",    "type" : [ "null", "string" ],    "default" : null  }, {   
>  "name" : "type",    "type" : [ "null", "string" ],    "default" : null  }, { 
>    "name" : "exist",    "type" : [ "null", "boolean" ],    "default" : null  
> } ]}          Stack 
> trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive 
> reference in Avro schema, which can not be processed by Spark:{  "type" : 
> "record",  "name" : "array",  "fields" : [ {    "name" : "id",    "type" : [ 
> "null", "string" ],    "default" : null  }, {    "name" : "type",    "type" : 
> [ "null", "string" ],    "default" : null  }, {    "name" : "exist",    
> "type" : [ "null", "boolean" ],    "default" : null  } ]}
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scal
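
For reference, the failure above is Spark's SchemaConverters refusing any Avro
schema in which a named type refers back to itself. A minimal, hypothetical
schema of that shape (far smaller than the real ~6600-line one) is perfectly
legal Avro, which is why only the Spark conversion step complains:

{code:java}
import org.apache.avro.Schema;

public class RecursiveAvroSchemaExample {
  public static void main(String[] args) {
    // Hypothetical minimal schema: record "Node" references its own name inside
    // a union. Avro allows this; Spark's SchemaConverters rejects it as recursive.
    String json = "{"
        + "\"type\":\"record\",\"name\":\"Node\",\"fields\":["
        + "{\"name\":\"id\",\"type\":[\"null\",\"string\"],\"default\":null},"
        + "{\"name\":\"next\",\"type\":[\"null\",\"Node\"],\"default\":null}"
        + "]}";
    Schema schema = new Schema.Parser().parse(json);
    System.out.println(schema.toString(true));
  }
}
{code}

Avro's parser accepts the self-reference and prints the schema; it is the
Avro-to-Catalyst conversion inside SchemaConverters that throws
IncompatibleSchemaException.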

[jira] [Updated] (HUDI-3838) Make Drop partition column config work with deltastreamer

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3838:
-
Story Points: 1

> Make Drop partition column config work with deltastreamer
> -
>
> Key: HUDI-3838
> URL: https://issues.apache.org/jira/browse/HUDI-3838
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: meta-sync
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> hoodie.datasource.write.drop.partition.columns only works for the datasource 
> writer; HoodieDeltaStreamer does not use it. We need it for the deltastreamer -> 
> BigQuery sync flow.
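
For context, the flag is currently honored only on the datasource write path.
A rough sketch of that existing usage (the table name, record key, partition
column, and paths below are made up for illustration) shows the behavior the
deltastreamer path would need to replicate, presumably by accepting the same
key via --hoodie-conf:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class DropPartitionColumnsExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("drop-partition-columns").getOrCreate();
    Dataset<Row> df = spark.read().format("parquet").load("/tmp/source"); // hypothetical input

    df.write().format("hudi")
        .option("hoodie.table.name", "demo_table")                          // hypothetical table
        .option("hoodie.datasource.write.recordkey.field", "id")            // hypothetical columns
        .option("hoodie.datasource.write.partitionpath.field", "dt")
        .option("hoodie.datasource.write.drop.partition.columns", "true")   // honored only by the datasource writer today
        .mode(SaveMode.Append)
        .save("/tmp/hudi/demo_table");
  }
}
{code}

The intent of the flag is to drop the partition column from the written data
files so readers rebuild it from the partition path, which is what the BigQuery
sync flow needs.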



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3724) Too many open files w/ COW spark long running tests

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3724:
-
Story Points: 1

> Too many open files w/ COW spark long running tests
> ---
>
> Key: HUDI-3724
> URL: https://issues.apache.org/jira/browse/HUDI-3724
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> We run integ tests against Hudi, and recently our Spark long-running tests have 
> been failing for the COW table with "too many open files". Maybe we have some 
> leaks that we need to chase down and close out. 
> {code:java}
>   ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor 
> driver): java.io.FileNotFoundException: 
> /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3
>  (Too many open files)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.(FileOutputStream.java:213)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171)
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  {code}
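
One low-tech way to confirm a descriptor leak before chasing it is to poll the
test JVM's own file-descriptor count while the long-running job makes progress.
This is only a hypothetical Linux-only probe (it reads /proc/self/fd and is not
part of Hudi):

{code:java}
import java.io.File;

public class OpenFileDescriptorProbe {
  /** Number of open file descriptors for the current JVM process (Linux only). */
  public static int openFdCount() {
    String[] fds = new File("/proc/self/fd").list();
    return fds == null ? -1 : fds.length;
  }

  public static void main(String[] args) throws InterruptedException {
    // Poll periodically; a count that climbs steadily across commits, rather
    // than plateauing, usually points at readers or writers that are never closed.
    while (true) {
      System.out.println("open fds: " + openFdCount());
      Thread.sleep(10_000L);
    }
  }
}
{code}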



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3843) Make Flink 1.13.x 1.14.x build with scala 2.11

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3843:
-
Story Points: 0.5

> Make Flink 1.13.x 1.14.x build with scala 2.11
> --
>
> Key: HUDI-3843
> URL: https://issues.apache.org/jira/browse/HUDI-3843
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3738:
-
Story Points: 1

> Perf comparison between parquet and hudi for COW snapshot and MOR read 
> optimized
> 
>
> Key: HUDI-3738
> URL: https://issues.apache.org/jira/browse/HUDI-3738
> Project: Apache Hudi
>  Issue Type: Task
>  Components: performance
>Reporter: sivabalan narayanan
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3752:
-
Sprint: Hudi-Sprint-Apr-12

> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> content to update
> - utilities slim bundle https://github.com/apache/hudi/pull/5184/files



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3752:
-
Story Points: 2

> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> content to update
> - utilities slim bundle https://github.com/apache/hudi/pull/5184/files



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3819:
-
Story Points: 0.5

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner finds these packages and raises a warning 
> because the affected files exist on the system. 
> The found files are:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3749:
-
Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-12  (was: Hudi-Sprint-Mar-22)

> Run latest hudi w/ EMR spark and report to aws folks
> 
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3749:
-
Story Points: 1

> Run latest hudi w/ EMR spark and report to aws folks
> 
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1605:
-
Sprint: Hudi-Sprint-Apr-12

> Add more documentation around archival process and configs
> --
>
> Key: HUDI-1605
> URL: https://issues.apache.org/jira/browse/HUDI-1605
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Kyle Weller
>Priority: Blocker
>  Labels: user-support-issues
> Fix For: 0.11.0
>
>
> Reference:
> What is the trade-off in lowering {{hoodie.keep.max.commits}} and 
> {{hoodie.keep.min.commits}}?
> https://github.com/apache/hudi/issues/2408#issuecomment-758360941
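
For the documentation, a small illustrative snippet of the configs behind that
question may help. The values below are invented; the general guidance is to
keep hoodie.keep.min.commits above hoodie.cleaner.commits.retained so the
cleaner never needs instants that have already been archived:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ArchivalConfigExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("archival-config").getOrCreate();
    Dataset<Row> df = spark.read().format("parquet").load("/tmp/source"); // hypothetical input

    // Illustrative values only: the archiver trims the active timeline to between
    // "min" and "max" commits, and keep.min.commits stays above cleaner.commits.retained.
    df.write().format("hudi")
        .option("hoodie.table.name", "demo_table")                 // hypothetical table
        .option("hoodie.datasource.write.recordkey.field", "id")   // hypothetical column
        .option("hoodie.cleaner.commits.retained", "24")
        .option("hoodie.keep.min.commits", "25")
        .option("hoodie.keep.max.commits", "35")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/demo_table");
  }
}
{code}

Roughly, lower keep.min/max values keep the active timeline short and cheap to
scan, at the cost of how far back incremental reads and debugging can look.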



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3707:
-
Story Points: 2

> Fix deltastreamer test with schema provider and transformer enabled
> ---
>
> Key: HUDI-3707
> URL: https://issues.apache.org/jira/browse/HUDI-3707
> Project: Apache Hudi
>  Issue Type: Test
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0, 0.12.0
>
>
> Fix cases like this
> @Disabled("To investigate problem with schema provider and transformer")
> in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3036:
-
Sprint: Hudi-Sprint-Apr-12

> Enhance Cleaner Docs
> 
>
> Key: HUDI-3036
> URL: https://issues.apache.org/jira/browse/HUDI-3036
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Blocker
> Fix For: 0.11.0
>
>
> This blog has rich info that should be in the docs:
> [https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/]
> Slack disc mention: 
> https://apache-hudi.slack.com/archives/C4D716NPQ/p1639497026391400



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3855:
-
Story Points: 1

> Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
> 
>
> Key: HUDI-3855
> URL: https://issues.apache.org/jira/browse/HUDI-3855
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>
> As was reported by the user here: 
> [https://github.com/apache/hudi/issues/5231]
>  
> Quoting:
> So i was able to reproduce behavior that you're seeing and it turns out to be 
> that {{_hoodie_file_name}} is simply not updated during Commit 3, meaning 
> that during C3, all records are copied from latest base-file of the 
> file-group into new latest base-file (in your most recent experiment it's 
> {{{}c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet{}}})
>  but it doesn't update the {{_hoodie_file_name}} field which is kept pointing 
> at the old file.
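
A quick way to observe the reported symptom is to compare the metadata column
against the file Spark actually read each row from. This is only a verification
sketch; the table path is hypothetical:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.input_file_name;

public class FileNameMetadataCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("filename-metadata-check").getOrCreate();

    Dataset<Row> rows = spark.read().format("hudi").load("/tmp/hudi/demo_table"); // hypothetical path

    // On an affected table, rows read from the newest base file still show an
    // older file name in _hoodie_file_name.
    rows.select(col("_hoodie_file_name"), input_file_name().alias("actual_file"))
        .show(20, false);
  }
}
{code}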



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3036:
-
Story Points: 0

> Enhance Cleaner Docs
> 
>
> Key: HUDI-3036
> URL: https://issues.apache.org/jira/browse/HUDI-3036
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Blocker
> Fix For: 0.11.0
>
>
> This blog has rich info that should be in the docs:
> [https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/]
> Slack disc mention: 
> https://apache-hudi.slack.com/archives/C4D716NPQ/p1639497026391400



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1605:
-
Story Points: 0

> Add more documentation around archival process and configs
> --
>
> Key: HUDI-1605
> URL: https://issues.apache.org/jira/browse/HUDI-1605
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Kyle Weller
>Priority: Blocker
>  Labels: user-support-issues
> Fix For: 0.11.0
>
>
> Reference:
> What is the trade-off in lowering {{hoodie.keep.max.commits}} and 
> {{hoodie.keep.min.commits}}?
> https://github.com/apache/hudi/issues/2408#issuecomment-758360941



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3749) Run latest hudi w/ EMR spark and report to aws folks

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3749:
-
Priority: Blocker  (was: Critical)

> Run latest hudi w/ EMR spark and report to aws folks
> 
>
> Key: HUDI-3749
> URL: https://issues.apache.org/jira/browse/HUDI-3749
> Project: Apache Hudi
>  Issue Type: Task
>  Components: tests-ci
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2946:
-
Priority: Critical  (was: Major)

> Upgrade maven plugin to make Hudi be compatible with higher Java versions
> -
>
> Key: HUDI-2946
> URL: https://issues.apache.org/jira/browse/HUDI-2946
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> I saw several issues while building Hudi w/ Java 11:
>  
> {{[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project 
> hudi-common: Execution default of goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API 
> incompatibility was encountered while executing 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: 
> java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project 
> hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR 
> /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar
>  entry org/apache/hudi/hadoop/bundle/Main.class: 
> java.lang.IllegalArgumentException -> [Help 1]}}
>  
> We need to upgrade the Maven plugin versions to make them compatible with Java 
> 11.
> Also upgrade dockerfile-maven-plugin to latest versions to support Java 11 
> [https://github.com/spotify/dockerfile-maven/pull/230]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2946:
-
Component/s: dependencies

> Upgrade maven plugin to make Hudi be compatible with higher Java versions
> -
>
> Key: HUDI-2946
> URL: https://issues.apache.org/jira/browse/HUDI-2946
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Wenning Ding
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> I saw several issues while building Hudi w/ Java 11:
>  
> {{[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project 
> hudi-common: Execution default of goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API 
> incompatibility was encountered while executing 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: 
> java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project 
> hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR 
> /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar
>  entry org/apache/hudi/hadoop/bundle/Main.class: 
> java.lang.IllegalArgumentException -> [Help 1]}}
>  
> We need to upgrade the Maven plugin versions to make them compatible with Java 
> 11.
> Also upgrade dockerfile-maven-plugin to latest versions to support Java 11 
> [https://github.com/spotify/dockerfile-maven/pull/230]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2946) Upgrade maven plugin to make Hudi be compatible with higher Java versions

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2946:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> Upgrade maven plugin to make Hudi be compatible with higher Java versions
> -
>
> Key: HUDI-2946
> URL: https://issues.apache.org/jira/browse/HUDI-2946
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> I saw several issues while building Hudi w/ Java 11:
>  
> {{[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar (default) on project 
> hudi-common: Execution default of goal 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar failed: An API 
> incompatibility was encountered while executing 
> org.apache.maven.plugins:maven-jar-plugin:2.6:test-jar: 
> java.lang.ExceptionInInitializerError: null[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (default) on project 
> hudi-hadoop-mr-bundle: Error creating shaded jar: Problem shading JAR 
> /workspace/workspace/rchertar.bigtop.hudi-rpm-mainline-6.x-0.9.0/build/hudi/rpm/BUILD/hudi-0.9.0-amzn-1-SNAPSHOT/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.9.0-amzn-1-SNAPSHOT.jar
>  entry org/apache/hudi/hadoop/bundle/Main.class: 
> java.lang.IllegalArgumentException -> [Help 1]}}
>  
> We need to upgrade the Maven plugin versions to make them compatible with Java 
> 11.
> Also upgrade dockerfile-maven-plugin to latest versions to support Java 11 
> [https://github.com/spotify/dockerfile-maven/pull/230]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3036:
-
Priority: Blocker  (was: Major)

> Enhance Cleaner Docs
> 
>
> Key: HUDI-3036
> URL: https://issues.apache.org/jira/browse/HUDI-3036
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Blocker
> Fix For: 0.11.0
>
>
> This blog has rich info that should be in the docs:
> [https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/]
> Slack disc mention: 
> https://apache-hudi.slack.com/archives/C4D716NPQ/p1639497026391400



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3067:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> "Table already exists" error with multiple writers and dynamodb
> ---
>
> Key: HUDI-3067
> URL: https://issues.apache.org/jira/browse/HUDI-3067
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Nikita Sheremet
>Assignee: Wenning Ding
>Priority: Critical
> Fix For: 0.12.0
>
>
> How to reproduce:
>  # Set up multiple writing 
> [https://hudi.apache.org/docs/concurrency_control/] for DynamoDB (do not 
> forget to set _hoodie.write.lock.dynamodb.region_ and 
> {_}hoodie.write.lock.dynamodb.billing_mode{_}). Do not create any DynamoDB 
> table.
>  # Run multiple writers to the table
> (Tested on AWS EMR, so the multiple writers are EMR steps)
> Expected result: all steps completed.
> Actual result: some steps failed with an exception 
> {code:java}
> Caused by: com.amazonaws.services.dynamodbv2.model.ResourceInUseException: 
> Table already exists: truedata_detections (Service: AmazonDynamoDBv2; Status 
> Code: 400; Error Code: ResourceInUseException; Request ID:; Proxy: null)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>   at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
>   at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:6214)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:6181)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeCreateTable(AmazonDynamoDBClient.java:1160)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.createTable(AmazonDynamoDBClient.java:1124)
>   at 
> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.createLockTableInDynamoDB(DynamoDBBasedLockProvider.java:188)
>   at 
> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:99)
>   at 
> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:77)
>   ... 54 more
> 21/12/19 13:42:06 INFO Yar {code}
> This happens because all steps tried to create the table at the same time.
>  
> Suggested solution:
> A catch statement for the _Table already exists_ exception should be added to 
> the DynamoDB table creation code, possibly with a delay and an additional check 
> that the table is present.
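
A sketch of the suggested fix, assuming the AWS SDK v1 client already shown in
the stack trace: TableUtils.createTableIfNotExists absorbs the
ResourceInUseException race, and waitUntilActive covers the delay-and-recheck
part. The table name and key schema below are illustrative, not Hudi's actual
defaults:

{code:java}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.BillingMode;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;
import com.amazonaws.services.dynamodbv2.util.TableUtils;

public class LockTableBootstrap {
  public static void main(String[] args) throws InterruptedException {
    AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.standard().build();

    CreateTableRequest request = new CreateTableRequest()
        .withTableName("hudi_locks") // hypothetical table name
        .withKeySchema(new KeySchemaElement("key", KeyType.HASH))
        .withAttributeDefinitions(new AttributeDefinition("key", ScalarAttributeType.S))
        .withBillingMode(BillingMode.PAY_PER_REQUEST);

    // Returns false (instead of throwing ResourceInUseException) when another
    // writer created the table first; then wait for the table to become ACTIVE.
    boolean created = TableUtils.createTableIfNotExists(dynamo, request);
    TableUtils.waitUntilActive(dynamo, "hudi_locks");
    System.out.println(created ? "table created" : "table already existed");
  }
}
{code}

Each concurrent writer then either creates the table or quietly finds that
another writer got there first, instead of failing its step.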



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3067) "Table already exists" error with multiple writers and dynamodb

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3067:
-
Priority: Critical  (was: Major)

> "Table already exists" error with multiple writers and dynamodb
> ---
>
> Key: HUDI-3067
> URL: https://issues.apache.org/jira/browse/HUDI-3067
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Nikita Sheremet
>Assignee: Wenning Ding
>Priority: Critical
> Fix For: 0.11.0
>
>
> How to reproduce:
>  # Set up multiple writing 
> [https://hudi.apache.org/docs/concurrency_control/] for DynamoDB (do not 
> forget to set _hoodie.write.lock.dynamodb.region_ and 
> {_}hoodie.write.lock.dynamodb.billing_mode{_}). Do not create any DynamoDB 
> table.
>  # Run multiple writers to the table
> (Tested on AWS EMR, so the multiple writers are EMR steps)
> Expected result: all steps completed.
> Actual result: some steps failed with an exception 
> {code:java}
> Caused by: com.amazonaws.services.dynamodbv2.model.ResourceInUseException: 
> Table already exists: truedata_detections (Service: AmazonDynamoDBv2; Status 
> Code: 400; Error Code: ResourceInUseException; Request ID:; Proxy: null)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
>   at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
>   at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:6214)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:6181)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeCreateTable(AmazonDynamoDBClient.java:1160)
>   at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.createTable(AmazonDynamoDBClient.java:1124)
>   at 
> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.createLockTableInDynamoDB(DynamoDBBasedLockProvider.java:188)
>   at 
> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:99)
>   at 
> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.(DynamoDBBasedLockProvider.java:77)
>   ... 54 more
> 21/12/19 13:42:06 INFO Yar {code}
> This happens because all steps tried to create the table at the same time.
>  
> Suggested solution:
> A catch statement for the _Table already exists_ exception should be added to 
> the DynamoDB table creation code, possibly with a delay and an additional check 
> that the table is present.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3752:
-
Priority: Blocker  (was: Major)

> Update website content based on 0.11 new features
> -
>
> Key: HUDI-3752
> URL: https://issues.apache.org/jira/browse/HUDI-3752
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0
>
>
> content to update
> - utilities slim bundle https://github.com/apache/hudi/pull/5184/files



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-3344.

 Reviewers: sivabalan narayanan
Resolution: Done

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: qian
>Assignee: qian
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-3344:


Assignee: qian

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: qian
>Assignee: qian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3344:
-
Component/s: code-quality

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: qian
>Assignee: qian
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3344:
-
Priority: Minor  (was: Major)

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: qian
>Assignee: qian
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3344:
-
Priority: Trivial  (was: Minor)

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: qian
>Assignee: qian
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3344:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: qian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3344) Standard code format for HoodieDataSourceExample.scala

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3344:
-
Fix Version/s: 0.11.0
   (was: 0.12.0)

> Standard code format for HoodieDataSourceExample.scala 
> ---
>
> Key: HUDI-3344
> URL: https://issues.apache.org/jira/browse/HUDI-3344
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: qian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1605:
-
Priority: Blocker  (was: Minor)

> Add more documentation around archival process and configs
> --
>
> Key: HUDI-1605
> URL: https://issues.apache.org/jira/browse/HUDI-1605
> Project: Apache Hudi
>  Issue Type: Task
>  Components: docs
>Affects Versions: 0.9.0
>Reporter: sivabalan narayanan
>Assignee: Kyle Weller
>Priority: Blocker
>  Labels: user-support-issues
> Fix For: 0.11.0
>
>
> Reference:
> What is the trade-off in lowering {{hoodie.keep.max.commits}} and 
> {{hoodie.keep.min.commits}}?
> https://github.com/apache/hudi/issues/2408#issuecomment-758360941



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3577) NPE in HoodieTimelineArchiver

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3577:
-
Story Points: 0.5

> NPE in HoodieTimelineArchiver
> -
>
> Key: HUDI-3577
> URL: https://issues.apache.org/jira/browse/HUDI-3577
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: archiving
>Reporter: Alexey Kudinkin
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>
> `testUpsertsContinuousModeWithMultipleWritersWithoutConflicts` does fail 
> periodically with NPE w/in HoodieTimelineArchiver
>  
> {code:java}
> 2022-03-05T22:51:18.0857636Z [ERROR] Tests run: 27, Failures: 0, Errors: 1, 
> Skipped: 9, Time elapsed: 423.786 s <<< FAILURE! - in JUnit Vintage
> 2022-03-05T22:51:18.0858433Z [ERROR] HoodieTableType).[2] 
> MERGE_ON_READ(testUpsertsContinuousModeWithMultipleWritersWithoutConflicts  
> Time elapsed: 119.717 s  <<< ERROR!
> 2022-03-05T22:51:18.0859018Z java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: java.lang.NullPointerException
> 2022-03-05T22:51:18.0859509Z  at 
> java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 2022-03-05T22:51:18.0859935Z  at 
> java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 2022-03-05T22:51:18.0860572Z  at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.runJobsInParallel(TestHoodieDeltaStreamerWithMultiWriter.java:394)
> 2022-03-05T22:51:18.0861650Z  at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamerWithMultiWriter.testUpsertsContinuousModeWithMultipleWritersWithoutConflicts(TestHoodieDeltaStreamerWithMultiWriter.java:204)
> 2022-03-05T22:51:18.0862339Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2022-03-05T22:51:18.0862781Z  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2022-03-05T22:51:18.0863316Z  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2022-03-05T22:51:18.0863791Z  at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2022-03-05T22:51:18.0864248Z  at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
> 2022-03-05T22:51:18.0864801Z  at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2022-03-05T22:51:18.0865438Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2022-03-05T22:51:18.0866071Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2022-03-05T22:51:18.081Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
> 2022-03-05T22:51:18.0867290Z  at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:92)
> 2022-03-05T22:51:18.0867968Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2022-03-05T22:51:18.0868613Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2022-03-05T22:51:18.0869275Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2022-03-05T22:51:18.0870081Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2022-03-05T22:51:18.0870716Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2022-03-05T22:51:18.0871365Z  at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2022-03-05T22:51:18.0871953Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2022-03-05T22:51:18.0872494Z  at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2022-03-05T22:51:18.0873118Z  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
> 2022-03-05T22:51:18.0873777Z  at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
> 2022-03-05T22:51:18.0874400Z  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
> 2022-03-05T22:51:18.0875044Z  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
> 2022-03-05T22:51:18.0875666Z  at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
> 2022-03-05T22:51:18.08762

[jira] [Updated] (HUDI-3804) Partition metadata is not properly created for Column Stats

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3804:
-
Sprint:   (was: Cont' improve - 2022/03/7)

> Partition metadata is not properly created for Column Stats
> ---
>
> Key: HUDI-3804
> URL: https://issues.apache.org/jira/browse/HUDI-3804
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Alexey Kudinkin
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.11.0
>
>
> Currently, when enabling Column Stats partition along with Files partition, 
> `AppendHandle` will be inserting records for both of them during MT updates.
> However, AppendHandle creates the partition metadata file only for the Files 
> partition, which leads to failures in validation of the MT.
> Steps to reproduce:
>  # Enable MT and Column Stats
>  # Run `TestHoodieBackedMetadata.testTurnOffMetadataTableAfterEnable` test



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3810) Enabling point look ups does an extra full scan in addition to point look up for log reader readers with metadata

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3810:
-
Sprint:   (was: Cont' improve - 2022/03/7)

> Enabling point look ups does an extra full scan in addition to point look up 
> for log reader readers with metadata
> -
>
> Key: HUDI-3810
> URL: https://issues.apache.org/jira/browse/HUDI-3810
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3647) Ignore errors if metadata table has not been initialized fully

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3647:
-
Sprint: Hudi-Sprint-Apr-12

> Ignore errors if metadata table has not been initialized fully
> --
>
> Key: HUDI-3647
> URL: https://issues.apache.org/jira/browse/HUDI-3647
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> HoodieMetadataTableValidator throws the following exceptions when the 
> metadata table is not fully initialized. These can be ignored, and there 
> could be a fallback mechanism if the metadata table is not ready for reads.
> {code:java}
> org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties 
> from 
> file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_b_single_writer_async_services/b2_ds_mor_010nomt_011mt_conf/test_table/.hoodie/metadata/.hoodie/hoodie.properties
>     at 
> org.apache.hudi.common.table.HoodieTableConfig.(HoodieTableConfig.java:226)
>     at 
> org.apache.hudi.common.table.HoodieTableMetaClient.(HoodieTableMetaClient.java:120)
>     at 
> org.apache.hudi.common.table.HoodieTableMetaClient.(HoodieTableMetaClient.java:77)
>     at 
> org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:657)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.initIfNeeded(HoodieBackedTableMetadata.java:108)
>     at 
> org.apache.hudi.metadata.HoodieBackedTableMetadata.(HoodieBackedTableMetadata.java:97)
>     at 
> org.apache.hudi.metadata.HoodieTableMetadata.create(HoodieTableMetadata.java:111)
>     at 
> org.apache.hudi.metadata.HoodieTableMetadata.create(HoodieTableMetadata.java:105)
>     at 
> org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:296)
>     at 
> org.apache.hudi.utilities.HoodieMetadataTableValidator.validatePartitions(HoodieMetadataTableValidator.java:386)
>     at 
> org.apache.hudi.utilities.HoodieMetadataTableValidator.doMetadataTableValidation(HoodieMetadataTableValidator.java:349)
>     at 
> org.apache.hudi.utilities.HoodieMetadataTableValidator.doHoodieMetadataTableValidationOnce(HoodieMetadataTableValidator.java:324)
>     at 
> org.apache.hudi.utilities.HoodieMetadataTableValidator.run(HoodieMetadataTableValidator.java:310)
>     at 
> org.apache.hudi.utilities.HoodieMetadataTableValidator.main(HoodieMetadataTableValidator.java:294)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.FileNotFoundException: File 
> file:/Users/ethan/Work/scripts/mt_rollout_testing/deploy_b_single_writer_async_services/b2_ds_mor_010nomt_011mt_conf/test_table/.hoodie/metadata/.hoodie/hoodie.properties.backup
>  does not exist
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
>     at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
>     at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
>     at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:160)
>     at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
>     at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:460)
>     at 
> org.apache.hudi.common.table.HoodieTableConfig.fetchConfigs(HoodieTableConfig.java:287)
>     at 
> org.apache.hudi.common.table.HoodieTableConfig.(HoodieTableConfig.java:216)
>     ... 25 more {code}
> {code:java}
> org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files 
> in partition 
> 

[jira] [Updated] (HUDI-3707) Fix deltastreamer test with schema provider and transformer enabled

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3707:
-
Sprint: Hudi-Sprint-Apr-12

> Fix deltastreamer test with schema provider and transformer enabled
> ---
>
> Key: HUDI-3707
> URL: https://issues.apache.org/jira/browse/HUDI-3707
> Project: Apache Hudi
>  Issue Type: Test
>  Components: tests-ci
>Reporter: Raymond Xu
>Priority: Blocker
> Fix For: 0.11.0, 0.12.0
>
>
> Fix cases like this
> @Disabled("To investigate problem with schema provider and transformer")
> in org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1602:
-
Sprint: Hudi-Sprint-Apr-12

> Corrupted Avro schema extracted from parquet file
> -
>
> Key: HUDI-1602
> URL: https://issues.apache.org/jira/browse/HUDI-1602
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Alexander Filipchik
>Priority: Blocker
>  Labels: core-flow-ds, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>
> We are running a HUDI deltastreamer on a very complex stream. The schema is 
> deeply nested, with several levels of hierarchy (the Avro schema is around 6600 
> LOC).
> 
> The version of HUDI that writes the dataset is 0.5-SNAPSHOT, and we recently 
> started attempting to upgrade to the latest. However, the latest HUDI can't read 
> the provided dataset. The exception I get: 
>  
>  
> {code:java}
> Got exception while parsing the arguments:Got exception while parsing the 
> arguments:Found recursive reference in Avro schema, which can not be 
> processed by Spark:{  "type" : "record",  "name" : "array",  "fields" : [ {   
>  "name" : "id",    "type" : [ "null", "string" ],    "default" : null  }, {   
>  "name" : "type",    "type" : [ "null", "string" ],    "default" : null  }, { 
>    "name" : "exist",    "type" : [ "null", "boolean" ],    "default" : null  
> } ]}          Stack 
> trace:org.apache.spark.sql.avro.IncompatibleSchemaException:Found recursive 
> reference in Avro schema, which can not be processed by Spark:{  "type" : 
> "record",  "name" : "array",  "fields" : [ {    "name" : "id",    "type" : [ 
> "null", "string" ],    "default" : null  }, {    "name" : "type",    "type" : 
> [ "null", "string" ],    "default" : null  }, {    "name" : "exist",    
> "type" : [ "null", "boolean" ],    "default" : null  } ]}
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:75)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:89)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:81)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$.toSqlTypeHelper(SchemaConverters.scala:105)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:82)
>  at 
> org.apache.spark.sql.avro.SchemaConverters$$anonfun$1.apply(SchemaConverters.scala:81)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54) at 
> scala.collection.TraversableLike$class.map(Traversable

[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3819:
-
Sprint: Hudi-Sprint-Apr-12

> upgrade spring cve-2022-22965
> -
>
> Key: HUDI-3819
> URL: https://issues.apache.org/jira/browse/HUDI-3819
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 0.9.0, 0.10.1
>Reporter: Jason-Morries Adam
>Priority: Blocker
> Fix For: 0.11.0
>
>
> We should upgrade the Spring Framework version in the Hudi CLI because of 
> CVE-2022-22965. The Qualys scanner flags these packages and raises a warning 
> because the following files exist on the system:
> /usr/lib/hudi/cli/lib/spring-beans-4.2.4.RELEASE.jar 
> /usr/lib/hudi/cli/lib/spring-core-4.2.4.RELEASE.jar
> More Information: 
> Spring Framework: https://spring.io/projects/spring-framework
> Spring project spring-framework release notes: 
> https://github.com/spring-projects/spring-framework/releases
> CVE-2022-22965: https://tanzu.vmware.com/security/cve-2022-22965



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3855) Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3855:
-
Sprint: Hudi-Sprint-Apr-12

> Hudi's metadata field "_hoodie_file_name" not updated in MergeHandle
> 
>
> Key: HUDI-3855
> URL: https://issues.apache.org/jira/browse/HUDI-3855
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Alexey Kudinkin
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>
> As was reported by the user here: 
> [https://github.com/apache/hudi/issues/5231]
>  
> Quoting:
> So i was able to reproduce behavior that you're seeing and it turns out to be 
> that {{_hoodie_file_name}} is simply not updated during Commit 3, meaning 
> that during C3, all records are copied from latest base-file of the 
> file-group into new latest base-file (in your most recent experiment it's 
> {{{}c872d135-bf8f-4c5e-9eee-6347635c32d3-0_0-21-22_20220406182741563.parquet{}}})
>  but it doesn't update the {{_hoodie_file_name}} field which is kept pointing 
> at the old file.
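> For illustration only, a minimal sketch of how to observe the stale value (the 
> {{basePath}} below is a placeholder, not taken from the issue); it compares the 
> {{_hoodie_file_name}} column against the base file each row is actually read 
> from, using Spark's {{input_file_name()}}:
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{col, input_file_name}
> 
> val spark = SparkSession.builder().appName("hudi-3855-check").getOrCreate()
> 
> // Placeholder path for the table produced by the three commits described above.
> val basePath = "file:///tmp/hudi/test_table"
> 
> spark.read.format("hudi").load(basePath)
>   .select(
>     input_file_name().as("actual_file"),          // base file the row was read from
>     col("_hoodie_file_name").as("recorded_file")) // what the metadata field claims
>   .distinct()
>   .show(false)
> // With the bug present, recorded_file still points at the pre-merge base file
> // while actual_file is the newly written parquet file.
> {code}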



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3013) Docs for Presto and Hudi

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3013:
-
Sprint: Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, 
Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: Hudi-Sprint-Mar-14, 
Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05)

> Docs for Presto and Hudi
> 
>
> Key: HUDI-3013
> URL: https://issues.apache.org/jira/browse/HUDI-3013
> Project: Apache Hudi
>  Issue Type: Task
>  Components: trino-presto
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3724) Too many open files w/ COW spark long running tests

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3724:
-
Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: 
Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05)

> Too many open files w/ COW spark long running tests
> ---
>
> Key: HUDI-3724
> URL: https://issues.apache.org/jira/browse/HUDI-3724
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> We run integration tests against Hudi, and recently our Spark long-running 
> tests have been failing for the COW table with "too many open files". We may 
> have some leaks, and we need to chase them down and close them out (a small 
> diagnostic sketch follows the stack trace below). 
> {code:java}
>   ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor 
> driver): java.io.FileNotFoundException: 
> /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3
>  (Too many open files)
>   at java.io.FileOutputStream.open0(Native Method)
>   at java.io.FileOutputStream.open(FileOutputStream.java:270)
>   at java.io.FileOutputStream.(FileOutputStream.java:213)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152)
>   at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171)
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  {code}
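> As a starting point for chasing such leaks, a small diagnostic sketch, assuming 
> a Unix-like JVM: it uses the JDK's {{UnixOperatingSystemMXBean}} to log the 
> driver's open file-descriptor count, so the long-running test can call it 
> periodically (say, after every commit) and check whether the count only ever grows:
> {code:scala}
> import java.lang.management.ManagementFactory
> import com.sun.management.UnixOperatingSystemMXBean
> 
> // Logs the current process's open file-descriptor usage. Calling this after every
> // commit of the long-running test shows whether the count grows without bound.
> def logOpenFileDescriptors(tag: String): Unit =
>   ManagementFactory.getOperatingSystemMXBean match {
>     case os: UnixOperatingSystemMXBean =>
>       println(s"[$tag] open fds: ${os.getOpenFileDescriptorCount} of max ${os.getMaxFileDescriptorCount}")
>     case _ =>
>       println(s"[$tag] open-fd count not available on this platform")
>   }
> 
> logOpenFileDescriptors("after-commit")
> {code}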



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3857) NoSuchMethodError: Continuous deltastreamer test with async compaction fails on EMR spark

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3857:
-
Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: Hudi-Sprint-Apr-05)

> NoSuchMethodError: Continuous deltastreamer test with async compaction fails 
> on EMR spark 
> --
>
> Key: HUDI-3857
> URL: https://issues.apache.org/jira/browse/HUDI-3857
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.11.0
>
>
> EMR 6.5, Spark 3.1.2
> While running continuous deltastreamer with async compaction enabled, I hit 
> this exception
> {code:java}
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.spark.sql.execution.datasources.PartitionedFile.(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$2(MergeOnReadSnapshotRelation.scala:130)
>     at scala.Option.map(Option.scala:230)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.$anonfun$buildSplits$1(MergeOnReadSnapshotRelation.scala:128)
>     at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at scala.collection.immutable.List.foreach(List.scala:392)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>     at scala.collection.immutable.List.map(List.scala:298)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.buildSplits(MergeOnReadSnapshotRelation.scala:124)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:108)
>     at 
> org.apache.hudi.MergeOnReadSnapshotRelation.collectFileSplits(MergeOnReadSnapshotRelation.scala:44)
>     at 
> org.apache.hudi.HoodieBaseRelation.buildScan(HoodieBaseRelation.scala:221) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3738) Perf comparison between parquet and hudi for COW snapshot and MOR read optimized

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3738:
-
Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: 
Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05)

> Perf comparison between parquet and hudi for COW snapshot and MOR read 
> optimized
> 
>
> Key: HUDI-3738
> URL: https://issues.apache.org/jira/browse/HUDI-3738
> Project: Apache Hudi
>  Issue Type: Task
>  Components: performance
>Reporter: sivabalan narayanan
>Assignee: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3799) Understand reason behind "Not an avro data file" with hudi

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3799:
-
Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: Hudi-Sprint-Apr-05)

> Understand reason behind "Not an avro data file" with hudi
> --
>
> Key: HUDI-3799
> URL: https://issues.apache.org/jira/browse/HUDI-3799
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> We merged [https://github.com/apache/hudi/pull/4016] to tackle the "Not an avro 
> data file" exception seen while cleaning or archiving. We need to understand why 
> and when such an exception happens, and try to mitigate it before it happens, 
> where feasible. 
>  
> At least we should have a good understanding of the conditions under which this 
> is expected. 
> Ref: https://github.com/apache/hudi/pull/4016#pullrequestreview-841692564
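> For context, a standalone sketch of one way the message is produced (the file 
> name below is made up): Avro's {{DataFileReader}} throws exactly this IOException 
> when the file it is given is empty or missing the Avro magic bytes, which is 
> presumably what an interrupted write of a clean/rollback plan file under 
> {{.hoodie}} leaves behind:
> {code:scala}
> import java.io.File
> import org.apache.avro.file.DataFileReader
> import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
> 
> // A zero-byte stand-in for an interrupted write of a plan file.
> val empty = File.createTempFile("20220411120000.clean", ".requested")
> 
> try {
>   DataFileReader.openReader(empty, new GenericDatumReader[GenericRecord]())
> } catch {
>   case e: java.io.IOException =>
>     println(e.getMessage) // prints "Not an avro data file."
> }
> {code}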



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3806) Improve HoodieBloomIndex using bloom_filter and col_stats in MDT

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3806:
-
Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: Hudi-Sprint-Apr-05)

> Improve HoodieBloomIndex using bloom_filter and col_stats in MDT
> 
>
> Key: HUDI-3806
> URL: https://issues.apache.org/jira/browse/HUDI-3806
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3207) Hudi Trino connector PR review

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3207:
-
Sprint: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, 
Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7, Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, 
Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, 
Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: 
Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31, 
Hudi-Sprint-Feb-7, Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, 
Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, 
Hudi-Sprint-Apr-05)

> Hudi Trino connector PR review
> --
>
> Key: HUDI-3207
> URL: https://issues.apache.org/jira/browse/HUDI-3207
> Project: Apache Hudi
>  Issue Type: Task
>  Components: trino-presto
>Reporter: Ethan Guo
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.11.0
>
>
> https://github.com/trinodb/trino/pull/10228



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2606) Ensure query engines not access MDT if disabled

2022-04-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2606:
-
Sprint: Hudi-Sprint-Feb-14, Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, 
Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, 
Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-6  (was: Hudi-Sprint-Feb-14, 
Hudi-Sprint-Feb-22, Hudi-Sprint-Mar-01, Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, 
Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05)

> Ensure query engines not access MDT if disabled
> ---
>
> Key: HUDI-2606
> URL: https://issues.apache.org/jira/browse/HUDI-2606
> Project: Apache Hudi
>  Issue Type: Task
>  Components: metadata, reader-core
>Reporter: sivabalan narayanan
>Assignee: Tao Meng
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> This is to visit all the read code paths and ensure that, when metadata is 
> disabled, query engines won't read from the metadata table.
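> For reference, the switch those read paths need to honor is the 
> {{hoodie.metadata.enable}} config; below is a minimal sketch of a Spark read 
> with the metadata table explicitly disabled (the table path is a placeholder), 
> which should make any file listing fall back to the file system:
> {code:scala}
> import org.apache.spark.sql.SparkSession
> 
> val spark = SparkSession.builder().appName("hudi-2606-check").getOrCreate()
> 
> // Placeholder path; any existing Hudi table works for this illustration.
> val basePath = "file:///tmp/hudi/test_table"
> 
> // With the flag off, listing should come from the file system, not the
> // metadata table's files partition.
> val df = spark.read.format("hudi")
>   .option("hoodie.metadata.enable", "false")
>   .load(basePath)
> 
> df.count()
> {code}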



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

