[jira] [Updated] (HUDI-7850) Makes hoodie.record.merge.mode mandatory upon creating the table and first write

2024-06-25 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-7850:

Status: In Progress  (was: Open)

> Makes hoodie.record.merge.mode mandatory upon creating the table and first 
> write
> 
>
> Key: HUDI-7850
> URL: https://issues.apache.org/jira/browse/HUDI-7850
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Geser Dugarov
>Priority: Major
> Fix For: 1.0.0
>
>
> Right now, "hoodie.record.merge.mode" is optional during writes as it is 
> inferred from the payload class name, payload type, and the record merger 
> strategy during the creation of the table properties.  We should make this 
> config mandatory in release 1.0 and make other merge configs optional to 
> simplify the configuration experience.
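
For context, a minimal sketch of what an explicit setting could look like on the Spark DataSource write path; the merge-mode value ("EVENT_TIME_ORDERING") and the table/field names below are assumptions for illustration, not taken from this ticket:

{code:java}
// Hedged sketch: set hoodie.record.merge.mode explicitly on the first write instead of
// relying on inference from payload class / payload type / merger strategy.
// The value "EVENT_TIME_ORDERING" and all table/field names are illustrative.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class ExplicitMergeModeWrite {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("explicit-merge-mode").getOrCreate();
    Dataset<Row> df = spark.read().format("parquet").load("/tmp/input");

    df.write().format("hudi")
        .option("hoodie.table.name", "example_table")
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        // Explicit merge mode, set once when the table is created / first written.
        .option("hoodie.record.merge.mode", "EVENT_TIME_ORDERING")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/example_table");
  }
}
{code}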



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch asf-site updated: [HUDI-7838][DOCS] Remove the option hoodie.schema.cache.enable (#11506)

2024-06-25 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0afc44c1a73 [HUDI-7838][DOCS] Remove the option 
hoodie.schema.cache.enable (#11506)
0afc44c1a73 is described below

commit 0afc44c1a737d7b0440e5518e37a06b307057ef3
Author: Vova Kolmakov 
AuthorDate: Wed Jun 26 10:06:49 2024 +0700

[HUDI-7838][DOCS] Remove the option hoodie.schema.cache.enable (#11506)
---
 website/docs/configurations.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 4ca3a09e81e..278be1f5afa 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -963,7 +963,6 @@ Configurations that control write behavior on Hudi tables. These can be directly
 | [hoodie.rollback.instant.backup.enabled](#hoodierollbackinstantbackupenabled) | false | Backup instants removed during rollback and restore (useful for debugging)`Config Param: ROLLBACK_INSTANT_BACKUP_ENABLED` [...]
 | [hoodie.rollback.parallelism](#hoodierollbackparallelism) | 100 | This config controls the parallelism for rollback of commits. Rollbacks perform deletion of files or logging delete blocks to file groups on storage in parallel. The configure value limits the parallelism so that the number of Spark tasks do not exceed the value. If rollback is slow due to the [...]
 | [hoodie.rollback.using.markers](#hoodierollbackusingmarkers) | true | Enables a more efficient mechanism for rollbacks based on the marker files generated during the writes. Turned on by default.`Config Param: ROLLBACK_USING_MARKERS_ENABLE` [...]
-| [hoodie.schema.cache.enable](#hoodieschemacacheenable) | false | cache query internalSchemas in driver/executor side`Config Param: ENABLE_INTERNAL_SCHEMA_CACHE` [...]
 | [hoodie.sensitive.config.keys](#hoodiesensitiveconfigkeys) | ssl,tls,sasl,auth,credentials | Comma separated list of filters for sensitive config keys. Hudi Streamer will not print any configuration which contains the configured filter. For example with a configured filter `ssl`, value for config `ssl.trustore.location` would be masked.`Config Param: SENSITIVE_CONFIG_KEYS_FILTER` [...]
 | [hoodie.skip.default.partition.validation](#hoodieskipdefaultpartitionvalidation) | false | When table is upgraded from pre 0.12 to 0.12, we check for "default" partition and fail if found one. Users are expected to rewrite the data in those partitions. Enabling this config will bypass this validation`Config Param: SKIP_DEFAULT_PARTITION_VALIDATION``Since Version: 0.12.0` [...]
 | [hoodie.table.base.file.format](#hoodietablebasefileformat) | PARQUET | File format to store all the base file data. org.apache.hudi.common.model.HoodieFileFormat: Hoodie file formats. PARQUET(default): Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and [...]
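
As a side note on the `hoodie.sensitive.config.keys` row above, a conceptual sketch of the masking it describes; this is not the Hudi Streamer implementation, and the class and variable names are illustrative:

{code:java}
// Illustrative only: mask config values whose keys contain any filter from
// hoodie.sensitive.config.keys (default "ssl,tls,sasl,auth,credentials") before printing.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SensitiveConfigMasking {
  public static void main(String[] args) {
    List<String> filters = Arrays.asList("ssl,tls,sasl,auth,credentials".split(","));
    Map<String, String> config = new TreeMap<>();
    config.put("hoodie.table.name", "example_table");
    config.put("ssl.trustore.location", "/secrets/truststore.jks");

    config.forEach((key, value) -> {
      boolean sensitive = filters.stream().anyMatch(key::contains);
      System.out.println(key + " = " + (sensitive ? "******" : value));
    });
  }
}
{code}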



(hudi) branch master updated: [HUDI-7882] Picking RFC-78 for bridge release (#11515)

2024-06-25 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1c731769d60 [HUDI-7882] Picking RFC-78 for bridge release (#11515)
1c731769d60 is described below

commit 1c731769d601c1c2effbc02cc602acf3169d034d
Author: Sivabalan Narayanan 
AuthorDate: Tue Jun 25 19:19:58 2024 -0700

[HUDI-7882] Picking RFC-78 for bridge release (#11515)
---
 rfc/README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rfc/README.md b/rfc/README.md
index c3ad9178466..2fdd3d8db49 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -112,4 +112,5 @@ The list of all RFCs can be found here.
 | 74 | [`HoodieStorage`: Hudi Storage Abstraction and APIs](./rfc-74/rfc-74.md) | `UNDER REVIEW` |
 | 75 | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md) | `UNDER REVIEW` |
 | 76 | [Auto Record key generation](./rfc-76/rfc-76.md) | `IN PROGRESS` |
-| 77 | [Secondary Index](./rfc-77/rfc-77.md) | `UNDER REVIEW` |
\ No newline at end of file
+| 77 | [Secondary Index](./rfc-77/rfc-77.md) | `UNDER REVIEW` |
+| 78 | [Bridge release for 1.x](./rfc-78/rfc-78.md) | `IN PROGRESS` |
\ No newline at end of file



(hudi) branch branch-0.x updated: [HUDI-6508] Support compilation on Java 11 (#11513)

2024-06-25 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch branch-0.x
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/branch-0.x by this push:
 new 538e6619ed5 [HUDI-6508] Support compilation on Java 11 (#11513)
538e6619ed5 is described below

commit 538e6619ed50cd64d12652058e4b5c68cfef0f99
Author: Y Ethan Guo 
AuthorDate: Tue Jun 25 18:56:01 2024 -0700

[HUDI-6508] Support compilation on Java 11 (#11513)
---
 .github/workflows/bot.yml  | 167 +++--
 .../hudi/table/TestHoodieMergeOnReadTable.java |   8 +-
 .../commit/TestCopyOnWriteActionExecutor.java  |  15 +-
 .../hudi/metadata/HoodieTableMetadataUtil.java |  21 ++-
 hudi-examples/hudi-examples-common/pom.xml |  14 --
 hudi-examples/hudi-examples-java/pom.xml   |  14 --
 .../org/apache/hudi/common/util/ParquetUtils.java  |  21 +--
 7 files changed, 186 insertions(+), 74 deletions(-)

diff --git a/.github/workflows/bot.yml b/.github/workflows/bot.yml
index 72200c4822d..5d659123f13 100644
--- a/.github/workflows/bot.yml
+++ b/.github/workflows/bot.yml
@@ -245,12 +245,6 @@ jobs:
   - scalaProfile: "scala-2.12"
 sparkProfile: "spark3.4"
 sparkModules: "hudi-spark-datasource/hudi-spark3.4.x"
-  - scalaProfile: "scala-2.12"
-sparkProfile: "spark3.5"
-sparkModules: "hudi-spark-datasource/hudi-spark3.5.x"
-  - scalaProfile: "scala-2.13"
-sparkProfile: "spark3.5"
-sparkModules: "hudi-spark-datasource/hudi-spark3.5.x"
 
 steps:
   - uses: actions/checkout@v3
@@ -285,7 +279,6 @@ jobs:
   SCALA_PROFILE: ${{ matrix.scalaProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
   SPARK_MODULES: ${{ matrix.sparkModules }}
-if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 
as it's covered by Azure CI
 run:
   mvn test -Punit-tests -Pjava17 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DwildcardSuites=skipScalaTests -DfailIfNoTests=false -pl 
"hudi-common,$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
   - name: Java FT - Spark
@@ -293,7 +286,6 @@ jobs:
   SCALA_PROFILE: ${{ matrix.scalaProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
   SPARK_MODULES: ${{ matrix.sparkModules }}
-if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 
as it's covered by Azure CI
 run:
   mvn test -Pfunctional-tests -Pjava17 -D"$SCALA_PROFILE" 
-D"$SPARK_PROFILE" -pl "$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
 
@@ -308,6 +300,49 @@ jobs:
   - scalaProfile: "scala-2.12"
 sparkProfile: "spark3.4"
 sparkModules: "hudi-spark-datasource/hudi-spark3.4.x"
+
+steps:
+  - uses: actions/checkout@v3
+  - name: Set up JDK 8
+uses: actions/setup-java@v3
+with:
+  java-version: '8'
+  distribution: 'temurin'
+  architecture: x64
+  cache: maven
+  - name: Build Project
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+run:
+  mvn clean install -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DskipTests=true $MVN_ARGS -am -pl 
"hudi-examples/hudi-examples-spark,hudi-common,$SPARK_COMMON_MODULES,$SPARK_MODULES"
+  - name: Set up JDK 17
+uses: actions/setup-java@v3
+with:
+  java-version: '17'
+  distribution: 'temurin'
+  architecture: x64
+  cache: maven
+  - name: Scala UT - Common & Spark
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+  SPARK_MODULES: ${{ matrix.sparkModules }}
+run:
+  mvn test -Punit-tests -Pjava17 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-Dtest=skipJavaTests -DfailIfNoTests=false -pl 
"hudi-common,$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
+  - name: Scala FT - Spark
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+  SPARK_MODULES: ${{ matrix.sparkModules }}
+run:
+  mvn test -Pfunctional-tests -Pjava17 -D"$SCALA_PROFILE" 
-D"$SPARK_PROFILE" -Dtest=skipJavaTests -DfailIfNoTests=false -pl 
"$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
+
+  test-spark-java11-17-java-tests:
+runs-on: ubuntu-latest
+strategy:
+  matrix:
+include:
   - scalaProfile: "scala-2.12"
 sparkProfile: "spark3.5"
 sparkModules: "hudi-spark-datasource/hudi-spark3.5.x"
@@ -317,10 +352,65 @@ jobs:
 
 steps:
   - uses: actions/checkout@v3
-  - name: Set up JDK 8
+  - name: Set up JDK 11
 uses: actions/setup-java@v3
 with:
-  java-version: '8'
+  java-version: '11'
+  

(hudi) branch master updated (3152e47876f -> 4b7e6e41573)

2024-06-25 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 3152e47876f [MINOR] Bump JUnit version to 5.8.2 (#11511)
 add 4b7e6e41573 [HUDI-7922] Add Hudi CLI bundle for Scala 2.13 (#11495)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml |  2 +-
 .../apache/hudi/cli/commands/ArchivedCommitsCommand.java  |  8 +---
 .../org/apache/hudi/cli/commands/CompactionCommand.java   | 15 ---
 .../scala/org/apache/hudi/util/JavaScalaConverters.scala  |  8 
 scripts/release/deploy_staging_jars.sh|  3 ++-
 scripts/release/validate_staged_bundles.sh|  2 +-
 6 files changed, 25 insertions(+), 13 deletions(-)



(hudi) branch master updated (4370178eb0b -> 3152e47876f)

2024-06-25 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 4370178eb0b [HUDI-7927] Lazy init secondary view in FS view (#10652)
 add 3152e47876f [MINOR] Bump JUnit version to 5.8.2 (#11511)

No new revisions were added by this update.

Summary of changes:
 pom.xml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)



[jira] [Updated] (HUDI-7882) Umbrella ticket to track all changes required to support reading 1.x tables with 0.16.0

2024-06-25 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7882:
--
Sprint: 2024/06/17-30

> Umbrella ticket to track all changes required to support reading 1.x tables 
> with 0.16.0 
> 
>
> Key: HUDI-7882
> URL: https://issues.apache.org/jira/browse/HUDI-7882
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.16.0
>
>
> We want to support reading 1.x tables in the 0.16.0 release, so this umbrella 
> ticket tracks all of the required changes.
>  
> Changes required to be ported: 
> 0. Creating 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 
>  
> 1. Timeline 
> 1.a Hoodie instant parsing should be able to read 1.x instants. 
> https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 
> 1.b Commit metadata parsing is able to handle both json and avro formats. 
> Scope might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  
> Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. 
>  https://issues.apache.org/jira/browse/HUDI-7884 Siva.
> 1.d Reading LSM timeline using 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7890 Siva. 
> 1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>  
> 2. Table property changes 
> 2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
> https://issues.apache.org/jira/browse/HUDI-7865 LJ
>  
> 3. MDT table changes
> 3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ
> 3.b MDT payload schema changes. 
> https://issues.apache.org/jira/browse/HUDI-7886 LJ
>  
> 4. Log format changes
> 4.a All metadata header types porting 
> https://issues.apache.org/jira/browse/HUDI-7887 Jon
> 4.b Meaningful error for incompatible features from 1.x 
> https://issues.apache.org/jira/browse/HUDI-7888 Jon
>  
> 5. Log file slice or grouping detection compatibility 
>  
> 6. Tests 
> 6.a Tests to validate that 1.x tables can be read w/ 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 
>  
> 7. Doc changes 
> 7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. 
> https://issues.apache.org/jira/browse/HUDI-7889 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7882) Umbrella ticket to track all changes required to support reading 1.x tables with 0.16.0

2024-06-25 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7882:
--
Fix Version/s: 0.16.0

> Umbrella ticket to track all changes required to support reading 1.x tables 
> with 0.16.0 
> 
>
> Key: HUDI-7882
> URL: https://issues.apache.org/jira/browse/HUDI-7882
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.16.0
>
>
> We want to support reading 1.x tables in the 0.16.0 release, so this umbrella 
> ticket tracks all of the required changes.
>  
> Changes required to be ported: 
> 0. Creating 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 
>  
> 1. Timeline 
> 1.a Hoodie instant parsing should be able to read 1.x instants. 
> https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 
> 1.b Commit metadata parsing is able to handle both json and avro formats. 
> Scope might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  
> Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. 
>  https://issues.apache.org/jira/browse/HUDI-7884 Siva.
> 1.d Reading LSM timeline using 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7890 Siva. 
> 1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>  
> 2. Table property changes 
> 2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
> https://issues.apache.org/jira/browse/HUDI-7865 LJ
>  
> 3. MDT table changes
> 3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ
> 3.b MDT payload schema changes. 
> https://issues.apache.org/jira/browse/HUDI-7886 LJ
>  
> 4. Log format changes
> 4.a All metadata header types porting 
> https://issues.apache.org/jira/browse/HUDI-7887 Jon
> 4.b Meaningful error for incompatible features from 1.x 
> https://issues.apache.org/jira/browse/HUDI-7888 Jon
>  
> 5. Log file slice or grouping detection compatibility 
>  
> 6. Tests 
> 6.a Tests to validate that 1.x tables can be read w/ 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 
>  
> 7. Doc changes 
> 7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. 
> https://issues.apache.org/jira/browse/HUDI-7889 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7882) Umbrella ticket to track all changes required to support reading 1.x tables with 0.16.0

2024-06-25 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-7882:
-

Assignee: sivabalan narayanan

> Umbrella ticket to track all changes required to support reading 1.x tables 
> with 0.16.0 
> 
>
> Key: HUDI-7882
> URL: https://issues.apache.org/jira/browse/HUDI-7882
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> We want to support reading 1.x tables in the 0.16.0 release, so this umbrella 
> ticket tracks all of the required changes.
>  
> Changes required to be ported: 
> 0. Creating 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 
>  
> 1. Timeline 
> 1.a Hoodie instant parsing should be able to read 1.x instants. 
> https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 
> 1.b Commit metadata parsing is able to handle both json and avro formats. 
> Scope might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  
> Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. 
>  https://issues.apache.org/jira/browse/HUDI-7884 Siva.
> 1.d Reading LSM timeline using 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7890 Siva. 
> 1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>  
> 2. Table property changes 
> 2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
> https://issues.apache.org/jira/browse/HUDI-7865 LJ
>  
> 3. MDT table changes
> 3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ
> 3.b MDT payload schema changes. 
> https://issues.apache.org/jira/browse/HUDI-7886 LJ
>  
> 4. Log format changes
> 4.a All metadata header types porting 
> https://issues.apache.org/jira/browse/HUDI-7887 Jon
> 4.b Meaningful error for incompatible features from 1.x 
> https://issues.apache.org/jira/browse/HUDI-7888 Jon
>  
> 5. Log file slice or grouping detection compatibility 
>  
> 6. Tests 
> 6.a Tests to validate that 1.x tables can be read w/ 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 
>  
> 7. Doc changes 
> 7.a Call out unsupported features in the 0.16.0 reader when reading 1.x tables. 
> https://issues.apache.org/jira/browse/HUDI-7889 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7927] Lazy init secondary view in FS view (#10652)

2024-06-25 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4370178eb0b [HUDI-7927] Lazy init secondary view in FS view (#10652)
4370178eb0b is described below

commit 4370178eb0b8d1adad5148a2967f60b921568b27
Author: Tim Brown 
AuthorDate: Tue Jun 25 19:34:59 2024 -0500

[HUDI-7927] Lazy init secondary view in FS view (#10652)
---
 .../common/table/view/FileSystemViewManager.java   | 50 
 .../table/view/PriorityBasedFileSystemView.java| 90 --
 .../view/TestPriorityBasedFileSystemView.java  | 83 +++-
 3 files changed, 164 insertions(+), 59 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java
index 7b729dacac4..d875168085c 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java
@@ -23,6 +23,7 @@ import org.apache.hudi.common.config.HoodieMetadataConfig;
 import org.apache.hudi.common.config.HoodieMetaserverConfig;
 import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.function.SerializableFunctionUnchecked;
+import org.apache.hudi.common.function.SerializableSupplier;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Functions.Function2;
@@ -260,25 +261,42 @@ public class FileSystemViewManager {
 return new FileSystemViewManager(context, config, (metaClient, 
viewConfig) -> {
   RemoteHoodieTableFileSystemView remoteFileSystemView =
   createRemoteFileSystemView(viewConfig, metaClient);
-  SyncableFileSystemView secondaryView;
-  switch (viewConfig.getSecondaryStorageType()) {
-case MEMORY:
-  secondaryView = createInMemoryFileSystemView(viewConfig, 
metaClient, metadataCreator);
-  break;
-case EMBEDDED_KV_STORE:
-  secondaryView = createRocksDBBasedFileSystemView(viewConfig, 
metaClient);
-  break;
-case SPILLABLE_DISK:
-  secondaryView = 
createSpillableMapBasedFileSystemView(viewConfig, metaClient, commonConfig);
-  break;
-default:
-  throw new IllegalArgumentException("Secondary Storage type can 
only be in-memory or spillable. Was :"
-  + viewConfig.getSecondaryStorageType());
-  }
-  return new PriorityBasedFileSystemView(remoteFileSystemView, 
secondaryView);
+  SerializableSupplier secondaryViewSupplier = 
new SecondaryViewSupplier(viewConfig, metaClient, commonConfig, 
metadataCreator);
+  return new PriorityBasedFileSystemView(remoteFileSystemView, 
secondaryViewSupplier);
 });
   default:
 throw new IllegalArgumentException("Unknown file system view type :" + 
config.getStorageType());
 }
   }
+
+  private static class SecondaryViewSupplier implements 
SerializableSupplier {
+private final FileSystemViewStorageConfig viewConfig;
+private final HoodieTableMetaClient metaClient;
+private final HoodieCommonConfig commonConfig;
+private final SerializableFunctionUnchecked metadataCreator;
+
+private SecondaryViewSupplier(FileSystemViewStorageConfig viewConfig,
+  HoodieTableMetaClient metaClient, 
HoodieCommonConfig commonConfig,
+  
SerializableFunctionUnchecked 
metadataCreator) {
+  this.viewConfig = viewConfig;
+  this.metaClient = metaClient;
+  this.commonConfig = commonConfig;
+  this.metadataCreator = metadataCreator;
+}
+
+@Override
+public SyncableFileSystemView get() {
+  switch (viewConfig.getSecondaryStorageType()) {
+case MEMORY:
+  return createInMemoryFileSystemView(viewConfig, metaClient, 
metadataCreator);
+case EMBEDDED_KV_STORE:
+  return createRocksDBBasedFileSystemView(viewConfig, metaClient);
+case SPILLABLE_DISK:
+  return createSpillableMapBasedFileSystemView(viewConfig, metaClient, 
commonConfig);
+default:
+  throw new IllegalArgumentException("Secondary Storage type can only 
be in-memory or spillable. Was :"
+  + viewConfig.getSecondaryStorageType());
+  }
+}
+  }
 }
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/table/view/PriorityBasedFileSystemView.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/table/view/PriorityBasedFileSystemView.java
index 8cfd6d64713..87fb73893a7 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/ta
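
The second hunk above is truncated, but the pattern of the change is visible in the first hunk: the secondary view is now created lazily through a supplier rather than eagerly when the view manager is built. A hedged sketch of that lazy-init idea follows; the class and names below are illustrative, not Hudi APIs:

{code:java}
// Sketch of lazy initialization via a supplier: the expensive object is built only on
// first access and then cached. The name "LazyRef" is illustrative.
import java.util.function.Supplier;

public class LazyRef<T> {
  private final Supplier<T> supplier;
  private volatile T value;

  public LazyRef(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  public T get() {
    if (value == null) {            // fast path, no locking once initialized
      synchronized (this) {
        if (value == null) {        // double-checked locking
          value = supplier.get();   // build the secondary view only when actually needed
        }
      }
    }
    return value;
  }

  public static void main(String[] args) {
    LazyRef<String> secondary = new LazyRef<>(() -> {
      System.out.println("building secondary view...");
      return "secondary-view";
    });
    System.out.println("constructed, nothing built yet");
    System.out.println(secondary.get()); // triggers the build
    System.out.println(secondary.get()); // reuses the cached instance
  }
}
{code}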

(hudi) branch release-0.14.1-spark35-scala213 updated: [HUDI-6508] Support compilation on Java 11

2024-06-25 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch release-0.14.1-spark35-scala213
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to 
refs/heads/release-0.14.1-spark35-scala213 by this push:
 new 027690bc175 [HUDI-6508] Support compilation on Java 11
027690bc175 is described below

commit 027690bc17572f585f7ff13813ae00a03e6a0a32
Author: Y Ethan Guo 
AuthorDate: Tue Jun 25 11:10:22 2024 -0700

[HUDI-6508] Support compilation on Java 11
---
 .github/workflows/bot.yml  | 167 +++--
 .../hudi/table/TestHoodieMergeOnReadTable.java |   8 +-
 .../commit/TestCopyOnWriteActionExecutor.java  |  14 +-
 .../org/apache/hudi/common/util/ParquetUtils.java  |  17 +--
 .../hudi/metadata/HoodieTableMetadataUtil.java |  21 ++-
 hudi-examples/hudi-examples-common/pom.xml |  14 --
 hudi-examples/hudi-examples-java/pom.xml   |  14 --
 7 files changed, 183 insertions(+), 72 deletions(-)

diff --git a/.github/workflows/bot.yml b/.github/workflows/bot.yml
index 017c0d41fb5..2a812b565e1 100644
--- a/.github/workflows/bot.yml
+++ b/.github/workflows/bot.yml
@@ -245,12 +245,6 @@ jobs:
   - scalaProfile: "scala-2.12"
 sparkProfile: "spark3.4"
 sparkModules: "hudi-spark-datasource/hudi-spark3.4.x"
-  - scalaProfile: "scala-2.12"
-sparkProfile: "spark3.5"
-sparkModules: "hudi-spark-datasource/hudi-spark3.5.x"
-  - scalaProfile: "scala-2.13"
-sparkProfile: "spark3.5"
-sparkModules: "hudi-spark-datasource/hudi-spark3.5.x"
 
 steps:
   - uses: actions/checkout@v3
@@ -285,7 +279,6 @@ jobs:
   SCALA_PROFILE: ${{ matrix.scalaProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
   SPARK_MODULES: ${{ matrix.sparkModules }}
-if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 
as it's covered by Azure CI
 run:
   mvn test -Punit-tests -Pjava17 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DwildcardSuites=skipScalaTests -DfailIfNoTests=false -pl 
"hudi-common,$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
   - name: Java FT - Spark
@@ -293,7 +286,6 @@ jobs:
   SCALA_PROFILE: ${{ matrix.scalaProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
   SPARK_MODULES: ${{ matrix.sparkModules }}
-if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip test spark 3.2 
as it's covered by Azure CI
 run:
   mvn test -Pfunctional-tests -Pjava17 -D"$SCALA_PROFILE" 
-D"$SPARK_PROFILE" -pl "$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
 
@@ -308,6 +300,49 @@ jobs:
   - scalaProfile: "scala-2.12"
 sparkProfile: "spark3.4"
 sparkModules: "hudi-spark-datasource/hudi-spark3.4.x"
+
+steps:
+  - uses: actions/checkout@v3
+  - name: Set up JDK 8
+uses: actions/setup-java@v3
+with:
+  java-version: '8'
+  distribution: 'temurin'
+  architecture: x64
+  cache: maven
+  - name: Build Project
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+run:
+  mvn clean install -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DskipTests=true $MVN_ARGS -am -pl 
"hudi-examples/hudi-examples-spark,hudi-common,$SPARK_COMMON_MODULES,$SPARK_MODULES"
+  - name: Set up JDK 17
+uses: actions/setup-java@v3
+with:
+  java-version: '17'
+  distribution: 'temurin'
+  architecture: x64
+  cache: maven
+  - name: Scala UT - Common & Spark
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+  SPARK_MODULES: ${{ matrix.sparkModules }}
+run:
+  mvn test -Punit-tests -Pjava17 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-Dtest=skipJavaTests -DfailIfNoTests=false -pl 
"hudi-common,$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
+  - name: Scala FT - Spark
+env:
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
+  SPARK_PROFILE: ${{ matrix.sparkProfile }}
+  SPARK_MODULES: ${{ matrix.sparkModules }}
+run:
+  mvn test -Pfunctional-tests -Pjava17 -D"$SCALA_PROFILE" 
-D"$SPARK_PROFILE" -Dtest=skipJavaTests -DfailIfNoTests=false -pl 
"$SPARK_COMMON_MODULES,$SPARK_MODULES" $MVN_ARGS
+
+  test-spark-java11-17-java-tests:
+runs-on: ubuntu-latest
+strategy:
+  matrix:
+include:
   - scalaProfile: "scala-2.12"
 sparkProfile: "spark3.5"
 sparkModules: "hudi-spark-datasource/hudi-spark3.5.x"
@@ -317,10 +352,65 @@ jobs:
 
 steps:
   - uses: actions/checkout@v3
-  - name: Set up JDK 8
+  - name: Set up JDK 11
 uses: actions/setup-java@v3
 with:
-  java-version: '8'
+  j

[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-06-25 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7711:

Reviewers: Ethan Guo

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Assignee: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
> related fields into each stream.
> Because _propsFilePath_ is not handled for each streamer, every streamer falls 
> back to the default value, which is the path of the test resource files.
>  
> Also, if MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should be able to inherit these configs.
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> error log (workspace is replaced to \{RUNNING_PATH}) :
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: {TABLE}
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(S
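
A hedged, self-contained sketch of the per-table override behavior described above, using plain java.util.Properties; the helper and local file paths are hypothetical, and real configFile values on HDFS would need a Hadoop FileSystem rather than FileInputStream:

{code:java}
// Illustrative only, not the Hudi fix: per-table config files should override the common
// props (propsFilePath / --hoodie-conf) instead of every streamer falling back to the
// default test-resource properties path.
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class PerTablePropsSketch {
  static Properties loadProps(String path) throws IOException {
    Properties props = new Properties();
    try (FileInputStream in = new FileInputStream(path)) {
      props.load(in);
    }
    return props;
  }

  public static void main(String[] args) throws IOException {
    Properties common = loadProps("/tmp/kafka-source.properties"); // common/--hoodie-conf values
    Properties table1 = new Properties();
    table1.putAll(common);                                         // inherit the common configs
    table1.putAll(loadProps("/tmp/config_1.properties"));          // per-table file overrides them
    System.out.println("tbl1 topic = " + table1.getProperty("hoodie.streamer.source.kafka.topic"));
  }
}
{code}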

[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-06-25 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7711:

Status: Patch Available  (was: In Progress)

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Assignee: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
> related fields into each stream.
> Because _propsFilePath_ is not handled for each streamer, every streamer falls 
> back to the default value, which is the path of the test resource files.
>  
> Also, if MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should be able to inherit these configs.
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> error log (workspace is replaced to \{RUNNING_PATH}) :
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: {TABLE}
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.dep

[jira] [Assigned] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-06-25 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7711:
---

Assignee: Jihwan Lee

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Assignee: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
> related fields into each stream.
> Because _propsFilePath_ is not handled for each streamer, every streamer falls 
> back to the default value, which is the path of the test resource files.
>  
> Also, if MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should be able to inherit these configs.
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> error log (workspace is replaced to \{RUNNING_PATH}) :
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: {TABLE}
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$

[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-06-25 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7711:

Sprint: 2024/06/17-30

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Assignee: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
> related fields into each stream.
> Because _propsFilePath_ is not handled for each streamer, every streamer falls 
> back to the default value, which is the path of the test resource files.
>  
> Also, if MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should be able to inherit these configs.
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> error log (workspace is replaced to \{RUNNING_PATH}) :
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: {TABLE}
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(

[jira] [Updated] (HUDI-7711) Fix MultiTableStreamer can deal with path of properties file for each streamer

2024-06-25 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7711:

Fix Version/s: 0.15.1
   1.0.0

> Fix MultiTableStreamer can deal with path of properties file for each streamer
> --
>
> Key: HUDI-7711
> URL: https://issues.apache.org/jira/browse/HUDI-7711
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hudi-utilities
> Environment: hudi0.14.1, Spark3.2
>Reporter: Jihwan Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0, 0.15.1
>
>
> HoodieMultiTableStreamer initializes the common configs, then deep-copies the 
> related fields into each stream.
> Because _propsFilePath_ is not handled for each streamer, every streamer falls 
> back to the default value, which is the path of the test resource files.
>  
> Also, if MultiTableStreamer is run with {_}--hoodie-conf{_}, each streamer 
> should be able to inherit these configs.
>  
> MultiTable configs (kafka-source.properties):
>  
> {code:java}
> ...
> hoodie.streamer.ingestion.tablesToBeIngested=db.tbl1,db.tb2
> hoodie.streamer.ingestion.db.tbl1.configFile=hdfs:///tmp/config_1.properties
> hoodie.streamer.ingestion.db.tbl2.configFile=hdfs:///tmp/config_2.properties
> ... {code}
>  
>  
> /tmp/config_1.properties:
>  
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic1
> ... {code}
>  
>  
> /tmp/config_2.properties:
> {code:java}
> ...
> hoodie.datasource.write.recordkey.field=id
> hoodie.streamer.source.kafka.topic=topic2
> ... {code}
>  
> error log (workspace is replaced to \{RUNNING_PATH}) :
>  
> {code:java}
> 24/05/04 21:41:01 ERROR config.DFSPropertiesConfiguration: Error reading in 
> properties from dfs from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
> 24/05/04 21:41:01 INFO streamer.StreamSync: Shutting down embedded timeline 
> server
> 24/05/04 21:41:01 ERROR streamer.HoodieMultiTableStreamer: error while 
> running MultiTableDeltaStreamer for table: {TABLE}
> org.apache.hudi.exception.HoodieIOException: Cannot read properties from dfs 
> from file 
> file:{RUNNING_PATH}/src/test/resources/streamer-config/dfs-source.properties
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:168)
>         at 
> org.apache.hudi.common.config.DFSPropertiesConfiguration.(DFSPropertiesConfiguration.java:87)
>         at 
> org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:258)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$Config.getProps(HoodieStreamer.java:453)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.getDeducedSchemaProvider(StreamSync.java:714)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchNextBatchFromSource(StreamSync.java:676)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.fetchFromSourceAndPrepareRecords(StreamSync.java:568)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.readFromSource(StreamSync.java:540)
>         at 
> org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:444)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:874)
>         at 
> org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
>         at org.apache.hudi.common.util.Option.ifPresent(Option.java:101)
>         at 
> org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:216)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.sync(HoodieMultiTableStreamer.java:457)
>         at 
> org.apache.hudi.utilities.streamer.HoodieMultiTableStreamer.main(HoodieMultiTableStreamer.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubm

(hudi) branch master updated: [MINOR] Removed useless checks from SqlBasedTransformers (#11499)

2024-06-25 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c5ff6a2113f [MINOR] Removed useless checks from SqlBasedTransformers 
(#11499)
c5ff6a2113f is described below

commit c5ff6a2113fae46a0c3d29b0f5e574eb8e0a9c62
Author: Vova Kolmakov 
AuthorDate: Tue Jun 25 21:32:10 2024 +0700

[MINOR] Removed useless checks from SqlBasedTransformers (#11499)

Co-authored-by: Vova Kolmakov 
---
 .../apache/hudi/utilities/transform/SqlFileBasedTransformer.java | 9 ++---
 .../hudi/utilities/transform/SqlQueryBasedTransformer.java   | 8 ++--
 .../hudi/utilities/transform/TestSqlQueryBasedTransformer.java   | 5 +
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlFileBasedTransformer.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlFileBasedTransformer.java
index 6c3b10bd264..cdef1677e58 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlFileBasedTransformer.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlFileBasedTransformer.java
@@ -21,7 +21,6 @@ package org.apache.hudi.utilities.transform;
 import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.hadoop.fs.HadoopFSUtils;
 import org.apache.hudi.utilities.config.SqlTransformerConfig;
-import org.apache.hudi.utilities.exception.HoodieTransformException;
 import org.apache.hudi.utilities.exception.HoodieTransformExecutionException;
 
 import org.apache.hadoop.fs.FileSystem;
@@ -72,22 +71,18 @@ public class SqlFileBasedTransformer implements Transformer 
{
   final TypedProperties props) {
 
 final String sqlFile = getStringWithAltKeys(props, 
SqlTransformerConfig.TRANSFORMER_SQL_FILE);
-if (null == sqlFile) {
-  throw new HoodieTransformException(
-  "Missing required configuration : (" + 
SqlTransformerConfig.TRANSFORMER_SQL_FILE.key() + ")");
-}
 
 final FileSystem fs = HadoopFSUtils.getFs(sqlFile, 
jsc.hadoopConfiguration(), true);
 // tmp table name doesn't like dashes
 final String tmpTable = 
TMP_TABLE.concat(UUID.randomUUID().toString().replace("-", "_"));
-LOG.info("Registering tmp table : " + tmpTable);
+LOG.info("Registering tmp table: {}", tmpTable);
 rowDataset.createOrReplaceTempView(tmpTable);
 
 try (final Scanner scanner = new Scanner(fs.open(new Path(sqlFile)), 
"UTF-8")) {
   Dataset rows = null;
   // each sql statement is separated with semicolon hence set that as 
delimiter.
   scanner.useDelimiter(";");
-  LOG.info("SQL Query for transformation : ");
+  LOG.info("SQL Query for transformation:");
   while (scanner.hasNext()) {
 String sqlStr = scanner.next();
 sqlStr = sqlStr.replaceAll(SRC_PATTERN, tmpTable).trim();
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlQueryBasedTransformer.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlQueryBasedTransformer.java
index 636e4784950..290e3a69432 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlQueryBasedTransformer.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/SqlQueryBasedTransformer.java
@@ -20,7 +20,6 @@ package org.apache.hudi.utilities.transform;
 
 import org.apache.hudi.common.config.TypedProperties;
 import org.apache.hudi.utilities.config.SqlTransformerConfig;
-import org.apache.hudi.utilities.exception.HoodieTransformException;
 import org.apache.hudi.utilities.exception.HoodieTransformExecutionException;
 
 import org.apache.spark.api.java.JavaSparkContext;
@@ -50,17 +49,14 @@ public class SqlQueryBasedTransformer implements 
Transformer {
   public Dataset apply(JavaSparkContext jsc, SparkSession sparkSession, 
Dataset rowDataset,
   TypedProperties properties) {
 String transformerSQL = getStringWithAltKeys(properties, 
SqlTransformerConfig.TRANSFORMER_SQL);
-if (null == transformerSQL) {
-  throw new HoodieTransformException("Missing configuration : (" + 
SqlTransformerConfig.TRANSFORMER_SQL.key() + ")");
-}
 
 try {
   // tmp table name doesn't like dashes
   String tmpTable = 
TMP_TABLE.concat(UUID.randomUUID().toString().replace("-", "_"));
-  LOG.info("Registering tmp table : " + tmpTable);
+  LOG.info("Registering tmp table: {}", tmpTable);
   rowDataset.createOrReplaceTempView(tmpTable);
   String sqlStr = transformerSQL.replaceAll(SRC_PATTERN, tmpTable);
-  LOG.debug("SQL Query for transformation : (" + sqlStr + ")");
+  LOG.debug("SQL Query for transformation: {}", sqlStr);
   Dataset transformed = sparkSession.sql(sqlStr);
   sparkSession.catalog().dropTempView(tmpTable);
   return transformed;
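
For readers of the diff above, a hedged sketch of the overall transformer pattern it touches: register the incoming Dataset as a temp view, substitute it into the configured SQL, run it, and drop the view. The class name, the "<SRC>" placeholder handling, and the demo query are illustrative, not the Hudi classes themselves:

{code:java}
// Illustrative sketch of the SQL-transformer flow, assuming a <SRC> placeholder in the SQL.
import java.util.UUID;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlTransformSketch {
  public static Dataset<Row> transform(SparkSession spark, Dataset<Row> input, String transformerSql) {
    // tmp table name doesn't like dashes, so strip them from the UUID
    String tmpTable = "SRC_TMP_TABLE_".concat(UUID.randomUUID().toString().replace("-", "_"));
    input.createOrReplaceTempView(tmpTable);
    String sql = transformerSql.replaceAll("<SRC>", tmpTable);
    Dataset<Row> transformed = spark.sql(sql);
    spark.catalog().dropTempView(tmpTable);
    return transformed;
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().master("local[1]").appName("sql-transform").getOrCreate();
    Dataset<Row> input = spark.range(5).toDF("id");
    transform(spark, input, "SELECT id, id * 2 AS doubled FROM <SRC>").show();
  }
}
{code}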

[jira] [Updated] (HUDI-7925) Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieHadoopFsRelationFactory`

2024-06-25 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-7925:

Description: 
There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieHadoopFsRelationFactory`. Therefore, when reading data with 
"hoodie.file.group.reader.enabled" = "true", which is the default behavior, we 
could get a ClassCastException during extraction; for instance, see HUDI-7709.
We need to implement logic similar to `HoodieBaseRelation`.

  was:
There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieHadoopFsRelationFactory`. Therefore, when reading data with 
"hoodie.file.group.reader.enabled" = "true", which is the default behavior, we 
could get a ClassCastException during extraction; for instance, see .
We need to implement logic similar to `HoodieBaseRelation`.


> Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`
> --
>
> Key: HUDI-7925
> URL: https://issues.apache.org/jira/browse/HUDI-7925
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Priority: Major
>
> There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`. Therefore, when reading data with 
> "hoodie.file.group.reader.enabled" = "true", which is the default behavior, we 
> could get a ClassCastException during extraction; for instance, see HUDI-7709.
> We need to implement logic similar to `HoodieBaseRelation`.
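
A hedged sketch of the kind of decision this ticket asks for; the exact conditions used by `HoodieBaseRelation` are an assumption here, and all names below are illustrative:

{code:java}
// Illustrative only: take partition values from the partition path when the partition
// columns are not physically present in the data files (e.g. partition columns dropped
// on write); otherwise read them from the file columns to avoid casting path strings.
import java.util.Arrays;
import java.util.List;

public class PartitionValueExtractionSketch {
  static boolean shouldExtractPartitionValuesFromPartitionPath(List<String> partitionColumns,
                                                               List<String> dataFileColumns,
                                                               boolean dropPartitionColumnsEnabled) {
    boolean missingFromFiles = !dataFileColumns.containsAll(partitionColumns);
    return dropPartitionColumnsEnabled || missingFromFiles;
  }

  public static void main(String[] args) {
    List<String> partitionCols = Arrays.asList("dt");
    List<String> fileCols = Arrays.asList("id", "ts", "value"); // "dt" is not stored in the files
    System.out.println(shouldExtractPartitionValuesFromPartitionPath(partitionCols, fileCols, false)); // true
  }
}
{code}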



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7925) Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieHadoopFsRelationFactory`

2024-06-25 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-7925:

Description: 
There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieHadoopFsRelationFactory`. Therefore, when reading data with 
"hoodie.file.group.reader.enabled" = "true", which is the default behavior, we 
could get a ClassCastException during extraction; for instance, see .
We need to implement logic similar to `HoodieBaseRelation`.

  was:
There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
`HoodieHadoopFsRelationFactory`. Therefore, when reading data with 
"hoodie.file.group.reader.enabled" = "true", which is the default behavior, we got 
null values.
We need to implement logic similar to `HoodieBaseRelation`.


> Implement logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`
> --
>
> Key: HUDI-7925
> URL: https://issues.apache.org/jira/browse/HUDI-7925
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Geser Dugarov
>Priority: Major
>
> There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in 
> `HoodieHadoopFsRelationFactory`. Therefore, when reading data with 
> "hoodie.file.group.reader.enabled" = "true", which is the default behavior, we 
> could get a ClassCastException during extraction; for instance, see .
> We need to implement logic similar to `HoodieBaseRelation`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)