(hudi) branch master updated (2cc45cc228a -> bb76de48e9f)

2024-06-21 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 2cc45cc228a [HUDI-7881] Verify table base path as well for syncing table in bigquery metastore (#11460)
 add bb76de48e9f [HUDI-6508] Support compilation on Java 11 (#11479)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml  | 167 +++--
 .../TestHoodieClientOnCopyOnWriteStorage.java  |   6 +-
 .../hudi/table/TestHoodieMergeOnReadTable.java |   8 +-
 .../commit/TestCopyOnWriteActionExecutor.java  |  15 +-
 .../hudi/metadata/HoodieTableMetadataUtil.java |  21 ++-
 hudi-examples/hudi-examples-common/pom.xml |  14 --
 hudi-examples/hudi-examples-java/pom.xml   |  14 --
 .../org/apache/hudi/common/util/ParquetUtils.java  |  21 +--
 8 files changed, 190 insertions(+), 76 deletions(-)



(hudi) branch master updated (8a4bed03fa7 -> 2cc45cc228a)

2024-06-21 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 8a4bed03fa7 [HUDI-7849] Reduce time spent on running testFiltersInFileFormat (#11423)
 add 2cc45cc228a [HUDI-7881] Verify table base path as well for syncing table in bigquery metastore (#11460)

No new revisions were added by this update.

Summary of changes:
 .../gcp/bigquery/HoodieBigQuerySyncClient.java | 13 +++-
 .../hudi/gcp/bigquery/TestBigQuerySyncTool.java|  5 +++
 .../gcp/bigquery/TestHoodieBigQuerySyncClient.java | 37 ++
 .../org/apache/hudi/common/util/StringUtils.java   | 25 +++
 .../apache/hudi/common/util/TestStringUtils.java   | 10 ++
 5 files changed, 89 insertions(+), 1 deletion(-)



(hudi) branch master updated: [HUDI-7849] Reduce time spent on running testFiltersInFileFormat (#11423)

2024-06-21 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 8a4bed03fa7 [HUDI-7849] Reduce time spent on running testFiltersInFileFormat (#11423)
8a4bed03fa7 is described below

commit 8a4bed03fa7c64c4ede559113e9ff06085b983a7
Author: Vova Kolmakov 
AuthorDate: Sat Jun 22 12:07:19 2024 +0700

[HUDI-7849] Reduce time spent on running testFiltersInFileFormat (#11423)

Co-authored-by: Vova Kolmakov 
---
 .../java/org/apache/hudi/functional/TestFiltersInFileGroupReader.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestFiltersInFileGroupReader.java
 
b/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestFiltersInFileGroupReader.java
index b8ca6373237..eaf6312f800 100644
--- 
a/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestFiltersInFileGroupReader.java
+++ 
b/hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestFiltersInFileGroupReader.java
@@ -47,8 +47,8 @@ public class TestFiltersInFileGroupReader extends 
TestBootstrapReadBase {
 this.dashPartitions = true;
 this.tableType = HoodieTableType.MERGE_ON_READ;
 this.nPartitions = 2;
-this.nInserts = 10;
-this.nUpdates = 2;
+this.nInserts = 100;
+this.nUpdates = 20;
sparkSession.conf().set(SQLConf.PARQUET_RECORD_FILTER_ENABLED().key(), "true");
 setupDirs();
 



[jira] [Closed] (HUDI-7906) improve the parallelism deduce in rdd write

2024-06-21 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7906.

Resolution: Fixed

Fixed via master branch: 51c9c0e226ab158556de87dc0e5c3e6530b6b8c1

> improve the parallelism deduce in rdd write
> ---
>
> Key: HUDI-7906
> URL: https://issues.apache.org/jira/browse/HUDI-7906
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: KnightChess
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
>
> As [https://github.com/apache/hudi/issues/11274] and 
> [https://github.com/apache/hudi/pull/11463] describe, there are two 
> problem cases:
>  # if the RDD is an input RDD without a shuffle, the partition count can 
> be far too large or too small
>  # users cannot control it easily:
>  ## in some cases a user can set `spark.default.parallelism` to change it
>  ## in other cases it cannot be changed because the value is hard-coded
>  ## in Spark, the better way is to let `spark.default.parallelism` or 
> `spark.sql.shuffle.partitions` control it, and treat anything beyond that 
> as an advanced Hudi setting.
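For readers skimming the digest, a minimal sketch (a hypothetical standalone job, not code from the PR) of the two Spark-level knobs the description refers to:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class ParallelismKnobsDemo {
  public static void main(String[] args) {
    // "spark.default.parallelism" governs RDD operations that have no
    // explicit partitioner; "spark.sql.shuffle.partitions" governs
    // SQL/DataFrame shuffles. These are the settings the issue proposes
    // Hudi should fall back to when deducing write parallelism.
    SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("parallelism-knobs-demo")
        .set("spark.default.parallelism", "200");
    SparkSession spark = SparkSession.builder()
        .config(conf)
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate();
    System.out.println(spark.sparkContext().defaultParallelism());
    spark.stop();
  }
}
```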



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7906) improve the parallelism deduce in rdd write

2024-06-21 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7906:
-
Fix Version/s: 0.16.0
   1.0.0

> improve the parallelism deduce in rdd write
> ---
>
> Key: HUDI-7906
> URL: https://issues.apache.org/jira/browse/HUDI-7906
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: KnightChess
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> As [https://github.com/apache/hudi/issues/11274] and 
> [https://github.com/apache/hudi/pull/11463] describe, there are two 
> problem cases:
>  # if the RDD is an input RDD without a shuffle, the partition count can 
> be far too large or too small
>  # users cannot control it easily:
>  ## in some cases a user can set `spark.default.parallelism` to change it
>  ## in other cases it cannot be changed because the value is hard-coded
>  ## in Spark, the better way is to let `spark.default.parallelism` or 
> `spark.sql.shuffle.partitions` control it, and treat anything beyond that 
> as an advanced Hudi setting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7906] Improve the parallelism deduce in rdd write (#11470)

2024-06-21 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 51c9c0e226a [HUDI-7906] Improve the parallelism deduce in rdd write (#11470)
51c9c0e226a is described below

commit 51c9c0e226ab158556de87dc0e5c3e6530b6b8c1
Author: KnightChess <981159...@qq.com>
AuthorDate: Sat Jun 22 12:29:35 2024 +0800

[HUDI-7906] Improve the parallelism deduce in rdd write (#11470)
---
 .../org/apache/hudi/config/HoodieIndexConfig.java  |  2 +-
 .../hudi/index/simple/HoodieGlobalSimpleIndex.java |  5 +++-
 .../hudi/index/simple/HoodieSimpleIndex.java   | 10 
 .../table/action/commit/HoodieDeleteHelper.java|  2 +-
 .../table/action/commit/HoodieWriteHelper.java |  2 +-
 .../table/action/commit/TestWriterHelperBase.java  | 19 ---
 .../org/apache/hudi/data/HoodieJavaPairRDD.java| 23 ++
 .../java/org/apache/hudi/data/HoodieJavaRDD.java   | 23 ++
 .../index/bloom/SparkHoodieBloomIndexHelper.java   |  2 +-
 .../scala/org/apache/hudi/HoodieSparkUtils.scala   |  4 ++--
 .../org/apache/hudi/data/TestHoodieJavaRDD.java| 28 ++
 .../table/action/commit/TestSparkWriteHelper.java  | 23 ++
 .../org/apache/hudi/common/data/HoodieData.java|  5 
 .../apache/hudi/common/data/HoodieListData.java|  5 
 .../hudi/common/data/HoodieListPairData.java   |  5 
 .../apache/hudi/common/data/HoodiePairData.java|  5 
 .../spark/sql/hudi/dml/TestInsertTable.scala   |  2 ++
 17 files changed, 134 insertions(+), 31 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
index c80c5a2de8a..385532917c4 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
@@ -168,7 +168,7 @@ public class HoodieIndexConfig extends HoodieConfig {
 
  public static final ConfigProperty<String> GLOBAL_SIMPLE_INDEX_PARALLELISM = ConfigProperty
   .key("hoodie.global.simple.index.parallelism")
-  .defaultValue("100")
+  .defaultValue("0")
   .markAdvanced()
   .withDocumentation("Only applies if index type is GLOBAL_SIMPLE. "
   + "This limits the parallelism of fetching records from the base 
files of all table "
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java
index 7432d606839..3c76ff17935 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieGlobalSimpleIndex.java
@@ -69,8 +69,11 @@ public class HoodieGlobalSimpleIndex extends 
HoodieSimpleIndex {
  HoodieData<HoodieRecord<R>> inputRecords, HoodieEngineContext context,
   HoodieTable hoodieTable) {
 List<Pair<String, HoodieBaseFile>> latestBaseFiles = getAllBaseFilesInTable(context, hoodieTable);
+int configuredSimpleIndexParallelism = config.getGlobalSimpleIndexParallelism();
+int fetchParallelism = configuredSimpleIndexParallelism > 0 ? configuredSimpleIndexParallelism : inputRecords.deduceNumPartitions();
 HoodiePairData allKeysAndLocations =
-fetchRecordGlobalLocations(context, hoodieTable, config.getGlobalSimpleIndexParallelism(), latestBaseFiles);
+fetchRecordGlobalLocations(context, hoodieTable, fetchParallelism, latestBaseFiles);
 boolean mayContainDuplicateLookup = hoodieTable.getMetaClient().getTableType() == MERGE_ON_READ;
 boolean shouldUpdatePartitionPath = config.getGlobalSimpleIndexUpdatePartitionPath() && hoodieTable.isPartitioned();
 return tagGlobalLocationBackToRecords(inputRecords, allKeysAndLocations,
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
index cca7a43d1f9..99ffc1b47e6 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/simple/HoodieSimpleIndex.java
@@ -107,16 +107,16 @@ public class HoodieSimpleIndex
   .getString(HoodieIndexConfig.SIMPLE_INDEX_INPUT_STORAGE_LEVEL_VALUE));
 }
 
-int inputParallelism = inputRecords.getNumPartitions();
+int deduceNumParallelism = inputRecords.deduceNumPartitions();
 int configuredSimpleIndexParallelism = config.getSimpleIndexParallelism();
 // NOTE
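To make the behavior change above concrete, here is a minimal write-side sketch (hypothetical table name, key fields, and paths; this is not code from the commit). Per the diff, a positive value of `hoodie.global.simple.index.parallelism` pins the fetch parallelism, while the new default of 0 lets Hudi deduce it from the input records:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class GlobalSimpleIndexParallelismDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]").appName("global-simple-index-demo").getOrCreate();
    Dataset<Row> df = spark.read().parquet("/tmp/input");        // placeholder input

    df.write().format("hudi")
        .option("hoodie.table.name", "demo_tbl")                 // placeholder
        .option("hoodie.datasource.write.recordkey.field", "id") // placeholder
        .option("hoodie.datasource.write.precombine.field", "ts") // placeholder
        .option("hoodie.index.type", "GLOBAL_SIMPLE")
        // 0 (the new default) deduces parallelism from the input records;
        // any positive value pins it, matching the ternary in the diff.
        .option("hoodie.global.simple.index.parallelism", "0")
        .mode(SaveMode.Append)
        .save("/tmp/demo_tbl");                                  // placeholder
    spark.stop();
  }
}
```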

(hudi) branch master updated: [MINOR][DNM] Test disabling new HFile reader (#11488)

2024-06-21 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1ce97bae116 [MINOR][DNM] Test disabling new HFile reader (#11488)
1ce97bae116 is described below

commit 1ce97bae11655c9a33f8665c3dd53116302686ee
Author: Y Ethan Guo 
AuthorDate: Fri Jun 21 18:44:15 2024 -0700

[MINOR][DNM] Test disabling new HFile reader (#11488)
---
 .../src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
index bb29e090ec3..a7e41098d66 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
@@ -31,7 +31,7 @@ import javax.annotation.concurrent.Immutable;
 public class HoodieReaderConfig extends HoodieConfig {
  public static final ConfigProperty<Boolean> USE_NATIVE_HFILE_READER = ConfigProperty
   .key("_hoodie.hfile.use.native.reader")
-  .defaultValue(true)
+  .defaultValue(false)
   .markAdvanced()
   .sinceVersion("1.0.0")
   .withDocumentation("When enabled, the native HFile reader is used to 
read HFiles.  This is an internal config.");
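For completeness, a hedged sketch of how the flipped default could be overridden per writer. The key is internal per its own documentation, so this is for testing only, and the property-based wiring here is illustrative rather than taken from the commit:

```java
import org.apache.hudi.common.config.TypedProperties;

public class NativeHFileReaderToggle {
  public static void main(String[] args) {
    TypedProperties props = new TypedProperties();
    // Re-enable the native HFile reader that this commit turns off by
    // default. "_hoodie.hfile.use.native.reader" is flagged as an
    // internal config in the diff above.
    props.setProperty("_hoodie.hfile.use.native.reader", "true");
    System.out.println(props.getProperty("_hoodie.hfile.use.native.reader"));
  }
}
```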



[jira] [Created] (HUDI-7915) Spark 4 support

2024-06-21 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7915:
---

 Summary: Spark 4 support
 Key: HUDI-7915
 URL: https://issues.apache.org/jira/browse/HUDI-7915
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi-rs) branch main updated: feat: use `object_store` for common storage APIs (#25)

2024-06-21 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new d84854c  feat: use `object_store` for common storage APIs (#25)
d84854c is described below

commit d84854c2b3d58252aba9701f320432714cdc3b29
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Fri Jun 21 17:28:45 2024 -0500

feat: use `object_store` for common storage APIs (#25)
---
 Cargo.toml |   4 +-
 crates/core/Cargo.toml |   7 +
 .../core/fixtures/timeline/commits_stub/a.parquet  |   0
 .../fixtures/timeline/commits_stub/part1/b.parquet |   0
 .../timeline/commits_stub/part2/part22/c.parquet   |   0
 .../commits_stub/part3/part32/part33/d.parquet |   0
 crates/core/src/error.rs   |   9 -
 crates/core/src/file_group/mod.rs  |   3 +-
 crates/core/src/lib.rs |   2 +-
 .../src/{utils.rs => storage/file_metadata.rs} |  30 ++-
 crates/core/src/storage/mod.rs | 217 +
 crates/core/src/table/config.rs|   8 +-
 crates/core/src/table/fs_view.rs   | 136 +++--
 crates/core/src/table/mod.rs   |   2 +-
 14 files changed, 317 insertions(+), 101 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 29294d8..4a9f419 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -53,10 +53,11 @@ datafusion-sql = { version = "35" }
 datafusion-physical-expr = { version = "35" }
 
 # serde
-serde = { version = "1.0.194", features = ["derive"] }
+serde = { version = "1.0.203", features = ["derive"] }
 serde_json = "1"
 
 # "stdlib"
+anyhow = { version = "1.0.86" }
 bytes = { version = "1" }
chrono = { version = "=0.4.34", default-features = false, features = ["clock"] }
 tracing = { version = "0.1", features = ["log"] }
@@ -67,6 +68,7 @@ uuid = { version = "1" }
 
 # runtime / async
 async-trait = { version = "0.1" }
+async-recursion = { version = "1.1.1" }
 futures = { version = "0.3" }
 tokio = { version = "1" }
 num_cpus = { version = "1" }
diff --git a/crates/core/Cargo.toml b/crates/core/Cargo.toml
index e2f9bf0..f97a526 100644
--- a/crates/core/Cargo.toml
+++ b/crates/core/Cargo.toml
@@ -36,6 +36,7 @@ arrow-ord = { workspace = true }
 arrow-row = { workspace = true }
 arrow-schema = { workspace = true, features = ["serde"] }
 arrow-select = { workspace = true }
+object_store = { workspace = true }
 parquet = { workspace = true, features = [
 "async",
 "object_store",
@@ -55,6 +56,7 @@ serde = { workspace = true, features = ["derive"] }
 serde_json = { workspace = true }
 
 # "stdlib"
+anyhow = { workspace = true }
 bytes = { workspace = true }
 chrono = { workspace = true, default-features = false, features = ["clock"] }
 hashbrown = "0.14.3"
@@ -63,6 +65,11 @@ thiserror = { workspace = true }
 uuid = { workspace = true, features = ["serde", "v4"] }
 url = { workspace = true }
 
+# runtime / async
+async-recursion = { workspace = true }
+async-trait = { workspace = true }
+tokio = { workspace = true }
+
 # test
 tempfile = "3.10.1"
 zip-extract = "0.1.3"
diff --git a/crates/core/fixtures/timeline/commits_stub/a.parquet 
b/crates/core/fixtures/timeline/commits_stub/a.parquet
new file mode 100644
index 000..e69de29
diff --git a/crates/core/fixtures/timeline/commits_stub/part1/b.parquet 
b/crates/core/fixtures/timeline/commits_stub/part1/b.parquet
new file mode 100644
index 000..e69de29
diff --git a/crates/core/fixtures/timeline/commits_stub/part2/part22/c.parquet 
b/crates/core/fixtures/timeline/commits_stub/part2/part22/c.parquet
new file mode 100644
index 000..e69de29
diff --git 
a/crates/core/fixtures/timeline/commits_stub/part3/part32/part33/d.parquet 
b/crates/core/fixtures/timeline/commits_stub/part3/part32/part33/d.parquet
new file mode 100644
index 000..e69de29
diff --git a/crates/core/src/error.rs b/crates/core/src/error.rs
index 4a2f0f2..e8f76c9 100644
--- a/crates/core/src/error.rs
+++ b/crates/core/src/error.rs
@@ -17,7 +17,6 @@
  * under the License.
  */
 
-use std::error::Error;
 use std::fmt::Debug;
 
 use thiserror::Error;
@@ -28,18 +27,10 @@ pub enum HudiFileGroupError {
 CommitTimeAlreadyExists(String, String),
 }
 
-#[derive(Debug, Error)]
-pub enum HudiFileSystemViewError {
-#[error("Error in loading partitions: {0}")]
-FailToLoadPartitions(Box),
-}
-
 #[derive(Debug, Error)]
 pub enum HudiCoreError {
 #[error("Failed to load file group")]
 FailToLoadFileGroup(#[from] HudiFileGroupError),
-#[error("Failed to build file system view")]
-FailToBuildFileSystemView(#[from] HudiFileSystemViewError),
 #[error("Failed to load table properties")]
 LoadTablePropertiesError,
 }
diff --git a/crates/core/src/file_group/mod.rs 
b/crates/core/src/file_gr

[jira] [Created] (HUDI-7914) Incorrect schema produced in DELETE_PARTITION replacecommit

2024-06-21 Thread Vitali Makarevich (Jira)
Vitali Makarevich created HUDI-7914:
---

 Summary: Incorrect schema produced in DELETE_PARTITION replacecommit
 Key: HUDI-7914
 URL: https://issues.apache.org/jira/browse/HUDI-7914
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Vitali Makarevich


In the current scenario, delete_partitions produces a {{replacecommit}} whose 
schema includes internal fields like {{_hoodie_file_name}}, while e.g. a normal 
{{commit}} produces a schema without such fields.
This leads to unexpected behavior when the {{replacecommit}} is the latest on 
the timeline, e.g. 
[#10258|https://github.com/apache/hudi/issues/10258] and 
[#10533|https://github.com/apache/hudi/issues/10533]:
metadata sync, or any other subsequent write, will pick up the incorrect 
schema; in the best case it will fail because fields are duplicated, and in the 
worst case it can lead to data loss.
The problem was introduced in [https://github.com/apache/hudi/pull/5610/files].
The fix applies the same approach already used for other operations like 
{{delete}}.
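As a defensive illustration (a sketch assuming Hudi's HoodieAvroUtils helpers, which add and strip the `_hoodie_*` meta fields; it is not the fix itself), a consumer holding a schema polluted with internal fields can drop them before reuse:

```java
import org.apache.avro.Schema;
import org.apache.hudi.avro.HoodieAvroUtils;

public class StripMetaFieldsDemo {
  public static void main(String[] args) {
    // Simulate a schema that carries Hudi meta fields such as
    // _hoodie_file_name, as the replacecommit described above does.
    Schema userSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"string\"}]}");
    Schema withMeta = HoodieAvroUtils.addMetadataFields(userSchema);
    // Drop the _hoodie_* fields before handing the schema to metadata
    // sync or a subsequent write.
    Schema clean = HoodieAvroUtils.removeMetadataFields(withMeta);
    System.out.println(clean.toString(true));
  }
}
```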



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]

2024-06-21 Thread via GitHub


yihua commented on PR #11470:
URL: https://github.com/apache/hudi/pull/11470#issuecomment-2182111622

   > looks like a flaky test OOM; other PRs have the same problem
   
   I also noticed that GitHub CI frequently fails due to OOM now. I'm going to 
triage the offending commit on master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]

2024-06-21 Thread via GitHub


hudi-bot commented on PR #11423:
URL: https://github.com/apache/hudi/pull/11423#issuecomment-2182017381

   
   ## CI report:

   * 42afc3dd0a30cf1d33c325fbb74bb42759830ed9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24478)
   * e955e0fd0043352259ab8cf56d4984808c84c307 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24479)

   Bot commands
   @hudi-bot supports the following commands:
   - `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7849] Reduce time spent on running testFiltersInFileFormat [hudi]

2024-06-21 Thread via GitHub


hudi-bot commented on PR #11423:
URL: https://github.com/apache/hudi/pull/11423#issuecomment-2182031473

   
   ## CI report:

   * e955e0fd0043352259ab8cf56d4984808c84c307 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24479)

   Bot commands
   @hudi-bot supports the following commands:
   - `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7909) Add Comment to the FieldSchema returned by Aws Glue Client

2024-06-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7909:
-
Labels: pull-request-available  (was: )

> Add Comment to the FieldSchema returned by Aws Glue Client 
> ---
>
> Key: HUDI-7909
> URL: https://issues.apache.org/jira/browse/HUDI-7909
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Vamsi Karnika
>Priority: Major
>  Labels: pull-request-available
>
> The implementation of getMetastoreFieldSchema in AwsGlueCatalogSyncClient 
> doesn't include the comment as part of the FieldSchema. 
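A minimal sketch of the gap being fixed (the `FieldSchema` record below is a hypothetical stand-in for Hudi's sync-layer class, whose real shape may differ; the Glue `Column` API is from the AWS SDK v2):

```java
import software.amazon.awssdk.services.glue.model.Column;

public class GlueFieldSchemaDemo {
  // Hypothetical stand-in for the field schema type used by Hudi's
  // catalog sync layer.
  record FieldSchema(String name, String type, String comment) {}

  static FieldSchema fromGlueColumn(Column column) {
    // The change the issue asks for: carry Column#comment() through
    // instead of dropping it.
    return new FieldSchema(column.name(), column.type(), column.comment());
  }

  public static void main(String[] args) {
    Column column = Column.builder()
        .name("price")
        .type("double")
        .comment("unit price in USD")
        .build();
    System.out.println(fromGlueColumn(column));
  }
}
```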



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7909] Add Comment to the FieldSchema returned by Aws Glue Client [hudi]

2024-06-21 Thread via GitHub


hudi-bot commented on PR #11474:
URL: https://github.com/apache/hudi/pull/11474#issuecomment-2180552149

   
   ## CI report:

   * 9445b0794a1dc9b1dc65731fcda7e0d5a3cb79a5 UNKNOWN
   * 3d2b6497db1e308bba89ab8cf7df86f5a7a4c694 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24467)

   Bot commands
   @hudi-bot supports the following commands:
   - `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]

2024-06-21 Thread via GitHub


hudi-bot commented on PR #11470:
URL: https://github.com/apache/hudi/pull/11470#issuecomment-2180536892

   
   ## CI report:

   * c2903c00cf7f3e49f7e142f9a2c5a7fc047e406a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24465)
   * 3b5dfcba404bce991639873319c37812a445f136 UNKNOWN
   * b5cbb1352c92f46c7af76227afded3b887379892 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24472)

   Bot commands
   @hudi-bot supports the following commands:
   - `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7909] Add Comment to the FieldSchema returned by Aws Glue Client [hudi]

2024-06-21 Thread via GitHub


hudi-bot commented on PR #11474:
URL: https://github.com/apache/hudi/pull/11474#issuecomment-2180537003

   
   ## CI report:

   * 9445b0794a1dc9b1dc65731fcda7e0d5a3cb79a5 UNKNOWN
   * 968ed8f3e5236b607e596b4630071ec1d9f7f91b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24470)
   * 3d2b6497db1e308bba89ab8cf7df86f5a7a4c694 UNKNOWN

   Bot commands
   @hudi-bot supports the following commands:
   - `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7906] improve the parallelism deduce in rdd write [hudi]

2024-06-21 Thread via GitHub


danny0405 commented on code in PR #11470:
URL: https://github.com/apache/hudi/pull/11470#discussion_r1645249120


   hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/data/HoodieJavaRDD.java:
@@ -120,6 +123,26 @@ public int getNumPartitions() {
 return rddData.getNumPartitions();
   }
 
+  @Override
+  public int deduceNumPartitions() {
+// for source rdd, the partitioner is None
+final Optional<Partitioner> partitioner = rddData.partitioner();
+if (partitioner.isPresent()) {
+  int partPartitions = partitioner.get().numPartitions();
+  if (partPartitions > 0) {
+return partPartitions;
+  }
+}
+
+if (SQLConf.get().contains(SQLConf.SHUFFLE_PARTITIONS().key())) {
+  return SQLConf.get().defaultNumShufflePartitions();
+} else if (rddData.context().conf().contains("spark.default.parallelism")) {

Review Comment:
   The Java context may never have this config option set, right?
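To illustrate the question (a standalone sketch, not Hudi code): in a plain Spark context, `spark.default.parallelism` is present in the conf only if it was set explicitly, so the `contains` guard in the diff would indeed be false for a default-configured Java context:

```java
import org.apache.spark.SparkConf;

public class DefaultParallelismPresence {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("demo");
    // false unless the user set it explicitly
    System.out.println(conf.contains("spark.default.parallelism"));
    conf.set("spark.default.parallelism", "8");
    // true after an explicit set
    System.out.println(conf.contains("spark.default.parallelism"));
  }
}
```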



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20240618064120870] error [hudi]

2024-06-21 Thread via GitHub


danny0405 commented on issue #11466:
URL: https://github.com/apache/hudi/issues/11466#issuecomment-2177326048

   > Changed the IGNORE_KEY to true and it seems to be working, but I don't see 
any data in the parquet files. They are all empty. Any idea how I should debug 
this further?
   
   The error is caught by the writer; when an error occurs, it logs an error 
message:
   
   ```java
   LOG.error("Error writing record " + record, t);
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] add show create table command [hudi]

2024-06-21 Thread via GitHub


danny0405 commented on PR #11471:
URL: https://github.com/apache/hudi/pull/11471#issuecomment-2177301367

   @houyuting  Nice contribution! There are some compile errors that need to be 
taken care of.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Caused by: org.apache.hudi.exception.HoodieException: Executor executes action [commits the instant 20240618064120870] error [hudi]

2024-06-21 Thread via GitHub


ankit0811 commented on issue #11466:
URL: https://github.com/apache/hudi/issues/11466#issuecomment-2177216350

   Also get this exception in the same job
   
   ```
   22:51:20.093 [pool-286-thread-1] ERROR 
org.apache.hudi.common.table.view.PriorityBasedFileSystemView - Got error 
running preferred function. Trying secondary
   org.apache.hudi.exception.HoodieRemoteException: Connect to 
10.3.175.136:45105 [/10.3.175.136] failed: Connection refused (Connection 
refused)
at 
org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getPendingCompactionOperations(RemoteHoodieTableFileSystemView.java:547)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:69)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getPendingCompactionOperations(PriorityBasedFileSystemView.java:257)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.table.action.clean.CleanPlanner.(CleanPlanner.java:98) 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:107)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:159)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:185)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.scheduleCleaning(HoodieFlinkCopyOnWriteTable.java:359)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:629)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:752)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:862)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:835)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.sink.CleanFunction.lambda$open$0(CleanFunction.java:71) 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
   Caused by: org.apache.hudi.org.apache.http.conn.HttpHostConnectException: 
Connect to 10.3.175.136:45105 [/10.3.175.136] failed: Connection refused 
(Connection refused)
at 
org.apache.hudi.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.apache.hudi.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
 
~[blob_p-49658f01166115a275763af3064121d9588b4f90-c0a8066fe9067407b6e6321b83226118:0.0.1-SNAPSHOT]
at 
org.

[jira] [Assigned] (HUDI-7630) Create a separate StorageUtils for hadoop-free util method

2024-06-21 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7630:
---

Assignee: (was: Vova Kolmakov)

> Create a separate StorageUtils for hadoop-free util method
> --
>
> Key: HUDI-7630
> URL: https://issues.apache.org/jira/browse/HUDI-7630
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1484920647



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-3809) Make full scan optional for metadata partitions other than FILES

2024-06-21 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-3809:
---

Assignee: (was: Vova Kolmakov)

> Make full scan optional for metadata partitions other than FILES
> 
>
> Key: HUDI-3809
> URL: https://issues.apache.org/jira/browse/HUDI-3809
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Sagar Sumit
>Priority: Critical
>
> Currently, just one config controls whether to do a full scan or a point 
> lookup while reading log records in the metadata table, and that config 
> applies to ALL metadata partitions. However, full scan is disabled for the 
> column_stats and bloom_filters partitions: 
> HoodieBackedTableMetadata#isFullScanAllowedForPartition
>  
> We should make it configurable for the other partitions too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)