Re: [PR] [HUDI-7198]Create nested node path if does not exist for zookeeper. [hudi]

2024-01-02 Thread via GitHub


rmahindra123 commented on PR #10438:
URL: https://github.com/apache/hudi/pull/10438#issuecomment-1874959764

   lgtm


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


yihua commented on code in PR #10435:
URL: https://github.com/apache/hudi/pull/10435#discussion_r1440140227


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##
@@ -107,23 +107,19 @@ object HoodieSparkUtils extends SparkAdapterSupport with SparkVersionsSupport wi
 //   injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
 //   [[SQLConf]] is required by [[AvroSerializer]]
 injectSQLConf(df.queryExecution.toRdd.mapPartitions { rows =>
-  if (rows.isEmpty) {
-Iterator.empty

Review Comment:
   Does removal of this provide any benefit?



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -448,7 +450,9 @@ public Pair, JavaRDD> syncOnce() throws IOException
 }
   }
 
+  long startWrite = System.currentTimeMillis();

Review Comment:
   Similar here on using `HoodieTimer` and below.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -1016,7 +1016,7 @@ private List getInstantsToRollbackForLazyCleanPolicy(HoodieTableMetaClie
   @Deprecated
   public boolean rollback(final String commitInstantTime, Option pendingRollbackInfo, boolean skipLocking) throws HoodieRollbackException {
 final String rollbackInstantTime = pendingRollbackInfo.map(entry -> entry.getRollbackInstant().getTimestamp())
-.orElse(createNewInstantTime(!skipLocking));
+.orElseGet(() -> createNewInstantTime(!skipLocking));
 return rollback(commitInstantTime, pendingRollbackInfo, rollbackInstantTime, skipLocking);

Review Comment:
   Good catch!
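
The difference the hunk above relies on can be sketched in isolation. This is a minimal illustration only: it uses `java.util.Optional` as a stand-in for Hudi's own `Option` type, and `newInstantTime()` and the `OrElseDemo` class are hypothetical names that do not appear in the PR. With `orElse`, the fallback expression is evaluated before the call, even when a value is present; with `orElseGet`, the supplier runs only when the value is absent.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: java.util.Optional stands in for Hudi's Option type.
final class OrElseDemo {
    // Counts how often the hypothetical fallback is evaluated.
    static final AtomicInteger FALLBACK_CALLS = new AtomicInteger();

    // Hypothetical stand-in for a fallback with a visible side effect.
    static String newInstantTime() {
        FALLBACK_CALLS.incrementAndGet();
        return "new-instant";
    }

    // orElse: the argument is evaluated eagerly, even when a value is present.
    static String eager(Optional<String> pending) {
        return pending.orElse(newInstantTime());
    }

    // orElseGet: the supplier runs only when the Optional is empty.
    static String lazy(Optional<String> pending) {
        return pending.orElseGet(OrElseDemo::newInstantTime);
    }

    public static void main(String[] args) {
        Optional<String> present = Optional.of("pending-instant");
        System.out.println(eager(present) + ", fallback calls: " + FALLBACK_CALLS.get());
        FALLBACK_CALLS.set(0);
        System.out.println(lazy(present) + ", fallback calls: " + FALLBACK_CALLS.get());
    }
}
```

When the fallback is cheap and side-effect free the two forms behave the same; the lazy form matters when the fallback allocates a resource, takes a lock, or generates a new identifier.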



##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -402,7 +402,9 @@ public Pair, JavaRDD> syncOnce() throws IOException
 .build();
 String instantTime = metaClient.createNewInstantTime();
 
+long startInput = System.currentTimeMillis();
 InputBatch inputBatch = readFromSource(instantTime, metaClient);
+LOG.error("Time to read from source : " + (System.currentTimeMillis() - startInput));

Review Comment:
   Use `HoodieTimer` to track execution time?
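
The idea behind the suggestion, replacing scattered `System.currentTimeMillis()` arithmetic with a small timer object, can be sketched as follows. This is not Hudi's `HoodieTimer`; `SimpleTimer` and its methods are hypothetical names used only to show the pattern, and the actual `HoodieTimer` API should be checked in the Hudi source.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Hudi's HoodieTimer: keeps start/stop bookkeeping in one place.
final class SimpleTimer {
    // Captured once at construction; nanoTime is monotonic, unlike wall-clock time.
    private final long startNanos = System.nanoTime();

    private SimpleTimer() {
    }

    // Starts a new timer at the moment of the call.
    static SimpleTimer start() {
        return new SimpleTimer();
    }

    // Elapsed time in milliseconds since start().
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleTimer timer = SimpleTimer.start();
        Thread.sleep(20); // stand-in for the work being measured
        System.out.println("Time to read from source: " + timer.elapsedMillis() + " ms");
    }
}
```

Keeping the measurement behind one type avoids repeating the subtraction at every call site and makes it harder to pair the wrong start variable with the wrong log line.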






(hudi) branch master updated (12c26345f7c -> 1b74fc18fee)

2024-01-02 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 12c26345f7c [HUDI-7261] TVF to query hudi table's filesystem state 
through spark-sql (#10414)
 add 1b74fc18fee [MINOR] Fix ArchivalUtils Logger named (#10436)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/client/utils/ArchivalUtils.java  | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)



Re: [PR] [MINOR] Fix ArchivalUtils Logger name [hudi]

2024-01-02 Thread via GitHub


bvaradar merged PR #10436:
URL: https://github.com/apache/hudi/pull/10436





[PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-01-02 Thread via GitHub


KnightChess opened a new pull request, #10191:
URL: https://github.com/apache/hudi/pull/10191

   ### Change Logs
   
   Spark can use the bucket field to filter a query on a bucket-index table when the query uses a supported expression (=, in, and, or).
   
   ### Impact
   
   Improves table query performance when using Spark.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-6207] spark support bucket index query for table with bucket index [hudi]

2024-01-02 Thread via GitHub


KnightChess closed pull request #10191: [HUDI-6207] spark support bucket index 
query for table with bucket index
URL: https://github.com/apache/hudi/pull/10191





[jira] [Updated] (HUDI-7265) Support schema evolution by Flink SQL using HoodieHiveCatalog

2024-01-02 Thread Jing Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhang updated HUDI-7265:
-
Component/s: flink-sql

> Support schema evolution by Flink SQL using HoodieHiveCatalog
> -
>
> Key: HUDI-7265
> URL: https://issues.apache.org/jira/browse/HUDI-7265
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink-sql
>Reporter: Jing Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Since Flink 1.17, Flink SQL supports more advanced ALTER TABLE syntax.
> {code:sql}
> -- add a new column
> ALTER TABLE MyTable ADD category_id STRING COMMENT 'identifier of the category';
> -- modify a column type, comment and position
> ALTER TABLE MyTable MODIFY measurement double COMMENT 'unit is bytes per second' AFTER `id`;
> -- drop columns
> ALTER TABLE MyTable DROP (col1, col2, col3);
> -- rename column
> ALTER TABLE MyTable RENAME request_body TO payload;
> {code}
> For more details, see [Flink Alter Table SQL|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/alter/].
> We could support schema evolution through Flink SQL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7270) Support schema evolution by Flink SQL using HoodieCatalog

2024-01-02 Thread Jing Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhang updated HUDI-7270:
-
Component/s: flink-sql

> Support schema evolution by Flink SQL using HoodieCatalog
> -
>
> Key: HUDI-7270
> URL: https://issues.apache.org/jira/browse/HUDI-7270
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink-sql
>Reporter: Jing Zhang
>Priority: Major
>
> Since Flink 1.17, Flink SQL supports more advanced ALTER TABLE syntax.
> {code:sql}
> -- add a new column
> ALTER TABLE MyTable ADD category_id STRING COMMENT 'identifier of the category';
> -- modify a column type, comment and position
> ALTER TABLE MyTable MODIFY measurement double COMMENT 'unit is bytes per second' AFTER `id`;
> -- drop columns
> ALTER TABLE MyTable DROP (col1, col2, col3);
> -- rename column
> ALTER TABLE MyTable RENAME request_body TO payload;
> {code}
> For more details, see [Flink Alter Table SQL|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/alter/].
> We could support schema evolution through Flink SQL.





[jira] [Updated] (HUDI-7270) Support schema evolution by Flink SQL using HoodieCatalog

2024-01-02 Thread Jing Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhang updated HUDI-7270:
-
Description: 
Since Flink 1.17, Flink SQL supports more advanced ALTER TABLE syntax.

{code:sql}
-- add a new column
ALTER TABLE MyTable ADD category_id STRING COMMENT 'identifier of the category';
-- modify a column type, comment and position
ALTER TABLE MyTable MODIFY measurement double COMMENT 'unit is bytes per second' AFTER `id`;
-- drop columns
ALTER TABLE MyTable DROP (col1, col2, col3);
-- rename column
ALTER TABLE MyTable RENAME request_body TO payload;
{code}

For more details, see [Flink Alter Table SQL|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/alter/].

We could support schema evolution through Flink SQL.


> Support schema evolution by Flink SQL using HoodieCatalog
> -
>
> Key: HUDI-7270
> URL: https://issues.apache.org/jira/browse/HUDI-7270
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Jing Zhang
>Priority: Major
>
> Since Flink 1.17, Flink SQL supports more advanced ALTER TABLE syntax.
> {code:sql}
> -- add a new column
> ALTER TABLE MyTable ADD category_id STRING COMMENT 'identifier of the category';
> -- modify a column type, comment and position
> ALTER TABLE MyTable MODIFY measurement double COMMENT 'unit is bytes per second' AFTER `id`;
> -- drop columns
> ALTER TABLE MyTable DROP (col1, col2, col3);
> -- rename column
> ALTER TABLE MyTable RENAME request_body TO payload;
> {code}
> For more details, see [Flink Alter Table SQL|https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/alter/].
> We could support schema evolution through Flink SQL.





[jira] [Created] (HUDI-7270) Support schema evolution by Flink SQL using HoodieCatalog

2024-01-02 Thread Jing Zhang (Jira)
Jing Zhang created HUDI-7270:


 Summary: Support schema evolution by Flink SQL using HoodieCatalog
 Key: HUDI-7270
 URL: https://issues.apache.org/jira/browse/HUDI-7270
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Jing Zhang








[PR] Create nested node path if does not exist for zookeeper. [hudi]

2024-01-02 Thread via GitHub


harsh1231 opened a new pull request, #10438:
URL: https://github.com/apache/hudi/pull/10438

   Catch KeeperException if the node already exists.
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
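
The approach named in this PR, creating each level of a nested path and tolerating an "already exists" error, can be sketched without a ZooKeeper dependency. Everything below is an assumption made for illustration: `FakeStore` is an in-memory stand-in for a ZooKeeper-like store, `NodeExistsException` stands in for the corresponding `KeeperException`, and none of these names come from the PR itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; not the code from the PR.
final class NestedPathDemo {
    // Hypothetical stand-in for ZooKeeper's "node exists" error.
    static final class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("Node already exists: " + path);
        }
    }

    // In-memory stand-in for a store that refuses to create a node whose parent is missing.
    static final class FakeStore {
        final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void create(String path) throws NodeExistsException {
            int slash = path.lastIndexOf('/');
            String parent = slash <= 0 ? "" : path.substring(0, slash);
            if (!parent.isEmpty() && !nodes.contains(parent)) {
                throw new IllegalStateException("Missing parent: " + parent);
            }
            if (!nodes.add(path)) {
                throw new NodeExistsException(path);
            }
        }
    }

    // Returns every ancestor of the path, shortest first, ending with the path itself.
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            current.append('/').append(part);
            out.add(current.toString());
        }
        return out;
    }

    // Creates each missing level; "already exists" means the node is there, which is the goal.
    static void ensurePath(FakeStore store, String path) {
        for (String prefix : prefixes(path)) {
            try {
                store.create(prefix);
            } catch (NodeExistsException e) {
                // Safe to ignore: another writer (or an earlier call) created this level.
            }
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        ensurePath(store, "/hudi/locks/table1");
        ensurePath(store, "/hudi/locks/table1"); // second call is a no-op
        System.out.println(store.nodes.size() + " nodes created");
    }
}
```

Catching the "exists" error instead of checking first avoids a check-then-create race between concurrent writers.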
   





[jira] [Closed] (HUDI-7261) Add TVF to query hudi file system view through spark-sql

2024-01-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7261.
-
Resolution: Done

> Add TVF to query hudi file system view through spark-sql
> 
>
> Key: HUDI-7261
> URL: https://issues.apache.org/jira/browse/HUDI-7261
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Having a table-valued function to query a Hudi table's file system view through spark-sql will help in debugging.





[jira] [Updated] (HUDI-7261) Add TVF to query hudi file system view through spark-sql

2024-01-02 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7261:
--
Fix Version/s: 1.0.0

> Add TVF to query hudi file system view through spark-sql
> 
>
> Key: HUDI-7261
> URL: https://issues.apache.org/jira/browse/HUDI-7261
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Having a table-valued function to query a Hudi table's file system view through spark-sql will help in debugging.





Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


codope merged PR #10414:
URL: https://github.com/apache/hudi/pull/10414





(hudi) branch master updated: [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql (#10414)

2024-01-02 Thread codope

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 12c26345f7c [HUDI-7261] TVF to query hudi table's filesystem state 
through spark-sql (#10414)
12c26345f7c is described below

commit 12c26345f7c92b27592fc2f52b433b8b0b37e9a7
Author: bhat-vinay <152183592+bhat-vi...@users.noreply.github.com>
AuthorDate: Wed Jan 3 12:34:47 2024 +0530

[HUDI-7261] TVF to query hudi table's filesystem state through spark-sql 
(#10414)

A new TVF, `hudi_filesystem_view(...)` is added to support querying 
timeline through spark-sql.
The information displayed is similar to the 'fsview' command of hudi-cli.

Co-authored-by: Sagar Sumit 
---
 .../scala/org/apache/hudi/DataSourceOptions.scala  |  15 +++
 .../main/scala/org/apache/hudi/DefaultSource.scala |   5 +
 .../scala/org/apache/hudi/FileSystemRelation.scala | 137 +
 .../sql/hudi/TestHoodieTableValuedFunction.scala   |  63 ++
 .../HoodieFileSystemViewTableValuedFunction.scala  |  64 ++
 .../hudi/analysis/HoodieSpark32PlusAnalysis.scala  |  17 ++-
 .../sql/hudi/analysis/TableValuedFunctions.scala   |   7 +-
 7 files changed, 306 insertions(+), 2 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
index b99db9725e2..828e83fd106 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
@@ -226,6 +226,21 @@ object DataSourceReadOptions {
   .withDocumentation("When this is set, the result set of the table valued function hudi_query_timeline(...)" +
 " will include archived timeline")
 
+  val CREATE_FILESYSTEM_RELATION: ConfigProperty[String] = ConfigProperty
+.key("hoodie.datasource.read.create.filesystem.relation")
+.defaultValue("false")
+.markAdvanced()
+.sinceVersion("1.0.0")
+.withDocumentation("When this is set, the relation created by DefaultSource is for a view representing" +
+  " the result set of the table valued function hudi_filesystem_view(...)")
+
+  val FILESYSTEM_RELATION_ARG_SUBPATH:  ConfigProperty[String] =
+ConfigProperty.key("hoodie.datasource.read.table.valued.function.filesystem.relation.subpath")
+  .defaultValue("")
+  .markAdvanced()
+  .sinceVersion("1.0.0")
+  .withDocumentation("A regex under the table's base path to get file system view information")
+
   /** @deprecated Use {@link QUERY_TYPE} and its methods instead */
   @Deprecated
   val QUERY_TYPE_OPT_KEY = QUERY_TYPE.key()
diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
index 87d527fdfe2..601c5e6f526 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala
@@ -224,9 +224,14 @@ object DefaultSource {
   parameters.get(INCREMENTAL_FORMAT.key).contains(INCREMENTAL_FORMAT_CDC_VAL)
 val isMultipleBaseFileFormatsEnabled = metaClient.getTableConfig.isMultipleBaseFileFormatsEnabled
 
+
 val createTimeLineRln = parameters.get(DataSourceReadOptions.CREATE_TIMELINE_RELATION.key())
+val createFSRln = parameters.get(DataSourceReadOptions.CREATE_FILESYSTEM_RELATION.key())
+
 if (createTimeLineRln.isDefined) {
   new TimelineRelation(sqlContext, parameters, metaClient)
+} else if (createFSRln.isDefined) {
+  new FileSystemRelation(sqlContext, parameters, metaClient)
 } else {
   log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType, queryType is: $queryType")
 
diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/FileSystemRelation.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/FileSystemRelation.scala
new file mode 100644
index 000..90e6919e4fa
--- /dev/null
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/FileSystemRelation.scala
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1874929248

   
   ## CI report:
   
   * 904c660c2821c6cf77dcb1e5a308391b14eecb53 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21799)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch master updated (6a042255555 -> d15993b36be)

2024-01-02 Thread codope

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 6a04225 [HUDI-7198] Create nested node path if does not exist for 
zookeeper. (#10281)
 add d15993b36be Revert "[HUDI-7198] Create nested node path if does not 
exist for zookeeper. (#10281)" (#10437)

No new revisions were added by this update.

Summary of changes:
 .../lock/ZookeeperBasedLockProvider.java   | 34 ++
 1 file changed, 2 insertions(+), 32 deletions(-)



Re: [PR] Revert "[HUDI-7198]Create nested node path if does not exist for zookeeper." [hudi]

2024-01-02 Thread via GitHub


codope merged PR #10437:
URL: https://github.com/apache/hudi/pull/10437





(hudi) branch master updated: [HUDI-7198] Create nested node path if does not exist for zookeeper. (#10281)

2024-01-02 Thread codope

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6a04225 [HUDI-7198] Create nested node path if does not exist for 
zookeeper. (#10281)
6a04225 is described below

commit 6a04225d2101812618d216f581b69a5b8b69
Author: harshal 
AuthorDate: Wed Jan 3 12:20:30 2024 +0530

[HUDI-7198] Create nested node path if does not exist for zookeeper. 
(#10281)
---
 .../lock/ZookeeperBasedLockProvider.java   | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/ZookeeperBasedLockProvider.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/ZookeeperBasedLockProvider.java
index 31b92dcf914..0f31b6389cc 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/ZookeeperBasedLockProvider.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/ZookeeperBasedLockProvider.java
@@ -74,8 +74,39 @@ public class ZookeeperBasedLockProvider implements LockProvider

Re: [PR] [HUDI-7198]Create nested node path if does not exist for zookeeper. [hudi]

2024-01-02 Thread via GitHub


codope merged PR #10281:
URL: https://github.com/apache/hudi/pull/10281





Re: [I] [SUPPORT] Failed to create marker file Exception when trying to write data on Hudi [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10432:
URL: https://github.com/apache/hudi/issues/10432#issuecomment-1874904475

   @gsudhanshu After setting these, it should not use the timeline server. Do you still see references to TimelineServerBasedWriteMarkers in the stack trace?
   
   Can you paste the new stack trace, please?





Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874882886

   
   ## CI report:
   
   * c64e1e3a9816b278606ee32aede728ffb928708c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21800)
 
   
   



Re: [I] [SUPPORT] Clean action failure triggers an exception while trying to check whether metadata is a table [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10127:
URL: https://github.com/apache/hudi/issues/10127#issuecomment-1874852826

   @shubhamn21 Thanks for the update. Yes, having multiple writers without a lock provider can cause inconsistent behaviour and create this kind of issue.





Re: [I] Getting error while connecting to Hudi(CLI) 0.14.0 tables. [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10249:
URL: https://github.com/apache/hudi/issues/10249#issuecomment-1874851371

   @jjjigar Sorry for the delay here. I will work on this this week.





Re: [I] Getting error while connecting to Hudi(CLI) 0.14.0 tables. [hudi]

2024-01-02 Thread via GitHub


jjjigar commented on issue #10249:
URL: https://github.com/apache/hudi/issues/10249#issuecomment-1874828878

   Any update please?





Re: [PR] [MINOR] Avoid resource leaks [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10345:
URL: https://github.com/apache/hudi/pull/10345#issuecomment-1874828489

   
   ## CI report:
   
   * c637ed283ad26d4b97d46c7ddedb3858a5744831 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21733)
 
   * 76fac0f35c10d1b563229e8807491445f58fa675 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21802)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874828373

   
   ## CI report:
   
   * 6908f7cfde32ce14fbc3b73dee9ceace749a8abe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21797)
 
   * 536833e03706c665b00e88986596bf9f44aa2c47 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21801)
 
   
   



Re: [PR] [MINOR] Avoid resource leaks [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10345:
URL: https://github.com/apache/hudi/pull/10345#issuecomment-1874825437

   
   ## CI report:
   
   * c637ed283ad26d4b97d46c7ddedb3858a5744831 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21733)
 
   * 76fac0f35c10d1b563229e8807491445f58fa675 UNKNOWN
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874825301

   
   ## CI report:
   
   * 6908f7cfde32ce14fbc3b73dee9ceace749a8abe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21797)
 
   * 536833e03706c665b00e88986596bf9f44aa2c47 UNKNOWN
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874800023

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   * c64e1e3a9816b278606ee32aede728ffb928708c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21800)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1874799885

   
   ## CI report:
   
   * cd562d6b1f2ded014670a9a765248013f21d49c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21789)
 
   * 904c660c2821c6cf77dcb1e5a308391b14eecb53 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21799)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874796616

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   * c64e1e3a9816b278606ee32aede728ffb928708c UNKNOWN
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


bhat-vinay commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1874797572

   Thanks for the review @bvaradar. @codope pointed out that the failing tests 
could be fixed by https://github.com/apache/hudi/pull/10381. I rebased past it to 
see if I can get a clean run.





Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1874796476

   
   ## CI report:
   
   * cd562d6b1f2ded014670a9a765248013f21d49c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21789)
 
   * 904c660c2821c6cf77dcb1e5a308391b14eecb53 UNKNOWN
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874792952

   
   ## CI report:
   
   * 6908f7cfde32ce14fbc3b73dee9ceace749a8abe Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21797)
 
   
   



(hudi) branch master updated: [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark (#10381)

2024-01-02 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 98616b1196b [HUDI-7244] Ensure HoodieFileGroupReader.close() is called 
in spark (#10381)
98616b1196b is described below

commit 98616b1196bdcdd8567b4b13dc38d0e305340aba
Author: Jon Vexler 
AuthorDate: Tue Jan 2 21:30:25 2024 -0500

[HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark (#10381)


-

Co-authored-by: Jonathan Vexler <=>
---
 .../hudi/util/CloseableInternalRowIterator.scala   |  7 +-
 .../common/table/read/HoodieFileGroupReader.java   |  4 +++-
 ...odieFileGroupReaderBasedParquetFileFormat.scala | 26 --
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/CloseableInternalRowIterator.scala b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/CloseableInternalRowIterator.scala
index 30a5e93fb63..bf71a9c6a41 100644
--- a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/CloseableInternalRowIterator.scala
+++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/CloseableInternalRowIterator.scala
@@ -23,6 +23,8 @@ import org.apache.hudi.common.util.collection.ClosableIterator
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.vectorized.ColumnarBatch
 
+import java.io.Closeable
+
 /**
  * A [[ClosableIterator]] returning [[InternalRow]] by iterating through the entries returned
  * by a Spark reader.
@@ -37,7 +39,10 @@ class CloseableInternalRowIterator(iterator: Iterator[_]) extends ClosableIterat
   private var seqInBatch: Int = -1
 
   override def close(): Unit = {
-    // No op
+    iterator match {
+      case iterator: Iterator[_] with Closeable => iterator.close()
+      case _ =>
+    }
   }
 
   override def hasNext: Boolean = {
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java b/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java
index 52ee14d969e..8edf5d7130e 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java
@@ -306,7 +306,7 @@ public final class HoodieFileGroupReader implements Closeable {
   }
 
   public static class HoodieFileGroupReaderIterator implements ClosableIterator {
-    private final HoodieFileGroupReader reader;
+    private HoodieFileGroupReader reader;
 
     public HoodieFileGroupReaderIterator(HoodieFileGroupReader reader) {
       this.reader = reader;
@@ -332,6 +332,8 @@ public final class HoodieFileGroupReader implements Closeable {
         reader.close();
       } catch (IOException e) {
         throw new HoodieIOException("Failed to close the reader", e);
+      } finally {
+        this.reader = null;
       }
     }
   }
diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala
index 82a38a58841..f2b66e25603 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala
@@ -28,18 +28,19 @@ import org.apache.hudi.common.fs.FSUtils
 import org.apache.hudi.common.model.{FileSlice, HoodieLogFile, HoodieRecord}
 import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient}
 import org.apache.hudi.common.table.read.HoodieFileGroupReader
-import org.apache.hudi.{AvroConversionUtils, HoodieFileIndex, HoodiePartitionCDCFileGroupMapping, HoodiePartitionFileSliceMapping, HoodieSparkUtils, HoodieTableSchema, HoodieTableState, MergeOnReadSnapshotRelation, SparkAdapterSupport, SparkFileFormatInternalRowReaderContext}
+import org.apache.hudi.{AvroConversionUtils, HoodieFileIndex, HoodiePartitionCDCFileGroupMapping, HoodiePartitionFileSliceMapping, HoodieSparkUtils, HoodieTableSchema, HoodieTableState, SparkAdapterSupport, SparkFileFormatInternalRowReaderContext}
 import org.apache.spark.sql.HoodieCatalystExpressionUtils.generateUnsafeProjection
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.JoinedRow
 import org.apache.spark.sql.execution.datasources.PartitionedFile
-import 
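
The two visible hunks of this commit follow one pattern: a wrapper iterator's close() forwards to the underlying iterator when that iterator is Closeable, and the reference is dropped in a finally block. Below is a minimal standalone Java sketch of that pattern, not the Hudi code itself. The names `ClosingIterator` and `TrackedIterator` are hypothetical and exist only for illustration; the null check that makes close() safe to call twice is an addition of this sketch and is not part of the diff above.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CloseIfCloseableSketch {

  /** Hypothetical wrapper: closes the delegate only if it is Closeable. */
  static final class ClosingIterator<T> implements Iterator<T>, Closeable {
    private Iterator<T> delegate;

    ClosingIterator(Iterator<T> delegate) {
      this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
      return delegate != null && delegate.hasNext();
    }

    @Override
    public T next() {
      if (delegate == null) {
        throw new NoSuchElementException("iterator already closed");
      }
      return delegate.next();
    }

    @Override
    public void close() {
      if (delegate == null) {
        return; // already closed (assumption of this sketch, not in the commit)
      }
      try {
        // Java counterpart of the Scala match on `Iterator[_] with Closeable`.
        if (delegate instanceof Closeable) {
          ((Closeable) delegate).close();
        }
      } catch (IOException e) {
        throw new UncheckedIOException("Failed to close the underlying iterator", e);
      } finally {
        // Drop the reference even if close() failed, as the commit does for the reader.
        delegate = null;
      }
    }
  }

  /** Hypothetical resource-backed iterator that records whether close() ran. */
  static final class TrackedIterator implements Iterator<Integer>, Closeable {
    private final Iterator<Integer> values = Arrays.asList(1, 2, 3).iterator();
    boolean closed = false;

    @Override
    public boolean hasNext() {
      return values.hasNext();
    }

    @Override
    public Integer next() {
      return values.next();
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    TrackedIterator source = new TrackedIterator();
    // try-with-resources guarantees close() runs even if iteration throws.
    try (ClosingIterator<Integer> it = new ClosingIterator<>(source)) {
      while (it.hasNext()) {
        System.out.println(it.next());
      }
    }
    System.out.println("underlying iterator closed: " + source.closed);
  }
}
```

Wrapping the iteration in try-with-resources is one way a caller could make sure close() is reached; how the Spark file format actually triggers it is in the part of the diff that is cut off above.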

Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


xushiyan merged PR #10381:
URL: https://github.com/apache/hudi/pull/10381





Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1874763454

   
   ## CI report:
   
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21796)
 
   
   



Re: [I] hoodie.bulkinsert.shuffle.parallelism Not activated [hudi]

2024-01-02 Thread via GitHub


zhangjw123321 commented on issue #10418:
URL: https://github.com/apache/hudi/issues/10418#issuecomment-1874749412

   
   
   Downloaded from this link: https://dlcdn.apache.org/hudi/0.14.0/hudi-0.14.0.src.tgz, and built hudi-spark3.2-bundle_2.12-0.14.0.jar with Maven.





Re: [I] hoodie.bulkinsert.shuffle.parallelism Not activated [hudi]

2024-01-02 Thread via GitHub


zhangjw123321 commented on issue #10418:
URL: https://github.com/apache/hudi/issues/10418#issuecomment-1874749317

   Which stage deletes the duplicate records? Other than the configuration 
above, no other configuration was set manually.





Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874710579

   
   ## CI report:
   
   * 632755327f46883194b8da0f42c9b06d88a9cce4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21795)
 
   * 6908f7cfde32ce14fbc3b73dee9ceace749a8abe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21797)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874705088

   
   ## CI report:
   
   * 632755327f46883194b8da0f42c9b06d88a9cce4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21795)
 
   * 6908f7cfde32ce14fbc3b73dee9ceace749a8abe UNKNOWN
   
   



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1874674372

   
   ## CI report:
   
   * 072ac266eb2cf4b81d62cafda0c2579b88b43d5b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21778)
 
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21796)
 
   
   



Re: [PR] [MINOR] Fix usages of orElse [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10435:
URL: https://github.com/apache/hudi/pull/10435#issuecomment-1874669320

   
   ## CI report:
   
   * 072ac266eb2cf4b81d62cafda0c2579b88b43d5b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21778)
 
   * 402d1eb5d0ea586d3a4afbf736dbb843809f7bb4 UNKNOWN
   
   



Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


jonvex commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874662101

   Azure CI is passing: 
https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21791





Re: [I] [SUPPORT] Clean action failure triggers an exception while trying to check whether metadata is a table [hudi]

2024-01-02 Thread via GitHub


shubhamn21 commented on issue #10127:
URL: https://github.com/apache/hudi/issues/10127#issuecomment-1874657544

   Hi @ad1happy2go,
   Thanks for responding. Yes, at one point I did have multiple executors 
writing to the table (Kafka streaming job). 
   But I recently limited my deployments to 1 executor since I realized I need 
to set up a meta-storage (Hive) as a lock provider.
   
   Can multiple executors also cause clean-action failures if we do not have 
Hive/DynamoDB set up?





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874624757

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ffcf47d27b84e8301b7a9b986d8df69257e220a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21794)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874572982

   
   ## CI report:
   
   * e24ea448ae3743cc48798b4640a93a30a2e6270e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21273)
 
   * 632755327f46883194b8da0f42c9b06d88a9cce4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21795)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874561587

   
   ## CI report:
   
   * e24ea448ae3743cc48798b4640a93a30a2e6270e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21273)
 
   * 632755327f46883194b8da0f42c9b06d88a9cce4 UNKNOWN
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874542310

   
   ## CI report:
   
   * e24ea448ae3743cc48798b4640a93a30a2e6270e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21273)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-02 Thread via GitHub


CTTY commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1874534132

   @hudi-bot run azure





Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874480730

   
   ## CI report:
   
   * 33a87e77b985a8fd3fe0a6a997059ee20fbedb8b UNKNOWN
   * 9819ca4db7b4ab9f2476aecc753e3fcc09c7cb7a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21791)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874430037

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * de4e4ccc30e75153d16bd322447311f3233d3579 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21793)
 
   * ffcf47d27b84e8301b7a9b986d8df69257e220a3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21794)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874421122

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * c96dd5a7fe1951aabe9df3ca283e7858153b21c2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21792)
 
   * de4e4ccc30e75153d16bd322447311f3233d3579 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21793)
 
   * ffcf47d27b84e8301b7a9b986d8df69257e220a3 UNKNOWN
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874369688

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * c96dd5a7fe1951aabe9df3ca283e7858153b21c2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21792)
 
   * de4e4ccc30e75153d16bd322447311f3233d3579 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21793)
 
   
   



Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874359251

   
   ## CI report:
   
   * 33a87e77b985a8fd3fe0a6a997059ee20fbedb8b UNKNOWN
   * 8fd105afa86dc4d815dd94d7a55bca5bb85031d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21788)
 
   * 9819ca4db7b4ab9f2476aecc753e3fcc09c7cb7a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21791)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874359461

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ebdeeb3f45cad66a7eaa120e5f6ecc5cc6e3ddd7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21790)
 
   * c96dd5a7fe1951aabe9df3ca283e7858153b21c2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21792)
 
   * de4e4ccc30e75153d16bd322447311f3233d3579 UNKNOWN
   
   
   Bot commands



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1874359011

   
   ## CI report:
   
   * cd562d6b1f2ded014670a9a765248013f21d49c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21789)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874311524

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ebdeeb3f45cad66a7eaa120e5f6ecc5cc6e3ddd7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21790)
 
   * c96dd5a7fe1951aabe9df3ca283e7858153b21c2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21792)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874302126

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ebdeeb3f45cad66a7eaa120e5f6ecc5cc6e3ddd7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21790)
 
   * c96dd5a7fe1951aabe9df3ca283e7858153b21c2 UNKNOWN
   
   



Re: [PR] [HUDI-4552][RFC-58] Integrate column stats index with all query engines [hudi]

2024-01-02 Thread via GitHub


pratyakshsharma commented on code in PR #6345:
URL: https://github.com/apache/hudi/pull/6345#discussion_r1439625464


##
rfc/rfc-58/rfc-58.md:
##
@@ -0,0 +1,69 @@
+
+# RFC-58: Integrate column stats index with all query engines
+
+
+
+## Proposers
+
+- @pratyakshsharma
+
+## Approvers

Review Comment:
   @prasannarajaperumal I have elaborated the ColumnHandle/ColumnDomain 
approach in a bit more detail. Sorry for the long hold on this one. Please have 
a look.






Re: [PR] [HUDI-4552][RFC-58] Integrate column stats index with all query engines [hudi]

2024-01-02 Thread via GitHub


pratyakshsharma commented on code in PR #6345:
URL: https://github.com/apache/hudi/pull/6345#discussion_r1439623738


##
rfc/rfc-58/rfc-58.md:
##
@@ -0,0 +1,69 @@
+
+# RFC-58: Integrate column stats index with all query engines
+
+
+
+## Proposers
+
+- @pratyakshsharma
+
+## Approvers
+- @bhavanisudha
+- @danny0405
+- @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-4552
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+Query engines like hive or presto typically scan a large amount of data for query planning and execution. Proper indexing can help reduce this scan to a great extent. Parquet files are the most commonly used file format for storing columnar data with various lakehouse techniques mainly because of their strong support with spark and
+the kind of indexing that they employ at different levels. Parquet files maintain indexes at file level, row group level and page level. Till some time back, Hudi used to make use of these indexes for fast querying via the parquet reader libraries. The problem with this approach was every file object had to be opened once to read the index stored in parquet footer to be able to do file pruning. This could potentially become a bottleneck in case of a large number of
+files. With the introduction of [multi-modal index](https://www.onehouse.ai/blog/introducing-multi-modal-index-for-the-lakehouse-in-apache-hudi) in Hudi, this problem has been solved to a great extent. Currently the data skipping support using this multi-modal index is available for spark and [flink](https://issues.apache.org/jira/browse/HUDI-4353) engines. We intend to extend this support for other query engines like presto, trino and hive in this RFC.
+
+## Background
+[RFC-27](https://github.com/apache/hudi/blob/master/rfc/rfc-27/rfc-27.md) added a new partition corresponding to column_stats index in metadata table of Hudi. We plan to use the information stored in this partition for pruning the files.
+
+## Implementation
+Describe the new thing you want to do in appropriate detail, how it fits into the project architecture.
+Provide a detailed description of how you intend to implement this feature. This may be fairly extensive and have large subsections of its own.
+Or it may be a few sentences. Use judgement based on the scope of the change.
+
+We propose two different approaches for integrating column stats index with different query engines and discuss the pros and cons for the same below.
+1. **Using domains** - Presto and Trino have the concept of column domains. Domain is actually the set of possible values that need to be returned for a particular column. Domains get created at the time of creating splits for processing. Domains basically contain a map of column to possible values where the possible values are populated after doing the necessary pre work of combining all the different filter predicates supplied as part of the query. [This draft PR](https://github.com/apache/hudi/pull/6087) shows the use of these domains for integrating data skipping index with presto engine.
+This basically involves exposing a new api in HoodieTableMetadata.java as below -
+
+```java
+FileStatus[] getFilesToQueryUsingCSI(List columns, ColumnDomain columnDomain) throws IOException;

Review Comment:
   @alexeykudinkin I have added the details.
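
For readers following the thread, the file pruning that the quoted RFC text describes can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Hudi's or Presto's implementation: `ColumnRange`, `ColumnStats`, and `files_to_query` are hypothetical names, a domain is simplified to a single closed integer interval, and the per-file min/max values in the usage note are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical, simplified domain: closed interval [low, high] for one column."""
    low: int
    high: int


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-file min/max for one column, as a stats index might record it."""
    file_name: str
    column: str
    min_value: int
    max_value: int


def files_to_query(stats: List[ColumnStats], domains: Dict[str, ColumnRange]) -> List[str]:
    """Keep files whose [min, max] overlaps the domain of every filtered column."""
    candidates = {s.file_name for s in stats}
    for s in stats:
        domain = domains.get(s.column)
        if domain is None:
            continue  # no predicate on this column, so it cannot prune the file
        if s.max_value < domain.low or s.min_value > domain.high:
            candidates.discard(s.file_name)  # ranges do not overlap: skip the file
    return sorted(candidates)
```

For example, with stats `price` in [0, 10] for `f1.parquet`, [11, 20] for `f2.parquet`, and [5, 15] for `f3.parquet`, a domain of [12, 30] on `price` leaves only `f2.parquet` and `f3.parquet` to be opened.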






Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874240503

   
   ## CI report:
   
   * 33a87e77b985a8fd3fe0a6a997059ee20fbedb8b UNKNOWN
   * fa4b20f1f5cabcd03bc488badaa4e97d26da49c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21656)
 
   * 8fd105afa86dc4d815dd94d7a55bca5bb85031d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21788)
 
   * 9819ca4db7b4ab9f2476aecc753e3fcc09c7cb7a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21791)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874228173

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ebdeeb3f45cad66a7eaa120e5f6ecc5cc6e3ddd7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21790)
 
   
   



Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874227930

   
   ## CI report:
   
   * 33a87e77b985a8fd3fe0a6a997059ee20fbedb8b UNKNOWN
   * fa4b20f1f5cabcd03bc488badaa4e97d26da49c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21656)
 
   * 8fd105afa86dc4d815dd94d7a55bca5bb85031d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21788)
 
   * 9819ca4db7b4ab9f2476aecc753e3fcc09c7cb7a UNKNOWN
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874206870

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * 987684e6b4e397d92c14d523d933f735a8a75984 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21787)
 
   * ebdeeb3f45cad66a7eaa120e5f6ecc5cc6e3ddd7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21790)
 
   
   



Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-02 Thread via GitHub


ergophobiac commented on issue #10415:
URL: https://github.com/apache/hudi/issues/10415#issuecomment-1874167209

   Hey @ad1happy2go, we have a test case running. We'll observe it until we're sure 
it's stable and let you know how it turns out.





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874152589

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ea196eb78876cb4761ccf181131a179ed1c25fa5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21773)
 
   * 987684e6b4e397d92c14d523d933f735a8a75984 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21787)
 
   * ebdeeb3f45cad66a7eaa120e5f6ecc5cc6e3ddd7 UNKNOWN
   
   



Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874152332

   
   ## CI report:
   
   * 33a87e77b985a8fd3fe0a6a997059ee20fbedb8b UNKNOWN
   * fa4b20f1f5cabcd03bc488badaa4e97d26da49c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21656)
 
   * 8fd105afa86dc4d815dd94d7a55bca5bb85031d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21788)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1874152069

   
   ## CI report:
   
   * f40f52205a2da4f28b1c3c7300f2551a6699657d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21785)
 
   * cd562d6b1f2ded014670a9a765248013f21d49c1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21789)
 
   
   



Re: [PR] [HUDI-7244] Ensure HoodieFileGroupReader.close() is called in spark [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10381:
URL: https://github.com/apache/hudi/pull/10381#issuecomment-1874140814

   
   ## CI report:
   
   * 33a87e77b985a8fd3fe0a6a997059ee20fbedb8b UNKNOWN
   * fa4b20f1f5cabcd03bc488badaa4e97d26da49c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21656)
 
   * 8fd105afa86dc4d815dd94d7a55bca5bb85031d2 UNKNOWN
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1874140534

   
   ## CI report:
   
   * f40f52205a2da4f28b1c3c7300f2551a6699657d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21785)
 
   * cd562d6b1f2ded014670a9a765248013f21d49c1 UNKNOWN
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1874129904

   
   ## CI report:
   
   * b4a68ad41cfe6d582dea52aea53d9f4b96341f26 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21786)
 
   
   



Re: [I] [SUPPORT] EMR on EKS version 6.15.0, Spark 3.4.1 and Hudi 0.14.0 getting java.io.IOException: Failed to delete: /usr/lib/hudi/. [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10376:
URL: https://github.com/apache/hudi/issues/10376#issuecomment-1874094831

   @Lakshmi-Holla12 Were you able to resolve this issue?





Re: [I] [SUPPORT] Hudi 0.13.1 on EMR, MOR table writer hangs intermittently with S3 read timeout error for column stats index [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10415:
URL: https://github.com/apache/hudi/issues/10415#issuecomment-1874092759

   @ergophobiac Did you get a chance to try this out?





Re: [I] hoodie.bulkinsert.shuffle.parallelism Not activated [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10418:
URL: https://github.com/apache/hudi/issues/10418#issuecomment-1874091034

   @zhangjw123321 It's going into deduping records. For bulk insert, it doesn't 
dedup with the default configs. Are you setting any other configs?
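
A quick way to check the question above is to scan the writer options for settings that turn deduplication on. The snippet is a minimal sketch, not part of Hudi: `DEDUP_OPTIONS` and `dedup_options_enabled` are hypothetical helpers, the parallelism value is made up, and the two option keys listed as dedup triggers are assumptions to verify against the configuration reference for your Hudi version.

```python
# Option keys assumed to enable deduplication on insert/bulk insert.
# Verify them against the Hudi configuration reference for your version.
DEDUP_OPTIONS = (
    "hoodie.combine.before.insert",
    "hoodie.datasource.write.insert.drop.duplicates",
)


def dedup_options_enabled(hudi_options):
    """Return the dedup-related keys that are explicitly set to true."""
    return [
        key
        for key in DEDUP_OPTIONS
        if str(hudi_options.get(key, "false")).lower() == "true"
    ]


# Illustrative writer options; the values here are invented for the example.
hudi_options = {
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.bulkinsert.shuffle.parallelism": "200",
    "hoodie.combine.before.insert": "true",
}

enabled = dedup_options_enabled(hudi_options)
```

If `enabled` is non-empty, the extra dedup stage is probably coming from one of those settings rather than from the bulk insert defaults.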





Re: [I] [SUPPORT] Kafka connect sink to S3 authentification parameters [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10428:
URL: https://github.com/apache/hudi/issues/10428#issuecomment-1874091735

   @akolyaga Did this suggestion work? Do let us know. Thanks.





Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874077677

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ea196eb78876cb4761ccf181131a179ed1c25fa5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21773)
 
   * 987684e6b4e397d92c14d523d933f735a8a75984 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21787)
 
   
   



Re: [PR] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10422:
URL: https://github.com/apache/hudi/pull/10422#issuecomment-1874068336

   
   ## CI report:
   
   * 99517e23baa60a6a0602e9daf7f522f3c1dcfa1e UNKNOWN
   * ea196eb78876cb4761ccf181131a179ed1c25fa5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21773)
 
   * 987684e6b4e397d92c14d523d933f735a8a75984 UNKNOWN
   
   



Re: [PR] [MINOR] Fix ArchivalUtils Logger name [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10436:
URL: https://github.com/apache/hudi/pull/10436#issuecomment-1873996433

   
   ## CI report:
   
   * 456b3982f0fd8622cf3099f4b982a91f9b739978 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21784)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1873996038

   
   ## CI report:
   
   * f40f52205a2da4f28b1c3c7300f2551a6699657d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21785)
 
   
   



Re: [I] [SUPPORT] Failed to create marker file Exception when trying to write data on Hudi [hudi]

2024-01-02 Thread via GitHub


gsudhanshu commented on issue #10432:
URL: https://github.com/apache/hudi/issues/10432#issuecomment-1873985101

   @ad1happy2go thanks for your reply.
   
   I have added them as follows:
   
![image](https://github.com/apache/hudi/assets/45429552/a3516567-ba3c-47ec-aaff-56489d025cf8)
   but I am still getting the same error.
   
   Should I add these to hudi_options?





Re: [I] [SUPPORT] Failed to create marker file Exception when trying to write data on Hudi [hudi]

2024-01-02 Thread via GitHub


ad1happy2go commented on issue #10432:
URL: https://github.com/apache/hudi/issues/10432#issuecomment-1873953371

   @gsudhanshu Can you try disabling the timeline server? 
   
   hoodie.write.markers.type= 'direct',
   hoodie.embed.timeline.server= 'false'
   
   We had a similar issue (https://github.com/apache/hudi/issues/4230) before, 
which we fixed. I see you are using 0.14.0, so adding @yihua in case he has 
more insights.
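
One way to pass the two settings above from PySpark is to merge them into the options dict handed to the writer. This is a minimal sketch under stated assumptions: `with_direct_markers` is a hypothetical helper, the table name and write operation are placeholders, and the commented-out write call assumes a DataFrame `df` and a `base_path` that are not defined here.

```python
def with_direct_markers(hudi_options):
    """Return a copy of the options with direct markers and no embedded timeline server."""
    merged = dict(hudi_options)  # copy so the caller's dict is left untouched
    merged.update({
        "hoodie.write.markers.type": "direct",
        "hoodie.embed.timeline.server": "false",
    })
    return merged


# Placeholder options for illustration only.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.operation": "upsert",
}

write_options = with_direct_markers(hudi_options)

# Assumed usage, given an existing DataFrame `df` and table path `base_path`:
# df.write.format("hudi").options(**write_options).mode("append").save(base_path)
```

Keeping the override in a helper makes it easy to remove again once the underlying marker issue is resolved.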
   
   





Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873940685

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21781)
 
   
   



Re: [PR] [HUDI-7261] TVF to query hudi table's filesystem state through spark-sql [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10414:
URL: https://github.com/apache/hudi/pull/10414#issuecomment-1873933395

   
   ## CI report:
   
   * 502d354dd4ddb15b8fe6e9c9a42973d8299fdb6d UNKNOWN
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1873894740

   
   ## CI report:
   
   * 037c96b19f25dd81a2c924e7770a535a6a1843c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21783)
 
   * b4a68ad41cfe6d582dea52aea53d9f4b96341f26 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21786)
 
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-187388

   
   ## CI report:
   
   * 037c96b19f25dd81a2c924e7770a535a6a1843c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21783)
 
   * b4a68ad41cfe6d582dea52aea53d9f4b96341f26 UNKNOWN
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1873886233

   
   ## CI report:
   
   * a251c7e686efd96ac6d2a07f0b95e6383add2ad9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21590)
 
   * f40f52205a2da4f28b1c3c7300f2551a6699657d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21785)
 
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1873877758

   
   ## CI report:
   
   * 037c96b19f25dd81a2c924e7770a535a6a1843c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21783)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1873877321

   
   ## CI report:
   
   * a251c7e686efd96ac6d2a07f0b95e6383add2ad9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21590)
 
   * f40f52205a2da4f28b1c3c7300f2551a6699657d UNKNOWN
   
   



Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10297:
URL: https://github.com/apache/hudi/pull/10297#issuecomment-1873877127

   
   ## CI report:
   
   * 05bed31829f2362de479344215d29ccca99bd449 UNKNOWN
   * 855ede2626b0c95d0649a939aad51de277562fa5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21782)
 
   
   



Re: [I] hoodie.bulkinsert.shuffle.parallelism Not activated [hudi]

2024-01-02 Thread via GitHub


zhangjw123321 commented on issue #10418:
URL: https://github.com/apache/hudi/issues/10418#issuecomment-1873836778

   
![image](https://github.com/apache/hudi/assets/154970920/084e9134-9356-4c15-b489-4a420cfcbec2)
   





Re: [PR] [MINOR] Fix ArchivalUtils Logger name [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10436:
URL: https://github.com/apache/hudi/pull/10436#issuecomment-1873829816

   
   ## CI report:
   
   * 456b3982f0fd8622cf3099f4b982a91f9b739978 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21784)
 
   
   



Re: [PR] [MINOR] Fix ArchivalUtils Logger name [hudi]

2024-01-02 Thread via GitHub


hudi-bot commented on PR #10436:
URL: https://github.com/apache/hudi/pull/10436#issuecomment-1873821139

   
   ## CI report:
   
   * 456b3982f0fd8622cf3099f4b982a91f9b739978 UNKNOWN
   
   
   





Re: [I] hoodie.bulkinsert.shuffle.parallelism Not activated [hudi]

2024-01-02 Thread via GitHub


zhangjw123321 commented on issue #10418:
URL: https://github.com/apache/hudi/issues/10418#issuecomment-1873811228

   
![image](https://github.com/apache/hudi/assets/154970920/fa708f0b-f7ad-45a7-8cb3-ddf538504668)
   
![image](https://github.com/apache/hudi/assets/154970920/6c9264cb-4821-4af5-b95b-619ce07c826c)
   





[PR] [MINOR] Fix ArchivalUtils Logger name [hudi]

2024-01-02 Thread via GitHub


eric9204 opened a new pull request, #10436:
URL: https://github.com/apache/hudi/pull/10436

   ### Change Logs
   
   None
   
   ### Impact
   
   None
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7266] add clustering metric for flink [hudi]

2024-01-02 Thread via GitHub


stream2000 commented on code in PR #10420:
URL: https://github.com/apache/hudi/pull/10420#discussion_r1439252905


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringPlanOperator.java:
##
@@ -88,10 +96,20 @@ public void notifyCheckpointComplete(long checkpointId) {
   }
 
   private void scheduleClustering(HoodieFlinkTable table, long checkpointId) {
+    List pendingClusteringInstantTimes =
+        ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient());
     // the first instant takes the highest priority.
     Option firstRequested = Option.fromJavaOptional(
-        ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient()).stream()
+        pendingClusteringInstantTimes.stream()
             .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED).findFirst());
+
+    long pendingClusteringCount = pendingClusteringInstantTimes.stream()

Review Comment:
   Why do we only include requested instants? Will inflight and requested instants be double-counted? If that's the case, we should also correct the method `org.apache.hudi.metrics.FlinkCompactionMetrics#setPendingCompactionCount`.
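
   The double-counting concern can be sketched in isolation. This is a minimal illustration, not the PR's code: `PendingInstant`, `State`, and the three counting helpers are hypothetical stand-ins rather than Hudi's `HoodieInstant` API, and the sample list simply assumes that one instant time may be listed under both the requested and the inflight state. Whether the real pending list can look like that is the open question in this comment.

```java
import java.util.Arrays;
import java.util.List;

public class PendingInstantCount {
  // Hypothetical stand-ins for illustration; these are not Hudi's HoodieInstant types.
  enum State { REQUESTED, INFLIGHT }

  static final class PendingInstant {
    final String timestamp;
    final State state;

    PendingInstant(String timestamp, State state) {
      this.timestamp = timestamp;
      this.state = state;
    }
  }

  // Counts every entry, so a timestamp listed under two states is counted twice.
  static long countEntries(List<PendingInstant> instants) {
    return instants.size();
  }

  // Counts each timestamp once, regardless of how many states it appears under.
  static long countDistinctTimestamps(List<PendingInstant> instants) {
    return instants.stream().map(i -> i.timestamp).distinct().count();
  }

  // Counts only the entries in one given state.
  static long countInState(List<PendingInstant> instants, State state) {
    return instants.stream().filter(i -> i.state == state).count();
  }

  public static void main(String[] args) {
    // Assumed input: instant "002" shows up as both REQUESTED and INFLIGHT.
    List<PendingInstant> pending = Arrays.asList(
        new PendingInstant("001", State.REQUESTED),
        new PendingInstant("002", State.REQUESTED),
        new PendingInstant("002", State.INFLIGHT));

    System.out.println("entries: " + countEntries(pending));
    System.out.println("distinct: " + countDistinctTimestamps(pending));
    System.out.println("requested only: " + countInState(pending, State.REQUESTED));
  }
}
```

   Under that assumption, filtering on a single state and de-duplicating by timestamp give the same count, while counting raw entries overstates it.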






Re: [PR] [HUDI-7266] add clustering metric for flink [hudi]

2024-01-02 Thread via GitHub


stream2000 commented on code in PR #10420:
URL: https://github.com/apache/hudi/pull/10420#discussion_r1439252905


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringPlanOperator.java:
##
@@ -88,10 +96,20 @@ public void notifyCheckpointComplete(long checkpointId) {
   }
 
   private void scheduleClustering(HoodieFlinkTable table, long checkpointId) {
+    List pendingClusteringInstantTimes =
+        ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient());
     // the first instant takes the highest priority.
     Option firstRequested = Option.fromJavaOptional(
-        ClusteringUtils.getPendingClusteringInstantTimes(table.getMetaClient()).stream()
+        pendingClusteringInstantTimes.stream()
             .filter(instant -> instant.getState() == HoodieInstant.State.REQUESTED).findFirst());
+
+    long pendingClusteringCount = pendingClusteringInstantTimes.stream()

Review Comment:
   Why do we only include inflight instants? Will inflight and requested instants be double-counted? If that's the case, we should also correct the method `org.apache.hudi.metrics.FlinkCompactionMetrics#setPendingCompactionCount`.


