Re: [I] [SUPPORT] Dataloss in FlinkCDC into Hudi without any exception or other infomation [hudi]

2024-03-03 Thread via GitHub


xuzifu666 commented on issue #10542:
URL: https://github.com/apache/hudi/issues/10542#issuecomment-1975886788

   This has been resolved in Hudi 1.0 beta, so I am closing the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975878920

   
   ## CI report:
   
   * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN
   * fd3334591f8af75ee9d1f383722f65079114778e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22766)
   * 18efa5f4d90f23a57cee4ef1631b5e502d3bb9b3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22768)
   * af326abd821970c9eca4067842cce885a67a8684 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





(hudi) branch master updated (a987574c061 -> 8e63349239d)

2024-03-03 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from a987574c061 [HUDI-7472] prevent MDT partitions from getting dropped (#10804)
 add 8e63349239d [MINOR] Add PR description validation on documentation updates (#10799)

No new revisions were added by this update.

Summary of changes:
 .github/PULL_REQUEST_TEMPLATE.md |  2 +-
 scripts/pr_compliance.py | 38 +-
 2 files changed, 30 insertions(+), 10 deletions(-)



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


yihua merged PR #10799:
URL: https://github.com/apache/hudi/pull/10799





Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975875849

   "Java CI / validate-source", "Update Pr Compliance / run-tests", and 
"validate pr / validate-pr" pass, which is sufficient for the tooling changes.  
There is no change to the production code.  Merging this now.





Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975874114

   > +1, but I still think it is too strict to force an update for the doc part.
   
   We can experiment with this.  This will be a good reminder for people to 
think about docs.  We still have features whose docs are missing.





Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975869514

   
   ## CI report:
   
   * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN
   * fd3334591f8af75ee9d1f383722f65079114778e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22766)
   * 18efa5f4d90f23a57cee4ef1631b5e502d3bb9b3 UNKNOWN
   
   



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975869423

   
   ## CI report:
   
   * ca10086db11aafc594f9b05cf07b5e25a43da8bd Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22754)
   * fb664ad5c922cbd6703706e53c607f42befda76b UNKNOWN
   
   



[jira] [Updated] (HUDI-7472) Creating a functional index implicitly drops metadata RLI partition

2024-03-03 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7472:
--
Fix Version/s: 1.0.0

> Creating a functional index implicitly drops metadata RLI partition
> ---
>
> Key: HUDI-7472
> URL: https://issues.apache.org/jira/browse/HUDI-7472
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> This is caused by a bug in generating the write-config for index creation, 
> which does not set the relevant fields for enabling RLI. The metadata-writer 
> creation code path in `HoodieTable` ends up dropping the metadata partitions 
> for RLI, bloom and col-stats because it assumes the current 'write-config' 
> has disabled them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
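
The failure mode described in HUDI-7472 can be modeled in a few lines. This is a minimal Python sketch and not Hudi code: the partition names, the config keys, and the helper `partitions_to_drop` are assumptions made for illustration, standing in for the check that the description attributes to the metadata-writer creation path.

```python
# Hypothetical model of the behavior described in HUDI-7472; not Hudi's code.
# Partition names and config keys below are assumptions for illustration.
PARTITION_TO_FLAG = {
    "record_index": "hoodie.metadata.record.index.enable",
    "bloom_filters": "hoodie.metadata.index.bloom.filter.enable",
    "column_stats": "hoodie.metadata.index.column.stats.enable",
}


def partitions_to_drop(existing_partitions, write_config):
    """Return existing MDT partitions whose enabling flag is not "true"."""
    return [
        partition
        for partition in existing_partitions
        if partition in PARTITION_TO_FLAG
        and write_config.get(PARTITION_TO_FLAG[partition], "false") != "true"
    ]


existing = ["files", "record_index", "column_stats"]

# A write-config built only for index creation carries no index flags,
# so every optional partition looks disabled to the check above.
index_creation_config = {
    "hoodie.write.concurrency.mode": "OPTIMISTIC_CONCURRENCY_CONTROL",
}
dropped = partitions_to_drop(existing, index_creation_config)
```

In this model `dropped` contains both `record_index` and `column_stats`, which mirrors the report: the partitions are dropped because their flags are absent from the config, even though nobody asked for them to be disabled.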


[jira] [Closed] (HUDI-7472) Creating a functional index implicitly drops metadata RLI partition

2024-03-03 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7472.
-
Resolution: Fixed

> Creating a functional index implicitly drops metadata RLI partition
> ---
>
> Key: HUDI-7472
> URL: https://issues.apache.org/jira/browse/HUDI-7472
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> This is caused by a bug in generating the write-config for index creation, 
> which does not set the relevant fields for enabling RLI. The metadata-writer 
> creation code path in `HoodieTable` ends up dropping the metadata partitions 
> for RLI, bloom and col-stats because it assumes the current 'write-config' 
> has disabled them.





(hudi) branch master updated: [HUDI-7472] prevent MDT partitions from getting dropped (#10804)

2024-03-03 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a987574c061 [HUDI-7472] prevent MDT partitions from getting dropped (#10804)
a987574c061 is described below

commit a987574c0613da9010e6a67d058c1debcefe03c4
Author: bhat-vinay <152183592+bhat-vi...@users.noreply.github.com>
AuthorDate: Mon Mar 4 12:36:01 2024 +0530

[HUDI-7472] prevent MDT partitions from getting dropped (#10804)

The functional index creation code path creates a HoodieTable object and gets a metadata writer.
However, the code path that creates the metadata writer also deletes the existing MDT partitions
iff the write-config does not contain the relevant MDT/index configs. This logic is contained
within HoodieTable::deleteMetadataIndexIfNecessary.

HoodieSparkFunctionalIndexClient::create is the entry point for functional index creation. It
builds a custom write-config in HoodieSparkFunctionalIndexClient::buildWriteConfig, which is then
used to create a client for the base table (on which the functional index needs to be created).
This PR fixes the issue noted earlier by adding the relevant MDT partition configs in
HoodieSparkFunctionalIndexClient::buildWriteConfig. A test is also added to ensure that creating
a functional index does not drop the existing MDT partitions.
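
The shape of the fix can be sketched as follows. This is an illustrative Python model rather than the project's Java; the function `build_write_config`, the partition names, and the config keys are assumptions, and the real change lives in HoodieSparkFunctionalIndexClient::buildWriteConfig.

```python
# Hypothetical model of the fix; names and keys are illustrative assumptions.
PARTITION_TO_FLAG = {
    "record_index": "hoodie.metadata.record.index.enable",
    "bloom_filters": "hoodie.metadata.index.bloom.filter.enable",
    "column_stats": "hoodie.metadata.index.column.stats.enable",
}


def build_write_config(existing_partitions):
    """Build an index-creation config that keeps existing MDT partitions enabled."""
    write_config = {}
    for partition in existing_partitions:
        flag = PARTITION_TO_FLAG.get(partition)
        if flag is not None:
            # Carry the table's current state into the config so a later
            # "delete if disabled" check sees the partition as enabled.
            write_config[flag] = "true"
    return write_config


config = build_write_config(["files", "record_index"])
```

The design choice here is to derive the flags from the partitions the table already has, instead of relying on the caller to pass them, so index creation cannot silently disable an index that is already built.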

Co-authored-by: Vinaykumar Bhat 
---
 .../hudi/HoodieSparkFunctionalIndexClient.java | 26 +-
 .../hudi/command/index/TestFunctionalIndex.scala   | 17 --
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkFunctionalIndexClient.java b/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkFunctionalIndexClient.java
index 541a0d272a4..e66ad5ac417 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkFunctionalIndexClient.java
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkFunctionalIndexClient.java
@@ -27,7 +27,6 @@ import org.apache.hudi.common.model.WriteConcurrencyMode;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ValidationUtils;
-import org.apache.hudi.config.HoodieLockConfig;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.exception.HoodieFunctionalIndexException;
@@ -49,6 +48,9 @@ import scala.collection.JavaConverters;
 
 import static org.apache.hudi.HoodieConversionUtils.mapAsScalaImmutableMap;
 import static org.apache.hudi.HoodieConversionUtils.toScalaOption;
+import static org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE_METADATA_INDEX_BLOOM_FILTER;
+import static org.apache.hudi.common.config.HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS;
+import static org.apache.hudi.common.config.HoodieMetadataConfig.RECORD_INDEX_ENABLE_PROP;
 import static org.apache.hudi.common.util.ValidationUtils.checkArgument;
 
 public class HoodieSparkFunctionalIndexClient extends BaseHoodieFunctionalIndexClient {
@@ -122,11 +124,25 @@ public class HoodieSparkFunctionalIndexClient extends BaseHoodieFunctionalIndexC
   private static Map buildWriteConfig(HoodieTableMetaClient metaClient, HoodieFunctionalIndexDefinition indexDefinition) {
     Map writeConfig = new HashMap<>();
     if (metaClient.getTableConfig().isMetadataTableAvailable()) {
-      if (!writeConfig.containsKey(HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key())) {
-        writeConfig.put(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
-        writeConfig.putAll(JavaConverters.mapAsJavaMapConverter(HoodieCLIUtils.getLockOptions(metaClient.getBasePathV2().toString())).asJava());
-      }
+      writeConfig.put(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
+      writeConfig.putAll(JavaConverters.mapAsJavaMapConverter(HoodieCLIUtils.getLockOptions(metaClient.getBasePathV2().toString())).asJava());
+
+      // [HUDI-7472] Ensure write-config contains the existing MDT partition to prevent those from getting deleted
+      metaClient.getTableConfig().getMetadataPartitions().forEach(partitionPath -> {
+        if (partitionPath.equals(MetadataPartitionType.RECORD_INDEX.getPartitionPath())) {
+          writeConfig.put(RECORD_INDEX_ENABLE_PROP.key(), "true");
+        }
+
+        if (partitionPath.equals(MetadataPartitionType.BLOOM_FILTERS.getPartitionPath())) {
+          writeConfig.put(ENABLE_METADATA_INDEX_BLOOM_FILTER.key(), "true");
+        }
+
+        if (partitionPath.equal

Re: [PR] [HUDI-7472] prevent MDT partitions from getting dropped [hudi]

2024-03-03 Thread via GitHub


codope merged PR #10804:
URL: https://github.com/apache/hudi/pull/10804





Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-1975861181

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22764)
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975861138

   
   ## CI report:
   
   * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   
   



Re: [PR] [HUDI-7472] prevent MDT partitions from getting dropped [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10804:
URL: https://github.com/apache/hudi/pull/10804#issuecomment-1975861013

   
   ## CI report:
   
   * 8e9dc90a531a2284a7e61548d57d9945c35a6c43 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22762)
   
   



Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10801:
URL: https://github.com/apache/hudi/pull/10801#issuecomment-1975860963

   
   ## CI report:
   
   * f920b0a4f0eb180d2d9b1455731af1280f3c3f5d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22761)
   
   



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


yihua commented on code in PR #10799:
URL: https://github.com/apache/hudi/pull/10799#discussion_r1510669674


##
scripts/pr_compliance.py:
##
@@ -402,7 +402,7 @@ def make_default_validator(body, debug=False):
 "### Documentation Update",
 {"_Describe any necessary documentation update if there is any new feature, config, or user-facing change_",
 "",
-"- _The config description must be updated if new configs are added or the default value of the configs are changed_",
+"- _The config description must be updated if new configs are added or the default value of the configs are changed. If not, put \"N/A\"._",

Review Comment:
   Makes sense.  Addressed.
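
For readers unfamiliar with this kind of check, the idea behind validating the "Documentation Update" section can be sketched as below. This is a hypothetical, self-contained Python example and not the code in scripts/pr_compliance.py: the function name `documentation_section_is_valid` and the rule for spotting untouched template prompts are assumptions.

```python
# Hypothetical sketch; this is not the logic in scripts/pr_compliance.py.
SECTION_HEADING = "### Documentation Update"


def documentation_section_is_valid(pr_body):
    """Accept the section only if the author wrote something beyond the template."""
    lines = pr_body.splitlines()
    if SECTION_HEADING not in lines:
        return False
    start = lines.index(SECTION_HEADING) + 1
    for line in lines[start:]:
        if line.startswith("### "):  # the next section begins
            break
        stripped = line.strip()
        # Template prompts are italic lines ("_..._" or "- _..._"); skip them.
        is_prompt = stripped.startswith("_") or stripped.startswith("- _")
        if stripped and not is_prompt:
            return True  # author-provided content, including "N/A"
    return False
```

Under this rule a body whose section contains only the template prompts fails, while a section that says "N/A" passes, which matches the intent of asking authors to state explicitly that no documentation update is needed.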






Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975847329

   
   @hudi-bot run azure
   
   





Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975814519

   @hudi-bot run azure





Re: [PR] [HUDI-7476] Incremental loading for archived timeline [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10807:
URL: https://github.com/apache/hudi/pull/10807#issuecomment-1975810140

   
   ## CI report:
   
   * e7ce757189d4a1de1e81b3866a16c795be410b95 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22767)
   
   



Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-1975810103

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22764)
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975810071

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN
   * fd3334591f8af75ee9d1f383722f65079114778e UNKNOWN
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975802813

   @hudi-bot run azure





Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-1975801986

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd UNKNOWN
   
   



Re: [PR] [HUDI-7476] Incremental loading for archived timeline [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10807:
URL: https://github.com/apache/hudi/pull/10807#issuecomment-1975802017

   
   ## CI report:
   
   * e7ce757189d4a1de1e81b3866a16c795be410b95 UNKNOWN
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975801950

   
   ## CI report:
   
   * ae230ff45cf8fc7aa6e33e3567faf6c4415a8696 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22760)
   * c7c575df44ea9bf7f7b26587e26116d93955b2e2 UNKNOWN
   
   



(hudi) branch master updated (6db37510ab0 -> de4e88183ac)

2024-03-03 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 6db37510ab0 [HUDI-7150] ExternalSpillableMap support values method (#10194)
 add de4e88183ac [HUDI-7471] Use existing util method to get Spark conf in tests (#10802)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/testutils/HoodieClientTestUtils.java|  2 +-
 .../org/apache/hudi/testutils/providers/SparkProvider.java  |  2 +-
 .../execution/datasources/TestHoodieInMemoryFileIndex.scala |  5 ++---
 .../scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala|  9 +++--
 .../test/scala/org/apache/hudi/TestHoodieSparkUtils.scala   | 13 +++--
 .../table/read/TestHoodieFileGroupReaderOnSpark.scala   |  4 ++--
 .../utilities/deltastreamer/TestSourceFormatAdapter.java|  5 ++---
 .../utilities/sources/helpers/TestSanitizationUtils.java|  6 ++
 .../apache/hudi/utilities/testutils/UtilitiesTestBase.java  |  2 +-
 .../utilities/transform/TestSqlQueryBasedTransformer.java   |  4 ++--
 10 files changed, 23 insertions(+), 29 deletions(-)



Re: [PR] [HUDI-7471] Increase the number of Spark executors in tests [hudi]

2024-03-03 Thread via GitHub


yihua merged PR #10802:
URL: https://github.com/apache/hudi/pull/10802





[I] [SUPPORT] Whether Hudi would provide a dynamic library for C++ in future [hudi]

2024-03-03 Thread via GitHub


xuzifu666 opened a new issue, #10442:
URL: https://github.com/apache/hudi/issues/10442

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





Re: [I] [SUPPORT] Whether Hudi would provide a dynamic library for C++ in future [hudi]

2024-03-03 Thread via GitHub


xuzifu666 closed issue #10442: [SUPPORT] Whether Hudi would provide a dynamic library for C++ in future
URL: https://github.com/apache/hudi/issues/10442





Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-1975794558

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   
   



Re: [PR] [HUDI-7471] Increase the number of Spark executors in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975794475

   
   ## CI report:
   
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 7107740bd23811d0bf6b792a163980f3d34e86ac Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22759)
   
   



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on code in PR #10799:
URL: https://github.com/apache/hudi/pull/10799#discussion_r1510621129


##
scripts/pr_compliance.py:
##
@@ -402,7 +402,7 @@ def make_default_validator(body, debug=False):
 "### Documentation Update",
 {"_Describe any necessary documentation update if there is any new feature, config, or user-facing change_",
 "",
-"- _The config description must be updated if new configs are added or the default value of the configs are changed_",
+"- _The config description must be updated if new configs are added or the default value of the configs are changed. If not, put \"N/A\"._",

Review Comment:
   Maybe we should use `none` to keep in line with the other checks.






[jira] [Closed] (HUDI-7150) ExternalSpillableMap support values method

2024-03-03 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7150.

Resolution: Fixed

Fixed via master branch: 6db37510ab07dcd5ff99b84dc85e480e1bb3a373

> ExternalSpillableMap support values method
> --
>
> Key: HUDI-7150
> URL: https://issues.apache.org/jira/browse/HUDI-7150
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Neither RocksDbDiskMap nor BitCaskDiskMap supports the values method, but other 
> modules call it, so we should support it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7150) ExternalSpillableMap support values method

2024-03-03 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7150:
-
Fix Version/s: 0.15.0
   1.0.0

> ExternalSpillableMap support values method
> --
>
> Key: HUDI-7150
> URL: https://issues.apache.org/jira/browse/HUDI-7150
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Neither RocksDbDiskMap nor BitCaskDiskMap supports the values method, but other 
> modules call it, so we should support it.





(hudi) branch master updated (d23abd3e9a0 -> 6db37510ab0)

2024-03-03 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from d23abd3e9a0 [HUDI-7458] Fix bug with functional index creation (#10792)
 add 6db37510ab0 [HUDI-7150] ExternalSpillableMap support values method 
(#10194)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/util/collection/ExternalSpillableMap.java | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)



Re: [PR] [HUDI-7150] ExternalSpillableMap support values method [hudi]

2024-03-03 Thread via GitHub


danny0405 merged PR #10194:
URL: https://github.com/apache/hudi/pull/10194





[jira] [Updated] (HUDI-7476) Incremental loading for archived timeline

2024-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7476:
-
Labels: pull-request-available  (was: )

> Incremental loading for archived timeline
> -
>
> Key: HUDI-7476
> URL: https://issues.apache.org/jira/browse/HUDI-7476
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>






[PR] [HUDI-7476] Incremental loading for archived timeline [hudi]

2024-03-03 Thread via GitHub


danny0405 opened a new pull request, #10807:
URL: https://github.com/apache/hudi/pull/10807

   ### Change Logs
   
   This is a subtask for the global timeline. As the first step, we add incremental loading for the archived timeline.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7472] prevent MDT partitions from getting dropped [hudi]

2024-03-03 Thread via GitHub


codope commented on code in PR #10804:
URL: https://github.com/apache/hudi/pull/10804#discussion_r1510616751


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkFunctionalIndexClient.java:
##
@@ -122,11 +124,25 @@ private static boolean indexExists(HoodieTableMetaClient metaClient, String inde
   private static Map buildWriteConfig(HoodieTableMetaClient metaClient, HoodieFunctionalIndexDefinition indexDefinition) {
 Map writeConfig = new HashMap<>();
 if (metaClient.getTableConfig().isMetadataTableAvailable()) {
-  if (!writeConfig.containsKey(HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key())) {
-writeConfig.put(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
-writeConfig.putAll(JavaConverters.mapAsJavaMapConverter(HoodieCLIUtils.getLockOptions(metaClient.getBasePathV2().toString())).asJava());
-  }
+  writeConfig.put(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
+  writeConfig.putAll(JavaConverters.mapAsJavaMapConverter(HoodieCLIUtils.getLockOptions(metaClient.getBasePathV2().toString())).asJava());
+
+  // [HUDI-7472] Ensure write-config contains the existing MDT partition to prevent those from getting deleted
+  metaClient.getTableConfig().getMetadataPartitions().forEach(partitionPath -> {

Review Comment:
   Good catch!






[jira] [Created] (HUDI-7476) Incremental loading for archived timeline

2024-03-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-7476:


 Summary: Incremental loading for archived timeline
 Key: HUDI-7476
 URL: https://issues.apache.org/jira/browse/HUDI-7476
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: Danny Chen
 Fix For: 1.0.0








[jira] [Updated] (HUDI-7475) Disable ITs in hudi-aws module

2024-03-03 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7475:

Description: 
The tests do not work.  Disabling them to unblock Azure CI.
{code:java}
[ERROR] Errors: 
[ERROR]   ITTestGluePartitionPushdown.setUp:96 » Execution software.amazon.awssdk.core.e...
[ERROR]   ITTestGluePartitionPushdown.setUp:96 » Execution software.amazon.awssdk.core.e...
[ERROR]   ITTestGluePartitionPushdown.setUp:96 » Execution software.amazon.awssdk.core.e...
[ERROR]   ITTestDynamoDBBasedLockProvider.setup:66->getDynamoClientWithLocalEndpoint:110 IllegalState
[INFO] 
[ERROR] Tests run: 9, Failures: 0, Errors: 4, Skipped: 0


2024-03-04T04:55:22.6893321Z [ERROR] org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider  Time elapsed: 0.019 s  <<< ERROR!
2024-03-04T04:55:22.6893739Z java.lang.IllegalStateException: dynamodb-local.endpoint system property not set
2024-03-04T04:55:22.6894356Z at org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider.getDynamoClientWithLocalEndpoint(ITTestDynamoDBBasedLockProvider.java:110)
2024-03-04T04:55:22.6894867Z at org.apache.hudi.aws.transaction.integ.ITTestDynamoDBBasedLockProvider.setup(ITTestDynamoDBBasedLockProvider.java:66)
2024-03-04T04:55:22.6895225Z at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-03-04T04:55:22.6895711Z at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2024-03-04T04:55:22.6896080Z at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-03-04T04:55:22.6896418Z at java.lang.reflect.Method.invoke(Method.java:498)
2024-03-04T04:55:22.6896755Z at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
2024-03-04T04:55:22.6897322Z at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
2024-03-04T04:55:22.6897911Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
2024-03-04T04:55:22.6971261Z at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
2024-03-04T04:55:22.6971737Z at org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
2024-03-04T04:55:22.6972156Z at org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
2024-03-04T04:55:22.6972608Z at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
2024-03-04T04:55:22.6973048Z at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
2024-03-04T04:55:22.6973483Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
2024-03-04T04:55:22.6974121Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
2024-03-04T04:55:22.6974562Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
2024-03-04T04:55:22.6975257Z at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
2024-03-04T04:55:22.6975649Z at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
2024-03-04T04:55:22.6976025Z at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
2024-03-04T04:55:22.6976454Z at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods$9(ClassBasedTestDescriptor.java:384)
2024-03-04T04:55:22.6976901Z at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2024-03-04T04:55:22.6977341Z at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllMethods(ClassBasedTestDescriptor.java:382)
2024-03-04T04:55:22.6977781Z at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:196)
2024-03-04T04:55:22.6978194Z at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:78)
2024-03-04T04:55:22.6978624Z at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:136)
2024-03-04T04:55:22.6979051Z at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
2024-03-04T04:55:22.6979473Z at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
2024-03-04T04:55:22.6979866Z at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
2024-03-04T04:55:22

[jira] [Created] (HUDI-7475) Disable ITs in hudi-aws module

2024-03-03 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7475:
---

 Summary: Disable ITs in hudi-aws module
 Key: HUDI-7475
 URL: https://issues.apache.org/jira/browse/HUDI-7475
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo








[PR] [MINOR] Streamer test setup performance [hudi]

2024-03-03 Thread via GitHub


the-other-tim-brown opened a new pull request, #10806:
URL: https://github.com/apache/hudi/pull/10806

   ### Change Logs
   
   Performs file copying and other file creation once per class instead of once 
per test case.
   
   ### Impact
   
   Lower CI execution overhead
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7472] prevent MDT partitions from getting dropped [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10804:
URL: https://github.com/apache/hudi/pull/10804#issuecomment-1975745261

   
   ## CI report:
   
   * a697f425976d2f21e339e82fe0c5d160ed99a84c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22747)
 
   * 8e9dc90a531a2284a7e61548d57d9945c35a6c43 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22762)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975738860

   
   ## CI report:
   
   * ae230ff45cf8fc7aa6e33e3567faf6c4415a8696 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22760)
 
   
   



Re: [PR] [HUDI-7472] prevent MDT partitions from getting dropped [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10804:
URL: https://github.com/apache/hudi/pull/10804#issuecomment-1975738841

   
   ## CI report:
   
   * a697f425976d2f21e339e82fe0c5d160ed99a84c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22747)
 
   * 8e9dc90a531a2284a7e61548d57d9945c35a6c43 UNKNOWN
   
   



[jira] [Assigned] (HUDI-7474) Functional index creation fails for an existing table as reported by community user

2024-03-03 Thread Vinaykumar Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinaykumar Bhat reassigned HUDI-7474:
-

Assignee: Vinaykumar Bhat

> Functional index creation fails for an existing table as reported by 
> community user
> ---
>
> Key: HUDI-7474
> URL: https://issues.apache.org/jira/browse/HUDI-7474
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>
> Investigate issue reported with functional index here - 
> https://github.com/apache/hudi/issues/10110





[jira] [Created] (HUDI-7474) Functional index creation fails for an existing table as reported by community user

2024-03-03 Thread Vinaykumar Bhat (Jira)
Vinaykumar Bhat created HUDI-7474:
-

 Summary: Functional index creation fails for an existing table as 
reported by community user
 Key: HUDI-7474
 URL: https://issues.apache.org/jira/browse/HUDI-7474
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Vinaykumar Bhat


Investigate issue reported with functional index here - 
https://github.com/apache/hudi/issues/10110





Re: [PR] [HUDI-7471] Increase the number of Spark executors in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975669539

   
   ## CI report:
   
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 00c8febb5918350558c9582b90ad124422a165df Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22753)
 
   * 7107740bd23811d0bf6b792a163980f3d34e86ac Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22759)
 
   
   



Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10801:
URL: https://github.com/apache/hudi/pull/10801#issuecomment-1975669489

   
   ## CI report:
   
   * 6524c27e11d40ab23b6248d82a6115a79da6cf49 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22741)
 
   * f920b0a4f0eb180d2d9b1455731af1280f3c3f5d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22761)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975664144

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * ae230ff45cf8fc7aa6e33e3567faf6c4415a8696 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22760)
 
   
   



Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10801:
URL: https://github.com/apache/hudi/pull/10801#issuecomment-1975664071

   
   ## CI report:
   
   * 6524c27e11d40ab23b6248d82a6115a79da6cf49 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22741)
 
   * f920b0a4f0eb180d2d9b1455731af1280f3c3f5d UNKNOWN
   
   



Re: [PR] [HUDI-7471] Increase the number of Spark executors in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975664105

   
   ## CI report:
   
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 00c8febb5918350558c9582b90ad124422a165df Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22753)
 
   * 7107740bd23811d0bf6b792a163980f3d34e86ac UNKNOWN
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975658848

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * ae230ff45cf8fc7aa6e33e3567faf6c4415a8696 UNKNOWN
   
   



Re: [PR] [HUDI-7471] Increase the number of Spark executors in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975658814

   
   ## CI report:
   
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 00c8febb5918350558c9582b90ad124422a165df Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22753)
 
   
   



Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


xuzifu666 commented on code in PR #10801:
URL: https://github.com/apache/hudi/pull/10801#discussion_r1510521354


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -327,8 +327,10 @@ protected void completeCompaction(HoodieCommitMetadata metadata, HoodieTable tab
 try {
   this.txnManager.beginTransaction(Option.of(compactionInstant), Option.empty());
   finalizeWrite(table, compactionCommitTime, writeStats);
-  // commit to data table after committing to metadata table.
-  writeTableMetadata(table, compactionCommitTime, metadata, context.emptyHoodieData());
+  // if metatable is enable, then commit to data table after committing to metadata table.
+  if (config.getMetadataConfig().enabled()) {
+writeTableMetadata(table, compactionCommitTime, metadata, context.emptyHoodieData());
+  }
   LOG.info("Committing Compaction " + compactionCommitTime + ". Finished with result " + metadata);

Review Comment:
   Maybe adding a new config to control this would be better; I will make the change by adding a config.






Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975636877

   @hudi-bot run azure





Re: [PR] [HUDI-7471] Increase the number of Spark executors in tests [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975636796

   @hudi-bot run azure





[jira] [Closed] (HUDI-7458) Creating multiple functional index fails

2024-03-03 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7458.
-
Resolution: Fixed

> Creating multiple functional index fails
> 
>
> Key: HUDI-7458
> URL: https://issues.apache.org/jira/browse/HUDI-7458
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Looks like an issue in `HoodieSparkFunctionalIndexClient::create(...)`





[jira] [Updated] (HUDI-7458) Creating multiple functional index fails

2024-03-03 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7458:
--
Fix Version/s: 1.0.0

> Creating multiple functional index fails
> 
>
> Key: HUDI-7458
> URL: https://issues.apache.org/jira/browse/HUDI-7458
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Vinaykumar Bhat
>Assignee: Vinaykumar Bhat
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Looks like an issue in `HoodieSparkFunctionalIndexClient::create(...)`





(hudi) branch master updated (5e18e24b76d -> d23abd3e9a0)

2024-03-03 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 5e18e24b76d [MINOR] Clean code of FileSystemViewManager (#10797)
 add d23abd3e9a0 [HUDI-7458] Fix bug with functional index creation (#10792)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/MetadataCommand.java  |  2 +-
 .../apache/hudi/client/BaseHoodieWriteClient.java  |  4 +--
 .../metadata/HoodieBackedTableMetadataWriter.java  | 11 ---
 .../java/org/apache/hudi/table/HoodieTable.java|  8 +++--
 .../table/action/index/RunIndexActionExecutor.java | 11 ++-
 .../action/index/ScheduleIndexActionExecutor.java  | 14 ++--
 .../table/upgrade/ThreeToFourUpgradeHandler.java   |  2 +-
 .../hudi/table/HoodieFlinkCopyOnWriteTable.java|  2 +-
 .../hudi/table/HoodieJavaCopyOnWriteTable.java |  4 +--
 .../hudi/table/HoodieSparkCopyOnWriteTable.java|  4 +--
 .../functional/TestHoodieBackedMetadata.java   |  2 +-
 .../hudi/client/functional/TestHoodieIndex.java|  4 +--
 .../hudi/table/upgrade/TestUpgradeDowngrade.java   |  2 +-
 .../hudi/common/table/HoodieTableConfig.java   | 38 +++---
 .../hudi/metadata/HoodieTableMetadataUtil.java | 32 +-
 .../hudi/HoodieSparkFunctionalIndexClient.java | 10 +++---
 .../hudi/functional/TestRecordLevelIndex.scala |  2 +-
 .../hudi/command/index/TestFunctionalIndex.scala   | 22 +++--
 .../org/apache/hudi/utilities/HoodieIndexer.java   |  4 ++-
 .../apache/hudi/utilities/TestHoodieIndexer.java   | 26 +++
 20 files changed, 116 insertions(+), 88 deletions(-)



Re: [PR] [HUDI-7458] Fix bug with functional index creation [hudi]

2024-03-03 Thread via GitHub


codope merged PR #10792:
URL: https://github.com/apache/hudi/pull/10792





[jira] [Updated] (HUDI-7471) Increase the number of Spark executors in tests

2024-03-03 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7471:

Summary: Increase the number of Spark executors in tests  (was: Use 
existing util method to get Spark conf in tests)

> Increase the number of Spark executors in tests
> ---
>
> Key: HUDI-7471
> URL: https://issues.apache.org/jira/browse/HUDI-7471
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975602405

   
   ## CI report:
   
   * 45960bdc728cd4cfb7610ba503b5dcdd97de15ad Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22757)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975586719

   
   ## CI report:
   
   * dcc5eef2a7ec5336ad8e9b8c6d519dbe7cd8d6f3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22756)
 
   * 45960bdc728cd4cfb7610ba503b5dcdd97de15ad UNKNOWN
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975570890

   
   ## CI report:
   
   * dcc5eef2a7ec5336ad8e9b8c6d519dbe7cd8d6f3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22756)
 
   
   



Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


xuzifu666 commented on code in PR #10801:
URL: https://github.com/apache/hudi/pull/10801#discussion_r1510521354


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -327,8 +327,10 @@ protected void completeCompaction(HoodieCommitMetadata 
metadata, HoodieTable tab
 try {
   this.txnManager.beginTransaction(Option.of(compactionInstant), 
Option.empty());
   finalizeWrite(table, compactionCommitTime, writeStats);
-  // commit to data table after committing to metadata table.
-  writeTableMetadata(table, compactionCommitTime, metadata, 
context.emptyHoodieData());
+  // if metatable is enable, then commit to data table after committing to 
metadata table.
+  if (config.getMetadataConfig().enabled()) {
+writeTableMetadata(table, compactionCommitTime, metadata, 
context.emptyHoodieData());
+  }
   LOG.info("Committing Compaction " + compactionCommitTime + ". Finished 
with result " + metadata);

Review Comment:
   It might be better to add a new config to control this.






Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


xuzifu666 commented on code in PR #10801:
URL: https://github.com/apache/hudi/pull/10801#discussion_r1510519771


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -327,8 +327,10 @@ protected void completeCompaction(HoodieCommitMetadata 
metadata, HoodieTable tab
 try {
   this.txnManager.beginTransaction(Option.of(compactionInstant), 
Option.empty());
   finalizeWrite(table, compactionCommitTime, writeStats);
-  // commit to data table after committing to metadata table.
-  writeTableMetadata(table, compactionCommitTime, metadata, 
context.emptyHoodieData());
+  // if metatable is enable, then commit to data table after committing to 
metadata table.
+  if (config.getMetadataConfig().enabled()) {
+writeTableMetadata(table, compactionCommitTime, metadata, 
context.emptyHoodieData());
+  }
   LOG.info("Committing Compaction " + compactionCommitTime + ". Finished 
with result " + metadata);

Review Comment:
   `writeTableMetadata` decides whether to commit data to the metadata table
   based on two conditions:
   **1. the MDT is enabled; 2. the MDT directory exists.**
   If a user enabled the metadata table earlier (so the metadata directory
   already exists) and now wants to stop committing to it, setting
   hoodie.metadata.enable=false does not stop the compaction operation from
   committing data to the metadata table. This is currently not supported,
   which is not suitable. @danny0405 @CTTY
   
![1709519127428.png](https://github.com/apache/hudi/assets/10645422/e3eebb7c-40e9-4d56-a5bb-d9b413c40268)
   
   






Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975537092

   
   ## CI report:
   
   * 0f4b93e06c0d5dff1ba80755f5b99c97c42f7234 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22755)
 
   * dcc5eef2a7ec5336ad8e9b8c6d519dbe7cd8d6f3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22756)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975531430

   
   ## CI report:
   
   * 0f4b93e06c0d5dff1ba80755f5b99c97c42f7234 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22755)
 
   * dcc5eef2a7ec5336ad8e9b8c6d519dbe7cd8d6f3 UNKNOWN
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975526028

   
   ## CI report:
   
   * 0f4b93e06c0d5dff1ba80755f5b99c97c42f7234 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22755)
 
   
   



Re: [PR] [HUDI-7471] Use existing util method to get Spark conf in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975525988

   
   ## CI report:
   
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 00c8febb5918350558c9582b90ad124422a165df Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22753)
 
   
   



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975525940

   
   ## CI report:
   
   * ca10086db11aafc594f9b05cf07b5e25a43da8bd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22754)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975522969

   @hudi-bot run azure





Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975486966

   
   ## CI report:
   
   * 1982318df811e9dbbb0458b2219d251ceeae683a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22737)
 
   * ca10086db11aafc594f9b05cf07b5e25a43da8bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22754)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975487021

   
   ## CI report:
   
   * 3017abe64c0d4bd8e3866cb466425ca982915126 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22749)
 
   * 0f4b93e06c0d5dff1ba80755f5b99c97c42f7234 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22755)
 
   
   



Re: [PR] [HUDI-7471] Use existing util method to get Spark conf in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975486994

   
   ## CI report:
   
   * 4c37feb88ed56cbc6cb81aedcde0eba21996b84f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22743)
 
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 00c8febb5918350558c9582b90ad124422a165df Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22753)
 
   
   



Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975480494

   
   ## CI report:
   
   * 3017abe64c0d4bd8e3866cb466425ca982915126 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22749)
 
   * 0f4b93e06c0d5dff1ba80755f5b99c97c42f7234 UNKNOWN
   
   



Re: [PR] [HUDI-7471] Use existing util method to get Spark conf in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975480390

   
   ## CI report:
   
   * 4c37feb88ed56cbc6cb81aedcde0eba21996b84f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22743)
 
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   * 00c8febb5918350558c9582b90ad124422a165df UNKNOWN
   
   



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975480361

   
   ## CI report:
   
   * 1982318df811e9dbbb0458b2219d251ceeae683a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22737)
 
   * ca10086db11aafc594f9b05cf07b5e25a43da8bd UNKNOWN
   
   



Re: [PR] [HUDI-7471] Use existing util method to get Spark conf in tests [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10802:
URL: https://github.com/apache/hudi/pull/10802#issuecomment-1975474880

   
   ## CI report:
   
   * 4c37feb88ed56cbc6cb81aedcde0eba21996b84f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22743)
 
   * 29052e85e4aa6d257b8b16b9ba4ea771bce7bd75 UNKNOWN
   
   



Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


yihua commented on code in PR #10799:
URL: https://github.com/apache/hudi/pull/10799#discussion_r1510461098


##
scripts/pr_compliance.py:
##
@@ -402,7 +402,7 @@ def make_default_validator(body, debug=False):
 "### Documentation Update",
 {"_Describe any necessary documentation update if there is any new 
feature, config, or user-facing change_",
 "",
-"- _The config description must be updated if new configs are added or 
the default value of the configs are changed_",
+"- _The config description must be updated if new configs are added or 
the default value of the configs are changed. If not, put \"N/A\"._",

Review Comment:
   As long as the user provides a description, the validation of this section 
passes.
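
The rule stated in the review comment above can be illustrated with a minimal sketch. This is not the code in `scripts/pr_compliance.py`: the function name `documentation_section_passes` and the constant `TEMPLATE_LINES` are hypothetical, and the sketch assumes that "providing a description" means the section holds at least one non-blank line that is not part of the PR template.

```python
# Hypothetical sketch; names and logic are assumptions for illustration,
# not the actual implementation in scripts/pr_compliance.py.
TEMPLATE_LINES = {
    "_Describe any necessary documentation update if there is any new "
    "feature, config, or user-facing change_",
    "- _The config description must be updated if new configs are added or "
    "the default value of the configs are changed. If not, put \"N/A\"._",
}


def documentation_section_passes(section_lines):
    """Return True if the section has a non-blank line that is not template text."""
    for line in section_lines:
        stripped = line.strip()
        if stripped and stripped not in TEMPLATE_LINES:
            return True
    return False
```

Under these assumptions, a section containing only `N/A` passes, while a section left as the untouched template or left blank does not.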






Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


yihua commented on code in PR #10805:
URL: https://github.com/apache/hudi/pull/10805#discussion_r1510460519


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTableWithNonRecordKeyField.scala:
##
@@ -1,24 +1,27 @@
 /*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information

Review Comment:
   My local Apache copyright template has been updated to the latest version,
   hence the difference after moving the classes.  I've reverted these changes.






Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on code in PR #10799:
URL: https://github.com/apache/hudi/pull/10799#discussion_r1510455650


##
scripts/pr_compliance.py:
##
@@ -402,7 +402,7 @@ def make_default_validator(body, debug=False):
 "### Documentation Update",
 {"_Describe any necessary documentation update if there is any new 
feature, config, or user-facing change_",
 "",
-"- _The config description must be updated if new configs are added or 
the default value of the configs are changed_",
+"- _The config description must be updated if new configs are added or 
the default value of the configs are changed. If not, put \"N/A\"._",

Review Comment:
   Maybe allow any placeholder there, such as `none` or `nothing`, or offer a
   set of options like the other checks do. What answer is expected, yes or no?






Re: [PR] [HUDI-7470] Compaction completed not need write to mdt if mdt is disable [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on code in PR #10801:
URL: https://github.com/apache/hudi/pull/10801#discussion_r1510455170


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -327,8 +327,10 @@ protected void completeCompaction(HoodieCommitMetadata 
metadata, HoodieTable tab
 try {
   this.txnManager.beginTransaction(Option.of(compactionInstant), 
Option.empty());
   finalizeWrite(table, compactionCommitTime, writeStats);
-  // commit to data table after committing to metadata table.
-  writeTableMetadata(table, compactionCommitTime, metadata, 
context.emptyHoodieData());
+  // if metatable is enable, then commit to data table after committing to 
metadata table.
+  if (config.getMetadataConfig().enabled()) {
+writeTableMetadata(table, compactionCommitTime, metadata, 
context.emptyHoodieData());
+  }
   LOG.info("Committing Compaction " + compactionCommitTime + ". Finished 
with result " + metadata);

Review Comment:
   `writeTableMetadata` already has an inline check for whether the MDT is
   disabled.






(hudi) branch master updated: [MINOR] Clean code of FileSystemViewManager (#10797)

2024-03-03 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e18e24b76d [MINOR] Clean code of FileSystemViewManager (#10797)
5e18e24b76d is described below

commit 5e18e24b76da1fb485f8932ea2ef7ae58ad9cb0e
Author: stayrascal 
AuthorDate: Mon Mar 4 08:40:09 2024 +0800

[MINOR] Clean code of FileSystemViewManager (#10797)

Co-authored-by: wuzhiping 
---
 .../java/org/apache/hudi/table/HoodieTable.java|  15 ++-
 .../TestRemoteFileSystemViewWithMetadataTable.java |   2 +-
 .../TestTimelineServerBasedWriteMarkers.java   |   5 +-
 .../hudi/testutils/HoodieClientTestUtils.java  |   3 +-
 .../hudi/common/table/HoodieTableMetaClient.java   |   6 +-
 .../common/table/view/FileSystemViewManager.java   | 102 ++---
 .../table/read/TestHoodieFileGroupReaderBase.java  |   2 +-
 .../sql/hudi/TestPartialUpdateForMergeInto.scala   |   2 +-
 .../hudi/timeline/service/TimelineService.java |   6 +-
 .../TestRemoteHoodieTableFileSystemView.java   |   4 +-
 10 files changed, 68 insertions(+), 79 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
index 3b78fb09090..cec27379a85 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
@@ -121,13 +121,12 @@ import static 
org.apache.hudi.metadata.HoodieTableMetadataUtil.metadataPartition
  * @param  Type of outputs
  */
 public abstract class HoodieTable implements Serializable {
-
   private static final Logger LOG = LoggerFactory.getLogger(HoodieTable.class);
 
   protected final HoodieWriteConfig config;
   protected final HoodieTableMetaClient metaClient;
   protected final HoodieIndex index;
-  private SerializableConfiguration hadoopConfiguration;
+  private final SerializableConfiguration hadoopConfiguration;
   protected final TaskContextSupplier taskContextSupplier;
   private final HoodieTableMetadata metadata;
   private final HoodieStorageLayout storageLayout;
@@ -146,7 +145,7 @@ public abstract class HoodieTable implements 
Serializable {
 .build();
 this.metadata = HoodieTableMetadata.create(context, metadataConfig, 
config.getBasePath());
 
-this.viewManager = FileSystemViewManager.createViewManager(context, 
config.getMetadataConfig(), config.getViewStorageConfig(), 
config.getCommonConfig(), unused -> metadata);
+this.viewManager = getViewManager();
 this.metaClient = metaClient;
 this.index = getIndex(config, context);
 this.storageLayout = getStorageLayout(config);
@@ -165,7 +164,7 @@ public abstract class HoodieTable implements 
Serializable {
 
   private synchronized FileSystemViewManager getViewManager() {
 if (null == viewManager) {
-  viewManager = FileSystemViewManager.createViewManager(getContext(), 
config.getMetadataConfig(), config.getViewStorageConfig(), 
config.getCommonConfig(), unused -> metadata);
+  viewManager = FileSystemViewManager.createViewManager(getContext(), 
config.getViewStorageConfig(), config.getCommonConfig(), unused -> metadata);
 }
 return viewManager;
   }
@@ -177,8 +176,7 @@ public abstract class HoodieTable implements 
Serializable {
* @param records  hoodieRecords to upsert
* @return HoodieWriteMetadata
*/
-  public abstract HoodieWriteMetadata upsert(HoodieEngineContext context, 
String instantTime,
-  I records);
+  public abstract HoodieWriteMetadata upsert(HoodieEngineContext context, 
String instantTime, I records);
 
   /**
* Insert a batch of new records into Hoodie table at the supplied 
instantTime.
@@ -187,8 +185,7 @@ public abstract class HoodieTable implements 
Serializable {
* @param records  hoodieRecords to upsert
* @return HoodieWriteMetadata
*/
-  public abstract HoodieWriteMetadata insert(HoodieEngineContext context, 
String instantTime,
-  I records);
+  public abstract HoodieWriteMetadata insert(HoodieEngineContext context, 
String instantTime, I records);
 
   /**
* Bulk Insert a batch of new records into Hoodie table at the supplied 
instantTime.
@@ -267,7 +264,7 @@ public abstract class HoodieTable implements 
Serializable {
* @return HoodieWriteMetadata
*/
   public abstract HoodieWriteMetadata bulkInsertPrepped(HoodieEngineContext 
context, String instantTime,
-  I preppedRecords,  Option bulkInsertPartitioner);
+  I preppedRecords, Option bulkInsertPartitioner);
 
   /**
* Replaces all the existing records and inserts the specified new records 
into Hoodie table at the supplied instantTime,
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/Te

Re: [PR] [MINOR] Clean code of FileSystemViewManager [hudi]

2024-03-03 Thread via GitHub


danny0405 merged PR #10797:
URL: https://github.com/apache/hudi/pull/10797





Re: [PR] [MINOR] Clean code of FileSystemViewManager [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on PR #10797:
URL: https://github.com/apache/hudi/pull/10797#issuecomment-1975461409

   The test failure is known to be flaky; will merge it soon~





Re: [PR] [MINOR] Clean code of FileSystemViewManager [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on code in PR #10797:
URL: https://github.com/apache/hudi/pull/10797#discussion_r1510454074


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java:
##
@@ -201,43 +202,40 @@ public static HoodieTableFileSystemView 
createInMemoryFileSystemViewWithTimeline
   /**
* Create a remote file System view for a table.
*
-   * @param viewConf View Storage Configuration
+   * @param viewConf   View Storage Configuration
* @param metaClient Hoodie Table MetaClient for the table.
-   * @return
+   * @return {@link RemoteHoodieTableFileSystemView}
*/
   private static RemoteHoodieTableFileSystemView 
createRemoteFileSystemView(FileSystemViewStorageConfig viewConf,
-  HoodieTableMetaClient metaClient) {
-LOG.info("Creating remote view for basePath " + metaClient.getBasePath() + 
". Server="
-+ viewConf.getRemoteViewServerHost() + ":" + 
viewConf.getRemoteViewServerPort() + ", Timeout="
-+ viewConf.getRemoteTimelineClientTimeoutSecs());
+
HoodieTableMetaClient metaClient) {
+LOG.info("Creating remote view for basePath {}. Server={}:{}, Timeout={}", 
metaClient.getBasePathV2(),
+viewConf.getRemoteViewServerHost(), 
viewConf.getRemoteViewServerPort(), 
viewConf.getRemoteTimelineClientTimeoutSecs());
 return new RemoteHoodieTableFileSystemView(metaClient, viewConf);
   }
 
+  public static FileSystemViewManager createViewManagerWithTableMetadata(
+  final HoodieEngineContext context,
+  final HoodieMetadataConfig metadataConfig,

Review Comment:
   Why is this method needed?




Review Comment:
   Hmm, it just moves existing code.
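
Besides the Javadoc touch-ups, the hunk quoted in this review switches the `LOG.info` call from string concatenation to `{}` placeholders. With placeholder-style logging (as in SLF4J), the message is assembled only when the level is enabled, and the template stays readable on one line. The sketch below is a minimal stand-in to illustrate the idea; `PlaceholderLoggingSketch`, `format`, `info`, `infoEnabled`, and `lastMessage` are hypothetical names and are not Hudi or SLF4J code.

```java
public class PlaceholderLoggingSketch {
  public static boolean infoEnabled = true; // pretend log-level switch (assumption)
  public static String lastMessage;         // captures output so the sketch can be checked

  /** Replaces each "{}" in the template with the next argument, left to right. */
  public static String format(String template, Object... args) {
    StringBuilder out = new StringBuilder();
    int from = 0;
    for (Object arg : args) {
      int at = template.indexOf("{}", from);
      if (at < 0) {
        break; // more arguments than placeholders: ignore the extras
      }
      out.append(template, from, at).append(arg);
      from = at + 2;
    }
    return out.append(template.substring(from)).toString();
  }

  /** Builds the message only when INFO is enabled, so a disabled level costs no formatting. */
  public static void info(String template, Object... args) {
    if (infoEnabled) {
      lastMessage = format(template, args);
    }
  }
}
```

Called with the template from the diff and illustrative values such as `"/tmp/tbl"`, `"localhost"`, `26754`, and `300`, `info` fills the four placeholders in order; with `infoEnabled` set to `false` it returns without building any string.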






Re: [PR] [MINOR] Add PR description validation on documentation updates [hudi]

2024-03-03 Thread via GitHub


yihua commented on PR #10799:
URL: https://github.com/apache/hudi/pull/10799#issuecomment-1975458169

   @danny0405 I updated the template to be more instructive so users know to 
put "N/A" if no docs update is needed.





[jira] [Updated] (HUDI-7471) Use existing util method to get Spark conf in tests

2024-03-03 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7471:

Summary: Use existing util method to get Spark conf in tests  (was: 
Increase the number of Spark executors in tests)

> Use existing util method to get Spark conf in tests
> ---
>
> Key: HUDI-7471
> URL: https://issues.apache.org/jira/browse/HUDI-7471
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on code in PR #10805:
URL: https://github.com/apache/hudi/pull/10805#discussion_r1510449991


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTableWithNonRecordKeyField.scala:
##
@@ -1,24 +1,27 @@
 /*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information

Review Comment:
   Why has most of the license header been changed?






(hudi) branch master updated: [HUDI-3625] Update RFC-60 (#9462)

2024-03-03 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 849b217edb5 [HUDI-3625] Update RFC-60 (#9462)
849b217edb5 is described below

commit 849b217edb563369720e528ad4487df4b57a2308
Author: Shawn Chang <42792772+c...@users.noreply.github.com>
AuthorDate: Sun Mar 3 16:12:36 2024 -0800

[HUDI-3625] Update RFC-60 (#9462)

Co-authored-by: Shawn Chang 
---
 rfc/rfc-60/read_flow.png  | Bin 0 -> 176856 bytes
 rfc/rfc-60/rfc-60.md  |  99 ++
 rfc/rfc-60/wrapper_fs.png | Bin 0 -> 148392 bytes
 3 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/rfc/rfc-60/read_flow.png b/rfc/rfc-60/read_flow.png
new file mode 100644
index 000..4ef464f41e7
Binary files /dev/null and b/rfc/rfc-60/read_flow.png differ
diff --git a/rfc/rfc-60/rfc-60.md b/rfc/rfc-60/rfc-60.md
index d509aec1f20..bdfaa58b899 100644
--- a/rfc/rfc-60/rfc-60.md
+++ b/rfc/rfc-60/rfc-60.md
@@ -15,7 +15,7 @@
   limitations under the License.
 -->
 
-# RFC-60: Federated Storage Layer
+# RFC-60: Federated Storage Layout
 
 ## Proposers
 - @umehrot2
@@ -52,7 +52,10 @@ but there can be a 30 - 60 minute wait time before new 
partitions are created. T
 same table path prefix could result in these request limits being hit for the 
table prefix, specially as workloads
 scale, and there are several thousands of files being written/updated 
concurrently. This hurts performance due to
 re-trying of failed requests affecting throughput, and result in occasional 
failures if the retries are not able to
-succeed either and continue to be throttled.
+succeed either and continue to be throttled. Note an exception would be 
non-partitioned tables 
+reside directly under S3 buckets (using S3 buckets as their table paths), and 
those tables would be free
+from the throttling problem. However, this exception cannot invalidate the 
necessity of addressing the throttling 
+problem for partitioned tables.
 
 The traditional storage layout also tightly couples the partitions as folders 
under the table path. However,
 some users want flexibility to be able to distribute files/partitions under 
multiple different paths across cloud stores,
@@ -97,22 +100,21 @@ public interface HoodieStorageStrategy extends 
Serializable {
 }
 ```
 
-### Generating file paths for object store optimized layout
+### Generating File Paths for Object Store Optimized Layout
 
 We want to distribute files evenly across multiple random prefixes, instead of 
following the traditional Hive storage
 layout of keeping them under a common table path/prefix. In addition to the 
`Table Path`, for this new layout user will
 configure another `Table Storage Path` under which the actual data files will 
be distributed. The original `Table Path` will
 be used to maintain the table/partitions Hudi metadata.
 
-For the purpose of this documentation lets assume:
+For the purpose of this documentation let's assume:
 ```
 Table Path => s3:
 
 Table Storage Path => s3:///
 ```
-Note: `Table Storage Path` can be a path in the same Amazon S3 bucket or a 
different bucket. For best results,
-`Table Storage Path` should be a top-level bucket instead of a prefix under 
the bucket to avoid multiple 
-tables sharing the prefix.
+`Table Storage Path` should be a top-level bucket instead of a prefix under 
the bucket for the best results.
+So that we can avoid multiple tables sharing the prefix causing throttling.
 
 We will use a Hashing function on the `Partition Path/File ID` to map them to 
a prefix generated under `Table Storage Path`:
 ```
@@ -148,7 +150,7 @@ 
s3:///0bfb3d6e//.075f3295-def8-4a42-a927-
 ...
 ```
 
-Note: Storage strategy would only return a storage location instead of a full 
path. In the above example,
+Storage strategy would only return a storage location instead of a full path. 
In the above example,
 the storage location is `s3:///0bfb3d6e/`, and the 
lower-level folder structure would be appended
 later automatically to get the actual file path. In another word, 
 users would only be able to customize upper-level folder structure (storage 
location). 
@@ -176,7 +178,7 @@ The hashing function should be made user configurable for 
use cases like bucketi
 sub-partitioning/re-hash to reduce the number of hash prefixes. Having too 
many unique hash prefixes
 would make files too dispersed, and affect performance on other operations 
such as listing.
 
-### Maintain mapping to files
+### Maintaining Mapping to Files with Metadata Table
 
 In 
[RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331),
 we introduced an internal
 Metadata Table with a `files` partition that maintains mapping from partitions 
to list of files in the partition stored
@@ -196,13 +198,75 @@ for metadata table to be populated.
 
 4. If there

Re: [PR] [HUDI-3625] Update RFC-60 [hudi]

2024-03-03 Thread via GitHub


yihua merged PR #9462:
URL: https://github.com/apache/hudi/pull/9462





Re: [PR] [HUDI-3625] Update RFC-60 [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #9462:
URL: https://github.com/apache/hudi/pull/9462#issuecomment-1975413474

   
   ## CI report:
   
   * 6891d53be9579bc698e6266781fc69c43d08db27 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=19331)
 
   * 5543ce244909d9e8aef49411325469a3b07c78dc UNKNOWN
   
   
   





Re: [I] [SUPPORT] Data loss due to incorrect selection of log file during compaction [hudi]

2024-03-03 Thread via GitHub


danny0405 commented on issue #10803:
URL: https://github.com/apache/hudi/issues/10803#issuecomment-1975412635

   cc @nsivabalan for taking care of this issue~





Re: [PR] [HUDI-7473] Rebalance CI [hudi]

2024-03-03 Thread via GitHub


hudi-bot commented on PR #10805:
URL: https://github.com/apache/hudi/pull/10805#issuecomment-1975410326

   
   ## CI report:
   
   * 3017abe64c0d4bd8e3866cb466425ca982915126 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22749)
 
   
   
   





Re: [PR] [HUDI-3625] Update RFC-60 [hudi]

2024-03-03 Thread via GitHub


CTTY commented on code in PR #9462:
URL: https://github.com/apache/hudi/pull/9462#discussion_r1510428841


##
rfc/rfc-60/rfc-60.md:
##
@@ -196,13 +195,75 @@ for metadata table to be populated.
 
 4. If there is an error reading from Metadata table, we will not fall back 
listing from file system.
 
-5. In case of metadata table getting corrupted or lost, we need to have a 
solution here to reconstruct metadata table
-from the files which distributed using federated storage. We will likely have 
to implement a file system listing
-logic, that can get all the partition to files mapping by listing all the 
prefixes under the `Table Storage Path`.
-Following the folder structure of adding table name/partitions under the 
prefix will help in getting the listing and
-identifying the table/partition they belong to.
+### Integration
+This section mainly describes how storage strategy is integrated with other 
components and how read/write
+would look like from Hudi side with object storage layout.
+
+We propose integrating the storage strategy at the filesystem level, 
specifically within `HoodieWrapperFileSystem`. 
+This way, only file read/write operations undergo path conversion and we can 
limit the usage of 
+storage strategy to only filesystem level so other upper-level components 
don't need to be aware of physical paths.
+
+This also mandates that `HoodieWrapperFileSystem` is the filesystem of choice 
for all upper-level Hudi components.
+Getting filesystem from `Path` or such won't be allowed anymore as using raw 
filesystem may not reach 
+to physical locations without storage strategy. Hudi components can simply 
call `HoodieMetaClient#getFs` 
+to get `HoodieWrapperFileSystem`, and this needs to be the only allowed way 
for any filesystem-related operation. 
+The only exception is when we need to interact with metadata that's still 
stored under the original table path, 
+and we should call `HoodieMetaClient#getRawFs` in this case so 
`HoodieMetaClient` can still be the single entry
+for getting filesystem.
+
+![](wrapper_fs.png)
+
+When conducting a read operation, Hudi would: 
+1. Access filesystem view, `HoodieMetadataFileSystemView` specifically
+2. Scan metadata table via filesystem view to compose `HoodieMetadataPayload`
+3. Call `HoodieMetadataPayload#getFileStatuses` and employ 
`HoodieWrapperFileSystem` to get 
+file statuses with physical locations
+
+This flow can be concluded in the chart below.
+
+![](read_flow.png)
+
+ Considerations
+- Path conversion happens on the fly when reading/writing files. This saves 
Hudi from storing physical locations
+but it also means extra performance burden, even though it may be negligible.
+- Since table path and data path will most likely have different top-level 
folders/authorities,
+`HoodieWrapperFileSystem` should maintain at least two `FileSystem` objects: 
one to access table path and another
+to access storage path. `HoodieWrapperFileSystem` should intelligently tell if 
it needs
+to convert the path by checking the path on the fly.
+- When using Hudi file reader/writer implementation, we will need to pass 
`HoodieWrapperFileSystem` down
+to parent reader. For instance, when using `HoodieAvroHFileReader`, we will 
need to pass `HoodieWrapperFileSystem`
+to `HFile.Reader` so it can have access to storage strategy. If reader/writer 
doesn't take filesystem
+directly (e.g. `ParquetFileReader` only takes `Configuration` and `Path` for 
reading), then we will
+need to register `HoodieWrapperFileSystem` to `Configuration` so it can be 
initialized/used later.
+
+### Repair Tool

Review Comment:
   Yes, it should work with custom strategies. But how custom strategies 
would be implemented is a big black box, so we plan to support the basic case 
where there is only one storage path (the algorithm used for hashing doesn't 
matter here). 
   
   This may be implemented differently with the new `HoodieStorage`; I'll 
explore that and update this part in a separate PR.
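
To make the "one storage path, any hash" case concrete, here is a minimal sketch of a hash-prefixed layout and of why a repair pass would not need to know the hash function. Everything in it is an assumption for illustration: the class `HashPrefixStorageSketch`, its method names, the use of CRC32, and the `<storage path>/<hash>/<table name>/<partition path>/<file name>` layout are hypothetical and are not the RFC's `HoodieStorageStrategy` API.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative only: not Hudi's HoodieStorageStrategy implementation. */
public class HashPrefixStorageSketch {
  private final String storagePath; // assumed single storage path, e.g. a bucket root
  private final String tableName;

  public HashPrefixStorageSketch(String storagePath, String tableName) {
    // Drop trailing slashes so segments can be joined with "/" safely.
    this.storagePath = storagePath.replaceAll("/+$", "");
    this.tableName = tableName;
  }

  /** Storage location: storage path plus an 8-hex-digit prefix derived from partition path and file ID. */
  public String storageLocation(String partitionPath, String fileId) {
    CRC32 crc = new CRC32(); // stand-in hash; any stable function would do here
    crc.update((partitionPath + "/" + fileId).getBytes(StandardCharsets.UTF_8));
    return String.format("%s/%08x/", storagePath, crc.getValue());
  }

  /** Full physical path: lower-level folders appended to the storage location (assumed layout). */
  public String physicalPath(String partitionPath, String fileId, String fileName) {
    return storageLocation(partitionPath, fileId) + tableName + "/" + partitionPath + "/" + fileName;
  }

  /** Recovers the partition path from a physical path without recomputing the hash. */
  public String partitionOf(String physicalPath) {
    String relative = physicalPath.substring(storagePath.length() + 1); // <hash>/<table>/<partition>/<file>
    String afterHash = relative.substring(relative.indexOf('/') + 1);    // <table>/<partition>/<file>
    String afterTable = afterHash.substring(tableName.length() + 1);     // <partition>/<file>
    return afterTable.substring(0, afterTable.lastIndexOf('/'));
  }
}
```

Because `partitionOf` only skips the first path segment under the storage path, a listing-based repair over a single storage path can rebuild the partition-to-files mapping regardless of which hash produced the prefix. Multiple storage paths, or strategies with a different folder layout, would need extra handling that this sketch does not attempt.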





