Re: [PR] [HUDI-7703] Clean plan to exclude partitions with no deleting file [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11136:
URL: https://github.com/apache/hudi/pull/11136#issuecomment-2089670441

   
   ## CI report:
   
   * 05e8bc658ccd29c673954a0d1e8e37d139878cc3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23610)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] upgrade from 0.10.0 to 0.14.0 [hudi]

2024-05-01 Thread via GitHub


ghrahul commented on issue #11126:
URL: https://github.com/apache/hudi/issues/11126#issuecomment-2089670401

   Hi @ad1happy2go,
   I tried turning off `hoodie.datasource.write.reconcile.schema` and rerunning 
the process, but I am still facing the same error. 
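
   For reference, a minimal sketch of how that option is set on the Spark
   datasource write path (illustrative only; `df` and `basePath` are
   placeholders, not from this issue):

   ```scala
   import org.apache.spark.sql.{DataFrame, SaveMode}

   // Write a Hudi table with schema reconciliation disabled.
   def writeWithoutReconcile(df: DataFrame, basePath: String): Unit =
     df.write.format("hudi")
       .option("hoodie.datasource.write.reconcile.schema", "false")
       .mode(SaveMode.Append)
       .save(basePath)
   ```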


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7703] Clean plan to exclude partitions with no deleting file [hudi]

2024-05-01 Thread via GitHub


Gatsby-Lee commented on PR #11136:
URL: https://github.com/apache/hudi/pull/11136#issuecomment-2089669375

   👍 
   thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7703] Clean plan to exclude partitions with no deleting file [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11136:
URL: https://github.com/apache/hudi/pull/11136#issuecomment-2089656068

   
   ## CI report:
   
   * 05e8bc658ccd29c673954a0d1e8e37d139878cc3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11124:
URL: https://github.com/apache/hudi/pull/11124#issuecomment-2089655794

   
   ## CI report:
   
   * 33909835f589e444771c8c9c6e5bdec15785e397 UNKNOWN
   * 7161ba385f35b74192f863c40a78f13a8505ec4c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23608)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-7703) Clean plan does not need to include partitions with no files to delete

2024-05-01 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-7703:


Assignee: Raymond Xu

> Clean plan does not need to include partitions with no files to delete
> --
>
> Key: HUDI-7703
> URL: https://issues.apache.org/jira/browse/HUDI-7703
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: Screenshot 2024-04-10 at 2.59.57 PM.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587094528


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -558,9 +558,6 @@ protected void runTableServicesInline(HoodieTable table, 
HoodieCommitMetadata me
   return;
 }
 
-if (config.isMetadataTableEnabled()) {

Review Comment:
   I remember there’s a reason to sync the FS view for MDT.  Is this 
unnecessary now?



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -798,7 +795,11 @@ protected void archive(HoodieTable table) {
 }
 try {
   final Timer.Context timerContext = metrics.getArchiveCtx();
-  // We cannot have unbounded commit files. Archive commits if we have to 
archive
+  // We cannot have unbounded commit files. Archive commits if we have to 
archive.
+
+  // Reload the table timeline to reflect the latest commits;
+  // some table services (e.g., cleaning) may have executed right before the archiving.
+  table.getMetaClient().reloadActiveTimeline();

Review Comment:
   Is this going to archive more instants since the view is refreshed?  Maybe 
we can avoid the overhead if that’s the case, since the next write should get 
the timeline refreshed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7703) Clean plan does not need to include partitions with no files to delete

2024-05-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7703:
-
Labels: pull-request-available  (was: )

> Clean plan does not need to include partitions with no files to delete
> --
>
> Key: HUDI-7703
> URL: https://issues.apache.org/jira/browse/HUDI-7703
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: table-service
>Reporter: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: Screenshot 2024-04-10 at 2.59.57 PM.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7703] Clean plan to exclude partitions with no deleting file [hudi]

2024-05-01 Thread via GitHub


xushiyan opened a new pull request, #11136:
URL: https://github.com/apache/hudi/pull/11136

   ### Change Logs
   
   Exclude partitions with no files to delete from the clean plan.
   
   ### Impact
   
   Removes unnecessary info from the clean plan, a minor optimization to reduce 
the cleaner's memory footprint.
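
   A minimal sketch of the idea (hypothetical names, in Scala for illustration; 
the actual planner code differs):

   ```scala
   // Keep only partitions that actually have files to delete.
   // `filesToDelete` maps a partition path to the files the cleaner would remove.
   def pruneEmptyPartitions(filesToDelete: Map[String, Seq[String]]): Map[String, Seq[String]] =
     filesToDelete.filter { case (_, files) => files.nonEmpty }
   ```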
   
   ### Risk level
   
   Low.
   
   ### Documentation Update
   
   NA.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7703) Clean plan does not need to include partitions with no files to delete

2024-05-01 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-7703:


 Summary: Clean plan does not need to include partitions with no 
files to delete
 Key: HUDI-7703
 URL: https://issues.apache.org/jira/browse/HUDI-7703
 Project: Apache Hudi
  Issue Type: Improvement
  Components: table-service
Reporter: Raymond Xu
 Fix For: 0.15.0, 1.0.0
 Attachments: Screenshot 2024-04-10 at 2.59.57 PM.png





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7702] Remove unused method in ReflectUtil (#11135)

2024-05-01 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1ec7e631c38 [HUDI-7702] Remove unused method in ReflectUtil (#11135)
1ec7e631c38 is described below

commit 1ec7e631c38097b015f2e6d7a0f60ca37be7580e
Author: Y Ethan Guo 
AuthorDate: Wed May 1 22:21:00 2024 -0700

[HUDI-7702] Remove unused method in ReflectUtil (#11135)
---
 .../apache/hudi/spark3/internal/ReflectUtil.java   | 29 +---
 .../hudi/spark3/internal/TestReflectUtil.java  | 54 --
 .../hudi/spark3/internal/TestReflectUtil.java  | 54 --
 .../hudi/spark3/internal/TestReflectUtil.java  | 54 --
 4 files changed, 1 insertion(+), 190 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/hudi/spark3/internal/ReflectUtil.java
 
b/hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/hudi/spark3/internal/ReflectUtil.java
index ad83720b021..c726777876f 100644
--- 
a/hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/hudi/spark3/internal/ReflectUtil.java
+++ 
b/hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/hudi/spark3/internal/ReflectUtil.java
@@ -18,41 +18,14 @@
 package org.apache.hudi.spark3.internal;
 
 import org.apache.hudi.HoodieSparkUtils;
-import org.apache.spark.sql.catalyst.plans.logical.InsertIntoStatement;
-import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;
-import org.apache.spark.sql.catalyst.util.DateFormatter;
 
-import scala.Option;
-import scala.collection.Seq;
-import scala.collection.immutable.Map;
+import org.apache.spark.sql.catalyst.util.DateFormatter;
 
-import java.lang.reflect.Constructor;
 import java.lang.reflect.Method;
 import java.time.ZoneId;
 
 public class ReflectUtil {
 
-  public static InsertIntoStatement createInsertInto(LogicalPlan table, Map<String, Option<String>> partition, Seq<String> userSpecifiedCols,
-                                                     LogicalPlan query, boolean overwrite, boolean ifPartitionNotExists, boolean byName) {
-    try {
-      if (HoodieSparkUtils.gteqSpark3_5()) {
-        Constructor<InsertIntoStatement> constructor = InsertIntoStatement.class.getConstructor(
-            LogicalPlan.class, Map.class, Seq.class, LogicalPlan.class, boolean.class, boolean.class, boolean.class);
-        return constructor.newInstance(table, partition, userSpecifiedCols, query, overwrite, ifPartitionNotExists, byName);
-      } else if (HoodieSparkUtils.isSpark3_0()) {
-        Constructor<InsertIntoStatement> constructor = InsertIntoStatement.class.getConstructor(
-            LogicalPlan.class, Map.class, LogicalPlan.class, boolean.class, boolean.class);
-        return constructor.newInstance(table, partition, query, overwrite, ifPartitionNotExists);
-      } else {
-        Constructor<InsertIntoStatement> constructor = InsertIntoStatement.class.getConstructor(
-            LogicalPlan.class, Map.class, Seq.class, LogicalPlan.class, boolean.class, boolean.class);
-        return constructor.newInstance(table, partition, userSpecifiedCols, query, overwrite, ifPartitionNotExists);
-      }
-    } catch (Exception e) {
-      throw new RuntimeException("Error in create InsertIntoStatement", e);
-    }
-  }
-
   public static DateFormatter getDateFormatter(ZoneId zoneId) {
 try {
   ClassLoader loader = Thread.currentThread().getContextClassLoader();
diff --git 
a/hudi-spark-datasource/hudi-spark3.3.x/src/test/java/org/apache/hudi/spark3/internal/TestReflectUtil.java
 
b/hudi-spark-datasource/hudi-spark3.3.x/src/test/java/org/apache/hudi/spark3/internal/TestReflectUtil.java
deleted file mode 100644
index 0763a22f032..000
--- 
a/hudi-spark-datasource/hudi-spark3.3.x/src/test/java/org/apache/hudi/spark3/internal/TestReflectUtil.java
+++ /dev/null
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *  http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hudi.spark3.internal;
-
-import org.apache.hudi.testutils.HoodieClientTestBase;
-
-import org.apache.spark.sql.SparkSession;
-import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation;
-import

Re: [PR] [HUDI-7702] Remove unused method in ReflectUtil [hudi]

2024-05-01 Thread via GitHub


yihua merged PR #11135:
URL: https://github.com/apache/hudi/pull/11135


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7702] Remove unused method in ReflectUtil [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11135:
URL: https://github.com/apache/hudi/pull/11135#issuecomment-2089598670

   
   ## CI report:
   
   * 07e7b3fc02030e1836161055d113795e9cf6240c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23606)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089598608

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * e869465714018ad7085a175529dfc8f700ee867c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23605)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11131:
URL: https://github.com/apache/hudi/pull/11131#issuecomment-2089591663

   
   ## CI report:
   
   * d1de8c5240cf8f3695303a6e118538a87dea82a8 UNKNOWN
   * 7e38f4e8260c1bff3189873cd74dded2c012a7e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23607)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11124:
URL: https://github.com/apache/hudi/pull/11124#issuecomment-2089591532

   
   ## CI report:
   
   * 33909835f589e444771c8c9c6e5bdec15785e397 UNKNOWN
   * 13d4b2235ffd4671b6573996b0f7ac3052226ad0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23586)
 
   * 7161ba385f35b74192f863c40a78f13a8505ec4c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23608)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11131:
URL: https://github.com/apache/hudi/pull/11131#issuecomment-2089584545

   
   ## CI report:
   
   * d1de8c5240cf8f3695303a6e118538a87dea82a8 UNKNOWN
   * 65e6b37c7e41c84a2e37350e77e631d547dc0408 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23598)
 
   * 7e38f4e8260c1bff3189873cd74dded2c012a7e2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11124:
URL: https://github.com/apache/hudi/pull/11124#issuecomment-2089584482

   
   ## CI report:
   
   * 33909835f589e444771c8c9c6e5bdec15785e397 UNKNOWN
   * 13d4b2235ffd4671b6573996b0f7ac3052226ad0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23586)
 
   * 7161ba385f35b74192f863c40a78f13a8505ec4c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089513705

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 5af22d4d68fb12e153472e4a2d7fffb04acb83af Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23604)
 
   * e869465714018ad7085a175529dfc8f700ee867c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23605)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7702] Remove unused method in ReflectUtil [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11135:
URL: https://github.com/apache/hudi/pull/11135#issuecomment-2089494867

   
   ## CI report:
   
   * 07e7b3fc02030e1836161055d113795e9cf6240c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23606)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


danny0405 commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587037847


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -798,7 +795,9 @@ protected void archive(HoodieTable table) {
 }
 try {
   final Timer.Context timerContext = metrics.getArchiveCtx();
-  // We cannot have unbounded commit files. Archive commits if we have to 
archive
+  // We cannot have unbounded commit files. Archive commits if we have to 
archive.
+  // Reload table timeline to reflect the latest commit.
+  table.getMetaClient().reloadActiveTimeline();

Review Comment:
   Only the archive needs this now, because the cleaning may be executed before 
the archiving, and the archiving needs to see these cleaning commits. In all 
other cases, there is no need to refresh either the timeline or the fs view.
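
   A rough sketch of that ordering (assuming the meta client API shown in the 
diff; the surrounding archival step is illustrative):

   ```scala
   import org.apache.hudi.common.table.HoodieTableMetaClient

   // Within one table-service pass, clean can complete right before archive.
   // Reloading makes the newly completed clean instants visible to the archiver.
   def reloadBeforeArchive(metaClient: HoodieTableMetaClient): Unit = {
     metaClient.reloadActiveTimeline() // refresh the cached timeline
     // ... archival planning then runs against the refreshed timeline
   }
   ```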



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089494829

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 5af22d4d68fb12e153472e4a2d7fffb04acb83af Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23604)
 
   * e869465714018ad7085a175529dfc8f700ee867c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


danny0405 commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587037847


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -798,7 +795,9 @@ protected void archive(HoodieTable table) {
 }
 try {
   final Timer.Context timerContext = metrics.getArchiveCtx();
-  // We cannot have unbounded commit files. Archive commits if we have to 
archive
+  // We cannot have unbounded commit files. Archive commits if we have to 
archive.
+  // Reload table timeline to reflect the latest commit.
+  table.getMetaClient().reloadActiveTimeline();

Review Comment:
   Only the archive needs this now, because the cleaning may be executed before 
the archiving. In all other cases, there is no need to refresh either the 
timeline or the fs view.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


danny0405 commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587036637


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##
@@ -1100,7 +1106,7 @@ public void testArchiveRollbacksAndCleanTestTable() 
throws Exception {
   testTable.doClean(cleanInstant, partitionToFileDeleteCount);
 }
 
-for (int i = 5; i <= 13; i += 3) {
+for (int i = 5; i <= 11; i += 2) {

Review Comment:
   The jump step per run is actually the number of new commits in one loop.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


danny0405 commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587036340


##
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java:
##
@@ -289,6 +289,14 @@ public HoodieTestTable moveInflightCommitToComplete(String 
instantTime, HoodieCo
 return this;
   }
 
+  public void moveCompleteCommitToInflight(String instantTime) throws 
IOException {

Review Comment:
   Yeah, maybe we can extend it when the need arises.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


danny0405 commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587035583


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestUtils.java:
##
@@ -118,6 +119,10 @@ public static StreamReadMonitoringFunction 
getMonitorFunc(Configuration conf) {
 return new StreamReadMonitoringFunction(conf, new Path(basePath), 
TestConfigurations.ROW_TYPE, 1024 * 1024L, null);
   }
 
+  public static MockStreamingRuntimeContext getMockRuntimeContext() {
+return new org.apache.hudi.sink.utils.MockStreamingRuntimeContext(false, 
4, 0);

Review Comment:
   yeah



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7702] Remove unused method in ReflectUtil [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11135:
URL: https://github.com/apache/hudi/pull/11135#issuecomment-2089489284

   
   ## CI report:
   
   * 07e7b3fc02030e1836161055d113795e9cf6240c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1587033612


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestHdfsParquetImportProcedure.scala:
##
@@ -112,7 +112,7 @@ class TestHdfsParquetImportProcedure extends 
HoodieSparkProcedureTestBase {
   @throws[ParseException]
   @throws[IOException]
   def createInsertRecords(srcFolder: Path): util.List[GenericRecord] = {
-import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._

Review Comment:
   Fixed now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7702) Remove unused method in ReflectUtil

2024-05-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7702:
-
Labels: pull-request-available  (was: )

> Remove unused method in ReflectUtil
> ---
>
> Key: HUDI-7702
> URL: https://issues.apache.org/jira/browse/HUDI-7702
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> ReflectUtil#createInsertInto is no longer used in the repo and causes an issue 
> for Scala 2.13 support.  We should remove the unused method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7702] Remove unused method in ReflectUtil [hudi]

2024-05-01 Thread via GitHub


yihua opened a new pull request, #11135:
URL: https://github.com/apache/hudi/pull/11135

   ### Change Logs
   
   `ReflectUtil#createInsertInto` is no longer used in the repo and causes an 
issue for Scala 2.13 support.  We should remove the unused method.
   
   ### Impact
   
   Code cleanup.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7702) Remove unused method in ReflectUtil

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7702:

Description: ReflectUtil#createInsertInto is no longer used in the repo and 
causes an issue for Scala 2.13 support.  We should remove the unused method.  
(was: createInsertInto)

> Remove unused method in ReflectUtil
> ---
>
> Key: HUDI-7702
> URL: https://issues.apache.org/jira/browse/HUDI-7702
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>
> ReflectUtil#createInsertInto is no longer used in the repo and causes an issue 
> for Scala 2.13 support.  We should remove the unused method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7702) Remove unused method in ReflectUtil

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7702:

Description: createInsertInto

> Remove unused method in ReflectUtil
> ---
>
> Key: HUDI-7702
> URL: https://issues.apache.org/jira/browse/HUDI-7702
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>
> createInsertInto



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7702) Remove unused method in ReflectUtil

2024-05-01 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7702:
---

 Summary: Remove unused method in ReflectUtil
 Key: HUDI-7702
 URL: https://issues.apache.org/jira/browse/HUDI-7702
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7702) Remove unused method in ReflectUtil

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7702:

Fix Version/s: 0.15.0
   1.0.0

> Remove unused method in ReflectUtil
> ---
>
> Key: HUDI-7702
> URL: https://issues.apache.org/jira/browse/HUDI-7702
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7694) Unify bijection-avro dependency version

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7694.
---
Resolution: Fixed

> Unify bijection-avro dependency version
> ---
>
> Key: HUDI-7694
> URL: https://issues.apache.org/jira/browse/HUDI-7694
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7702) Remove unused method in ReflectUtil

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-7702:
---

Assignee: Ethan Guo

> Remove unused method in ReflectUtil
> ---
>
> Key: HUDI-7702
> URL: https://issues.apache.org/jira/browse/HUDI-7702
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-4372] Enable metadata table by default for flink [hudi]

2024-05-01 Thread via GitHub


codope commented on code in PR #11124:
URL: https://github.com/apache/hudi/pull/11124#discussion_r1587021674


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/TestUtils.java:
##
@@ -118,6 +119,10 @@ public static StreamReadMonitoringFunction 
getMonitorFunc(Configuration conf) {
 return new StreamReadMonitoringFunction(conf, new Path(basePath), 
TestConfigurations.ROW_TYPE, 1024 * 1024L, null);
   }
 
+  public static MockStreamingRuntimeContext getMockRuntimeContext() {
+return new org.apache.hudi.sink.utils.MockStreamingRuntimeContext(false, 
4, 0);

Review Comment:
   Are the hard-coded values passed to construct `MockStreamingRuntimeContext` 
sufficient for all tests?



##
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java:
##
@@ -289,6 +289,14 @@ public HoodieTestTable moveInflightCommitToComplete(String 
instantTime, HoodieCo
 return this;
   }
 
+  public void moveCompleteCommitToInflight(String instantTime) throws 
IOException {

Review Comment:
   Is this method also used for MOR testing with compaction? In that case, 
shouldn't `deleteCommit` be called instead of `deleteDeltaCommit`?



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##
@@ -663,7 +665,11 @@ public void testArchivalWithMultiWriters(boolean 
enableMetadata) throws Exceptio
 
 // do ingestion and trigger archive actions here.
 for (int i = 1; i < 30; i++) {
-  testTable.doWriteOperation("000" + String.format("%02d", i), 
WriteOperationType.UPSERT, i == 1 ? Arrays.asList("p1", "p2") : 
Collections.emptyList(), Arrays.asList("p1", "p2"), 2);
+  String instant = metaClient.createNewInstantTime();
+  if (i == 29) {

Review Comment:
   maybe declare the number of rounds before this loop and use that variable?



##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##
@@ -1100,7 +1106,7 @@ public void testArchiveRollbacksAndCleanTestTable() 
throws Exception {
   testTable.doClean(cleanInstant, partitionToFileDeleteCount);
 }
 
-for (int i = 5; i <= 13; i += 3) {
+for (int i = 5; i <= 11; i += 2) {

Review Comment:
   Can we add a comment above for why we need to jump in steps of 2?



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##
@@ -798,7 +795,9 @@ protected void archive(HoodieTable table) {
 }
 try {
   final Timer.Context timerContext = metrics.getArchiveCtx();
-  // We cannot have unbounded commit files. Archive commits if we have to 
archive
+  // We cannot have unbounded commit files. Archive commits if we have to 
archive.
+  // Reload table timeline to reflect the latest commit.
+  table.getMetaClient().reloadActiveTimeline();

Review Comment:
   If we reload the timeline here, it is possible that the write client and the 
table service client have different fs views. Is that expected? Also, why 
should we not then reload the timeline for other table services before 
executing them?



##
hudi-client/hudi-java-client/src/test/java/org/apache/hudi/client/TestJavaHoodieBackedMetadata.java:
##
@@ -395,25 +397,27 @@ public void 
testMetadataArchivalCleanConfig(HoodieTableType tableType) throws Ex
 .build();
 initWriteConfigAndMetatableWriter(writeConfig, true);
 
-AtomicInteger commitTime = new AtomicInteger(1);
 // Trigger 4 regular writes in data table.
+List instants = new ArrayList<>();
 for (int i = 1; i <= 4; i++) {
-  doWriteOperation(testTable, "00" + (commitTime.getAndIncrement()), 
INSERT);
+  String instant = metaClient.createNewInstantTime();

Review Comment:
   Same here; maybe we can declare a variable `numWrites = 4`. The test would be 
more readable and easier to understand.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1587020774


##
.github/workflows/bot.yml:
##
@@ -454,17 +486,21 @@ jobs:
 env:
   FLINK_PROFILE: ${{ matrix.flinkProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
-  SCALA_PROFILE: 'scala-2.12'
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
 run: |
-  mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS
+  if [ "$SCALA_PROFILE" == "scala-2.13" ]; then
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS -pl 
packaging/hudi-hadoop-mr-bundle,packaging/hudi-kafka-connect-bundle,packaging/hudi-spark-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle,packaging/hudi-metaserver-server-bundle
 -am
+  else
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS

Review Comment:
   The first line uses the Spark profile, while the second one uses the Flink 
profile to build the Flink bundle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089452939

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 43488ee2970b0680b63a212b7c2652bd717cb0db Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23603)
 
   * 5af22d4d68fb12e153472e4a2d7fffb04acb83af Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23604)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089446845

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 9196766e914173f0aa16aa57ca79da036a296dbb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23602)
 
   * 43488ee2970b0680b63a212b7c2652bd717cb0db Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23603)
 
   * 5af22d4d68fb12e153472e4a2d7fffb04acb83af UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7665) Rolling upgrade of 1.0

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7665:
-
Description: 
We need to update the table version due to the format changes in 1.0.


| * Plan to get 1.x readers to read 0.x tables
 * Rollout plan for users
 *  [Writer side] Migrating log files, timeline, metadata
 * Migrating table properties including key generators, payloads.
 * Supporting all the existing payloads or migrating them
 * Call out breaking changes
 * Call out behavior changes|

  was:We need to update the table version due to the format changes in 1.0.


> Rolling upgrade of 1.0 
> ---
>
> Key: HUDI-7665
> URL: https://issues.apache.org/jira/browse/HUDI-7665
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> We need to update the table version due to the format changes in 1.0.
> | * Plan to get 1.x readers to read 0.x tables
>  * Rollout plan for users
>  *  [Writer side] Migrating log files, timeline, metadata
>  * Migrating table properties including key generators, payloads.
>  * Supporting all the existing payloads or migrating them
>  * Call out breaking changes
>  * Call out behavior changes|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7665) Rolling upgrade of 1.0

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-7665:


Assignee: Balaji Varadarajan

> Rolling upgrade of 1.0 
> ---
>
> Key: HUDI-7665
> URL: https://issues.apache.org/jira/browse/HUDI-7665
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> We need to update the table version due to the format changes in 1.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7665) Rolling upgrade of 1.0

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7665:
-
Summary: Rolling upgrade of 1.0   (was: Upgrade Table Version)

> Rolling upgrade of 1.0 
> ---
>
> Key: HUDI-7665
> URL: https://issues.apache.org/jira/browse/HUDI-7665
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> We need to update the table version due to the format changes in 1.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1587003129


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/JavaScalaConverters.scala:
##
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.util
+
+import scala.collection.JavaConverters._
+
+/**
+ * Utils that do conversion between Java and Scala collections.
+ */
+object JavaScalaConverters {

Review Comment:
   Yeah, for now, these are the only ones, limited to usage in Java code.  
Ideally, such conversions should be limited to avoid the overhead.
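
   For reference, a minimal sketch of what such a converter object can look 
like (an assumed shape, not the exact code in the PR):

   ```scala
   import scala.collection.JavaConverters._

   object JavaScalaConverters {
     // Called from Java code, so signatures use plain Java and Scala collection types.
     def convertJavaListToScalaSeq[A](javaList: java.util.List[A]): Seq[A] =
       javaList.asScala.toSeq

     def convertScalaSeqToJavaList[A](scalaSeq: Seq[A]): java.util.List[A] =
       scalaSeq.asJava
   }
   ```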



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1587002563


##
packaging/bundle-validation/base/Dockerfile:
##
@@ -51,9 +52,16 @@ RUN wget 
https://archive.apache.org/dist/flink/flink-$FLINK_VERSION/flink-$FLINK
 && rm $WORKDIR/flink-$FLINK_VERSION-bin-scala_2.12.tgz
 ENV FLINK_HOME=$WORKDIR/flink-$FLINK_VERSION
 
-RUN wget 
https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz
 -P "$WORKDIR" \
-&& tar -xf 
$WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz -C $WORKDIR/ \
-&& rm $WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz
+RUN if [ "$SCALA_VERSION" = "2.13" ]; then \

Review Comment:
   That's the default value: `ARG SPARK_VERSION=3.1.3`.  When we build the 
docker image, we override the arguments as needed, e.g., see 
`packaging/bundle-validation/base/build_flink1180hive313spark350scala213.sh`:
   ```
   docker build \
--build-arg HIVE_VERSION=3.1.3 \
--build-arg FLINK_VERSION=1.18.0 \
--build-arg SPARK_VERSION=3.5.0 \
--build-arg SPARK_HADOOP_VERSION=3 \
--build-arg HADOOP_VERSION=3.3.5 \
--build-arg SCALA_VERSION=2.13 \
-t hudi-ci-bundle-validation-base:flink1180hive313spark350scala213 .
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1587001638


##
hudi-spark-datasource/hudi-spark3.5.x/src/test/java/org/apache/hudi/spark3/internal/TestReflectUtil.java:
##
@@ -42,7 +44,7 @@ public void testDataSourceWriterExtraCommitMetadata() throws 
Exception {
 InsertIntoStatement newStatment = ReflectUtil.createInsertInto(
 statement.table(),
 statement.partitionSpec(),
-scala.collection.immutable.List.empty(),
+((scala.collection.immutable.Seq) 
scala.collection.immutable.Seq$.MODULE$.empty()).toSeq(),

Review Comment:
   I plan to remove `ReflectUtil.createInsertInto` in another PR.  
`ReflectUtil.createInsertInto` is not used in the codebase (a leftover function 
that was used before but is abandoned now).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1587001209


##
hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/hudi/spark3/internal/ReflectUtil.java:
##
@@ -23,7 +23,7 @@
 import org.apache.spark.sql.catalyst.util.DateFormatter;
 
 import scala.Option;
-import scala.collection.Seq;
+import scala.collection.immutable.Seq;
 import scala.collection.immutable.Map;

Review Comment:
   These Scala imports cannot be combined; otherwise, the compiler throws an error.



##
hudi-spark-datasource/hudi-spark3-common/src/main/java/org/apache/hudi/spark3/internal/ReflectUtil.java:
##
@@ -23,7 +23,7 @@
 import org.apache.spark.sql.catalyst.util.DateFormatter;
 
 import scala.Option;
-import scala.collection.Seq;
+import scala.collection.immutable.Seq;
 import scala.collection.immutable.Map;

Review Comment:
   These Scala imports cannot be combined; otherwise, the compiler throws an 
error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586999412


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestHdfsParquetImportProcedure.scala:
##
@@ -112,7 +112,7 @@ class TestHdfsParquetImportProcedure extends 
HoodieSparkProcedureTestBase {
   @throws[ParseException]
   @throws[IOException]
   def createInsertRecords(srcFolder: Path): util.List[GenericRecord] = {
-import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._

Review Comment:
   I don't know the original intention.  I'm only translating what's already 
there.  I'll check if moving the import to the top works.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586998696


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/analysis/TestHoodiePruneFileSourcePartitions.scala:
##
@@ -107,12 +107,12 @@ class TestHoodiePruneFileSourcePartitions extends 
HoodieClientTestBase with Scal
 case "eager" =>
   // NOTE: In case of partitioned table 3 files will be created, 
while in case of non-partitioned just 1
   if (partitioned) {
-assertEquals(1275, f.stats.sizeInBytes.longValue() / 1024)
-assertEquals(1275, lr.stats.sizeInBytes.longValue() / 1024)
+assertEquals(1275, f.stats.sizeInBytes.longValue / 1024)

Review Comment:
   `()` does not compile in Scala 2.13, due to breaking API changes (`longValue` 
on `BigInt` is a parameterless method in 2.13).
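
   For example (my illustration of the 2.13 behavior; `sizeInBytes` stands in 
for Spark's `Statistics.sizeInBytes`, which is a `BigInt`):

   ```scala
   val sizeInBytes: BigInt = BigInt(1305600)
   // Scala 2.12 accepted sizeInBytes.longValue(); in 2.13 longValue is a
   // parameterless method, so it must be called without parentheses:
   val kb: Long = sizeInBytes.longValue / 1024
   ```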



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586999074


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestHoodieTableValuedFunction.scala:
##
@@ -689,6 +690,6 @@ class TestHoodieTableValuedFunction extends 
HoodieSparkSqlTestBase {
 }
   }
 }
-spark.sessionState.conf.unsetConf(SPARK_SQL_INSERT_INTO_OPERATION.key)
+spark.sessionState.conf.unsetConf(SPARK_SQL_INSERT_INTO_OPERATION.key)*/

Review Comment:
   This is a failing test I'm investigating.  I'll fix the test.  Currently I'm 
running other tests to see if there's any other issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586998232


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##
@@ -959,7 +959,7 @@ class TestCOWDataSource extends HoodieSparkClientTestBase 
with ScalaAssertionSup
 assertEquals(insert1Cnt, hoodieROViewDF1.count())
 
 val commitInstantTime1 = HoodieDataSourceHelpers.latestCommit(storage, 
basePath)
-val records2 = recordsToStrings(inserts2Dup ++ inserts2New).toList
+val records2 = recordsToStrings((inserts2Dup.asScala ++ 
inserts2New.asScala).asJava).asScala.toList

Review Comment:
   Fixed this occurrence to be readable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


jonvex commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586997450


##
.github/workflows/bot.yml:
##
@@ -454,17 +486,21 @@ jobs:
 env:
   FLINK_PROFILE: ${{ matrix.flinkProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
-  SCALA_PROFILE: 'scala-2.12'
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
 run: |
-  mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS
+  if [ "$SCALA_PROFILE" == "scala-2.13" ]; then
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS -pl 
packaging/hudi-hadoop-mr-bundle,packaging/hudi-kafka-connect-bundle,packaging/hudi-spark-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle,packaging/hudi-metaserver-server-bundle
 -am
+  else
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS

Review Comment:
   Are you sure? In the else block it currently is:
   ```
   mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS
   mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$FLINK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS -pl 
packaging/hudi-flink-bundle -am -Davro.version=1.10.0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586996825


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##
@@ -959,7 +959,7 @@ class TestCOWDataSource extends HoodieSparkClientTestBase 
with ScalaAssertionSup
 assertEquals(insert1Cnt, hoodieROViewDF1.count())
 
 val commitInstantTime1 = HoodieDataSourceHelpers.latestCommit(storage, 
basePath)
-val records2 = recordsToStrings(inserts2Dup ++ inserts2New).toList
+val records2 = recordsToStrings((inserts2Dup.asScala ++ 
inserts2New.asScala).asJava).asScala.toList

Review Comment:
   This is the equivalent code in Scala 2.13.  I guess Scala 2.13 wants to 
surface all these anti-patterns.
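   A minimal sketch of the pattern with hypothetical data (`recordsToStrings` 
takes and returns Java lists in the test helpers):
   
   ```
   import scala.collection.JavaConverters._
   import java.util.{Arrays => JArrays, List => JList}
   
   // 2.13 drops the implicit JavaConversions, so every Java/Scala collection
   // boundary is crossed explicitly with asScala/asJava.
   val inserts2Dup: JList[String] = JArrays.asList("r1", "r2")
   val inserts2New: JList[String] = JArrays.asList("r3")
   val merged: JList[String] = (inserts2Dup.asScala ++ inserts2New.asScala).asJava
   val records2: List[String] = merged.asScala.toList
   ```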



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586996341


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ValidateMetadataTableFilesProcedure.scala:
##
@@ -115,10 +115,10 @@ class ValidateMetadataTableFilesProcedure() extends 
BaseProcedure with Procedure
   rows.add(Row(partition, file, doesFsFileExists, 
doesMetadataFileExists, fsFileLength, metadataFileLength))
 }
   }
-  if (metadataPathInfoList.length != pathInfoList.length) {
-logError(" FS and metadata files count not matching for " + partition 
+ ". FS files count " + pathInfoList.length + ", metadata base files count " + 
metadataPathInfoList.length)
+  if (metadataPathInfoList.size() != pathInfoList.size()) {

Review Comment:
   Previously, it was equivalent to `metadataPathInfoList.asScala.length`, which 
returns the number of elements in the collection (`metadataPathInfoList` is a 
Java List).  That is the same as `metadataPathInfoList.size()`.
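   A small sketch with hypothetical contents, showing the element count coming 
straight from the Java API:
   
   ```
   import java.util.{ArrayList => JArrayList}
   
   val pathInfoList = new JArrayList[String]()
   pathInfoList.add("partition=p1/base_file.parquet")
   // No .asScala conversion needed just to count elements.
   assert(pathInfoList.size() == 1)
   ```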



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##
@@ -226,7 +226,7 @@ class TestCOWDataSource extends HoodieSparkClientTestBase 
with ScalaAssertionSup
   .save(basePath)
 
 partitionPaths = FSUtils.getAllPartitionPaths(new 
HoodieSparkEngineContext(jsc), HoodieMetadataConfig.newBuilder().build(), 
basePath)
-assertEquals(partitionPaths.length, 1)
+assertEquals(partitionPaths.size(), 1)

Review Comment:
   same here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to class org.apache.spark.sql.vectorized.ColumnarBatch [hudi]

2024-05-01 Thread via GitHub


vicuna96 commented on issue #11106:
URL: https://github.com/apache/hudi/issues/11106#issuecomment-2089425319

   Hi @danny0405, this seems to be in the spark-catalyst_2.12-3.3.2.jar 
package, but org.apache.spark.sql.catalyst.expressions.UnsafeRow does not 
extend org.apache.spark.sql.vectorized.ColumnarBatch. Is this expected in 
different versions?
   
   Hi @ad1happy2go, I can give it a try, but the table should have fewer than 
100 columns; also, this seems like a Spark property rather than a Hudi 
property, and the Spark version has not changed. I will update once I get a 
chance to test it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586994861


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala:
##
@@ -176,12 +176,12 @@ class ExportInstantsProcedure extends BaseProcedure with 
ProcedureBuilder with L
 
   @throws[Exception]
   private def copyNonArchivedInstants(metaClient: HoodieTableMetaClient, 
instants: util.List[HoodieInstant], limit: Int, localFolder: String): Int = {
-import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._
 var copyCount = 0
-if (instants.nonEmpty) {
+if (!instants.isEmpty) {

Review Comment:
   Again, I'm trying to avoid the conversion and use the language-native method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586993646


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##
@@ -263,10 +262,10 @@ object DefaultSource {
 Option(schema)
   }
 
-  val useNewParquetFileFormat = 
parameters.getOrDefault(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(),
+  val useNewParquetFileFormat = 
parameters.asJava.getOrDefault(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(),
 
HoodieReaderConfig.FILE_GROUP_READER_ENABLED.defaultValue().toString).toBoolean 
&&
 !metaClient.isMetadataTable && (globPaths == null || 
globPaths.isEmpty) &&
-!parameters.getOrDefault(SCHEMA_EVOLUTION_ENABLED.key(), 
SCHEMA_EVOLUTION_ENABLED.defaultValue().toString).toBoolean &&
+!parameters.asJava.getOrDefault(SCHEMA_EVOLUTION_ENABLED.key(), 
SCHEMA_EVOLUTION_ENABLED.defaultValue().toString).toBoolean &&

Review Comment:
   Yeah, I missed this one.  Fixed now.  Previously, the conversion to a Java 
Map was implicit (no `.asJava` needed); in Scala 2.13 it has to be explicit 
(`.asJava` is required).  There is no logic change here, but we should avoid 
the conversion when just reading a config value.
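   A minimal sketch with a hypothetical config key (`parameters` is a Scala 
`Map[String, String]`, as at this call site):
   
   ```
   // Reads the flag via the Scala Map's own API, with no Java conversion.
   val parameters: Map[String, String] = Map("hoodie.some.flag" -> "true")
   val flagEnabled: Boolean = parameters.getOrElse("hoodie.some.flag", "false").toBoolean
   ```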



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586992547


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##
@@ -308,7 +309,7 @@ class ColumnStatsIndexSupport(spark: SparkSession,
   }
   }
 
-Row(coalescedRowValuesSeq:_*)
+Row(coalescedRowValuesSeq.toSeq: _*)

Review Comment:
   `coalescedRowValuesSeq` is a `mutable.Seq`.  `toSeq` is a no-op on Scala 2.12 
and earlier; on 2.13 it converts the mutable `Seq` to the immutable `Seq` 
required by Spark 
(https://docs.scala-lang.org/overviews/core/collections-migration-213.html#option-1-migrate-back-to-scalacollectionseq).
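   For illustration, a minimal sketch of the call site with simplified values:
   
   ```
   import scala.collection.mutable
   import org.apache.spark.sql.Row
   
   // Row's varargs expect scala.Seq, which is immutable.Seq on 2.13; toSeq
   // copies the mutable buffer there and is a no-op on 2.12 and earlier.
   val coalescedRowValuesSeq: mutable.Seq[Any] = mutable.ArrayBuffer(1, "a", null)
   val row: Row = Row(coalescedRowValuesSeq.toSeq: _*)
   ```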



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586990669


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieInternalRowUtils.scala:
##
@@ -18,11 +18,12 @@
 
 package org.apache.spark.sql
 
-import org.apache.avro.Schema

Review Comment:
   Some imports are reordered to conform to the import ordering convention.  
`scala.jdk.CollectionConverters.collectionAsScalaIterableConverter` is removed, 
and `scala.collection.JavaConverters._` is used instead.  They are basically 
the same; it's just that `scala.collection.JavaConverters` is available across 
all Scala versions.
   ```
   object CollectionConverters extends DecorateAsJava with DecorateAsScala
   object JavaConverters extends DecorateAsJava with DecorateAsScala
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586989393


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieDatasetBulkInsertHelper.scala:
##
@@ -241,17 +241,16 @@ object HoodieDatasetBulkInsertHelper
 }
   }
 
-  private def getPartitionPathFields(config: HoodieWriteConfig): Seq[String] = 
{
+  private def getPartitionPathFields(config: HoodieWriteConfig): 
mutable.Seq[String] = {

Review Comment:
   There are immutable and mutable `Seq`.  In Scala 2.13, `scala.Seq[+A]` is 
now an alias for `scala.collection.immutable.Seq[A]` (instead of 
`scala.collection.Seq[A]`) (see 
https://docs.scala-lang.org/overviews/core/collections-migration-213.html).  
Where an immutable copy is not necessary, we pass around `mutable.Seq` 
explicitly.
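   A minimal sketch of the distinction, with a simplified signature (the real 
method takes a `HoodieWriteConfig`):
   
   ```
   import scala.collection.mutable
   
   // On 2.12 an ArrayBuffer satisfies scala.Seq; on 2.13 scala.Seq means
   // immutable.Seq, so the mutable result type must be declared explicitly.
   def getPartitionPathFields(csv: String): mutable.Seq[String] =
     mutable.ArrayBuffer(csv.split(","): _*)
   ```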



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r158697


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieConversionUtils.scala:
##
@@ -30,9 +31,7 @@ object HoodieConversionUtils {
* a mutable one)
*/
   def mapAsScalaImmutableMap[K, V](map: ju.Map[K, V]): Map[K, V] = {
-// NOTE: We have to use deprecated [[JavaConversions]] to stay compatible 
w/ Scala 2.11

Review Comment:
   It's still compatible.  `scala.collection.JavaConverters` works across Scala 
2.11, 2.12, and 2.13.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586988569


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionUtils.scala:
##
@@ -18,20 +18,20 @@
 
 package org.apache.hudi
 
-import org.apache.avro.Schema.Type
-import org.apache.avro.generic.GenericRecord
-import org.apache.avro.{JsonProperties, Schema}
 import org.apache.hudi.HoodieSparkUtils.sparkAdapter
 import org.apache.hudi.avro.AvroSchemaUtils
 import org.apache.hudi.exception.SchemaCompatibilityException
 import org.apache.hudi.internal.schema.HoodieSchemaException
+
+import org.apache.avro.Schema.Type
+import org.apache.avro.generic.GenericRecord
+import org.apache.avro.{JsonProperties, Schema}
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}
 import org.apache.spark.sql.{Dataset, Row, SparkSession}
 
-import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._

Review Comment:
   `JavaScalaConverters` is only meant to be used by Java classes, to make 
conversion easier in Java code.  For Scala classes, it's better to use 
`scala.collection.JavaConverters._` as 
[recommended](https://www.scala-lang.org/api/2.13.x/scala/collection/JavaConverters$.html)
 and to explicitly use `.asScala` or `.asJava`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089411018

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 9196766e914173f0aa16aa57ca79da036a296dbb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23602)
 
   * 43488ee2970b0680b63a212b7c2652bd717cb0db Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23603)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


yihua commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586987506


##
.github/workflows/bot.yml:
##
@@ -454,17 +486,21 @@ jobs:
 env:
   FLINK_PROFILE: ${{ matrix.flinkProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
-  SCALA_PROFILE: 'scala-2.12'
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
 run: |
-  mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS
+  if [ "$SCALA_PROFILE" == "scala-2.13" ]; then
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS -pl 
packaging/hudi-hadoop-mr-bundle,packaging/hudi-kafka-connect-bundle,packaging/hudi-spark-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle,packaging/hudi-metaserver-server-bundle
 -am
+  else
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS

Review Comment:
   We need this for the other Scala versions.  For Scala 2.13, we can only 
build the Spark-related bundles.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089405801

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 9196766e914173f0aa16aa57ca79da036a296dbb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23602)
 
   * 43488ee2970b0680b63a212b7c2652bd717cb0db UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089400409

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * 9196766e914173f0aa16aa57ca79da036a296dbb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23602)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-6495) Finalize the RFC-61/Non-blocking Concurrency Control design

2024-05-01 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842818#comment-17842818
 ] 

Danny Chen commented on HUDI-6495:
--

The MDT compaction has already been switched to the NBCC style, which 
interplays well with async table services.

Also, I'm working on unblocking the MDT initialization with pending instants on 
the DT.

> Finalize the RFC-61/Non-blocking Concurrency Control design
> ---
>
> Key: HUDI-6495
> URL: https://issues.apache.org/jira/browse/HUDI-6495
> Project: Apache Hudi
>  Issue Type: Task
>  Components: multi-writer
>Reporter: Vinoth Chandar
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4372) Enable metadata table by default for flink

2024-05-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-4372:
-
Status: Patch Available  (was: In Progress)

> Enable metadata table by default for flink
> --
>
> Key: HUDI-4372
> URL: https://issues.apache.org/jira/browse/HUDI-4372
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink, metadata
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4372) Enable metadata table by default for flink

2024-05-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-4372:
-
Reviewers: Ethan Guo

> Enable metadata table by default for flink
> --
>
> Key: HUDI-4372
> URL: https://issues.apache.org/jira/browse/HUDI-4372
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink, metadata
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6296) Add Scala 2.13 build profile to support scala 2.13

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6296:

Reviewers: Jonathan Vexler

> Add Scala 2.13 build profile to support scala 2.13
> --
>
> Key: HUDI-6296
> URL: https://issues.apache.org/jira/browse/HUDI-6296
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Aditya Goenka
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6296) Add Scala 2.13 build profile to support scala 2.13

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6296:

Status: Patch Available  (was: In Progress)

> Add Scala 2.13 build profile to support scala 2.13
> --
>
> Key: HUDI-6296
> URL: https://issues.apache.org/jira/browse/HUDI-6296
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Aditya Goenka
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7701) Metadata table initialization with pending instants

2024-05-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7701:
-
Sprint: Sprint 2023-04-26

> Metadata table initialization with pending instants
> ---
>
> Key: HUDI-7701
> URL: https://issues.apache.org/jira/browse/HUDI-7701
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>
> Metadata table can still initialize when there are pending instants on the 
> dataset. This is critical for streaming ingestion because the streaming 
> writers always leave a pending instant on the dataset timeline.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7672) Fix the Hive server scratch dir for tests in hudi-utilities

2024-05-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7672.

Resolution: Fixed

> Fix the Hive server scratch dir for tests in hudi-utilities
> ---
>
> Key: HUDI-7672
> URL: https://issues.apache.org/jira/browse/HUDI-7672
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Currently a null/hive/${user} dir is left over when the tests finish, which 
> also introduces some permission access issues for Azure CI test reports.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7701) Metadata table initialization with pending instants

2024-05-01 Thread Danny Chen (Jira)
Danny Chen created HUDI-7701:


 Summary: Metadata table initialization with pending instants
 Key: HUDI-7701
 URL: https://issues.apache.org/jira/browse/HUDI-7701
 Project: Apache Hudi
  Issue Type: Improvement
  Components: core
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.0.0


Metadata table can still initialize when there are pending instants on the 
dataset. This is critical for streaming ingestion because the streaming writers 
always leave a pending instant on the dataset timeline.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7701) Metadata table initialization with pending instants

2024-05-01 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7701:
-
Status: In Progress  (was: Open)

> Metadata table initialization with pending instants
> ---
>
> Key: HUDI-7701
> URL: https://issues.apache.org/jira/browse/HUDI-7701
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: core
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 1.0.0
>
>
> Metadata table can still initialize when there are pending instants on the 
> dataset. This is critical for streaming ingestion because the streaming 
> writers always leave a pending instant on the dataset timeline.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-7633) Use try with resources for AutoCloseable

2024-05-01 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-7633.
---
Resolution: Fixed

> Use try with resources for AutoCloseable
> 
>
> Key: HUDI-7633
> URL: https://issues.apache.org/jira/browse/HUDI-7633
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7538) Consolidate the CDC Formats (changelog format, RFC-51)

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7538:
-
Description: 
For the sake of more consistency, we need to consolidate the changelog mode 
(currently supported for Flink MoR) and the RFC-51-based CDC feature, which is 
a debezium-style change log (currently supported for CoW for Spark/Flink).

 
|Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
|CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
|Changelog|Yes|low|low|Yes|

This proposal is to converge on "CDC" as the path going forward, with the 
following changes incorporated to support existing users/usage of changelog. 
The CDC format is more generalized in the database world. It offers advantages 
like not requiring further downstream processing to, say, stitch together +U 
and -U to update a downstream table. E.g., a field that changed is a key in a 
downstream table, so we need both +U and -U to compute the updates. 

 

(A) Introduce a new "changelog" output mode for CDC queries, which generates 
the I, +U, -U, D format that changelog consumers need (this can be constructed 
easily by post-processing the output of a CDC query as follows; see the sketch 
after the list)
 * when before is `null`, emit I
 * when after is `null`, emit D
 * when both are non-null, emit two records +U and -U
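
A minimal sketch of that derivation (hypothetical record shape; Hudi's actual 
CDC payload carries more fields):

```
case class CdcRow(before: Option[String], after: Option[String])

// Derive changelog ops (I, +U, -U, D) from a debezium-style before/after pair.
def toChangelog(r: CdcRow): Seq[(String, String)] = (r.before, r.after) match {
  case (None, Some(after))         => Seq(("I", after))                  // insert
  case (Some(before), None)        => Seq(("D", before))                 // delete
  case (Some(before), Some(after)) => Seq(("-U", before), ("+U", after)) // update
  case (None, None)                => Seq.empty
}
```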

(B) New writes in 1.0 will *ONLY* produce the .cdc changelog format, and stop 
publishing to the _hoodie_operation field 
 # this means anyone querying this field using a snapshot query will break.
 # we will bring this back in 1.1 etc., based on user feedback, as a hidden 
field in the FlinkCatalog.

(C) To support backwards compatibility, we fall back to reading 
`_hoodie_operation` in 0.X tables. 

For CDC reads, we first use the CDC log if it's available for that file slice. 
If not, and the base file schema already has {{_hoodie_operation}}, we fall 
back to reading {{_hoodie_operation}} from the base file if mode=OP_KEY_ONLY. 
Throw an error for other modes. 



(D) Snapshot queries from Spark, Presto, Trino, etc. all work with tables that 
have `_hoodie_operation` published. 

 This is already completed for Spark, so others should be easy to do. 

 

(E) We need to complete a review of the CDC schema

ts - should be completion time or instant time?

 

 

  was:
For sake of more consistency, we need to consolidate the the changelog mode 
(currently supported for Flink MoR) and RFC-51 based CDC feature which is a 
debezium style change log (currently supported for CoW for Spark/Flink)

 
|Format Name|CDC Source Required|Resource Cost(writer)|Resource 
Cost(reader)|Friendly to Streaming|
|CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium 
style output is not what Flink needs for e.g)|
|Changelog|Yes|low|low|Yes|

This proposal is to converge onto "CDC" as the path going forward, with the 
following changes to incorporate. 

 

 

 

 

 


> Consolidate the CDC Formats (changelog format, RFC-51)
> --
>
> Key: HUDI-7538
> URL: https://issues.apache.org/jira/browse/HUDI-7538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> For the sake of more consistency, we need to consolidate the changelog mode 
> (currently supported for Flink MoR) and the RFC-51-based CDC feature, which 
> is a debezium-style change log (currently supported for CoW for Spark/Flink).
>  
> |Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
> |CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
> |Changelog|Yes|low|low|Yes|
> This proposal is to converge on "CDC" as the path going forward, with the 
> following changes incorporated to support existing users/usage of changelog. 
> The CDC format is more generalized in the database world. It offers 
> advantages like not requiring further downstream processing to, say, stitch 
> together +U and -U to update a downstream table. E.g., a field that changed 
> is a key in a downstream table, so we need both +U and -U to compute the 
> updates. 
>  
> (A) Introduce a new "changelog" output mode for CDC queries, which generates 
> the I, +U, -U, D format that changelog consumers need (this can be 
> constructed easily by post-processing the output of a CDC query as follows)
>  * when before is `null`, emit I
>  * when after is `null`, emit D
>  * when both are non-null, emit two records +U and -U
> (B) New writes in 1.0 will *ONLY* produce the .cdc changelog format, and stop 
> publishing to _hoodi

[jira] [Updated] (HUDI-7538) Consolidate the CDC Formats (changelog format, RFC-51)

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7538:
-
Reviewers: Danny Chen, Ethan Guo

> Consolidate the CDC Formats (changelog format, RFC-51)
> --
>
> Key: HUDI-7538
> URL: https://issues.apache.org/jira/browse/HUDI-7538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> For the sake of more consistency, we need to consolidate the changelog mode 
> (currently supported for Flink MoR) and the RFC-51-based CDC feature, which 
> is a debezium-style change log (currently supported for CoW for Spark/Flink).
>  
> |Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
> |CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
> |Changelog|Yes|low|low|Yes|
> This proposal is to converge on "CDC" as the path going forward, with the 
> following changes incorporated to support existing users/usage of changelog. 
> The CDC format is more generalized in the database world. It offers 
> advantages like not requiring further downstream processing to, say, stitch 
> together +U and -U to update a downstream table. E.g., a field that changed 
> is a key in a downstream table, so we need both +U and -U to compute the 
> updates. 
>  
> (A) Introduce a new "changelog" output mode for CDC queries, which generates 
> the I, +U, -U, D format that changelog consumers need (this can be 
> constructed easily by post-processing the output of a CDC query as follows)
>  * when before is `null`, emit I
>  * when after is `null`, emit D
>  * when both are non-null, emit two records +U and -U
> (B) New writes in 1.0 will *ONLY* produce the .cdc changelog format, and stop 
> publishing to the _hoodie_operation field 
>  # this means anyone querying this field using a snapshot query will break.
>  # we will bring this back in 1.1 etc., based on user feedback, as a 
> hidden field in the FlinkCatalog.
> (C) To support backwards compatibility, we fall back to reading 
> `_hoodie_operation` in 0.X tables. 
> For CDC reads, we first use the CDC log if it's available for that file 
> slice. If not, and the base file schema already has {{_hoodie_operation}}, 
> we fall back to reading {{_hoodie_operation}} from the base file if 
> mode=OP_KEY_ONLY. Throw an error for other modes. 
> (D) Snapshot queries from Spark, Presto, Trino, etc. all work with tables 
> that have `_hoodie_operation` published. 
>  This is already completed for Spark, so others should be easy to do. 
>  
> (E) We need to complete a review of the CDC schema
> ts - should be completion time or instant time?
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7538) Consolidate the CDC Formats (changelog format, RFC-51)

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7538:
-
Description: 
For the sake of more consistency, we need to consolidate the changelog mode 
(currently supported for Flink MoR) and the RFC-51-based CDC feature, which is 
a debezium-style change log (currently supported for CoW for Spark/Flink).

 
|Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
|CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
|Changelog|Yes|low|low|Yes|

This proposal is to converge onto "CDC" as the path going forward, with the 
following changes to incorporate. 

 

 

 

 

 

  was:
For sake of more consistency, we need to consolidate the the changelog mode 
(currently supported for Flink MoR) and RFC-51 based CDC feature which is a 
debezium style change log (currently supported for CoW for Spark/Flink)

 
|Format Name|CDC Source Required|Resource Cost(writer)|Resource 
Cost(reader)|Friendly to Streaming|
|CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium 
style output is not what Flink needs for e.g)|
|Changelog|Yes|low|low|Yes|


This proposal is to converge onto "CDC" as the path going forward, with the 
following changes to incorporate. 







 

 

 


> Consolidate the CDC Formats (changelog format, RFC-51)
> --
>
> Key: HUDI-7538
> URL: https://issues.apache.org/jira/browse/HUDI-7538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> For the sake of more consistency, we need to consolidate the changelog mode 
> (currently supported for Flink MoR) and the RFC-51-based CDC feature, which 
> is a debezium-style change log (currently supported for CoW for Spark/Flink).
>  
> |Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
> |CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
> |Changelog|Yes|low|low|Yes|
> This proposal is to converge onto "CDC" as the path going forward, with the 
> following changes to incorporate. 
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7538) Consolidate the CDC Formats (changelog format, RFC-51)

2024-05-01 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-7538:
-
Description: 
For the sake of more consistency, we need to consolidate the changelog mode 
(currently supported for Flink MoR) and the RFC-51-based CDC feature, which is 
a debezium-style change log (currently supported for CoW for Spark/Flink).

 
|Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
|CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
|Changelog|Yes|low|low|Yes|


This proposal is to converge onto "CDC" as the path going forward, with the 
following changes to incorporate. 







 

 

 

> Consolidate the CDC Formats (changelog format, RFC-51)
> --
>
> Key: HUDI-7538
> URL: https://issues.apache.org/jira/browse/HUDI-7538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: storage-management
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> For the sake of more consistency, we need to consolidate the changelog mode 
> (currently supported for Flink MoR) and the RFC-51-based CDC feature, which 
> is a debezium-style change log (currently supported for CoW for Spark/Flink).
>  
> |Format Name|CDC Source Required|Resource Cost (writer)|Resource Cost (reader)|Friendly to Streaming|
> |CDC|*No*|low/high|low/high (based on logging modes we choose)|No (the debezium-style output is not what Flink needs, e.g.)|
> |Changelog|Yes|low|low|Yes|
> This proposal is to converge onto "CDC" as the path going forward, with the 
> following changes to incorporate. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


jonvex commented on code in PR #11130:
URL: https://github.com/apache/hudi/pull/11130#discussion_r1586941487


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieConversionUtils.scala:
##
@@ -30,9 +31,7 @@ object HoodieConversionUtils {
* a mutable one)
*/
   def mapAsScalaImmutableMap[K, V](map: ju.Map[K, V]): Map[K, V] = {
-// NOTE: We have to use deprecated [[JavaConversions]] to stay compatible 
w/ Scala 2.11

Review Comment:
   is this not compatible with scala2.11 anymore?



##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieDatasetBulkInsertHelper.scala:
##
@@ -241,17 +241,16 @@ object HoodieDatasetBulkInsertHelper
 }
   }
 
-  private def getPartitionPathFields(config: HoodieWriteConfig): Seq[String] = 
{
+  private def getPartitionPathFields(config: HoodieWriteConfig): 
mutable.Seq[String] = {

Review Comment:
   what is the purpose of this change?



##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ValidateMetadataTableFilesProcedure.scala:
##
@@ -115,10 +115,10 @@ class ValidateMetadataTableFilesProcedure() extends 
BaseProcedure with Procedure
   rows.add(Row(partition, file, doesFsFileExists, 
doesMetadataFileExists, fsFileLength, metadataFileLength))
 }
   }
-  if (metadataPathInfoList.length != pathInfoList.length) {
-logError(" FS and metadata files count not matching for " + partition 
+ ". FS files count " + pathInfoList.length + ", metadata base files count " + 
metadataPathInfoList.length)
+  if (metadataPathInfoList.size() != pathInfoList.size()) {

Review Comment:
   size and length are the same? Usually one is the capacity and the other is 
the usage



##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala:
##
@@ -176,12 +176,12 @@ class ExportInstantsProcedure extends BaseProcedure with 
ProcedureBuilder with L
 
   @throws[Exception]
   private def copyNonArchivedInstants(metaClient: HoodieTableMetaClient, 
instants: util.List[HoodieInstant], limit: Int, localFolder: String): Int = {
-import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._
 var copyCount = 0
-if (instants.nonEmpty) {
+if (!instants.isEmpty) {

Review Comment:
   They got rid of nonEmpty!!!?



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/analysis/TestHoodiePruneFileSourcePartitions.scala:
##
@@ -107,12 +107,12 @@ class TestHoodiePruneFileSourcePartitions extends 
HoodieClientTestBase with Scal
 case "eager" =>
   // NOTE: In case of partitioned table 3 files will be created, 
while in case of non-partitioned just 1
   if (partitioned) {
-assertEquals(1275, f.stats.sizeInBytes.longValue() / 1024)
-assertEquals(1275, lr.stats.sizeInBytes.longValue() / 1024)
+assertEquals(1275, f.stats.sizeInBytes.longValue / 1024)

Review Comment:
   They require no empty ()?



##
hudi-spark-datasource/hudi-spark3.5.x/src/test/java/org/apache/hudi/spark3/internal/TestReflectUtil.java:
##
@@ -42,7 +44,7 @@ public void testDataSourceWriterExtraCommitMetadata() throws 
Exception {
 InsertIntoStatement newStatment = ReflectUtil.createInsertInto(
 statement.table(),
 statement.partitionSpec(),
-scala.collection.immutable.List.empty(),
+((scala.collection.immutable.Seq) 
scala.collection.immutable.Seq$.MODULE$.empty()).toSeq(),

Review Comment:
   ?. I guess disabled because it doesn't work?



##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieInternalRowUtils.scala:
##
@@ -18,11 +18,12 @@
 
 package org.apache.spark.sql
 
-import org.apache.avro.Schema

Review Comment:
   Not sure what is going on with this file?



##
.github/workflows/bot.yml:
##
@@ -454,17 +486,21 @@ jobs:
 env:
   FLINK_PROFILE: ${{ matrix.flinkProfile }}
   SPARK_PROFILE: ${{ matrix.sparkProfile }}
-  SCALA_PROFILE: 'scala-2.12'
+  SCALA_PROFILE: ${{ matrix.scalaProfile }}
 run: |
-  mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS
+  if [ "$SCALA_PROFILE" == "scala-2.13" ]; then
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS -pl 
packaging/hudi-hadoop-mr-bundle,packaging/hudi-kafka-connect-bundle,packaging/hudi-spark-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle,packaging/hudi-metaserver-server-bundle
 -am
+  else
+mvn clean package -T 2 -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" 
-DdeployArtifacts=true -DskipTests=true $MVN_ARGS

Review Comment:
   I think maybe th

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089355649

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * ef6d315d941cc770a3212fe7530294fdec30f749 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23599)
 
   * 9196766e914173f0aa16aa57ca79da036a296dbb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23602)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7700) Support query hint to inject indexes in query plans

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7700:
--
Sprint: Sprint 2023-04-26

> Support query hint to inject indexes in query plans
> ---
>
> Key: HUDI-7700
> URL: https://issues.apache.org/jira/browse/HUDI-7700
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> [Hints|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html]
>  give users a way to suggest specific approaches for SQL to use when 
> generating its execution plan. Simply creating an index, such as a functional 
> index or secondary index, does not necessarily ensure its usage in query 
> planning. While we have a hierarchy of indexes to use in `HoodieFileIndex`, 
> we want a way for users to explicitly specify a particular index to use while 
> planning (for instance, during an index join).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7700) Support query hint to inject indexes in query plans

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7700:
--
Story Points: 6

> Support query hint to inject indexes in query plans
> ---
>
> Key: HUDI-7700
> URL: https://issues.apache.org/jira/browse/HUDI-7700
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> [Hints|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html]
>  give users a way to suggest specific approaches for SQL to use when 
> generating its execution plan. Simply creating an index, such as a functional 
> index or secondary index, does not necessarily ensure its usage in query 
> planning. While we have a hierarchy of indexes to use in `HoodieFileIndex`, 
> we want a way for users to explicitly specify a particular index to use while 
> planning (for instance, during an index join).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #11130:
URL: https://github.com/apache/hudi/pull/11130#issuecomment-2089320198

   
   ## CI report:
   
   * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN
   * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN
   * ef6d315d941cc770a3212fe7530294fdec30f749 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23599)
 
   * 9196766e914173f0aa16aa57ca79da036a296dbb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7700) Support query hint to inject indexes in query plans

2024-05-01 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7700:
-

 Summary: Support query hint to inject indexes in query plans
 Key: HUDI-7700
 URL: https://issues.apache.org/jira/browse/HUDI-7700
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit
Assignee: Sagar Sumit
 Fix For: 1.0.0


[Hints|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html]
 give users a way to suggest specific approaches for SQL to use when generating 
its execution plan. Simply creating an index, such as a functional index or 
secondary index, does not necessarily ensure its usage in query planning. While 
we have a hierarchy of indexes to use in `HoodieFileIndex`, we want a way for 
users to explicitly specify a particular index to use while planning (for 
instance, during an index join).
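
As a sketch of the intended usage (hypothetical hint name and table; the 
concrete syntax is exactly what this task would define), by analogy with 
Spark's built-in join hints:

```
// Assumes a SparkSession `spark`; INDEX(...) is a made-up hint for illustration.
val rows = spark.sql(
  """SELECT /*+ INDEX(hudi_tbl sec_idx_city) */ *
    |FROM hudi_tbl
    |WHERE city = 'chennai'""".stripMargin)
```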



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7661) Create index readme to show how a new index implementation can be added

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7661:
--
Summary: Create index readme to show how a new index implementation can be 
added  (was: Update docs to show how a new index implementation can be added)

> Create index readme to show how a new index implementation can be added
> ---
>
> Key: HUDI-7661
> URL: https://issues.apache.org/jira/browse/HUDI-7661
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7661) Update docs to show how a new index implementation can be added

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7661:
--
Story Points: 0.5  (was: 1)

> Update docs to show how a new index implementation can be added
> ---
>
> Key: HUDI-7661
> URL: https://issues.apache.org/jira/browse/HUDI-7661
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7661) Update docs to show how a new index implementation can be added

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7661:
--
Sprint: Sprint 2023-04-26

> Update docs to show how a new index implementation can be added
> ---
>
> Key: HUDI-7661
> URL: https://issues.apache.org/jira/browse/HUDI-7661
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]

2024-05-01 Thread via GitHub


hudi-bot commented on PR #9979:
URL: https://github.com/apache/hudi/pull/9979#issuecomment-2089312742

   
   ## CI report:
   
   * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN
   * 5ea3b0b905186b2701ee57f466cbec82043ddbea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23601)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-7661) Update docs to show how a new index implementation can be added

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-7661:
-

Assignee: Sagar Sumit

> Update docs to show how a new index implementation can be added
> ---
>
> Key: HUDI-7661
> URL: https://issues.apache.org/jira/browse/HUDI-7661
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7696) Consolidate convertFilesToPartitionStatsRecords and convertMetadataToPartitionStatsRecords

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-7696:
-

Assignee: Sagar Sumit

> Consolidate convertFilesToPartitionStatsRecords and 
> convertMetadataToPartitionStatsRecords
> --
>
> Key: HUDI-7696
> URL: https://issues.apache.org/jira/browse/HUDI-7696
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Minor
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584149612



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7661) Update docs to show how a new index implementation can be added

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7661:
--
Story Points: 1

> Update docs to show how a new index implementation can be added
> ---
>
> Key: HUDI-7661
> URL: https://issues.apache.org/jira/browse/HUDI-7661
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7691) Move MDT partition type related logic in HoodieBackedTableMetadataWriter to MetadataPartitionType

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-7691:
-

Assignee: Sagar Sumit

> Move MDT partition type related logic in HoodieBackedTableMetadataWriter to 
> MetadataPartitionType
> -
>
> Key: HUDI-7691
> URL: https://issues.apache.org/jira/browse/HUDI-7691
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584129779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7692) Move MDT partition type code in HoodieMetadataPayload to MetadataPartitionType

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7692:
--
Sprint: Sprint 2023-04-26

> Move MDT partition type code in HoodieMetadataPayload to MetadataPartitionType
> --
>
> Key: HUDI-7692
> URL: https://issues.apache.org/jira/browse/HUDI-7692
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584137942



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7696) Consolidate convertFilesToPartitionStatsRecords and convertMetadataToPartitionStatsRecords

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7696:
--
Sprint: Sprint 2023-04-26

> Consolidate convertFilesToPartitionStatsRecords and 
> convertMetadataToPartitionStatsRecords
> --
>
> Key: HUDI-7696
> URL: https://issues.apache.org/jira/browse/HUDI-7696
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Minor
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584149612



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7691) Move MDT partition type related logic in HoodieBackedTableMetadataWriter to MetadataPartitionType

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7691:
--
Sprint: Sprint 2023-04-26

> Move MDT partition type related logic in HoodieBackedTableMetadataWriter to 
> MetadataPartitionType
> -
>
> Key: HUDI-7691
> URL: https://issues.apache.org/jira/browse/HUDI-7691
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584129779



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7692) Move MDT partition type code in HoodieMetadataPayload to MetadataPartitionType

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit reassigned HUDI-7692:
-

Assignee: Sagar Sumit

> Move MDT partition type code in HoodieMetadataPayload to MetadataPartitionType
> --
>
> Key: HUDI-7692
> URL: https://issues.apache.org/jira/browse/HUDI-7692
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584137942



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7690) Initialize all indexes in parallel instead of computing type by type.

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7690:
--
Story Points: 2

> Initialize all indexes in parallel instead of computing type by type.
> -
>
> Key: HUDI-7690
> URL: https://issues.apache.org/jira/browse/HUDI-7690
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10352#discussion_r1584141789



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
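
The change HUDI-7690 describes, initializing all metadata index partitions concurrently rather than one type at a time, could look roughly like the sketch below; initializePartition and the hard-coded partition list are stand-ins for illustration, not the actual patch.

```java
// Parallel-initialization sketch: one task per index type, then wait for all.
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelIndexInitSketch {

  public static void main(String[] args) {
    List<String> partitionTypes =
        List.of("FILES", "COLUMN_STATS", "BLOOM_FILTERS", "RECORD_INDEX");
    ExecutorService pool = Executors.newFixedThreadPool(partitionTypes.size());
    try {
      List<CompletableFuture<Void>> tasks = partitionTypes.stream()
          .map(type -> CompletableFuture.runAsync(() -> initializePartition(type), pool))
          .collect(Collectors.toList());
      // Wait for every index to finish; join() propagates the first failure.
      CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
    } finally {
      pool.shutdown();
    }
  }

  // Hypothetical per-type initialization (e.g. scan files, write records).
  private static void initializePartition(String type) {
    System.out.println("initializing metadata index partition: " + type);
  }
}
```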


[jira] [Updated] (HUDI-7662) Expose a config to enable/disable functional index

2024-05-01 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7662:
--
Story Points: 1

> Expose a config to enable/disable functional index
> --
>
> Key: HUDI-7662
> URL: https://issues.apache.org/jira/browse/HUDI-7662
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
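
Hudi write configs are declared through the ConfigProperty builder in hudi-common, and a gating flag for the functional index would presumably follow the same pattern. A minimal sketch, assuming the hudi-common dependency is on the classpath; the key name below is a guess for illustration, not the one the ticket eventually adds.

```java
// Hudi-style config flag sketch: the key name is hypothetical.
import org.apache.hudi.common.config.ConfigProperty;

public class FunctionalIndexConfigSketch {

  public static final ConfigProperty<Boolean> FUNCTIONAL_INDEX_ENABLE = ConfigProperty
      .key("hoodie.functional.index.enable")   // hypothetical key
      .defaultValue(true)
      .withDocumentation("Whether to build and use the functional index. "
          + "Set to false to disable it.");
}
```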

