Re: [PR] [HUDI-7975] Provide an API to create empty commit [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11606:
URL: https://github.com/apache/hudi/pull/11606#issuecomment-2219689162

   
   ## CI report:
   
   * 7c2dc1d616944a7e24693e7710005c52fc446601 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24804)
 
   
   



Re: [PR] [HUDI-7975] Provide an API to create empty commit [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11606:
URL: https://github.com/apache/hudi/pull/11606#issuecomment-2219677617

   
   ## CI report:
   
   * 7c2dc1d616944a7e24693e7710005c52fc446601 UNKNOWN
   
   



Re: [PR] build: add info for rust and python artifacts [hudi-rs]

2024-07-09 Thread via GitHub


codecov[bot] commented on PR #60:
URL: https://github.com/apache/hudi-rs/pull/60#issuecomment-2219668014

   ## 
[Codecov](https://app.codecov.io/gh/apache/hudi-rs/pull/60?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 87.19%. Comparing base 
[(`78a558f`)](https://app.codecov.io/gh/apache/hudi-rs/commit/78a558f00c8a6c4556db5ee98f26369fd90fabcf?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`97fa31d`)](https://app.codecov.io/gh/apache/hudi-rs/commit/97fa31d7937583b56e0a900b6a50c58c40f44f6a?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   Additional details and impacted files
   
   
   ```diff
   @@           Coverage Diff           @@
   ##             main      #60   +/-   ##
   =========================================
     Coverage   87.19%   87.19%
   =========================================
     Files          13       13
     Lines         687      687
   =========================================
     Hits          599      599
     Misses         88       88
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/hudi-rs/pull/60?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   





Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219666454

   
   ## CI report:
   
   * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
   * 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)
 
   * c14015c3618d231bc439c0a4fb14ce2dff32de00 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24803)
 
   
   



[PR] build: add info for rust and python artifacts [hudi-rs]

2024-07-09 Thread via GitHub


xushiyan opened a new pull request, #60:
URL: https://github.com/apache/hudi-rs/pull/60

   - Make `datafusion` a feature to hudi crate
   - Add `__version__` to python package
   - Add more info for package repositories





[PR] [HUDI-3625][RFC-60] Add StorageStrategy to HoodieStorage [hudi]

2024-07-09 Thread via GitHub


CTTY opened a new pull request, #11607:
URL: https://github.com/apache/hudi/pull/11607

   ### Change Logs
   
   This is a part of RFC-60: Object Storage Storage Strategy 
https://github.com/apache/hudi/blob/master/rfc/rfc-60/rfc-60.md. The end goal 
is to leverage the HoodieStorage layer to further separate Hudi logic from File 
IO, allowing more flexibility in the physical location of files.
   
   This PR will add StorageStrategy to HoodieStorage, but StorageStrategy will 
NOT be used anywhere just yet.
   
   ### Impact
   
   No impact. 
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7975) Transfer extra metadata to new commits when new data is not ingested to trigger table services on the dataset

2024-07-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7975:
-
Labels: pull-request-available  (was: )

> Transfer extra metadata to new commits when new data is not ingested to 
> trigger table services on the dataset
> ---
>
> Key: HUDI-7975
> URL: https://issues.apache.org/jira/browse/HUDI-7975
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>  Labels: pull-request-available
>






[PR] [HUDI-7975] Provide an API to create empty commit [hudi]

2024-07-09 Thread via GitHub


suryaprasanna opened a new pull request, #11606:
URL: https://github.com/apache/hudi/pull/11606

   Summary: By creating an empty commit, checkpoints stored in the commit files can be 
transferred to new instants whenever there is no new data to be written. This change 
adds an API that creates an empty commit by copying the extra metadata from the last 
completed non-table-service commit in the timeline. Generally, empty commits are 
created at most once per day per dataset; the cadence at which the metadata is 
transferred can be configured so that bloating of the commit timeline is avoided.
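
   A minimal sketch of the idea (the helper below is hypothetical, not the API added by this PR; it only reuses existing write-client calls such as `startCommit` and the `commit` overload that accepts extra metadata):

   ```java
   import java.util.Map;

   import org.apache.hudi.client.SparkRDDWriteClient;
   import org.apache.hudi.common.model.HoodieCommitMetadata;
   import org.apache.hudi.common.table.HoodieTableMetaClient;
   import org.apache.hudi.common.table.timeline.HoodieInstant;
   import org.apache.hudi.common.util.Option;
   import org.apache.spark.api.java.JavaSparkContext;

   public class EmptyCommitSketch {
     // Hypothetical helper: create a commit that writes no data but carries the
     // extra metadata (e.g. checkpoints) of the last completed commit forward.
     static void createEmptyCommitCarryingMetadata(JavaSparkContext jsc,
                                                   SparkRDDWriteClient<?> client,
                                                   HoodieTableMetaClient metaClient) throws Exception {
       // Last completed commit/deltacommit; the real logic would also need to
       // skip table-service instants such as clustering's replacecommits.
       Option<HoodieInstant> lastCommit = metaClient.getActiveTimeline()
           .getCommitsTimeline().filterCompletedInstants().lastInstant();
       if (!lastCommit.isPresent()) {
         return; // nothing to carry forward
       }
       Map<String, String> extraMetadata = HoodieCommitMetadata.fromBytes(
           metaClient.getActiveTimeline().getInstantDetails(lastCommit.get()).get(),
           HoodieCommitMetadata.class).getExtraMetadata();

       // Start a new instant and complete it with an empty set of write
       // statuses, attaching only the copied extra metadata.
       String instantTime = client.startCommit();
       client.commit(instantTime, jsc.emptyRDD(), Option.of(extraMetadata));
     }
   }
   ```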
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7975) Transfer extra metadata to new commits when new data is not ingested to trigger table services on the dataset

2024-07-09 Thread Surya Prasanna Yalla (Jira)
Surya Prasanna Yalla created HUDI-7975:
--

 Summary: Transfer extra metadata to new commits when new data is not 
ingested to trigger table services on the dataset
 Key: HUDI-7975
 URL: https://issues.apache.org/jira/browse/HUDI-7975
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Surya Prasanna Yalla








[jira] [Assigned] (HUDI-7975) Transfer extra metadata to new commits when new data is not ingested to trigger table services on the dataset

2024-07-09 Thread Surya Prasanna Yalla (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surya Prasanna Yalla reassigned HUDI-7975:
--

Assignee: Surya Prasanna Yalla

> Transfer extra metadata to new commits when new data is not ingested to 
> trigger table services on the dataset
> ---
>
> Key: HUDI-7975
> URL: https://issues.apache.org/jira/browse/HUDI-7975
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>






Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219609595

   
   ## CI report:
   
   * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
   * 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)
 
   * c14015c3618d231bc439c0a4fb14ce2dff32de00 UNKNOWN
   
   



Re: [PR] [DOCS] fix: update home page title [hudi]

2024-07-09 Thread via GitHub


pintusoliya commented on PR #11530:
URL: https://github.com/apache/hudi/pull/11530#issuecomment-2219609458

   > @pintusoliya would you be able to run the website locally? Please screenshot 
your local run so we can see the outcome. Also, it looks like the CI is not passing.
   
   Uploaded a video, as a screenshot was not possible due to the hover effect.





Re: [PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11605:
URL: https://github.com/apache/hudi/pull/11605#issuecomment-2219602875

   
   ## CI report:
   
   * 2fc956794c1effc3dbd09b665eac1266503f407f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24802)
 
   
   



Re: [PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11605:
URL: https://github.com/apache/hudi/pull/11605#issuecomment-2219546966

   
   ## CI report:
   
   * 2fc956794c1effc3dbd09b665eac1266503f407f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24802)
 
   
   



Re: [PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11605:
URL: https://github.com/apache/hudi/pull/11605#issuecomment-2219539346

   
   ## CI report:
   
   * 2fc956794c1effc3dbd09b665eac1266503f407f UNKNOWN
   
   



Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing 
with bulk_insert + bucket_inde…
URL: https://github.com/apache/hudi/pull/11578





Re: [I] [SUPPORT] Remote connection issue while testing locally Apache Hudi with Glue Image and LocalStack [hudi]

2024-07-09 Thread via GitHub


cannon-tp commented on issue #8691:
URL: https://github.com/apache/hudi/issues/8691#issuecomment-2219537023

   Hey @danfran,
   I think setting the Hadoop properties directly in the Spark conf could be the 
problem. I faced the same issue and resolved it using the following code.
   
   ```python
   import sys
   from awsglue.transforms import *
   from awsglue.utils import getResolvedOptions
   from pyspark.context import SparkContext
   from pyspark.conf import SparkConf
   from awsglue.context import GlueContext
   from awsglue.job import Job
   from awsglue.dynamicframe import DynamicFrame

   conf = (SparkConf().setAppName("hudi-1")
           # Point S3A at the LocalStack endpoint instead of real S3.
           .set("spark.hadoop.fs.s3a.endpoint", "http://localstack:4566")
           .set("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
           .set("spark.hadoop.fs.s3a.multipart.size", "104857600")
           .set("spark.hadoop.fs.s3a.access.key", "test")
           .set("spark.hadoop.fs.s3a.secret.key", "test")
           .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
           .set("spark.hadoop.fs.s3a.path.style.access", "true")
           .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
           .set("spark.jars.packages", "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.15.0,org.apache.hadoop:hadoop-aws:3.3.3")
           .set("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
           .set("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
           .set("spark.sql.legacy.timeParserPolicy", "LEGACY"))

   sc = SparkContext(conf=conf)
   glueContext = GlueContext(sc)
   spark = glueContext.spark_session
   ```





Re: [PR] [HUDI-7859] Rename instant files to be consistent with 0.x naming format when downgrade [hudi]

2024-07-09 Thread via GitHub


codope commented on code in PR #11545:
URL: https://github.com/apache/hudi/pull/11545#discussion_r1671596452


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/EightToSevenDowngradeHandler.java:
##
@@ -20,18 +20,53 @@
 
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.storage.StoragePath;
+import org.apache.hudi.table.HoodieTable;
 
+import java.io.IOException;
 import java.util.Collections;
+import java.util.List;
 import java.util.Map;
 
+
 /**
  * Version 7 is going to be placeholder version for bridge release 0.16.0.
  * Version 8 is the placeholder version to track 1.x.
  */
 public class EightToSevenDowngradeHandler implements DowngradeHandler {
   @Override
   public Map<ConfigProperty, String> downgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
+    final HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+    UpgradeDowngradeUtils.runCompaction(table, context, config, upgradeDowngradeHelper);
+    UpgradeDowngradeUtils.syncCompactionRequestedFileToAuxiliaryFolder(table);
+
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(context.getStorageConf().newInstance()).setBasePath(config.getBasePath()).build();
+    List<HoodieInstant> instants = metaClient.getActiveTimeline().getInstants();
+    if (!instants.isEmpty()) {
+      context.map(instants, instant -> {
+        if (!instant.getFileName().contains("_")) {
+          return false;
+        }
+        try {
+          // Rename the metadata file name from the ${instant_time}_${completion_time}.action[.state] format in version 1.x to the ${instant_time}.action[.state] format in version 0.x.
+          StoragePath fromPath = new StoragePath(metaClient.getMetaPath(), instant.getFileName());
+          StoragePath toPath = new StoragePath(metaClient.getMetaPath(), instant.getFileName().replaceAll("_\\d+", ""));
+          boolean success = metaClient.getStorage().rename(fromPath, toPath);
+          // TODO: We need to rename the action-related part of the metadata file name here when we bring separate action name for clustering/compaction in 1.x as well.
Review Comment:
   Is there a separate ticket tracking this TODO?
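
   As an aside, the rename this hunk performs can be sanity-checked standalone; a minimal sketch using the same regex (the instant timestamps below are made up for illustration):

   ```java
   public class InstantFileNameDowngrade {
     public static void main(String[] args) {
       // 1.x completed-instant file name: ${instant_time}_${completion_time}.action
       String v8Name = "20240709120000123_20240709120001456.commit";
       // Dropping the completion-time suffix yields the 0.x format: ${instant_time}.action
       String v7Name = v8Name.replaceAll("_\\d+", "");
       System.out.println(v7Name); // prints 20240709120000123.commit
     }
   }
   ```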



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestUpgradeOrDowngradeProcedure.scala:
##
@@ -142,6 +143,56 @@ class TestUpgradeOrDowngradeProcedure extends HoodieSparkProcedureTestBase {
     }
   }
 
+  test("Test downgrade table from version eight to version seven") {
+    withTempDir { tmp =>
+      val tableName = generateTableName
+      val tablePath = s"${tmp.getCanonicalPath}/$tableName"
+      // create table
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  id int,
+           |  name string,
+           |  price double,
+           |  ts long
+           |) using hudi
+           | location '$tablePath'
+           | options (
+           |  type = 'mor',
+           |  primaryKey = 'id',
+           |  preCombineField = 'ts'
+           | )
+       """.stripMargin)
+
+      spark.sql("set hoodie.compact.inline=true")
+      spark.sql("set hoodie.compact.inline.max.delta.commits=1")
+      spark.sql("set hoodie.clean.commits.retained = 2")
+      spark.sql("set hoodie.keep.min.commits = 3")
+      spark.sql("set hoodie.keep.max.commits = 4")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1000)")
+
+      var metaClient = createMetaClient(spark, tablePath)
+      // verify hoodie.table.version of the table is EIGHT
+      if (metaClient.getTableConfig.getTableVersion.versionCode().equals(HoodieTableVersion.EIGHT.versionCode())) {
+        // downgrade table from version eight to version seven
+        checkAnswer(s"""call downgrade_table(table => '$tableName', to_version => 'SEVEN')""")(Seq(true))
+        metaClient = HoodieTableMetaClient.reload(metaClient)
+        assertResult(HoodieTableVersion.SEVEN.versionCode) {
+          metaClient.getTableConfig.getTableVersion.versionCode()
+        }
+        // Verify whether the naming format of instant files is consistent with 0.x
+        metaClient.reloadActiveTimeline().getInstants.iterator().asScala.forall(f => !f.getFileName.contains("_"))

Review Comment:
   Can we add a pat

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219523986

   > Both algorithms have drawbacks.
   
   @xicm That's fine, the new algorithm looks simpler, there is no need to 
distinguish between different parallelisms.





[jira] [Created] (HUDI-7974) Create empty clean commit at a cadence and make it configurable

2024-07-09 Thread Surya Prasanna Yalla (Jira)
Surya Prasanna Yalla created HUDI-7974:
--

 Summary: Create empty clean commit at a cadence and make it 
configurable
 Key: HUDI-7974
 URL: https://issues.apache.org/jira/browse/HUDI-7974
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Surya Prasanna Yalla
Assignee: Surya Prasanna Yalla








[jira] [Updated] (HUDI-7974) Create empty clean commit at a cadence and make it configurable

2024-07-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7974:
-
Labels: pull-request-available  (was: )

> Create empty clean commit at a cadence and make it configurable
> ---
>
> Key: HUDI-7974
> URL: https://issues.apache.org/jira/browse/HUDI-7974
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Surya Prasanna Yalla
>Assignee: Surya Prasanna Yalla
>Priority: Major
>  Labels: pull-request-available
>






[PR] [HUDI-7974] Create empty clean commit at a cadence and make it configurable [hudi]

2024-07-09 Thread via GitHub


suryaprasanna opened a new pull request, #11605:
URL: https://github.com/apache/hudi/pull/11605

   Summary: This change fixes the empty clean commit logic and also makes it 
configurable.
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   








Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219457551

   I don't know why this check includes the docker module; other successful runs 
don't seem to include it. Retriggering again.
   
![image](https://github.com/apache/hudi/assets/20125927/f70572ae-afd0-4e0d-b9c3-e5f4d343ca62)
   





Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219424072

   
   ## CI report:
   
   * d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24800)
 
   
   



Re: [I] [SUPPORT]Failed to update metadata(hudi 0.15.0) [hudi]

2024-07-09 Thread via GitHub


MrAladdin commented on issue #11587:
URL: https://github.com/apache/hudi/issues/11587#issuecomment-2219397124

   > Hey @MrAladdin: are you in the Hudi Slack? We can connect and investigate 
faster. Can you post a msg there and tag me (shivnarayan) and Sagar (sagar sumit)?
   
   I'm really sorry, but due to certain reasons, I am unable to help you.





Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11604:
URL: https://github.com/apache/hudi/pull/11604#issuecomment-2219384701

   
   ## CI report:
   
   * b1b476f2cea0fb02c9665e818711c0892b686352 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24798)
 
   
   



Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219384101

   
   ## CI report:
   
   * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
   * 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)
 
   
   



Re: [I] [SUPPORT]Failed to update metadata(hudi 0.15.0) [hudi]

2024-07-09 Thread via GitHub


MrAladdin commented on issue #11587:
URL: https://github.com/apache/hudi/issues/11587#issuecomment-2219370420

   > also, do you think you can use global simple in the mean time while we try 
to find the root cause and get a fix out?
   
   Due to my business scenario involving a large number of upsert operations 
(public opinion data), other index types did not perform well in previous 
tests. Only dynamic bucket and the newly released record_index in version 0.14 
met the requirements. I have always wanted to find an index that doesn't 
require manual intervention or adjustment, so record_index has attracted my 
attention and interest. This test is mainly focused on the performance 
improvements of record_index in version 0.15. I can wait for this to be fixed 
before conducting tests. Actually, there is another issue mentioned in 
https://github.com/apache/hudi/issues/11567. When the amount of already stored 
data is huge, it is also a maddening issue. You can pay attention to this as 
well.





Re: [PR] [HUDI-7969] Fix data loss caused by concurrent write and clean [hudi]

2024-07-09 Thread via GitHub


Zouxxyy closed pull request #11600: [HUDI-7969] Fix data loss caused by 
concurrent write and clean
URL: https://github.com/apache/hudi/pull/11600





Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219339643

   > @xicm @KnightChess So have we reached consensus that the algorithm raised by 
@KnightChess is better? If that's true, let's file a fix in a separate PR.
   
   Both algorithms have drawbacks. 
   For example, with 
   parallelism = 10, bucketNumber = 5 and partition = ["2021-01-01", "2021-01-03"]:
   old: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
   new: [2, 2, 2, 2, 2]
   
   parallelism = 20, bucketNumber = 5 and partition = ["2021-01-01", "2021-01-03"]:
   old: [2, 2, 2, 2, 2]
   new: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
   
   Each element in the array indicates how many data slices a TM processes.
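
   To experiment with such distributions, a small standalone harness can tally slices per task; note the assignment function below is a simplified stand-in for illustration, not the actual old or new Hudi algorithm:

   ```java
   import java.util.Arrays;
   import java.util.function.IntBinaryOperator;

   public class BucketAssignmentTally {
     // Count how many (partition, bucket) data slices each task receives
     // under a given slice -> task assignment function.
     static int[] tally(int partitions, int buckets, int parallelism, IntBinaryOperator assign) {
       int[] perTask = new int[parallelism];
       for (int p = 0; p < partitions; p++) {
         for (int b = 0; b < buckets; b++) {
           perTask[assign.applyAsInt(p, b)]++;
         }
       }
       return perTask;
     }

     public static void main(String[] args) {
       int partitions = 2, buckets = 5;
       for (int parallelism : new int[] {10, 20}) {
         // Stand-in assignment: index slices globally and wrap by parallelism.
         int[] dist = tally(partitions, buckets, parallelism,
             (p, b) -> (p * buckets + b) % parallelism);
         System.out.println(parallelism + " -> " + Arrays.toString(dist));
       }
     }
   }
   ```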





(hudi) branch master updated: [HUDI-7968] Claiming rfc for robust spark writes (#11592)

2024-07-09 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new dbcd089b679 [HUDI-7968] Claiming rfc for robust spark writes (#11592)
dbcd089b679 is described below

commit dbcd089b679b9df5de763b115db1b0162a05ea6f
Author: Sivabalan Narayanan 
AuthorDate: Tue Jul 9 19:04:16 2024 -0700

[HUDI-7968] Claiming rfc for robust spark writes (#11592)
---
 rfc/README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rfc/README.md b/rfc/README.md
index 2fdd3d8db49..1c6c927ac58 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -113,4 +113,5 @@ The list of all RFCs can be found here.
 | 75 | [Hudi-Native HFile Reader and Writer](./rfc-75/rfc-75.md)                 | `UNDER REVIEW` |
 | 76 | [Auto Record key generation](./rfc-76/rfc-76.md)                          | `IN PROGRESS`  |
 | 77 | [Secondary Index](./rfc-77/rfc-77.md)                                     | `UNDER REVIEW` |
-| 78 | [Bridge release for 1.x](./rfc-78/rfc-78.md)                              | `IN PROGRESS`  |
\ No newline at end of file
+| 78 | [Bridge release for 1.x](./rfc-78/rfc-78.md)                              | `IN PROGRESS`  |
+| 79 | [Robust handling of spark task retries and failures](./rfc-79/rfc-79.md) | `IN PROGRESS`  |
\ No newline at end of file



Re: [PR] [HUDI-7968] Claiming rfc for robust spark writes [hudi]

2024-07-09 Thread via GitHub


yihua merged PR #11592:
URL: https://github.com/apache/hudi/pull/11592





Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219319141

   
   ## CI report:
   
   * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24773)
 
   * d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24800)
 
   
   



Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219318998

   
   ## CI report:
   
   * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
   * 13e49581c971458be6c84c60f69aa595bb7f73fc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24793)
 
   * 83fe235b703ba4fa1224b41eec2e19f27600671f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24799)
 
   
   



Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219311546

   
   ## CI report:
   
   * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24773)
 
   * d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 UNKNOWN
   
   



Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2219311406

   
   ## CI report:
   
   * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
   * 13e49581c971458be6c84c60f69aa595bb7f73fc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24793)
 
   * 83fe235b703ba4fa1224b41eec2e19f27600671f UNKNOWN
   
   



Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219306922

   @xicm @KnightChess So have we reached consensus that the algorithm raised by 
@KnightChess is better? If that's true, let's file a fix in a separate PR.





Re: [PR] [HUDI-7969] Fix data loss caused by concurrent write and clean [hudi]

2024-07-09 Thread via GitHub


danny0405 commented on code in PR #11600:
URL: https://github.com/apache/hudi/pull/11600#discussion_r1671438127


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##
@@ -451,8 +451,12 @@ && noSubsequentReplaceCommit(earliestInstant.getTimestamp(), partitionPath)) {
     * IMPORTANT: {@code fsView.getAllFileGroups} does not return pending file groups for metadata table,
     * file listing must be used instead.
     */
-  private boolean hasPendingFiles(String partitionPath) {
+  private boolean mayHavePendingFiles(String partitionPath) {
     try {
+      // As long as there are pending commits, never delete empty partitions, because they may write files to any partition.
+      if (!hoodieTable.getMetaClient().getCommitsTimeline().filterInflightsAndRequested().empty()) {
+        return true;

Review Comment:
   For streaming ingestion, there should always be a pending instant on the 
timeline, so the follow-up logic may never be reached.






Re: [PR] [HUDI-7859] Rename instant files to be consistent with 0.x naming format when downgrade [hudi]

2024-07-09 Thread via GitHub


watermelon12138 commented on PR #11545:
URL: https://github.com/apache/hudi/pull/11545#issuecomment-2219305418

   @danny0405 @codope 
   Hi, all checks have passed. Could you help review the code? 





Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219304171

   > @xicm no, even with the overflow problem fixed, the old algorithm will not be 
better; you can try the UT. I have tried before.
   
   Oh, there was something wrong with my test case; the old algorithm also has 
drawbacks.





Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219288802

   @danny0405 I have tried this before; the result is that the new algorithm is 
better. I will fix it in a separate PR.





Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219285300

   @xicm no, even with the overflow problem fixed, the old algorithm will not be 
better; you can try the UT. I have tried before.





Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671424828


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@
+
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for
+interested developers/users to give some of the advanced features a spin. But as we are working towards 1.0 GA, we are proposing
+a bridge release (0.16.0) for smoother migration for existing Hudi users.
+
+## Objectives
+The goal is to have a smooth migration experience for users from 0.x to 1.0. We plan to have a 0.16.0 bridge release, asking everyone to first migrate to 0.16.0 before they can upgrade to 1.x.
+
+A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines, i.e. bronze, silver and gold layers.
+For this layout of pipelines, here is how a typical migration might look (w/o a bridge release):
+
+a. Existing pipelines are in 0.15.x (bronze, silver, gold).
+b. Migrate gold pipelines to 1.x.
+- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. So, if we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation
+where a 0.15.0 reader (gold) ends up reading a 1.x table (silver). This might lead to failures. So, we have to follow a certain order in which we migrate pipelines.
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x.
+d. Once all of the gold and silver pipelines are migrated to 1.x, finally we can move all of bronze to 1.x.
+
+In the end, we would have migrated all existing Hudi pipelines from 0.15.0 to 1.x.
+But as you can see, the migration needs to be coordinated. And in a very large organization, sometimes we may not have good control over downstream consumers.
+Hence, coordinating the entire migration workflow and orchestrating it might be challenging.
+
+Hence, to ease the migration workflow to 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives with this bridge release:
+
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed.
+But for new features introduced in 1.x, we may not be able to support all of them. We will be calling out which new features may not work with a 0.16.x reader.
+- In this case, we explicitly request users not to turn on these features until all readers are completely migrated to 1.x, so as not to break any readers as applicable.
+
+Connecting back to our example above, let's see how the migration might look for an existing user.
+
+a. Existing pipelines are in 0.15.x (bronze, silver, gold).
+b. Migrate pipelines to 0.16.0 (in any order; we do not have any constraints around which pipeline should be migrated first).
+c. Ensure all pipelines are on 0.16.0 (both readers and writers).
+d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have a few pipelines in 1.x and a few pipelines in 0.16.0, but since 0.16.x
+can read 1.x tables, we should be ok here. Just do not enable new features like non-blocking concurrency control yet.
+e. Migrate all of the 0.16.0 pipelines to the 1.x version.
+f. Once all readers and writers are on 1.x, we are good to enable any new features (like NBCC) with 1.x tables.
+
+As you can see, company/org-wide coordination to migrate gold before migrating silver or bronze is relaxed with the bridge release. The only requirement to keep a tab on
+is to ensure all pipelines are migrated completely to 0.16.x before starting to migrate to 1.x.
+
+So, here are the objectives of this RFC with the bridge release.
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For features ported over from 0.x, no loss in functionality should be guaranteed.
+  But for new features introduced in 1.x, we may not be able to support all of them. Will be calling out which new features may
Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 
1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md)
 is a powerful 
+re-imagination of the transactional database layer in Hudi to power continued 
innovation across the community in the coming 
+years. It introduces lot of differentiating features for Apache Hudi. Feel 
free to checkout the 
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more 
info. We had beta1 and beta2 releases which was meant for 
+interested developers/users to give a spin on some of the  advanced features. 
But as we are working towards 1.0 GA, we are proposing 
+a bridge release (0.16.0) for smoother migration for existing hudi users. 
+
+## Objectives 
+Goal is to have a smooth migration experience for the users from 0.x to 1.0. 
We plan to have a 0.16.0 bridge release asking everyone to first migrate to 
0.16.0 before they can upgrade to 1.x. 
+
+A typical organization might have a medallion architecture deployed to run 
1000s of Hudi pipelines i.e. bronze, silver and gold layer. 
+For this layout of pipelines, here is how a typical migration might look 
like(w/o a bridge release)
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold) 
+b. Migrate gold pipelines to 1.x. 
+- We need to strictly migrate only gold to 1x. Bcoz, a 0.15.0 reader may not 
be able to read 1.x hudi tables. So, if we migrate any of silver pipelines to 
1.x before migrating entire gold layer, we might end up in a situation, 
+where a 0.15.0 reader (gold) might end up reading 1.x table (silver). This 
might lead to failures. So, we have to follow certain order in which we migrate 
pipelines. 
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x. 
+d. Once all of gold and silver pipelines are migrated to 1.x, finally we can 
move all of bronze to 1.x.
+
+In the end, we would have migrated all of existing hudi pipelines from 0.15.0 
to 1.x. 
+But as you could see, we need some coordination with which we need to migrate. 
And in a very large organization, sometimes we may not have good control over 
downstream consumers. 
+Hence, coordinating entire migration workflow and orchestrating the same might 
be challenging.
+
+Hence to ease the migration workflow for 1.x, we are introducing 0.16.0 as a 
bridge release.  
+
+Here are the objectives with this bridge release:
+
+- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in 
functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For 
features ported over from 0.x, no loss in functionality should be guaranteed. 
+But for new features that was introduced in 1.x, we may not be able to support 
all of them. Will be calling out which new features may not work with 0.16.x 
reader. 
+- In this case, we explicitly request users to not turn on these features 
untill all readers are completely migrated to 1.x so as to not break any 
readers as applicable. 
+
+Connecting back to our example above, lets see how the migration might look 
like for an existing user. 
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate pipelines to 0.16.0 (in any order. we do not have any constraints 
around which pipeline should be migrated first). 
+c. Ensure all pipelines are in 0.16.0 (both readers and writers)
+d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we 
could have few pipelines in 1.x and few pipelines in 0.16.0. but since 0.16.x 
+can read 1.x tables, we should be ok here. Just that do not enable new 
features like Non blocking concurrency control yet. 
+e. Migrate all of 0.16.0 to 1.x version. 
+f. Once all readers and writers are in 1.x, we are good to enable any new 
features (like NBCC) with 1.x tables.
+
+As you could see, company/org wide coordination to migrate gold before 
migrating silver or bronze is relaxed with the bridge release. Only requirement 
to keep a tab on, 
+is to ensure to migrate all pipelines completely to 0.16.x before starting to 
migrate to 1.x.
+
+So, here are the objectives of this RFC with the bridge release. 
+- 1.x reader should be able to read 0.14.x to 0.16.x tables w/o any loss in 
functionality and no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables w/ some limitations. For 
features ported over from 0.x, no loss in functionality should be guaranteed.
+  But for new features that was introduced in 1.x, we may not be able to 
support all of them. Will be calling out which new features may

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671419636


##
rfc/rfc-78/rfc-78.md:
##
@@ -0,0 +1,301 @@
+
+# RFC-76: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for
+interested developers/users to try out some of the advanced features. But as we work towards 1.0 GA, we are proposing
+a bridge release (0.16.0) for a smoother migration for existing Hudi users.
+
+## Objectives
+The goal is to provide a smooth migration experience from 0.x to 1.0. We plan to have a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before upgrading to 1.x.
+
+A typical organization might have a medallion architecture deployed to run thousands of Hudi pipelines, i.e., bronze, silver, and gold layers.
+For this layout of pipelines, here is how a typical migration might look (without a bridge release):
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold).
+b. Migrate gold pipelines to 1.x.
+- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. If we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation
+where a 0.15.0 reader (gold) reads a 1.x table (silver). This might lead to failures, so we have to follow a certain order when migrating pipelines.
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x.
+d. Once all of the gold and silver pipelines are migrated to 1.x, we can finally move all of bronze to 1.x.
+
+In the end, we would have migrated all existing Hudi pipelines from 0.15.0 to 1.x.
+But as you can see, the migration requires coordination, and in a very large organization we may not have good control over downstream consumers.
+Hence, coordinating the entire migration workflow and orchestrating it might be challenging.
+
+Hence, to ease the migration to 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives for this bridge release:
+
+- The 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and without data inconsistencies.
+- 0.16.x should be able to read 1.x tables, with some limitations. For features ported over from 0.x, no loss in functionality is guaranteed.
+But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with the 0.16.x reader.
+- In such cases, we explicitly request users not to turn on these features until all readers are completely migrated to 1.x, so as not to break any readers.
+
+Connecting back to our example above, let's see how the migration might look for an existing user.
+
+a. Existing pipelines are on 0.15.x (bronze, silver, gold).
+b. Migrate pipelines to 0.16.0 (in any order; there are no constraints on which pipeline is migrated first).
+c. Ensure all pipelines are on 0.16.0 (both readers and writers).
+d. Start migrating pipelines to 1.x in a rolling fashion. At this juncture, we could have a few pipelines on 1.x and a few on 0.16.0, but since 0.16.x can read 1.x tables, we should be fine. Just do not enable new features like non-blocking concurrency control yet.
+e. Migrate all 0.16.0 pipelines to 1.x.
+f. Once all readers and writers are on 1.x, we are good to enable any new features (like NBCC) on 1.x tables.
+
+As you can see, the bridge release relaxes the company/org-wide coordination to migrate gold before migrating silver or bronze. The only requirement to keep tabs on is to migrate all pipelines completely to 0.16.x before starting the migration to 1.x.
+
+So, here are the objectives of this RFC with the bridge release:
+- The 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and without data inconsistencies.
+- 0.16.x should be able to read 1.x tables, with some limitations. For features ported over from 0.x, no loss in functionality is guaranteed.
+  But for new features introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with the 0.16.x reader.

Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671414686



Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11572:
URL: https://github.com/apache/hudi/pull/11572#issuecomment-221989

   
   ## CI report:
   
   * 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN
   * 301d9bc766ce39ffdfec634a790b9ba7aee51165 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24797)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11604:
URL: https://github.com/apache/hudi/pull/11604#issuecomment-2219222801

   
   ## CI report:
   
   * b1b476f2cea0fb02c9665e818711c0892b686352 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24798)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11604:
URL: https://github.com/apache/hudi/pull/11604#issuecomment-2219197965

   
   ## CI report:
   
   * b1b476f2cea0fb02c9665e818711c0892b686352 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11572:
URL: https://github.com/apache/hudi/pull/11572#issuecomment-2219197455

   
   ## CI report:
   
   * 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN
   * 830fa27e599f91574af60b01837586a1d3f5764a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24794)
 
   * 301d9bc766ce39ffdfec634a790b9ba7aee51165 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7866) Pull commit metadata changes in bridge release.

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7866:
--
Epic Link: (was: HUDI-7856)

> Pull commit metadata changes in bridge release.
> ---
>
> Key: HUDI-7866
> URL: https://issues.apache.org/jira/browse/HUDI-7866
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> In 1.0.0, we changed some commit metadata to be written in Avro. The scope of
> this task is to ensure that the bridge release is able to read commit
> metadata written by 1.0.0.
>  
> The scope could be a lot larger:
> we may be parsing commit metadata in a lot of ad hoc places, like compaction
> planning, clean execution, etc. So, we need to account for both
> formats (JSON and Avro) with the 0.16.0 reader, since we do not know if the commit
> metadata is from 0.16.0 or from 1.0 (a rough sketch follows below).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
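
To make the dual-format requirement concrete, the reader-side logic roughly needs a try-the-new-format-then-fall-back shape. A hypothetical sketch; the types and parse helpers are stand-ins, not Hudi's actual implementation:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a 0.16.0 reader cannot know whether commit metadata
// bytes came from a 0.x writer (JSON) or a 1.x writer (Avro), so it tries
// the newer format first and falls back to the older one.
public class CommitMetadataParseSketch {

  record CommitMetadata(String source, String payload) {} // stand-in for the real type

  static CommitMetadata parseAvro(byte[] bytes) {
    // Placeholder: a real implementation would decode against the 1.x Avro schema.
    if (bytes.length == 0 || bytes[0] == '{') {
      throw new IllegalArgumentException("not Avro-encoded");
    }
    return new CommitMetadata("avro", "...");
  }

  static CommitMetadata parseJson(byte[] bytes) {
    // Placeholder: a real implementation would parse the 0.x JSON layout.
    return new CommitMetadata("json", new String(bytes, StandardCharsets.UTF_8));
  }

  static CommitMetadata parseCommitMetadata(byte[] bytes) {
    try {
      return parseAvro(bytes);   // 1.x format first
    } catch (RuntimeException e) {
      return parseJson(bytes);   // fall back to the 0.x format
    }
  }

  public static void main(String[] args) {
    byte[] legacy = "{\"operationType\":\"UPSERT\"}".getBytes(StandardCharsets.UTF_8);
    System.out.println(parseCommitMetadata(legacy)); // parsed as JSON
  }
}
```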


[jira] [Assigned] (HUDI-7866) Pull commit metadata changes in bridge release.

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-7866:
-

Assignee: Balaji Varadarajan  (was: sivabalan narayanan)

> Pull commit metadata changes in bridge release.
> ---
>
> Key: HUDI-7866
> URL: https://issues.apache.org/jira/browse/HUDI-7866
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> In 1.0.0, we changed some commit metadata to be written in Avro. The scope of
> this task is to ensure that the bridge release is able to read commit
> metadata written by 1.0.0.
>  
> The scope could be a lot larger:
> we may be parsing commit metadata in a lot of ad hoc places, like compaction
> planning, clean execution, etc. So, we need to account for both
> formats (JSON and Avro) with the 0.16.0 reader, since we do not know if the commit
> metadata is from 0.16.0 or from 1.0.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7866) Pull commit metadata changes in bridge release.

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7866:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Task)

> Pull commit metadata changes in bridge release.
> ---
>
> Key: HUDI-7866
> URL: https://issues.apache.org/jira/browse/HUDI-7866
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> In 1.0.0, we changed some commit metadata to be written in Avro. The scope of
> this task is to ensure that the bridge release is able to read commit
> metadata written by 1.0.0.
>  
> The scope could be a lot larger:
> we may be parsing commit metadata in a lot of ad hoc places, like compaction
> planning, clean execution, etc. So, we need to account for both
> formats (JSON and Avro) with the 0.16.0 reader, since we do not know if the commit
> metadata is from 0.16.0 or from 1.0.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7973:
--
Fix Version/s: 1.0.0

> Add table property to track list of columns being indexed in col stats 
> ---
>
> Key: HUDI-7973
> URL: https://issues.apache.org/jira/browse/HUDI-7973
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 1.0.0
>
>
> We need to add a new table property to track which columns are being indexed in
> col stats.
> If not a table property, it could be an aux folder or somewhere else, but we need to
> store this state somewhere.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6510] Support GHCI on Java 17 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11573:
URL: https://github.com/apache/hudi/pull/11573#issuecomment-2219172213

   
   ## CI report:
   
   * 03fb589a59c05f9c6f5c4ee99c934bb0b67de617 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24795)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-7973) Add table property to track list of columns being indexed in col stats

2024-07-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7973:
-

 Summary: Add table property to track list of columns being indexed 
in col stats 
 Key: HUDI-7973
 URL: https://issues.apache.org/jira/browse/HUDI-7973
 Project: Apache Hudi
  Issue Type: Improvement
  Components: metadata
Reporter: sivabalan narayanan


We need to add a new table property to track which columns are being indexed in
col stats.

If not a table property, it could be an aux folder or somewhere else, but we need to
store this state somewhere.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7973:
--
Epic Link: (was: HUDI-7856)

> Add table property to track list of columns being indexed in col stats 
> ---
>
> Key: HUDI-7973
> URL: https://issues.apache.org/jira/browse/HUDI-7973
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Priority: Major
>
> We need to add a new table property to track which columns are being indexed in
> col stats.
> If not a table property, it could be an aux folder or somewhere else, but we need to
> store this state somewhere.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7973:
--
Epic Link: HUDI-7856

> Add table property to track list of columns being indexed in col stats 
> ---
>
> Key: HUDI-7973
> URL: https://issues.apache.org/jira/browse/HUDI-7973
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Priority: Major
>
> We need to add a new table property to track which columns are being indexed in
> col stats.
> If not a table property, it could be an aux folder or somewhere else, but we need to
> store this state somewhere.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7973) Add table property to track list of columns being indexed in col stats

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7973:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Improvement)

> Add table property to track list of columns being indexed in col stats 
> ---
>
> Key: HUDI-7973
> URL: https://issues.apache.org/jira/browse/HUDI-7973
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Priority: Major
>
> We need to add a new table property to track which columns are being indexed in
> col stats.
> If not a table property, it could be an aux folder or somewhere else, but we need to
> store this state somewhere.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub


danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219159512

   > if we fix the overflow problem, the old algorithm is better.
   
   Let's fire a fix for it, and @KnightChess, let's keep the Flink hashing algorithm as it is; we can improve it in a separate PR, I think.
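
   For context, the overflow in question is the classic abs-then-modulo pitfall: `Math.abs(Integer.MIN_VALUE)` is still `Integer.MIN_VALUE`, so `Math.abs(hash) % numBuckets` can yield a negative bucket id. A self-contained sketch with hypothetical names (not the actual Hudi/Flink code):

```java
public class BucketIdOverflowSketch {

  // Overflow-prone: Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE,
  // so the result can be negative and break bucket assignment.
  static int bucketIdUnsafe(String recordKey, int numBuckets) {
    return Math.abs(recordKey.hashCode()) % numBuckets;
  }

  // Overflow-safe: Math.floorMod always returns a value in [0, numBuckets).
  static int bucketIdSafe(String recordKey, int numBuckets) {
    return Math.floorMod(recordKey.hashCode(), numBuckets);
  }

  public static void main(String[] args) {
    String key = "polygenelubricants"; // its hashCode() happens to be Integer.MIN_VALUE
    System.out.println(bucketIdUnsafe(key, 7)); // -2, an invalid bucket id
    System.out.println(bucketIdSafe(key, 7));   // 5
  }
}
```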


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Metaserver read/write errors [hudi]

2024-07-09 Thread via GitHub


danny0405 commented on issue #9814:
URL: https://github.com/apache/hudi/issues/9814#issuecomment-2219146486

   cc @yihua for visibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7692] Extract metadata record type to MetadataPartitionType enum [hudi]

2024-07-09 Thread via GitHub


danny0405 commented on code in PR #11597:
URL: https://github.com/apache/hudi/pull/11597#discussion_r1671384805


##
hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java:
##
@@ -137,6 +148,10 @@ public String getFileIdPrefix() {
 return fileIdPrefix;
   }
 
+  public int getRecordType(String key) {
+return recordType;

Review Comment:
   The key is never used?
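
   If the key really is unused, the simplification being hinted at is just dropping the parameter; a sketch mirroring the quoted hunk (hypothetical, and assumes no caller needs per-key dispatch):

```java
  // Hypothetical simplification, assuming the key is confirmed unused:
  public int getRecordType() {
    return recordType;
  }
```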



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7882][WIP] Adding RFC 78 for bridge release to assist users to migrate to 1.x from 0.x [hudi]

2024-07-09 Thread via GitHub


nsivabalan commented on code in PR #11514:
URL: https://github.com/apache/hudi/pull/11514#discussion_r1671383928



(hudi) branch master updated: [HUDI-7929] Fix file name in k8s example (#11603)

2024-07-09 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b0ee6152d99 [HUDI-7929] Fix file name in k8s example (#11603)
b0ee6152d99 is described below

commit b0ee6152d998e7ad75295b384ed0932b9f7e3c30
Author: Peter Huang 
AuthorDate: Tue Jul 9 17:16:53 2024 -0700

[HUDI-7929] Fix file name in k8s example (#11603)
---
 .../config/k8s/{flink-deployment.yml => flink-deployment.yaml}| 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yml 
b/hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yaml
similarity index 100%
rename from hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yml
rename to hudi-examples/hudi-examples-k8s/config/k8s/flink-deployment.yaml



Re: [PR] [HUDI-7929] fix file name in k8s example [hudi]

2024-07-09 Thread via GitHub


danny0405 merged PR #11603:
URL: https://github.com/apache/hudi/pull/11603


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] refactor: enhance error handling with custom ConfigError type [hudi-rs]

2024-07-09 Thread via GitHub


xushiyan commented on code in PR #59:
URL: https://github.com/apache/hudi-rs/pull/59#discussion_r1671383401


##
crates/core/src/config/mod.rs:
##
@@ -18,14 +18,33 @@
  */
 use std::any::type_name;
 use std::collections::HashMap;
+use std::error::Error;
+use std::fmt;
 use std::sync::Arc;
 
-use anyhow::Result;
-
 pub mod internal;
 pub mod read;
 pub mod table;
 
+#[derive(Debug)]
+pub enum ConfigError {
+NotFound,
+ParseError(String),
+Other(String),

Review Comment:
   That's a nice improvement. I would suggest capturing the underlying error as the source; e.g., ParseError should capture std::num::ParseIntError, etc., and NotFound should capture which key (ConfigParser) it refers to.
   
   On a bigger scope, we should definitely standardize error types throughout hudi-core and the other hudi crates. I chose `anyhow` for fast iteration and to uncover error-handling paths first, so all errors coming out of hudi are currently anyhow::Error. I suggest replacing the anyhow dependency with well-defined custom error enums implemented with [thiserror](https://docs.rs/thiserror/latest/thiserror/) in the next release.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7888:
--
Fix Version/s: 1.0.0

> Throw meaningful error when reading partial update or DV written in 1.x from 
> 0.16.0 reader
> --
>
> Key: HUDI-7888
> URL: https://issues.apache.org/jira/browse/HUDI-7888
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: Jonathan Vexler
>Priority: Major
> Fix For: 1.0.0
>
>
> If the 0.16.x reader is used to read a 1.x table with partial updates/merges
> enabled, we need to throw a meaningful error to the end user.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7888:
--
Description: 
If the 0.16.x reader is used to read a 1.x table with partial updates/merges
enabled, we need to throw a meaningful error to the end user.

 

 

  was:
We wanted to support reading 1.x tables with the 0.16.0 reader.

If a 1.x table does not have any new features enabled that are backwards
incompatible, we are good. If someone has enabled the partial update feature
or deletion vector support, we should parse and throw a meaningful error from
the 0.16.0 reader. Let's also comb for any other additional features in 1.x and
throw meaningful errors.

 


> Throw meaningful error when reading partial update or DV written in 1.x from 
> 0.16.0 reader
> --
>
> Key: HUDI-7888
> URL: https://issues.apache.org/jira/browse/HUDI-7888
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: Jonathan Vexler
>Priority: Major
>
> If the 0.16.x reader is used to read a 1.x table with partial updates/merges
> enabled, we need to throw a meaningful error to the end user.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7972:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Improvement)

> Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
> --
>
> Key: HUDI-7972
> URL: https://issues.apache.org/jira/browse/HUDI-7972
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: 1.0-migration
> Fix For: 1.0.0
>
>
> If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should
> fall back to key-based merges instead of position-based merges.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables

2024-07-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7972:
-

 Summary: Add fallback for deletion vector in 0.16.x reader while 
reading 1.x tables
 Key: HUDI-7972
 URL: https://issues.apache.org/jira/browse/HUDI-7972
 Project: Apache Hudi
  Issue Type: Improvement
  Components: reader-core
Reporter: sivabalan narayanan


If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should
fall back to key-based merges instead of position-based merges (see the sketch below).

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
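
A purely illustrative sketch of the fallback described above (simplified types; not Hudi's actual reader code): key-based merging applies log records onto base records by record key, so it works even when base-file row positions cannot be used.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeFallbackSketch {

  record LogRecord(String key, boolean isDelete, String value) {}

  // Key-based merge: match log records to base records by record key,
  // independent of any base-file row positions.
  static Map<String, String> keyBasedMerge(Map<String, String> base, Iterable<LogRecord> log) {
    Map<String, String> merged = new HashMap<>(base);
    for (LogRecord r : log) {
      if (r.isDelete()) {
        merged.remove(r.key());
      } else {
        merged.put(r.key(), r.value());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>(Map.of("k1", "v1", "k2", "v2"));
    Map<String, String> merged = keyBasedMerge(base, List.of(
        new LogRecord("k2", true, null),      // delete k2 by key, no position needed
        new LogRecord("k3", false, "v3")));   // upsert k3
    System.out.println(merged); // e.g. {k1=v1, k3=v3}
  }
}
```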


[jira] [Assigned] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0

2024-07-09 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-7886:


Assignee: Lokesh Jain  (was: Balaji Varadarajan)

> Make metadata payload from 1.x readable in 0.16.0
> -
>
> Key: HUDI-7886
> URL: https://issues.apache.org/jira/browse/HUDI-7886
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Lokesh Jain
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> So, let's port over all metadata payload schema changes to 0.16.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7865) Pull table properties changes in bridge release

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7865:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Task)

> Pull table properties changes in bridge release
> ---
>
> Key: HUDI-7865
> URL: https://issues.apache.org/jira/browse/HUDI-7865
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> In 1.0.0, we changed some table properties to have enums as values instead of
> classnames and then added infer functions. The scope of this task is to
> ensure that the bridge release is able to read hoodie.properties written
> by 1.0.0.
> a. Payload enum change reference - 
> [https://github.com/apache/hudi/pull/9590/files]
> b. hoodie.record.merge.mode : ref links : #9894, #11439. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7888:
--
Epic Link: (was: HUDI-7856)

> Throw meaningful error when reading partial update or DV written in 1.x from 
> 0.16.0 reader
> --
>
> Key: HUDI-7888
> URL: https://issues.apache.org/jira/browse/HUDI-7888
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: Jonathan Vexler
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> If a 1.x table does not have any new features enabled that are backwards
> incompatible, we are good. If someone has enabled the partial update
> feature or deletion vector support, we should parse and throw a meaningful
> error from the 0.16.0 reader. Let's also comb for any other additional features in
> 1.x and throw meaningful errors.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7865) Pull table properties changes in bridge release

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7865:
--
Epic Link: (was: HUDI-7856)

> Pull table properties changes in bridge release
> ---
>
> Key: HUDI-7865
> URL: https://issues.apache.org/jira/browse/HUDI-7865
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> In 1.0.0, we changed some table properties to have enums as values instead of
> classnames and then added infer functions. The scope of this task is to
> ensure that the bridge release is able to read hoodie.properties written
> by 1.0.0.
> a. Payload enum change reference - 
> [https://github.com/apache/hudi/pull/9590/files]
> b. hoodie.record.merge.mode : ref links : #9894, #11439. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7888) Throw meaningful error when reading partial update or DV written in 1.x from 0.16.0 reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7888:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Improvement)

> Throw meaningful error when reading partial update or DV written in 1.x from 
> 0.16.0 reader
> --
>
> Key: HUDI-7888
> URL: https://issues.apache.org/jira/browse/HUDI-7888
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: Jonathan Vexler
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> If a 1.x table does not have any new features enabled that are backwards
> incompatible, we are good. If someone has enabled the partial update
> feature or deletion vector support, we should parse and throw a meaningful
> error from the 0.16.0 reader. Let's also comb for any other additional features in
> 1.x and throw meaningful errors.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7972:
--
Fix Version/s: 1.0.0

> Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
> --
>
> Key: HUDI-7972
> URL: https://issues.apache.org/jira/browse/HUDI-7972
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: 1.0-migration
> Fix For: 1.0.0
>
>
> If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should
> fall back to key-based merges instead of position-based merges.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7972) Add fallback for deletion vector in 0.16.x reader while reading 1.x tables

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7972:
--
Labels: 1.0-migration  (was: )

> Add fallback for deletion vector in 0.16.x reader while reading 1.x tables
> --
>
> Key: HUDI-7972
> URL: https://issues.apache.org/jira/browse/HUDI-7972
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: 1.0-migration
>
> If the 0.16.x reader is used to read a 1.x table with deletion vectors, we should
> fall back to key-based merges instead of position-based merges.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0

2024-07-09 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-7886:


Assignee: Balaji Varadarajan  (was: Lokesh Jain)

> Make metadata payload from 1.x readable in 0.16.0
> -
>
> Key: HUDI-7886
> URL: https://issues.apache.org/jira/browse/HUDI-7886
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> So, let's port over all metadata payload schema changes to 0.16.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7887) Any log format header types changes need to be ported to 0.16.0 from 1.x

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7887:
--
Epic Link: (was: HUDI-7856)

> Any log format header types changes need to be ported to 0.16.0 from 1.x
> 
>
> Key: HUDI-7887
> URL: https://issues.apache.org/jira/browse/HUDI-7887
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: Jonathan Vexler
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> Port any new log header metadata types introduced in 1.x to 0.16.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7887) Any log format header types changes need to be ported to 0.16.0 from 1.x

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7887:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Improvement)

> Any log format header types changes need to be ported to 0.16.0 from 1.x
> 
>
> Key: HUDI-7887
> URL: https://issues.apache.org/jira/browse/HUDI-7887
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: Jonathan Vexler
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> Port any new log header metadata types introduced in 1.x to 0.16.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7886:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Improvement)

> Make metadata payload from 1.x readable in 0.16.0
> -
>
> Key: HUDI-7886
> URL: https://issues.apache.org/jira/browse/HUDI-7886
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Lokesh Jain
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> So, let's port over all metadata payload schema changes to 0.16.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7886) Make metadata payload from 1.x readable in 0.16.0

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7886:
--
Epic Link: (was: HUDI-7856)

> Make metadata payload from 1.x readable in 0.16.0
> -
>
> Key: HUDI-7886
> URL: https://issues.apache.org/jira/browse/HUDI-7886
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Lokesh Jain
>Priority: Major
>
> We wanted to support reading 1.x tables with the 0.16.0 reader.
>  
> So, let's port over all metadata payload schema changes to 0.16.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7971:
--
Epic Link: (was: HUDI-7856)

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader 
> -
>
> Key: HUDI-7971
> URL: https://issues.apache.org/jira/browse/HUDI-7971
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 1.0.0
>
>
> Let's ensure the 1.x reader is fully compatible with reading any of the 0.14.x to 0.16.x
> tables.
>  
> Readers :  1.x
>  # Spark SQL
>  # Spark Datasource
>  # Trino/Presto
>  # Hive
>  # Flink
> Writer: 0.16
> Table State:
>  * COW
>  * Pending clustering
>  * Completed Clustering
>  * Failed writes with no rollbacks
>  * Insert overwrite table/partition
>  * Savepoint for Time-travel query
>  * MOR
>  * Same as COW
>  * Pending and completed async compaction (with log-files and no base file)
>  * Custom Payloads (for MOR snapshot queries) (e.g., SQL Expression Payload)
>  * Rollback formats - DELETE, rollback block
> Other knobs:
>  # Metadata enabled/disabled
>  # Column Stats enabled/disabled and data-skipping enabled/disabled
>  # RLI enabled with eq/IN queries
>  # Non-Partitioned dataset
>  # CDC Reads 
>  # Incremental Reads
>  # Time-travel query
>  
> What to test ?
>  # Query Results Correctness
>  # Performance : See the benefit of 
>  # Partition Pruning
>  # Metadata  table - col stats, RLI,
>  
> Corner Case Testing:
>  
>  # Schema Evolution with different file-groups having different generation of 
> schema
>  # Dynamic Partition Pruning
>  # Does Column Projection work correctly for log file reading 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7971:
--
Parent: HUDI-7882
Issue Type: Sub-task  (was: Test)

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader 
> -
>
> Key: HUDI-7971
> URL: https://issues.apache.org/jira/browse/HUDI-7971
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 1.0.0
>
>
> Let's ensure the 1.x reader is fully compatible with reading any of the 0.14.x to 0.16.x
> tables.
>  
> Readers :  1.x
>  # Spark SQL
>  # Spark Datasource
>  # Trino/Presto
>  # Hive
>  # Flink
> Writer: 0.16
> Table State:
>  * COW
>  * Pending clustering
>  * Completed Clustering
>  * Failed writes with no rollbacks
>  * Insert overwrite table/partition
>  * Savepoint for Time-travel query
>  * MOR
>  * Same as COW
>  * Pending and completed async compaction (with log-files and no base file)
>  * Custom Payloads (for MOR snapshot queries) (e.g., SQL Expression Payload)
>  * Rollback formats - DELETE, rollback block
> Other knobs:
>  # Metadata enabled/disabled
>  # Column Stats enabled/disabled and data-skipping enabled/disabled
>  # RLI enabled with eq/IN queries
>  # Non-Partitioned dataset
>  # CDC Reads 
>  # Incremental Reads
>  # Time-travel query
>  
> What to test ?
>  # Query Results Correctness
>  # Performance : See the benefit of 
>  # Partition Pruning
>  # Metadata  table - col stats, RLI,
>  
> Corner Case Testing:
>  
>  # Schema Evolution with different file-groups having different generation of 
> schema
>  # Dynamic Partition Pruning
>  # Does Column Projection work correctly for log file reading 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader

2024-07-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7971:
-

 Summary: Test and Certify 0.14.x to 0.16.x tables are readable in 
1.x Hudi reader 
 Key: HUDI-7971
 URL: https://issues.apache.org/jira/browse/HUDI-7971
 Project: Apache Hudi
  Issue Type: Test
Reporter: sivabalan narayanan


Let's ensure the 1.x reader is fully compatible with reading any of the 0.14.x to 0.16.x
tables.

 

Readers :  1.x
 # Spark SQL
 # Spark Datasource
 # Trino/Presto
 # Hive
 # Flink

Writer: 0.16

Table State:
 * COW
 * Pending clustering
 * Completed Clustering
 * Failed writes with no rollbacks
 * Insert overwrite table/partition
 * Savepoint for Time-travel query


 * MOR
 * Same as COW
 * Pending and completed async compaction (with log-files and no base file)
 * Custom Payloads (for MOR snapshot queries) (e.g., SQL Expression Payload)
 * Rollback formats - DELETE, rollback block

Other knobs:
 # Metadata enabled/disabled
 # Column Stats enabled/disabled and data-skipping enabled/disabled
 # RLI enabled with eq/IN queries


 # Non-Partitioned dataset
 # CDC Reads 
 # Incremental Reads
 # Time-travel query

 

What to test ?
 # Query Results Correctness
 # Performance : See the benefit of 
 # Partition Pruning
 # Metadata  table - col stats, RLI,

 

Corner Case Testing:

 
 # Schema Evolution with different file-groups having different generation of 
schema
 # Dynamic Partition Pruning
 # Does Column Projection work correctly for log file reading 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7971) Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7971:
--
Fix Version/s: 1.0.0

> Test and Certify 0.14.x to 0.16.x tables are readable in 1.x Hudi reader 
> -
>
> Key: HUDI-7971
> URL: https://issues.apache.org/jira/browse/HUDI-7971
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 1.0.0
>
>
> Let's ensure the 1.x reader is fully compatible with reading any of the 0.14.x to 0.16.x
> tables.
>  
> Readers :  1.x
>  # Spark SQL
>  # Spark Datasource
>  # Trino/Presto
>  # Hive
>  # Flink
> Writer: 0.16
> Table State:
>  * COW
>  * Pending clustering
>  * Completed Clustering
>  * Failed writes with no rollbacks
>  * Insert overwrite table/partition
>  * Savepoint for Time-travel query
>  * MOR
>  * Same as COW
>  * Pending and completed async compaction (with log-files and no base file)
>  * Custom Payloads (for MOR snapshot queries) (e.g., SQL Expression Payload)
>  * Rollback formats - DELETE, rollback block
> Other knobs:
>  # Metadata enabled/disabled
>  # Column Stats enabled/disabled and data-skipping enabled/disabled
>  # RLI enabled with eq/IN queries
>  # Non-Partitioned dataset
>  # CDC Reads 
>  # Incremental Reads
>  # Time-travel query
>  
> What to test ?
>  # Query Results Correctness
>  # Performance : See the benefit of 
>  # Partition Pruning
>  # Metadata  table - col stats, RLI,
>  
> Corner Case Testing:
>  
>  # Schema Evolution with different file-groups having different generation of 
> schema
>  # Dynamic Partition Pruning
>  # Does Column Projection work correctly for log file reading 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-6510] Support compilation on Java 17 [hudi]

2024-07-09 Thread via GitHub


CTTY opened a new pull request, #11604:
URL: https://github.com/apache/hudi/pull/11604

   ### Change Logs
   
   Make Hudi compilable with Java 17
   
   ### Impact
   
   No public-facing API changes
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11573:
URL: https://github.com/apache/hudi/pull/11573#issuecomment-2219027152

   
   ## CI report:
   
   * 9ff2b8fef206bdb1a4e2b3dd61b6e4417db5e41f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24781)
 
   * 03fb589a59c05f9c6f5c4ee99c934bb0b67de617 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24795)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11572:
URL: https://github.com/apache/hudi/pull/11572#issuecomment-2219027062

   
   ## CI report:
   
   * 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN
   * 830fa27e599f91574af60b01837586a1d3f5764a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24794)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(hudi-rs) branch main updated: docs: update CONTRIBUTING with minor changes (#58)

2024-07-09 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hudi-rs.git


The following commit(s) were added to refs/heads/main by this push:
 new 78a558f  docs: update CONTRIBUTING with minor changes (#58)
78a558f is described below

commit 78a558f00c8a6c4556db5ee98f26369fd90fabcf
Author: Sagar Sumit 
AuthorDate: Wed Jul 10 05:10:56 2024 +0530

docs: update CONTRIBUTING with minor changes (#58)

Corrected typos and linked to source files for clarity.

-

Co-authored-by: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
---
 CONTRIBUTING.md | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 0698faa..5a451d6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -25,25 +25,32 @@ platform. This guide will walk you through the process of 
making your first cont
 ## File an issue
 
 Testing and reporting bugs are also valuable contributions. Please follow
-the [issue 
template](https://github.com/apache/hudi-rs/issues/new?template=bug_report.md) 
to file bug reports.
+the [issue 
template](https://github.com/apache/hudi-rs/issues/new?template=bug_report.yml) 
to file bug reports.
 
 ## Prepare for development
 
 - Install Rust, e.g. as described 
[here](https://doc.rust-lang.org/cargo/getting-started/installation.html)
-- Have a compatible Python version installed (check `python/pyproject.toml` 
for current requirement)
+- Have a compatible Python version installed (check 
[`python/pyproject.toml`](./python/pyproject.toml) for current
+  requirement)
 
 ## Commonly used dev commands
 
-For most of the time, use dev commands specified in `python/Makefile`, it 
applies to both Python and Rust modules. You
-don't need to
-CD to the root directory and run `cargo` commands.
+For most of the time, use dev commands specified in 
[`python/Makefile`](./python/Makefile), it applies to both Python
+and Rust modules. You don't need to `cd` to the root directory and run `cargo` 
commands.
 
 To setup python virtual env, run
 
 ```shell
-make setup-env
+make setup-venv
 ```
 
+> [!NOTE]
+> This will run `python` command to setup the virtual environment. You can 
either change that to `python3.X`,
+> or simply alias `python` to your local `python3.X` installation, for example:
+> ```shell
+> echo "alias python=/Library/Frameworks/Python.framework/Versions/3.12/bin/python3" >> ~/.zshrc
+> ```
+
 Once the virtual env is activated, build the project for development by
 
 ```shell



Re: [PR] docs: update CONTRIBUTING with minor changes [hudi-rs]

2024-07-09 Thread via GitHub


xushiyan merged PR #58:
URL: https://github.com/apache/hudi-rs/pull/58


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6510] Support compilation on Java 17 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11573:
URL: https://github.com/apache/hudi/pull/11573#issuecomment-2219004260

   
   ## CI report:
   
   * 9ff2b8fef206bdb1a4e2b3dd61b6e4417db5e41f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24781)
 
   * 03fb589a59c05f9c6f5c4ee99c934bb0b67de617 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-2955] Support Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11572:
URL: https://github.com/apache/hudi/pull/11572#issuecomment-2219004107

   
   ## CI report:
   
   * 2639f581f20ab0b8fddf22d0fcfeb54f164ec346 UNKNOWN
   * f714012ecb37f584c7bd6d6656b93096f7f1cc10 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24748)
 
   * 830fa27e599f91574af60b01837586a1d3f5764a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7929] fix file name in k8s example [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11603:
URL: https://github.com/apache/hudi/pull/11603#issuecomment-2218981621

   
   ## CI report:
   
   * a275778fb2062747f9b4ada9344e6a8d26d8b438 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7915] Spark4 + Hadoop3 [hudi]

2024-07-09 Thread via GitHub


hudi-bot commented on PR #11539:
URL: https://github.com/apache/hudi/pull/11539#issuecomment-2218980767

   
   ## CI report:
   
   * dac29c7e89201f0ced6d394bf6fd4a5c0622167b UNKNOWN
   * 13e49581c971458be6c84c60f69aa595bb7f73fc Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24793)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7882) Umbrella ticket for 1.x tables and 0.16.x compatibility

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7882:
--
Description: 
We have 4 major goals w/ this umbrella ticket. 

a. 1.x reader should be capable of reading any of 0.14.x to 0.16.x tables for 
all query types. 

b. 0.16.x should be capable of reading 1.x tables for most features

c. Upgrade 0.16.x to 1.x 

d. Downgrade 1.x to 0.16.0. 

 

 

We wanted to support reading 1.x tables in 0.16.0 release. So, creating this 
umbrella ticket to track all of them.

 

RFC in progress: [https://github.com/apache/hudi/pull/11514] 

 

Changes required to be ported: 
0. Creating 0.16.0 branch

0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 

 

1. Timeline 

1.a Hoodie instant parsing should be able to read 1.x instants. 
https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 

1.b Commit metadata parsing is able to handle both json and avro formats. Scope 
might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  Siva.
1.c HoodieDefaultTimeline able to read both timelines based on table version.  
https://issues.apache.org/jira/browse/HUDI-7884 Siva.

1.d Reading LSM timeline using 0.16.0 
https://issues.apache.org/jira/browse/HUDI-7890 Siva. 

1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901

 

2. Table property changes 

2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
https://issues.apache.org/jira/browse/HUDI-7865 LJ

 

3. MDT table changes

3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ

3.b MDT payload schema changes. https://issues.apache.org/jira/browse/HUDI-7886 
LJ

 

4. Log format changes

4.a All metadata header types porting 
https://issues.apache.org/jira/browse/HUDI-7887 Jon

4.b Meaningful error for incompatible features from 1.x 
https://issues.apache.org/jira/browse/HUDI-7888 Jon

 

5. Log file slice or grouping detection compatibility 

 

6. Tests 

6.a Tests to validate that 1.x tables can be read w/ 0.16.0 
https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 

 

7. Doc changes 

7.a Call out unsupported features in 0.16.0 reader when reading 1.x tables. 
https://issues.apache.org/jira/browse/HUDI-7889 

  was:
We wanted to support reading 1.x tables in 0.16.0 release. So, creating this 
umbrella ticket to track all of them.

 

RFC in progress: [https://github.com/apache/hudi/pull/11514] 

 

Changes required to be ported: 
0. Creating 0.16.0 branch

0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 

 

1. Timeline 

1.a Hoodie instant parsing should be able to read 1.x instants. 
https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 

1.b Commit metadata parsing is able to handle both json and avro formats. Scope 
might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  Siva.
1.c HoodieDefaultTimeline able to read both timelines based on table version.  
https://issues.apache.org/jira/browse/HUDI-7884 Siva.

1.d Reading LSM timeline using 0.16.0 
https://issues.apache.org/jira/browse/HUDI-7890 Siva. 

1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901

 

2. Table property changes 

2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
https://issues.apache.org/jira/browse/HUDI-7865 LJ

 

3. MDT table changes

3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ

3.b MDT payload schema changes. https://issues.apache.org/jira/browse/HUDI-7886 
LJ

 

4. Log format changes

4.a All metadata header types porting 
https://issues.apache.org/jira/browse/HUDI-7887 Jon

4.b Meaningful error for incompatible features from 1.x 
https://issues.apache.org/jira/browse/HUDI-7888 Jon

 

5. Log file slice or grouping detection compatibility 

 

5. Tests 

5.a Tests to validate that 1.x tables can be read w/ 0.16.0 
https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 

 

6 Doc changes 

6.a Call out unsupported features in 0.16.0 reader when reading 1.x tables. 
https://issues.apache.org/jira/browse/HUDI-7889 


> Umbrella ticket for 1.x tables and 0.16.x compatibility
> ---
>
> Key: HUDI-7882
> URL: https://issues.apache.org/jira/browse/HUDI-7882
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> We have 4 major goals w/ this umbrella ticket. 
> a. 1.x reader should be capable of reading any of 0.14.x to 0.16.x tables for 
> all query types. 
> b. 0.16.x should be capable of reading 1.x tables for most features
> c. Upgrade 0.16.x to 1.x 
> d. Downgrade 1.x to 0.16.0. 
>  
>  
> We wanted to support reading 1.x tables in 0.16.0 release. So, creating this 
> umbrella ticket to track all of them.

[jira] [Updated] (HUDI-7882) Umbrella ticket for 1.x tables and 0.16.x compatibility

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7882:
--
Summary: Umbrella ticket for 1.x tables and 0.16.x compatibility  (was: 
Umbrella ticket 1.x tables and 0.16.x compatibility)

> Umbrella ticket for 1.x tables and 0.16.x compatibility
> ---
>
> Key: HUDI-7882
> URL: https://issues.apache.org/jira/browse/HUDI-7882
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> We wanted to support reading 1.x tables in 0.16.0 release. So, creating this 
> umbrella ticket to track all of them.
>  
> RFC in progress: [https://github.com/apache/hudi/pull/11514] 
>  
> Changes required to be ported: 
> 0. Creating 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 
>  
> 1. Timeline 
> 1.a Hoodie instant parsing should be able to read 1.x instants. 
> https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 
> 1.b Commit metadata parsing is able to handle both json and avro formats. 
> Scope might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  
> Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. 
>  https://issues.apache.org/jira/browse/HUDI-7884 Siva.
> 1.d Reading LSM timeline using 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7890 Siva. 
> 1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>  
> 2. Table property changes 
> 2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
> https://issues.apache.org/jira/browse/HUDI-7865 LJ
>  
> 3. MDT table changes
> 3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ
> 3.b MDT payload schema changes. 
> https://issues.apache.org/jira/browse/HUDI-7886 LJ
>  
> 4. Log format changes
> 4.a All metadata header types porting 
> https://issues.apache.org/jira/browse/HUDI-7887 Jon
> 4.b Meaningful error for incompatible features from 1.x 
> https://issues.apache.org/jira/browse/HUDI-7888 Jon
>  
> 5. Log file slice or grouping detection compatibility 
>  
> 5. Tests 
> 5.a Tests to validate that 1.x tables can be read w/ 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 
>  
> 6 Doc changes 
> 6.a Call out unsupported features in 0.16.0 reader when reading 1.x tables. 
> https://issues.apache.org/jira/browse/HUDI-7889 
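
For item 1.b above, a minimal format-detection sketch (an illustration only, not the actual HUDI-7866 patch; the class and stubs here are stand-ins, not Hudi APIs):

```java
import java.nio.charset.StandardCharsets;

// Illustration only: a bridge reader could probe commit metadata bytes for
// Avro's container-file magic and fall back to JSON parsing otherwise.
public class CommitMetadataFormatProbe {

  // Avro container files begin with the magic bytes 'O','b','j',0x01.
  static boolean looksLikeAvro(byte[] bytes) {
    return bytes.length >= 4
        && bytes[0] == 'O' && bytes[1] == 'b' && bytes[2] == 'j' && bytes[3] == 1;
  }

  static String describeFormat(byte[] bytes) {
    return looksLikeAvro(bytes) ? "avro" : "json";
  }

  public static void main(String[] args) {
    byte[] json = "{\"operationType\":\"UPSERT\"}".getBytes(StandardCharsets.UTF_8);
    byte[] avro = new byte[] {'O', 'b', 'j', 1, 0};
    System.out.println(describeFormat(json)); // json
    System.out.println(describeFormat(avro)); // avro
  }
}
```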



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7882) Umbrella ticket 1.x tables and 0.16.x compatibility

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7882:
--
Summary: Umbrella ticket 1.x tables and 0.16.x compatibility  (was: 
Umbrella ticket to track all changes required to support reading 1.x tables 
with 0.16.0 )

> Umbrella ticket 1.x tables and 0.16.x compatibility
> ---
>
> Key: HUDI-7882
> URL: https://issues.apache.org/jira/browse/HUDI-7882
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: reader-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.16.0, 1.0.0
>
>
> We wanted to support reading 1.x tables in 0.16.0 release. So, creating this 
> umbrella ticket to track all of them.
>  
> RFC in progress: [https://github.com/apache/hudi/pull/11514] 
>  
> Changes required to be ported: 
> 0. Creating 0.16.0 branch
> 0.a https://issues.apache.org/jira/browse/HUDI-7860 Completed. 
>  
> 1. Timeline 
> 1.a Hoodie instant parsing should be able to read 1.x instants. 
> https://issues.apache.org/jira/browse/HUDI-7883 Sagar. 
> 1.b Commit metadata parsing is able to handle both json and avro formats. 
> Scope might be non-trivial.  https://issues.apache.org/jira/browse/HUDI-7866  
> Siva.
> 1.c HoodieDefaultTimeline able to read both timelines based on table version. 
>  https://issues.apache.org/jira/browse/HUDI-7884 Siva.
> 1.d Reading LSM timeline using 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7890 Siva. 
> 1.e Ensure 1.0 MDT timeline is readable by 0.16 - HUDI-7901
>  
> 2. Table property changes 
> 2.a Table property changes https://issues.apache.org/jira/browse/HUDI-7885  
> https://issues.apache.org/jira/browse/HUDI-7865 LJ
>  
> 3. MDT table changes
> 3.a record positions to RLI https://issues.apache.org/jira/browse/HUDI-7877 LJ
> 3.b MDT payload schema changes. 
> https://issues.apache.org/jira/browse/HUDI-7886 LJ
>  
> 4. Log format changes
> 4.a All metadata header types porting 
> https://issues.apache.org/jira/browse/HUDI-7887 Jon
> 4.b Meaningful error for incompatible features from 1.x 
> https://issues.apache.org/jira/browse/HUDI-7888 Jon
>  
> 5. Log file slice or grouping detection compatibility 
>  
> 5. Tests 
> 5.a Tests to validate that 1.x tables can be read w/ 0.16.0 
> https://issues.apache.org/jira/browse/HUDI-7896 Siva and Sagar. 
>  
> 6 Doc changes 
> 6.a Call out unsupported features in 0.16.0 reader when reading 1.x tables. 
> https://issues.apache.org/jira/browse/HUDI-7889 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7865) Pull table properties changes in bridge release

2024-07-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-7865:
-

Assignee: Balaji Varadarajan  (was: Lokesh Jain)

> Pull table properties changes in bridge release
> ---
>
> Key: HUDI-7865
> URL: https://issues.apache.org/jira/browse/HUDI-7865
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.16.0, 1.0.0
>
>
> In 1.0.0, we changed some table properties to have enums as values instead of 
> classnames and then added infer functions. The scope of this task is to 
> ensure that the bridge release is able to read hoodie.properties written 
> by 1.0.0 (see the sketch below).
> a. Payload enum change reference - 
> [https://github.com/apache/hudi/pull/9590/files]
> b. hoodie.record.merge.mode: ref links: #9894, #11439. 
>  
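
A hedged sketch of the hoodie.properties shapes involved (values are illustrative, not taken from the referenced PRs):

```properties
# 1.0.0-style: merge behavior recorded as an enum-like mode
hoodie.record.merge.mode=EVENT_TIME_ORDERING
# 0.x-style: merge behavior implied by a payload classname
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
```

The bridge release would need to accept either shape and infer one from the other.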



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

