[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6462:
-
Labels: pull-request-available  (was: )

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> At the time of instantiation of the write/base client, user may want to do 
> additional processing such as sending metrics/logs/notification or adding 
> more properties to the write config.  The write/base client init callback 
> abstraction allows such logic to be plugged into Hudi.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] yihua opened a new pull request, #9108: [HUDI-6462] Add Hudi client init callback interface

2023-06-30 Thread via GitHub


yihua opened a new pull request, #9108:
URL: https://github.com/apache/hudi/pull/9108

   ### Change Logs
   
   This PR adds the interface for Hudi client init callback to run custom logic 
at the time of initialization of a Hudi client:
   
   ```
   @PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)
   public interface HoodieClientInitCallback {
     /**
      * A callback method in which the user can implement custom logic.
      * This method is called when a {@link BaseHoodieClient} is initialized.
      *
      * @param hoodieClient {@link BaseHoodieClient} instance.
      */
     @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
     void call(BaseHoodieClient hoodieClient);
   }
   ```
   
   At the time of instantiation of the write or table service client, a user 
may want to do additional processing, such as sending metrics, logs, or 
notifications, or adding more properties to the write config.  An 
implementation of the client init callback interface allows such logic to be 
plugged into Hudi.
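
A minimal sketch of what an implementation might look like. The `BaseHoodieClient` class below is a hypothetical stand-in (the real class lives in Hudi and has a different surface), and `AddPropertyCallback`, `CallbackDemo`, and the `example.init.tag` key are illustrative names, not part of this PR. Only the shape of `HoodieClientInitCallback#call` is taken from the interface above.

```java
import java.util.Properties;

// Hypothetical stand-in for Hudi's BaseHoodieClient so the sketch compiles on its own.
class BaseHoodieClient {
  private final Properties props = new Properties();

  Properties getProps() {
    return props;
  }
}

// Same shape as the interface in this PR (API maturity annotations omitted).
interface HoodieClientInitCallback {
  void call(BaseHoodieClient hoodieClient);
}

// Illustrative callback: adds one property when the client is initialized.
class AddPropertyCallback implements HoodieClientInitCallback {
  @Override
  public void call(BaseHoodieClient hoodieClient) {
    hoodieClient.getProps().setProperty("example.init.tag", "set-by-callback");
  }
}

class CallbackDemo {
  // Simulates client initialization followed by the callback invocation.
  static String run() {
    BaseHoodieClient client = new BaseHoodieClient();
    new AddPropertyCallback().call(client);
    return client.getProps().getProperty("example.init.tag");
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

A callback that sends metrics or notifications would follow the same pattern, with the side effect placed inside `call`.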
   
   A new config, `hoodie.client.init.callback.classes`, is added for plugging 
in the callback implementation.  The class list is comma-separated.
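
As an illustration of how a comma-separated class list can be turned into callback instances, here is a sketch using plain Java reflection. It is not the loading code from this PR: `CallbackLoader`, `CountingCallback`, and the stand-in `BaseHoodieClient` are hypothetical, and the sketch assumes each callback class has a no-argument constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hudi's BaseHoodieClient.
class BaseHoodieClient {}

// Same shape as the interface in this PR (annotations omitted).
interface HoodieClientInitCallback {
  void call(BaseHoodieClient hoodieClient);
}

// Illustrative callback that counts how often it was invoked.
class CountingCallback implements HoodieClientInitCallback {
  static int calls = 0;

  @Override
  public void call(BaseHoodieClient hoodieClient) {
    calls++;
  }
}

class CallbackLoader {
  // Parses a value such as the one given to hoodie.client.init.callback.classes
  // and instantiates each listed class through its no-arg constructor.
  static List<HoodieClientInitCallback> load(String classList) {
    List<HoodieClientInitCallback> callbacks = new ArrayList<>();
    if (classList == null || classList.trim().isEmpty()) {
      return callbacks;
    }
    for (String name : classList.split(",")) {
      String trimmed = name.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas
      }
      try {
        Object instance = Class.forName(trimmed).getDeclaredConstructor().newInstance();
        callbacks.add((HoodieClientInitCallback) instance);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot instantiate callback " + trimmed, e);
      }
    }
    return callbacks;
  }

  public static void main(String[] args) {
    String configured = CountingCallback.class.getName() + ", " + CountingCallback.class.getName();
    BaseHoodieClient client = new BaseHoodieClient();
    for (HoodieClientInitCallback callback : load(configured)) {
      callback.call(client);
    }
    System.out.println("callbacks invoked: " + CountingCallback.calls);
  }
}
```

Trimming each entry keeps a value like `a.B, c.D` working; whether Hudi itself tolerates whitespace in the list is not stated in this PR.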
   
   New tests are added to verify the expected behavior.
   
   ### Impact
   
   Adds new functionality of client init callback to run custom logic at the 
time of initialization of a Hudi client.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   Will update the Hudi docs.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #9083: [HUDI-6464] Spark SQL Merge Into for pkless tables

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1615506177

   
   ## CI report:
   
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * 69f68c8ee2ed4cdae41cbf62a47a28b39ddcd57f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18245)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9107:
URL: https://github.com/apache/hudi/pull/9107#issuecomment-1615502789

   
   ## CI report:
   
   * 3f4ef9bc84c59b038504f86acd3734eb2cc11bad Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18244)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1615459950

   
   ## CI report:
   
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18243)
 
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8774: [HUDI-6246] Fixing restore for compaction commit

2023-06-30 Thread via GitHub


nsivabalan commented on code in PR #8774:
URL: https://github.com/apache/hudi/pull/8774#discussion_r1248439284


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackStrategy.java:
##
@@ -117,21 +126,22 @@ public List getRollbackRequests(HoodieInstant instantToRo
       // If there is no delta commit present after the current commit (if compaction), no action, else we
       // need to make sure that a compaction commit rollback also deletes any log files written as part of the
       // succeeding deltacommit.
-      boolean higherDeltaCommits =
+      boolean hasHigherCompletedDeltaCommits =
           !activeTimeline.getDeltaCommitTimeline().filterCompletedInstants().findInstantsAfter(commit, 1)
               .empty();
-      if (higherDeltaCommits) {
-        // Rollback of a compaction action with no higher deltacommit means that the compaction is scheduled
+      if (hasHigherCompletedDeltaCommits && !isCommitMetadataCompleted) {

Review Comment:
   due to async compaction. 






[jira] [Assigned] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key

2023-06-30 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler reassigned HUDI-6464:
-

Fix Version/s: 0.14.0
 Assignee: Jonathan Vexler

> Implement Spark SQL Merge Into for tables without primary key
> -
>
> Key: HUDI-6464
> URL: https://issues.apache.org/jira/browse/HUDI-6464
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Merge Into currently only matches on the primary key which pkless tables 
> don't have





[jira] [Updated] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key

2023-06-30 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-6464:
--
Status: Patch Available  (was: In Progress)

> Implement Spark SQL Merge Into for tables without primary key
> -
>
> Key: HUDI-6464
> URL: https://issues.apache.org/jira/browse/HUDI-6464
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Merge Into currently only matches on the primary key which pkless tables 
> don't have





[jira] [Updated] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key

2023-06-30 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-6464:
--
Status: In Progress  (was: Open)

> Implement Spark SQL Merge Into for tables without primary key
> -
>
> Key: HUDI-6464
> URL: https://issues.apache.org/jira/browse/HUDI-6464
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Merge Into currently only matches on the primary key which pkless tables 
> don't have





[GitHub] [hudi] jonvex commented on a diff in pull request #9083: [HUDI-6464] Spark SQL Merge Into for pkless tables

2023-06-30 Thread via GitHub


jonvex commented on code in PR #9083:
URL: https://github.com/apache/hudi/pull/9083#discussion_r1248405665


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestMergeIntoTable2.scala:
##
@@ -884,10 +887,9 @@ class TestMergeIntoTable2 extends HoodieSparkSqlTestBase {
  """.stripMargin
   )
   checkAnswer(s"select id, name, price, ts, dt from $tableName")(
-Seq(1, "a1", 10.1, 1000, "2021-03-21"),
 Seq(1, "a2", 10.2, 1002, "2021-03-21"),
-Seq(3, "a3", 10.3, 1003, "2021-03-21"),
-Seq(1, "a2", 10.2, 1002, "2021-03-21")

Review Comment:
   Slight behavior change here. Previously we were doing an insert when matched 
because there was no precombine key. Now we actually do an update.






[jira] [Updated] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key

2023-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6464:
-
Labels: pull-request-available  (was: )

> Implement Spark SQL Merge Into for tables without primary key
> -
>
> Key: HUDI-6464
> URL: https://issues.apache.org/jira/browse/HUDI-6464
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: spark-sql
>Reporter: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
>
> Merge Into currently only matches on the primary key which pkless tables 
> don't have





[GitHub] [hudi] hudi-bot commented on pull request #9083: [HUDI-6464] Spark SQL Merge Into for pkless tables

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1615381146

   
   ## CI report:
   
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * ac4f2ce82babd0794dd73ec097ae79853978b5a5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18211)
 
   * 69f68c8ee2ed4cdae41cbf62a47a28b39ddcd57f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18245)
 
   
   



[jira] [Created] (HUDI-6464) Implement Spark SQL Merge Into for tables without primary key

2023-06-30 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-6464:
-

 Summary: Implement Spark SQL Merge Into for tables without primary 
key
 Key: HUDI-6464
 URL: https://issues.apache.org/jira/browse/HUDI-6464
 Project: Apache Hudi
  Issue Type: New Feature
  Components: spark-sql
Reporter: Jonathan Vexler


Merge Into currently only matches on the primary key which pkless tables don't 
have





[GitHub] [hudi] hudi-bot commented on pull request #9083: PKLess Merge Into

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9083:
URL: https://github.com/apache/hudi/pull/9083#issuecomment-1615372261

   
   ## CI report:
   
   * 3a0bfb88049cf2c0f8afe5c925dbd76fa6f7cd89 UNKNOWN
   * ac4f2ce82babd0794dd73ec097ae79853978b5a5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18211)
 
   * 69f68c8ee2ed4cdae41cbf62a47a28b39ddcd57f UNKNOWN
   
   



[GitHub] [hudi] danny0405 commented on a diff in pull request #9056: [HUDI-6456] [DOC] Add parquet blooms documentation

2023-06-30 Thread via GitHub


danny0405 commented on code in PR #9056:
URL: https://github.com/apache/hudi/pull/9056#discussion_r1248375900


##
website/docs/configurations.md:
##
@@ -20,6 +20,7 @@ hoodie.datasource.hive_sync.support_timestamp  false
 It helps to have a central configuration file for your common cross job configurations/tunings, so all the jobs on your cluster can utilize it. It also works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the SQL statements.
 
 By default, Hudi would load the configuration file under `/etc/hudi/conf` directory. You can specify a different configuration directory location by setting the `HUDI_CONF_DIR` environment variable.
+- [**Parquet Configs**](#PARQUET_CONFIG): These configs makes it possible to bring native parquet features
 - [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read.

Review Comment:
   Should we put it under `Spark Datasource Configs` ?






[GitHub] [hudi] hudi-bot commented on pull request #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9107:
URL: https://github.com/apache/hudi/pull/9107#issuecomment-1615342762

   
   ## CI report:
   
   * 3f4ef9bc84c59b038504f86acd3734eb2cc11bad Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18244)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9038: [HUDI-6423] Incremental cleaning should consider inflight compaction instant

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9038:
URL: https://github.com/apache/hudi/pull/9038#issuecomment-1615339864

   
   ## CI report:
   
   * a65a29c0cf1c8feb9f39e168ba80c99ebcae1c5d UNKNOWN
   * 34f8823f48712c57058bc37c8936a276c1457557 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18187)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18193)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18188)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18243)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9107:
URL: https://github.com/apache/hudi/pull/9107#issuecomment-1615339935

   
   ## CI report:
   
   * 3f4ef9bc84c59b038504f86acd3734eb2cc11bad UNKNOWN
   
   



[jira] [Updated] (HUDI-6463) Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate

2023-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6463:
-
Labels: pull-request-available  (was: )

> Fix deluge loggings of 
> HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
> 
>
> Key: HUDI-6463
> URL: https://issues.apache.org/jira/browse/HUDI-6463
> Project: Apache Hudi
>  Issue Type: Task
>  Components: writer-core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[GitHub] [hudi] danny0405 opened a new pull request, #9107: [HUDI-6463] Fix deluge loggings of HoodieBackedTableMetadataWriter#ge…

2023-06-30 Thread via GitHub


danny0405 opened a new pull request, #9107:
URL: https://github.com/apache/hudi/pull/9107

   …tMetadataPartitionsToUpdate
   
   ### Change Logs
   
   There are too many verbose warning logs; this PR fixes that.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6463) Fix deluge loggings of HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate

2023-06-30 Thread Danny Chen (Jira)
Danny Chen created HUDI-6463:


 Summary: Fix deluge loggings of 
HoodieBackedTableMetadataWriter#getMetadataPartitionsToUpdate
 Key: HUDI-6463
 URL: https://issues.apache.org/jira/browse/HUDI-6463
 Project: Apache Hudi
  Issue Type: Task
  Components: writer-core
Reporter: Danny Chen
 Fix For: 0.14.0








[GitHub] [hudi] hudi-bot commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9035:
URL: https://github.com/apache/hudi/pull/9035#issuecomment-1615299463

   
   ## CI report:
   
   * f0735271d079b8dfa76b6350505e9a4e38610d8a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18242)
 
   
   



[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Description: At the time of instantiation of the write/base client, user 
may want to do additional processing such as sending metrics/logs/notification 
or adding more properties to the write config.  The write/base client init 
callback abstraction allows such logic to be plugged into Hudi.  (was: At the 
time of instantiation of the write client, user may want to do additional 
processing such as sending metrics/logs/notification or adding more properties 
to the write config.  The write/base client init callback abstraction allows 
such logic to be plugged into Hudi.)

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>
> At the time of instantiation of the write/base client, user may want to do 
> additional processing such as sending metrics/logs/notification or adding 
> more properties to the write config.  The write/base client init callback 
> abstraction allows such logic to be plugged into Hudi.





[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Description: At the time of instantiation of the write client, user may 
want to do additional processing such as sending metrics/logs/notification or 
adding more properties to the write config.  The write/base client init 
callback abstraction allows such logic to be plugged into Hudi.  (was: At the 
time of instantiation of the write client, user may want to do additional 
processing such as sending metrics/logs/notification or adding more properties 
to the write config.  )

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>
> At the time of instantiation of the write client, user may want to do 
> additional processing such as sending metrics/logs/notification or adding 
> more properties to the write config.  The write/base client init callback 
> abstraction allows such logic to be plugged into Hudi.





[GitHub] [hudi] hudi-bot commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9035:
URL: https://github.com/apache/hudi/pull/9035#issuecomment-1615249731

   
   ## CI report:
   
   * d273d7fca86a899653508ae50316107ac3243d42 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18050)
 
   * f0735271d079b8dfa76b6350505e9a4e38610d8a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18242)
 
   
   



[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Description: At the time of instantiation of the write client, user may 
want to do additional processing such as sending metrics/logs/notification or 
adding more properties to the write config.  

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>
> At the time of instantiation of the write client, user may want to do 
> additional processing such as sending metrics/logs/notification or adding 
> more properties to the write config.  





[jira] [Assigned] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6462:
---

Assignee: Ethan Guo

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>






[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Fix Version/s: 0.14.0

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Priority: Blocker  (was: Major)

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>






[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Summary: Add write/base client init callback abstraction  (was: Add write 
client init callback abstraction)

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Priority: Major
>






[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Story Points: 2

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.14.0
>
>






[jira] [Updated] (HUDI-6462) Add write/base client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6462:

Component/s: writer-core

> Add write/base client init callback abstraction
> ---
>
> Key: HUDI-6462
> URL: https://issues.apache.org/jira/browse/HUDI-6462
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (HUDI-6462) Add write client init callback abstraction

2023-06-30 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6462:
---

 Summary: Add write client init callback abstraction
 Key: HUDI-6462
 URL: https://issues.apache.org/jira/browse/HUDI-6462
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo








[GitHub] [hudi] hudi-bot commented on pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9035:
URL: https://github.com/apache/hudi/pull/9035#issuecomment-1615245617

   
   ## CI report:
   
   * d273d7fca86a899653508ae50316107ac3243d42 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18050)
 
   * f0735271d079b8dfa76b6350505e9a4e38610d8a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-30 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1615240130

   
   ## CI report:
   
   * 3e22656f66687bb920ec82e6764bf083985df09c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18241)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1615153382

   
   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240)
 
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-30 Thread via GitHub


nsivabalan commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1248215039


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -1871,7 +1865,11 @@ private void testTableOperationsImpl(HoodieSparkEngineContext engineContext, Hoo
       validateMetadata(client);
 
       // Restore
-      client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled());
+      if (metaClient.getTableType() == COPY_ON_WRITE) {
+        assertThrows(HoodieRestoreException.class, () -> client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled()));

Review Comment:
   @prashantwason: can you help clarify why we expect this to fail only for 
COW tables and not for MOR?






[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9087:
URL: https://github.com/apache/hudi/pull/9087#issuecomment-1615098734

   
   ## CI report:
   
   * 1ff671477f3635ced1643f31de4d2c47acfb3244 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18238)
 
   
   



[GitHub] [hudi] nsivabalan commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-30 Thread via GitHub


nsivabalan commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1248170142


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -973,52 +973,46 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) {
    */
   @Override
   public void update(HoodieRollbackMetadata rollbackMetadata, String instantTime) {
-    // The commit which is being rolled back on the dataset
-    final String commitInstantTime = rollbackMetadata.getCommitsRollback().get(0);
-    // Find the deltacommits since the last compaction
-    Option<Pair<HoodieTimeline, HoodieInstant>> deltaCommitsInfo =
-        CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline());
-    if (!deltaCommitsInfo.isPresent()) {
-      LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no deltacommits on MDT", commitInstantTime, instantTime));
-      return;
-    }
-
-    // This could be a compaction or deltacommit instant (See CompactionUtils.getDeltaCommitsSinceLatestCompaction)
-    HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue();
-    HoodieTimeline deltacommitsSinceCompaction = deltaCommitsInfo.get().getKey();
-
-    // The deltacommit that will be rolled back
-    HoodieInstant deltaCommitInstant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime);
-
-    // The commit being rolled back should not be older than the latest compaction on the MDT. Compaction on MDT only occurs when all actions
-    // are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore.
-    if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {
-      final String compactionInstantTime = compactionInstant.getTimestamp();
-      if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitInstantTime, compactionInstantTime)) {
-        throw new HoodieMetadataException(String.format("Commit being rolled back %s is older than the latest compaction %s. "
-            + "There are %d deltacommits after this compaction: %s", commitInstantTime, compactionInstantTime,
-            deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants()));
+    if (initialized && metadata != null) {
+      // The commit which is being rolled back on the dataset
+      final String commitInstantTime = rollbackMetadata.getCommitsRollback().get(0);
+      // Find the deltacommits since the last compaction
+      Option<Pair<HoodieTimeline, HoodieInstant>> deltaCommitsInfo =
+          CompactionUtils.getDeltaCommitsSinceLatestCompaction(metadataMetaClient.getActiveTimeline());
+      if (!deltaCommitsInfo.isPresent() || deltaCommitsInfo.get().getKey().empty()) {
+        LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no deltacommits on MDT", commitInstantTime, instantTime));
+        return;
       }
-    }
 
-    if (deltacommitsSinceCompaction.containsInstant(deltaCommitInstant)) {
-      LOG.info("Rolling back MDT deltacommit " + commitInstantTime);
-      if (!getWriteClient().rollback(commitInstantTime, instantTime)) {
-        throw new HoodieMetadataException("Failed to rollback deltacommit at " + commitInstantTime);
+      // This could be a compaction or deltacommit instant (See CompactionUtils.getDeltaCommitsSinceLatestCompaction)
+      HoodieInstant compactionInstant = deltaCommitsInfo.get().getValue();
+      HoodieTimeline deltacommitsSinceCompaction = deltaCommitsInfo.get().getKey();
+
+      // The deltacommit that will be rolled back
+      HoodieInstant deltaCommitInstant = new HoodieInstant(false, HoodieTimeline.DELTA_COMMIT_ACTION, commitInstantTime);
+
+      // The commit being rolled back should not be older than the latest compaction on the MDT. Compaction on MDT only occurs when all actions
+      // are completed on the dataset. Hence, this case implies a rollback of completed commit which should actually be handled using restore.
+      if (compactionInstant.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {
+        final String compactionInstantTime = compactionInstant.getTimestamp();
+        if (HoodieTimeline.LESSER_THAN_OR_EQUALS.test(commitInstantTime, compactionInstantTime)) {
+          throw new HoodieMetadataException(String.format("Commit being rolled back %s is older than the latest compaction %s. "
+              + "There are %d deltacommits after this compaction: %s", commitInstantTime, compactionInstantTime,
+              deltacommitsSinceCompaction.countInstants(), deltacommitsSinceCompaction.getInstants()));
+        }
       }
-    } else {
-      LOG.info(String.format("Ignoring rollback of instant %s at %s since there are no corresponding deltacommits on MDT",
-          commitInstantTime, instantTime));
-    }
 
-    // Rollback of MOR table may end up adding a new log file. So we need to check

[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-30 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1615048617

   
   ## CI report:
   
   * c401984679350ad245c1b60d4f889b8a18715169 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18220)
 
   * 3e22656f66687bb920ec82e6764bf083985df09c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18241)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9066:
URL: https://github.com/apache/hudi/pull/9066#issuecomment-1615033363

   
   ## CI report:
   
   * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18196)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-30 Thread via GitHub


hudi-bot commented on PR #8837:
URL: https://github.com/apache/hudi/pull/8837#issuecomment-1615032413

   
   ## CI report:
   
   * c401984679350ad245c1b60d4f889b8a18715169 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18220)
 
   * 3e22656f66687bb920ec82e6764bf083985df09c UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9066:
URL: https://github.com/apache/hudi/pull/9066#issuecomment-1615017356

   
   ## CI report:
   
   * 60c1b8c5885fdda28e07f3ba79290f01dc60a9c4 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1615017280

   
   ## CI report:
   
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   * af87c98dd4c370bb40287013adcecd314e20b546 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18237)
 
   
   



[GitHub] [hudi] codope commented on pull request #9066: [HUDI-6452] Add MOR snapshot reader to integrate with query engines without using Hadoop APIs

2023-06-30 Thread via GitHub


codope commented on PR #9066:
URL: https://github.com/apache/hudi/pull/9066#issuecomment-1615001639

   CI after the latest commit - 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=18196&view=results





[GitHub] [hudi] codope commented on a diff in pull request #8837: [HUDI-6153] Changed the rollback mechanism for MDT to actual rollbacks rather than appending revert blocks.

2023-06-30 Thread via GitHub


codope commented on code in PR #8837:
URL: https://github.com/apache/hudi/pull/8837#discussion_r1248137779


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java:
##
@@ -1871,7 +1865,11 @@ private void testTableOperationsImpl(HoodieSparkEngineContext engineContext, Hoo
       validateMetadata(client);
 
       // Restore
-      client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled());
+      if (metaClient.getTableType() == COPY_ON_WRITE) {
+        assertThrows(HoodieRestoreException.class, () -> client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled()));
+      } else {
+        client.restoreToInstant("2021010100060", writeConfig.isMetadataTableEnabled());
+      }

Review Comment:
   There should not be a need to check this based on table type. Need to look 
into why this fails for COW.






[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1614965807

   
   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18240)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9106:
URL: https://github.com/apache/hudi/pull/9106#issuecomment-1614956663

   
   ## CI report:
   
   * eb56e1be9ea831362a61adccec2ec2826c86d6a7 UNKNOWN
   
   



[jira] [Assigned] (HUDI-6460) Fix Hbase Index for deletes

2023-06-30 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason reassigned HUDI-6460:


Assignee: Prashant Wason

> Fix Hbase Index for deletes
> ---
>
> Key: HUDI-6460
> URL: https://issues.apache.org/jira/browse/HUDI-6460
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: sivabalan narayanan
>Assignee: Prashant Wason
>Priority: Major
>
> With delete support added for RLI in 
> [https://github.com/apache/hudi/pull/9058/files], 
> the HBase index needs some fixes. 
> The failing test is 
> TestSparkHoodieHBaseIndex.testTagLocationAndPartitionPathUpdateWithExplicitRollback.
>  
> Root cause:
> When update partition path is set to true, the same batch contains both a 
> deleted record and a new insert record. Both records are sent to HBase, and 
> for some records the insert takes precedence, while for others the delete 
> takes precedence. 
>  
> We need to fix SparkHoodieHBaseIndex.updateLocation 
> to do one pass over WriteStatus and de-duplicate when there are two records 
> where one is a delete and the other is an insert. 
> There may also be cases where only deletes are present; in those cases we 
> need to ensure the deletes are still routed to HBase. 
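
The one-pass de-duplication described in the issue could look roughly like the sketch below. This is not Hudi code: `HBaseUpdateDedupSketch`, `IndexUpdate`, and `dedupe` are hypothetical names standing in for whatever `updateLocation` derives from each `WriteStatus`, and the rule that an insert wins over a delete for the same key is an assumption drawn from the partition-path-update scenario, not something the issue states outright.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types exist in Hudi.
final class HBaseUpdateDedupSketch {

  // Hypothetical stand-in for one index mutation derived from a WriteStatus.
  static final class IndexUpdate {
    final String recordKey;
    final boolean isDelete;
    final String newLocation; // null when isDelete is true

    IndexUpdate(String recordKey, boolean isDelete, String newLocation) {
      this.recordKey = recordKey;
      this.isDelete = isDelete;
      this.newLocation = newLocation;
    }
  }

  // Single pass over the batch. Assumption: when a key has both a delete and
  // an insert, the insert wins regardless of arrival order. A key that only
  // has a delete keeps it, so the delete is still sent to the index.
  static List<IndexUpdate> dedupe(List<IndexUpdate> batch) {
    Map<String, IndexUpdate> byKey = new LinkedHashMap<>();
    for (IndexUpdate update : batch) {
      IndexUpdate seen = byKey.get(update.recordKey);
      if (seen == null || (seen.isDelete && !update.isDelete)) {
        byKey.put(update.recordKey, update);
      }
    }
    return new ArrayList<>(byKey.values());
  }
}
```

Keying on the record key makes the outcome independent of the order in which the delete and the insert arrive, which is the non-determinism the root cause describes.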





[jira] [Updated] (HUDI-6118) Testing of MDT and RI code on HDFS

2023-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6118:
-
Labels: pull-request-available  (was: )

> Testing of MDT and RI code on HDFS
> --
>
> Key: HUDI-6118
> URL: https://issues.apache.org/jira/browse/HUDI-6118
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> The current defaults are not optimal for large partitions like record index. 





[GitHub] [hudi] prashantwason opened a new pull request, #9106: [HUDI-6118] Some fixes to improve the MDT and record index code base.

2023-06-30 Thread via GitHub


prashantwason opened a new pull request, #9106:
URL: https://github.com/apache/hudi/pull/9106

   [HUDI-6118] Some fixes to improve the MDT and record index code base.
   
   ### Change Logs
   
   1. Print MDT partition name instead of the enum toString in logs
   2. Use fsView.loadAllPartitions()
   3. When publishing size metrics for MDT, only consider partitions which have 
been initialized
   4. Fixed job status names
   5. Limited logs which were printing the entire list of partitions. This is 
very verbose for datasets with a large number of partitions
   6. Added a config to reduce the max parallelism of record index 
initialization.
   7. Changed defaults for MDT write configs to reasonable values
   8. Added config for MDT logBlock size. Larger blocks are preferred to reduce 
lookup time.
   9. Fixed the size metrics for MDT. These metrics should be set instead of 
incremented.
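
   To illustrate item 9, here is a toy sketch of why a size metric should be set rather than incremented. `SizeMetricSketch` and its methods are hypothetical and are not Hudi's metrics API; the point is only the difference between counter and gauge semantics when the same size is published more than once.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a toy registry, not Hudi's metrics API.
final class SizeMetricSketch {
  private final Map<String, Long> values = new HashMap<>();

  // Counter semantics: each publish adds to the previous value.
  void increment(String name, long delta) {
    values.merge(name, delta, Long::sum);
  }

  // Gauge semantics: each publish overwrites the previous value.
  void set(String name, long value) {
    values.put(name, value);
  }

  long get(String name) {
    return values.getOrDefault(name, 0L);
  }
}
```

   Publishing a 100-byte partition size twice through `increment` reports 200, while `set` keeps reporting 100, which is the value a size metric is meant to convey.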
   
   
   ### Impact
   
   Fixes issues for the recently committed RI and MDT changes
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6118) Testing of MDT and RI code on HDFS

2023-06-30 Thread Prashant Wason (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Wason updated HUDI-6118:
-
Summary: Testing of MDT and RI code on HDFS  (was: Provide reasonable 
defaults for operation parallelism in MDT write configuration)

> Testing of MDT and RI code on HDFS
> --
>
> Key: HUDI-6118
> URL: https://issues.apache.org/jira/browse/HUDI-6118
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Major
> Fix For: 0.14.0
>
>
> The current defaults are not optimal for large partitions like record index. 





[GitHub] [hudi] BBency commented on issue #9094: Async Clustering failing with errors for MOR table

2023-06-30 Thread via GitHub


BBency commented on issue #9094:
URL: https://github.com/apache/hudi/issues/9094#issuecomment-1614926417

   Is there any other detail you would like me to share? 
   Any updates?





[jira] [Closed] (HUDI-6376) Support for DELETE keys in record index

2023-06-30 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-6376.
-
Resolution: Fixed

> Support for DELETE keys in record index
> ---
>
> Key: HUDI-6376
> URL: https://issues.apache.org/jira/browse/HUDI-6376
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Prashant Wason
>Assignee: Prashant Wason
>Priority: Blocker
>  Labels: pull-request-available, release-0.14.0-blocker
> Fix For: 0.14.0
>
>






[GitHub] [hudi] nsivabalan merged pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-30 Thread via GitHub


nsivabalan merged PR #9058:
URL: https://github.com/apache/hudi/pull/9058





[hudi] branch master updated: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index. (#9058)

2023-06-30 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1d5f2f7c63d [HUDI-6376] Support for deletes in HUDI Indexes including 
metadata table record index. (#9058)
1d5f2f7c63d is described below

commit 1d5f2f7c63de441b9f475dd7ba4cf1540e0f9c42
Author: Prashant Wason 
AuthorDate: Fri Jun 30 09:12:36 2023 -0700

[HUDI-6376] Support for deletes in HUDI Indexes including metadata table 
record index. (#9058)

* [HUDI-6376] Support for deletes in HUDI Indexes including metadata table 
record index.

-

Co-authored-by: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
Co-authored-by: sivabalan 
---
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  4 +
 .../functional/TestHoodieBackedMetadata.java   | 94 ++
 .../hudi/client/functional/TestHoodieIndex.java| 61 ++
 .../index/hbase/TestSparkHoodieHBaseIndex.java |  3 +-
 .../org/apache/hudi/common/model/HoodieRecord.java | 13 ++-
 .../hudi/metadata/HoodieBackedTableMetadata.java   |  6 +-
 .../hudi/metadata/HoodieMetadataPayload.java   | 29 ++-
 7 files changed, 202 insertions(+), 8 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
index cfe11b1fd8d..8c4b0bc18d5 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
@@ -314,6 +314,10 @@ public class HoodieMergeHandle extends HoodieWriteHandle
         recordsWritten++;
       } else {
         recordsDeleted++;
+        // Clear the new location as the record was deleted
+        newRecord.unseal();
+        newRecord.clearNewLocation();
+        newRecord.seal();
       }
       writeStatus.markSuccess(newRecord, recordMetadata);
       // deflate record payload after recording success. This will help users access payload as a
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
index 075afd61eb1..a1657c204b8 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
@@ -46,6 +46,7 @@ import org.apache.hudi.common.model.HoodieKey;
 import org.apache.hudi.common.model.HoodieLogFile;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType;
+import org.apache.hudi.common.model.HoodieRecordGlobalLocation;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.HoodieTableType;
 import org.apache.hudi.common.model.HoodieWriteStat;
@@ -103,6 +104,7 @@ import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
 import org.apache.hudi.table.upgrade.SparkUpgradeDowngradeHelper;
 import org.apache.hudi.table.upgrade.UpgradeDowngrade;
+import org.apache.hudi.testutils.HoodieClientTestUtils;
 import org.apache.hudi.testutils.MetadataMergeWriteStatus;
 
 import org.apache.avro.Schema;
@@ -3068,6 +3070,98 @@ public class TestHoodieBackedMetadata extends TestHoodieMetadataBase {
     validateMetadata(client);
   }
 
+  @Test
+  public void testDeleteWithRecordIndex() throws Exception {
+    init(HoodieTableType.COPY_ON_WRITE, true);
+    HoodieSparkEngineContext engineContext = new HoodieSparkEngineContext(jsc);
+    HoodieWriteConfig writeConfig = getWriteConfigBuilder(true, true, false)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().withEnableRecordIndex(true).withMaxNumDeltaCommitsBeforeCompaction(1).build())
+        .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.RECORD_INDEX).build())
+        .build();
+
+    String firstCommitTime = HoodieActiveTimeline.createNewInstantTime();
+    String secondCommitTime;
+    List allRecords;
+    List keysToDelete;
+    List recordsToDelete;
+
+    // Initialize the dataset and add some commits.
+    try (SparkRDDWriteClient client = new SparkRDDWriteClient(engineContext, writeConfig)) {
+      // First commit
+      List firstBatchOfrecords = dataGen.generateInserts(firstCommitTime, 10);
+      client.startCommitWithTime(firstCommitTime);
+      client.insert(jsc.parallelize(firstBatchOfrecords, 1), firstCommitTime).collect();
+
+      // Records got inserted and RI is initialized
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+  

[jira] [Created] (HUDI-6461) Fix deletion of entire record in MDT for col stats, bloom filter

2023-06-30 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6461:
-

 Summary: Fix deletion of entire record in MDT for col stats, bloom 
filter 
 Key: HUDI-6461
 URL: https://issues.apache.org/jira/browse/HUDI-6461
 Project: Apache Hudi
  Issue Type: Improvement
  Components: metadata
Reporter: sivabalan narayanan


With RLI, we are introducing a proper way to delete an MDT record: 
[https://github.com/apache/hudi/pull/9058] 

We might have to follow similar logic for the other partitions as well to 
optimize them further. We should avoid relying on nested fields to deduce 
whether a record is deleted (for example, ColumnStatsMetadata.isDeleted).
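
A rough sketch of the distinction, under stated assumptions: `DeleteMarkerSketch`, `MetadataRecord`, and `ColumnStats` below are hypothetical types, not Hudi's actual payload classes. They only contrast a delete marker buried in a partition-specific nested field with one carried at the top level of the record.

```java
// Illustrative only: hypothetical types, not Hudi's metadata payload classes.
final class DeleteMarkerSketch {

  // Nested, partition-specific payload that carries its own delete marker.
  static final class ColumnStats {
    final boolean isDeleted;

    ColumnStats(boolean isDeleted) {
      this.isDeleted = isDeleted;
    }
  }

  static final class MetadataRecord {
    final String key;
    final boolean isDeleted;       // explicit top-level marker
    final ColumnStats columnStats; // null for records of other partitions

    MetadataRecord(String key, boolean isDeleted, ColumnStats columnStats) {
      this.key = key;
      this.isDeleted = isDeleted;
      this.columnStats = columnStats;
    }
  }

  // Deduces deletion from the nested field; only works when that field is set.
  static boolean deletedViaNestedField(MetadataRecord record) {
    return record.columnStats != null && record.columnStats.isDeleted;
  }

  // Reads the explicit marker; works the same way for every partition type.
  static boolean deletedViaTopLevelFlag(MetadataRecord record) {
    return record.isDeleted;
  }
}
```

For a deleted record that has no column stats attached, the nested check reports "not deleted" while the top-level flag reports the delete, which is the kind of gap an explicit marker avoids.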

 

 





[GitHub] [hudi] nsivabalan commented on pull request #9058: [HUDI-6376] Support for deletes in HUDI Indexes including metadata table record index.

2023-06-30 Thread via GitHub


nsivabalan commented on PR #9058:
URL: https://github.com/apache/hudi/pull/9058#issuecomment-1614873987

   @danny0405: I feel that having isDeleted explicitly is clearer and more 
comprehensible, so I would prefer to keep it that way. In any case, we have to 
fix all other partitions (col stats, etc.) in a follow-up patch, so let's 
tackle this in that patch. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-6459) Add Rollback and other tests for Record Level Index

2023-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6459:
-
Labels: pull-request-available  (was: )

> Add Rollback and other tests for Record Level Index
> ---
>
> Key: HUDI-6459
> URL: https://issues.apache.org/jira/browse/HUDI-6459
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>
> The Jira aims to add validation for rollback with record level index. The 
> validation is added in TestRecordLevelIndex test.





[GitHub] [hudi] lokeshj1703 opened a new pull request, #9105: [WIP] [HUDI-6459] Add Rollback and other tests for Record Level Index

2023-06-30 Thread via GitHub


lokeshj1703 opened a new pull request, #9105:
URL: https://github.com/apache/hudi/pull/9105

   ### Change Logs
   
   The Jira aims to add validation for rollback with record level index. The 
validation is added in TestRecordLevelIndex test.
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-6459) Add Rollback and other tests for Record Level Index

2023-06-30 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HUDI-6459:
--
Summary: Add Rollback and other tests for Record Level Index  (was: Add 
Rollback test for Record Level Index)

> Add Rollback and other tests for Record Level Index
> ---
>
> Key: HUDI-6459
> URL: https://issues.apache.org/jira/browse/HUDI-6459
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>
> The Jira aims to add validation for rollback with record level index. The 
> validation is added in TestRecordLevelIndex test.





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614815163

   
   ## CI report:
   
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9104: [HUDI-6445] Removing gc hints from test base

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9104:
URL: https://github.com/apache/hudi/pull/9104#issuecomment-1614803818

   
   ## CI report:
   
   * 022113d3bfa7d479b935b193293fae2a295be46d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18234)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9080: [HUDI-6445] Making some of Spark DS tests as functional

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9080:
URL: https://github.com/apache/hudi/pull/9080#issuecomment-1614717741

   
   ## CI report:
   
   * d28ff949a1dd43456fda75e5624848bb63e030f4 UNKNOWN
   * b9dd8237e187586c5d05b46d4d4eee891822813e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18232)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9087:
URL: https://github.com/apache/hudi/pull/9087#issuecomment-1614637415

   
   ## CI report:
   
   * 92e8459715422f3e72fb05a298e2b103330a7cce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18229)
 
   * 1ff671477f3635ced1643f31de4d2c47acfb3244 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18238)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9087:
URL: https://github.com/apache/hudi/pull/9087#issuecomment-1614587068

   
   ## CI report:
   
   * 92e8459715422f3e72fb05a298e2b103330a7cce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18229)
 
   * 1ff671477f3635ced1643f31de4d2c47acfb3244 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1614586903

   
   ## CI report:
   
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   * 9c6d2bf222b7247bc926302045123bad69157d39 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198)
 
   * af87c98dd4c370bb40287013adcecd314e20b546 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18237)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9087: [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9087:
URL: https://github.com/apache/hudi/pull/9087#issuecomment-1614578428

   
   ## CI report:
   
   * 92e8459715422f3e72fb05a298e2b103330a7cce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18229)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9064:
URL: https://github.com/apache/hudi/pull/9064#issuecomment-1614578240

   
   ## CI report:
   
   * b8418b74febf4551c0f79c7ebe71cf24916124e6 UNKNOWN
   * 9c6d2bf222b7247bc926302045123bad69157d39 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18198)
 
   * af87c98dd4c370bb40287013adcecd314e20b546 UNKNOWN
   
   



[GitHub] [hudi] parisni commented on a diff in pull request #9056: [HUDI-6456] [DOC] Add parquet blooms documentation

2023-06-30 Thread via GitHub


parisni commented on code in PR #9056:
URL: https://github.com/apache/hudi/pull/9056#discussion_r1247776991


##
website/docs/configurations.md:
##
@@ -197,7 +197,10 @@ Options useful for reading tables via 
`read.format.option(...)`
 
 ### Write Options {#Write-Options}
 
-You can pass down any of the WriteClient level configs directly using 
`options()` or `option(k,v)` methods.
+Hudi supports [parquet modular encryption](/docs/encryption) and [parquet 
bloom filters](/docs/parquet_bloom) through hadoop configurations.
+

Review Comment:
   added parquet_config heading






[jira] [Closed] (HUDI-6457) Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned

2023-06-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6457.

Resolution: Fixed

Fixed via master branch: a439ea0f449fb334f0823323651ec1512f4cd5df

> Keep JavaSizeBasedClusteringPlanStrategy and 
> SparkSizeBasedClusteringPlanStrategy aligned
> -
>
> Key: HUDI-6457
> URL: https://issues.apache.org/jira/browse/HUDI-6457
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[hudi] branch master updated: [HUDI-6457] Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBasedClusteringPlanStrategy aligned (#9099)

2023-06-30 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a439ea0f449 [HUDI-6457] Keep JavaSizeBasedClusteringPlanStrategy and 
SparkSizeBasedClusteringPlanStrategy aligned (#9099)
a439ea0f449 is described below

commit a439ea0f449fb334f0823323651ec1512f4cd5df
Author: ksmou <135721692+ks...@users.noreply.github.com>
AuthorDate: Fri Jun 30 19:39:31 2023 +0800

[HUDI-6457] Keep JavaSizeBasedClusteringPlanStrategy and 
SparkSizeBasedClusteringPlanStrategy aligned (#9099)
---
 .../JavaSizeBasedClusteringPlanStrategy.java   | 53 +-
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git 
a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
 
b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
index fe66cedb133..d8f0c5fc804 100644
--- 
a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
+++ 
b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/JavaSizeBasedClusteringPlanStrategy.java
@@ -60,41 +60,52 @@ public class JavaSizeBasedClusteringPlanStrategy
 
   @Override
   protected Stream<HoodieClusteringGroup> 
buildClusteringGroupsForPartition(String partitionPath, List<FileSlice> 
fileSlices) {
+HoodieWriteConfig writeConfig = getWriteConfig();
+
 List<Pair<List<FileSlice>, Integer>> fileSliceGroups = new ArrayList<>();
 List<FileSlice> currentGroup = new ArrayList<>();
+
+// Sort fileSlices before dividing, which makes dividing more compact
+List<FileSlice> sortedFileSlices = new ArrayList<>(fileSlices);
+sortedFileSlices.sort((o1, o2) -> (int)
+((o2.getBaseFile().isPresent() ? o2.getBaseFile().get().getFileSize() 
: writeConfig.getParquetMaxFileSize())
+- (o1.getBaseFile().isPresent() ? 
o1.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize())));
+
 long totalSizeSoFar = 0;
-HoodieWriteConfig writeConfig = getWriteConfig();
-for (FileSlice currentSlice : fileSlices) {
-  // assume each filegroup size is ~= parquet.max.file.size
-  totalSizeSoFar += currentSlice.getBaseFile().isPresent() ? 
currentSlice.getBaseFile().get().getFileSize() : 
writeConfig.getParquetMaxFileSize();
+
+for (FileSlice currentSlice : sortedFileSlices) {
+  long currentSize = currentSlice.getBaseFile().isPresent() ? 
currentSlice.getBaseFile().get().getFileSize() : 
writeConfig.getParquetMaxFileSize();
   // check if max size is reached and create new group, if needed.
-  if (totalSizeSoFar >= writeConfig.getClusteringMaxBytesInGroup() && 
!currentGroup.isEmpty()) {
+  if (totalSizeSoFar + currentSize > 
writeConfig.getClusteringMaxBytesInGroup() && !currentGroup.isEmpty()) {
 int numOutputGroups = getNumberOfOutputFileGroups(totalSizeSoFar, 
writeConfig.getClusteringTargetFileMaxBytes());
 LOG.info("Adding one clustering group " + totalSizeSoFar + " max 
bytes: "
-+ writeConfig.getClusteringMaxBytesInGroup() + " num input 
slices: " + currentGroup.size() + " output groups: " + numOutputGroups);
++ writeConfig.getClusteringMaxBytesInGroup() + " num input slices: 
" + currentGroup.size() + " output groups: " + numOutputGroups);
 fileSliceGroups.add(Pair.of(currentGroup, numOutputGroups));
 currentGroup = new ArrayList<>();
 totalSizeSoFar = 0;
   }
+
+  // Add to the current file-group
   currentGroup.add(currentSlice);
-  // totalSizeSoFar could be 0 when new group was created in the previous 
conditional block.
-  // reset to the size of current slice, otherwise the number of output 
file group will become 0 even though current slice is present.
-  if (totalSizeSoFar == 0) {
-totalSizeSoFar += currentSlice.getBaseFile().isPresent() ? 
currentSlice.getBaseFile().get().getFileSize() : 
writeConfig.getParquetMaxFileSize();
-  }
+  // assume each file group size is ~= parquet.max.file.size
+  totalSizeSoFar += currentSize;
 }
+
 if (!currentGroup.isEmpty()) {
-  int numOutputGroups = getNumberOfOutputFileGroups(totalSizeSoFar, 
writeConfig.getClusteringTargetFileMaxBytes());
-  LOG.info("Adding final clustering group " + totalSizeSoFar + " max 
bytes: "
-  + writeConfig.getClusteringMaxBytesInGroup() + " num input 
slices: " + currentGroup.size() + " output groups: " + numOutputGroups);
-  fileSliceGroups.add(Pair.of(currentGroup, numOutputGroups));
+  if (currentGroup.size() > 1 || 
writeConfig.shouldClusteringSingleGroup()) {
+int numOutputGroups = getNumberOfOutputFileGroups(totalSizeSoFar, 
writeConfig.getClusteringTargetFileMaxBytes());
+
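
The grouping loop in the (truncated) diff above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Hudi implementation: `SizeBasedGroupingSketch` and `group` are hypothetical names, file slices are reduced to plain byte sizes, and the final-group condition (`shouldClusteringSingleGroup`) is left out because the diff is cut off before it completes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the grouping loop; each file slice is represented by its size in bytes.
final class SizeBasedGroupingSketch {

  // Splits sizes into groups whose running total stays within maxBytesInGroup.
  // A single size larger than the limit still forms a group of its own.
  static List<List<Long>> group(List<Long> sizes, long maxBytesInGroup) {
    List<Long> sorted = new ArrayList<>(sizes);
    // Largest first. The comparator works on Long values directly, so no narrowing cast is involved.
    sorted.sort(Comparator.reverseOrder());

    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long totalSoFar = 0;
    for (long size : sorted) {
      // Close the current group before adding a size that would push it past the limit.
      if (totalSoFar + size > maxBytesInGroup && !current.isEmpty()) {
        groups.add(current);
        current = new ArrayList<>();
        totalSoFar = 0;
      }
      current.add(size);
      totalSoFar += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

One design difference is deliberate: the comparator in the diff casts a `long` difference to `int`, which can wrap when two sizes differ by more than `Integer.MAX_VALUE` bytes. The sketch sorts by natural `Long` ordering instead, which avoids that case.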

[GitHub] [hudi] danny0405 merged pull request #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…

2023-06-30 Thread via GitHub


danny0405 merged PR #9099:
URL: https://github.com/apache/hudi/pull/9099





[jira] [Closed] (HUDI-6458) Scheduling jobs should not fail when there is no completed commits

2023-06-30 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6458.

Resolution: Fixed

Fixed via master branch: a94db121b3aa05fd2243cb0a7794a2c20048065b

> Scheduling jobs should not fail when there is no completed commits
> --
>
> Key: HUDI-6458
> URL: https://issues.apache.org/jira/browse/HUDI-6458
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: kwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[GitHub] [hudi] danny0405 merged pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-30 Thread via GitHub


danny0405 merged PR #9097:
URL: https://github.com/apache/hudi/pull/9097





[hudi] branch master updated: [HUDI-6458] Scheduling jobs should not fail when there is no completed commits (#9097)

2023-06-30 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new a94db121b3a [HUDI-6458] Scheduling jobs should not fail when there is 
no completed commits (#9097)
a94db121b3a is described below

commit a94db121b3aa05fd2243cb0a7794a2c20048065b
Author: ksmou <135721692+ks...@users.noreply.github.com>
AuthorDate: Fri Jun 30 19:37:33 2023 +0800

[HUDI-6458] Scheduling jobs should not fail when there is no completed 
commits (#9097)
---
 .../src/main/java/org/apache/hudi/utilities/HoodieCompactor.java  | 4 
 .../src/main/java/org/apache/hudi/utilities/UtilHelpers.java  | 3 ---
 2 files changed, 7 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
index c1958e76e6b..603502affb6 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java
@@ -30,7 +30,6 @@ import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.config.HoodieCleanConfig;
-import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
 import 
org.apache.hudi.table.action.compact.strategy.LogFileSizeBasedCompactionStrategy;
 
@@ -293,9 +292,6 @@ public class HoodieCompactor {
 
   private String getSchemaFromLatestInstant() throws Exception {
 TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);
-if 
(metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants()
 == 0) {
-  throw new HoodieException("Cannot run compaction without any completed 
commits");
-}
 Schema schema = schemaUtil.getTableAvroSchema(false);
 return schema.toString();
   }
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index 5c09cf71a2b..a0d241752c5 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -589,9 +589,6 @@ public class UtilHelpers {
 
   public static String getSchemaFromLatestInstant(HoodieTableMetaClient 
metaClient) throws Exception {
 TableSchemaResolver schemaResolver = new TableSchemaResolver(metaClient);
-if 
(metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().countInstants()
 == 0) {
-  throw new HoodieException("Cannot run clustering without any completed 
commits");
-}
 Schema schema = schemaResolver.getTableAvroSchema(false);
 return schema.toString();
   }



[GitHub] [hudi] danny0405 commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-30 Thread via GitHub


danny0405 commented on code in PR #9064:
URL: https://github.com/apache/hudi/pull/9064#discussion_r1247765082


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:
##
@@ -561,7 +561,7 @@ class HoodieCDCRDD(
   originTableSchema.structTypeSchema.zipWithIndex.foreach {
 case (field, idx) =>
   if (field.dataType.isInstanceOf[StringType]) {
-map(field.name) = record.getString(idx)
+map(field.name) = 
Option(record.getUTF8String(idx)).map(_.toString).orNull
   } else {

Review Comment:
   Looks good to me ~
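   
   For readers less familiar with the Scala idiom in the changed line, 
`Option(x).map(_.toString).orNull` wraps a possibly-null value, converts it 
only when present, and unwraps back to null otherwise. A rough Java analogue is 
sketched below; `NullSafeToStringSketch` and `toStringOrNull` are hypothetical 
names used only for illustration and are not part of the PR.
   
   ```java
   import java.util.Optional;

   // Hypothetical helper mirroring Option(x).map(_.toString).orNull from the diff.
   final class NullSafeToStringSketch {

     // Returns value.toString(), or null when value itself is null,
     // instead of throwing a NullPointerException.
     static String toStringOrNull(Object value) {
       return Optional.ofNullable(value).map(Object::toString).orElse(null);
     }
   }
   ```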






[GitHub] [hudi] zaza commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-30 Thread via GitHub


zaza commented on code in PR #9064:
URL: https://github.com/apache/hudi/pull/9064#discussion_r1247764006


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:
##
@@ -561,7 +561,7 @@ class HoodieCDCRDD(
   originTableSchema.structTypeSchema.zipWithIndex.foreach {
 case (field, idx) =>
   if (field.dataType.isInstanceOf[StringType]) {
-map(field.name) = record.getString(idx)
+map(field.name) = 
Option(record.getUTF8String(idx)).map(_.toString).orNull
   } else {

Review Comment:
   Is 
[this](https://github.com/apache/hudi/pull/9064/commits/af87c98dd4c370bb40287013adcecd314e20b546)
 better, or would you like me to go further?






[GitHub] [hudi] danny0405 commented on pull request #3391: [HUDI-83] Fix Timestamp/Date type read by Hive3

2023-06-30 Thread via GitHub


danny0405 commented on PR #3391:
URL: https://github.com/apache/hudi/pull/3391#issuecomment-1614521616

   It should be, but I think it is more related to how the timestamp type is 
synced: https://github.com/apache/hudi/pull/8867





[GitHub] [hudi] danny0405 commented on pull request #9048: [HUDI-6434] Fix illegalArgumentException when do read_optimized read in Flink

2023-06-30 Thread via GitHub


danny0405 commented on PR #9048:
URL: https://github.com/apache/hudi/pull/9048#issuecomment-1614517556

   That's true. Actually, it is even friendlier for the Hive query engine too. 
It is just a little late for the 0.14.0 release, because I'm wary of 
introducing a potential bug. We can make the first file slice with parquets 
once we have enough test cases in production to back up our confidence.





[GitHub] [hudi] danny0405 commented on a diff in pull request #9064: [HUDI-6450] Fix null strings handling in convertRowToJsonString

2023-06-30 Thread via GitHub


danny0405 commented on code in PR #9064:
URL: https://github.com/apache/hudi/pull/9064#discussion_r1247754441


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/cdc/HoodieCDCRDD.scala:
##
@@ -561,7 +561,7 @@ class HoodieCDCRDD(
   originTableSchema.structTypeSchema.zipWithIndex.foreach {
 case (field, idx) =>
   if (field.dataType.isInstanceOf[StringType]) {
-map(field.name) = record.getString(idx)
+map(field.name) = 
Option(record.getUTF8String(idx)).map(_.toString).orNull
   } else {

Review Comment:
   I think we are fine; a basic test for the tool makes sense to me. It would 
be good if we can make the tool a singleton, so that the JSON mapper is not 
registered on each invocation.






[GitHub] [hudi] hudi-bot commented on pull request #9103: [MINOR]move hoodie hfile/orc reader/writer test cases from hudi-client-common to hudi-common

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9103:
URL: https://github.com/apache/hudi/pull/9103#issuecomment-1614506556

   
   ## CI report:
   
   * f26d06b7eb099e698fe7058f3ffba327d4ae5c7f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18228)
 
   
   



[GitHub] [hudi] danny0405 commented on issue #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?

2023-06-30 Thread via GitHub


danny0405 commented on issue #9093:
URL: https://github.com/apache/hudi/issues/9093#issuecomment-1614496615

   You should define the `bulk_insert` option while initializing the table with 
sql:
   
   ```java
   String createTableSql = "create table dept(\n" +
   "  dept_id BIGINT PRIMARY KEY NOT ENFORCED,\n" +
   "  dept_name varchar(10),\n" +
   "  ts timestamp(3)\n" +
   ")\n" +
   "with (\n" +
   "  'connector' = 'hudi',\n" +
   "  'path' = 'hdfs://localhost:9000/hudi/dept',\n" +
   "  'table.type' = 'MERGE_ON_READ'\n" +
   ")";
   ```
   
   It's weird that you can't query the data. Is any exception thrown?
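   
   Note that the DDL in this comment does not itself carry a bulk insert 
setting. A minimal sketch of what defining it in the `with (...)` clause could 
look like follows, assuming the `write.operation` key that appears elsewhere in 
this thread; `BulkInsertDdlSketch` is a hypothetical name, and the path is the 
one from the example above.
   
   ```java
   // Hypothetical holder class; only the DDL string matters here.
   final class BulkInsertDdlSketch {

     // Builds the same table definition with the write operation set in the table options.
     static String createTableSql() {
       return "create table dept(\n"
           + "  dept_id BIGINT PRIMARY KEY NOT ENFORCED,\n"
           + "  dept_name varchar(10),\n"
           + "  ts timestamp(3)\n"
           + ")\n"
           + "with (\n"
           + "  'connector' = 'hudi',\n"
           + "  'path' = 'hdfs://localhost:9000/hudi/dept',\n"
           + "  'table.type' = 'MERGE_ON_READ',\n"
           + "  'write.operation' = 'bulk_insert'\n" // assumption: option key as used later in this thread
           + ")";
     }
   }
   ```
   
   The resulting string would be passed to `executeSql(...)` on the table 
environment in the same way as the original statement.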





[GitHub] [hudi] hudi-bot commented on pull request #9099: [HUDI-6457]Keep JavaSizeBasedClusteringPlanStrategy and SparkSizeBase…

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9099:
URL: https://github.com/apache/hudi/pull/9099#issuecomment-1614453233

   
   ## CI report:
   
   * c61be845ddfc82ffcc107f8db437fc75d334eb58 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18226)
 
   
   



[GitHub] [hudi] gamblewin commented on issue #9093: [SUPPORT] Is it allowed using Flink Table API sqlQuery() to read data from hudi tables?

2023-06-30 Thread via GitHub


gamblewin commented on issue #9093:
URL: https://github.com/apache/hudi/issues/9093#issuecomment-1614448235

   @danny0405 Thx for replying.
   
   1. Data is committed to the table, but cannot be queried using 
`sTableEnv.sqlQuery("select * from dept")`.
   
![image](https://github.com/apache/hudi/assets/39117591/732b92ec-4de2-473c-a80a-8db48db13616)
   
   2. If I use the SQL way, that is, inserting multiple rows in one SQL 
statement and executing it, **is this a bulk insert or not?** 
   ```java
 sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
 sEnv.setRuntimeMode(RuntimeExecutionMode.BATCH); // set execution mode as batch
 sTableEnv = StreamTableEnvironment.create(sEnv);
 sEnv.setParallelism(1);
 sEnv.enableCheckpointing(3000);
   
 // SQL way: insert multiple rows in one sql without explicitly configuring write option as bulk insert
 sTableEnv.executeSql("insert into dept values (1, 'a', NOW()), (2, 'b', NOW())");
   ```
   
   3. If the above SQL way is not a bulk insert, **is there any way I can bulk 
insert data by using SQL?** I know that for a query, we can add options to 
set some configurations, but when I tried adding options to the insert 
statement, it did not work.
   ```sql
   insert into dept values
   (1, 'a', NOW()),
   (2, 'b', NOW())
   /*+
   options (
   'write.operation' = 'bulk_insert'
   )*/
   ```
   4. I think what you really mean is using the streaming API to bulk insert data. In 
my understanding, bulk insert means inserting a batch of data at a time, but in 
the following code, **the source data is an unbounded stream, so how does the sink 
function split the source data into different batches?**
   ```java
 Map<String, String> options = new HashMap<>();
 // other option configurations ..
 options.put("write.operation", "bulk_insert");
 DataStream<RowData> dataStream = sEnv.addSource(...);
 HoodiePipeline.Builder builder = HoodiePipeline.builder("dept")
 .column(...)
 .options(options);
 // second argument: whether the input stream is bounded
 builder.sink(dataStream, false);
   ```
   
   
   
   
   





[GitHub] [hudi] parisni commented on pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-30 Thread via GitHub


parisni commented on PR #9053:
URL: https://github.com/apache/hudi/pull/9053#issuecomment-1614447464

   > Currently, we lack tests that cover the 
sortDataFrameBySampleSupportAllTypes function. It would be highly beneficial if 
you could include it as well.
   
   Agreed, feel free to submit a patch; I am on vacation for a week.





[GitHub] [hudi] parisni commented on a diff in pull request #9053: [HUDI-6369] Fix spacial curve with sample strategy fails when 0 or 1 rows only is incoming

2023-06-30 Thread via GitHub


parisni commented on code in PR #9053:
URL: https://github.com/apache/hudi/pull/9053#discussion_r1247703419


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/execution/RangeSample.scala:
##
@@ -316,6 +316,8 @@ object RangeSampleSort {
 
HoodieClusteringConfig.LAYOUT_OPTIMIZE_BUILD_CURVE_SAMPLE_SIZE.defaultValue.toString).toInt
   val sample = new RangeSample(zOrderBounds, sampleRdd)
   val rangeBounds = sample.getRangeBounds()
+  if (rangeBounds.size <= 1)

Review Comment:
   Yes, the test has a `height` column, which is complex (array). But it didn't 
trigger an error; a simple column did.
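   
   For illustration, the kind of guard added in this hunk can be sketched in 
plain Java. This is only a sketch under assumptions: the real code is Scala in 
`RangeSample.scala`, and `RangeGuardSketch`, `bucketOf`, and the fallback of 
mapping every value to bucket 0 are hypothetical names and behavior, since the 
hunk does not show what the PR does inside the `if`.

```java
import java.util.Arrays;

class RangeGuardSketch {
  // Returns the index of the range that `value` falls into, given sorted bounds.
  // With 0 or 1 bounds there is nothing meaningful to partition on, so every
  // value maps to bucket 0 (hypothetical fallback, not taken from the PR).
  static int bucketOf(long[] sortedBounds, long value) {
    if (sortedBounds.length <= 1) {
      return 0;
    }
    int pos = Arrays.binarySearch(sortedBounds, value);
    // binarySearch returns -(insertionPoint) - 1 when the value is absent.
    return pos >= 0 ? pos : -pos - 1;
  }
}
```

   The point of the early return is that the degenerate inputs (an empty or 
single-row sample) never reach the search over the bounds.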






[GitHub] [hudi] hudi-bot commented on pull request #9097: [HUDI-6458]Scheduling jobs should not fail when there is no completed commits

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9097:
URL: https://github.com/apache/hudi/pull/9097#issuecomment-1614383551

   
   ## CI report:
   
   * db92d6d09635496b22c27e1375057fed504e6c70 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18225)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ad1happy2go commented on issue #8965: [SUPPORT]NoSuchMethodError: org.apache.curator.CuratorZookeeperClient.startAdvancedTrace

2023-06-30 Thread via GitHub


ad1happy2go commented on issue #8965:
URL: https://github.com/apache/hudi/issues/8965#issuecomment-1614381178

   @nb Also, is this a Deltastreamer job or a Spark datasource writer? Can you 
also paste the code snippet so I can take a look?





[GitHub] [hudi] ad1happy2go commented on issue #8965: [SUPPORT]NoSuchMethodError: org.apache.curator.CuratorZookeeperClient.startAdvancedTrace

2023-06-30 Thread via GitHub


ad1happy2go commented on issue #8965:
URL: https://github.com/apache/hudi/issues/8965#issuecomment-1614379041

   @nb I tried to reproduce this issue, but ZooKeeper-based concurrency control 
works fine with Spark 3.1 and Hudi 0.13.0.
   
   I checked the stack trace, and it looks like you are getting this exception 
only while writing data. Is there any special information about your setup that 
you can provide to help me triage this issue?





[hudi] branch master updated: [HUDI-6448] Improve upgrade/downgrade for table ver. 6 (#9063)

2023-06-30 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new f57248abb46 [HUDI-6448] Improve upgrade/downgrade for table ver. 6 
(#9063)
f57248abb46 is described below

commit f57248abb465a923418129c18801ec1d64a15a5d
Author: Shiyan Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Fri Jun 30 02:18:13 2023 -0700

[HUDI-6448] Improve upgrade/downgrade for table ver. 6 (#9063)



-

Co-authored-by: sivabalan 
---
 .../table/upgrade/FiveToFourDowngradeHandler.java  |  4 +-
 .../table/upgrade/FiveToSixUpgradeHandler.java | 20 --
 .../table/upgrade/FourToFiveUpgradeHandler.java|  4 +-
 .../table/upgrade/OneToZeroDowngradeHandler.java   |  2 +-
 .../table/upgrade/SixToFiveDowngradeHandler.java   | 44 ++--
 .../table/upgrade/TwoToOneDowngradeHandler.java|  2 +-
 .../functional/TestHoodieBackedMetadata.java   |  4 +-
 .../hudi/table/upgrade/TestUpgradeDowngrade.java   | 78 --
 .../hudi/common/table/HoodieTableConfig.java   |  4 ++
 .../hudi/common/table/HoodieTableVersion.java  |  2 +-
 .../TestUpgradeOrDowngradeProcedure.scala  |  4 +-
 11 files changed, 143 insertions(+), 25 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
index 51da9810f6a..e51f5496c2d 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToFourDowngradeHandler.java
@@ -23,13 +23,13 @@ import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.config.HoodieWriteConfig;
 
-import java.util.HashMap;
+import java.util.Collections;
 import java.util.Map;
 
 public class FiveToFourDowngradeHandler implements DowngradeHandler {
 
   @Override
   public Map<ConfigProperty, String> downgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
-return new HashMap<>();
+return Collections.emptyMap();
   }
 }
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
index e3346c2f455..69086b394bf 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FiveToSixUpgradeHandler.java
@@ -18,7 +18,6 @@
 
 package org.apache.hudi.table.upgrade;
 
-import org.apache.hadoop.fs.Path;
 import org.apache.hudi.common.config.ConfigProperty;
 import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
@@ -28,11 +27,13 @@ import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.config.HoodieWriteConfig;
 import org.apache.hudi.exception.HoodieUpgradeDowngradeException;
 import org.apache.hudi.table.HoodieTable;
+
+import org.apache.hadoop.fs.Path;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
-import java.util.HashMap;
+import java.util.Collections;
 import java.util.Map;
 
 /**
@@ -46,9 +47,18 @@ public class FiveToSixUpgradeHandler implements 
UpgradeHandler {
 
   @Override
   public Map<ConfigProperty, String> upgrade(HoodieWriteConfig config, HoodieEngineContext context, String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
-HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+final HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
+
+deleteCompactionRequestedFileFromAuxiliaryFolder(table);
+
+return Collections.emptyMap();
+  }
+
+  /**
+   * See HUDI-6040.
+   */
+  private void deleteCompactionRequestedFileFromAuxiliaryFolder(HoodieTable 
table) {
 HoodieTableMetaClient metaClient = table.getMetaClient();
-// delete compaction file from .aux
 HoodieTimeline compactionTimeline = 
metaClient.getActiveTimeline().filterPendingCompactionTimeline()
 .filter(instant -> instant.getState() == 
HoodieInstant.State.REQUESTED);
 compactionTimeline.getInstantsAsStream().forEach(
@@ -65,6 +75,6 @@ public class FiveToSixUpgradeHandler implements 
UpgradeHandler {
   }
 }
 );
-return new HashMap<>();
   }
+
 }
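
The switch from `new HashMap<>()` to `Collections.emptyMap()` in these handlers
has one behavioral consequence: the returned map is immutable. A minimal
standalone sketch of the difference a caller would see if it tried to add
entries to the result is given below. The class and method names are
hypothetical and not part of Hudi, and whether any Hudi caller mutates the
returned map is not shown in this commit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class EmptyMapSketch {
  // Old style: a fresh, mutable map on every call.
  static Map<String, String> mutableEmpty() {
    return new HashMap<>();
  }

  // New style: an immutable empty map.
  static Map<String, String> immutableEmpty() {
    return Collections.emptyMap();
  }

  // Returns true if the map accepts a put, false if it rejects mutation.
  static boolean acceptsPut(Map<String, String> map) {
    try {
      map.put("k", "v");
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }
}
```

Here `acceptsPut(mutableEmpty())` succeeds, while `acceptsPut(immutableEmpty())`
reports that the map rejected the write, so the change is only safe when
callers treat the result as read-only.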
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FourToFiveUpgradeHandler.java
 

[GitHub] [hudi] hudi-bot commented on pull request #9063: [HUDI-6448] Improve upgrade/downgrade for table ver. 6

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9063:
URL: https://github.com/apache/hudi/pull/9063#issuecomment-1614372389

   
   ## CI report:
   
   * 045511c3843e115d0df5d97f5f38726b75c98be7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18224)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9098: [MINOR] Reverting disabled tests for multiwriter archival

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9098:
URL: https://github.com/apache/hudi/pull/9098#issuecomment-1614315715

   
   ## CI report:
   
   * 120a4bcce84c866dfff254294f2a20a54a7d0b1e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18223)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614305979

   
   ## CI report:
   
   * f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
 
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18233)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9104: [HUDI-6445] Removing gc hints from test base

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9104:
URL: https://github.com/apache/hudi/pull/9104#issuecomment-1614295931

   
   ## CI report:
   
   * 022113d3bfa7d479b935b193293fae2a295be46d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18234)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9080: [HUDI-6445] Making some of Spark DS tests as functional

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9080:
URL: https://github.com/apache/hudi/pull/9080#issuecomment-1614254548

   
   ## CI report:
   
   * d28ff949a1dd43456fda75e5624848bb63e030f4 UNKNOWN
   * 645cc6e14e3bac64ddce26dcad6a51fd4aec3f51 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18174)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18173)
 
   * b9dd8237e187586c5d05b46d4d4eee891822813e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18232)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614254342

   
   ## CI report:
   
   * f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
 
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   * 78b7acc447a6cdadccf1b0ca57e1cc634233c879 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9104: [HUDI-6445] Removing gc hints from test base

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9104:
URL: https://github.com/apache/hudi/pull/9104#issuecomment-1614246214

   
   ## CI report:
   
   * 022113d3bfa7d479b935b193293fae2a295be46d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9080: [HUDI-6445] Making some of Spark DS tests as functional

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9080:
URL: https://github.com/apache/hudi/pull/9080#issuecomment-1614246039

   
   ## CI report:
   
   * d28ff949a1dd43456fda75e5624848bb63e030f4 UNKNOWN
   * 645cc6e14e3bac64ddce26dcad6a51fd4aec3f51 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18174)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18173)
 
   * b9dd8237e187586c5d05b46d4d4eee891822813e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #9007: [HUDI-6405] Fix incremental file sync for clustering and logcompaction

2023-06-30 Thread via GitHub


hudi-bot commented on PR #9007:
URL: https://github.com/apache/hudi/pull/9007#issuecomment-1614245819

   
   ## CI report:
   
   * f154ee335eb307e2bcffd895cfd95bfb1f417a1e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=18218)
 
   * c221efd733a444258780949b698830c2cef47931 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




