Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1882542666

   
   ## CI report:
   
   * 0510337de5adb626429e72da5539b9f23231974f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21880)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7276] FILE_GROUP_READER_ENABLED should be disable for query [hudi]

2024-01-08 Thread via GitHub


linliu-code commented on PR #10455:
URL: https://github.com/apache/hudi/pull/10455#issuecomment-1882525739

   @xuzifu666 Do you have any sample data and the query that generates the error? 
With that, I can try to fix the problem ASAP. Thanks!





Re: [PR] [HUDI-1623] Solid completion time on timeline [hudi]

2024-01-08 Thread via GitHub


waywtdcc commented on PR #9617:
URL: https://github.com/apache/hudi/pull/9617#issuecomment-1882506165

   > No, we would fix the upgrade for the 1.0.0 GA release.
   
   Does this mean that the official 1.0.0 release will support automatic 
upgrade and fix this bug?





Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882481681

   
   ## CI report:
   
   * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882474976

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 4d1c6b4d5d83eaed954174ce26a83e23be62bb20 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21882)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882441755

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 6a208e28d1b9a9a34616f80bfb9e0ab1ead14dc3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21881)
 
   * 4d1c6b4d5d83eaed954174ce26a83e23be62bb20 UNKNOWN
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882436037

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 88c81170920a6eaf95e826e979ce6e526d8b33f9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21879)
 
   * 6a208e28d1b9a9a34616f80bfb9e0ab1ead14dc3 UNKNOWN
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1882430499

   
   ## CI report:
   
   * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861)
 
   * 0510337de5adb626429e72da5539b9f23231974f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21880)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1882430134

   
   ## CI report:
   
   * 4827a8d0481f67243920efee57eda41b8a8210a7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21876)
 
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882430073

   
   ## CI report:
   
   *  Unknown: [CANCELED](TBD) 
   * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
 
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1882401300

   
   ## CI report:
   
   * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861)
 
   * 0510337de5adb626429e72da5539b9f23231974f UNKNOWN
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882401167

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 88c81170920a6eaf95e826e979ce6e526d8b33f9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21879)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882396258

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * b925f08b8056623f82bfec8e8031db0536d7bbee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21877)
 
   * 88c81170920a6eaf95e826e979ce6e526d8b33f9 UNKNOWN
   
   



Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10462:
URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882391258

   
   ## CI report:
   
   * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21873)
 
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882355671

   
   ## CI report:
   
   * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875)
 
   *  Unknown: [CANCELED](TBD) 
   * 9c61cc3b1ff124314bb7cacb82bb141762678d54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21878)
 
   
   



[jira] [Assigned] (HUDI-7280) Add/Drop/Rename table properties hoodie.properties

2024-01-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu reassigned HUDI-7280:
-

Assignee: (was: Lin Liu)

> Add/Drop/Rename table properties hoodie.properties 
> ---
>
> Key: HUDI-7280
> URL: https://issues.apache.org/jira/browse/HUDI-7280
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882340593

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * b925f08b8056623f82bfec8e8031db0536d7bbee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21877)
 
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882339654

   
   ## CI report:
   
   * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875)
 
   *  Unknown: [CANCELED](TBD) 
   * 9c61cc3b1ff124314bb7cacb82bb141762678d54 UNKNOWN
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


beyond1920 commented on code in PR #10426:
URL: https://github.com/apache/hudi/pull/10426#discussion_r1444512704


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -971,6 +952,59 @@ public void alterPartitionColumnStatistics(
     throw new HoodieCatalogException("Not supported.");
   }
 
+  public boolean isUpdatePermissible(ObjectPath tablePath, CatalogBaseTable newCatalogTable, boolean ignoreIfNotExists) throws TableNotExistException {
+    if (!newCatalogTable.getOptions().getOrDefault(CONNECTOR.key(), "").equalsIgnoreCase("hudi")) {
+      throw new HoodieCatalogException(String.format("The %s is not hoodie table", tablePath.getObjectName()));
+    }
+    if (newCatalogTable instanceof CatalogView) {
+      throw new HoodieCatalogException("Hoodie catalog does not support to ALTER VIEW");
+    }
+
+    try {
+      Table hiveTable = getHiveTable(tablePath);
+      if (!sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.TABLE_TYPE)
+          || !sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.INDEX_TYPE)) {

Review Comment:
   @xiarixiaoyao After I forbade modifying `primaryKeys` and `preCombinekeys`, 
some tests failed in `TestHoodieHiveCatalog.testCreateAndGetHoodieTable` and 
`TestHoodieHiveCatalog.testCreateAndGetHoodieTable`.
   See more info in [Failed pipelines](https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=21861=logs=600e7de6-e133-5e69-e615-50ee129b3c08=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7).
   Real users might also modify `primaryKeys` and `preCombinekeys` after 
creating the table, so I will remove those limitations.
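
For illustration only, the idea of guarding a small set of options while letting everything else change could be sketched as below. This is a minimal sketch, not the PR's code: the class `OptionGuardSketch`, the method signatures, and the string keys and defaults are all assumptions made up for the example, and the real `sameOptions` helper in `HoodieHiveCatalog` may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionGuardSketch {

  /** True when both maps resolve the key to the same value, falling back to the default. */
  public static boolean sameOption(Map<String, String> existing, Map<String, String> updated,
                                   String key, String defaultValue) {
    return existing.getOrDefault(key, defaultValue)
        .equals(updated.getOrDefault(key, defaultValue));
  }

  /** Permits the update unless one of the guarded keys would change. */
  public static boolean isUpdatePermissible(Map<String, String> existing,
                                            Map<String, String> updated,
                                            Map<String, String> guardedDefaults) {
    for (Map.Entry<String, String> guarded : guardedDefaults.entrySet()) {
      if (!sameOption(existing, updated, guarded.getKey(), guarded.getValue())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Illustrative keys and defaults only (assumptions); nothing is read from FlinkOptions.
    Map<String, String> guarded = new HashMap<>();
    guarded.put("table.type", "COPY_ON_WRITE");
    guarded.put("index.type", "FLINK_STATE");

    Map<String, String> existing = new HashMap<>();
    existing.put("table.type", "COPY_ON_WRITE");
    existing.put("precombine.field", "ts");

    Map<String, String> changedKey = new HashMap<>(existing);
    changedKey.put("precombine.field", "event_time"); // unguarded key: allowed

    Map<String, String> changedType = new HashMap<>(existing);
    changedType.put("table.type", "MERGE_ON_READ");   // guarded key: rejected

    System.out.println(isUpdatePermissible(existing, changedKey, guarded));
    System.out.println(isUpdatePermissible(existing, changedType, guarded));
  }
}
```

In this sketch, leaving the key fields out of the guarded set is what allows them to be altered after table creation, which matches the direction described in the comment above.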






Re: [PR] [HUDI-7240] Clean delete logic [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10398:
URL: https://github.com/apache/hudi/pull/10398#issuecomment-1882325265

   
   ## CI report:
   
   * 3d71d1c0e3220f0639b702d91539e1d070e93cca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21874)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882324940

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 3192ba3b71bdfe1421d92369e8787efbee821f90 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21871)
 
   * b925f08b8056623f82bfec8e8031db0536d7bbee UNKNOWN
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


rohitmittapalli commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882325117

   @hudi-bot run azure





[PR] add parquet merge schema config [hudi]

2024-01-08 Thread via GitHub


rohitmittapalli opened a new pull request, #10463:
URL: https://github.com/apache/hudi/pull/10463

   ### Change Logs
   
   Adding Parquet DFS source config class from this PR: 
https://github.com/apache/hudi/pull/10199/files
   ### Impact
   
   Adding public documentation. 
   ### Risk level (write none, low medium or high below)
   
   Low
   ### Documentation Update
   
   ### Contributor's checklist
   
   - [X] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [X] Change Logs and Impact were stated clearly
   - [X] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7280) Add/Drop/Rename table properties hoodie.properties

2024-01-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7280:
--
Issue Type: Improvement  (was: Bug)

> Add/Drop/Rename table properties hoodie.properties 
> ---
>
> Key: HUDI-7280
> URL: https://issues.apache.org/jira/browse/HUDI-7280
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (HUDI-7280) Add/Drop/Rename table properties hoodie.properties

2024-01-08 Thread Lin Liu (Jira)
Lin Liu created HUDI-7280:
-

 Summary: Add/Drop/Rename table properties hoodie.properties 
 Key: HUDI-7280
 URL: https://issues.apache.org/jira/browse/HUDI-7280
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lin Liu
Assignee: Lin Liu
 Fix For: 1.0.0








Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1882239577

   
   ## CI report:
   
   * 536833e03706c665b00e88986596bf9f44aa2c47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21801)
 
   * 4827a8d0481f67243920efee57eda41b8a8210a7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21876)
 
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882239471

   
   ## CI report:
   
   * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
 
   * 378a6a619dc288301c70275483bbb0ecfa73a7f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21875)
 
   
   



Re: [PR] [] CVE-2023-44487 Upgrade jetty and exclude older jetty [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10223:
URL: https://github.com/apache/hudi/pull/10223#issuecomment-1882224888

   
   ## CI report:
   
   * 536833e03706c665b00e88986596bf9f44aa2c47 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21801)
 
   * 4827a8d0481f67243920efee57eda41b8a8210a7 UNKNOWN
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882224728

   
   ## CI report:
   
   * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
 
   * 378a6a619dc288301c70275483bbb0ecfa73a7f1 UNKNOWN
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882210046

   
   ## CI report:
   
   * f7566099db43c39a06db5e4ae905a65dfd69a7ca Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21190)
 
   
   



Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10462:
URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882211360

   
   ## CI report:
   
   * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872)
 
   * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21873)
 
   
   



Re: [PR] Merge schema in ParuqetDFSSource [hudi]

2024-01-08 Thread via GitHub


rohitmittapalli commented on PR #10199:
URL: https://github.com/apache/hudi/pull/10199#issuecomment-1882191866

   @hudi-bot run azure





Re: [PR] [HUDI-7247]Spark truncate table supports concurrency [hudi]

2024-01-08 Thread via GitHub


stream2000 commented on code in PR #10390:
URL: https://github.com/apache/hudi/pull/10390#discussion_r144086


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/TruncateHoodieTableCommand.scala:
##
@@ -68,7 +71,12 @@ case class TruncateHoodieTableCommand(
       val targetPath = new Path(basePath)
       val engineContext = new HoodieSparkEngineContext(sparkSession.sparkContext)
       val fs = FSUtils.getFs(basePath, sparkSession.sparkContext.hadoopConfiguration)
+      val hoodieWriteConfig = HoodieWriteConfig.newBuilder().withPath(basePath).withProps(properties).withEngineType(EngineType.SPARK)
+        .build()
+      val transactionManager = new TransactionManager(hoodieWriteConfig, fs)
+      transactionManager.beginTransaction(org.apache.hudi.common.util.Option.empty(), org.apache.hudi.common.util.Option.empty())
       FSUtils.deleteDir(engineContext, fs, targetPath, sparkSession.sparkContext.defaultParallelism)
+      transactionManager.endTransaction(org.apache.hudi.common.util.Option.empty())

Review Comment:
   Can we use a replace commit here to provide the transaction mechanism? We have 
already done this for partitioned tables.
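
Separately from the replace-commit question, the begin/end pair in the diff can be illustrated with a generic guarded-delete pattern. The following is a minimal sketch under loud assumptions: a plain `ReentrantLock` stands in for the transaction, an in-memory list stands in for the table directory, and `GuardedTruncateSketch` is a hypothetical name. None of it is Hudi API, and it does not implement a replace commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class GuardedTruncateSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final List<String> files = new ArrayList<>();

  public GuardedTruncateSketch(List<String> initialFiles) {
    files.addAll(initialFiles);
  }

  /** Removes every file while holding the lock; the lock is released even if removal throws. */
  public int truncate() {
    lock.lock();              // stand-in for beginning the transaction
    try {
      int removed = files.size();
      files.clear();          // stand-in for deleting the table directory
      return removed;
    } finally {
      lock.unlock();          // stand-in for ending the transaction
    }
  }

  /** Exposed only so callers can check that the lock was released. */
  public boolean isLocked() {
    return lock.isLocked();
  }
}
```

The point of the `finally` block is that a failed delete would not leave the guard held, which is the property a commit-based mechanism would also need to provide.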






Re: [PR] [HUDI-7240] Clean delete logic [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10398:
URL: https://github.com/apache/hudi/pull/10398#issuecomment-1882131119

   
   ## CI report:
   
   * c3779c92492d0deeb5a261c63710a96ade6f0524 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21684)
 
   * 3d71d1c0e3220f0639b702d91539e1d070e93cca Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21874)
 
   
   



Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10462:
URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882124489

   
   ## CI report:
   
   * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872)
 
   * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21873)
 
   
   



Re: [PR] [HUDI-7240] Clean delete logic [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10398:
URL: https://github.com/apache/hudi/pull/10398#issuecomment-1882124334

   
   ## CI report:
   
   * c3779c92492d0deeb5a261c63710a96ade6f0524 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21684)
 
   * 3d71d1c0e3220f0639b702d91539e1d070e93cca UNKNOWN
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


danny0405 commented on code in PR #10426:
URL: https://github.com/apache/hudi/pull/10426#discussion_r1445532868


##
hudi-flink-datasource/hudi-flink1.14.x/src/main/java/org/apache/hudi/adapter/Utils.java:
##
@@ -57,4 +64,8 @@ public static BinaryExternalSorter getBinaryExternalSorter(
 return new BinaryExternalSorter(owner, memoryManager, reservedMemorySize,
 ioManager, inputSerializer, serializer, normalizedKeyComputer, 
comparator, conf);
   }
+
+  public static InternalSchema applyTableChange(InternalSchema oldSchema, List 
changes, Function convertFunc) {
+throw new RuntimeException("There is no possible to hit this method!");

Review Comment:
   Can we just throw an AssertionError with a simple message, ‘unexpected’?
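
For illustration only (this block is not part of the review comment), the suggestion could look roughly like the sketch below. The class name `AdapterUtilsSketch` and the `Object` parameter types are placeholders, since the generic types are not visible in the quoted diff; only the method name and the 'unexpected' message come from the thread.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the Flink 1.14 adapter Utils class. The Object
// parameter types are placeholders: the real generic types are not visible
// in the quoted diff.
final class AdapterUtilsSketch {
  private AdapterUtilsSketch() {
  }

  // The quoted diff treats this method as one that should never be hit, so
  // fail with an AssertionError and a short message instead of a
  // RuntimeException.
  static Object applyTableChange(Object oldSchema, List<Object> changes, Function<Object, Object> convertFunc) {
    throw new AssertionError("unexpected");
  }
}
```

An AssertionError signals a broken invariant rather than a recoverable condition, which appears to be the intent of the suggestion.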






Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


danny0405 commented on code in PR #10426:
URL: https://github.com/apache/hudi/pull/10426#discussion_r1445531226


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -131,6 +138,13 @@ public class HoodieHiveCatalog extends AbstractCatalog {
   // optional catalog base path: used for db/table path inference.
   private final String catalogPath;
   private final boolean external;
+  private static final Map> IMMUTABLE_CONFS = new 
HashMap<>();
+  {

Review Comment:
   Don’t think we need this. A simple if-else check is enough.






Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10462:
URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882118575

   
   ## CI report:
   
   * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872)
 
   * 1698e7f7a5410b6f7c46fc21e9e73e1362e5f5ce UNKNOWN
   
   



(hudi) branch release-0.14.1 updated (66cff7d7642 -> 5b0d67bc798)

2024-01-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to branch release-0.14.1
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 66cff7d7642 Bumping release candidate number 2
 add 5b0d67bc798 [MINOR] Update release version to reflect published 
version 0.14.1

No new revisions were added by this update.

Summary of changes:
 docker/hoodie/hadoop/base/pom.xml| 2 +-
 docker/hoodie/hadoop/base_java11/pom.xml | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml| 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml   | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml   | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml| 2 +-
 docker/hoodie/hadoop/pom.xml | 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml  | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml  | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml | 2 +-
 docker/hoodie/hadoop/trinobase/pom.xml   | 2 +-
 docker/hoodie/hadoop/trinocoordinator/pom.xml| 2 +-
 docker/hoodie/hadoop/trinoworker/pom.xml | 2 +-
 hudi-aws/pom.xml | 4 ++--
 hudi-cli/pom.xml | 2 +-
 hudi-client/hudi-client-common/pom.xml   | 4 ++--
 hudi-client/hudi-flink-client/pom.xml| 4 ++--
 hudi-client/hudi-java-client/pom.xml | 4 ++--
 hudi-client/hudi-spark-client/pom.xml| 4 ++--
 hudi-client/pom.xml  | 2 +-
 hudi-common/pom.xml  | 2 +-
 hudi-examples/hudi-examples-common/pom.xml   | 2 +-
 hudi-examples/hudi-examples-flink/pom.xml| 2 +-
 hudi-examples/hudi-examples-java/pom.xml | 2 +-
 hudi-examples/hudi-examples-spark/pom.xml| 2 +-
 hudi-examples/pom.xml| 2 +-
 hudi-flink-datasource/hudi-flink/pom.xml | 4 ++--
 hudi-flink-datasource/hudi-flink1.13.x/pom.xml   | 4 ++--
 hudi-flink-datasource/hudi-flink1.14.x/pom.xml   | 4 ++--
 hudi-flink-datasource/hudi-flink1.15.x/pom.xml   | 4 ++--
 hudi-flink-datasource/hudi-flink1.16.x/pom.xml   | 4 ++--
 hudi-flink-datasource/hudi-flink1.17.x/pom.xml   | 4 ++--
 hudi-flink-datasource/pom.xml| 4 ++--
 hudi-gcp/pom.xml | 2 +-
 hudi-hadoop-mr/pom.xml   | 2 +-
 hudi-integ-test/pom.xml  | 2 +-
 hudi-kafka-connect/pom.xml   | 4 ++--
 hudi-platform-service/hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +-
 hudi-platform-service/hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +-
 hudi-platform-service/hudi-metaserver/pom.xml| 4 ++--
 hudi-platform-service/pom.xml| 2 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml  | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark2/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark3.0.x/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark3.1.x/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark3.2.x/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml   | 2 +-
 hudi-spark-datasource/hudi-spark3.3.x/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark3.4.x/pom.xml| 4 ++--
 hudi-spark-datasource/pom.xml| 2 +-
 hudi-sync/hudi-adb-sync/pom.xml  | 2 +-
 hudi-sync/hudi-datahub-sync/pom.xml  | 2 +-
 hudi-sync/hudi-hive-sync/pom.xml | 2 +-
 hudi-sync/hudi-sync-common/pom.xml   | 2 +-
 hudi-sync/pom.xml  

Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10462:
URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882065745

   
   ## CI report:
   
   * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21872)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882065105

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 3192ba3b71bdfe1421d92369e8787efbee821f90 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21871)
 
   
   



Re: [PR] [DOCS] Add OneTable to list of catalogs [hudi]

2024-01-08 Thread via GitHub


bhasudha closed pull request #10431: [DOCS] Add OneTable to list of catalogs
URL: https://github.com/apache/hudi/pull/10431





(hudi) branch asf-site updated: added onetable support (#10461)

2024-01-08 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0aa1ecc907e added onetable support (#10461)
0aa1ecc907e is described below

commit 0aa1ecc907ea2337e6e6b0e26dad4eba8c26e36a
Author: Sagar Lakshmipathy <18vidhyasa...@gmail.com>
AuthorDate: Mon Jan 8 16:33:53 2024 -0800

added onetable support (#10461)
---
 website/docs/syncing_onetable.md   | 54 ++
 website/sidebars.js|  3 +-
 .../version-0.14.0/syncing_onetable.md | 54 ++
 .../version-0.14.1/syncing_onetable.md | 54 ++
 .../version-0.14.0-sidebars.json   |  3 +-
 .../version-0.14.1-sidebars.json   |  3 +-
 6 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/website/docs/syncing_onetable.md b/website/docs/syncing_onetable.md
new file mode 100644
index 000..99760e101c0
--- /dev/null
+++ b/website/docs/syncing_onetable.md
@@ -0,0 +1,54 @@
+---
+title: OneTable
+keywords: [onetable, hudi, delta-lake, iceberg, sync]
+---
+
+Hudi (tables created from 0.14.0 onwards) supports syncing to 
[OneTable](https://onetable.dev/), providing users with the option to 
interoperate with other table formats like Delta Lake and Apache Iceberg.
+
+## Interoperating with OneTable
+
+If you have tables in one of the supported formats (Delta/Iceberg), you can 
use OneTable to translate the existing metadata to read as a Hudi table and 
vice versa.
+
+### Installation
+
+You can work with OneTable by either building the jar from the 
[source](https://github.com/onetable-io/onetable) or by downloading from 
[GitHub packages](https://github.com/onetable-io/onetable/packages/1986830).
+
+:::tip Note
+If you're using one of the JVM languages to work with Hudi/Delta/Iceberg, you 
can directly use OneTable as a 
[dependency](https://github.com/onetable-io/onetable/packages/1986830) in your 
project.
+This is highlighted in this [demo](https://onetable.dev/docs/demo/docker).
+:::
+
+### Syncing to OneTable
+
+Once you have the jar, you can simply run it against a Hudi/Delta/Iceberg 
table to add target table format metadata to the table.
+Below is an example configuration to translate a Hudi table to Delta & Iceberg 
table.
+
+```shell md title="my_config.yaml"
+sourceFormat: HUDI
+targetFormats:
+  - DELTA
+  - ICEBERG
+datasets:
+  -
+tableBasePath: path/to/hudi/table
+tableName: tableName
+partitionSpec: partition_field_name:VALUE
+```
+
+```shell md title="shell"
+java -jar path/to/bundled-onetable.jar --datasetConfig path/to/my_config.yaml
+```
+
+### Hudi Streamer Extensions
+If you want to use OneTable with Hudi Streamer to sync each commit into other 
table formats, you have to
+
+1. Add the [extensions 
jar](https://github.com/onetable-io/onetable/tree/main/hudi-support/extensions) 
`hudi-extensions-0.1.0-SNAPSHOT-bundled.jar` to your class path.
+2. Add `io.onetable.hudi.sync.OneTableSyncTool` to your list of sync classes
+3. Set the following configurations based on your preferences:
+
+   ```
+   hoodie.onetable.formats: "ICEBERG,DELTA" 
+   hoodie.onetable.target.metadata.retention.hr: 168
+   ```
+
+For more examples, you can refer to the [OneTable 
docs](https://onetable.dev/docs/how-to).
\ No newline at end of file
diff --git a/website/sidebars.js b/website/sidebars.js
index 9ee664a606c..72456f554bb 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -57,7 +57,8 @@ module.exports = {
 'syncing_aws_glue_data_catalog',
 'syncing_datahub',
 'syncing_metastore',
-"gcp_bigquery"
+'gcp_bigquery',
+'syncing_onetable'
 ],
 }
 ],
diff --git a/website/versioned_docs/version-0.14.0/syncing_onetable.md 
b/website/versioned_docs/version-0.14.0/syncing_onetable.md
new file mode 100644
index 000..99760e101c0
--- /dev/null
+++ b/website/versioned_docs/version-0.14.0/syncing_onetable.md
@@ -0,0 +1,54 @@
+---
+title: OneTable
+keywords: [onetable, hudi, delta-lake, iceberg, sync]
+---
+
+Hudi (tables created from 0.14.0 onwards) supports syncing to 
[OneTable](https://onetable.dev/), providing users with the option to 
interoperate with other table formats like Delta Lake and Apache Iceberg.
+
+## Interoperating with OneTable
+
+If you have tables in one of the supported formats (Delta/Iceberg), you can 
use OneTable to translate the existing metadata to read as a Hudi table and 
vice versa.
+
+### Installation
+
+You can work with OneTable by either building the jar from the 
[source](https://github.com/onetable-io/onetable) or by downloading from 
[GitHub 

Re: [PR] [DOCS] Added onetable support [hudi]

2024-01-08 Thread via GitHub


bhasudha merged PR #10461:
URL: https://github.com/apache/hudi/pull/10461





Re: [PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10462:
URL: https://github.com/apache/hudi/pull/10462#issuecomment-1882050370

   
   ## CI report:
   
   * b06d46d4d50ffaa766d50e299f5fc3b3dd536c3e UNKNOWN
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1882049695

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 505fda45b13ca3afd7cb8e0ef5fe575456ad14ca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21867)
 
   * 3192ba3b71bdfe1421d92369e8787efbee821f90 UNKNOWN
   
   



Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10460:
URL: https://github.com/apache/hudi/pull/10460#issuecomment-1882036532

   
   ## CI report:
   
   * 1022041ada5bc1b360c463e2b044e232fd8f9749 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21869)
 
   
   



[jira] [Updated] (HUDI-7251) Improve the logic of AppendHandle for partial updates

2024-01-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7251:
-
Labels: pull-request-available  (was: )

> Improve the logic of AppendHandle for partial updates
> -
>
> Key: HUDI-7251
> URL: https://issues.apache.org/jira/browse/HUDI-7251
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7251] Support partial schema for CDC workload [hudi]

2024-01-08 Thread via GitHub


linliu-code opened a new pull request, #10462:
URL: https://github.com/apache/hudi/pull/10462

   ### Change Logs
   
   We pass the partial schema into write config, which will be used in 
HoodieAppendHandle.
   Here we assume 
   1. the inputBatch contains the partial schema which is from the upstream.
   2. the inputBatch contains either insert, update or delete records.
   
   ### Impact
   
   Low since the partial schema equals the full schema by default.
   
   ### Risk level (write none, low medium or high below)
   
   Low.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[PR] [DOCS] Added onetable support [hudi]

2024-01-08 Thread via GitHub


sagarlakshmipathy opened a new pull request, #10461:
URL: https://github.com/apache/hudi/pull/10461

   ### Change Logs
   
   Added OneTable to Catalogs page for both current and 0.14.0 (only supported 
from 0.14 onwards)
   
   ### Impact
   
   Doc change, minor.
   
   ### Risk level (write none, low medium or high below)
   
   Minor
   
   ### Documentation Update
   
   Doc update
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [NA] CI passed
   





Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10460:
URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881865528

   
   ## CI report:
   
   * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865)
 
   * 1022041ada5bc1b360c463e2b044e232fd8f9749 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21869)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881865121

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 505fda45b13ca3afd7cb8e0ef5fe575456ad14ca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21867)
 
   
   



Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10460:
URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881854982

   
   ## CI report:
   
   * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865)
 
   * 1022041ada5bc1b360c463e2b044e232fd8f9749 UNKNOWN
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881854545

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 940711369818ddd1525a7e9b255fa48ea3de4152 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21868)
 
   * 505fda45b13ca3afd7cb8e0ef5fe575456ad14ca UNKNOWN
   
   



Re: [PR] [MINOR] fix violations of Sonarqube rule java:S2184 [hudi]

2024-01-08 Thread via GitHub


bvaradar commented on PR #10444:
URL: https://github.com/apache/hudi/pull/10444#issuecomment-1881846064

   @KUTEJiang: Can you fix the PR so that it passes validation?





Re: [PR] [HUDI-7229] support partial updates for cdc payload demo [hudi]

2024-01-08 Thread via GitHub


linliu-code commented on PR #10384:
URL: https://github.com/apache/hudi/pull/10384#issuecomment-1881811126

   This is not the right direction for the CDC demo.





Re: [PR] [HUDI-7229] support partial updates for cdc payload demo [hudi]

2024-01-08 Thread via GitHub


linliu-code closed pull request #10384: [HUDI-7229] support partial updates for 
cdc payload demo
URL: https://github.com/apache/hudi/pull/10384





Re: [PR] [WIP] [HUDI-6787] Implement the HoodieFileGroupReader API for Hive [hudi]

2024-01-08 Thread via GitHub


vinothchandar commented on code in PR #10422:
URL: https://github.com/apache/hudi/pull/10422#discussion_r1445060501


##
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java:
##
@@ -116,6 +117,7 @@ public void setUp() {
 hadoopConf.set("fs.file.impl", 
org.apache.hadoop.fs.LocalFileSystem.class.getName());
 baseJobConf = new JobConf(hadoopConf);
 baseJobConf.set(HoodieMemoryConfig.MAX_DFS_STREAM_BUFFER_SIZE.key(), 
String.valueOf(1024 * 1024));
+baseJobConf.set(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), 
"false");

Review Comment:
   Why "false"?



##
packaging/bundle-validation/validate.sh:
##
@@ -93,7 +93,7 @@ test_spark_hadoop_mr_bundles () {
 # save HiveQL query results
 hiveqlresultsdir=/tmp/hadoop-mr-bundle/hiveql/trips/results
 mkdir -p $hiveqlresultsdir
-$HIVE_HOME/bin/beeline --hiveconf 
hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat \
+$HIVE_HOME/bin/beeline --verbose --hiveconf 
hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat \

Review Comment:
   Does this need to be checked in?



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java:
##
@@ -91,9 +94,42 @@ private void initAvroInputFormat() {
 }
   }
 
+  private static boolean checkTableIsHudi(final InputSplit split, final 
JobConf job) {

Review Comment:
   rename: checkIfHudiTable



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/ObjectInspectorCache.java:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.hadoop.utils;
+
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import org.apache.avro.Schema;
+import org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector;
+import org.apache.hadoop.hive.serde.serdeConstants;
+import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.mapred.JobConf;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Locale;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+public class ObjectInspectorCache {

Review Comment:
   Please add Javadocs.



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -227,23 +231,31 @@ private ClosableIterator 
makeBootstrapBaseFileIterator(HoodieBaseFile baseFil
 BaseFile dataFile = baseFile.getBootstrapBaseFile().get();
 Pair,List> requiredFields = 
getDataAndMetaCols(requiredSchema);
 Pair,List> allFields = 
getDataAndMetaCols(dataSchema);
-
-Option> dataFileIterator = 
requiredFields.getRight().isEmpty() ? Option.empty() :
-
Option.of(readerContext.getFileRecordIterator(dataFile.getHadoopPath(), 0, 
dataFile.getFileLen(),
-createSchemaFromFields(allFields.getRight()), 
createSchemaFromFields(requiredFields.getRight()), hadoopConf));
-
-Option> skeletonFileIterator = 
requiredFields.getLeft().isEmpty() ? Option.empty() :
-
Option.of(readerContext.getFileRecordIterator(baseFile.getHadoopPath(), 0, 
baseFile.getFileLen(),
-createSchemaFromFields(allFields.getLeft()), 
createSchemaFromFields(requiredFields.getLeft()), hadoopConf));
+Option,Schema>> dataFileIterator =

Review Comment:
   It's cool that we are able to add a new engine without many changes to this class.



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java:
##
@@ -101,6 +102,9 @@ public AbstractRealtimeRecordReader(RealtimeSplit split, 
JobConf job) {
   throw new HoodieException("Could not create HoodieRealtimeRecordReader 
on path " + this.split.getPath(), e);
 }
 prepareHiveAvroSerializer();
+if 

Re: [PR] [HUDI-7236] Fix mit change partition4 [hudi]

2024-01-08 Thread via GitHub


nsivabalan closed pull request #10369: [HUDI-7236] Fix mit change partition4
URL: https://github.com/apache/hudi/pull/10369





Re: [PR] [HUDI-7236] Fix mit change partition5 [hudi]

2024-01-08 Thread via GitHub


nsivabalan closed pull request #10370: [HUDI-7236] Fix mit change partition5
URL: https://github.com/apache/hudi/pull/10370





Re: [PR] [HUDI-7236] Fix mit change partition2 [hudi]

2024-01-08 Thread via GitHub


nsivabalan closed pull request #10367: [HUDI-7236] Fix mit change partition2
URL: https://github.com/apache/hudi/pull/10367





Re: [PR] [HUDI-7236] Fix mit change partition3 [hudi]

2024-01-08 Thread via GitHub


nsivabalan closed pull request #10368: [HUDI-7236] Fix mit change partition3
URL: https://github.com/apache/hudi/pull/10368





Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881771331

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 940711369818ddd1525a7e9b255fa48ea3de4152 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21868)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881761174

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 6e1bcd405ba3a0d3edd11560da1e5377a16c7d33 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21863)
 
   * 940711369818ddd1525a7e9b255fa48ea3de4152 UNKNOWN
   
   



[jira] [Updated] (HUDI-7251) Improve the logic of AppendHandle for partial updates

2024-01-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7251:
--
Summary: Improve the logic of AppendHandle for partial updates  (was: 
Improve the logic of AppenHandle for partial updates)

> Improve the logic of AppendHandle for partial updates
> -
>
> Key: HUDI-7251
> URL: https://issues.apache.org/jira/browse/HUDI-7251
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10460:
URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881686424

   
   ## CI report:
   
   * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865)
 
   
   



[jira] [Updated] (HUDI-7251) Improve the logic of AppenHandle for partial updates

2024-01-08 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-7251:
--
Status: In Progress  (was: Open)

> Improve the logic of AppenHandle for partial updates
> 
>
> Key: HUDI-7251
> URL: https://issues.apache.org/jira/browse/HUDI-7251
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>






Re: [PR] [HUDI-7229] support partial updates for cdc payload demo [hudi]

2024-01-08 Thread via GitHub


bvaradar commented on code in PR #10384:
URL: https://github.com/apache/hudi/pull/10384#discussion_r1445146024


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/SourceFormatAdapter.java:
##
@@ -158,8 +160,9 @@ public InputBatch> 
fetchNewDataInAvroFormat(Option> r = ((Source>) 
source).fetchNext(lastCkptStr, sourceLimit);
-return new InputBatch<>(Option.ofNullable(r.getBatch().map(
+// InputBatch> r = ((Source>) 
source).fetchNext(lastCkptStr, sourceLimit);
+List>> rs = ((Source>) 
source).fetchNextForParitialUpdate(lastCkptStr, sourceLimit);

Review Comment:
   Can you keep fetchNext as the main API to be called? Handling of partial 
or full updates needs to be encapsulated inside the Source 
implementation.
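
The encapsulation asked for above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Batch`, `Source`, `CdcSource`, and `event` are hypothetical, simplified names and not the actual Hudi classes, and merging a partial event into the last known row is just one possible way a source could handle partial updates internally. The point is that callers only ever see `fetchNext`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FetchNextEncapsulationSketch {

  /** Hypothetical batch type: the fetched records plus the checkpoint to resume from. */
  static final class Batch {
    final List<Map<String, Object>> records;
    final String checkpoint;

    Batch(List<Map<String, Object>> records, String checkpoint) {
      this.records = records;
      this.checkpoint = checkpoint;
    }
  }

  /** The single entry point that callers use (hypothetical, simplified signature). */
  interface Source {
    Batch fetchNext(String lastCheckpoint, long sourceLimit);
  }

  /** Hypothetical CDC-style source in which some events carry only the changed columns. */
  static final class CdcSource implements Source {
    private final List<Map<String, Object>> events;
    private final Map<Object, Map<String, Object>> lastKnown = new HashMap<>();

    CdcSource(List<Map<String, Object>> events) {
      this.events = events;
    }

    @Override
    public Batch fetchNext(String lastCheckpoint, long sourceLimit) {
      int start = lastCheckpoint == null ? 0 : Integer.parseInt(lastCheckpoint);
      int end = (int) Math.min(events.size(), start + sourceLimit);
      List<Map<String, Object>> out = new ArrayList<>();
      for (int i = start; i < end; i++) {
        out.add(toFullRecord(events.get(i)));
      }
      return new Batch(out, String.valueOf(end));
    }

    // Partial-versus-full handling stays private to the source:
    // a partial event is overlaid on the last known row for the same "id".
    private Map<String, Object> toFullRecord(Map<String, Object> event) {
      Object key = event.get("id");
      Map<String, Object> merged = new LinkedHashMap<>();
      if (lastKnown.containsKey(key)) {
        merged.putAll(lastKnown.get(key));
      }
      merged.putAll(event);
      lastKnown.put(key, merged);
      return merged;
    }
  }

  /** Small helper that builds an event from alternating keys and values. */
  static Map<String, Object> event(Object... keyValues) {
    Map<String, Object> m = new LinkedHashMap<>();
    for (int i = 0; i < keyValues.length; i += 2) {
      m.put((String) keyValues[i], keyValues[i + 1]);
    }
    return m;
  }

  public static void main(String[] args) {
    Source source = new CdcSource(Arrays.asList(
        event("id", 1, "name", "a", "age", 10),  // full row
        event("id", 1, "age", 11)));             // partial update: only "age" changed
    Batch batch = source.fetchNext(null, 10);
    System.out.println(batch.records);
  }
}
```

With this shape, a caller such as a format adapter would not need a separate partial-update entry point; whether that fits the real `Source` contract is for the PR author to confirm.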






(hudi) branch master updated: [MINOR] Disable flaky test (#10449)

2024-01-08 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b54381ff9e6 [MINOR] Disable flaky test (#10449)
b54381ff9e6 is described below

commit b54381ff9e6d34a41eaf756555c7967eb7146380
Author: Jon Vexler 
AuthorDate: Mon Jan 8 13:23:17 2024 -0500

[MINOR] Disable flaky test (#10449)

Co-authored-by: Jonathan Vexler <=>
---
 .../src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala| 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
 
b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
index 38221cc05c7..599e8ae9708 100644
--- 
a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
+++ 
b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
@@ -40,7 +40,7 @@ import org.apache.spark.sql.functions.{expr, lit}
 import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
 import org.apache.spark.sql.hudi.command.SqlKeyGenerator
 import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, 
assertNotNull, assertNull, assertTrue, fail}
-import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Disabled, Test}
 import org.junit.jupiter.params.ParameterizedTest
 import org.junit.jupiter.params.provider.Arguments.arguments
 import org.junit.jupiter.params.provider._
@@ -1341,8 +1341,9 @@ def testBulkInsertForDropPartitionColumn(): Unit = {
   /*
* Test case for instant is generated with commit timezone when 
TIMELINE_TIMEZONE set to UTC
* related to HUDI-5978
+   * Issue [HUDI-7275] is tracking this test being disabled
*/
-  @Test
+  @Disabled
   def testInsertDatasetWithTimelineTimezoneUTC(): Unit = {
 val defaultTimezone = TimeZone.getDefault
 try {



Re: [PR] [MINOR] Disable org.apache.hudi.TestHoodieSparkSqlWriter#testInsertDatasetWithTimelineTimezoneUTC [hudi]

2024-01-08 Thread via GitHub


nsivabalan merged PR #10449:
URL: https://github.com/apache/hudi/pull/10449





Re: [PR] [MINOR] Fixed unit tests [hudi]

2024-01-08 Thread via GitHub


bvaradar commented on PR #10362:
URL: https://github.com/apache/hudi/pull/10362#issuecomment-1881544848

   @yihua: If you are OK with this change, can you land it?





Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10460:
URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881544202

   
   ## CI report:
   
   * e3813a268b2e2270300c4be75e3660594c8f48d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21865)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881543595

   
   ## CI report:
   
   * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21864)
 
   
   



Re: [PR] [MINOR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10460:
URL: https://github.com/apache/hudi/pull/10460#issuecomment-1881530821

   
   ## CI report:
   
   * e3813a268b2e2270300c4be75e3660594c8f48d2 UNKNOWN
   
   



[PR] Add parallel listing of existing partitions [hudi]

2024-01-08 Thread via GitHub


VitoMakarevich opened a new pull request, #10460:
URL: https://github.com/apache/hudi/pull/10460

   ### Change Logs
   
   Currently, the sync works as follows (simplified to explain this 
particular problem):
   1. Read the partitions that were upserted (create and update are 
indistinguishable) or deleted on the commit timeline since the previous sync.
   2. Read the existing partitions from Glue.
   3. Iterate over the changed partitions to decide which AWS call 
(create/update/delete) each one needs.
   
   Two clients can do this: the first is `AWSGlueCatalogSyncClient`, the 
second is `HoodieHiveSyncClient`. In the Spark-EMR case, the second relies on 
AWS's Hive-Glue interface implementation. The code of the current version is 
not public, but the code of an earlier version is; the interesting part is 
[this](https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/d885a996137b19df6a128cb950ebc43711374e83/aws-glue-datacatalog-client-common/src/main/java/com/amazonaws/glue/catalog/metastore/DefaultAWSGlueMetastore.java#L301).
   In short, AWS Glue has a mechanism of 
[segments](https://docs.aws.amazon.com/glue/latest/webapi/API_Segment.html) 
that allows long lists to be queried in parallel. I added code similar to 
what exists in the AWS Hive-Glue implementation.
   
   Our real use case: we had been using `HiveSyncTool` on a table with ~200k 
partitions, and it was slow when many partitions changed or when a resync was 
needed. We decided to try `AWSGlueCatalogSyncClient`, which gave about a 2x 
boost in write speed, but we then noticed a significant slowdown in listing 
(roughly 40 sec vs 210 sec on about the same number of partitions, 200k).
   This improvement should bring the same listing speed to the tool.
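
As an illustration of the segment idea described above, here is a minimal, self-contained sketch. It does not use the AWS SDK or the code in this PR: `SegmentLister` and `fakeCatalog` are hypothetical stand-ins, and the in-memory catalog simply assigns partition `i` to segment `i % totalSegments`. In a real client each task would presumably issue a Glue `GetPartitions` request carrying its segment number and the total number of segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSegmentListingSketch {

  /** Hypothetical stand-in for one segmented catalog call; not a Hudi or AWS SDK type. */
  interface SegmentLister {
    List<String> listSegment(int segmentNumber, int totalSegments);
  }

  /** Lists every segment on its own thread and concatenates the results in segment order. */
  static List<String> listAllPartitions(SegmentLister lister, int totalSegments)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(totalSegments);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (int i = 0; i < totalSegments; i++) {
        final int segment = i;
        futures.add(pool.submit(() -> lister.listSegment(segment, totalSegments)));
      }
      List<String> all = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        all.addAll(future.get());  // blocks until that segment has been listed
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }

  /** Fake in-memory catalog: partition i belongs to segment i % totalSegments. */
  static SegmentLister fakeCatalog(int partitionCount) {
    return (segmentNumber, totalSegments) -> {
      List<String> out = new ArrayList<>();
      for (int i = segmentNumber; i < partitionCount; i += totalSegments) {
        out.add("dt=" + i);
      }
      return out;
    };
  }

  public static void main(String[] args) throws Exception {
    // Parallelism of 5 mirrors the default mentioned in this PR description.
    List<String> partitions = listAllPartitions(fakeCatalog(23), 5);
    System.out.println(partitions.size() + " partitions listed");
  }
}
```

Passing `1` as `totalSegments` degenerates to a single sequential listing, which matches the backward-compatible setting discussed under Impact.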
   
   ### Impact
   
   A new config is added whose default value (a parallelism of 5) is taken 
from the AWS Glue-Hive implementation. Listing was sequential before, so I 
don't expect this to cause failures, but the parallelism can be set to 1 if 
backward-compatible behavior is needed for any reason.
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   **The website needs to be adjusted; please let me know if you are OK with 
the changes and I'll proceed.**
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881441184

   
   ## CI report:
   
   * b5d7c6285d8bc0a0809beebae19a2a789fc55661 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21858)
 
   * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21864)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881424054

   
   ## CI report:
   
   * b5d7c6285d8bc0a0809beebae19a2a789fc55661 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21858)
 
   * 0279dd4b1ab59776cbb5024810f5bb6a00fd2164 UNKNOWN
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


vinothchandar commented on code in PR #10352:
URL: https://github.com/apache/hudi/pull/10352#discussion_r1444903624


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/PartitionStatsIndexSupport.scala:
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.HoodieConversionUtils.toScalaOption
+import org.apache.hudi.avro.model.{HoodieMetadataColumnStats, 
HoodieMetadataRecord}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.data.HoodieData
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.HoodieTableMetaClient
+import org.apache.hudi.common.util.ValidationUtils.checkState
+import org.apache.hudi.common.util.hash.ColumnIndexID
+import org.apache.hudi.metadata.{HoodieMetadataPayload, HoodieTableMetadata, 
HoodieTableMetadataUtil}
+import org.apache.hudi.util.JFunction
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.types.StructType
+
+import scala.collection.JavaConverters._
+
+class PartitionStatsIndexSupport(spark: SparkSession,
+ tableSchema: StructType,
+ @transient metadataConfig: 
HoodieMetadataConfig,
+ @transient metaClient: HoodieTableMetaClient,
+ allowCaching: Boolean = false)
+  extends ColumnStatsIndexSupport(spark, tableSchema, metadataConfig, 
metaClient, allowCaching) {
+
+  @transient private lazy val engineCtx = new HoodieSparkEngineContext(new 
JavaSparkContext(spark.sparkContext))
+  @transient private lazy val metadataTable: HoodieTableMetadata =
+HoodieTableMetadata.create(engineCtx, metadataConfig, 
metaClient.getBasePathV2.toString)
+
+  override def isIndexAvailable: Boolean = {
+checkState(metadataConfig.enabled, "Metadata Table support has to be 
enabled")
+
metaClient.getTableConfig.getMetadataPartitions.contains(HoodieTableMetadataUtil.PARTITION_NAME_PARTITION_STATS)
+  }
+
+  override def loadColumnStatsIndexRecords(targetColumns: Seq[String], 
shouldReadInMemory: Boolean): HoodieData[HoodieMetadataColumnStats] = {
+checkState(targetColumns.nonEmpty)
+val encodedTargetColumnNames = targetColumns.map(colName => new 
ColumnIndexID(colName).asBase64EncodedString())
+val metadataRecords: HoodieData[HoodieRecord[HoodieMetadataPayload]] =
+  metadataTable.getRecordsByKeyPrefixes(encodedTargetColumnNames.asJava, 
HoodieTableMetadataUtil.PARTITION_NAME_PARTITION_STATS, shouldReadInMemory)
+val columnStatsRecords: HoodieData[HoodieMetadataColumnStats] =
+  // NOTE: Explicit conversion is required for Scala 2.11

Review Comment:
   side note: I think we can drop support for 2.11?



##
hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java:
##
@@ -454,6 +454,51 @@ private static BigDecimal extractDecimal(Object val, 
DecimalMetadata decimalMeta
 }
   }
 
+  /**
+   * Aggregate column range statistics across files in a partition.
+   *
+   * @param fileRanges List of column range statistics for each file in a 
partition
+   */
+  public > HoodieColumnRangeMetadata 
getColumnRangeInPartition(@Nonnull List> 
fileRanges) {

Review Comment:
   Isn't this leaking upper-level context (files and ranges) into 
ParquetUtils? This class ought to be just about reading various things out 
of Parquet files, the actual columnar file format. 
   
   Please relocate this method and add unit tests.
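
For reference, the aggregation described in the Javadoc above (combining per-file column ranges into one range for the partition) can be sketched independently of any Parquet code. This is a simplified illustration, not the method under review: `ColumnRange` is a hypothetical stand-in for Hudi's `HoodieColumnRangeMetadata` with only a few fields, and treating a `null` min/max as "no non-null values in that file" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

class ColumnRangeMergeSketch {

  /** Hypothetical, simplified per-file column statistics. */
  static final class ColumnRange<T extends Comparable<T>> {
    final String columnName;
    final T min;          // null when the file has no non-null values
    final T max;          // null when the file has no non-null values
    final long nullCount;
    final long valueCount;

    ColumnRange(String columnName, T min, T max, long nullCount, long valueCount) {
      this.columnName = columnName;
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
      this.valueCount = valueCount;
    }
  }

  private static <T extends Comparable<T>> T minOf(T a, T b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }

  private static <T extends Comparable<T>> T maxOf(T a, T b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  /** Merges two ranges of the same column: widest bounds, summed counts. */
  static <T extends Comparable<T>> ColumnRange<T> merge(ColumnRange<T> a, ColumnRange<T> b) {
    if (!a.columnName.equals(b.columnName)) {
      throw new IllegalArgumentException("Cannot merge ranges of different columns");
    }
    return new ColumnRange<>(a.columnName, minOf(a.min, b.min), maxOf(a.max, b.max),
        a.nullCount + b.nullCount, a.valueCount + b.valueCount);
  }

  /** Folds the per-file ranges of one column into a single partition-level range. */
  static <T extends Comparable<T>> ColumnRange<T> aggregate(List<ColumnRange<T>> fileRanges) {
    if (fileRanges.isEmpty()) {
      throw new IllegalArgumentException("At least one file range is required");
    }
    ColumnRange<T> result = fileRanges.get(0);
    for (int i = 1; i < fileRanges.size(); i++) {
      result = merge(result, fileRanges.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    ColumnRange<Integer> partition = aggregate(Arrays.asList(
        new ColumnRange<Integer>("price", 10, 50, 1L, 100L),
        new ColumnRange<Integer>("price", 3, 20, 0L, 40L)));
    System.out.println(partition.columnName + " in [" + partition.min + ", " + partition.max + "]");
  }
}
```

Because the logic only depends on comparable min/max values and counts, it does not need to live next to Parquet-specific reading code, and it is straightforward to unit test in isolation.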



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/PartitionStatsIndexTestBase.scala:
##
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may 

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


vinothchandar commented on code in PR #10352:
URL: https://github.com/apache/hudi/pull/10352#discussion_r1444890638


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##
@@ -330,6 +330,27 @@ public final class HoodieMetadataConfig extends 
HoodieConfig {
   .sinceVersion("1.0.0")
   .withDocumentation("Parallelism to use, when generating functional 
index.");
 
+  public static final ConfigProperty 
ENABLE_METADATA_INDEX_PARTITION_STATS = ConfigProperty

Review Comment:
   Can we turn this on and get all tests to pass? 






Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881241846

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 6e1bcd405ba3a0d3edd11560da1e5377a16c7d33 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21863)
 
   
   



Re: [PR] [HUDI-6497] WIP HoodieStorage abstraction [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10360:
URL: https://github.com/apache/hudi/pull/10360#issuecomment-1881226057

   
   ## CI report:
   
   * 0a958d6408a7d0107ae2dcfc2aae676fd1a6977d UNKNOWN
   * 1dded8d1f774627a49661bc2d1ca812da3caa540 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21686)
 
   * 6e1bcd405ba3a0d3edd11560da1e5377a16c7d33 UNKNOWN
   
   



Re: [I] Hudi dbt incremental materialization not working during incremental dbt run with spark [hudi]

2024-01-08 Thread via GitHub


ad1happy2go commented on issue #10448:
URL: https://github.com/apache/hudi/issues/10448#issuecomment-1881216828

   Synced with @jetansi and ran his models in my setup, and all of them work 
fine. So this should not be a Hudi issue but a setup issue. I shared the dbt 
setup configs I am using, and he will try to fix his setup.





Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10459:
URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881211302

   
   ## CI report:
   
   * e0a43ce9e388b4c8daf83c4ced333f8435de9991 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21862)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881116180

   
   ## CI report:
   
   * b5d7c6285d8bc0a0809beebae19a2a789fc55661 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21858)
 
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1881101054

   
   ## CI report:
   
   * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861)
 
   
   



Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10459:
URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881101296

   
   ## CI report:
   
   * 889a89640b0db39545469a625c0b961829f0aa0a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860)
 
   * e0a43ce9e388b4c8daf83c4ced333f8435de9991 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21862)
 
   
   



Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10352:
URL: https://github.com/apache/hudi/pull/10352#issuecomment-1881100486

   
   ## CI report:
   
   * b5d7c6285d8bc0a0809beebae19a2a789fc55661 UNKNOWN
   
   



Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10459:
URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881023369

   
   ## CI report:
   
   * 889a89640b0db39545469a625c0b961829f0aa0a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860)
 
   * e0a43ce9e388b4c8daf83c4ced333f8435de9991 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21862)
 
   
   



Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10459:
URL: https://github.com/apache/hudi/pull/10459#issuecomment-1881009514

   
   ## CI report:
   
   * 889a89640b0db39545469a625c0b961829f0aa0a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860)
 
   * e0a43ce9e388b4c8daf83c4ced333f8435de9991 UNKNOWN
   
   



Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


waitingF commented on code in PR #10459:
URL: https://github.com/apache/hudi/pull/10459#discussion_r1444569349


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -353,6 +353,18 @@ public class HoodieWriteConfig extends HoodieConfig {
   .markAdvanced()
   .withDocumentation("Size of in-memory buffer used for parallelizing network reads and lake storage writes.");
 
+  public static final ConfigProperty WRITE_BUFFER_RECORD_SAMPLING_RATE = ConfigProperty
+  .key("hoodie.write.buffer.record.sampling.rate")
+  .defaultValue(String.valueOf(64))
+  .markAdvanced()

Review Comment:
   sure, done






Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1880929710

   
   ## CI report:
   
   * 1a4507ec8f1ef19381266816fd8dd58e9b73abdc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21856)
   * 30f390339ad03845b0068256f7938155cf9e08e9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21861)
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10426:
URL: https://github.com/apache/hudi/pull/10426#issuecomment-1880917599

   
   ## CI report:
   
   * 1a4507ec8f1ef19381266816fd8dd58e9b73abdc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21856)
   * 30f390339ad03845b0068256f7938155cf9e08e9 UNKNOWN
   
   



Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10457:
URL: https://github.com/apache/hudi/pull/10457#issuecomment-1880899016

   
   ## CI report:
   
   * 7c668bbb0b7cafeb9b6c4d302d6154c91beb366e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21859)
   
   



Re: [PR] [HUDI-7265] Support schema evolution by Flink SQL using HoodieHiveCatalog [hudi]

2024-01-08 Thread via GitHub


beyond1920 commented on code in PR #10426:
URL: https://github.com/apache/hudi/pull/10426#discussion_r1444512704


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -971,6 +952,59 @@ public void alterPartitionColumnStatistics(
 throw new HoodieCatalogException("Not supported.");
   }
 
+  public boolean isUpdatePermissible(ObjectPath tablePath, CatalogBaseTable newCatalogTable, boolean ignoreIfNotExists) throws TableNotExistException {
+if (!newCatalogTable.getOptions().getOrDefault(CONNECTOR.key(), "").equalsIgnoreCase("hudi")) {
+  throw new HoodieCatalogException(String.format("The %s is not hoodie table", tablePath.getObjectName()));
+}
+if (newCatalogTable instanceof CatalogView) {
+  throw new HoodieCatalogException("Hoodie catalog does not support to ALTER VIEW");
+}
+
+try {
+  Table hiveTable = getHiveTable(tablePath);
+  if (!sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.TABLE_TYPE)
+  || !sameOptions(hiveTable.getParameters(), newCatalogTable.getOptions(), FlinkOptions.INDEX_TYPE)) {

Review Comment:
   Done






Re: [I] [SUPPORT] HUDI baseFile is empty String and this causes IllegalArgumentException [hudi]

2024-01-08 Thread via GitHub


ad1happy2go commented on issue #10458:
URL: https://github.com/apache/hudi/issues/10458#issuecomment-1880868263

   @nicholasxu Thanks for raising this. I am also getting this error when 
querying with both 'read.streaming.enabled' and 'cdc.enabled' set to true. 
Normal reads are running fine. We will look into it.





Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10459:
URL: https://github.com/apache/hudi/pull/10459#issuecomment-1880838207

   
   ## CI report:
   
   * 889a89640b0db39545469a625c0b961829f0aa0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21860)
   
   



Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


hudi-bot commented on PR #10459:
URL: https://github.com/apache/hudi/pull/10459#issuecomment-1880827643

   
   ## CI report:
   
   * 889a89640b0db39545469a625c0b961829f0aa0a UNKNOWN
   
   



Re: [PR] [HUDI-7279] make sampling rate configurable for BOUNDED_IN_MEMORY executor type [hudi]

2024-01-08 Thread via GitHub


codope commented on code in PR #10459:
URL: https://github.com/apache/hudi/pull/10459#discussion_r148100


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -353,6 +353,18 @@ public class HoodieWriteConfig extends HoodieConfig {
   .markAdvanced()
   .withDocumentation("Size of in-memory buffer used for parallelizing network reads and lake storage writes.");
 
+  public static final ConfigProperty WRITE_BUFFER_RECORD_SAMPLING_RATE = ConfigProperty
+  .key("hoodie.write.buffer.record.sampling.rate")
+  .defaultValue(String.valueOf(64))
+  .markAdvanced()

Review Comment:
   Please also add `.sinceVersion("1.0.0")` for both the configs






Re: [PR] [HUDI-7278] make bloom filter skippable for CPU saving [hudi]

2024-01-08 Thread via GitHub


waitingF commented on code in PR #10457:
URL: https://github.com/apache/hudi/pull/10457#discussion_r135982


##
hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileWriterFactory.java:
##
@@ -51,7 +51,7 @@ protected HoodieFileWriter newParquetFileWriter(
   String instantTime, Path path, Configuration conf, HoodieConfig config, Schema schema,
   TaskContextSupplier taskContextSupplier) throws IOException {
 boolean populateMetaFields = config.getBooleanOrDefault(HoodieTableConfig.POPULATE_META_FIELDS);
-boolean enableBloomFilter = populateMetaFields;
+boolean enableBloomFilter = populateMetaFields && config.getBooleanOrDefault(HoodieStorageConfig.PARQUET_WITH_BLOOM_FILTER_ENABLED);

Review Comment:
   Nice advice, will adjust
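
   For readers following the hunk above, here is a minimal standalone sketch of the gating it introduces: the bloom filter is written only when meta fields are populated and the new switch is on. It does not use Hudi classes. The property keys, the `getBooleanOrDefault` helper, and the assumption that both flags default to `true` are hypothetical stand-ins for illustration, not the project's actual names or defaults.

```java
import java.util.Map;

public class BloomFilterGateSketch {
    // Hypothetical keys for illustration only; not Hudi's real config names.
    static final String POPULATE_META_FIELDS = "example.populate.meta.fields";
    static final String BLOOM_FILTER_ENABLED = "example.parquet.bloom.filter.enabled";

    // Returns the parsed boolean for the key, or the default when the key is absent.
    static boolean getBooleanOrDefault(Map<String, String> props, String key, boolean defaultValue) {
        String raw = props.get(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    // Mirrors the shape of the changed line: both conditions must hold.
    // Assumption: both flags default to true when unset.
    static boolean enableBloomFilter(Map<String, String> props) {
        boolean populateMetaFields = getBooleanOrDefault(props, POPULATE_META_FIELDS, true);
        return populateMetaFields && getBooleanOrDefault(props, BLOOM_FILTER_ENABLED, true);
    }

    public static void main(String[] args) {
        // Nothing set: both defaults apply.
        System.out.println(enableBloomFilter(Map.of()));
        // New switch turned off: the bloom filter is skipped.
        System.out.println(enableBloomFilter(Map.of(BLOOM_FILTER_ENABLED, "false")));
        // Meta fields off: the bloom filter is skipped regardless of the switch.
        System.out.println(enableBloomFilter(Map.of(POPULATE_META_FIELDS, "false")));
    }
}
```

   Under these assumed defaults, existing behaviour is unchanged unless the new switch is explicitly set to false.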





