[GitHub] [hudi] yyar commented on issue #7472: [SUPPORT] Too many metadata timeline file caused by old rollback active timeline

2022-12-29 Thread GitBox


yyar commented on issue #7472:
URL: https://github.com/apache/hudi/issues/7472#issuecomment-1367774930

   Thanks, @yihua. That's good news. I'll check it, maybe next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] stayrascal opened a new pull request, #7584: [HUDI-5205] support flink 1.16.0

2022-12-29 Thread GitBox


stayrascal opened a new pull request, #7584:
URL: https://github.com/apache/hudi/pull/7584

   ### Change Logs
   
   - Support Flink 1.16.0
   - Based on [PR](https://github.com/apache/hudi/pull/7397)
     - Copy the existing `adapters` from `hudi-flink1.15.x` to `hudi-flink1.16.x`
     - Add new adapters `StreamWriteOperatorCoordinatorAdapter` & `SortOperatorGenAdapter` in each Flink module
   
   ### Impact
   
   Low
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   the official documents need to be updated 
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] yihua commented on issue #7562: [SUPPORT] How to Fire Async Compaction on Pyspark

2022-12-29 Thread GitBox


yihua commented on issue #7562:
URL: https://github.com/apache/hudi/issues/7562#issuecomment-1367767235

   @soumilshah1995 Thanks for raising the question.  `HoodieCompactor` is a 
Java class and the command-line arguments for spark-submit are parsed using 
JCommander, which is not compatible with PySpark.  One way to get around this 
is to call Java class in Python like 
[this](https://stackoverflow.com/questions/33544105/running-custom-java-class-in-pyspark),
 but then you have to construct `HoodieCompactor.Config` yourself to pass in 
relevant args.





[GitHub] [hudi] yihua commented on a diff in pull request #7536: [HUDI-5455] Add commons-configuration2 in hudi cli bundle

2022-12-29 Thread GitBox


yihua commented on code in PR #7536:
URL: https://github.com/apache/hudi/pull/7536#discussion_r1059263208


##
packaging/hudi-cli-bundle/pom.xml:
##
@@ -239,5 +241,11 @@
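       <artifactId>httpclient</artifactId>
       <version>${http.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-configuration2</artifactId>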

Review Comment:
   This is missing based on @rahil-c 's testing.  Without this, hudi-cli-bundle 
will fail with Spark 3.2 + Hadoop 3.






[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-29 Thread GitBox


hudi-bot commented on PR #4966:
URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367755118

   
   ## CI report:
   
   * 24ea27ad2bc29400d8e5271f8f683662d0e0a93b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14048)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records

2022-12-29 Thread GitBox


hudi-bot commented on PR #7582:
URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367754215

   
   ## CI report:
   
   * 424e4a25b477f8aab3f8b4e5590023d20cca98f3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14049)
 
   
   



[jira] [Closed] (HUDI-5420) Fix metadata table validator to exclude uncommitted log files in successful deltacommits

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-5420.
---
Resolution: Fixed

> Fix metadata table validator to exclude uncommitted log files in successful 
> deltacommits
> 
>
> Key: HUDI-5420
> URL: https://issues.apache.org/jira/browse/HUDI-5420
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When a write transaction writes uncommitted log files in a delta commit, 
> e.g., due to Spark task retries, these log files stay in the file system 
> after the successful delta commit for some time (unlike uncommitted base 
> files which are deleted based on the markers).  The delta commit metadata 
> does not contain these log files, and the metadata table does not contain 
> these entries either.  Currently, the metadata table validator does not 
> treat this as a valid case when checking for discrepancies and thus throws errors.
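
The exclusion described above can be sketched with plain Java sets. This is a simplified model, not Hudi's validator code: the class name, method name, and file names are hypothetical. It assumes the validator can obtain the set of files referenced by completed commit metadata, and it compares only those files against the metadata table, so leftover uncommitted log files are not reported.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the check; none of these names come from Hudi.
class ValidatorSketch {
    // Returns committed files that exist on storage but are missing from the metadata table.
    static Set<String> findMissingInMetadataTable(
            Set<String> filesOnStorage, Set<String> committedFiles, Set<String> filesInMetadataTable) {
        Set<String> toCheck = new TreeSet<>(filesOnStorage);
        toCheck.retainAll(committedFiles);       // ignore uncommitted leftovers, e.g. from task retries
        toCheck.removeAll(filesInMetadataTable); // anything left is a real discrepancy
        return toCheck;
    }

    public static void main(String[] args) {
        Set<String> storage = new HashSet<>(Arrays.asList("f1.log.1", "f1.log.2", "f2.parquet"));
        Set<String> committed = new HashSet<>(Arrays.asList("f1.log.1", "f2.parquet"));
        Set<String> metadataTable = new HashSet<>(Arrays.asList("f1.log.1", "f2.parquet"));
        // f1.log.2 is uncommitted, so it is not reported as a discrepancy.
        System.out.println(findMissingInMetadataTable(storage, committed, metadataTable));
    }
}
```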



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5434:

Status: Patch Available  (was: In Progress)

> Fix archival in MDT to not rely on rollbacks/clean in DT
> 
>
> Key: HUDI-5434
> URL: https://issues.apache.org/jira/browse/HUDI-5434
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> As of now, archival in the MDT is guarded by the earliest entry in the DT's active 
> timeline. But the DT could contain a rollback that dates back a few days or even 
> weeks. So, we need to fix that to check for the first write action in the DT (commit, 
> delta commit, replace commit) and then guard MDT archival based on that. 
>  
> Impact:
> This could result in a huge number of entries in the MDT's active timeline, which might 
> hamper performance or cause throttling in cloud stores.
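
A minimal sketch of the proposed guard, assuming the data table's active timeline can be modelled as a list of (instant time, action) pairs. The class and method names are hypothetical and this is not the change made in the pull request. It picks the earliest write action and skips rollbacks and cleans, relying on instant times being fixed-width strings so that string comparison follows time order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model; each instant is a {instantTime, action} pair.
class ArchivalGuardSketch {
    private static final Set<String> WRITE_ACTIONS =
            new HashSet<>(Arrays.asList("commit", "deltacommit", "replacecommit"));

    // Returns the earliest instant time that belongs to a write action, if any.
    static Optional<String> earliestWriteInstant(List<String[]> activeTimeline) {
        return activeTimeline.stream()
                .filter(instant -> WRITE_ACTIONS.contains(instant[1]))
                .map(instant -> instant[0])
                .min(String::compareTo);
    }

    public static void main(String[] args) {
        List<String[]> timeline = Arrays.asList(
                new String[] {"20221201093000", "rollback"},
                new String[] {"20221227120000", "clean"},
                new String[] {"20221228080000", "deltacommit"},
                new String[] {"20221229080000", "commit"});
        // The weeks-old rollback no longer decides where MDT archival stops.
        System.out.println(earliestWriteInstant(timeline).orElse("none")); // prints 20221228080000
    }
}
```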





[GitHub] [hudi] boneanxs commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


boneanxs commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367749797

   The test failure is caused by https://github.com/apache/hudi/pull/7582





[GitHub] [hudi] codope opened a new issue, #7583: [SUPPORT] Unable to query Partitioned COW Hudi tables with metadata enabled using Trino-Hudi Connector

2022-12-29 Thread GitBox


codope opened a new issue, #7583:
URL: https://github.com/apache/hudi/issues/7583

   **Describe the problem you faced**
   Original issue: https://github.com/trinodb/trino/issues/15368
   
   > Our team is testing the same on COPY ON WRITE  HUDI (0.10.1) tables with 
metadata enabled at version using Trino 400. And we are facing the error while 
reading from partitioned tables.
   > `Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex`.
   
   The issue was resolved by placing some dependencies in the classpath. 
Interestingly, those dependencies are [already included in the 
hudi-trino-bundle](https://github.com/apache/hudi/blob/release-0.12.1/packaging/hudi-trino-bundle/pom.xml#L69-L98).
 This issue tracks any gap in packaging.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   1. Write a Hudi COW table with the below properties and metadata enabled.
   2. Query the same table using the trino-hudi connector (properties mentioned 
below) with `hudi.metadata-enabled=true`.
   
   **Trino Hudi Connector Properties:**
   ```
   connector.name=hudi
   hive.metastore.uri={METASTORE_URI}
   hive.s3.iam-role={S3_IAM_ROLE}
   hive.metastore-refresh-interval=2m
   hive.metastore-timeout=3m
   hudi.max-outstanding-splits=1800
   hive.s3.max-error-retries=50
   hive.s3.connect-timeout=1m
   hive.s3.socket-timeout=2m
   hudi.parquet.use-column-names=true
   hudi.metadata-enabled=true
   ```
   
   **Hudi Properties set while writing:**
   ```
   hoodie.datasource.write.partitionpath.field = "insert_ds_ist",
   hoodie.datasource.write.recordkey.field = "id",
   hoodie.datasource.write.precombine.field = "_hoodie_incremental_key", (self generated column),
   hoodie.datasource.write.hive_style_partitioning = "true",
   hoodie.datasource.hive_sync.auto_create_database = "true",
   hoodie.parquet.compression.codec = "gzip",
   hoodie.table.name = "",
   hoodie.datasource.write.keygenerator.class = "org.apache.hudi.keygen.SimpleKeyGenerator",
   hoodie.datasource.write.table.type = "COPY_ON_WRITE",
   hoodie.metadata.enable = "true",
   hoodie.datasource.hive_sync.enable = "true",
   hoodie.datasource.hive_sync.partition_fields = "insert_ds_ist",
   hoodie.datasource.hive_sync.partition_extractor_class = "org.apache.hudi.hive.MultiPartKeysValueExtractor"
   ```
   
   **General information of table:**
   Total rows = 1,213,959,199
   Total Partitions = 2400+
   Total file objects = 120,000
   Total Size on S3 = 12~13 GB
   The table was upgraded from 0.9.0 to 0.10.1
   
   **Coordinator Relevant Logs:**
   
   **Expected behavior**
   
   The query should work out-of-the-box without having to place jars in the 
classpath.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 2.4
   
   * Trino version : [400](https://github.com/trinodb/trino/tree/400)
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   Full stacktrace in 
   
[Partitioned_COW_Hudi_Coordinator_logs.log](https://github.com/apache/hudi/files/10323254/Partitioned_COW_Hudi_Coordinator_logs.log)
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7580:
URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367736903

   
   ## CI report:
   
   * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045)
 
   * 363e7ec434dfac617a963387e65ffa1aa4b8308b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14050)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7580:
URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367735720

   
   ## CI report:
   
   * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045)
 
   * 363e7ec434dfac617a963387e65ffa1aa4b8308b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


hudi-bot commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367734589

   
   ## CI report:
   
   * 4de5c804a29ff11796ccae4308cbb2ce86def8e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14026)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14047)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367734470

   
   ## CI report:
   
   * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14046)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records

2022-12-29 Thread GitBox


hudi-bot commented on PR #7582:
URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367714336

   
   ## CI report:
   
   * 424e4a25b477f8aab3f8b4e5590023d20cca98f3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14049)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records

2022-12-29 Thread GitBox


hudi-bot commented on PR #7582:
URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367712487

   
   ## CI report:
   
   * 424e4a25b477f8aab3f8b4e5590023d20cca98f3 UNKNOWN
   
   



[jira] [Updated] (HUDI-5488) Make sure Discrupt queue start first, then insert records

2022-12-29 Thread Hui An (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An updated HUDI-5488:
-
Description: 
We must make sure the Disruptor's queue is set up first, so that the producer can 
then insert records into the queue. Currently we have no idea which thread starts 
first, so this PR tries to fix that.

CompletableFuture consuming = startConsumingAsync();
CompletableFuture producing = startProducingAsync();

Also, I think the test TestDisruptorExecutionInSpark#testExecutor and 
TestDisruptorMessageQueue#testRecordReading failures relate to this bug.

> Make sure Discrupt queue start first, then insert records
> -
>
> Key: HUDI-5488
> URL: https://issues.apache.org/jira/browse/HUDI-5488
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core
>Reporter: Hui An
>Priority: Major
>  Labels: pull-request-available
>
> We must make sure the Disruptor's queue is set up first, so that the producer can 
> then insert records into the queue. Currently we have no idea which thread starts 
> first, so this PR tries to fix that.
> CompletableFuture consuming = startConsumingAsync();
> CompletableFuture producing = startProducingAsync();
> Also, I think the test TestDisruptorExecutionInSpark#testExecutor and 
> TestDisruptorMessageQueue#testRecordReading failures relate to this bug.





[GitHub] [hudi] boneanxs commented on pull request #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records

2022-12-29 Thread GitBox


boneanxs commented on PR #7582:
URL: https://github.com/apache/hudi/pull/7582#issuecomment-1367711026

   @alexeykudinkin  @zhangyue19921010 Could you please help to take a look?





[jira] [Updated] (HUDI-5488) Make sure Discrupt queue start first, then insert records

2022-12-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5488:
-
Labels: pull-request-available  (was: )

> Make sure Discrupt queue start first, then insert records
> -
>
> Key: HUDI-5488
> URL: https://issues.apache.org/jira/browse/HUDI-5488
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: core
>Reporter: Hui An
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] boneanxs opened a new pull request, #7582: [HUDI-5488]Make sure Discrupt queue start first, then insert records

2022-12-29 Thread GitBox


boneanxs opened a new pull request, #7582:
URL: https://github.com/apache/hudi/pull/7582

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   We must make sure the Disruptor's queue is set up first, so that the producer can 
then insert records into the queue. Currently we have no idea which thread starts 
first, so this PR tries to fix that.
   
   ```java
   CompletableFuture consuming = startConsumingAsync();
   CompletableFuture producing = startProducingAsync();
   ```
   
   Also, I think the test `TestDisruptorExecutionInSpark#testExecutor` and 
`TestDisruptorMessageQueue#testRecordReading` failures relate to this bug.
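
One generic way to enforce this kind of start order is a latch that the consuming side releases once it is set up. The sketch below is an illustration under assumptions, not the change made in this PR: it uses a JDK `LinkedBlockingQueue` in place of the Disruptor, and `StartOrderSketch`, `run`, and the `END` sentinel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the executor; it uses a JDK queue, not the Disruptor.
class StartOrderSketch {
    private static final String END = "__END__"; // sentinel telling the consumer to stop

    // Runs one consumer and one producer and returns the events in the order they happened.
    static List<String> run(List<String> records) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch consumerReady = new CountDownLatch(1);
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        try {
            CompletableFuture<Void> consuming = CompletableFuture.runAsync(() -> {
                events.add("consumer-started");
                consumerReady.countDown(); // signal that the consuming side is set up
                try {
                    for (String r = queue.take(); !END.equals(r); r = queue.take()) {
                        events.add("consumed:" + r);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, pool);
            CompletableFuture<Void> producing = CompletableFuture.runAsync(() -> {
                try {
                    consumerReady.await(); // never publish before the consumer is ready
                    events.add("producer-started");
                    for (String r : records) {
                        queue.put(r);
                    }
                    queue.put(END);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, pool);
            CompletableFuture.allOf(consuming, producing).join();
            return new ArrayList<>(events);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b")));
    }
}
```

With the latch in place, the order no longer depends on which of the two async tasks the thread pool happens to schedule first.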
   
   https://user-images.githubusercontent.com/10115332/210033047-7b3573ec-c43b-44b3-a898-c4269b6bfd14.png
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   none
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


hudi-bot commented on PR #7561:
URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367708973

   
   ## CI report:
   
   * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043)
 
   
   



[jira] [Created] (HUDI-5488) Make sure Discrupt queue start first, then insert records

2022-12-29 Thread Hui An (Jira)
Hui An created HUDI-5488:


 Summary: Make sure Discrupt queue start first, then insert records
 Key: HUDI-5488
 URL: https://issues.apache.org/jira/browse/HUDI-5488
 Project: Apache Hudi
  Issue Type: Bug
  Components: core
Reporter: Hui An








[GitHub] [hudi] danny0405 commented on a diff in pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


danny0405 commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059228760


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java:
##
@@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient
     return activeTimeline;
   }
 
+  /**
+   * Returns a Hudi timeline with commits after the given instant time (exclusive).
+   *
+   * @param metaClient                {@link HoodieTableMetaClient} instance.
+   * @param exclusiveStartInstantTime Start instant time (exclusive).
+   * @return Hudi timeline.
+   */
+  public static HoodieTimeline getCommitsTimelineAfter(
+      HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) {
+    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+    HoodieDefaultTimeline timeline =
+        activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)
+            ? metaClient.getArchivedTimeline(exclusiveStartInstantTime)
+            .mergeTimeline(activeTimeline)
+            : activeTimeline;
+    return timeline.getCommitsTimeline()
+        .findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE);
+  }

Review Comment:
   I mean that if `activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)` 
is true, the whole merged timeline should be scanned, so there is no need to 
call `#findInstantsAfter`.






[GitHub] [hudi] danny0405 commented on a diff in pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


danny0405 commented on code in PR #7571:
URL: https://github.com/apache/hudi/pull/7571#discussion_r1059227583


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestKeyRangeLookupTree.java:
##
@@ -68,7 +68,7 @@ public void testFileGroupLookUpManyEntriesWithSameStartValue() {
     updateExpectedMatchesToTest(toInsert);
     keyRangeLookupTree.insert(toInsert);
     for (int i = 0; i < 10; i++) {
-      endKey += 1 + RANDOM.nextInt(100);
+      endKey += 1 + RANDOM.nextInt(50);
       toInsert = new KeyRangeNode(startKey, Long.toString(endKey), UUID.randomUUID().toString());
       updateExpectedMatchesToTest(toInsert);

Review Comment:
   Yeah, the fix works, but it would be better if we could compare the record keys 
as Long values instead.






[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-29 Thread GitBox


hudi-bot commented on PR #4966:
URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367694657

   
   ## CI report:
   
   * 24ea27ad2bc29400d8e5271f8f683662d0e0a93b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14048)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-29 Thread GitBox


hudi-bot commented on PR #4966:
URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367693592

   
   ## CI report:
   
   * 24ea27ad2bc29400d8e5271f8f683662d0e0a93b UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


hudi-bot commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367691314

   
   ## CI report:
   
   * 4de5c804a29ff11796ccae4308cbb2ce86def8e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14026)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14047)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


hudi-bot commented on PR #7561:
URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367691272

   
   ## CI report:
   
   * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043)
 
   
   



[GitHub] [hudi] boneanxs commented on a diff in pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


boneanxs commented on code in PR #7571:
URL: https://github.com/apache/hudi/pull/7571#discussion_r1059217632


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestKeyRangeLookupTree.java:
##
@@ -68,7 +68,7 @@ public void testFileGroupLookUpManyEntriesWithSameStartValue() {
     updateExpectedMatchesToTest(toInsert);
     keyRangeLookupTree.insert(toInsert);
     for (int i = 0; i < 10; i++) {
-      endKey += 1 + RANDOM.nextInt(100);
+      endKey += 1 + RANDOM.nextInt(50);
       toInsert = new KeyRangeNode(startKey, Long.toString(endKey), UUID.randomUUID().toString());
       updateExpectedMatchesToTest(toInsert);

Review Comment:
   `KeyRangeNode` stores recordValue, which is always a string value, so 
`KeyRangeNode` never needs to compare against any other type. I think the test 
intends to use the order of `Long` values as a stand-in for the order of their 
string forms when exercising `KeyRangeNode`, which only works if we keep 
`endKey` from exceeding 1000.
   
   Before the fix, `endKey` could reach 100 * 10 + 250 = 1250, which exceeds 
1000. With the fix, each iteration adds at most 50 (`1 + RANDOM.nextInt(50)`) 
and there are at most 10 iterations, so `endKey` cannot exceed 
50 * 10 + 250 = 750. That stays below 1000, and in this range the `Long` order 
is the same as the string order.
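
   To illustrate the ordering point above, here is a minimal sketch. It is not 
code from the PR: the class `StringVsLongOrder` and the method `sameOrder` are 
hypothetical names, and the key values are made up. It checks whether sorting 
keys as decimal strings gives the same sequence as sorting them as longs, which 
holds when all keys have the same number of digits and can break once a key 
crosses 1000.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StringVsLongOrder {
  // Returns true when sorting the keys by their decimal string form yields
  // the same sequence as sorting them numerically.
  static boolean sameOrder(long[] keys) {
    List<Long> numeric = new ArrayList<>();
    for (long k : keys) {
      numeric.add(k);
    }
    List<Long> lexical = new ArrayList<>(numeric);
    numeric.sort(Comparator.naturalOrder());
    // String.compareTo is lexicographic, so "1000" sorts before "250".
    lexical.sort(Comparator.comparing((Long k) -> Long.toString(k)));
    return numeric.equals(lexical);
  }

  public static void main(String[] args) {
    // All keys have three digits: both orders agree.
    System.out.println(sameOrder(new long[] {250, 301, 750, 999}));
    // Keys of mixed width: the string order puts 1000 and 1250 first.
    System.out.println(sameOrder(new long[] {250, 999, 1000, 1250}));
  }
}
```

   The first call prints `true` and the second prints `false`, which matches the 
reasoning for capping `endKey` below 1000.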






[GitHub] [hudi] boneanxs commented on pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


boneanxs commented on PR #7571:
URL: https://github.com/apache/hudi/pull/7571#issuecomment-1367689666

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


hudi-bot commented on PR #7561:
URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367689402

   
   ## CI report:
   
   * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367689266

   
   ## CI report:
   
   * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14046)
 
   
   



[GitHub] [hudi] boneanxs commented on a diff in pull request #7571: [HUDI-4710]Fix flaky: TestKeyRangeLookupTree#testFileGroupLookUpManyEntriesWithSameStartValue

2022-12-29 Thread GitBox


boneanxs commented on code in PR #7571:
URL: https://github.com/apache/hudi/pull/7571#discussion_r1059217632


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/bloom/TestKeyRangeLookupTree.java:
##
@@ -68,7 +68,7 @@ public void testFileGroupLookUpManyEntriesWithSameStartValue() {
     updateExpectedMatchesToTest(toInsert);
     keyRangeLookupTree.insert(toInsert);
     for (int i = 0; i < 10; i++) {
-      endKey += 1 + RANDOM.nextInt(100);
+      endKey += 1 + RANDOM.nextInt(50);
       toInsert = new KeyRangeNode(startKey, Long.toString(endKey), UUID.randomUUID().toString());
       updateExpectedMatchesToTest(toInsert);

Review Comment:
   `KeyRangeNode` stores recordValue, which is always a string value, so 
`KeyRangeNode` never needs to compare against any other type. I think the test 
intends to use the order of `Long` values as a stand-in for the order of their 
string forms when exercising `KeyRangeNode`, which only works if we keep 
`endKey` from exceeding 1000.
   
   Before the fix, `endKey` could reach 100 * 10 + 250 = 1250, which exceeds 
1000. With the fix, each iteration adds at most 50 (`1 + RANDOM.nextInt(50)`) 
and there are at most 10 iterations, so `endKey` cannot exceed 
50 * 10 + 250 = 750. That stays below 1000, and in this range the `Long` order 
is the same as the string order.






[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


xicm commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367685026

   @hudi-bot run azure





[GitHub] [hudi] SteNicholas commented on pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-29 Thread GitBox


SteNicholas commented on PR #7568:
URL: https://github.com/apache/hudi/pull/7568#issuecomment-1367682208

   @yihua, could you please review this pull request? @leesf has approved 
these changes.





[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7580:
URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367669253

   
   ## CI report:
   
   * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


hudi-bot commented on PR #7561:
URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367638148

   
   ## CI report:
   
   * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043)
 
   
   



[GitHub] [hudi] yihua merged pull request #7581: [MINOR][BLOG] - 2022 Blog post

2022-12-29 Thread GitBox


yihua merged PR #7581:
URL: https://github.com/apache/hudi/pull/7581





[hudi] branch asf-site updated: [DOCS][BLOG] 2022 Blog post (#7581)

2022-12-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 4880154bb11 [DOCS][BLOG] 2022 Blog post (#7581)
4880154bb11 is described below

commit 4880154bb1152353acbcc51b6390176e6d1e926b
Author: Kyle Weller 
AuthorDate: Thu Dec 29 15:45:29 2022 -0700

[DOCS][BLOG] 2022 Blog post (#7581)
---
 ...2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md |  89 +
 .../assets/images/blog/Apache-Hudi-2022-Review.png | Bin 0 -> 664778 bytes
 .../assets/images/blog/Apache-Hudi-Conferences.png | Bin 0 -> 6480488 bytes
 .../blog/Apache-Hudi-Pull-Request-History.png  | Bin 0 -> 296199 bytes
 4 files changed, 89 insertions(+)

diff --git a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md 
b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
new file mode 100644
index 000..82246324766
--- /dev/null
+++ b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
@@ -0,0 +1,89 @@
+---
+title: "Apache Hudi 2022 - A year in Review"
+excerpt: "2022 was the best year for Apache Hudi yet! Huge thank you to 
everyone who contributed!"
+author: Sivabalan Narayanan
+category: blog
+image: /assets/images/blog/Apache-Hudi-2022-Review.png
+tags:
+- apache hudi
+---
+
+
+
+## Apache Hudi Momentum
+As we wrap up 2022 I want to take the opportunity to reflect on and highlight 
the incredible progress of the Apache Hudi 
+project and most importantly, the community. First and foremost, I want to 
thank all of the contributors who have made 
+2022 the best year for the project ever. There were [over 2,200 
PRs](https://ossinsight.io/analyze/apache/hudi#pull-requests) 
+created (+38% YoY) and over 600+ users engaged on 
[Github](https://github.com/apache/hudi/). The Apache Hudi community 
+[slack 
channel](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w)
 has grown to more 
+than 2,600 users (+100% YoY growth) averaging nearly 200 messages per month! 
The most impressive stat is that with this 
+volume growth, the median response time to questions is ~3h. [Come join the 
community](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w)
 
+where people are sharing and helping each other!
+
+
+
+## Key Releases in 2022
+2022 has been a year jam packed with exciting new features for Apache Hudi 
across 0.11.0 and 0.12.0 releases. In addition to new features, 
vendor/ecosystem partnerships and relationships have been strengthened across 
many in the community. [AWS continues to double 
down](https://www.onehouse.ai/blog/apache-hudi-native-aws-integrations) on 
Apache Hudi, upgrading versions in 
[EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi.html), 
[Athena](https://docs.aws.amazon.com/athena [...]
+
+While there are too many features added in 2022 to list them all, take a look 
at some of the exciting highlights:
+
+- [Multi-Modal 
Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
 is a first-of-its-kind high-performance indexing subsystem for the Lakehouse. 
It improves metadata lookup performance by up to 100x and reduces overall query 
latency by up to 30x. Two new indices were added to the metadata table - Bloom 
filter index that enables faster upsert performance and[  column stats index 
along with Data skipping](https://hudi.apache.org/bl [...]
+- Hudi added support for [asynchronous 
indexing](https://hudi.apache.org/releases/release-0.11.0/#async-indexer) to 
assist building such indices without blocking ingestion so that regular writers 
don't need to scale up resources for such one off spikes.
+- A new type of index called Bucket Index was introduced this year. This could 
be game changing for deterministic workloads with partitioned datasets. It is 
very light-weight and allows the distribution of records to buckets using a 
hash function.
+- Filesystem based Lock Provider - This implementation avoids the need of 
external systems and leverages the abilities of underlying filesystem to 
support lock provider needed for optimistic concurrency control in case of 
multiple writers. Please check the [lock 
configuration](https://hudi.apache.org/docs/configurations#Locks-Configurations)
 for details.
+- Deltastreamer Graceful Completion - Users can now configure a post-write 
completion strategy with deltastreamer continuous mode for graceful shutdown.
+- Schema on read is supported as an experimental feature since 0.11.0, 
allowing users to leverage Spark SQL DDL  support for [evolving data 
schema](https://hudi.apache.org/docs/schema_evolution) needs(drop, rename etc). 
 Added support for a lot of [CALL 
commands](https://hudi.apache.org/docs/procedures/) to invoke an array of 
actions on Hudi tables.
+- It is now feasible to 

[GitHub] [hudi] kywe665 commented on pull request #7581: [MINOR][BLOG] - 2022 Blog post

2022-12-29 Thread GitBox


kywe665 commented on PR #7581:
URL: https://github.com/apache/hudi/pull/7581#issuecomment-1367617115

   preview
   https://user-images.githubusercontent.com/1703248/210017096-a1fbd3c0-07ee-43a3-a794-eee6f555ee05.png
   https://user-images.githubusercontent.com/1703248/210017128-bf5c643f-b5b5-44f4-8e24-fed0c684cfde.png
   https://user-images.githubusercontent.com/1703248/210017182-da84774b-83b8-41c3-9904-3dc46099dede.png
   https://user-images.githubusercontent.com/1703248/210017165-6937efcd-4ab4-4544-8a15-2ddaf04ffd13.png
   https://user-images.githubusercontent.com/1703248/210017149-358a26be-5f05-4f45-b19b-3e923122d104.png
   





[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7580:
URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367613254

   
   ## CI report:
   
   * df101606342f8b91be6cc232d99d7009c4577ed9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14045)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7580:
URL: https://github.com/apache/hudi/pull/7580#issuecomment-1367611336

   
   ## CI report:
   
   * df101606342f8b91be6cc232d99d7009c4577ed9 UNKNOWN
   
   



[GitHub] [hudi] kywe665 opened a new pull request, #7581: [MINOR][BLOG] - 2022 Blog post

2022-12-29 Thread GitBox


kywe665 opened a new pull request, #7581:
URL: https://github.com/apache/hudi/pull/7581

   ### Change Logs
   
   added a blog post and images to docs site
   
   ### Impact
   
   no impact
   
   ### Risk level (write none, low medium or high below)
   
   none, docs only
   
   ### Documentation Update
   
   n/a
   
   ### Contributor's checklist
   
   - [X] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [X] Change Logs and Impact were stated clearly
   - [X] Adequate tests were added if applicable
   - [X] CI passed
   





[GitHub] [hudi] yihua commented on issue #7472: [SUPPORT] Too many metadata timeline file caused by old rollback active timeline

2022-12-29 Thread GitBox


yihua commented on issue #7472:
URL: https://github.com/apache/hudi/issues/7472#issuecomment-1367591309

   Hi @yyar I've put up the fix #7580 and verified locally that it works.  
Could you try it and see if it solves your problem?





[jira] [Updated] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT

2022-12-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5434:
-
Labels: pull-request-available  (was: )

> Fix archival in MDT to not rely on rollbacks/clean in DT
> 
>
> Key: HUDI-5434
> URL: https://issues.apache.org/jira/browse/HUDI-5434
> Project: Apache Hudi
> Issue Type: Bug
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> As of now, archival in the MDT is guarded by the first entry in the DT's 
> active timeline, but the DT could contain a rollback that dates back a few 
> days or even weeks. So we need to fix that to check for the first write action 
> in the DT (commit, delta commit, replace commit) and then guard MDT archival 
> based on that. 
>  
> Impact:
> This could result in a huge number of entries in the MDT's active timeline, 
> which might hamper performance or cause throttling in cloud stores.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] yihua opened a new pull request, #7580: [HUDI-5434] Fix archival in metadata table to not rely on completed rollback or clean in data table

2022-12-29 Thread GitBox


yihua opened a new pull request, #7580:
URL: https://github.com/apache/hudi/pull/7580

   ### Change Logs
   
   Before this PR, archival of the metadata table used the earliest instant 
across all actions in the data table's active timeline.  In the archival 
process, CLEAN and ROLLBACK instants are archived separately from commits 
(see HoodieTimelineArchiver#getCleanInstantsToArchive).  Because of this, a 
very old completed CLEAN or ROLLBACK instant in the data table can block 
archival of the metadata table timeline and make the metadata table's active 
timeline extremely long, which leads to performance issues when loading the 
timeline.
   
   This PR changes archival in the metadata table so that it no longer relies 
on completed rollback or clean instants in the data table.  Archival of the 
metadata table's instants is now guarded by the earliest commit (COMMIT, 
DELTA_COMMIT, and REPLACE_COMMIT only) and the earliest inflight instant (all 
actions) in the data table's active timeline.
   
   Savepoints are handled seamlessly here, i.e., completed savepoints do not 
affect the archival process in the metadata table.
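   
   As a rough illustration of the rule described above, the sketch below picks 
the data-table instant that holds back metadata-table archival. This is not the 
code in this PR: the `Instant` class, the `archivalBoundary` method, and the 
action strings are simplified, hypothetical stand-ins, and it assumes instant 
timestamps are fixed-width strings so that string order matches time order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class MetadataArchivalBoundary {
  // Hypothetical, simplified stand-in for an instant on the data table's active timeline.
  static final class Instant {
    final String timestamp;  // assumed fixed-width, so string order equals time order
    final String action;     // illustrative values: "commit", "clean", "rollback", ...
    final boolean completed; // false means the instant is still pending

    Instant(String timestamp, String action, boolean completed) {
      this.timestamp = timestamp;
      this.action = action;
      this.completed = completed;
    }
  }

  // Illustrative names for the three commit-type actions.
  static final Set<String> COMMIT_ACTIONS =
      new HashSet<>(Arrays.asList("commit", "deltacommit", "replacecommit"));

  // Earliest instant that guards archival: any pending instant (all actions)
  // or any commit-type instant. Completed clean, rollback, and savepoint
  // instants are ignored, however old they are.
  static Optional<String> archivalBoundary(List<Instant> activeTimeline) {
    return activeTimeline.stream()
        .filter(i -> !i.completed || COMMIT_ACTIONS.contains(i.action))
        .map(i -> i.timestamp)
        .min(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    List<Instant> timeline = Arrays.asList(
        new Instant("20221201093000", "rollback", true), // old completed rollback
        new Instant("20221228101500", "commit", true),   // earliest commit
        new Instant("20221229080000", "clean", false));  // pending clean
    System.out.println(archivalBoundary(timeline).orElse("none"));
  }
}
```

   With this sample timeline the program prints `20221228101500`: the old 
completed rollback no longer determines the boundary, whereas taking the 
earliest instant of all actions would have returned `20221201093000`.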
   
   ### Impact
   
   Makes the active timeline of the metadata table shorter and improves the 
performance of loading the active timeline of the metadata table.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[hudi] branch master updated (495b6fbb062 -> fb28ad8f737)

2022-12-29 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 495b6fbb062 [HUDI-5332] HiveSyncTool can avoid initializing all 
permanent custom functions of Hive (#7385)
 add fb28ad8f737 [HUDI-5420] Fix metadata table validator to exclude 
uncommitted log files due to retry (#7517)

No new revisions were added by this update.

Summary of changes:
 .../utilities/HoodieMetadataTableValidator.java| 109 +
 1 file changed, 91 insertions(+), 18 deletions(-)



[GitHub] [hudi] yihua merged pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry

2022-12-29 Thread GitBox


yihua merged PR #7517:
URL: https://github.com/apache/hudi/pull/7517





[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


hudi-bot commented on PR #7561:
URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367579907

   
   ## CI report:
   
   * df28b5141ea2b920a55149668c12ebda1416194a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13979)
 
   * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14043)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry

2022-12-29 Thread GitBox


hudi-bot commented on PR #7517:
URL: https://github.com/apache/hudi/pull/7517#issuecomment-1367579852

   
   ## CI report:
   
   * eaa6d00e2952cd6b1dc6d67d9d06df99eb882b98 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14023)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7561: [HUDI-5477] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


hudi-bot commented on PR #7561:
URL: https://github.com/apache/hudi/pull/7561#issuecomment-1367577262

   
   ## CI report:
   
   * df28b5141ea2b920a55149668c12ebda1416194a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13979)
 
   * 40361ca7dd3d4cd00a6f154c30f17f2a6a5a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7448:
URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367577123

   
   ## CI report:
   
   * 9314399c3b40e65689ffeeade5be40ed289563f0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14042)
 
   
   



[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5477:

Status: Patch Available  (was: In Progress)

> Optimize timeline loading in Hudi sync client
> -
>
> Key: HUDI-5477
> URL: https://issues.apache.org/jira/browse/HUDI-5477
> Project: Apache Hudi
> Issue Type: Improvement
> Components: archiving, meta-sync
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> The Hudi archived timeline is always loaded during the metastore sync process 
> if the last sync time is given. Besides, the archived timeline is not cached 
> inside the meta client if the start instant time is given. These cause 
> performance issues and read timeout on cloud storage due to rate limiting on 
> requests because of loading archived timeline from the storage, when the 
> archived timeline is huge, e.g., hundreds of log files in 
> {{.hoodie/archived}} folder.





[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5477:

Status: In Progress  (was: Open)

> Optimize timeline loading in Hudi sync client
> -
>
> Key: HUDI-5477
> URL: https://issues.apache.org/jira/browse/HUDI-5477
> Project: Apache Hudi
> Issue Type: Improvement
> Components: archiving, meta-sync
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> The Hudi archived timeline is always loaded during the metastore sync process 
> if the last sync time is given. Besides, the archived timeline is not cached 
> inside the meta client if the start instant time is given. These cause 
> performance issues and read timeout on cloud storage due to rate limiting on 
> requests because of loading archived timeline from the storage, when the 
> archived timeline is huge, e.g., hundreds of log files in 
> {{.hoodie/archived}} folder.





[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5477:

Story Points: 2

> Optimize timeline loading in Hudi sync client
> -
>
> Key: HUDI-5477
> URL: https://issues.apache.org/jira/browse/HUDI-5477
> Project: Apache Hudi
> Issue Type: Improvement
> Components: archiving, meta-sync
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> The Hudi archived timeline is always loaded during the metastore sync process 
> if the last sync time is given. Besides, the archived timeline is not cached 
> inside the meta client if the start instant time is given. These cause 
> performance issues and read timeout on cloud storage due to rate limiting on 
> requests because of loading archived timeline from the storage, when the 
> archived timeline is huge, e.g., hundreds of log files in 
> {{.hoodie/archived}} folder.





[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5477:

Reviewers: Danny Chen

> Optimize timeline loading in Hudi sync client
> -
>
> Key: HUDI-5477
> URL: https://issues.apache.org/jira/browse/HUDI-5477
> Project: Apache Hudi
> Issue Type: Improvement
> Components: archiving, meta-sync
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> The Hudi archived timeline is always loaded during the metastore sync process 
> if the last sync time is given. Besides, the archived timeline is not cached 
> inside the meta client if the start instant time is given. These cause 
> performance issues and read timeout on cloud storage due to rate limiting on 
> requests because of loading archived timeline from the storage, when the 
> archived timeline is huge, e.g., hundreds of log files in 
> {{.hoodie/archived}} folder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5477:

Sprint: 0.13.0 Final Sprint



[jira] [Closed] (HUDI-5486) Update 0.12.x release notes with Long Term Support

2022-12-29 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-5486.
---
Resolution: Fixed

> Update 0.12.x release notes with Long Term Support 
> ---
>
> Key: HUDI-5486
> URL: https://issues.apache.org/jira/browse/HUDI-5486
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>






[GitHub] [hudi] yihua commented on a diff in pull request #7561: [HUDI-5477][DO NOT MERGE] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059134988


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
   }
 
   /**
-   * Returns fresh new archived commits as a timeline from startTs (inclusive).
-   *
-   * This is costly operation if really early endTs is specified.
-   * Be caution to use this only when the time range is short.
-   *
-   * This method is not thread safe.
+   * Returns the cached archived timeline from startTs (inclusive).
    *
-   * @return Archived commit timeline
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @return the archived timeline.
    */
   public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-    return new HoodieArchivedTimeline(this, startTs);
+    return getArchivedTimeline(startTs, true);
+  }
+
+  /**
+   * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+   * timeline if not using cache, from startTs (inclusive).
+   * 
+   * Instantiating an archived timeline is costly operation if really early startTs is
+   * specified.
+   * 
+   * This method is not thread safe.
+   *
+   * @param startTs  The start instant time (inclusive) of the archived timeline.
+   * @param useCache Whether to use in-memory cache.
+   * @return the archived timeline based on the arguments.
+   */
+  public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+    if (useCache) {
+      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);

Review Comment:
   This is fixed.






[GitHub] [hudi] hudi-bot commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry

2022-12-29 Thread GitBox


hudi-bot commented on PR #7517:
URL: https://github.com/apache/hudi/pull/7517#issuecomment-1367536674

   
   ## CI report:
   
   * eaa6d00e2952cd6b1dc6d67d9d06df99eb882b98 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14023)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry

2022-12-29 Thread GitBox


hudi-bot commented on PR #7517:
URL: https://github.com/apache/hudi/pull/7517#issuecomment-1367534143

   
   ## CI report:
   
   * eaa6d00e2952cd6b1dc6d67d9d06df99eb882b98 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7371: [HUDI-3673] Clean up hbase shading dependencies

2022-12-29 Thread GitBox


hudi-bot commented on PR #7371:
URL: https://github.com/apache/hudi/pull/7371#issuecomment-1367533938

   
   ## CI report:
   
   * f3d658be1ab30458c286ace26ec67b4715e188fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14040)
   
   



[GitHub] [hudi] yihua commented on a diff in pull request #7561: [HUDI-5477][DO NOT MERGE] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059107210


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
   }
 
   /**
-   * Returns fresh new archived commits as a timeline from startTs (inclusive).
-   *
-   * This is costly operation if really early endTs is specified.
-   * Be caution to use this only when the time range is short.
-   *
-   * This method is not thread safe.
+   * Returns the cached archived timeline from startTs (inclusive).
    *
-   * @return Archived commit timeline
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @return the archived timeline.
    */
   public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-    return new HoodieArchivedTimeline(this, startTs);
+    return getArchivedTimeline(startTs, true);
+  }
+
+  /**
+   * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+   * timeline if not using cache, from startTs (inclusive).
+   * 
+   * Instantiating an archived timeline is costly operation if really early startTs is
+   * specified.
+   * 
+   * This method is not thread safe.
+   *
+   * @param startTs  The start instant time (inclusive) of the archived timeline.
+   * @param useCache Whether to use in-memory cache.
+   * @return the archived timeline based on the arguments.
+   */
+  public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+    if (useCache) {
+      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);

Review Comment:
   The assumption is that there should be only one `startTs` in the cache, so 
there is no need to clear it; the cache is discarded once the lifecycle of 
the meta client is over.  I can clear it when a new `startTs` comes in.
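
The "clear it when a new `startTs` comes in" option mentioned above could look roughly like the sketch below. This is only an illustration of a single-entry cache, not the code in the PR: the class `SingleEntryCache` and its `get` method are hypothetical names, and the loader stands in for whatever instantiates the archived timeline.

```java
import java.util.function.Function;

/**
 * Hypothetical single-entry cache: remembers the value loaded for one key
 * and reloads (dropping the old value) when a different key is requested.
 * Not thread safe, like the method discussed in the review.
 */
final class SingleEntryCache<K, V> {
  private K cachedKey;
  private V cachedValue;

  /** Returns the cached value for key, loading it first if key differs from the cached key. */
  V get(K key, Function<K, V> loader) {
    if (cachedKey == null || !cachedKey.equals(key)) {
      // A new key replaces the previous entry, so at most one value is held.
      cachedValue = loader.apply(key);
      cachedKey = key;
    }
    return cachedValue;
  }
}
```

With this shape, repeated calls with the same start time pay the loading cost once, and a different start time evicts the previous entry instead of growing a map.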



##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java:
##
@@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient
     return activeTimeline;
   }
 
+  /**
+   * Returns a Hudi timeline with commits after the given instant time (exclusive).
+   *
+   * @param metaClient                {@link HoodieTableMetaClient} instance.
+   * @param exclusiveStartInstantTime Start instant time (exclusive).
+   * @return Hudi timeline.
+   */
+  public static HoodieTimeline getCommitsTimelineAfter(
+      HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) {
+    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+    HoodieDefaultTimeline timeline =
+        activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)
+            ? metaClient.getArchivedTimeline(exclusiveStartInstantTime)
+            .mergeTimeline(activeTimeline)
+            : activeTimeline;
+    return timeline.getCommitsTimeline()
+        .findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE);
+  }

Review Comment:
   We need to scan all the instants since `exclusiveStartInstantTime` to figure 
out the touched partitions. It is possible that `exclusiveStartInstantTime` 
is before the start of the active timeline, in which case we still need to 
scan the archived timeline (see #6662 for details).  In most cases, 
`exclusiveStartInstantTime` is after the start of the active timeline, 
so the archived timeline is not loaded.
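
To make the selection rule concrete, here is a minimal, self-contained sketch that models instants as plain strings. It is a simplification and not Hudi's API: `TimelineSelectionSketch`, `instantsAfter`, and the `Supplier` standing in for the archived timeline are all hypothetical, and it assumes fixed-width instant times so that string order matches time order. Treating an empty active timeline as "do not load the archive" is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class TimelineSelectionSketch {
  /**
   * Returns the instants strictly after exclusiveStart. The archived instants
   * are only fetched (via archivedLoader) when exclusiveStart is earlier than
   * the first instant of the active timeline.
   */
  static List<String> instantsAfter(
      Supplier<List<String>> archivedLoader, List<String> active, String exclusiveStart) {
    boolean beforeActiveStart =
        !active.isEmpty() && exclusiveStart.compareTo(active.get(0)) < 0;
    List<String> candidates = new ArrayList<>();
    if (beforeActiveStart) {
      // Costly path: only taken when the start time predates the active timeline.
      candidates.addAll(archivedLoader.get());
    }
    candidates.addAll(active);
    return candidates.stream()
        .filter(t -> t.compareTo(exclusiveStart) > 0)
        .collect(Collectors.toList());
  }
}
```

Passing the archived instants as a supplier keeps the expensive load lazy, which mirrors the point of the comment: in the common case the archive is never touched.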






[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7448:
URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367493784

   
   ## CI report:
   
   * b79a063798079dfdb34d61dc57ec0341e93d7c57 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14041)
   * 9314399c3b40e65689ffeeade5be40ed289563f0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14042)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config

2022-12-29 Thread GitBox


hudi-bot commented on PR #7575:
URL: https://github.com/apache/hudi/pull/7575#issuecomment-1367491053

   
   ## CI report:
   
   * a35c9c05aec17c775e39c0472fbe952178b2f60e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14021) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14024) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14039)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7448:
URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367490841

   
   ## CI report:
   
   * eaed1745913960ef5e40a323eedaeaf96438c5fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13695)
   * b79a063798079dfdb34d61dc57ec0341e93d7c57 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14041)
   * 9314399c3b40e65689ffeeade5be40ed289563f0 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7448:
URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367454386

   
   ## CI report:
   
   * eaed1745913960ef5e40a323eedaeaf96438c5fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13695)
   * b79a063798079dfdb34d61dc57ec0341e93d7c57 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14041)
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7371: [HUDI-3673] Clean up hbase shading dependencies

2022-12-29 Thread GitBox


hudi-bot commented on PR #7371:
URL: https://github.com/apache/hudi/pull/7371#issuecomment-1367454213

   
   ## CI report:
   
   * 2f69501f430d9e536a78b65e38e91cc710c69832 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13432)
   * f3d658be1ab30458c286ace26ec67b4715e188fe Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14040)
   
   



[jira] [Updated] (HUDI-5160) Spark df saveAsTable failed with CTAS

2022-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5160:
-
Status: In Progress  (was: Open)

> Spark df saveAsTable failed with CTAS
> -
>
> Key: HUDI-5160
> URL: https://issues.apache.org/jira/browse/HUDI-5160
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: 董可伦
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
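> This worked in version 0.9.0 but now fails.
> {code:java}
> // Imports inferred from the identifiers used below.
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.hudi.keygen.SimpleKeyGenerator
> import org.apache.spark.sql.SaveMode
> import spark.implicits._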
> val partitionValue = "2022-11-05"
> val df = Seq((1, "a1", 10, 1000, partitionValue)).toDF("id", "name", "value", 
> "ts", "dt")
> val tableName = "test_hudi_table"
> // Write a table by spark dataframe.
> df.write.format("hudi")
> .option(HoodieWriteConfig.TBL_NAME.key, tableName)
> .option(TABLE_TYPE.key, MOR_TABLE_TYPE_OPT_VAL)
> // .option(HoodieTableConfig.TYPE.key(), MOR_TABLE_TYPE_OPT_VAL)
> .option(RECORDKEY_FIELD.key, "id")
> .option(PRECOMBINE_FIELD.key, "ts")
> .option(PARTITIONPATH_FIELD.key, "dt")
> .option(KEYGENERATOR_CLASS_NAME.key, classOf[SimpleKeyGenerator].getName)
> .option(HoodieWriteConfig.INSERT_PARALLELISM_VALUE.key, "1")
> .option(HoodieWriteConfig.UPSERT_PARALLELISM_VALUE.key, "1")
> .partitionBy("dt")
> .mode(SaveMode.Overwrite)
> .saveAsTable(tableName){code}
>  
> {code:java}
> Can't find primaryKey `uuid` in root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- id: integer (nullable = false)
>  |-- name: string (nullable = true)
>  |-- value: integer (nullable = false)
>  |-- ts: integer (nullable = false)
>  |-- dt: string (nullable = true)
> .
> java.lang.IllegalArgumentException: Can't find primaryKey `uuid` in root
>  |-- _hoodie_commit_time: string (nullable = true)
>  |-- _hoodie_commit_seqno: string (nullable = true)
>  |-- _hoodie_record_key: string (nullable = true)
>  |-- _hoodie_partition_path: string (nullable = true)
>  |-- _hoodie_file_name: string (nullable = true)
>  |-- id: integer (nullable = false)
>  |-- name: string (nullable = true)
>  |-- value: integer (nullable = false)
>  |-- ts: integer (nullable = false)
>  |-- dt: string (nullable = true)
> .
>     at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
>     at 
> org.apache.spark.sql.hudi.HoodieOptionConfig$$anonfun$validateTable$1.apply(HoodieOptionConfig.scala:201)
>     at 
> org.apache.spark.sql.hudi.HoodieOptionConfig$$anonfun$validateTable$1.apply(HoodieOptionConfig.scala:200)
>     at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>     at 
> org.apache.spark.sql.hudi.HoodieOptionConfig$.validateTable(HoodieOptionConfig.scala:200)
>     at 
> org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.parseSchemaAndConfigs(HoodieCatalogTable.scala:256)
>     at 
> org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable.initHoodieTable(HoodieCatalogTable.scala:171)
>     at 
> org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:99){code}





[jira] [Updated] (HUDI-5160) Spark df saveAsTable failed with CTAS

2022-12-29 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5160:
-
Status: Patch Available  (was: In Progress)



[GitHub] [hudi] hudi-bot commented on pull request #7448: [HUDI-5160] Fix data source write save as table

2022-12-29 Thread GitBox


hudi-bot commented on PR #7448:
URL: https://github.com/apache/hudi/pull/7448#issuecomment-1367450975

   
   ## CI report:
   
   * eaed1745913960ef5e40a323eedaeaf96438c5fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13695)
   * b79a063798079dfdb34d61dc57ec0341e93d7c57 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7371: [HUDI-3673] Clean up hbase shading dependencies

2022-12-29 Thread GitBox


hudi-bot commented on PR #7371:
URL: https://github.com/apache/hudi/pull/7371#issuecomment-1367450840

   
   ## CI report:
   
   * 2f69501f430d9e536a78b65e38e91cc710c69832 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13432)
   * f3d658be1ab30458c286ace26ec67b4715e188fe UNKNOWN
   
   



[GitHub] [hudi] xushiyan commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-12-29 Thread GitBox


xushiyan commented on PR #7139:
URL: https://github.com/apache/hudi/pull/7139#issuecomment-1367445735

   closing in favor of #7448 





[GitHub] [hudi] xushiyan closed pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-12-29 Thread GitBox


xushiyan closed pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS
URL: https://github.com/apache/hudi/pull/7139





[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367443562

   
   ## CI report:
   
   * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038)
   
   



[GitHub] [hudi] yihua commented on a diff in pull request #7578: [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support

2022-12-29 Thread GitBox


yihua commented on code in PR #7578:
URL: https://github.com/apache/hudi/pull/7578#discussion_r1059039497


##
website/releases/download.md:
##
@@ -7,14 +7,17 @@ last_modified_at: 2022-12-27T15:59:57-04:00
 ---
 
 ### Release 0.12.2
+* [Long Term Support](/releases/release-0.12.2#long-term-support): this is the latest stable release
 * Source Release : [Apache Hudi 0.12.2 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.2/hudi-0.12.2.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.sha512))

Review Comment:
   Yes.






[hudi] branch asf-site updated: [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support (#7578)

2022-12-29 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c18d9621a1c [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support (#7578)
c18d9621a1c is described below

commit c18d9621a1c375c39bd5aaeb57ca13635753e601
Author: Y Ethan Guo 
AuthorDate: Thu Dec 29 08:10:13 2022 -0800

[HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support (#7578)
---
 website/releases/download.md   | 3 +++
 website/releases/release-0.12.0.md | 5 +
 website/releases/release-0.12.1.md | 5 +
 website/releases/release-0.12.2.md | 5 +
 4 files changed, 18 insertions(+)

diff --git a/website/releases/download.md b/website/releases/download.md
index 609fdff5862..e7ceb1d5c56 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -7,14 +7,17 @@ last_modified_at: 2022-12-27T15:59:57-04:00
 ---
 
 ### Release 0.12.2
+* [Long Term Support](/releases/release-0.12.2#long-term-support): this is the latest stable release
 * Source Release : [Apache Hudi 0.12.2 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.2/hudi-0.12.2.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.2/hudi-0.12.2.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.12.2](/releases/release-0.12.2))
 
 ### Release 0.12.1
+* [Long Term Support](/releases/release-0.12.1#long-term-support): upgrade to [0.12.2](/releases/release-0.12.2) for the latest stable release
 * Source Release : [Apache Hudi 0.12.1 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.1/hudi-0.12.1.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.1/hudi-0.12.1.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.1/hudi-0.12.1.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.12.1](/releases/release-0.12.1))
 
 ### Release 0.12.0
+* [Long Term Support](/releases/release-0.12.0#long-term-support): upgrade to [0.12.2](/releases/release-0.12.2) for the latest stable release
 * Source Release : [Apache Hudi 0.12.0 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.12.0/hudi-0.12.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.12.0/hudi-0.12.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.12.0/hudi-0.12.0.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.12.0](/releases/release-0.12.0))
 
diff --git a/website/releases/release-0.12.0.md b/website/releases/release-0.12.0.md
index ba072cb0423..fe764b5dd8a 100644
--- a/website/releases/release-0.12.0.md
+++ b/website/releases/release-0.12.0.md
@@ -7,6 +7,11 @@ last_modified_at: 2022-08-17T10:30:00+05:30
 ---
 # [Release 0.12.0](https://github.com/apache/hudi/releases/tag/release-0.12.0) ([docs](/docs/quick-start-guide))
 
+## Long Term Support
+
+We aim to maintain 0.12 for a longer period of time and provide a stable release through the latest 0.12.x release for
+users to migrate to.  The latest 0.12 release is [0.12.2](/releases/release-0.12.2).
+
 ## Migration Guide
 
 In this release, there have been a few API and configuration updates listed below that warranted a new table version.
diff --git a/website/releases/release-0.12.1.md 
b/website/releases/release-0.12.1.md
index 709c8adbdcc..dbd98f98ed9 100644
--- a/website/releases/release-0.12.1.md
+++ b/website/releases/release-0.12.1.md
@@ -7,6 +7,11 @@ last_modified_at: 2022-08-17T10:30:00+05:30
 ---
 # [Release 0.12.1](https://github.com/apache/hudi/releases/tag/release-0.12.1) 
([docs](/docs/quick-start-guide))
 
+## Long Term Support
+
+We aim to maintain 0.12 for a longer period of time and provide a stable 
release through the latest 0.12.x release for
+users to migrate to.  The latest 0.12 release is 
[0.12.2](/releases/release-0.12.2).
+
 ## Migration Guide
 
 * This release (0.12.1) does not introduce any new table version, thus no 
migration is needed if you are on 0.12.0.
diff --git a/website/releases/release-0.12.2.md 
b/website/releases/release-0.12.2.md
index a40c1e032b8..3594206cda4 100644
--- a/website/releases/release-0.12.2.md
+++ b/website/releases/release-0.12.2.md
@@ -7,6 +7,11 @@ last_modified_at: 2022-12-27T10:30:00+05:30
 ---
 # [Release 0.12.2](https://github.com/apache/hudi/releases/tag/release-0.12.2) 
([docs](/docs/quick-start-guide))
 
+## Long Term Support
+
+We aim to maintain 0.12 for a longer period of time and provide a stable 
release through the latest 0.12.x release for
+users to migrate to.  This release (0.12.2) is the latest 0.12 release.
+
 ## Migration Guide
 
 * This release (0.12.2) does not introduce any new table version, thus no 
migration is needed if you are on 0.12.0.



[GitHub] [hudi] nsivabalan merged pull request #7578: [HUDI-5486][DOCS] Update 0.12.x release notes with Long Term Support

2022-12-29 Thread GitBox


nsivabalan merged PR #7578:
URL: https://github.com/apache/hudi/pull/7578





[GitHub] [hudi] hudi-bot commented on pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap

2022-12-29 Thread GitBox


hudi-bot commented on PR #7579:
URL: https://github.com/apache/hudi/pull/7579#issuecomment-1367390604

   
   ## CI report:
   
   * ba9aa020afa608a3b51d7085c48217d97bbc1881 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14032)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14037)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7575: [MINOR] Set engine when creating meta write config

2022-12-29 Thread GitBox


hudi-bot commented on PR #7575:
URL: https://github.com/apache/hudi/pull/7575#issuecomment-1367390548

   
   ## CI report:
   
   * a35c9c05aec17c775e39c0472fbe952178b2f60e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14021)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14024)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14039)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xccui commented on pull request #7575: [MINOR] Set engine when creating meta write config

2022-12-29 Thread GitBox


xccui commented on PR #7575:
URL: https://github.com/apache/hudi/pull/7575#issuecomment-1367387566

   @hudi-bot run azure





[GitHub] [hudi] minihippo commented on a diff in pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.

2022-12-29 Thread GitBox


minihippo commented on code in PR #7572:
URL: https://github.com/apache/hudi/pull/7572#discussion_r1058989874


##
hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java:
##
@@ -120,7 +118,7 @@ private boolean checkIfExceptionInRetryList(Exception e) {
 
 // if users didn't set hoodie.filesystem.operation.retry.exceptions
 // we will retry all the IOException and RuntimeException
-if (retryExceptionsClasses.isEmpty()) {
+if (retryExceptionsClasses.equals(RETRY_EXCEPTION_CLASS)) {
   return true;
 }

Review Comment:
   fix
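
The check in this hunk can be sketched in isolation. This is only a minimal illustration, assuming the retry list holds exception classes and defaults to `IOException` and `RuntimeException`, as the comment in the diff says. `RetryListSketch`, `DEFAULT_RETRY`, and `shouldRetry` are hypothetical names rather than Hudi's API, and the generic types are a guess because the quoted diff lost them.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual RetryHelper implementation.
final class RetryListSketch {
  // Assumed default: retry on IOException and RuntimeException.
  static final List<Class<? extends Exception>> DEFAULT_RETRY =
      Arrays.asList(IOException.class, RuntimeException.class);

  // Returns true when the exception is an instance of any class in the retry list.
  static boolean shouldRetry(Exception e, List<Class<? extends Exception>> retryClasses) {
    for (Class<? extends Exception> candidate : retryClasses) {
      if (candidate.isInstance(e)) {
        return true;
      }
    }
    return false;
  }
}
```

With an explicit default list, subclasses such as `FileNotFoundException` match through `isInstance`, while a plain checked `Exception` does not.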






[GitHub] [hudi] minihippo commented on a diff in pull request #7572: [HUDI-5483]Make retryhelper more suitable for common use.

2022-12-29 Thread GitBox


minihippo commented on code in PR #7572:
URL: https://github.com/apache/hudi/pull/7572#discussion_r1058989340


##
hudi-common/src/main/java/org/apache/hudi/common/util/RetryHelper.java:
##
@@ -36,9 +36,10 @@
  *
  * @param  Type of return value for checked function.
  */
-public class RetryHelper implements Serializable {
+public class RetryHelper implements Serializable {
   private static final Logger LOG = LogManager.getLogger(RetryHelper.class);
-  private transient CheckedFunction func;
+  private static final List> 
RETRY_EXCEPTION_CLASS = Arrays.asList(IOException.class, 
RuntimeException.class);

Review Comment:
   fix
   






[GitHub] [hudi] hudi-bot commented on pull request #6983: [HUDI-5031] Fix MERGE INTO creates empty partition files when source table has partitions but target table does not

2022-12-29 Thread GitBox


hudi-bot commented on PR #6983:
URL: https://github.com/apache/hudi/pull/6983#issuecomment-1367320447

   
   ## CI report:
   
   * d2f4ce7779a835a6f524aabd8fa16c7c5dcc8c6e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14035)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367283242

   
   ## CI report:
   
   * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14038)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xicm commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


xicm commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367282191

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2022-12-29 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1367275891

   
   ## CI report:
   
   * c20aa589730546c0c7bb82969c92aa6d364af101 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14020)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14033)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap

2022-12-29 Thread GitBox


hudi-bot commented on PR #7579:
URL: https://github.com/apache/hudi/pull/7579#issuecomment-1367272629

   
   ## CI report:
   
   * ba9aa020afa608a3b51d7085c48217d97bbc1881 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14032)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14037)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2022-12-29 Thread GitBox


hudi-bot commented on PR #7355:
URL: https://github.com/apache/hudi/pull/7355#issuecomment-1367272014

   
   ## CI report:
   
   * 33b17128e551a134dd8287d5f1a660f50b561848 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14034)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] SteNicholas commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-29 Thread GitBox


SteNicholas commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058910341


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -432,6 +433,11 @@ private Stream getCommitInstantsToArchive() 
{
   table.getActiveTimeline(), 
config.getInlineCompactDeltaCommitMax())
   : Option.empty();
 
+  // The clustering commit instant can not be archived unless we ensure 
that the replaced files have been cleaned,
+  // without the replaced files metadata on the timeline, the fs view 
would expose duplicates for readers.
+  Option oldestInstantToRetainForClustering =

Review Comment:
   @leesf, refer to the naming of `oldestInstantToRetainForCompaction`.






[GitHub] [hudi] leesf commented on a diff in pull request #7568: [HUDI-5341] CleanPlanner retains earliest commits must not be later than earliest pending commit

2022-12-29 Thread GitBox


leesf commented on code in PR #7568:
URL: https://github.com/apache/hudi/pull/7568#discussion_r1058904972


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java:
##
@@ -432,6 +433,11 @@ private Stream getCommitInstantsToArchive() 
{
   table.getActiveTimeline(), 
config.getInlineCompactDeltaCommitMax())
   : Option.empty();
 
+  // The clustering commit instant can not be archived unless we ensure 
that the replaced files have been cleaned,
+  // without the replaced files metadata on the timeline, the fs view 
would expose duplicates for readers.
+  Option oldestInstantToRetainForClustering =

Review Comment:
   This name is easily confused with 
`oldestPendingCompactionAndReplaceInstant` below.






[hudi] branch master updated: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive (#7385)

2022-12-29 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository.

forwardxu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 495b6fbb062 [HUDI-5332] HiveSyncTool can avoid initializing all 
permanent custom functions of Hive (#7385)
495b6fbb062 is described below

commit 495b6fbb062c843d19de420acfefd3a6a2ee3c58
Author: cxzl25 
AuthorDate: Thu Dec 29 19:17:43 2022 +0800

[HUDI-5332] HiveSyncTool can avoid initializing all permanent custom 
functions of Hive (#7385)
---
 .../main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java| 11 ++-
 .../java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java   | 11 ++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git 
a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
 
b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
index c14536a2774..fbba5861741 100644
--- 
a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
+++ 
b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HMSDDLExecutor.java
@@ -30,6 +30,7 @@ import 
org.apache.hudi.sync.common.model.PartitionValueExtractor;
 
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.common.StatsSetupConst;
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.TableType;
 import org.apache.hadoop.hive.metastore.api.Database;
@@ -48,6 +49,7 @@ import org.apache.log4j.Logger;
 import org.apache.parquet.schema.MessageType;
 import org.apache.thrift.TException;
 
+import java.lang.reflect.InvocationTargetException;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.LinkedHashMap;
@@ -78,7 +80,14 @@ public class HMSDDLExecutor implements DDLExecutor {
   public HMSDDLExecutor(HiveSyncConfig syncConfig) throws HiveException, 
MetaException {
 this.syncConfig = syncConfig;
 this.databaseName = syncConfig.getStringOrDefault(META_SYNC_DATABASE_NAME);
-this.client = Hive.get(syncConfig.getHiveConf()).getMSC();
+HiveConf hiveConf = syncConfig.getHiveConf();
+IMetaStoreClient tempMetaStoreClient;
+try {
+  tempMetaStoreClient = ((Hive) 
Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, 
hiveConf)).getMSC();
+} catch (NoSuchMethodException | IllegalAccessException | 
IllegalArgumentException | InvocationTargetException ex) {
+  tempMetaStoreClient = Hive.get(hiveConf).getMSC();
+}
+this.client = tempMetaStoreClient;
 try {
   this.partitionValueExtractor =
   (PartitionValueExtractor) 
Class.forName(syncConfig.getStringOrDefault(META_SYNC_PARTITION_EXTRACTOR_CLASS)).newInstance();
diff --git 
a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
 
b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
index 93ae3cfbf73..e0f7dab5f35 100644
--- 
a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
+++ 
b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/HiveQueryDDLExecutor.java
@@ -23,6 +23,7 @@ import org.apache.hudi.hive.HiveSyncConfig;
 import org.apache.hudi.hive.HoodieHiveSyncException;
 import org.apache.hudi.hive.util.HivePartitionUtil;
 
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.api.FieldSchema;
 import org.apache.hadoop.hive.metastore.api.MetaException;
@@ -37,6 +38,7 @@ import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 
 import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashMap;
@@ -59,7 +61,14 @@ public class HiveQueryDDLExecutor extends 
QueryBasedDDLExecutor {
 
   public HiveQueryDDLExecutor(HiveSyncConfig config) throws HiveException, 
MetaException {
 super(config);
-this.metaStoreClient = Hive.get(config.getHiveConf()).getMSC();
+HiveConf hiveConf = config.getHiveConf();
+IMetaStoreClient tempMetaStoreClient;
+try {
+  tempMetaStoreClient = ((Hive) 
Hive.class.getMethod("getWithoutRegisterFns", HiveConf.class).invoke(null, 
hiveConf)).getMSC();
+} catch (NoSuchMethodException | IllegalAccessException | 
IllegalArgumentException | InvocationTargetException ex) {
+  tempMetaStoreClient = Hive.get(hiveConf).getMSC();
+}
+this.metaStoreClient = tempMetaStoreClient;
 try {
   this.sessionState = new SessionState(config.getHiveConf(),
   UserGroupInformation.getCurrentUser().getShortUserName());
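
The pattern used in both constructors above (look up a newer static factory by name through reflection and fall back to the default one when it is missing) can be sketched without any Hive dependency. This is a hedged illustration only: `OldApi`, `NewApi`, `getLight`, and `create` are hypothetical stand-ins, not Hive or Hudi classes.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.function.Function;

// Hypothetical sketch of "try the newer static factory via reflection, else fall back".
final class ReflectiveFallbackSketch {

  // Stand-in for an older library version that only has the default factory.
  static final class OldApi {
    public static String get(String conf) {
      return "default:" + conf;
    }
  }

  // Stand-in for a newer version that also exposes a cheaper factory.
  static final class NewApi {
    public static String get(String conf) {
      return "default:" + conf;
    }

    public static String getLight(String conf) {
      return "light:" + conf;
    }
  }

  // Calls the public static method "getLight" on api if it exists, otherwise uses the fallback.
  static String create(Class<?> api, String conf, Function<String, String> fallback) {
    try {
      return (String) api.getMethod("getLight", String.class).invoke(null, conf);
    } catch (NoSuchMethodException | IllegalAccessException
        | IllegalArgumentException | InvocationTargetException ex) {
      // The optional method is missing or could not be called: use the default path.
      return fallback.apply(conf);
    }
  }

  public static void main(String[] args) {
    System.out.println(create(NewApi.class, "c", NewApi::get));
    System.out.println(create(OldApi.class, "c", OldApi::get));
  }
}
```

Resolving the method by name at runtime keeps the code compilable against library versions that lack it, at the cost of losing compile-time checking of the method signature.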



[GitHub] [hudi] XuQianJin-Stars merged pull request #7385: [HUDI-5332] HiveSyncTool can avoid initializing all permanent custom functions of Hive

2022-12-29 Thread GitBox


XuQianJin-Stars merged PR #7385:
URL: https://github.com/apache/hudi/pull/7385





[GitHub] [hudi] lokeshj1703 commented on issue #7363: [SUPPORT] how to get hudi table schema and get table list under the same database

2022-12-29 Thread GitBox


lokeshj1703 commented on issue #7363:
URL: https://github.com/apache/hudi/issues/7363#issuecomment-1367241201

   ```
 public static final ConfigProperty CREATE_SCHEMA = ConfigProperty
 .key("hoodie.table.create.schema")
 .noDefaultValue()
 .withDocumentation("Schema used when creating the table, for the first 
time.");
   
   ```
   This is the config value returned by function 
`hoodieTableMetaClient.getTableConfig().getTableCreateSchema()`. There is no 
default value for this config. It seems this would return a value only if 
configured.
   ```
   scala> import org.apache.hudi.common.table.TableSchemaResolver;
   import org.apache.hudi.common.table.TableSchemaResolver
   
   scala> var schemaResolver = new TableSchemaResolver(hoodieTableMetaClient);
   schemaResolver: org.apache.hudi.common.table.TableSchemaResolver = 
org.apache.hudi.common.table.TableSchemaResolver@3662dc9b
   
   scala> schemaResolver.getTableAvroSchema()
   res20: org.apache.avro.Schema = 
{"type":"record","name":"hudi_table_record","namespace":"hoodie.hudi_table","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"emp_id","type":["null","long"],"default":null},{"name":"employee_name","type":["null","string"],"default":null},{"name":"department","type":["null","string"],"default":null},{"name":"state","type":["null","string"],"default":null},{"name":"salary","type":["null","long"...
   scala> schemaResolver.getTableParquetSchema()
   res21: org.apache.parquet.schema.MessageType =
   message hoodie.hudi_table.hudi_table_record {
 optional binary _hoodie_commit_time (UTF8);
 optional binary _hoodie_commit_seqno (UTF8);
 optional binary _hoodie_record_key (UTF8);
 optional binary _hoodie_partition_path (UTF8);
 optional binary _hoodie_file_name (UTF8);
 optional int64 emp_id;
 optional binary employee_name (UTF8);
 optional binary department (UTF8);
 optional binary state (UTF8);
 optional int64 salary;
 optional int64 age;
 optional int64 bonus;
 optional int64 ts;
   }
   ```
   You can use the above snippet for fetching the table schema instead. 
   
   cc @xushiyan 





[GitHub] [hudi] perfectcw commented on issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time

2022-12-29 Thread GitBox


perfectcw commented on issue #7570:
URL: https://github.com/apache/hudi/issues/7570#issuecomment-1367235107

   > 
   
   Thanks!





[GitHub] [hudi] cxzl25 commented on pull request #7579: [HUDI-5487] Reduce duplicate Logs in ExternalSpillableMap

2022-12-29 Thread GitBox


cxzl25 commented on PR #7579:
URL: https://github.com/apache/hudi/pull/7579#issuecomment-1367234808

   @hudi-bot run azure





[GitHub] [hudi] lucasberlang closed issue #7223: [SUPPORT] Error to write .hoodie_partition_metadata in IBM Cloud Object Storage

2022-12-29 Thread GitBox


lucasberlang closed issue #7223: [SUPPORT] Error to write 
.hoodie_partition_metadata in IBM Cloud Object Storage
URL: https://github.com/apache/hudi/issues/7223





[GitHub] [hudi] lucasberlang commented on issue #7223: [SUPPORT] Error to write .hoodie_partition_metadata in IBM Cloud Object Storage

2022-12-29 Thread GitBox


lucasberlang commented on issue #7223:
URL: https://github.com/apache/hudi/issues/7223#issuecomment-1367233792

   Good news! It is working now.
   I finally fixed it by adding these properties to core-site.xml:
   
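   ```xml
   <configuration>
     <property>
       <name>fs.s3a.access.key</name>
       <value></value>
     </property>
     <property>
       <name>fs.s3a.secret.key</name>
       <value></value>
     </property>
     <property>
       <name>fs.s3a.awsAccessKeyId</name>
       <value></value>
     </property>
     <property>
       <name>fs.s3a.awsSecretAccessKey</name>
       <value></value>
     </property>
     <property>
       <name>fs.s3a.server-side-encryption.key</name>
       <value></value>
     </property>
   </configuration>
   ```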
   
   
   Thanks @yihua for the support!





[GitHub] [hudi] fengjian428 commented on issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time

2022-12-29 Thread GitBox


fengjian428 commented on issue #7570:
URL: https://github.com/apache/hudi/issues/7570#issuecomment-1367231286

   > > > Thanks for your reply. Could you explain the specific reason? Is 
it because some commits are archived and so cannot be synced to Hive?
   > > 
   > > 
   > > The sync logic is: check last_update_time in the Hive table properties, get 
all commits since that time, then update last_update_time. This does not work 
for multiple writers.
   > 
   > Does that mean that when 20221227042855832.commit goes to sync to Hive, if the 
last_update_time in the Hive table properties is 20221227042906103, then the commit 
20221227042855832 will not be synced to Hive?
   
   yes
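
The behavior confirmed above can be illustrated with a small simulation. This is only a sketch under assumptions: `lastSyncedTime` stands in for the `last_update_time` marker mentioned in the thread, instants are compared as plain strings, and `SyncWatermarkSketch` is a hypothetical class that uses no Hudi or Hive code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the described sync logic; not Hudi's actual code.
final class SyncWatermarkSketch {
  // Stand-in for the last-sync marker kept in the Hive table properties.
  private String lastSyncedTime = "";
  private final List<String> synced = new ArrayList<>();

  // Syncs every completed commit strictly newer than the marker, then advances the marker.
  void sync(List<String> completedCommits) {
    String newest = lastSyncedTime;
    for (String instant : completedCommits) {
      if (instant.compareTo(lastSyncedTime) > 0) {
        synced.add(instant);
        if (instant.compareTo(newest) > 0) {
          newest = instant;
        }
      }
    }
    lastSyncedTime = newest;
  }

  List<String> synced() {
    return synced;
  }

  // Two writers: the later instant completes and syncs before the earlier one.
  static List<String> demo() {
    SyncWatermarkSketch hive = new SyncWatermarkSketch();
    hive.sync(Arrays.asList("20221227042906103"));
    // The earlier instant completes afterwards, but it is older than the marker.
    hive.sync(Arrays.asList("20221227042855832", "20221227042906103"));
    return hive.synced();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this simulation only `20221227042906103` is ever synced, which matches the case described in the thread: a commit that completes after a newer instant has already advanced the marker is skipped.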





[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-29 Thread GitBox


hudi-bot commented on PR #4966:
URL: https://github.com/apache/hudi/pull/4966#issuecomment-1367230023

   
   ## CI report:
   
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




