[GitHub] [hudi] audas007 commented on issue #8017: [SUPPORT] Parquet file size is small after running deltastreamer in BULK_INSERT which results in large number of files under same partitioning

2023-04-13 Thread via GitHub


audas007 commented on issue #8017:
URL: https://github.com/apache/hudi/issues/8017#issuecomment-1508006549

   Was able to get this to work with the config 
   `hoodie.copyonwrite.record.size.estimate=150`, 
   per the suggestion here: 
https://github.com/apache/hudi/issues/1583#issuecomment-622894674
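
A rough way to see why this setting matters is to treat records per file as the target file size divided by the estimated record size. The sketch below only does that arithmetic. The 120 MB target and the 1024-byte estimate are assumed values for illustration, the class and method names are hypothetical, and Hudi's actual sizing logic is more involved than this.

```java
// Hypothetical helper, not part of Hudi: simplified file-sizing arithmetic.
final class RecordSizeEstimateSketch {

  // Approximate number of records that fit in one file of the target size.
  static long recordsPerFile(long targetFileSizeBytes, long recordSizeEstimateBytes) {
    if (recordSizeEstimateBytes <= 0) {
      throw new IllegalArgumentException("record size estimate must be positive");
    }
    return targetFileSizeBytes / recordSizeEstimateBytes;
  }

  public static void main(String[] args) {
    long targetFileSize = 120L * 1024 * 1024; // assumed 120 MB target file size
    // Assumed 1024-byte estimate versus the 150-byte value from the comment above.
    System.out.println(recordsPerFile(targetFileSize, 1024));
    System.out.println(recordsPerFile(targetFileSize, 150));
  }
}
```

Under these assumptions, a smaller estimate lets more records go into each file before it counts as full, which is consistent with getting fewer, larger files when the real records are small.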


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] ThinkerLei commented on issue #8425: [SUPPORT] When the downstream tasks read the logfile, use the startoffset of each logfile recorded in the deltacommit metadata to read the logfile

2023-04-13 Thread via GitHub


ThinkerLei commented on issue #8425:
URL: https://github.com/apache/hudi/issues/8425#issuecomment-1508002140

   @danny0405 thanks for your quick feedback. I'm going to make some 
modifications here.
   
   





[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507993349

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16340)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] bvaradar commented on pull request #7834: [HUDI-5690] Add simpleBucketPartitioner to support using the simple bucket index under bulkinsert

2023-04-13 Thread via GitHub


bvaradar commented on PR #7834:
URL: https://github.com/apache/hudi/pull/7834#issuecomment-1507984765

   Code changes look good. Will wait for tests to pass before merging
   





[GitHub] [hudi] LinMingQiang opened a new issue, #8459: [Discuss] Do we need to promote the bucket number as table config instead of a write config

2023-04-13 Thread via GitHub


LinMingQiang opened a new issue, #8459:
URL: https://github.com/apache/hudi/issues/8459

   **_Tips before filing an issue_**
   
   Users may sometimes modify the bucket number, and an inconsistent bucket 
number leads to data duplication and makes the table unavailable. So, should we 
promote the bucket number to a table config instead of a write config? That 
way, we could perform a configuration consistency check before starting the 
job.
   pr link:  https://github.com/apache/hudi/pull/8338
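
The consistency check proposed here could look roughly like the sketch below. It is not Hudi's implementation and is not taken from the linked PR. The class name and the idea of reading the bucket number from a persisted table config map are hypothetical. The key `hoodie.bucket.index.num.buckets` is used only as an illustrative key name.

```java
import java.util.Map;

// Hypothetical sketch: compare the bucket number stored with the table against the
// one supplied by the write job, before the job starts.
final class BucketNumConsistencyCheck {

  // Key name used for illustration only.
  static final String BUCKET_NUM_KEY = "hoodie.bucket.index.num.buckets";

  static void validate(Map<String, String> tableConfig, Map<String, String> writeConfig) {
    String persisted = tableConfig.get(BUCKET_NUM_KEY);
    String requested = writeConfig.get(BUCKET_NUM_KEY);
    // Nothing to compare on the first write, or when the job does not set the value.
    if (persisted == null || requested == null) {
      return;
    }
    if (Integer.parseInt(persisted) != Integer.parseInt(requested)) {
      throw new IllegalStateException(
          "Bucket number mismatch: table has " + persisted + ", job requested " + requested);
    }
  }

  public static void main(String[] args) {
    validate(Map.of(BUCKET_NUM_KEY, "4"), Map.of(BUCKET_NUM_KEY, "4")); // passes silently
    try {
      validate(Map.of(BUCKET_NUM_KEY, "4"), Map.of(BUCKET_NUM_KEY, "8"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing fast like this would stop a job with a changed bucket number before it writes any data, which is the behavior the question above is asking for.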
   
   





[hudi] branch asf-site updated: [DOCS][MINOR] Add new blogs (#8458)

2023-04-13 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c3fbf95bef9 [DOCS][MINOR] Add new blogs (#8458)
c3fbf95bef9 is described below

commit c3fbf95bef9986759e87edaf4161aa2efa815ced
Author: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
AuthorDate: Thu Apr 13 22:55:27 2023 -0700

[DOCS][MINOR] Add new blogs (#8458)
---
 ...ake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx |  1 -
 ...ur-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx | 13 +
 website/src/pages/videos.md |  2 --
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git 
a/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
 
b/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
index ccd88bf7c5e..8dfe96d34a4 100644
--- 
a/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
+++ 
b/website/blog/2023-03-16-Setting-Uber-Transactional-Data-Lake-in-Motion-with-Incremental-ETL-Using-Apache-Hudi.mdx
@@ -1,4 +1,3 @@
-
 ---
 title: "Setting Uber’s Transactional Data Lake in Motion with Incremental ETL 
Using Apache Hudi"
 authors: 
diff --git 
a/website/blog/2023-04-07-Speed-up-your-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx
 
b/website/blog/2023-04-07-Speed-up-your-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx
new file mode 100644
index 000..f2a9d3e2a25
--- /dev/null
+++ 
b/website/blog/2023-04-07-Speed-up-your-write-latencies-using-Bucket-Index-in-Apache-Hudi.mdx
@@ -0,0 +1,13 @@
+---
+title: "Speed up your write latencies using Bucket Index in Apache Hudi"
+authors: 
+- name: Sivabalan Narayanan
+category: blog
+tags:
+- how-to
+- indexing
+- hudi
+---
+import Redirect from '@site/src/components/Redirect';
+
+<Redirect url="https://medium.com/@simpsons/speed-up-your-write-latencies-using-bucket-index-in-apache-hudi-2f7c297493dc">Redirecting... please wait!! </Redirect>
diff --git a/website/src/pages/videos.md b/website/src/pages/videos.md
index a3ce0cce5a0..b5a6ab64e8f 100644
--- a/website/src/pages/videos.md
+++ b/website/src/pages/videos.md
@@ -160,5 +160,3 @@ last_modified_at: 2022-12-21T15:59:57-04:00
 
 58. [Data Analysis for Apache Hudi Blogs on Medium with 
Pandas](https://www.youtube.com/watch?v=a7FD4zIOwVg)- By Soumil Shah, Mar 24th 
2023
 
-58. [How to scrape all Blogs about a topic from medium like pro with 
Python](https://www.youtube.com/watch?v=-KUSaC_1X6M)- By Soumil Shah, Mar 24th 
2023
-



[GitHub] [hudi] nsivabalan merged pull request #8458: [DOCS][MINOR] Add new blogs

2023-04-13 Thread via GitHub


nsivabalan merged PR #8458:
URL: https://github.com/apache/hudi/pull/8458





[GitHub] [hudi] bhasudha opened a new pull request, #8458: [DOCS][MINOR] Add new blogs

2023-04-13 Thread via GitHub


bhasudha opened a new pull request, #8458:
URL: https://github.com/apache/hudi/pull/8458

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] SteNicholas commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-13 Thread via GitHub


SteNicholas commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507933748

   @Zouxxyy, did you take the compatibility of this change into consideration? 
With this change, the config value of 
`clustering.plan.strategy.small.file.limit` must be changed when upgrading to 
the latest version.
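
To make the compatibility concern concrete, here is a minimal sketch of the conversion a user would have to apply to an existing value. It assumes the old value was expressed in megabytes, which this comment does not state. The class name, the method name and the sample value 600 are hypothetical.

```java
// Hypothetical migration helper: assumes the old config value was in megabytes.
final class SmallFileLimitMigration {

  // Converts a megabyte value to bytes, failing loudly on overflow.
  static long mbToBytes(long limitInMb) {
    return Math.multiplyExact(limitInMb, 1024L * 1024L);
  }

  public static void main(String[] args) {
    long oldValueMb = 600L; // example value only
    System.out.println(mbToBytes(oldValueMb));
  }
}
```

Under that assumption, an old value left unconverted would be read as a limit roughly a million times smaller than intended.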





[GitHub] [hudi] codope commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-13 Thread via GitHub


codope commented on code in PR #7627:
URL: https://github.com/apache/hudi/pull/7627#discussion_r1166270869


##
hudi-common/src/main/avro/HoodieArchivedMetaEntry.avsc:
##
@@ -128,6 +128,11 @@
 "HoodieIndexCommitMetadata"
  ],
  "default": null
+  },
+  {
+ "name":"stateTransitionTime",
+ "type":["null","string"],
+ "default": null

Review Comment:
   +1 we should save it in the archived metadata. I can see other potential use 
cases when there can be holes in the timeline after we allow archival beyond 
savepoint. 






[GitHub] [hudi] nfarah86 opened a new pull request, #8457: demo change for compaction docs

2023-04-13 Thread via GitHub


nfarah86 opened a new pull request, #8457:
URL: https://github.com/apache/hudi/pull/8457

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   I made a change to the compaction doc header.
   
   https://user-images.githubusercontent.com/5392555/231940689-b397a167-73c0-48e1-8f69-16e71f081564.png
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] t-raghavan commented on issue #8016: Inline Clustering : Clustering failed to write to files

2023-04-13 Thread via GitHub


t-raghavan commented on issue #8016:
URL: https://github.com/apache/hudi/issues/8016#issuecomment-1507900897

   Thanks for the suggestion and it worked. 👍 





[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507885169

   
   ## CI report:
   
   * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16345)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507884806

   
   ## CI report:
   
   * 44487a13a5abd52affb6212f85482976f461790a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16336)
 
   * 054fbfeae4583b99bc4a6cd319be5fc9e4572214 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16344)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] xushiyan merged pull request #7993: [MINOR] Fix hard-coded storage level for indexing

2023-04-13 Thread via GitHub


xushiyan merged PR #7993:
URL: https://github.com/apache/hudi/pull/7993





[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507881515

   
   ## CI report:
   
   * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507881248

   
   ## CI report:
   
   * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335)
 
   * 44487a13a5abd52affb6212f85482976f461790a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16336)
 
   * 054fbfeae4583b99bc4a6cd319be5fc9e4572214 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8443: [HUDI-6068] Improve logic of getOldestInstantToRetainForClustering wh…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8443:
URL: https://github.com/apache/hudi/pull/8443#issuecomment-1507878055

   
   ## CI report:
   
   * 996608e7ab379d38fdd997b9532b8b90dcfe99ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16300)
 
   * 0dff5a604b6db928602ad4c242464a1bb52feb91 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16343)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ad1happy2go commented on issue #8416: [SUPPORT] data loss after createRdd method in HoodieSparkUtils.scala

2023-04-13 Thread via GitHub


ad1happy2go commented on issue #8416:
URL: https://github.com/apache/hudi/issues/8416#issuecomment-1507871268

   https://github.com/apache/hudi/pull/7334 fixed the issue in 0.13.0.





[GitHub] [hudi] ad1happy2go commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

2023-04-13 Thread via GitHub


ad1happy2go commented on issue #8436:
URL: https://github.com/apache/hudi/issues/8436#issuecomment-1507866544

   I see an issue related to the quotes in the spark-submit command. Try this:
   
   spark-submit --class org.apache.hudi.utilities.HoodieCleaner \
     /usr/lib/hudi/hudi-utilities-bundle.jar \
     --target-base-path s3://edi-dp-qa-datalake/DATA_PLATFORM/ods/ods_d_crm_crmd_customer_i_prod_r/hudiTable/.hoodie \
     --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
     --hoodie-conf hoodie.cleaner.commits.retained=10 \
     --hoodie-conf hoodie.cleaner.parallelism=200
   





[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

2023-04-13 Thread via GitHub


Zouxxyy commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507861640

   @danny0405 One problem is that there is only `clean.async.enabled` in Flink, 
but there is `hoodie.clean.automatic` in Spark to control whether to clean 
automatically. 
   Should we add a parameter, or use `clean.async.enabled` uniformly to control 
the clean behavior in Flink?





[jira] [Updated] (HUDI-6078) Clean is always triggered with flink

2023-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6078:
-
Labels: pull-request-available  (was: )

> Clean is always triggered with flink
> 
>
> Key: HUDI-6078
> URL: https://issues.apache.org/jira/browse/HUDI-6078
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] Zouxxyy opened a new pull request, #8456: [HUDI-6078] Make clean controlled by parameter in flink

2023-04-13 Thread via GitHub


Zouxxyy opened a new pull request, #8456:
URL: https://github.com/apache/hudi/pull/8456

   ### Change Logs
   
   Make clean controlled by parameters in flink
   
   ### Impact
   
   Make clean controlled by parameters in flink
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-6078) Clean is always triggered with flink

2023-04-13 Thread zouxxyy (Jira)
zouxxyy created HUDI-6078:
-

 Summary: Clean is always triggered with flink
 Key: HUDI-6078
 URL: https://issues.apache.org/jira/browse/HUDI-6078
 Project: Apache Hudi
  Issue Type: Bug
Reporter: zouxxyy








[GitHub] [hudi] hudi-bot commented on pull request #8450: [HUDI-6073] Table create schema should not include metadata fields

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8450:
URL: https://github.com/apache/hudi/pull/8450#issuecomment-1507853676

   
   ## CI report:
   
   * fd8641d8f762b47ad2a0721d406083deb7d23933 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16315)
 
   * b9e5ebaba08cf4176ef039298ee7719e50aba3d3 UNKNOWN
   * 5433223d35c2216ef5d58c2705466bb8a0550a1c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16341)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8443: [HUDI-6068] Improve logic of getOldestInstantToRetainForClustering wh…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8443:
URL: https://github.com/apache/hudi/pull/8443#issuecomment-1507853639

   
   ## CI report:
   
   * 996608e7ab379d38fdd997b9532b8b90dcfe99ce Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16300)
 
   * 0dff5a604b6db928602ad4c242464a1bb52feb91 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507853543

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * e3b8fc000ef3a7d57dd23aec1c9b37afaeb3c2dd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16319)
 
   * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16340)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507848469

   
   ## CI report:
   
   * f0215afb8f8298848391fa8168189832c614a667 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16342)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8450: [HUDI-6073] Table create schema should not include metadata fields

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8450:
URL: https://github.com/apache/hudi/pull/8450#issuecomment-1507848443

   
   ## CI report:
   
   * fd8641d8f762b47ad2a0721d406083deb7d23933 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16315)
 
   * b9e5ebaba08cf4176ef039298ee7719e50aba3d3 UNKNOWN
   * 5433223d35c2216ef5d58c2705466bb8a0550a1c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507848318

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * e3b8fc000ef3a7d57dd23aec1c9b37afaeb3c2dd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16319)
 
   * cb05421be9bb950f7dadfc6a5cdfa4c07e5de6a3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507843416

   
   ## CI report:
   
   * f0215afb8f8298848391fa8168189832c614a667 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8450: [HUDI-6073] Table create schema should not include metadata fields

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8450:
URL: https://github.com/apache/hudi/pull/8450#issuecomment-1507843227

   
   ## CI report:
   
   * fd8641d8f762b47ad2a0721d406083deb7d23933 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16315)
 
   * b9e5ebaba08cf4176ef039298ee7719e50aba3d3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] voonhous commented on a diff in pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-13 Thread via GitHub


voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1165173485


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##
@@ -283,34 +291,66 @@ public void write(ArrayData array, int ordinal) {
 }
   }
 
-  /**
-   * Timestamp of INT96 bytes, julianDay(4) + nanosOfDay(8). See
-   * 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
-   * TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the deprecated ConvertedType.
-   */
   private class Timestamp64Writer implements FieldWriter {
-private Timestamp64Writer() {
+
+private final int precision;
+private Timestamp64Writer(int precision) {
+  this.precision = precision;
 }
 
 @Override
 public void write(RowData row, int ordinal) {
-  recordConsumer.addLong(timestampToInt64(row.getTimestamp(ordinal, 3)));
+  TimestampData timestampData = row.getTimestamp(ordinal, precision);
+  recordConsumer.addLong(timestampToInt64(timestampData, precision));
 }
 
 @Override
 public void write(ArrayData array, int ordinal) {
-  recordConsumer.addLong(timestampToInt64(array.getTimestamp(ordinal, 3)));
+  TimestampData timestampData = array.getTimestamp(ordinal, precision);
+  recordConsumer.addLong(timestampToInt64(timestampData, precision));
 }
   }
 
-  private long timestampToInt64(TimestampData timestampData) {
-return utcTimestamp ? timestampData.getMillisecond() : 
timestampData.toTimestamp().getTime();
+  /**
+   * Converts a {@code TimestampData} to its corresponding int64 value. This 
function only accepts TimestampData of
+   * precision 3 or 6. Special attention will need to be given to a 
TimestampData of precision = 6.
+   * 
+   * For example representing `1970-01-01T00:00:03.11` of precision 6 will 
have:
+   * 
+   *   millisecond = 3100
+   *   nanoOfMillisecond = 1000
+   * 
+   * As such, the int64 value will be:
+   * 
+   * millisecond * 1000 + nanoOfMillisecond / 1000
+   *
+   * @param timestampData TimestampData to be converted to int64 format
+   * @param precision the precision of the TimestampData
+   * @return int64 value of the TimestampData
+   */
+  private long timestampToInt64(TimestampData timestampData, int precision) {
+if (!utcTimestamp) {
+  // toTimestamp is agnostic of precision
+  return timestampData.toTimestamp().getTime();

Review Comment:
   Fixed
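   
   For reference, the conversion described in the Javadoc above can be sketched 
as follows. This is only a minimal illustration, not the code in 
`ParquetRowDataWriter`: the class name `TimestampMicrosSketch` and the helper 
`toInt64` are hypothetical, the two components are passed in as plain numbers 
instead of a `TimestampData`, and returning the millisecond value unchanged for 
precision 3 is an assumption based on the removed line of the diff.

```java
public class TimestampMicrosSketch {

  /**
   * Hypothetical helper: combines the millisecond and nano-of-millisecond
   * components of a timestamp into a single int64 value.
   */
  static long toInt64(long millisecond, int nanoOfMillisecond, int precision) {
    if (precision == 3) {
      // Assumption: millisecond precision is written as epoch milliseconds.
      return millisecond;
    }
    if (precision == 6) {
      // Formula quoted in the Javadoc: millisecond * 1000 + nanoOfMillisecond / 1000
      return millisecond * 1000 + nanoOfMillisecond / 1000;
    }
    throw new IllegalArgumentException("Unsupported precision: " + precision);
  }

  public static void main(String[] args) {
    // Values taken from the Javadoc example: millisecond = 3100, nanoOfMillisecond = 1000
    System.out.println(toInt64(3100L, 1000, 6)); // prints 3100001
    System.out.println(toInt64(3100L, 1000, 3)); // prints 3100
  }
}
```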






[GitHub] [hudi] waitingF commented on a diff in pull request #8376: [HUDI-6019] support config minPartitions when reading from kafka

2023-04-13 Thread via GitHub


waitingF commented on code in PR #8376:
URL: https://github.com/apache/hudi/pull/8376#discussion_r1166212077


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java:
##
@@ -148,9 +166,58 @@ public static OffsetRange[] 
computeOffsetRanges(Map fromOf
 }

Review Comment:
   @bvaradar  I think the algorithm would not work well when the data is 
skewed, because it does not divide a partition evenly. For example, given 
topic partitions "0:0->100, 1:0->500" and minPartitions=3, the algorithm will 
generate the ranges "0:0->100, 1:0->200, 1:200->300"; the 2 ranges of 
partition 1 are not divided evenly. With more skewed partitions, it gets worse.
   With resplit, a skewed TopicPartition gets even ranges, because resplit 
first allocates ranges to the topic partitions and then splits the allocated 
ranges into roughly minPartitions ranges.
   Given this, and since the complexity of the resplit should be very small, I 
think resplit is the better option.
   What do you think?
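   
   To make the resplit idea concrete, here is a minimal sketch under stated 
assumptions. It is not the code in this PR: `ResplitSketch`, `Range` and 
`resplit` are hypothetical names, and the rule used to decide how many pieces 
each partition gets (its share of the total size times minPartitions, rounded, 
at least 1) is my own assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResplitSketch {

  /** Hypothetical stand-in for the offset range of one topic partition. */
  static final class Range {
    final int partition;
    final long from;
    final long until;

    Range(int partition, long from, long until) {
      this.partition = partition;
      this.from = from;
      this.until = until;
    }

    long size() {
      return until - from;
    }

    @Override
    public String toString() {
      return partition + ":" + from + "->" + until;
    }
  }

  /** Splits each allocated range into even pieces, proportionally to its size. */
  static List<Range> resplit(List<Range> allocated, int minPartitions) {
    long total = 0;
    for (Range r : allocated) {
      total += r.size();
    }
    List<Range> result = new ArrayList<>();
    for (Range r : allocated) {
      long size = r.size();
      if (size == 0) {
        result.add(r); // nothing to split
        continue;
      }
      // Integer "round half up" of size / total * minPartitions, at least 1.
      int parts = (int) Math.max(1, (size * minPartitions + total / 2) / total);
      for (int i = 0; i < parts; i++) {
        long start = r.from + size * i / parts;
        long end = r.from + size * (i + 1) / parts;
        result.add(new Range(r.partition, start, end));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Range> allocated = Arrays.asList(new Range(0, 0, 100), new Range(1, 0, 500));
    // prints [0:0->100, 1:0->166, 1:166->333, 1:333->500]
    System.out.println(resplit(allocated, 3));
  }
}
```

   With the example above, partition 1 is cut into three pieces of nearly equal 
size, and the total number of ranges (4) is only roughly minPartitions.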
   






[jira] [Updated] (HUDI-6077) Add more partition push down filters

2023-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6077:
-
Labels: pull-request-available  (was: )

> Add more partition push down filters
> 
>
> Key: HUDI-6077
> URL: https://issues.apache.org/jira/browse/HUDI-6077
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Hui An
>Priority: Major
>  Labels: pull-request-available
>
> 1. Implement some basic `Expression`s for HUDI
> 2. Try to convert all Spark `Expression`s to HUDI `Expression`s
> 3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI 
> `Expression`s
> 4. Currently, we only support pushing down `EqualTo` filters on the first 
> level of partitions (by path prefix). This PR tries to push down more complex 
> partition filters (like `And`, `Or`, `EqualTo`, etc.) when fetching partitions. 
> While listing partition paths in parallel, it uses `PartialBindVisitor` to 
> bind the partition columns that have been listed so far, and changes the 
> unresolved references to `AlwaysTrue`.
> e.g.
> {code:java}
> Given that the table has 3 partition levels: year, month, day, and the 
> existing table partition paths are:
> year=2023/month=2/day=11
> year=2023/month=2/day=12
> year=2024/month=2/day=12
> If we want to push down the filter `year=2023 AND day=12`, then when listing 
> the first partition level `year`, we bind the schema `year` with 
> `PartialBindVisitor`.
> Since `day` is not provided, the filter is rewritten to `year=2023 AND 
> TRUE` (optimized to `year=2023`), so the first 2 paths are selected.
> Listing then continues in parallel under the first 2 paths; since `day` is 
> still not provided, these 2 paths remain selected.
> Finally, when listing the last partition level, the full filter `year=2023 AND 
> day=12` is applied and returns `year=2023/month=2/day=12`.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] boneanxs commented on pull request #8452: [WIP] [HUDI-6077] Add more partition push down filters

2023-04-13 Thread via GitHub


boneanxs commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1507833845

   > @boneanxs Can you please create a JIRA and add more details to it? Don't 
we already push down `EqualTo`?
   
   Thanks @codope, updated the description. Currently we only push down 
`EqualTo` if the filter is on the first partition level of the table. This PR 
1) supports filters on all partition levels, and 2) tries to push down more 
filters when listing partitions.
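   
   As a rough illustration of the partial binding described in HUDI-6077, here 
is a self-contained sketch. It does not use the classes from this PR: 
`PartialBindSketch`, `Expr`, `partialBind` and `select` are hypothetical names, 
only `EqualTo`, `And` and `AlwaysTrue` are modelled, and partition values are 
treated as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartialBindSketch {

  abstract static class Expr {
    /** Replaces references to columns that are not known yet with AlwaysTrue. */
    abstract Expr partialBind(Set<String> known);

    abstract boolean eval(Map<String, String> values);
  }

  static final class AlwaysTrue extends Expr {
    Expr partialBind(Set<String> known) { return this; }
    boolean eval(Map<String, String> values) { return true; }
  }

  static final class EqualTo extends Expr {
    final String column;
    final String literal;

    EqualTo(String column, String literal) {
      this.column = column;
      this.literal = literal;
    }

    Expr partialBind(Set<String> known) {
      return known.contains(column) ? this : new AlwaysTrue();
    }

    boolean eval(Map<String, String> values) {
      return literal.equals(values.get(column));
    }
  }

  static final class And extends Expr {
    final Expr left;
    final Expr right;

    And(Expr left, Expr right) {
      this.left = left;
      this.right = right;
    }

    Expr partialBind(Set<String> known) {
      Expr l = left.partialBind(known);
      Expr r = right.partialBind(known);
      // `x AND TRUE` is optimized to `x`
      if (l instanceof AlwaysTrue) return r;
      if (r instanceof AlwaysTrue) return l;
      return new And(l, r);
    }

    boolean eval(Map<String, String> values) {
      return left.eval(values) && right.eval(values);
    }
  }

  /** Parses a (possibly partial) path such as "year=2023/month=2" into column values. */
  static Map<String, String> parse(String path) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      String[] kv = segment.split("=", 2);
      values.put(kv[0], kv[1]);
    }
    return values;
  }

  /** Keeps the paths that may still match the filter at the current listing level. */
  static List<String> select(List<String> paths, Expr filter) {
    List<String> selected = new ArrayList<>();
    for (String path : paths) {
      Map<String, String> values = parse(path);
      if (filter.partialBind(values.keySet()).eval(values)) {
        selected.add(path);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Expr filter = new And(new EqualTo("year", "2023"), new EqualTo("day", "12"));
    // First level: only `year` is known, so `day=12` becomes AlwaysTrue.
    System.out.println(select(Arrays.asList("year=2023", "year=2024"), filter));
    // Last level: the full filter applies.
    System.out.println(select(Arrays.asList(
        "year=2023/month=2/day=11", "year=2023/month=2/day=12"), filter));
  }
}
```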





[jira] [Created] (HUDI-6077) Add more partition push down filters

2023-04-13 Thread Hui An (Jira)
Hui An created HUDI-6077:


 Summary: Add more partition push down filters
 Key: HUDI-6077
 URL: https://issues.apache.org/jira/browse/HUDI-6077
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Hui An


1. Implement some basic `Expression`s for HUDI
2. Try to convert all Spark `Expression`s to HUDI `Expression`s
3. Implement `PartialBindVisitor` and `BindVisitor` to bind values to HUDI 
`Expression`s
4. Currently, we only support pushing down `EqualTo` filters on the first 
level of partitions (by path prefix). This PR tries to push down more complex 
partition filters (like `And`, `Or`, `EqualTo`, etc.) when fetching partitions. 
While listing partition paths in parallel, it uses `PartialBindVisitor` to 
bind the partition columns that have been listed so far, and changes the 
unresolved references to `AlwaysTrue`.
e.g.


{code:java}
Given that the table has 3 partition levels: year, month, day, and the 
existing table partition paths are:
year=2023/month=2/day=11
year=2023/month=2/day=12
year=2024/month=2/day=12
If we want to push down the filter `year=2023 AND day=12`, then when listing 
the first partition level `year`, we bind the schema `year` with 
`PartialBindVisitor`.
Since `day` is not provided, the filter is rewritten to `year=2023 AND 
TRUE` (optimized to `year=2023`), so the first 2 paths are selected.
Listing then continues in parallel under the first 2 paths; since `day` is 
still not provided, these 2 paths remain selected.
Finally, when listing the last partition level, the full filter `year=2023 AND 
day=12` is applied and returns `year=2023/month=2/day=12`.
{code}





[GitHub] [hudi] Zouxxyy commented on pull request #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-13 Thread via GitHub


Zouxxyy commented on PR #8455:
URL: https://github.com/apache/hudi/pull/8455#issuecomment-1507819932

   @danny0405  Can you help with a review?





[hudi] branch master updated (d0a13e64c8c -> 46c9bc1791b)

2023-04-13 Thread biyan
This is an automated email from the ASF dual-hosted git repository.

biyan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from d0a13e64c8c [HUDI-6074] Check inlineClusteringEnabled in 
isAsyncClusteringEnabled (#8453)
 add 46c9bc1791b [HUDI-6000] Fix RunClusteringProcedure when no partition 
matched (#8318)

No new revisions were added by this update.

Summary of changes:
 .../sql/hudi/command/procedures/RunClusteringProcedure.scala  | 11 +++
 .../spark/sql/hudi/procedure/TestClusteringProcedure.scala|  9 -
 2 files changed, 15 insertions(+), 5 deletions(-)



[GitHub] [hudi] YannByron merged pull request #8318: [HUDI-6000] Fix RunClusteringProcedure when no partition matched

2023-04-13 Thread via GitHub


YannByron merged PR #8318:
URL: https://github.com/apache/hudi/pull/8318





[jira] [Updated] (HUDI-6074) check inlineClusteringEnabled in isAsyncClusteringEnabled

2023-04-13 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6074:
--
Fix Version/s: 0.14.0

> check inlineClusteringEnabled in isAsyncClusteringEnabled
> -
>
> Key: HUDI-6074
> URL: https://issues.apache.org/jira/browse/HUDI-6074
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[jira] [Closed] (HUDI-6074) check inlineClusteringEnabled in isAsyncClusteringEnabled

2023-04-13 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-6074.
-
Resolution: Fixed

> check inlineClusteringEnabled in isAsyncClusteringEnabled
> -
>
> Key: HUDI-6074
> URL: https://issues.apache.org/jira/browse/HUDI-6074
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
>






[hudi] branch master updated: [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled (#8453)

2023-04-13 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d0a13e64c8c [HUDI-6074] Check inlineClusteringEnabled in 
isAsyncClusteringEnabled (#8453)
d0a13e64c8c is described below

commit d0a13e64c8c755e28c2c0920d246f711b0663bc1
Author: Zouxxyy 
AuthorDate: Fri Apr 14 09:50:36 2023 +0800

[HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled 
(#8453)
---
 .../main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala| 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index 1f9e218572e..d338f74bc5a 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -1015,18 +1015,16 @@ object HoodieSparkSqlWriter {
tableConfig: HoodieTableConfig,
parameters: Map[String, String], 
configuration: Configuration): Boolean = {
 log.info(s"Config.inlineCompactionEnabled ? 
${client.getConfig.inlineCompactionEnabled}")
-if (asyncCompactionTriggerFnDefined && 
!client.getConfig.inlineCompactionEnabled
-  && parameters.get(ASYNC_COMPACT_ENABLE.key).exists(r => r.toBoolean)) {
-  tableConfig.getTableType == HoodieTableType.MERGE_ON_READ
-} else {
-  false
-}
+(asyncCompactionTriggerFnDefined && 
!client.getConfig.inlineCompactionEnabled
+  && parameters.get(ASYNC_COMPACT_ENABLE.key).exists(r => r.toBoolean)
+  && tableConfig.getTableType == HoodieTableType.MERGE_ON_READ)
   }
 
   private def isAsyncClusteringEnabled(client: SparkRDDWriteClient[_],
parameters: Map[String, String]): 
Boolean = {
 log.info(s"Config.asyncClusteringEnabled ? 
${client.getConfig.isAsyncClusteringEnabled}")
-asyncClusteringTriggerFnDefined && 
client.getConfig.isAsyncClusteringEnabled
+(asyncClusteringTriggerFnDefined && 
!client.getConfig.inlineClusteringEnabled
+  && client.getConfig.isAsyncClusteringEnabled)
   }
 
   /**



[GitHub] [hudi] codope merged pull request #8453: [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled

2023-04-13 Thread via GitHub


codope merged PR #8453:
URL: https://github.com/apache/hudi/pull/8453





[jira] [Updated] (HUDI-6048) Exceptions should not be thrown when querying partitions that do not exist

2023-04-13 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-6048:
--
Fix Version/s: 0.14.0

> Exceptions should not be thrown when querying partitions that do not exist
> --
>
> Key: HUDI-6048
> URL: https://issues.apache.org/jira/browse/HUDI-6048
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[jira] [Closed] (HUDI-6048) Exceptions should not be thrown when querying partitions that do not exist

2023-04-13 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-6048.
-
Resolution: Fixed

> Exceptions should not be thrown when querying partitions that do not exist
> --
>
> Key: HUDI-6048
> URL: https://issues.apache.org/jira/browse/HUDI-6048
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HUDI-6076) clustering.plan.strategy.small.file.limit's unit should is byte

2023-04-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6076:
-
Labels: pull-request-available  (was: )

> clustering.plan.strategy.small.file.limit's unit should is byte
> ---
>
> Key: HUDI-6076
> URL: https://issues.apache.org/jira/browse/HUDI-6076
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: zouxxyy
>Priority: Major
>  Labels: pull-request-available
>






[GitHub] [hudi] Zouxxyy opened a new pull request, #8455: [HUDI-6076] Change clustering.plan.strategy.small.file.limit's unit to byte

2023-04-13 Thread via GitHub


Zouxxyy opened a new pull request, #8455:
URL: https://github.com/apache/hudi/pull/8455

   ### Change Logs
   
   The unit of `clustering.plan.strategy.target.file.max.bytes` is bytes, so 
`clustering.plan.strategy.small.file.limit` should use the same unit. The two 
values are also compared with each other in the code even though they do not 
share a unit, like this:
   
   ```java
   this.conf.setLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT.key(),
       this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES)
           > this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT)
           ? this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_SMALL_FILE_LIMIT)
           : this.conf.getLong(FlinkOptions.CLUSTERING_PLAN_STRATEGY_TARGET_FILE_MAX_BYTES));
   ```
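   
   To show why the mixed units matter, here is a small standalone sketch. It 
does not touch `FlinkOptions`: `SmallFileLimitSketch` and `capSmallFileLimit` 
are hypothetical names, the numbers are made up, and the small file limit 
being expressed in MB before this change is an assumption on my part.

```java
public class SmallFileLimitSketch {

  static final long MIB = 1024L * 1024L;

  /** Caps the small-file limit at the target max file size; both arguments must be in bytes. */
  static long capSmallFileLimit(long targetFileMaxBytes, long smallFileLimitBytes) {
    return Math.min(targetFileMaxBytes, smallFileLimitBytes);
  }

  public static void main(String[] args) {
    long targetMaxBytes = 1024 * MIB; // hypothetical 1 GiB target, in bytes
    long smallLimitMb = 2000;         // hypothetical limit, assumed to be expressed in MB

    // Mixed units: 2000 (MB) looks smaller than 1073741824 (bytes), so no cap is applied.
    System.out.println(capSmallFileLimit(targetMaxBytes, smallLimitMb));       // prints 2000
    // Same unit: 2000 MiB exceeds the 1 GiB target, so the limit is capped.
    System.out.println(capSmallFileLimit(targetMaxBytes, smallLimitMb * MIB)); // prints 1073741824
  }
}
```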
   
   ### Impact
   
   Change clustering.plan.strategy.small.file.limit's unit to byte
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[hudi] branch master updated: [HUDI-6048] Check if partition exists before list partition by path prefix (#8402)

2023-04-13 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 657b837aaa6 [HUDI-6048] Check if partition exists before list 
partition by path prefix (#8402)
657b837aaa6 is described below

commit 657b837aaa6fa825945625579c52ff7365b1ecfd
Author: Zouxxyy 
AuthorDate: Fri Apr 14 09:48:48 2023 +0800

[HUDI-6048] Check if partition exists before list partition by path prefix 
(#8402)
---
 .../src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala | 4 +++-
 .../src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala   | 7 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala
index a9a20057795..6459c967c56 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala
@@ -300,7 +300,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
   // prefix to try to reduce the scope of the required file-listing
   val relativePartitionPathPrefix = 
composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
 
-  if (staticPartitionColumnNameValuePairs.length == 
partitionColumnNames.length) {
+  if (!metaClient.getFs.exists(new Path(getBasePath, 
relativePartitionPathPrefix))) {
+Seq()
+  } else if (staticPartitionColumnNameValuePairs.length == 
partitionColumnNames.length) {
 // In case composed partition path is complete, we can return it 
directly avoiding extra listing operation
 Seq(new PartitionPath(relativePartitionPathPrefix, 
staticPartitionColumnNameValuePairs.map(_._2._2.asInstanceOf[AnyRef]).toArray))
   } else {
diff --git 
a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala
 
b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala
index e69819fb6f4..ed73940186d 100644
--- 
a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala
+++ 
b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala
@@ -519,7 +519,12 @@ class TestHoodieFileIndex extends 
HoodieSparkClientTestBase with ScalaAssertionS
 EqualTo(attribute("region_code"), literal("1"))),
 "dt = '2023/01/01' and region_code = '1'",
 enablePartitionPathPrefixAnalysis,
-Seq(("1", "2023/01/01")))
+Seq(("1", "2023/01/01"))),
+  // no partition matched
+  (Seq(EqualTo(attribute("region_code"), literal("0"))),
+"region_code = '0'",
+enablePartitionPathPrefixAnalysis,
+Seq())
 )
 
 testCases.foreach(testCase => {



[GitHub] [hudi] codope merged pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix

2023-04-13 Thread via GitHub


codope merged PR #8402:
URL: https://github.com/apache/hudi/pull/8402





[GitHub] [hudi] nfarah86 closed pull request #8454: change for demo

2023-04-13 Thread via GitHub


nfarah86 closed pull request #8454: change for demo
URL: https://github.com/apache/hudi/pull/8454





[jira] [Created] (HUDI-6076) clustering.plan.strategy.small.file.limit's unit should is byte

2023-04-13 Thread zouxxyy (Jira)
zouxxyy created HUDI-6076:
-

 Summary: clustering.plan.strategy.small.file.limit's unit should 
is byte
 Key: HUDI-6076
 URL: https://issues.apache.org/jira/browse/HUDI-6076
 Project: Apache Hudi
  Issue Type: Bug
Reporter: zouxxyy








[GitHub] [hudi] hudi-bot commented on pull request #8454: change for demo

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8454:
URL: https://github.com/apache/hudi/pull/8454#issuecomment-1507810698

   
   ## CI report:
   
   * 9b3e77b8a3f70da310c72fba7177932bef4bb548 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16337)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507810397

   
   ## CI report:
   
   * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335)
 
   * 44487a13a5abd52affb6212f85482976f461790a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16336)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8454: change for demo

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8454:
URL: https://github.com/apache/hudi/pull/8454#issuecomment-1507807085

   
   ## CI report:
   
   * 9b3e77b8a3f70da310c72fba7177932bef4bb548 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507802385

   
   ## CI report:
   
   * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
   * 4adec3849535bce65c8d1a3d1909679a94da4d44 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16334)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507802024

   
   ## CI report:
   
   * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335)
 
   * 44487a13a5abd52affb6212f85482976f461790a UNKNOWN
   
   



[GitHub] [hudi] bvaradar commented on pull request #8353: [MINOR] Remove unused code

2023-04-13 Thread via GitHub


bvaradar commented on PR #8353:
URL: https://github.com/apache/hudi/pull/8353#issuecomment-1507795687

   @huangxiaopingRD : Can you merge all the refactoring code into a single PR? 
That makes it easier to review and land. 
   
   Thanks,
   Balaji.V





[GitHub] [hudi] nfarah86 opened a new pull request, #8454: change for demo

2023-04-13 Thread via GitHub


nfarah86 opened a new pull request, #8454:
URL: https://github.com/apache/hudi/pull/8454

   ### Change Logs
   
   THIS IS A DEMO 
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507771015

   
   ## CI report:
   
   * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325)
 
   * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335)
 
   * 44487a13a5abd52affb6212f85482976f461790a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507767580

   
   ## CI report:
   
   * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325)
 
   * a7b0e52741609f58fa47adabbcc34387e5f1b678 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16335)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507763904

   
   ## CI report:
   
   * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325)
 
   * a7b0e52741609f58fa47adabbcc34387e5f1b678 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8440: [DO NOT MERGE] run gh actions with java 17

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8440:
URL: https://github.com/apache/hudi/pull/8440#issuecomment-1507730119

   
   ## CI report:
   
   * 6b7a1a58b0df09c8e262cda3fc3087e08cbf3905 UNKNOWN
   * 7a3e534cf0abd559f9eec2640dcc7e18b9215288 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16331)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507726163

   
   ## CI report:
   
   * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
   * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16333)
 
   * 4adec3849535bce65c8d1a3d1909679a94da4d44 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16334)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8434:
URL: https://github.com/apache/hudi/pull/8434#issuecomment-1507726100

   
   ## CI report:
   
   * 4197e6d1216ee9bc269fb5368ab6c919c55ec146 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16329)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8410: [HUDI-6050] Fix write helper deduplicate records lost origin data operation

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8410:
URL: https://github.com/apache/hudi/pull/8410#issuecomment-1507726024

   
   ## CI report:
   
   * 98195cd8fa7423d6756c845f6afcc0d6d2fb2eea Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16328)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507695160

   
   ## CI report:
   
   * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332)
 
   * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
   * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16333)
 
   * 4adec3849535bce65c8d1a3d1909679a94da4d44 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8449: [HUDI-6071] If the Flink Hive Catalog is used and the table type is B…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8449:
URL: https://github.com/apache/hudi/pull/8449#issuecomment-1507685663

   
   ## CI report:
   
   * c8acee7666a4cabce9b9eb76b1da71b1f6826bf9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16326)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507645429

   
   ## CI report:
   
   * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332)
 
   * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
   * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16333)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507639266

   
   ## CI report:
   
   * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332)
 
   * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
   * d69ef5d7b5ce3d2257a0fc4e5d6961690cf10ae2 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507630741

   
   ## CI report:
   
   * b485d5800a466164d1619e2d7696c9116bd7d123 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16297)
 
   * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332)
 
   * 88ef51190b31feff3db942d9da947196f3ea9172 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8329:
URL: https://github.com/apache/hudi/pull/8329#issuecomment-1507630311

   
   ## CI report:
   
   * 9d2145a0062e913ae8ad7008103a808333929fd8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16325)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507574064

   
   ## CI report:
   
   * b485d5800a466164d1619e2d7696c9116bd7d123 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16297)
 
   * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16332)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8303:
URL: https://github.com/apache/hudi/pull/8303#issuecomment-1507573458

   
   ## CI report:
   
   * 9cda89b23cbf8514e1c2e0049eea4624f3b49f10 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16324)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8439: [DO NOT MERGE] run tests with java 11

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8439:
URL: https://github.com/apache/hudi/pull/8439#issuecomment-1507521807

   
   ## CI report:
   
   * b485d5800a466164d1619e2d7696c9116bd7d123 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16297)
 
   * 8c7cfc0fb2571cb55c93434fe79a71dbdf20e35a UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8453: [HUDI-6074] Check inlineClusteringEnabled in isAsyncClusteringEnabled

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8453:
URL: https://github.com/apache/hudi/pull/8453#issuecomment-1507504914

   
   ## CI report:
   
   * fbfacaab486ef7bc97a5880d91f7bbd88830e789 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16322)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8440: [DO NOT MERGE] run gh actions with java 17

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8440:
URL: https://github.com/apache/hudi/pull/8440#issuecomment-1507448923

   
   ## CI report:
   
   * 6b7a1a58b0df09c8e262cda3fc3087e08cbf3905 UNKNOWN
   * b268bf853f7cbb88f3204a8f50d26ac44a8edc2a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16293)
 
   * 7a3e534cf0abd559f9eec2640dcc7e18b9215288 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16331)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8440: [DO NOT MERGE] run gh actions with java 17

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8440:
URL: https://github.com/apache/hudi/pull/8440#issuecomment-1507440429

   
   ## CI report:
   
   * 6b7a1a58b0df09c8e262cda3fc3087e08cbf3905 UNKNOWN
   * b268bf853f7cbb88f3204a8f50d26ac44a8edc2a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16293)
 
   * 7a3e534cf0abd559f9eec2640dcc7e18b9215288 UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8445: [HUDI-3088] Use Spark 3.2 as default Spark version

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8445:
URL: https://github.com/apache/hudi/pull/8445#issuecomment-1507356898

   
   ## CI report:
   
   * 25574338253f1e7c9db7eabcb1239a8cb5ca2b1d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16321)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #8418: [HUDI-6052] Standardise TIMESTAMP(6) format when writing to Parquet f…

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8418:
URL: https://github.com/apache/hudi/pull/8418#issuecomment-1507356645

   
   ## CI report:
   
   * 1579a6e6966b051e9a6a5b696a9e6e35500a929a UNKNOWN
   * e3b8fc000ef3a7d57dd23aec1c9b37afaeb3c2dd Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16319)
 
   
   



[jira] [Updated] (HUDI-6075) Improve config generation script and docs

2023-04-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6075:

Summary: Improve config generation script and docs  (was: Improve config generation script)

> Improve config generation script and docs
> -
>
> Key: HUDI-6075
> URL: https://issues.apache.org/jira/browse/HUDI-6075
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: configs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6075) Improve config generation script

2023-04-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6075:

Fix Version/s: 0.14.0

> Improve config generation script
> 
>
> Key: HUDI-6075
> URL: https://issues.apache.org/jira/browse/HUDI-6075
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (HUDI-6075) Improve config generation script

2023-04-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6075:

Story Points: 0.5

> Improve config generation script
> 
>
> Key: HUDI-6075
> URL: https://issues.apache.org/jira/browse/HUDI-6075
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Updated] (HUDI-6075) Improve config generation script

2023-04-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-6075:

Component/s: configs
  Epic Link: HUDI-5738

> Improve config generation script
> 
>
> Key: HUDI-6075
> URL: https://issues.apache.org/jira/browse/HUDI-6075
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: configs
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Assigned] (HUDI-6075) Improve config generation script

2023-04-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo reassigned HUDI-6075:
---

Assignee: Ethan Guo

> Improve config generation script
> 
>
> Key: HUDI-6075
> URL: https://issues.apache.org/jira/browse/HUDI-6075
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>






[jira] [Created] (HUDI-6075) Improve config generation script

2023-04-13 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6075:
---

 Summary: Improve config generation script
 Key: HUDI-6075
 URL: https://issues.apache.org/jira/browse/HUDI-6075
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: Ethan Guo








[GitHub] [hudi] codope commented on a diff in pull request #8402: [HUDI-6048] Check if partition exists before list partition by path prefix

2023-04-13 Thread via GitHub


codope commented on code in PR #8402:
URL: https://github.com/apache/hudi/pull/8402#discussion_r1165824450


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##
@@ -299,7 +299,9 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
   // prefix to try to reduce the scope of the required file-listing
   val relativePartitionPathPrefix = composeRelativePartitionPath(staticPartitionColumnNameValuePairs)
 
-  if (staticPartitionColumnNameValuePairs.length == partitionColumnNames.length) {
+  if (!metaClient.getFs.exists(new Path(getBasePath, relativePartitionPathPrefix))) {

Review Comment:
   Got it 👍 






[jira] [Closed] (HUDI-5990) Incremental queries on MOR sometimes miss data

2023-04-13 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-5990.
-
Resolution: Fixed

> Incremental queries on MOR sometimes miss data
> --
>
> Key: HUDI-5990
> URL: https://issues.apache.org/jira/browse/HUDI-5990
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Affects Versions: 0.12.2, 0.13.0
>Reporter: ruofan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> env: hudi-0.12.2 spark-3.2.0
> Currently, we have the following Hudi timeline and data files.
> {code:java}
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095758155.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:58 20230326095810406.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095811072.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095820974.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095830980.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095840978.compaction.requested
> -rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095841125.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:59 20230326095850994.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095900988.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095910983.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.requested
> -rw-r--r--  1 rfyu rfyu 1.5K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.1_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.2_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.3_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.4_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.5_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.1_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.2_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.3_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.4_0-1-0 {code}
> We use Spark to query this Hudi table incrementally. Data may go missing
> when the incremental range contains an incomplete compaction plan.
> Here is an example of an incremental query. Normally, from begin_instance_time
> to end_instance_time, 6 commits should be found, but only 3 were found.
> {code:java}
> sql:
> call copy_to_table(table=>'hudi_table',new_table=>'incremental_table',query_type=>'incremental',begin_instance_time=>'20230326095810406',end_instance_time=>'20230326095900988');
> select _hoodie_commit_time,count(*) from incremental_table group by _hoodie_commit_time order by _hoodie_commit_time desc;
> actual result: 
> +---++
> |_hoodie_commit_time|count(

[hudi] branch master updated: [HUDI-5990] Avoid missing data during incremental queries (#8299)

2023-04-13 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c91d7e1f78d [HUDI-5990] Avoid missing data during incremental queries (#8299)
c91d7e1f78d is described below

commit c91d7e1f78dbb4a12dab23b5d4b147bfb150002a
Author: rfyu <39233058+r...@users.noreply.github.com>
AuthorDate: Fri Apr 14 01:15:13 2023 +0800

[HUDI-5990] Avoid missing data during incremental queries (#8299)

The reason for the missing data is that the timeline used by
`MergeOnReadIncrementalRelation` only contains completed
instants. When the incremental range contains an incomplete
compaction plan, `fsView.getLatestMergedFileSlicesBeforeOrOn`
in `collectFileSplits` filters out some file slices.
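
To make the failure mode described above concrete, here is a minimal sketch in plain Scala. It is a toy model, not Hudi code: `Instant`, `FileSlice`, `completedOnly`, `withPendingCompaction`, and `visibleCommits` are hypothetical names. The instants are taken from the listing in HUDI-5990, while the mapping of delta commits to log files and the exclusive lower bound of the range are assumptions. The idea it illustrates is that a file slice is only visible when its base instant is on the timeline, so a timeline without the pending compaction instant hides every delta commit written after that compaction was scheduled.

```scala
// Toy model only: none of these types are Hudi classes.
object IncrementalTimelineSketch {
  // One timeline entry: timestamp, action, and whether it has completed.
  final case class Instant(ts: String, action: String, completed: Boolean)

  // A file slice is keyed by its base instant; its log files hold the
  // delta commits written on top of that base instant (assumed mapping).
  final case class FileSlice(baseInstant: String, deltaCommits: Seq[String])

  val instants: Seq[Instant] = Seq(
    Instant("20230326095758155", "deltacommit", completed = true),
    Instant("20230326095810406", "deltacommit", completed = true),
    Instant("20230326095811072", "deltacommit", completed = true),
    Instant("20230326095820974", "deltacommit", completed = true),
    Instant("20230326095830980", "deltacommit", completed = true),
    Instant("20230326095840978", "compaction", completed = false), // requested only
    Instant("20230326095841125", "deltacommit", completed = true),
    Instant("20230326095850994", "deltacommit", completed = true),
    Instant("20230326095900988", "deltacommit", completed = true),
    Instant("20230326095910983", "deltacommit", completed = true),
    Instant("20230326095920986", "deltacommit", completed = false) // inflight
  )

  val slices: Seq[FileSlice] = Seq(
    FileSlice("20230326095758155", Seq("20230326095758155", "20230326095810406",
      "20230326095811072", "20230326095820974", "20230326095830980")),
    FileSlice("20230326095840978", Seq("20230326095841125", "20230326095850994",
      "20230326095900988", "20230326095910983"))
  )

  // Timeline restricted to completed instants.
  val completedOnly: Set[String] = instants.filter(_.completed).map(_.ts).toSet

  // Same timeline, but the pending compaction instant is kept as well.
  val withPendingCompaction: Set[String] =
    instants.filter(i => i.completed || i.action == "compaction").map(_.ts).toSet

  // Delta commits in (begin, end] that belong to slices whose base instant
  // is present on the given timeline.
  def visibleCommits(timeline: Set[String], begin: String, end: String): Seq[String] =
    slices
      .filter(s => timeline.contains(s.baseInstant))
      .flatMap(_.deltaCommits)
      .filter(c => c.compareTo(begin) > 0 && c.compareTo(end) <= 0)
      .sorted

  def main(args: Array[String]): Unit = {
    val begin = "20230326095810406"
    val end = "20230326095900988"
    println(visibleCommits(completedOnly, begin, end).size)
    println(visibleCommits(withPendingCompaction, begin, end).size)
  }
}
```

In this model the completed-only timeline yields 3 commits and the timeline that keeps the pending compaction yields 6, which matches the counts reported in the issue.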
---
 .../hudi/MergeOnReadIncrementalRelation.scala  |  4 +-
 .../functional/TestParquetColumnProjection.scala   | 75 --
 2 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
index 93bf730a56d..636624f3950 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
@@ -60,9 +60,9 @@ case class MergeOnReadIncrementalRelation(override val sqlContext: SQLContext,
 
   override protected def timeline: HoodieTimeline = {
 if (fullTableScan) {
-  super.timeline
+  metaClient.getCommitsAndCompactionTimeline
 } else {
-  super.timeline.findInstantsInRange(startTimestamp, endTimestamp)
+  metaClient.getCommitsAndCompactionTimeline.findInstantsInRange(startTimestamp, endTimestamp)
 }
   }
 
diff --git a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala
index 0eefc7beeec..eaf1839d5dc 100644
--- a/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala
+++ b/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala
@@ -22,12 +22,13 @@ import org.apache.calcite.runtime.SqlFunctions.abs
 import org.apache.hudi.HoodieBaseRelation.projectSchema
 import org.apache.hudi.common.config.{HoodieMetadataConfig, HoodieStorageConfig}
 import org.apache.hudi.common.model.{HoodieRecord, OverwriteNonDefaultsWithLatestAvroPayload}
-import org.apache.hudi.common.table.HoodieTableConfig
+import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient}
 import org.apache.hudi.common.testutils.{HadoopMapRedUtils, HoodieTestDataGenerator}
-import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.config.{HoodieCompactionConfig, HoodieWriteConfig}
+import org.apache.hudi.keygen.NonpartitionedKeyGenerator
 import org.apache.hudi.testutils.SparkClientFunctionalTestHarness
 import org.apache.hudi.testutils.SparkClientFunctionalTestHarness.getSparkSqlConf
-import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DefaultSource, HoodieBaseRelation, HoodieSparkUtils, HoodieUnsafeRDD}
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DefaultSource, HoodieBaseRelation, HoodieMergeOnReadRDD, HoodieSparkUtils, HoodieUnsafeRDD}
 import org.apache.parquet.hadoop.util.counters.BenchmarkCounter
 import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
@@ -252,7 +253,6 @@ class TestParquetColumnProjection extends SparkClientFunctionalTestHarness with
 runTest(tableState, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL, DataSourceReadOptions.REALTIME_PAYLOAD_COMBINE_OPT_VAL, fullColumnsReadStats)
   }
 
-  // TODO add test for incremental query of the table with logs
   @Test
   def testMergeOnReadIncrementalRelationWithNoDeltaLogs(): Unit = {
 val tablePath = s"$basePath/mor-no-logs"
@@ -296,6 +296,41 @@ class TestParquetColumnProjection extends SparkClientFunctionalTestHarness with
   projectedColumnsReadStats, incrementalOpts)
   }
 
+  @Test
+  def testMergeOnReadIncrementalRelationWithDeltaLogs(): Unit = {
+val tablePath = s"$basePath/mor-with-logs-incr"
+val targetRecordsCount = 100
+
+bootstrapMORTableWithDeltaLog(tablePath, targetRecordsCount, defaultWriteOpts, populateMetaFields = true)
+
+println(s"Running test for $tablePath / incremental")
+/**
+ * State of timeline and updated data
+ * 
+--+--+--+--++--+--+--+

[GitHub] [hudi] codope merged pull request #8299: [HUDI-5990]Avoid missing data during incremental queries

2023-04-13 Thread via GitHub


codope merged PR #8299:
URL: https://github.com/apache/hudi/pull/8299


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] sydneyhoran commented on pull request #5071: [HUDI-1881]: draft implementation for trigger based on data availability

2023-04-13 Thread via GitHub


sydneyhoran commented on PR #5071:
URL: https://github.com/apache/hudi/pull/5071#issuecomment-1507301215

   I am also looking forward to this PR being merged 😄 





[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1507251814

   
   ## CI report:
   
   * 059463c77c641929a07e9b9ebb9e369d746c157f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16318)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #8299: [HUDI-5990]Avoid missing data during incremental queries

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8299:
URL: https://github.com/apache/hudi/pull/8299#issuecomment-1507251690

   
   ## CI report:
   
   * c0fc740641546218180be303626a86aea628b3a2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16317)
   
   
   





[GitHub] [hudi] codope commented on pull request #8452: [WIP] Add more partition push down filters

2023-04-13 Thread via GitHub


codope commented on PR #8452:
URL: https://github.com/apache/hudi/pull/8452#issuecomment-1507244545

   @boneanxs Can you please create a JIRA and add more details to it?
   Don't we already push down `EqualTo`?





[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8434:
URL: https://github.com/apache/hudi/pull/8434#issuecomment-1507195751

   
   ## CI report:
   
   * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269)
   * 4197e6d1216ee9bc269fb5368ab6c919c55ec146 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16329)
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #8410: [HUDI-6050] Fix write helper deduplicate records lost origin data operation

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8410:
URL: https://github.com/apache/hudi/pull/8410#issuecomment-1507195540

   
   ## CI report:
   
   * efda2dce4010d10f0342c30bfb45adf1cf3fe5c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16199) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16327)
   * 98195cd8fa7423d6756c845f6afcc0d6d2fb2eea Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16328)
   
   
   





[GitHub] [hudi] slfan1989 commented on a diff in pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.

2023-04-13 Thread via GitHub


slfan1989 commented on code in PR #8435:
URL: https://github.com/apache/hudi/pull/8435#discussion_r1165710809


##
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/ddl/JDBCExecutor.java:
##
@@ -129,10 +129,11 @@ public Map getTableSchema(String tableName) {
 ResultSet result = null;
 try {
   DatabaseMetaData databaseMetaData = connection.getMetaData();
-  result = databaseMetaData.getColumns(null, databaseName, tableName, null);
+  String catalog = connection.getCatalog();
+  result = databaseMetaData.getColumns(catalog, databaseName, tableName, "%");

Review Comment:
   @danny0405 Thank you very much for your help in reviewing the code!
   
   This part of the JDBC code goes through Hive JDBC. I followed the usage in Hive's Beeline when modifying it.
   
   HiveDatabaseMetaData#getColumns
   ```
 public class HiveDatabaseMetaData implements DatabaseMetaData {
   public ResultSet getColumns(String catalog, String schemaPattern,
       String tableNamePattern, String columnNamePattern) throws SQLException {}
   ...
 }
   ```
   
   The call stack of the code is as follows:
   ```
   Hive 
 \-- CLIService#getColumns
 \-- HiveSessionImpl#getColumns
  \-- GetColumnsOperation#runInternal
   ```
   
   Reading GetColumnsOperation#runInternal shows that `catalogName` has no obvious effect, so it could be set to null. However, Hive's Beeline code passes the result of `HiveConnection`'s `getCatalog` directly, so I follow Hive's usage here.
   
   If `columnNamePattern` is null, all fields are returned, but that is not easy to read. `%` means no filtering and matches all columns, which is easier to understand.
   
   Beeline#getColumns
   ```
 ResultSet getColumns(String table) throws SQLException {
   if (!(assertConnection())) {
     return null;
   }
   return getDatabaseConnection().getDatabaseMetaData().getColumns(
       getDatabaseConnection().getDatabaseMetaData().getConnection().getCatalog(),
       null, table, "%");
 }
   ```
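
To make the `%` versus null point concrete, here is a minimal sketch of how a JDBC metadata pattern can be matched against column names. In JDBC metadata patterns, `%` matches any run of zero or more characters and `_` matches exactly one character. `ColumnPatternSketch`, `toRegex`, and `filter` are hypothetical names used only for illustration; they are not Hive or Hudi APIs. Treating null like `%` is an assumption based on the behaviour described in the comment above, and search-string escape characters are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Hive or Hudi): matches names against a JDBC metadata pattern. */
final class ColumnPatternSketch {

  private ColumnPatternSketch() {}

  /** Translates a JDBC metadata pattern into an equivalent regular expression. */
  static Pattern toRegex(String jdbcPattern) {
    // Assumption: a null pattern behaves like "%", i.e. no filtering.
    String pattern = (jdbcPattern == null) ? "%" : jdbcPattern;
    StringBuilder regex = new StringBuilder();
    for (char c : pattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");   // any run of characters, including none
      } else if (c == '_') {
        regex.append('.');    // exactly one character
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));  // literal character
      }
    }
    return Pattern.compile(regex.toString());
  }

  /** Returns the column names that match the given pattern, in their original order. */
  static List<String> filter(List<String> columns, String jdbcPattern) {
    Pattern regex = toRegex(jdbcPattern);
    List<String> matched = new ArrayList<>();
    for (String column : columns) {
      if (regex.matcher(column).matches()) {
        matched.add(column);
      }
    }
    return matched;
  }
}
```

Under these assumptions, `filter(columns, "%")` and `filter(columns, null)` return the same list, so passing `"%")` explicitly only makes the intent visible at the call site.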






[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8434:
URL: https://github.com/apache/hudi/pull/8434#issuecomment-1507183590

   
   ## CI report:
   
   * 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269)
   * 4197e6d1216ee9bc269fb5368ab6c919c55ec146 UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #8410: [HUDI-6050] Fix write helper deduplicate records lost origin data operation

2023-04-13 Thread via GitHub


hudi-bot commented on PR #8410:
URL: https://github.com/apache/hudi/pull/8410#issuecomment-1507183327

   
   ## CI report:
   
   * efda2dce4010d10f0342c30bfb45adf1cf3fe5c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16199) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16327)
   * 98195cd8fa7423d6756c845f6afcc0d6d2fb2eea UNKNOWN
   
   
   







