[GitHub] [hudi] hudi-bot commented on pull request #6311: [HUDI-4548] Unpack the column max/min to string instead of Utf8 for M…
hudi-bot commented on PR #6311: URL: https://github.com/apache/hudi/pull/6311#issuecomment-1206109906

## CI report:

* 17848d0f924115607c4144b3fa0a218333e89c99 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10602)
* 04f067fce6df4225c497caeecd63dba7d069ba75 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10606)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206109860

## CI report:

* 5e75dee8c56cb14110b33548c09aad222adc57d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10595)
* 666088efaacc584a5f36db4df2f44f358e1ba53c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10599)
* 14aa4355ee414a6cb4814950216fe5ea93ccba16 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10605)
[GitHub] [hudi] hudi-bot commented on pull request #6311: [HUDI-4548] Unpack the column max/min to string instead of Utf8 for M…
hudi-bot commented on PR #6311: URL: https://github.com/apache/hudi/pull/6311#issuecomment-1206106505

## CI report:

* 17848d0f924115607c4144b3fa0a218333e89c99 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10602)
* 04f067fce6df4225c497caeecd63dba7d069ba75 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206106447

## CI report:

* 5e75dee8c56cb14110b33548c09aad222adc57d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10595)
* 666088efaacc584a5f36db4df2f44f358e1ba53c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10599)
* 14aa4355ee414a6cb4814950216fe5ea93ccba16 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6310: [HUDI-4474] Fix inferring props for meta sync
hudi-bot commented on PR #6310: URL: https://github.com/apache/hudi/pull/6310#issuecomment-1206103377

## CI report:

* 366dc59d094ffcdd05ba7cdf905b85cb684a9fa7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10601)
[GitHub] [hudi] hudi-bot commented on pull request #6309: [HUDI-4547] Fix SortOperatorGen sort indices
hudi-bot commented on PR #6309: URL: https://github.com/apache/hudi/pull/6309#issuecomment-1206103348

## CI report:

* f6df4432d24639619566565e3fac86cbd855ce9d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10600)
[GitHub] [hudi] hudi-bot commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
hudi-bot commented on PR #6046: URL: https://github.com/apache/hudi/pull/6046#issuecomment-1206102844

## CI report:

* 5a6ac9622379715e890f1ec1cd7be9422febeb5c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10597)
[jira] [Assigned] (HUDI-4551) The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment
[ https://issues.apache.org/jira/browse/HUDI-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang reassigned HUDI-4551:
Assignee: Nicholas Jiang

> The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment
>
> Key: HUDI-4551
> URL: https://issues.apache.org/jira/browse/HUDI-4551
> Project: Apache Hudi
> Issue Type: Improvement
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Minor
>
> The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is 4, which could be the parallelism of the execution environment.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4550) Investigate why rollback is triggered for completed instant
[ https://issues.apache.org/jira/browse/HUDI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-4550:
Fix Version/s: 0.13.0

> Investigate why rollback is triggered for completed instant
>
> Key: HUDI-4550
> URL: https://issues.apache.org/jira/browse/HUDI-4550
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Priority: Major
> Fix For: 0.13.0
>
> See issue https://github.com/apache/hudi/issues/6224
> Ideally, rollback should not be triggered for a completed instant. But if it is, it should be safe to fall back to listing-based rollback.
[jira] [Created] (HUDI-4551) The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment
Nicholas Jiang created HUDI-4551:

Summary: The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is the parallelism of the execution environment
Key: HUDI-4551
URL: https://issues.apache.org/jira/browse/HUDI-4551
Project: Apache Hudi
Issue Type: Improvement
Components: flink
Reporter: Nicholas Jiang

The default value of READ_TASKS, WRITE_TASKS, CLUSTERING_TASKS is 4, which could be the parallelism of the execution environment.
[hudi] branch master updated (e03cd0a198 -> fcdd4cf06c)
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

 from e03cd0a198 [HUDI-4545] Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload (#6306)
  add fcdd4cf06c [HUDI-4544] support retain hour cleaning policy for flink (#6300)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/configuration/FlinkOptions.java | 8
 .../main/java/org/apache/hudi/streamer/FlinkStreamerConfig.java   | 7
 .../src/main/java/org/apache/hudi/util/StreamerUtil.java          | 1
 3 files changed, 16 insertions(+)
[GitHub] [hudi] garyli1019 merged pull request #6300: [HUDI-4544] support retain hour cleaning policy for flink
garyli1019 merged PR #6300: URL: https://github.com/apache/hudi/pull/6300
[GitHub] [hudi] codope commented on issue #6224: [SUPPORT] Caused by: java.lang.IllegalArgumentException: Cannot use marker based rollback strategy on completed instant
codope commented on issue #6224: URL: https://github.com/apache/hudi/issues/6224#issuecomment-1206097133

@jtchen-study Ideally, rollback is triggered only for failed writes. As such, falling back to listing-based rollback should be safe, but we need to understand how rollback got triggered for a completed instant. Can you describe the sequence of events that happened, and also the setup? Steps to reproduce would be very helpful. Is this a single-writer or multi-writer scenario? I've created HUDI-4550 to track the investigation.
[jira] [Created] (HUDI-4550) Investigate why rollback is triggered for completed instant
Sagar Sumit created HUDI-4550:

Summary: Investigate why rollback is triggered for completed instant
Key: HUDI-4550
URL: https://issues.apache.org/jira/browse/HUDI-4550
Project: Apache Hudi
Issue Type: Task
Reporter: Sagar Sumit

See issue https://github.com/apache/hudi/issues/6224

Ideally, rollback should not be triggered for a completed instant. But if it is, it should be safe to fall back to listing-based rollback.
[jira] [Closed] (HUDI-4536) ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering
[ https://issues.apache.org/jira/browse/HUDI-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Jiang closed HUDI-4536.
Reviewers: Danny Chen
Resolution: Fixed

> ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper in clustering
>
> Key: HUDI-4536
> URL: https://issues.apache.org/jira/browse/HUDI-4536
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Nicholas Jiang
> Assignee: Nicholas Jiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
> ClusteringOperator causes the NullPointerException when writing with BulkInsertWriterHelper for clustering, because the BulkInsertWriterHelper isn't set to null after close.
[hudi] branch master updated: [HUDI-4545] Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload (#6306)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new e03cd0a198 [HUDI-4545] Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload (#6306)

e03cd0a198 is described below

commit e03cd0a198f63df7fb7ba71d1c9a0b01ae33f021
Author: Danny Chan
AuthorDate: Fri Aug 5 14:16:53 2022 +0800

    [HUDI-4545] Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload (#6306)
---
 .../model/OverwriteNonDefaultsWithLatestAvroPayload.java     |  8 ++--
 .../model/TestOverwriteNonDefaultsWithLatestAvroPayload.java | 11 +--
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java b/hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
index 93ac96cb42..6ce99aae21 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteNonDefaultsWithLatestAvroPayload.java
@@ -20,6 +20,7 @@ package org.apache.hudi.common.model;
 
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
 import org.apache.avro.generic.IndexedRecord;
 
 import org.apache.hudi.common.util.Option;
@@ -60,16 +61,19 @@ public class OverwriteNonDefaultsWithLatestAvroPayload extends OverwriteWithLate
     if (isDeleteRecord(insertRecord)) {
       return Option.empty();
     } else {
+      final GenericRecordBuilder builder = new GenericRecordBuilder(schema);
       List<Schema.Field> fields = schema.getFields();
       fields.forEach(field -> {
         Object value = insertRecord.get(field.name());
         value = field.schema().getType().equals(Schema.Type.STRING) && value != null ? value.toString() : value;
         Object defaultValue = field.defaultVal();
         if (!overwriteField(value, defaultValue)) {
-          currentRecord.put(field.name(), value);
+          builder.set(field, value);
+        } else {
+          builder.set(field, currentRecord.get(field.pos()));
         }
       });
-      return Option.of(currentRecord);
+      return Option.of(builder.build());
     }
   }
 }

diff --git a/hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java b/hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java
index c6eee05b87..9e3405b304 100644
--- a/hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/model/TestOverwriteNonDefaultsWithLatestAvroPayload.java
@@ -22,6 +22,7 @@ import org.apache.avro.JsonProperties;
 import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericData;
 import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
 
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
@@ -31,6 +32,7 @@ import java.util.Collections;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertNotSame;
 
 /**
  * Unit tests {@link TestOverwriteNonDefaultsWithLatestAvroPayload}.
@@ -85,8 +87,13 @@ public class TestOverwriteNonDefaultsWithLatestAvroPayload {
     assertEquals(record1, payload1.getInsertValue(schema).get());
     assertEquals(record2, payload2.getInsertValue(schema).get());
 
-    assertEquals(payload1.combineAndGetUpdateValue(record2, schema).get(), record1);
-    assertEquals(payload2.combineAndGetUpdateValue(record1, schema).get(), record3);
+    IndexedRecord combinedVal1 = payload1.combineAndGetUpdateValue(record2, schema).get();
+    assertEquals(combinedVal1, record1);
+    assertNotSame(combinedVal1, record1);
+
+    IndexedRecord combinedVal2 = payload2.combineAndGetUpdateValue(record1, schema).get();
+    assertEquals(combinedVal2, record3);
+    assertNotSame(combinedVal2, record3);
   }
 
   @Test
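The essence of the commit above is the immutability rule from HUDI-4545: take a field from the incoming record unless it equals the field's default, otherwise keep the current value, and return a brand-new record instead of mutating the one read from disk. A minimal, self-contained sketch of that merge rule follows; to stay dependency-free it uses plain `Map`s in place of Avro `GenericRecord`s, and the class, method, and field names are hypothetical, not Hudi APIs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImmutableMergeSketch {

    // Merge rule mirroring OverwriteNonDefaultsWithLatestAvroPayload:
    // the incoming value wins unless it equals the field's default
    // (or is null), in which case the current value is kept.
    // Crucially, neither input map is modified; a new record is returned.
    static Map<String, Object> merge(Map<String, Object> current,
                                     Map<String, Object> incoming,
                                     Map<String, Object> defaults) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (String field : defaults.keySet()) {
            Object value = incoming.get(field);
            boolean overwrite = value != null && !value.equals(defaults.get(field));
            merged.put(field, overwrite ? value : current.get(field));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical schema defaults: empty string for "name", 0 for "age".
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("name", "");
        defaults.put("age", 0);

        Map<String, Object> current = Map.of("name", "alice", "age", 30);
        Map<String, Object> incoming = Map.of("name", "", "age", 31);

        Map<String, Object> merged = merge(current, incoming, defaults);
        // "name" is the default in the incoming record, so the current value wins;
        // "age" is non-default, so the incoming value wins.
        System.out.println(merged);             // {name=alice, age=31}
        System.out.println(current.get("age")); // still 30: current was not mutated
    }
}
```

This is exactly what the new `assertNotSame` assertions in the test diff check for the real payload class: the merged record equals the expected content but is a different object from the inputs.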
[jira] [Commented] (HUDI-4545) Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload
[ https://issues.apache.org/jira/browse/HUDI-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575596#comment-17575596 ]

Danny Chen commented on HUDI-4545:

Fixed via master branch: e03cd0a198f63df7fb7ba71d1c9a0b01ae33f021

> Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload
>
> Key: HUDI-4545
> URL: https://issues.apache.org/jira/browse/HUDI-4545
> Project: Apache Hudi
> Issue Type: Bug
> Components: core
> Affects Versions: 0.12.0
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
> Currently, we use short-cut logic:
> {code:java}
> a == b
> // for example: HoodieMergeHandle#writeUpdateRecord
> {code}
> to decide whether the update happens. In principle, we should not modify the records from disk directly; they should be kept immutable, and for any changes we should return new records instead.
[jira] [Updated] (HUDI-4505) Returns instead of throws if lock file exists for FileSystemBasedLockProvider
[ https://issues.apache.org/jira/browse/HUDI-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-4505:
Priority: Blocker (was: Major)

> Returns instead of throws if lock file exists for FileSystemBasedLockProvider
>
> Key: HUDI-4505
> URL: https://issues.apache.org/jira/browse/HUDI-4505
> Project: Apache Hudi
> Issue Type: Improvement
> Components: core
> Reporter: Danny Chen
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.12.0
> Attachments: image-2022-07-29-15-33-04-206.png
>
> To avoid the verbose log like below:
> !image-2022-07-29-15-33-04-206.png|width=755,height=269!
[jira] [Updated] (HUDI-4504) Disable metadata table by default for flink
[ https://issues.apache.org/jira/browse/HUDI-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-4504:
Priority: Blocker (was: Major)

> Disable metadata table by default for flink
>
> Key: HUDI-4504
> URL: https://issues.apache.org/jira/browse/HUDI-4504
> Project: Apache Hudi
> Issue Type: Task
> Components: flink
> Reporter: Danny Chen
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.12.0
[jira] [Resolved] (HUDI-4545) Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload
[ https://issues.apache.org/jira/browse/HUDI-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen resolved HUDI-4545.

> Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload
>
> Key: HUDI-4545
> URL: https://issues.apache.org/jira/browse/HUDI-4545
> Project: Apache Hudi
> Issue Type: Bug
> Components: core
> Affects Versions: 0.12.0
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
> Currently, we use short-cut logic:
> {code:java}
> a == b
> // for example: HoodieMergeHandle#writeUpdateRecord
> {code}
> to decide whether the update happens, in principle, we should not modify the records from disk directly, they should be kept as immutable, for any changes, we should return new records instead.
[GitHub] [hudi] danny0405 merged pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
danny0405 merged PR #6306: URL: https://github.com/apache/hudi/pull/6306
[GitHub] [hudi] danny0405 commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
danny0405 commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1206083880

The failed test should not be affected by this patch, and it succeeded in the last run: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=10593&view=logs&s=859b8d9a-8fd6-5a5c-6f5e-f84f1990894e, so I would just merge the PR.
[GitHub] [hudi] hudi-bot commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
hudi-bot commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1206072734

## CI report:

* 04e513ba7885d107713277a0a7964c3a082d7405 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10598)
[GitHub] [hudi] hudi-bot commented on pull request #6311: [HUDI-4548] Unpack the column max/min to string instead of Utf8 for M…
hudi-bot commented on PR #6311: URL: https://github.com/apache/hudi/pull/6311#issuecomment-1206068121

## CI report:

* 17848d0f924115607c4144b3fa0a218333e89c99 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10602)
[GitHub] [hudi] hudi-bot commented on pull request #6310: [HUDI-4474] Fix inferring props for meta sync
hudi-bot commented on PR #6310: URL: https://github.com/apache/hudi/pull/6310#issuecomment-1206068104

## CI report:

* 366dc59d094ffcdd05ba7cdf905b85cb684a9fa7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10601)
[GitHub] [hudi] hudi-bot commented on pull request #6309: [HUDI-4547] Fix SortOperatorGen sort indices
hudi-bot commented on PR #6309: URL: https://github.com/apache/hudi/pull/6309#issuecomment-1206068086

## CI report:

* f6df4432d24639619566565e3fac86cbd855ce9d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10600)
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206068076

## CI report:

* 5e75dee8c56cb14110b33548c09aad222adc57d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10595)
* 666088efaacc584a5f36db4df2f44f358e1ba53c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10599)
[GitHub] [hudi] codope commented on a diff in pull request #6227: [HUDI-4496] Fixing Orc support broken for Spark 3.x and more
codope commented on code in PR #6227: URL: https://github.com/apache/hudi/pull/6227#discussion_r938462895

## hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala

@@ -223,6 +215,20 @@ private[sql] class AvroSerializer(
   val numFields = st.length
   (getter, ordinal) => structConverter(getter.getStruct(ordinal, numFields))
+
+  // Following section is amended to the original (Spark's) implementation
+  // >>> BEGINS
+
+  case (st: StructType, UNION) =>

Review comment: Very good point! Sounds good. Let's make sure the annotation is consistent across the code and easily searchable. We should probably put this in the coding guidelines as well: https://hudi.apache.org/contribute/developer-setup#coding-guidelines
[GitHub] [hudi] YuweiXiao commented on pull request #6248: [HUDI-4303] Adding 4 to 5 upgrade handler to check for old deprecated "default" partition value
YuweiXiao commented on PR #6248: URL: https://github.com/apache/hudi/pull/6248#issuecomment-1206062611

Hey @nsivabalan, just wondering why we are changing the default partition values. Is it only a new standard, or do other systems (like query engines) rely on this? Also, what if the user's partition is actually named `default`? It seems we cannot even verify that?
[GitHub] [hudi] hudi-bot commented on pull request #6311: [HUDI-4548] Unpack the column max/min to string instead of Utf8 for M…
hudi-bot commented on PR #6311: URL: https://github.com/apache/hudi/pull/6311#issuecomment-1206065938

## CI report:

* 17848d0f924115607c4144b3fa0a218333e89c99 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6310: [HUDI-4474] Fix inferring props for meta sync
hudi-bot commented on PR #6310: URL: https://github.com/apache/hudi/pull/6310#issuecomment-1206065914

## CI report:

* 366dc59d094ffcdd05ba7cdf905b85cb684a9fa7 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6309: [HUDI-4547] Fix SortOperatorGen sort indices
hudi-bot commented on PR #6309: URL: https://github.com/apache/hudi/pull/6309#issuecomment-1206065893

## CI report:

* f6df4432d24639619566565e3fac86cbd855ce9d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206065867

## CI report:

* 5e75dee8c56cb14110b33548c09aad222adc57d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10595)
* 666088efaacc584a5f36db4df2f44f358e1ba53c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206063538

## CI report:

* 5e75dee8c56cb14110b33548c09aad222adc57d2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10595)
[GitHub] [hudi] hudi-bot commented on pull request #6141: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table
hudi-bot commented on PR #6141: URL: https://github.com/apache/hudi/pull/6141#issuecomment-1206063311

## CI report:

* 2a493fcafb42e21cbfcae3787ab30853319f4bf3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10596)
[GitHub] [hudi] Gatsby-Lee commented on issue #6024: [SUPPORT] DELETE_PARTITION causes AWS Athena Query failure
Gatsby-Lee commented on issue #6024: URL: https://github.com/apache/hudi/issues/6024#issuecomment-1206062572

@codope hi,

First, as of 0.11.x, DELETE_PARTITION (in AWS Glue Catalog) doesn't fail or raise an exception. (It's different from 0.10.x.)

Second, like you said, the actual delete is done by the cleaner (lazy), but before the actual delete, Hudi seems to try to delete the metadata in the AWS Glue Catalog first.

Third, org_id=5 has never existed.

I will try to replicate the issue with 0.11.1 and post the output here. (I don't remember if I reproduced this issue with 0.11.0 or not. Anyway, I will try again.)
[jira] [Created] (HUDI-4549) hive sync bundle causes class loader issue
Raymond Xu created HUDI-4549:
--------------------------------

             Summary: hive sync bundle causes class loader issue
                 Key: HUDI-4549
                 URL: https://issues.apache.org/jira/browse/HUDI-4549
             Project: Apache Hudi
          Issue Type: Bug
          Components: dependencies
            Reporter: Raymond Xu
             Fix For: 0.12.0

A weird classpath issue I found: when testing deltastreamer using hudi-utilities-slim-bundle, if I put `--jars hudi-hive-sync-bundle.jar,hudi-spark-bundle.jar` then I'll get this error when writing:

{code:java}
Caused by: java.lang.NoSuchMethodError: org.apache.hudi.avro.MercifulJsonConverter.convert(Ljava/lang/String;Lorg/apache/avro/Schema;)Lorg/apache/avro/generic/GenericRecord;
    at org.apache.hudi.utilities.sources.helpers.AvroConvertor.fromJson(AvroConvertor.java:86)
    at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
{code}

If I put the spark bundle before the hive sync bundle, then there is no issue. Without hive-sync-bundle, also no issue. So hive-sync-bundle somehow messes up the classpath? Not sure why it reports a hudi-common API not found… caused by shading avro?

The same behavior I observed with aws-bundle, which makes sense, as it's a superset of hive-sync-bundle.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
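[Editorial note] The ordering sensitivity described in HUDI-4549 is consistent with how a `URLClassLoader` resolves names: classpath entries are searched in declaration order, so when two bundles carry a class (or a shaded copy of it) under the same name, whichever jar appears first in `--jars` wins. The sketch below demonstrates only that first-entry-wins lookup behavior in isolation; the file names are illustrative stand-ins, not the actual bundle contents:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathOrderDemo {
    public static void main(String[] args) throws Exception {
        // Two directories standing in for two bundles that both contain
        // an entry of the same name but with different contents
        Path bundleA = Files.createTempDirectory("hive-sync-bundle");
        Path bundleB = Files.createTempDirectory("spark-bundle");
        Files.write(bundleA.resolve("MercifulJsonConverter.marker"),
                "hive-sync-bundle copy".getBytes());
        Files.write(bundleB.resolve("MercifulJsonConverter.marker"),
                "spark-bundle copy".getBytes());

        // Like `--jars A.jar,B.jar`: the loader searches its URLs in order,
        // so the bundle listed first shadows the other
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] {bundleA.toUri().toURL(), bundleB.toUri().toURL()}, null)) {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(
                    loader.getResourceAsStream("MercifulJsonConverter.marker")))) {
                // The first classpath entry wins, mirroring how an older copy of a
                // class listed first can surface as NoSuchMethodError at runtime
                System.out.println(r.readLine());
            }
        }
    }
}
```

Swapping the order of the two URLs flips the result, which matches the report that listing the spark bundle first makes the error disappear.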
[jira] [Updated] (HUDI-4548) Unpack the column max/min to string instead of Utf8 for Mor table
[ https://issues.apache.org/jira/browse/HUDI-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4548:
---------------------------------
    Labels: pull-request-available  (was: )

> Unpack the column max/min to string instead of Utf8 for Mor table
>
>                 Key: HUDI-4548
>                 URL: https://issues.apache.org/jira/browse/HUDI-4548
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.12.0
>            Reporter: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
[GitHub] [hudi] danny0405 opened a new pull request, #6311: [HUDI-4548] Unpack the column max/min to string instead of Utf8 for M…
danny0405 opened a new pull request, #6311: URL: https://github.com/apache/hudi/pull/6311

…or table

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Created] (HUDI-4548) Unpack the column max/min to string instead of Utf8 for Mor table
Danny Chen created HUDI-4548:
--------------------------------

             Summary: Unpack the column max/min to string instead of Utf8 for Mor table
                 Key: HUDI-4548
                 URL: https://issues.apache.org/jira/browse/HUDI-4548
             Project: Apache Hudi
          Issue Type: Bug
          Components: core
    Affects Versions: 0.12.0
            Reporter: Danny Chen
             Fix For: 0.12.0
[GitHub] [hudi] xushiyan opened a new pull request, #6310: [HUDI-4474] Fix inferring props for meta sync
xushiyan opened a new pull request, #6310: URL: https://github.com/apache/hudi/pull/6310

- `HoodieConfig#setDefaults` looks up declared fields, so it should be passed the static class for reflection; otherwise, subclasses of HoodieSyncConfig won't set defaults properly
- Pass all write client configs of deltastreamer to meta sync
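[Editorial note] The reflection pitfall behind the first bullet is that `Class#getDeclaredFields()` returns only fields declared directly on the class it is called on, never fields inherited from a superclass. So resolving config keys off a subclass's runtime class silently skips everything the parent declares, which is why the fix passes the declaring class explicitly. A minimal sketch of that Java behavior (the config classes here are hypothetical stand-ins, not Hudi's actual types):

```java
import java.lang.reflect.Field;

public class DeclaredFieldsDemo {
    // Hypothetical stand-ins for HoodieSyncConfig and a subclass of it
    static class BaseSyncConfig {
        public static String BASE_PATH_KEY = "hoodie.base.path";
    }

    static class GlueSyncConfig extends BaseSyncConfig {
        public static String GLUE_DB_KEY = "hoodie.glue.database";
    }

    public static void main(String[] args) {
        // getDeclaredFields() sees only fields declared on the class itself,
        // not fields inherited from the superclass
        Field[] subOnly = GlueSyncConfig.class.getDeclaredFields();
        Field[] baseOnly = BaseSyncConfig.class.getDeclaredFields();

        // Reflecting on the subclass misses BASE_PATH_KEY entirely
        assert subOnly.length == 1 : "subclass reflection misses inherited keys";
        assert subOnly[0].getName().equals("GLUE_DB_KEY");

        // Only reflecting on the declaring (base) class recovers BASE_PATH_KEY,
        // hence setDefaults should receive the static declaring class
        assert baseOnly.length == 1;
        assert baseOnly[0].getName().equals("BASE_PATH_KEY");
    }
}
```

Run with `java -ea` to enable the assertions; without the explicit base-class pass, a default declared on the parent would never be set on the subclass instance.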
[jira] [Updated] (HUDI-4547) Partition sorting does not take effect when use bucket_insert.
[ https://issues.apache.org/jira/browse/HUDI-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-4547:
---------------------------------
    Labels: pull-request-available  (was: )

> Partition sorting does not take effect when use bucket_insert.
>
>                 Key: HUDI-4547
>                 URL: https://issues.apache.org/jira/browse/HUDI-4547
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>            Reporter: HunterHunter
>            Priority: Major
>              Labels: pull-request-available
>
> https://github.com/apache/hudi/issues/6301
[GitHub] [hudi] LinMingQiang opened a new pull request, #6309: [HUDI-4547] fix Partition sorting does not take effect when use bucke…
LinMingQiang opened a new pull request, #6309: URL: https://github.com/apache/hudi/pull/6309

…t_insert.

Signed-off-by: HunterXHunter <1356469...@qq.com>

https://github.com/apache/hudi/issues/6301
[jira] [Updated] (HUDI-4385) Support to trigger the compaction in the flink batch mode.
[ https://issues.apache.org/jira/browse/HUDI-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated HUDI-4385:
-----------------------------
    Fix Version/s: 0.12.0
                       (was: 0.13.0)

> Support to trigger the compaction in the flink batch mode.
>
>                 Key: HUDI-4385
>                 URL: https://issues.apache.org/jira/browse/HUDI-4385
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: flink
>            Reporter: HunterHunter
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
> Configure parameter `compaction.batch.mode.enabled` to decide whether to enable offline `compaction`, so users no longer need to perform `offline compaction` separately.
[jira] [Updated] (HUDI-4348) merge into will cause data quality in concurrent scene
[ https://issues.apache.org/jira/browse/HUDI-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-4348:
------------------------------
    Fix Version/s: 0.12.0

> merge into will cause data quality in concurrent scene
>
>                 Key: HUDI-4348
>                 URL: https://issues.apache.org/jira/browse/HUDI-4348
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: KnightChess
>            Assignee: KnightChess
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
> a hudi table with 15 billion pieces of data, the update records has 30 million every day, the 1000 records is different with hive table.
>
> when I set `executor-cores 1` and `spark.task.cpus 1`, there is no problem, but when the parallelism over 1 in every executor, the data quality will appear.
[jira] [Closed] (HUDI-4348) merge into will cause data quality in concurrent scene
[ https://issues.apache.org/jira/browse/HUDI-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-4348.
-----------------------------
    Resolution: Fixed

> merge into will cause data quality in concurrent scene
>
>                 Key: HUDI-4348
>                 URL: https://issues.apache.org/jira/browse/HUDI-4348
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: KnightChess
>            Assignee: KnightChess
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
> a hudi table with 15 billion pieces of data, the update records has 30 million every day, the 1000 records is different with hive table.
>
> when I set `executor-cores 1` and `spark.task.cpus 1`, there is no problem, but when the parallelism over 1 in every executor, the data quality will appear.
[jira] [Closed] (HUDI-4217) improve repeat init object in ExpressionPayload
[ https://issues.apache.org/jira/browse/HUDI-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-4217.
-----------------------------
    Resolution: Fixed

> improve repeat init object in ExpressionPayload
>
>                 Key: HUDI-4217
>                 URL: https://issues.apache.org/jira/browse/HUDI-4217
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: KnightChess
>            Assignee: KnightChess
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
>         Attachments: flamegraph4.svg, image-2022-06-10-10-07-45-715.png
>
> ExpressionPayload will repeat init object in the same schema, it cost lots of cpu time
[jira] [Updated] (HUDI-4348) merge into will cause data quality in concurrent scene
[ https://issues.apache.org/jira/browse/HUDI-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-4348:
------------------------------
    Priority: Blocker  (was: Major)

> merge into will cause data quality in concurrent scene
>
>                 Key: HUDI-4348
>                 URL: https://issues.apache.org/jira/browse/HUDI-4348
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: KnightChess
>            Assignee: KnightChess
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
> a hudi table with 15 billion pieces of data, the update records has 30 million every day, the 1000 records is different with hive table.
>
> when I set `executor-cores 1` and `spark.task.cpus 1`, there is no problem, but when the parallelism over 1 in every executor, the data quality will appear.
[jira] [Commented] (HUDI-4541) Flink job fails with column stats enabled in metadata table due to NotSerializableException
[ https://issues.apache.org/jira/browse/HUDI-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575564#comment-17575564 ]

Danny Chen commented on HUDI-4541:
----------------------------------

You can try per-job submission mode instead.

> Flink job fails with column stats enabled in metadata table due to NotSerializableException
>
>                 Key: HUDI-4541
>                 URL: https://issues.apache.org/jira/browse/HUDI-4541
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink-sql
>            Reporter: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.12.0
>
>         Attachments: Screen Shot 2022-08-04 at 17.10.05.png
>
> Environment: EMR 6.7.0 Flink 1.14.2
> Reproducible steps: Build Hudi Flink bundle from master
> {code:java}
> mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code}
> Copy to EMR master node /lib/flink/lib
> Launch Flink SQL client:
> {code:java}
> cd /lib/flink && ./bin/yarn-session.sh --detached
> ./bin/sql-client.sh {code}
> Run the following from the Flink quick start guide with metadata table, column stats, and data skipping enabled
> {code:java}
> CREATE TABLE t1(
>   uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
>   name VARCHAR(10),
>   age INT,
>   ts TIMESTAMP(3),
>   `partition` VARCHAR(20)
> )
> PARTITIONED BY (`partition`)
> WITH (
>   'connector' = 'hudi',
>   'path' = 's3a://',
>   'table.type' = 'MERGE_ON_READ', -- this creates a MERGE_ON_READ table, by default is COPY_ON_WRITE
>   'metadata.enabled' = 'true', -- enables multi-modal index and metadata table
>   'hoodie.metadata.index.column.stats.enable' = 'true', -- enables column stats in metadata table
>   'read.data.skipping.enabled' = 'true' -- enables data skipping
> );
> INSERT INTO t1 VALUES
>   ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
>   ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
>   ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
>   ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
>   ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
>   ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
>   ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
>   ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); {code}
> !Screen Shot 2022-08-04 at 17.10.05.png|width=1130,height=463!
> Exception:
> {code:java}
> 2022-08-04 17:04:41
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228)
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218)
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209)
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679)
>     at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
>     at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
>     at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
>     at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
>     at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
>     at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.
[GitHub] [hudi] hudi-bot commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
hudi-bot commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1206034476

## CI report:

* 137f2e09f90bc9f179f3c94c844be7be5e5f2325 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10593)
* 04e513ba7885d107713277a0a7964c3a082d7405 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10598)
[GitHub] [hudi] hudi-bot commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
hudi-bot commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1206032240

## CI report:

* 137f2e09f90bc9f179f3c94c844be7be5e5f2325 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10593)
* 04e513ba7885d107713277a0a7964c3a082d7405 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
hudi-bot commented on PR #6046: URL: https://github.com/apache/hudi/pull/6046#issuecomment-1206031953

## CI report:

* dfd50cd0007c4ff48b3e0e27c368d573e47560a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10486)
* 5a6ac9622379715e890f1ec1cd7be9422febeb5c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10597)
[jira] [Created] (HUDI-4547) Partition sorting does not take effect when use bucket_insert.
HunterHunter created HUDI-4547:
----------------------------------

             Summary: Partition sorting does not take effect when use bucket_insert.
                 Key: HUDI-4547
                 URL: https://issues.apache.org/jira/browse/HUDI-4547
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink
            Reporter: HunterHunter

https://github.com/apache/hudi/issues/6301
[GitHub] [hudi] eric9204 opened a new issue, #6308: [SUPPORT] Spark multi writer failed,seems like clazz conflict ! ! !
eric9204 opened a new issue, #6308: URL: https://github.com/apache/hudi/issues/6308

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

A clear and concise description of the problem.

**To Reproduce**

Steps to reproduce the behavior:
1.
2.
3.
4.

**Expected behavior**

A clear and concise description of what you expected to happen.

**Environment Description**

* Hudi version :
* Spark version :
* Hive version :
* Hadoop version :
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) :

**Additional context**

Add any other context about the problem here.

**Stacktrace**

```Add the stacktrace of the error.```
[GitHub] [hudi] hudi-bot commented on pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy
hudi-bot commented on PR #6267: URL: https://github.com/apache/hudi/pull/6267#issuecomment-1206029678

## CI report:

* 43899bb9bf0456c877213ca8bf8641d8258d6903 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10594)
[GitHub] [hudi] hudi-bot commented on pull request #6246: [HUDI-4543] able to disable precombine field when table schema contains a field named ts
hudi-bot commented on PR #6246: URL: https://github.com/apache/hudi/pull/6246#issuecomment-1206029611

## CI report:

* 7b04e73fecb574e199a3aad9e74dd6c9ae45d123 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10592)
[GitHub] [hudi] hudi-bot commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
hudi-bot commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1206029749

## CI report:

* 137f2e09f90bc9f179f3c94c844be7be5e5f2325 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10593)
[GitHub] [hudi] hudi-bot commented on pull request #6141: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or archived for MOR table
hudi-bot commented on PR #6141: URL: https://github.com/apache/hudi/pull/6141#issuecomment-1206029481

## CI report:

* 23f96b3ecc8812ffae7f9e692e883cdabba03eb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10541)
* 2a493fcafb42e21cbfcae3787ab30853319f4bf3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10596)
[GitHub] [hudi] hudi-bot commented on pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance
hudi-bot commented on PR #6046: URL: https://github.com/apache/hudi/pull/6046#issuecomment-1206029376

## CI report:

* dfd50cd0007c4ff48b3e0e27c368d573e47560a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10486)
* 5a6ac9622379715e890f1ec1cd7be9422febeb5c UNKNOWN
[GitHub] [hudi] xiaozhch5 closed issue #6301: [SUPPORT] Flink uses bulk_insert mode to load the data from hdfs file to hudi very slow.
xiaozhch5 closed issue #6301: [SUPPORT] Flink uses bulk_insert mode to load the data from hdfs file to hudi very slow. URL: https://github.com/apache/hudi/issues/6301
[GitHub] [hudi] hudi-bot commented on pull request #6141: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table
hudi-bot commented on PR #6141: URL: https://github.com/apache/hudi/pull/6141#issuecomment-1206007248

## CI report:

* 23f96b3ecc8812ffae7f9e692e883cdabba03eb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10541)
* 2a493fcafb42e21cbfcae3787ab30853319f4bf3 UNKNOWN
[GitHub] [hudi] yuzhaojing commented on issue #6126: [SUPPORT] Hudi Table(MOR) not getting created from Flink Sql Client Shell
yuzhaojing commented on issue #6126: URL: https://github.com/apache/hudi/issues/6126#issuecomment-1206006030

Sure, I will try to reproduce it.
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206003270

## CI report:

* 5e75dee8c56cb14110b33548c09aad222adc57d2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10595)
[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.
wzx140 commented on code in PR #5629: URL: https://github.com/apache/hudi/pull/5629#discussion_r938412210

## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/util/HoodieSparkRecordUtils.java:
## @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.util;
+
+import org.apache.hudi.HoodieInternalRowUtils;
+import org.apache.hudi.commmon.model.HoodieSparkRecord;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieOperation;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.keygen.RowKeyGeneratorHelper;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+
+import scala.Tuple2;
+
+public class HoodieSparkRecordUtils {
+
+  /**
+   * Utility method to convert bytes to HoodieRecord using schema and payload class.
+   */
+  public static HoodieRecord convertToHoodieSparkRecord(InternalRow data, StructType structType) {
+    return new HoodieSparkRecord(data, structType);
+  }
+
+  /**
+   * Utility method to convert InternalRow to HoodieRecord using schema and payload class.
+   */
+  public static HoodieRecord convertToHoodieSparkRecord(StructType structType, InternalRow data, String preCombineField, boolean withOperationField) {
+    return convertToHoodieSparkRecord(structType, data, preCombineField,
+        Pair.of(HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieRecord.PARTITION_PATH_METADATA_FIELD),
+        withOperationField, Option.empty());
+  }
+
+  public static HoodieRecord convertToHoodieSparkRecord(StructType structType, InternalRow data, String preCombineField, boolean withOperationField,
+      Option<String> partitionName) {
+    return convertToHoodieSparkRecord(structType, data, preCombineField,
+        Pair.of(HoodieRecord.RECORD_KEY_METADATA_FIELD, HoodieRecord.PARTITION_PATH_METADATA_FIELD),
+        withOperationField, partitionName);
+  }
+
+  /**
+   * Utility method to convert bytes to HoodieRecord using schema and payload class.
+   */
+  public static HoodieRecord convertToHoodieSparkRecord(StructType structType, InternalRow data, String preCombineField, Pair<String, String> recordKeyPartitionPathFieldPair,
+      boolean withOperationField, Option<String> partitionName) {
+    final String recKey = getValue(structType, recordKeyPartitionPathFieldPair.getKey(), data).toString();
+    final String partitionPath = (partitionName.isPresent() ? partitionName.get() :
+        getValue(structType, recordKeyPartitionPathFieldPair.getRight(), data).toString());
+
+    Object preCombineVal = getPreCombineVal(structType, data, preCombineField);

Review Comment:
   @alexeykudinkin Thank you for your advice, I will study it carefully. This sounds reasonable.
[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.
wzx140 commented on code in PR #5629: URL: https://github.com/apache/hudi/pull/5629#discussion_r938408127 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordMerger.java: ## @@ -30,9 +34,19 @@ * It can implement the merging logic of HoodieRecord of different engines * and avoid the performance consumption caused by the serialization/deserialization of Avro payload. */ -public interface HoodieMerge extends Serializable { - - HoodieRecord preCombine(HoodieRecord older, HoodieRecord newer); +@PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING) +public interface HoodieRecordMerger extends Serializable { + + /** + * This method converges combineAndGetUpdateValue and precombine from HoodiePayload. + * It'd be associative operation: f(a, f(b, c)) = f(f(a, b), c) (which we can translate as having 3 versions A, B, C + * of the single record, both orders of operations applications have to yield the same result) + */ + Option merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException; Review Comment: @alexeykudinkin Maybe vc means the merge function will deduplicate the records for insertion. Do you think we should put the shouldCombine marker in the record? cc @vinothchandar
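The associativity contract quoted in the review above — f(a, f(b, c)) = f(f(a, b), c) — can be sketched with a minimal, hypothetical merger that keeps the record with the larger ordering value; since max() is itself associative, either grouping of three record versions yields the same survivor. The class and field names below are illustrative only, not Hudi's actual `HoodieRecordMerger` API.

```java
// Hypothetical record with a precombine/ordering value.
class SimpleRecord {
    final String key;
    final long orderingValue;
    SimpleRecord(String key, long orderingValue) {
        this.key = key;
        this.orderingValue = orderingValue;
    }
}

class LatestWinsMerger {
    // f(older, newer): keep whichever side has the larger ordering value;
    // ties go to the newer record. This is associative because max() is.
    static SimpleRecord merge(SimpleRecord older, SimpleRecord newer) {
        return newer.orderingValue >= older.orderingValue ? newer : older;
    }
}
```

A merger that depended on arrival order (e.g. "always keep newer") would break this property, which is presumably why the javadoc calls it out.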
[GitHub] [hudi] hudi-bot commented on pull request #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
hudi-bot commented on PR #6307: URL: https://github.com/apache/hudi/pull/6307#issuecomment-1206001322 ## CI report: * 5e75dee8c56cb14110b33548c09aad222adc57d2 UNKNOWN
[GitHub] [hudi] codope commented on issue #6024: [SUPPORT] DELETE_PARTITION causes AWS Athena Query failure
codope commented on issue #6024: URL: https://github.com/apache/hudi/issues/6024#issuecomment-1205997925 Btw, `org_id=5_\$folder$` may be an S3 thing. Did the partition `org_id=5` ever exist before?
[GitHub] [hudi] trushev commented on a diff in pull request #6276: [HUDI-4523] Sequential submitting of flink jobs leads to java.net.ConnectException
trushev commented on code in PR #6276: URL: https://github.com/apache/hudi/pull/6276#discussion_r938401579 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java: ## @@ -124,10 +124,8 @@ public FileSystemViewManager getViewManager() { return viewManager; } - public boolean canReuseFor(String basePath) { -return this.server != null -&& this.viewManager != null -&& this.basePath.equals(basePath); + public boolean canReuse() { +return this.server != null && this.viewManager != null; } Review Comment: I've checked the same timeline service with different `basePath` values. It works in my test. Do you think we need to reuse the same timeline service for different paths?
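The diff above relaxes the reuse check: before, a cached embedded timeline service was reused only for a matching base path; after, any service whose server and view manager are both alive can be reused. A minimal sketch of the two checks, under assumed names (not the actual `EmbeddedTimelineService` class):

```java
// Illustrative cache entry for an embedded timeline service.
class TimelineServiceCache {
    private final Object server;      // stands in for the HTTP server handle
    private final Object viewManager; // stands in for FileSystemViewManager
    private final String basePath;

    TimelineServiceCache(Object server, Object viewManager, String basePath) {
        this.server = server;
        this.viewManager = viewManager;
        this.basePath = basePath;
    }

    // Old check: required the table base path to match as well.
    boolean canReuseFor(String requestedBasePath) {
        return canReuse() && this.basePath.equals(requestedBasePath);
    }

    // New check: any live service may serve requests, regardless of path.
    boolean canReuse() {
        return server != null && viewManager != null;
    }
}
```

The reviewer's question is exactly the delta between the two methods: whether one live service should be shared across tables with different base paths.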
[jira] [Created] (HUDI-4546) Optimize catalog cast logic in HoodieSpark3Analysis
leesf created HUDI-4546: --- Summary: Optimize catalog cast logic in HoodieSpark3Analysis Key: HUDI-4546 URL: https://issues.apache.org/jira/browse/HUDI-4546 Project: Apache Hudi Issue Type: Improvement Reporter: leesf Assignee: leesf In HoodieSpark3Analysis, if it is CreateV2Table, there is no need to cast the HoodieCatalog since CreateV2Table contains TableCatalog and we would use it directly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Attachment: spring-shell-1.2.0.RELEASE.jar > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empty result for command show > fsview all · Issue #6177 · apache/hudi > (github.com)|https://github.com/apache/hudi/issues/6177] > **Describe the problem you faced** > Hudi cli got empty result after running command show fsview all. > ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) > The type of table t1 is COW and I am sure that the parquet file is actually > generated inside data folder. Also, the parquet files are not damaged as the > data could be retrieved correctly by reading as Hudi table or directly > reading each parquet file(using Spark). > **To Reproduce** > Steps to reproduce the behavior: > 1. Enter Flink SQL client. > 2. Execute the SQL and check the data was written successfully. 
> ```sql > CREATE TABLE t1( > uuid VARCHAR(20), > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 'hdfs:///path/to/table/', > 'table.type' = 'COPY_ON_WRITE' > ); > -- insert data using values > INSERT INTO t1 VALUES > ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); > ``` > 3. Enter Hudi cli and execute `show fsview all` > **Expected behavior** > `show fsview all` in Hudi cli should return all file slices. > **Environment Description** > * Hudi version : 0.11.1 > * Spark version : 3.1.1 > * Hive version : 3.1.0 > * Hadoop version : 3.1.1 > * Storage (HDFS/S3/GCS..) : HDFS > * Running on Docker? (yes/no) : no > **Additional context** > No. > **Stacktrace** > N/A > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4546) Optimize catalog cast logic in HoodieSpark3Analysis
[ https://issues.apache.org/jira/browse/HUDI-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4546: - Labels: pull-request-available (was: ) > Optimize catalog cast logic in HoodieSpark3Analysis > --- > > Key: HUDI-4546 > URL: https://issues.apache.org/jira/browse/HUDI-4546 > Project: Apache Hudi > Issue Type: Improvement >Reporter: leesf >Assignee: leesf >Priority: Major > Labels: pull-request-available > > In HoodieSpark3Analysis, if it is CreateV2Table, there is no need to cast the > HoodieCatalog since CreateV2Table contains TableCatalog and we would use it > directly.
[GitHub] [hudi] leesf opened a new pull request, #6307: [HUDI-4546] Optimize catalog cast logic in HoodieSpark3Analysis
leesf opened a new pull request, #6307: URL: https://github.com/apache/hudi/pull/6307 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Description: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] {*}{{*}}Describe the problem you faced{{*}}{*} Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). {*}{{*}}To Reproduce{{*}}{*} Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` {*}{{*}}Expected behavior{{*}}{*} `show fsview all` in Hudi cli should return all file slices. 
{*}{{*}}Environment Description{{*}}{*} * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no {*}{{*}}Additional context{{*}}{*} No. {*}{{*}}Stacktrace{{*}}{*} N/A Temporary solution: I modified and recompiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. was: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] *{*}Describe the problem you faced{*}* Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). *{*}To Reproduce{*}* Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. 
```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` *{*}Expected behavior{*}* `show fsview all` in Hudi cli should return all file slices. *{*}Environment Description{*}* * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no *{*}Additional context{*}* No. *{*}Stacktrace{*}* N/A Temporary solution: I modified and reocmpiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version :
[jira] [Updated] (HUDI-4485) Hudi cli got empty result for command show fsview all
[ https://issues.apache.org/jira/browse/HUDI-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Zhang updated HUDI-4485: Description: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] *{*}Describe the problem you faced{*}* Hudi cli got empty result after running command show fsview all. ![image]([https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png]) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). *{*}To Reproduce{*}* Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); – insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. Enter Hudi cli and execute `show fsview all` *{*}Expected behavior{*}* `show fsview all` in Hudi cli should return all file slices. 
*{*}Environment Description{*}* * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no *{*}Additional context{*}* No. *{*}Stacktrace{*}* N/A Temporary solution: I modified and reocmpiled spring-shell 1.2.0.RELEASE. Please download the attachment and replace the same file in ${HUDI_CLI_DIR}/target/lib/. was: This issue is from: [[SUPPORT] Hudi cli got empty result for command show fsview all · Issue #6177 · apache/hudi (github.com)|https://github.com/apache/hudi/issues/6177] **Describe the problem you faced** Hudi cli got empty result after running command show fsview all. ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) The type of table t1 is COW and I am sure that the parquet file is actually generated inside data folder. Also, the parquet files are not damaged as the data could be retrieved correctly by reading as Hudi table or directly reading each parquet file(using Spark). **To Reproduce** Steps to reproduce the behavior: 1. Enter Flink SQL client. 2. Execute the SQL and check the data was written successfully. ```sql CREATE TABLE t1( uuid VARCHAR(20), name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 'hdfs:///path/to/table/', 'table.type' = 'COPY_ON_WRITE' ); -- insert data using values INSERT INTO t1 VALUES ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'), ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'), ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'), ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'), ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'), ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'), ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'), ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4'); ``` 3. 
Enter Hudi cli and execute `show fsview all` **Expected behavior** `show fsview all` in Hudi cli should return all file slices. **Environment Description** * Hudi version : 0.11.1 * Spark version : 3.1.1 * Hive version : 3.1.0 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : HDFS * Running on Docker? (yes/no) : no **Additional context** No. **Stacktrace** N/A > Hudi cli got empty result for command show fsview all > - > > Key: HUDI-4485 > URL: https://issues.apache.org/jira/browse/HUDI-4485 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Affects Versions: 0.11.1 > Environment: Hudi version : 0.11.1 > Spark version : 3.1.1 > Hive version : 3.1.0 > Hadoop version : 3.1.1 >Reporter: Yao Zhang >Priority: Minor > Fix For: 0.13.0 > > Attachments: spring-shell-1.2.0.RELEASE.jar > > > This issue is from: [[SUPPORT] Hudi cli got empt
[GitHub] [hudi] hudi-bot commented on pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy
hudi-bot commented on PR #6267: URL: https://github.com/apache/hudi/pull/6267#issuecomment-1205974002 ## CI report: * Unknown: [CANCELED](TBD) * 43899bb9bf0456c877213ca8bf8641d8258d6903 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10594)
[GitHub] [hudi] flashJd commented on pull request #6246: [HUDI-4543] able to disable precombine field when table schema contains a field named ts
flashJd commented on PR #6246: URL: https://github.com/apache/hudi/pull/6246#issuecomment-1205972641 > I've filed a Jira ticket and changed the commit title, also fixed the checkStyle conflicts; can you help approve the workflow? Thanks.
[GitHub] [hudi] flashJd commented on pull request #6246: [HUDI-4543] able to disable precombine field when table schema contains a field named ts
flashJd commented on PR #6246: URL: https://github.com/apache/hudi/pull/6246#issuecomment-1205972885 > Thanks for the contribution, can we log a JIRA issue and change the commit title to a form like: `[HUDI-${JIRA issue ID}] ${your actual commit title}`. I've filed a Jira ticket and changed the commit title, also fixed the checkStyle conflicts; can you help approve the workflow? Thanks.
[GitHub] [hudi] hudi-bot commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
hudi-bot commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1205971247 ## CI report: * 137f2e09f90bc9f179f3c94c844be7be5e5f2325 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10593)
[GitHub] [hudi] hudi-bot commented on pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy
hudi-bot commented on PR #6267: URL: https://github.com/apache/hudi/pull/6267#issuecomment-1205971169 ## CI report: * Unknown: [CANCELED](TBD) * 43899bb9bf0456c877213ca8bf8641d8258d6903 UNKNOWN
[GitHub] [hudi] Zouxxyy commented on pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy
Zouxxyy commented on PR #6267: URL: https://github.com/apache/hudi/pull/6267#issuecomment-1205969782 @hudi-bot run azure
[GitHub] [hudi] Zouxxyy commented on a diff in pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy
Zouxxyy commented on code in PR #6267: URL: https://github.com/apache/hudi/pull/6267#discussion_r938389569 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -248,17 +248,17 @@ private Pair> getFilesToCleanKeepingLatestVersions( while (fileSliceIterator.hasNext() && keepVersions > 0) { // Skip this most recent version +fileSliceIterator.next(); +keepVersions--; + } + // Delete the remaining files + while (fileSliceIterator.hasNext()) { Review Comment: I added a test case, hope it helps
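The two-loop walk in the CleanPlanner diff above can be sketched in isolation: skip the N most recent file slices, then collect everything older for deletion. This is an illustrative reduction only — the real method also handles savepointed files, pending compactions, and commit retention, all omitted here, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class KeepLatestVersions {
    // slicesNewestFirst: file-slice ids ordered newest to oldest.
    // keepVersions: how many of the most recent versions to retain.
    static List<String> filesToClean(List<String> slicesNewestFirst, int keepVersions) {
        List<String> toDelete = new ArrayList<>();
        Iterator<String> it = slicesNewestFirst.iterator();
        while (it.hasNext() && keepVersions > 0) {
            it.next();       // skip this most recent version
            keepVersions--;
        }
        while (it.hasNext()) {
            toDelete.add(it.next()); // everything older is cleaned
        }
        return toDelete;
    }
}
```

The bug the PR title describes (HUDI-4515) concerns savepointed slices being swept up by the second loop; the fix restructures exactly this skip/delete boundary.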
[GitHub] [hudi] YannByron commented on pull request #6267: [HUDI-4515] Fix savepoints will be cleaned in keeping latest versions policy
YannByron commented on PR #6267: URL: https://github.com/apache/hudi/pull/6267#issuecomment-1205968918 @nsivabalan please trigger the workflows.
[GitHub] [hudi] hudi-bot commented on pull request #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
hudi-bot commented on PR #6306: URL: https://github.com/apache/hudi/pull/6306#issuecomment-1205968594 ## CI report: * 137f2e09f90bc9f179f3c94c844be7be5e5f2325 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6246: [HUDI-4543] able to disable precombine field when table schema contains a field named ts
hudi-bot commented on PR #6246: URL: https://github.com/apache/hudi/pull/6246#issuecomment-1205962815 ## CI report: * 39773f8cf8f7a8441963080fd43d8ea04f1a74c9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10586) * 7b04e73fecb574e199a3aad9e74dd6c9ae45d123 UNKNOWN
[jira] [Updated] (HUDI-4543) can't enable the proc_time/natural order sequence semantics when a ts field exists in the table schema
[ https://issues.apache.org/jira/browse/HUDI-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4543: - Labels: pull-request-available (was: ) > can't enable the proc_time/natural order sequence semantics when a ts field > exists in the table schema > -- > > Key: HUDI-4543 > URL: https://issues.apache.org/jira/browse/HUDI-4543 > Project: Apache Hudi > Issue Type: Bug >Reporter: yonghua jian >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (HUDI-4545) Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload
[ https://issues.apache.org/jira/browse/HUDI-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4545: - Labels: pull-request-available (was: ) > Do not modify the current record directly for > OverwriteNonDefaultsWithLatestAvroPayload > --- > > Key: HUDI-4545 > URL: https://issues.apache.org/jira/browse/HUDI-4545 > Project: Apache Hudi > Issue Type: Bug > Components: core >Affects Versions: 0.12.0 >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > > Currently, we use short-cut logic: > {code:java} > a == b > // for example: HoodieMergeHandle#writeUpdateRecord > {code} > to decide whether the update happens, in principle, we should not modify the > records from disk directly, they should be kept as immutable, for any > changes, we should return new records instead.
[GitHub] [hudi] hudi-bot commented on pull request #6246: [HUDI-4543] able to disable precombine field when table schema contains a field named ts
hudi-bot commented on PR #6246: URL: https://github.com/apache/hudi/pull/6246#issuecomment-1205965800 ## CI report: * 39773f8cf8f7a8441963080fd43d8ea04f1a74c9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10586) * 7b04e73fecb574e199a3aad9e74dd6c9ae45d123 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10592)
[GitHub] [hudi] danny0405 opened a new pull request, #6306: [HUDI-4545] Do not modify the current record directly for OverwriteNo…
danny0405 opened a new pull request, #6306: URL: https://github.com/apache/hudi/pull/6306 …nDefaultsWithLatestAvroPayload ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-4545) Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload
Danny Chen created HUDI-4545: Summary: Do not modify the current record directly for OverwriteNonDefaultsWithLatestAvroPayload Key: HUDI-4545 URL: https://issues.apache.org/jira/browse/HUDI-4545 Project: Apache Hudi Issue Type: Bug Components: core Affects Versions: 0.12.0 Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.12.0 Currently, we use short-cut logic: {code:java} a == b // for example: HoodieMergeHandle#writeUpdateRecord {code} to decide whether an update has happened. In principle, we should not modify records read from disk directly; they should be kept immutable, and any changes should be returned as new records instead. -- This message was sent by Atlassian Jira (v8.20.10#820010)
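The point of HUDI-4545 can be illustrated with a small, hedged sketch (Python rather than Hudi's actual Java code, and simplified from OverwriteNonDefaultsWithLatestAvroPayload): a merge that never mutates the record read from disk, returning the original object when nothing changes and a new record otherwise, so a reference-equality check in the writer (Java's `a == b`, Python's `is`) reliably signals whether an update happened.

```python
# Hedged sketch, not Hudi's implementation. `merge_non_default` overwrites a
# field only when the incoming value differs from that field's default, and
# returns a NEW dict on change -- the on-disk record is never mutated.

def merge_non_default(current: dict, incoming: dict, defaults: dict) -> dict:
    """Return `current` itself when nothing changes, else a new merged dict."""
    changed = False
    merged = dict(current)  # copy; never modify the record read from disk
    for field, value in incoming.items():
        if value != defaults.get(field) and merged.get(field) != value:
            merged[field] = value
            changed = True
    return merged if changed else current

old = {"id": 1, "name": "a", "ts": 5}
new = {"id": 1, "name": "", "ts": 9}  # "" is the (skipped) default for name
result = merge_non_default(old, new, {"name": ""})
print(result is old)  # False: ts changed, so a brand-new record came back
print(old["ts"])      # 5: the original record is untouched
```

Because the original object is returned unchanged when no field differs, an identity check is enough to decide whether `writeUpdateRecord` actually needs to write anything.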
[hudi] 01/02: update dynamodb lock provider docs to include iam and additional dependencies
This is an automated email from the ASF dual-hosted git repository. wenningd pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git commit c47f323765b1abb5196b16d0224b5baa0940bc7e Author: atharvai AuthorDate: Thu Jul 21 11:11:09 2022 +0100 update dynamodb lock provider docs to include iam and additional dependencies --- website/docs/concurrency_control.md | 33 + website/docs/configurations.md | 2 +- 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/website/docs/concurrency_control.md b/website/docs/concurrency_control.md index e71cb4a8f2..689b7632f7 100644 --- a/website/docs/concurrency_control.md +++ b/website/docs/concurrency_control.md @@ -78,7 +78,10 @@ hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLoc hoodie.write.lock.dynamodb.table hoodie.write.lock.dynamodb.partition_key hoodie.write.lock.dynamodb.region +hoodie.write.lock.dynamodb.endpoint_url +hoodie.write.lock.dynamodb.billing_mode ``` + Also, to set up the credentials for accessing AWS resources, customers can pass the following props to Hudi jobs: ``` hoodie.aws.access.key @@ -87,6 +90,36 @@ hoodie.aws.session.token ``` If not configured, Hudi falls back to use [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
+ +The IAM policy for your service instance will need to include the following permissions: + +```json +{ + "Sid":"DynamoDBLocksTable", + "Effect": "Allow", + "Action": [ +"dynamodb:CreateTable", +"dynamodb:DeleteItem", +"dynamodb:DescribeTable", +"dynamodb:GetItem", +"dynamodb:PutItem", +"dynamodb:Scan", +"dynamodb:UpdateItem" + ], + "Resource": "arn:${Partition}:dynamodb:${Region}:${Account}:table/${TableName}" +} +``` +- `TableName` : same as `hoodie.write.lock.dynamodb.table` +- `Region`: same as `hoodie.write.lock.dynamodb.region` + +AWS SDK dependencies are not bundled with Hudi from v0.10.x and will need to be added to your classpath. +Add the following Maven packages (check the latest versions at time of install): +``` +com.amazonaws:dynamodb-lock-client +com.amazonaws:aws-java-sdk-dynamodb +com.amazonaws:aws-java-sdk-core +``` + ## Datasource Writer The `hudi-spark` module offers the DataSource API to write (and read) a Spark DataFrame into a Hudi table. diff --git a/website/docs/configurations.md b/website/docs/configurations.md index b92e40f06c..6dcb76179b 100644 --- a/website/docs/configurations.md +++ b/website/docs/configurations.md @@ -1696,7 +1696,7 @@ Configs that control DynamoDB based locking mechanisms required for concurrency `Config Class`: org.apache.hudi.config.DynamoDbBasedLockConfig > hoodie.write.lock.dynamodb.billing_mode -> For DynamoDB based lock provider, by default it is PAY_PER_REQUEST mode +> For DynamoDB based lock provider, by default it is PAY_PER_REQUEST mode. The alternative is PROVISIONED > **Default Value**: PAY_PER_REQUEST (Optional) > `Config Param: DYNAMODB_LOCK_BILLING_MODE` > `Since Version: 0.10.0`
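As a hedged illustration of wiring the lock-provider properties from the docs diff above into a writer's options (not an official snippet; the table, partition-key, and region values are placeholders, and the concurrency-mode key is assumed alongside the documented lock configs):

```python
# Hedged sketch: assembling the documented DynamoDB lock-provider configs as a
# dict of Hudi write options, e.g. for passing to a Spark DataFrame writer.
# Values such as "hudi-lock-table" and "us-east-1" are placeholders only.

def dynamodb_lock_opts(table: str, partition_key: str, region: str) -> dict:
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.write.lock.provider":
            "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
        "hoodie.write.lock.dynamodb.table": table,
        "hoodie.write.lock.dynamodb.partition_key": partition_key,
        "hoodie.write.lock.dynamodb.region": region,
        "hoodie.write.lock.dynamodb.billing_mode": "PAY_PER_REQUEST",
    }

opts = dynamodb_lock_opts("hudi-lock-table", "my_hudi_table", "us-east-1")
for key, value in sorted(opts.items()):
    print(f"{key}={value}")
```

A typical usage would be something like `df.write.format("hudi").options(**opts)` together with the usual table options; credentials are resolved as described above (explicit `hoodie.aws.*` props or the DefaultAWSCredentialsProviderChain).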
[hudi] 02/02: update versioned docs for dynamodb lock provider docs to include iam and additional dependencies
This is an automated email from the ASF dual-hosted git repository. wenningd pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git commit d89c0ce798f3842ab325ad2c7b879e554cf18214 Author: atharvai AuthorDate: Thu Jul 21 11:18:29 2022 +0100 update versioned docs for dynamodb lock provider docs to include iam and additional dependencies --- website/docs/concurrency_control.md| 1 - .../version-0.10.0/concurrency_control.md | 32 + .../version-0.10.1/concurrency_control.md | 33 +- .../version-0.11.0/concurrency_control.md | 33 +- .../version-0.11.1/concurrency_control.md | 33 +- 5 files changed, 128 insertions(+), 4 deletions(-) diff --git a/website/docs/concurrency_control.md b/website/docs/concurrency_control.md index 689b7632f7..25a523ee7c 100644 --- a/website/docs/concurrency_control.md +++ b/website/docs/concurrency_control.md @@ -90,7 +90,6 @@ hoodie.aws.session.token ``` If not configured, Hudi falls back to use [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html). 
- IAM policy for your service instance will need to add the following permissions: ```json diff --git a/website/versioned_docs/version-0.10.0/concurrency_control.md b/website/versioned_docs/version-0.10.0/concurrency_control.md index a9a0d5860c..fe38f102cd 100644 --- a/website/versioned_docs/version-0.10.0/concurrency_control.md +++ b/website/versioned_docs/version-0.10.0/concurrency_control.md @@ -78,7 +78,10 @@ hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLoc hoodie.write.lock.dynamodb.table hoodie.write.lock.dynamodb.partition_key hoodie.write.lock.dynamodb.region +hoodie.write.lock.dynamodb.endpoint_url +hoodie.write.lock.dynamodb.billing_mode ``` + Also, to set up the credentials for accessing AWS resources, customers can pass the following props to Hudi jobs: ``` hoodie.aws.access.key @@ -87,6 +90,35 @@ hoodie.aws.session.token ``` If not configured, Hudi falls back to use [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html). +IAM policy for your service instance will need to add the following permissions: + +```json +{ + "Sid":"DynamoDBLocksTable", + "Effect": "Allow", + "Action": [ +"dynamodb:CreateTable", +"dynamodb:DeleteItem", +"dynamodb:DescribeTable", +"dynamodb:GetItem", +"dynamodb:PutItem", +"dynamodb:Scan", +"dynamodb:UpdateItem" + ], + "Resource": "arn:${Partition}:dynamodb:${Region}:${Account}:table/${TableName}" +} +``` +- `TableName` : same as `hoodie.write.lock.dynamodb.partition_key` +- `Region`: same as `hoodie.write.lock.dynamodb.region` + +AWS SDK dependencies are not bundled with Hudi from v0.10.x and will need to be added to your classpath. 
+Add the following Maven packages (check the latest versions at time of install): +``` +com.amazonaws:dynamodb-lock-client +com.amazonaws:aws-java-sdk-dynamodb +com.amazonaws:aws-java-sdk-core +``` + ## Datasource Writer The `hudi-spark` module offers the DataSource API to write (and read) a Spark DataFrame into a Hudi table. diff --git a/website/versioned_docs/version-0.10.1/concurrency_control.md b/website/versioned_docs/version-0.10.1/concurrency_control.md index a9a0d5860c..6377c762bd 100644 --- a/website/versioned_docs/version-0.10.1/concurrency_control.md +++ b/website/versioned_docs/version-0.10.1/concurrency_control.md @@ -70,7 +70,6 @@ hoodie.write.lock.hivemetastore.table `The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.` **`Amazon DynamoDB`** based lock provider - Amazon DynamoDB based lock provides a simple way to support multi writing across different clusters ``` @@ -78,7 +77,10 @@ hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLoc hoodie.write.lock.dynamodb.table hoodie.write.lock.dynamodb.partition_key hoodie.write.lock.dynamodb.region +hoodie.write.lock.dynamodb.endpoint_url +hoodie.write.lock.dynamodb.billing_mode ``` + Also, to set up the credentials for accessing AWS resources, customers can pass the following props to Hudi jobs: ``` hoodie.aws.access.key @@ -87,6 +89,35 @@ hoodie.aws.session.token ``` If not configured, Hudi falls back to use [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html). +IAM policy for your service instance will need to add the following permissions: + +```json +{ + "Sid":"DynamoDBLocksTable", + "Effect": "Allow", + "Action": [ +"dynamodb:CreateTable", +"dynamodb:DeleteItem", +"dynamodb:DescribeTable", +"dynamodb:GetItem", +"dynamodb:PutIte
[hudi] branch asf-site updated (5a69d734b6 -> d89c0ce798)
This is an automated email from the ASF dual-hosted git repository. wenningd pushed a change to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git from 5a69d734b6 GitHub Actions build asf-site new c47f323765 update dynamodb lock provider docs to include iam and additional dependencies new d89c0ce798 update versioned docs for dynamodb lock provider docs to include iam and additional dependencies The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: website/docs/concurrency_control.md| 32 + website/docs/configurations.md | 2 +- .../version-0.10.0/concurrency_control.md | 32 + .../version-0.10.1/concurrency_control.md | 33 +- .../version-0.11.0/concurrency_control.md | 33 +- .../version-0.11.1/concurrency_control.md | 33 +- 6 files changed, 161 insertions(+), 4 deletions(-)
[GitHub] [hudi] zhedoubushishi merged pull request #6168: [DOCS] Update aws dynamodb lock provider docs
zhedoubushishi merged PR #6168: URL: https://github.com/apache/hudi/pull/6168 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zhedoubushishi commented on pull request #6168: [DOCS] Update aws dynamodb lock provider docs
zhedoubushishi commented on PR #6168: URL: https://github.com/apache/hudi/pull/6168#issuecomment-1205962690 @atharvai Thanks for updating the doc! LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full
XuQianJin-Stars commented on code in PR #6284: URL: https://github.com/apache/hudi/pull/6284#discussion_r938382783 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java: ## @@ -92,11 +92,12 @@ protected HoodieMergedLogRecordScanner(FileSystem fs, String basePath, List(maxMemorySizeInBytes, spillableMapBasePath, new DefaultSizeEstimator(), + this.records = new ExternalSpillableMap<>(maxMemorySizeInBytes, basePath + spillableMapBasePath, new DefaultSizeEstimator(), new HoodieRecordSizeEstimator(readerSchema), diskMapType, isBitCaskDiskMapCompressionEnabled); + Review Comment: > not sure if we can do this. spillableMapbase path is configurable. If one does not want "/tmp/" which is the default, they can always override using the configs. Requiring users to override this option separately is more troublesome; since the `basepath` will almost certainly be on a large data disk, the temporary directory can be placed on the same disk as the `basepath`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
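The idea debated in this review can be sketched as follows (a hedged Python illustration, not the PR's Java code; the `.hoodie/.temp/spill` subdirectory name is a hypothetical example). Note also that the diff's naive string concatenation (`basePath + spillableMapBasePath`) is fragile; joining path components explicitly avoids malformed paths while still keeping the spill directory on the same disk as the table's base path.

```python
# Hedged sketch: resolve the spillable-map directory under the table base path
# instead of a global /tmp, using an explicit path join rather than string
# concatenation. The default subdirectory name here is illustrative only.
import posixpath

def resolve_spill_dir(base_path: str,
                      spill_subdir: str = ".hoodie/.temp/spill") -> str:
    # Strip stray separators so the subdir stays relative to base_path.
    return posixpath.join(base_path.rstrip("/"), spill_subdir.lstrip("/"))

print(resolve_spill_dir("/data/warehouse/my_table"))
# -> /data/warehouse/my_table/.hoodie/.temp/spill
print(resolve_spill_dir("/data/warehouse/my_table/", "/tmp/spill"))
# -> /data/warehouse/my_table/tmp/spill  (no doubled or missing separator)
```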
[GitHub] [hudi] nsivabalan commented on a diff in pull request #6157: [HUDI-4431] Fix log file will not roll over to a new file
nsivabalan commented on code in PR #6157: URL: https://github.com/apache/hudi/pull/6157#discussion_r938380933 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java: ## @@ -94,7 +94,8 @@ private FSDataOutputStream getOutputStream() throws IOException, InterruptedExce Path path = logFile.getPath(); if (fs.exists(path)) { boolean isAppendSupported = StorageSchemes.isAppendSupported(fs.getScheme()); -if (isAppendSupported) { +boolean needRollOverToNewFile = fs.getFileStatus(path).getLen() > sizeThreshold; +if (isAppendSupported && !needRollOverToNewFile) { Review Comment: @XuQianJin-Stars : any updates on this end. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
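The condition under review can be sketched as follows (a hedged Python illustration, not the actual HoodieLogFormatWriter code): even on filesystems that support append, the writer should roll over to a new log file once the current file exceeds the size threshold, whereas the original code appended unconditionally whenever append was supported.

```python
# Hedged sketch of the fixed branch condition from the diff above.

def should_append(append_supported: bool, current_len: int,
                  size_threshold: int) -> bool:
    """True when the writer may keep appending to the existing log file."""
    need_rollover = current_len > size_threshold
    return append_supported and not need_rollover

print(should_append(True, 10, 100))   # True:  small file, keep appending
print(should_append(True, 200, 100))  # False: over threshold, roll over
print(should_append(False, 10, 100))  # False: scheme has no append support
```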
[hudi] branch asf-site updated: [DOCS] add description about clean policy based on hours (#6215)
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 80c2f59190 [DOCS] add description about clean policy based on hours (#6215) 80c2f59190 is described below commit 80c2f591908680b8cb1d7c2f815a37840af1ee15 Author: feiyang_deepnova <736320...@qq.com> AuthorDate: Fri Aug 5 09:50:48 2022 +0800 [DOCS] add description about clean policy based on hours (#6215) Co-authored-by: linfey --- website/versioned_docs/version-0.11.1/hoodie_cleaner.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/website/versioned_docs/version-0.11.1/hoodie_cleaner.md b/website/versioned_docs/version-0.11.1/hoodie_cleaner.md index 10f1aa2450..34c1cf11d1 100644 --- a/website/versioned_docs/version-0.11.1/hoodie_cleaner.md +++ b/website/versioned_docs/version-0.11.1/hoodie_cleaner.md @@ -23,6 +23,9 @@ disk for at least 5 hours, thereby preventing the longest running query from failing. This policy is useful when it is known how many MAX versions of the file does one want to keep at any given time. To achieve the same behaviour as before of preventing long running queries from failing, one should do their calculations based on data patterns. Alternatively, this policy is also useful if a user just wants to maintain 1 latest version of the file. +- **KEEP_LATEST_BY_HOURS**: This policy cleans up based on hours. It is simple and useful when you know how many hours of file versions you want to retain at any given time. + File versions corresponding to commits older than the configured number of retained hours are cleaned. + Currently you can configure this via the parameter 'hoodie.cleaner.hours.retained'. ### Configurations For details about all possible configurations and their default values see the [configuration docs](https://hudi.apache.org/docs/configurations#Compaction-Configs).
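The KEEP_LATEST_BY_HOURS policy documented above can be sketched as follows (a hedged Python illustration, not Hudi's actual cleaner implementation): commits whose commit time is older than `now` minus the configured number of retained hours become candidates for cleaning.

```python
# Hedged sketch of the KEEP_LATEST_BY_HOURS selection rule, where
# hours_retained plays the role of 'hoodie.cleaner.hours.retained'.
from datetime import datetime, timedelta

def commits_to_clean(commit_times: list, now: datetime,
                     hours_retained: int) -> list:
    """Return the commit times older than the retention cutoff."""
    cutoff = now - timedelta(hours=hours_retained)
    return [t for t in commit_times if t < cutoff]

now = datetime(2022, 8, 5, 12, 0)
commits = [datetime(2022, 8, 5, 1, 0),
           datetime(2022, 8, 5, 9, 0),
           datetime(2022, 8, 5, 11, 30)]
# With 5 retained hours the cutoff is 07:00, so only the 01:00 commit
# is selected for cleaning.
print(len(commits_to_clean(commits, now, 5)))  # 1
```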
[GitHub] [hudi] nsivabalan merged pull request #6215: [DOCS] Added the description of the cleaning policy
nsivabalan merged PR #6215: URL: https://github.com/apache/hudi/pull/6215 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #6216: [HUDI-4475] fix create table with not exists hoodie properties file
nsivabalan commented on PR #6216: URL: https://github.com/apache/hudi/pull/6216#issuecomment-1205950765 we added support to update hoodie.properties in a live environment, mainly to update some table properties like metadata-related props (list of partitions in metadata table). So, here is how the upgrade works so that it's fault-tolerant and recoverable. orig.hoodie.properties Step1: take back up. cp orig.hoodie.properties backup.hoodie.properties. Step2: delete orig.hoodie.properties Step3: create new hoodie.properties in memory w/ any new properties required. create orig.hoodie.properties. Step4: delete backup.hoodie.properties. Between step2 and step3, readers will read backup.hoodie.properties. The above is designed such that, if there is a crash at any point, we are safe and restarting the pipeline would suffice. ref: https://github.com/apache/hudi/blob/a75cc02273ae87c383ae1ed46f95006c366f70fc/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java#L344 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
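The four-step backup/swap protocol described in this comment can be sketched as follows (a hedged Python illustration, not Hudi's HoodieTableConfig code; the file names mirror the comment's naming). The key property is that between steps 2 and 3 the main file is missing, so readers fall back to the backup, and a crash at any step leaves at least one complete copy on disk.

```python
# Hedged sketch of the fault-tolerant hoodie.properties update protocol.
import os
import tempfile

PROPS, BACKUP = "hoodie.properties", "hoodie.properties.backup"

def update_props(dirpath: str, new_content: str) -> None:
    props = os.path.join(dirpath, PROPS)
    backup = os.path.join(dirpath, BACKUP)
    with open(props) as src, open(backup, "w") as dst:  # Step 1: back up
        dst.write(src.read())
    os.remove(props)                                    # Step 2: delete original
    with open(props, "w") as f:                         # Step 3: write new file
        f.write(new_content)
    os.remove(backup)                                   # Step 4: drop the backup

def read_props(dirpath: str) -> str:
    # Readers fall back to the backup while the main file is absent.
    props = os.path.join(dirpath, PROPS)
    backup = os.path.join(dirpath, BACKUP)
    with open(props if os.path.exists(props) else backup) as f:
        return f.read()

d = tempfile.mkdtemp()
with open(os.path.join(d, PROPS), "w") as f:
    f.write("hoodie.table.name=t1\n")
update_props(d, "hoodie.table.name=t1\nhoodie.metadata.enable=true\n")
print("hoodie.metadata.enable=true" in read_props(d))  # True
```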