[GitHub] [hudi] codope commented on a diff in pull request #7942: [HUDI-5753] Add docs for record payload
codope commented on code in PR #7942:
URL: https://github.com/apache/hudi/pull/7942#discussion_r1106780501

## website/docs/record_payload.md:

@@ -0,0 +1,97 @@
+---
+title: Record Payload
+keywords: [hudi, merge, upsert, precombine]
+---
+
+## Record Payload
+
+One of the core features of Hudi is the ability to incrementally upsert data, deduplicate and merge records on the fly.
+Additionally, users can implement their own custom logic to merge an input record with the record on storage. Record
+payload is an abstract representation of a Hudi record that enables this capability. As we shall see below, Hudi
+provides out-of-the-box support for different payloads for different use cases, and a new record merger API for
+optimized payload handling. But first, let us understand how the record payload is used in the Hudi upsert path.
+
+[figure: record flow through the Hudi upsert write path]
+
+The figure above shows the main stages that records go through while being written to a Hudi table. In the precombining
+stage, Hudi performs deduplication based on the payload implementation and the precombine key configured by the user.
+Further, on index lookup, Hudi identifies which records are being updated, and the record payload implementation tells
+Hudi how to merge the incoming record with the existing record on storage.
+
+### Existing Payloads
+
+#### OverwriteWithLatestAvroPayload
+
+This is the default record payload implementation. During precombining it breaks ties by picking the record with the
+greatest value of the precombine key (determined by calling `.compareTo()`), and while merging it simply picks the
+latest record. This gives latest-write-wins semantics.
+
+#### EventTimeAvroPayload
+
+Some use cases require merging records by event time, so the event time plays the role of an ordering field. This
+payload is particularly useful in the case of late-arriving data. For such use cases, users need to set
+the [payload event time field](/docs/configurations#RECORD_PAYLOAD) configuration.
+
+#### ExpressionPayload
+
+This payload is very useful when you want to merge or delete records based on a conditional expression, especially
+when updating records using a [`MERGE INTO`](/docs/quick-start-guide#mergeinto) statement.
+
+#### Payload to support partial update
+
+Typically, once the merge step resolves which record to pick, the record on storage is fully replaced by the resolved
+record. But in some cases the requirement is to update only certain fields rather than replace the whole record. This
+is called a partial update. `PartialUpdateAvroPayload` in Hudi provides out-of-the-box support for such use cases. To
+illustrate the point, let us look at a simple example.
+
+Let's say the ordering field is `ts` and the schema is:
+
+```
+{
+  "type": "record",
+  "name": "Order",
+  "fields": [
+    {"name": "id", "type": "string"},
+    {"name": "ts", "type": "long"},
+    {"name": "name", "type": ["null", "string"]},
+    {"name": "price", "type": ["null", "string"]}
+  ]
+}
+```
+
+Current record on storage:
+
+```
+id  ts  name    price
+1   2   name_1  null
+```
+
+Incoming record:
+
+```
+id  ts  name    price
+1   1   null    price_1
+```
+
+Result after merging with `PartialUpdateAvroPayload`:
+
+```
+id  ts  name    price
+1   2   name_1  price_1
+```

Review Comment:
`ts` is the ordering field, so the record with the higher value is picked. The null value for the `name` column in the incoming record indeed gets replaced by the value in the existing record.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
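The merge result in the example above can be sketched outside of Hudi. The following is a simplified, hypothetical model of `PartialUpdateAvroPayload` semantics (plain maps stand in for Avro `GenericRecord`s; the real implementation lives in `org.apache.hudi.common.model.PartialUpdateAvroPayload`): the record with the higher ordering value wins, and its null fields are back-filled from the other record.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch (not Hudi's actual code) of partial-update merge semantics.
public class PartialUpdateSketch {

    static Map<String, Object> merge(Map<String, Object> stored,
                                     Map<String, Object> incoming,
                                     String orderingField) {
        long storedOrd = ((Number) stored.get(orderingField)).longValue();
        long incomingOrd = ((Number) incoming.get(orderingField)).longValue();
        // The record with the higher ordering value wins.
        Map<String, Object> winner = incomingOrd >= storedOrd ? incoming : stored;
        Map<String, Object> loser = (winner == incoming) ? stored : incoming;
        Map<String, Object> result = new HashMap<>(winner);
        // putIfAbsent also replaces keys explicitly mapped to null, which is
        // exactly the "fill null fields from the other version" behavior.
        loser.forEach(result::putIfAbsent);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("id", "1"); stored.put("ts", 2L);
        stored.put("name", "name_1"); stored.put("price", null);

        Map<String, Object> incoming = new HashMap<>();
        incoming.put("id", "1"); incoming.put("ts", 1L);
        incoming.put("name", null); incoming.put("price", "price_1");

        // merged holds ts=2, name=name_1, price=price_1, matching the doc's example.
        Map<String, Object> merged = merge(stored, incoming, "ts");
        System.out.println(merged);
    }
}
```

Note that this reproduces only the semantics of the doc's example; the real payload also handles Avro schemas, delete markers, and type coercion.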
[GitHub] [hudi] codope commented on a diff in pull request #7942: [HUDI-5753] Add docs for record payload
codope commented on code in PR #7942:
URL: https://github.com/apache/hudi/pull/7942#discussion_r1106781029

## website/docs/record_payload.md:

+#### ExpressionPayload
+
+This payload is very useful when you want to merge or delete records based on some conditional expression, especially

Review Comment:
Didn't know that this is meant to be used internally. Is there a guard like that on payload class config? cc @alexeykudinkin
[GitHub] [hudi] codope commented on a diff in pull request #7942: [HUDI-5753] Add docs for record payload
codope commented on code in PR #7942:
URL: https://github.com/apache/hudi/pull/7942#discussion_r1106781711

## website/docs/record_payload.md:

+#### OverwriteWithLatestAvroPayload
+
+This is the default record payload implementation. It picks the record with the greatest value (determined by calling
+.compareTo() on the value of precombine key) to break ties and simply picks the latest record while merging. This gives

Review Comment:
That's true. But I wanted to keep things simple for the user, as it is a concepts doc. Towards the end, I have pointed to the FAQ, which has more details.
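The latest-write-wins precombine behavior discussed above can be illustrated with a small sketch. This is not Hudi's actual code; `Rec` is a hypothetical stand-in for a keyed record with a precombine (ordering) field. Among incoming records sharing the same record key, the one whose ordering value compares greatest is kept, with ties broken by whichever record is seen later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of precombine deduplication with latest-write-wins
// semantics, in the spirit of OverwriteWithLatestAvroPayload.
public class PrecombineSketch {

    static class Rec {
        final String key;      // record key
        final long ordering;   // precombine field value (e.g. ts)
        final String value;

        Rec(String key, long ordering, String value) {
            this.key = key;
            this.ordering = ordering;
            this.value = value;
        }
    }

    static List<Rec> precombine(List<Rec> incoming) {
        Map<String, Rec> byKey = new HashMap<>();
        for (Rec r : incoming) {
            // Keep the record with the greater ordering value; on ties the
            // later-seen record wins (latest write wins).
            byKey.merge(r.key, r, (oldR, newR) ->
                Long.compare(newR.ordering, oldR.ordering) >= 0 ? newR : oldR);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Rec> in = new ArrayList<>();
        in.add(new Rec("k1", 1L, "v1"));
        in.add(new Rec("k1", 3L, "v3"));  // greatest ordering for k1, so it survives
        in.add(new Rec("k1", 2L, "v2"));
        List<Rec> out = precombine(in);
        System.out.println(out.get(0).value);  // prints "v3"
    }
}
```

The comparison above mirrors calling `.compareTo()` on the precombine key value, as the doc describes.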
[GitHub] [hudi] clp007 opened a new issue, #7960: [SUPPORT]
clp007 opened a new issue, #7960:
URL: https://github.com/apache/hudi/issues/7960

**Describe the problem you faced**

There is a problem when synchronizing a Hudi table to BigQuery. I'm not sure what the problem is or how to solve it.

spark-submit --master yarn \
  --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
  --jars /opt/hudi-gcp-bundle-0.12.1.jar \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /opt/hudi-utilities-bundle_2.12-0.12.1.jar \
  --target-base-path gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
  --target-table bubble-pop-b01a0 \
  --table-type COPY_ON_WRITE \
  --base-file-format PARQUET \
  --enable-sync \
  --sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
  --hoodie-conf hoodie.gcp.bigquery.sync.project_id=transferred \
  --hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=temp_data \
  --hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=us-central1 \
  --hoodie-conf hoodie.gcp.bigquery.sync.table_name=temp_bubble-pop \
  --hoodie-conf hoodie.gcp.bigquery.sync.base_path=gs://transfer-table-data/tmp/temp_bubble-pop/${NOW} \
  --hoodie-conf hoodie.gcp.bigquery.sync.partition_fields=event_date \
  --hoodie-conf hoodie.gcp.bigquery.sync.source_uri=gs://transfer-table-data/incremental/test/bubble-pop-b01a0/event_date=* \
  --hoodie-conf hoodie.gcp.bigquery.sync.source_uri_prefix=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
  --hoodie-conf hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=true \
  --hoodie-conf hoodie.gcp.bigquery.sync.assume_date_partitioning=false \
  --hoodie-conf hoodie.datasource.write.recordkey.field=event_timestamp,event_name,user_pseudo_id,user_first_touch_timestamp,advertising_id \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=event_date \
  --hoodie-conf hoodie.datasource.write.precombine.field=event_timestamp \
  --hoodie-conf hoodie.datasource.write.keygenerator.type=COMPLEX \
  --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
  --hoodie-conf hoodie.datasource.write.drop.partition.columns=true \
  --hoodie-conf hoodie.partition.metafile.use.base.format=true \
  --hoodie-conf hoodie.metadata.enable=true

**To Reproduce**

Steps to reproduce the behavior:
1. An error occurred when I ran the above script

**Environment Description**

* Hudi version : hudi-spark3.2-bundle_2.12:0.12.1
* Spark version : 3.1
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : no

**Additional context**

Dataproc Spark

**Stacktrace**

ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
	at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
	at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
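Editor's note on the stack trace: the exception is raised by DeltaStreamer before any BigQuery sync happens, because no `--schemaprovider-class` (and no `--source-class`) was passed, so the default DFS source has no schema to work with. A hedged sketch of possible additions to the spark-submit above — the bucket and schema-file paths are placeholders, not values from this report:

```sh
# Option 1 (assumes the input files are Parquet): use a source that carries
# its own schema, so no explicit schema provider is needed.
--source-class org.apache.hudi.utilities.sources.ParquetDFSSource \

# Option 2: supply an explicit file-based schema provider
# (the .avsc paths below are placeholders).
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--hoodie-conf hoodie.deltastreamer.schemaprovider.source.schema.file=gs://<your-bucket>/schemas/source.avsc \
--hoodie-conf hoodie.deltastreamer.schemaprovider.target.schema.file=gs://<your-bucket>/schemas/target.avsc \
```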
[GitHub] [hudi] hudi-bot commented on pull request #7958: [HUDI-5799] Fix Spark partition validation in TestBulkInsertInternalPartitionerForRows
hudi-bot commented on PR #7958:
URL: https://github.com/apache/hudi/pull/7958#issuecomment-1430939533

## CI report:

* 36c706d1bb1a8f793ce874c9316aaf829aecd594 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15199)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1430939405

## CI report:

* 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7959: [HUDI-5800] Fix test failure in TestHoodieMergeOnReadTable
hudi-bot commented on PR #7959:
URL: https://github.com/apache/hudi/pull/7959#issuecomment-1430939617

## CI report:

* 480d3a4b17476126e248eddea06713024fae0f2b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15200)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-5800) Fix test failure in TestHoodieMergeOnReadTable
[ https://issues.apache.org/jira/browse/HUDI-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5800:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix test failure in TestHoodieMergeOnReadTable
> ----------------------------------------------
>
>                 Key: HUDI-5800
>                 URL: https://issues.apache.org/jira/browse/HUDI-5800
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Major
>              Labels: pull-request-available
>
> The Jira fixes a test failure in TestHoodieMergeOnReadTable.testReleaseResource:
> {code:java}
> TestHoodieMergeOnReadTable.testReleaseResource:710 expected: <14> but was: <3>
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] xushiyan commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all
xushiyan commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1106832148

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:

@@ -246,6 +246,7 @@
         .performClustering(clusteringPlan, schema, instantTime);
     HoodieData<WriteStatus> writeStatusList = writeMetadata.getWriteStatuses();
     HoodieData<WriteStatus> statuses = updateIndex(writeStatusList, writeMetadata);
+    context.putCachedDataIds(config.getBasePath(), instantTime, statuses.getId());

Review Comment:
I wasn't happy with tracing every persist() call and thought about this approach, but I also wanted to keep the impacted scope narrow. A change in all persist() calls may lead to unexpected side effects.
[GitHub] [hudi] xushiyan commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all
xushiyan commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1106835666

## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestSparkRDDWriteClient.java:

@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.testutils.SparkClientFunctionalTestHarness;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.storage.StorageLevel;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.Collections;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.getCommitTimeAtUTC;
+import static org.apache.hudi.testutils.Assertions.assertNoWriteErrors;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class TestSparkRDDWriteClient extends SparkClientFunctionalTestHarness {
+
+  static Stream<Arguments> testWriteClientReleaseResourcesShouldOnlyUnpersistRelevantRdds() {
+    return Stream.of(
+        Arguments.of(HoodieTableType.COPY_ON_WRITE, true),
+        Arguments.of(HoodieTableType.MERGE_ON_READ, true),
+        Arguments.of(HoodieTableType.COPY_ON_WRITE, false),
+        Arguments.of(HoodieTableType.MERGE_ON_READ, false)
+    );
+  }
+
+  @ParameterizedTest
+  @MethodSource
+  void testWriteClientReleaseResourcesShouldOnlyUnpersistRelevantRdds(HoodieTableType tableType, boolean shouldReleaseResource) throws IOException {
+    final HoodieTableMetaClient metaClient =
+        getHoodieMetaClient(hadoopConf(), URI.create(basePath()).getPath(), tableType, new Properties());
+    final HoodieWriteConfig writeConfig = getConfigBuilder(true)
+        .withPath(metaClient.getBasePathV2().toString())
+        .withAutoCommit(false)
+        .withReleaseResourceEnabled(shouldReleaseResource)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build())
+        .build();
+    HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator(0xDEED);
+
+    String instant0 = getCommitTimeAtUTC(0);
+    List<GenericRecord> extraRecords0 = dataGen.generateGenericRecords(10);
+    JavaRDD<GenericRecord> persistedRdd0 = jsc().parallelize(extraRecords0, 2).persist(StorageLevel.MEMORY_AND_DISK());
+    context().putCachedDataIds(writeConfig.getBasePath(), instant0, persistedRdd0.id());
+
+    String instant1 = getCommitTimeAtUTC(1);
+    List<GenericRecord> extraRecords1 = dataGen.generateGenericRecords(10);
+    JavaRDD<GenericRecord> persistedRdd1 = jsc().parallelize(extraRecords1, 2).persist(StorageLevel.MEMORY_AND_DISK());
+    context().putCachedDataIds(writeConfig.getBasePath(), instant1, persistedRdd1.id());
+
+    SparkRDDWriteClient writeClient = getHoodieWriteClient(writeConfig);
+    List<HoodieRecord> records = dataGen.generateInserts(instant1, 10);
+    JavaRDD<HoodieRecord> writeRecords = jsc().parallelize(records, 2);
+    writeClient.startCommitWithTime(instant1);
+    List<WriteStatus> writeStatuses = writeClient.insert(writeRecords, instant1).collect();
+    assertNoWriteErrors(writeStatuses);
+    writeClient.commitStats(instant1, writeStatuses.stream().map(WriteStatus::getStat).collect(Collectors.toList()),
+        Option.empty(), metaClient.getCommitActionType());
+    writeClient.close();
+
+    if (shouldReleaseResource) {
+      assertEquals(Collections.singletonList(persistedRdd0.id()),
+          context().getCachedDataIds(writeConfig.getBasePath(), instant0),
+          "RDDs cached for " + in
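The bookkeeping this PR's test exercises — `putCachedDataIds`/`getCachedDataIds` keyed by table base path and instant time — can be sketched as a small registry. This is an illustrative model, not Hudi's code: plain `int` ids stand in for Spark RDD ids, and the point is that releasing resources for one instant unpersists only that instant's RDDs, leaving others cached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-(basePath, instantTime) cached-RDD-id tracking.
public class CachedDataIdRegistry {

    private final Map<String, List<Integer>> cachedIds = new HashMap<>();

    private static String key(String basePath, String instantTime) {
        return basePath + "#" + instantTime;
    }

    // Record the ids of RDDs persisted while writing the given instant.
    public void putCachedDataIds(String basePath, String instantTime, int... ids) {
        List<Integer> list = cachedIds.computeIfAbsent(key(basePath, instantTime), k -> new ArrayList<>());
        for (int id : ids) {
            list.add(id);
        }
    }

    public List<Integer> getCachedDataIds(String basePath, String instantTime) {
        return cachedIds.getOrDefault(key(basePath, instantTime), new ArrayList<>());
    }

    // On releaseResources(instantTime), only these ids would be passed to the
    // engine's unpersist call; RDDs of other instants stay cached.
    public List<Integer> removeCachedDataIds(String basePath, String instantTime) {
        List<Integer> removed = cachedIds.remove(key(basePath, instantTime));
        return removed == null ? new ArrayList<>() : removed;
    }

    public static void main(String[] args) {
        CachedDataIdRegistry reg = new CachedDataIdRegistry();
        reg.putCachedDataIds("/tbl", "001", 7);
        reg.putCachedDataIds("/tbl", "002", 8, 9);
        System.out.println(reg.removeCachedDataIds("/tbl", "002")); // [8, 9]
        System.out.println(reg.getCachedDataIds("/tbl", "001"));    // [7]
    }
}
```

This mirrors the narrow-scope design the reviewer argues for: only call sites that persist data register ids, instead of instrumenting every `persist()` call.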
[jira] [Created] (HUDI-5801) Speed metaTable initializeFileGroups
loukey_j created HUDI-5801:
------------------------------
             Summary: Speed metaTable initializeFileGroups
                 Key: HUDI-5801
                 URL: https://issues.apache.org/jira/browse/HUDI-5801
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: loukey_j

org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups is too slow when there are many file groups.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] lokeshj1703 opened a new pull request, #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
lokeshj1703 opened a new pull request, #7961:
URL: https://github.com/apache/hudi/pull/7961

### Change Logs

Modify DefaultHoodieRecordPayload to be able to handle a configured delete key and marker.

### Impact

NA

### Risk level (write none, low medium or high below)

Low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-5802) Allow configuration for deletes in DefaultHoodieRecordPayload
[ https://issues.apache.org/jira/browse/HUDI-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5802:
---------------------------------
    Labels: pull-request-available  (was: )

> Allow configuration for deletes in DefaultHoodieRecordPayload
> -------------------------------------------------------------
>
>                 Key: HUDI-5802
>                 URL: https://issues.apache.org/jira/browse/HUDI-5802
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Major
>              Labels: pull-request-available
>
> Modify DefaultHoodieRecordPayload to be able to handle a configured delete
> key and marker.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HUDI-5802) Allow configuration for deletes in DefaultHoodieRecordPayload
Lokesh Jain created HUDI-5802:
---------------------------------
             Summary: Allow configuration for deletes in DefaultHoodieRecordPayload
                 Key: HUDI-5802
                 URL: https://issues.apache.org/jira/browse/HUDI-5802
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Lokesh Jain
            Assignee: Lokesh Jain

Modify DefaultHoodieRecordPayload to be able to handle a configured delete key and marker.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
lokeshj1703 commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1106842865

## hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:

@@ -71,18 +73,38 @@
   public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, ...)
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }

   @Override
   public Option<IndexedRecord> getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties    payload-related properties
+   * @return {@code true} if the record represents a delete record, {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {
+    final String deleteKey = properties.getProperty(DELETE_KEY);
+    if (deleteKey == null) {
+      return super.isDeleteRecord(genericRecord);

Review Comment:
If the `DELETE_MARKER` property is not set, should we throw an exception here or fall back to the default?
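The delete check under discussion can be sketched in isolation. This is a hypothetical model, not the PR's code: plain maps stand in for Avro records, the property names `hoodie.payload.delete.field` / `hoodie.payload.delete.marker` are assumptions for illustration, and it resolves the reviewer's open question by throwing when a delete key is set without a marker (falling back silently would be the other option).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;

// Sketch of a configurable delete check: if a delete key/marker pair is
// configured, a record is a delete when its delete-key field equals the
// marker; otherwise fall back to the default _hoodie_is_deleted boolean.
public class DeleteCheckSketch {

    static final String DELETE_KEY = "hoodie.payload.delete.field";     // assumed property name
    static final String DELETE_MARKER = "hoodie.payload.delete.marker"; // assumed property name

    static boolean isDeleteRecord(Map<String, Object> record, Properties props) {
        String deleteKey = props.getProperty(DELETE_KEY);
        if (deleteKey == null) {
            // Default behavior: honor the _hoodie_is_deleted column.
            return Boolean.TRUE.equals(record.get("_hoodie_is_deleted"));
        }
        String deleteMarker = props.getProperty(DELETE_MARKER);
        if (deleteMarker == null) {
            // The reviewer's question: throw or fall back? This sketch throws,
            // treating a half-configured pair as a user error.
            throw new IllegalArgumentException("Delete key set but no delete marker configured");
        }
        Object val = record.get(deleteKey);
        return val != null && Objects.equals(val.toString(), deleteMarker);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DELETE_KEY, "op");
        props.setProperty(DELETE_MARKER, "D");

        Map<String, Object> rec = new HashMap<>();
        rec.put("op", "D");
        System.out.println(isDeleteRecord(rec, props)); // prints "true"
    }
}
```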
[GitHub] [hudi] codope commented on a diff in pull request #7929: [HUDI-5754] Add new sources to deltastreamer docs
codope commented on code in PR #7929:
URL: https://github.com/apache/hudi/pull/7929#discussion_r1106853958

## website/docs/hoodie_deltastreamer.md:

@@ -340,6 +388,26 @@
 to trigger processing of new or changed data as soon as it is available on S3.
 Insert code sample from this blog: https://hudi.apache.org/blog/2021/08/23/s3-events-source/#configuration-and-setup

+### GCS Events
+The Google Cloud Storage (GCS) service provides an event notification mechanism that posts notifications when certain
+events happen in your GCS bucket. You can read more at [Pub/Sub Notifications](https://cloud.google.com/storage/docs/pubsub-notifications/).
+GCS puts these events in a Cloud Pub/Sub topic. Apache Hudi provides a GcsEventsSource that can read from Cloud Pub/Sub
+to trigger processing of new or changed data as soon as it is available on GCS.
+
+#### Setup
+A detailed guide on [How to use the system](https://docs.google.com/document/d/1VfvtdvhXw6oEHPgZ_4Be2rkPxIzE0kBCNUiVDsXnSAA/edit#heading=h.tpmqk5oj0crt) is available.

Review Comment:
I think we need not put the whole document. We typically assume that users know how to enable event notifications. What we can add here is the two spark-submit command samples for the two sources.
[GitHub] [hudi] xushiyan commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all
xushiyan commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1106832148

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:

+    context.putCachedDataIds(config.getBasePath(), instantTime, statuses.getId());

Review Comment:
I wasn't happy with tracing every persist() call and thought about this approach, but I also wanted to keep the impacted scope narrow. A change in all persist() calls may lead to unexpected side effects. Also, it looks a bit weird to have a HoodieData know about any HoodieEngineContext. Having HoodieEngineContext trace all HoodieData from their birth and auto-cache their ids makes more sense, but that is a much bigger change w.r.t. this PR's intention.
[GitHub] [hudi] loukey-lj opened a new pull request, #7962: [HUDI-5801] Speed metaTable initializeFileGroups
loukey-lj opened a new pull request, #7962:
URL: https://github.com/apache/hudi/pull/7962

### Change Logs

org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups is too slow when there are many file groups.

### Impact

NA

### Risk level (write none, low medium or high below)

NA

### Documentation Update

NA

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-5801) Speed metaTable initializeFileGroups
[ https://issues.apache.org/jira/browse/HUDI-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5801:
---------------------------------
    Labels: pull-request-available  (was: )

> Speed metaTable initializeFileGroups
> ------------------------------------
>
>                 Key: HUDI-5801
>                 URL: https://issues.apache.org/jira/browse/HUDI-5801
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: loukey_j
>            Priority: Major
>              Labels: pull-request-available
>
> org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups
> is too slow when there are many file groups.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1431012374 ## CI report: * f694a549ea265813f05767d69269fda2bb1ef279 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15161) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15188) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
hudi-bot commented on PR #7961: URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431022184 ## CI report: * 29189395a4d407c331c89c11b1e70e989d704b20 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7962: [HUDI-5801] Speed metaTable initializeFileGroups
hudi-bot commented on PR #7962: URL: https://github.com/apache/hudi/pull/7962#issuecomment-1431022243 ## CI report: * bd715641ef0532c50771d1ae02fdeb5f39e6a52c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1431031174 ## CI report: * b3e842754a302dc1372b330a8c32298d49732107 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14831) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14867) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15137) * c758e27d4d99c5e88b1ab7fe77fb89131aebce4d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7962: [HUDI-5801] Speed metaTable initializeFileGroups
hudi-bot commented on PR #7962: URL: https://github.com/apache/hudi/pull/7962#issuecomment-1431033376 ## CI report: * bd715641ef0532c50771d1ae02fdeb5f39e6a52c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15202)
[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
hudi-bot commented on PR #7961: URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431033302 ## CI report: * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201)
[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1431042082 ## CI report: * b3e842754a302dc1372b330a8c32298d49732107 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14831) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14867) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15137) * c758e27d4d99c5e88b1ab7fe77fb89131aebce4d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15203)
[GitHub] [hudi] hudi-bot commented on pull request #7955: [HUDI-5649] Unify all the loggers to slf4j
hudi-bot commented on PR #7955: URL: https://github.com/apache/hudi/pull/7955#issuecomment-1431043789 ## CI report: * 8c05730d6eddec29b98d421b2edc95ae616dc29d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15193)
[jira] [Created] (HUDI-5803) Support Aliyun DFS Storage
Ran Tao created HUDI-5803: - Summary: Support Aliyun DFS Storage Key: HUDI-5803 URL: https://issues.apache.org/jira/browse/HUDI-5803 Project: Apache Hudi Issue Type: Bug Reporter: Ran Tao add support for Alibaba cloud dfs storage
[jira] [Commented] (HUDI-5803) Support Aliyun DFS Storage
[ https://issues.apache.org/jira/browse/HUDI-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688995#comment-17688995 ] Ran Tao commented on HUDI-5803: --- [~yanghua] Hi Yang, what do you think? Can you assign this ticket to me? > Support Aliyun DFS Storage > -- > > Key: HUDI-5803 > URL: https://issues.apache.org/jira/browse/HUDI-5803 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ran Tao >Priority: Major > > add support for Alibaba cloud dfs storage
[jira] [Created] (HUDI-5804) hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation
Pramod Biligiri created HUDI-5804: - Summary: hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation Key: HUDI-5804 URL: https://issues.apache.org/jira/browse/HUDI-5804 Project: Apache Hudi Issue Type: Bug Components: cli Reporter: Pramod Biligiri In multiple places in CommitsCommand, the ShellOption annotation is missing the "--" prefix in its value attribute. One such example is shown below from "commit showpartitions": [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213] |@ShellOption(value = {"includeArchivedTimeline"}, help = "Include archived commits as well", defaultValue = "false") final boolean includeArchivedTimeline)| That should read value=\{"--includeArchivedTimeline"}
[jira] [Updated] (HUDI-5804) hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation
[ https://issues.apache.org/jira/browse/HUDI-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pramod Biligiri updated HUDI-5804: -- Description: In multiple places in CommitsCommand, the ShellOption annotation is missing the "--" prefix in its value attribute. One such example is shown below from "commit showpartitions": [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213] |@ShellOption(value = \{"includeArchivedTimeline"}, help = "Include archived commits as well", defaultValue = "false") final boolean includeArchivedTimeline)| In the above, it should read 'value=\{"--includeArchivedTimeline"...}' was: In multiple places in CommitsCommand, the ShellOption annotation is missing the "--" prefix in its value attribute. One such example is shown below from "commit showpartitions": [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213] |@ShellOption(value = {"includeArchivedTimeline"}, help = "Include archived commits as well", defaultValue = "false") final boolean includeArchivedTimeline)| That should read value=\{"--includeArchivedTimeline"} > hudi-cli CommitsCommand - some options fail due to typo in ShellOption > annotation > - > > Key: HUDI-5804 > URL: https://issues.apache.org/jira/browse/HUDI-5804 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Reporter: Pramod Biligiri >Priority: Minor > > In multiple places in CommitsCommand, the ShellOption annotation is missing the "--" > prefix in its value attribute. One such example is shown below from > "commit showpartitions": > [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213] > |@ShellOption(value = \{"includeArchivedTimeline"}, help = "Include archived > commits as well", defaultValue = "false") final boolean > includeArchivedTimeline)| > In the above, it should read 'value=\{"--includeArchivedTimeline"...}' > >
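The effect of the missing "--" prefix can be illustrated by analogy. The sketch below uses Python's argparse rather than Spring Shell (so it is an analogy, not Hudi or Spring Shell code): an option name registered without the dash prefix is not addressable as a command-line flag.

```python
import argparse

# Registered WITH the "--" prefix, the name works as an optional flag,
# e.g. "commit showpartitions --includeArchivedTimeline".
ok = argparse.ArgumentParser()
ok.add_argument("--includeArchivedTimeline", action="store_true",
                help="Include archived commits as well")
with_prefix = ok.parse_args(["--includeArchivedTimeline"]).includeArchivedTimeline

# Registered WITHOUT the prefix, the same name becomes a positional
# argument, and the flag spelling "--includeArchivedTimeline" is rejected.
broken = argparse.ArgumentParser()
broken.add_argument("includeArchivedTimeline")
try:
    broken.parse_args(["--includeArchivedTimeline"])
    flag_recognized = True
except SystemExit:  # argparse errors out with "unrecognized arguments"
    flag_recognized = False

print(with_prefix, flag_recognized)
```

The fix in the PR is the direct Java counterpart: adding the "--" prefix back to the `value` attribute so the option is registered as a flag.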
[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
hudi-bot commented on PR #7961: URL: https://github.com/apache/hudi/pull/7961#issuecomment-143782 ## CI report: * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201) * c8ac28edb845302c9f3afbc980b03782c0605564 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
hudi-bot commented on PR #7961: URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431125334 ## CI report: * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201) * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204)
[GitHub] [hudi] pramodbiligiri opened a new pull request, #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake
pramodbiligiri opened a new pull request, #7963: URL: https://github.com/apache/hudi/pull/7963 https://issues.apache.org/jira/browse/HUDI-5804 ### Change Logs Fix typo in the use of the ShellOption annotation in the CommitsCommand class. There were a few places where the "--" prefix was missing. ### Impact Makes the following CLI flags actually available to the user. They were present in the code, but there was no way to invoke them: 1. commit showpartitions --includeArchivedTimeline 2. commit show_write_stats --includeArchivedTimeline 3. commit showfiles --includeArchivedTimeline ### Risk level (write none, low medium or high below) Low. Exposes existing functionality that was getting suppressed by mistake. ### Documentation Update The shell option will show up automatically in the hudi-cli help. ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-5804) hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation
[ https://issues.apache.org/jira/browse/HUDI-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5804: - Labels: pull-request-available (was: ) > hudi-cli CommitsCommand - some options fail due to typo in ShellOption > annotation > - > > Key: HUDI-5804 > URL: https://issues.apache.org/jira/browse/HUDI-5804 > Project: Apache Hudi > Issue Type: Bug > Components: cli >Reporter: Pramod Biligiri >Priority: Minor > Labels: pull-request-available > > In multiple places in the CommitsCommand, the ShellOption is missing the "--" > prefix in its value attribute. One such example is shown below from > "commit showpartitions": > [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213] > |@ShellOption(value = \{"includeArchivedTimeline"}, help = "Include archived > commits as well", defaultValue = "false") final boolean > includeArchivedTimeline)| > In the above, it should read 'value=\{"--includeArchivedTimeline"...}' > >
[GitHub] [hudi] hudi-bot commented on pull request #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake
hudi-bot commented on PR #7963: URL: https://github.com/apache/hudi/pull/7963#issuecomment-1431144608 ## CI report: * f5b811ac7f5fd8278fb1fcb27ce1e17ff05a9750 UNKNOWN
[GitHub] [hudi] lokeshj1703 closed pull request #7878: Dep tree diff 0.12.2 and 0.13.0
lokeshj1703 closed pull request #7878: Dep tree diff 0.12.2 and 0.13.0 URL: https://github.com/apache/hudi/pull/7878
[GitHub] [hudi] hudi-bot commented on pull request #7952: [MINOR] Fix format name and remove redundant line in examples
hudi-bot commented on PR #7952: URL: https://github.com/apache/hudi/pull/7952#issuecomment-1431226426 ## CI report: * 7b1012695ef498cd5ffadd4e87c58709e782a479 UNKNOWN * 8e06386f41311e3780846c5dcb4593e0ed863d3e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15190)
[GitHub] [hudi] hudi-bot commented on pull request #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake
hudi-bot commented on PR #7963: URL: https://github.com/apache/hudi/pull/7963#issuecomment-1431226578 ## CI report: * f5b811ac7f5fd8278fb1fcb27ce1e17ff05a9750 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15205)
[GitHub] [hudi] hudi-bot commented on pull request #7933: [HUDI-5774] Fix prometheus configs for metadata table and support metric labels
hudi-bot commented on PR #7933: URL: https://github.com/apache/hudi/pull/7933#issuecomment-1431234245 ## CI report: * a02b393674ed4ae07d1eed67560f126ac06e178c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15158) * 638327ab8184a7b40d379bd8591f9e67f7fe70f7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15196) * 6f36de2e745401f48980bbe71513c40efaa83ac5 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7933: [HUDI-5774] Fix prometheus configs for metadata table and support metric labels
hudi-bot commented on PR #7933: URL: https://github.com/apache/hudi/pull/7933#issuecomment-1431242582 ## CI report: * 638327ab8184a7b40d379bd8591f9e67f7fe70f7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15196) * 6f36de2e745401f48980bbe71513c40efaa83ac5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15206)
[GitHub] [hudi] hudi-bot commented on pull request #7940: [HUDI-5787] HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is fal
hudi-bot commented on PR #7940: URL: https://github.com/apache/hudi/pull/7940#issuecomment-1431242713 ## CI report: * 249ffe369a49308e2a65a0ac58389efd5b49d1ad Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15191)
[GitHub] [hudi] danny0405 merged pull request #7940: [HUDI-5787] HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false
danny0405 merged PR #7940: URL: https://github.com/apache/hudi/pull/7940
[hudi] branch master updated (af61dea6f98 -> 25f6927b47d)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from af61dea6f98 [MINOR] Enable Azure CI to publish test results (#7943) add 25f6927b47d [HUDI-5787] HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false (#7940) No new revisions were added by this update. Summary of changes: .../hudi/table/catalog/TestHoodieHiveCatalog.java | 18 ++ .../org/apache/hudi/hive/ddl/HMSDDLExecutor.java | 2 +- .../org/apache/hudi/hive/TestHiveSyncTool.java | 28 ++ 3 files changed, 47 insertions(+), 1 deletion(-)
[jira] [Closed] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false
[ https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-5787. Fix Version/s: 0.14.0 Resolution: Fixed Fixed via master branch: 25f6927b47d5cfa6baad95e09fb88ad7ce2a1402 > HMSDDLExecutor should set table type to EXTERNAL_TABLE when > hoodie.datasource.hive_sync.create_managed_table of sync config is false > > > Key: HUDI-5787 > URL: https://issues.apache.org/jira/browse/HUDI-5787 > Project: Apache Hudi > Issue Type: Improvement > Components: flink >Reporter: Nicholas Jiang >Assignee: Nicholas Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.13.1, 0.14.0 > > > HMSDDLExecutor should set the table type of Hive table to EXTERNAL_TABLE when > hoodie.datasource.hive_sync.create_managed_table of sync config is set to > false.
[GitHub] [hudi] li36909 closed pull request #7957: [HUDI-5798] fix spark sql query error on mor table after flink cdc delete records
li36909 closed pull request #7957: [HUDI-5798] fix spark sql query error on mor table after flink cdc delete records URL: https://github.com/apache/hudi/pull/7957
[GitHub] [hudi] hudi-bot commented on pull request #7956: [HUDI-5797] fix use bulk insert error as row
hudi-bot commented on PR #7956: URL: https://github.com/apache/hudi/pull/7956#issuecomment-1431296270 ## CI report: * 5bd4d5c4de8fc54bf93fb7fd252b6e61fda85373 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15194)
[jira] [Created] (HUDI-5805) hive query on mor get empty result before compaction
lrz created HUDI-5805: - Summary: hive query on mor get empty result before compaction Key: HUDI-5805 URL: https://issues.apache.org/jira/browse/HUDI-5805 Project: Apache Hudi Issue Type: Bug Reporter: lrz Attachments: image-2023-02-15-20-48-08-819.png, image-2023-02-15-20-48-21-988.png When a MOR table is written with Flink CDC only, a partition contains only log files and no base file before compaction, so Hive queries against it always return an empty result until compaction runs. This is because when Hive computes splits for a native table, it ignores files whose names start with '.', and since Hudi does not set a storage handler when syncing the Hive meta, Hive treats the table as a native table. !image-2023-02-15-20-48-08-819.png! !image-2023-02-15-20-48-21-988.png!
[GitHub] [hudi] li36909 opened a new pull request, #7964: [HUDI-5805] hive query on mor get empty result before compaction
li36909 opened a new pull request, #7964: URL: https://github.com/apache/hudi/pull/7964 Change Logs When a MOR table is written with Flink CDC only, a partition contains only log files and no base file before compaction, so Hive queries against it always return an empty result until compaction runs. This is because when Hive computes splits for a native table, it ignores files whose names start with '.', and since Hudi does not set a storage handler when syncing the Hive meta, Hive treats the table as a native table. Impact Set the storage handler to DefaultStorageHandler when syncing the Hive meta. Risk level (write none, low medium or high below) none Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
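The dot-file behavior described in the PR can be sketched as follows. This is an illustration, not Hive or Hudi code: the file names are made-up stand-ins for Hudi's MOR layout, and the filter mimics Hive's default hidden-file path filter (which, to my understanding, skips names starting with '.' or '_').

```python
# Sketch of why a log-only MOR partition looks empty to Hive (assumption:
# Hudi log files are dot-prefixed; file names below are illustrative only).
def hive_visible(files):
    # Hive's default path filter skips files whose names start with '.' or '_'.
    return [f for f in files if not f.startswith((".", "_"))]

# Partition written only via Flink CDC: log files only, no base file yet.
log_only = [".fg1_20230215.log.1_0-1-0", ".hoodie_partition_metadata"]
# After compaction, a base parquet file exists alongside the log file.
compacted = ["fg1_0-1-0_20230215.parquet", ".fg1_20230215.log.1_0-1-0"]

print(hive_visible(log_only))   # nothing visible -> Hive reads the partition as empty
print(hive_visible(compacted))  # the base file is visible after compaction
```

Registering a Hudi storage handler (as the PR proposes) makes Hive stop treating the table as native, so split computation no longer relies on this filter alone.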
[jira] [Updated] (HUDI-5805) hive query on mor get empty result before compaction
[ https://issues.apache.org/jira/browse/HUDI-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5805: - Labels: pull-request-available (was: ) > hive query on mor get empty result before compaction > > > Key: HUDI-5805 > URL: https://issues.apache.org/jira/browse/HUDI-5805 > Project: Apache Hudi > Issue Type: Bug >Reporter: lrz >Priority: Major > Labels: pull-request-available > Attachments: image-2023-02-15-20-48-08-819.png, > image-2023-02-15-20-48-21-988.png > > > when a mor table write data with flink cdc only, then before compaction the > partition will only have log file, and no base file. then befor compaction, > hive query result will always be empty. > it's because when hive getSplit on a native table, hive will ignore a > partition which only has files start with '.', and because hudi has not set > storageHandle when sync hive meta, then hive treat it as native table > !image-2023-02-15-20-48-08-819.png! > !image-2023-02-15-20-48-21-988.png!
[GitHub] [hudi] codope opened a new pull request, #7965: Merge query engine setup and querying data docs
codope opened a new pull request, #7965: URL: https://github.com/apache/hudi/pull/7965 ### Change Logs * Merge query engine setup docs into querying data docs. * Add ClickHouse to the list of supported query engines. * Update support matrix. ### Impact Public docs change. ### Risk level (write none, low medium or high below) low ### Documentation Update Stated as above. Pages affected: https://hudi.apache.org/docs/querying_data https://hudi.apache.org/docs/query_engine_setup ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Commented] (HUDI-5798) spark-sql query fail on mor table after flink cdc application delete records
[ https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689110#comment-17689110 ] lrz commented on HUDI-5798: --- I fixed this issue by adding a special Avro shade jar at spark/jars, but it does not seem good to introduce that into the Hudi project. > spark-sql query fail on mor table after flink cdc application delete records > > > Key: HUDI-5798 > URL: https://issues.apache.org/jira/browse/HUDI-5798 > Project: Apache Hudi > Issue Type: Bug >Reporter: lrz >Priority: Major > Labels: pull-request-available > > after flink cdc application delete records for a mor table, spark sql will > query fail on the table with below exception: > > Serialization trace: > orderingVal (org.apache.hudi.common.model.DeleteRecord) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133) > at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693) > at > com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) > at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391) > at > com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) > at > org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104) > at > org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78) > at > org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106) > at >
org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91) > at > org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473) > at > org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343) > ... 23 more > Caused by: java.lang.ClassNotFoundException: > org.apache.hudi.org.apache.avro.util.Utf8 > at java.net.URLClassLoader.findClass(URLClassLoader.java:387) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154) > ... 37 more -- This message was sent by Atlassian Jira (v8.20.10#820010)
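The `Caused by` above names a relocated (shaded) class, `org.apache.hudi.org.apache.avro.util.Utf8`: the delete block was Kryo-serialized by a writer whose Hudi bundle shades Avro, while the reading Spark classpath has no such relocated class. A minimal diagnostic sketch (not Hudi code) that checks which variant of the class a given JVM can actually load:

```java
// Checks whether a class name is loadable on the current classpath. The
// relocated name "org.apache.hudi.org.apache.avro.util.Utf8" exists only
// inside a Hudi bundle jar built with the Avro relocation applied.
public class ClasspathCheck {
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: we only care about resolvability, not static init
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a plain JVM without the Hudi bundle, only the JDK class resolves.
        System.out.println(isClassPresent("java.lang.String"));
        System.out.println(isClassPresent("org.apache.hudi.org.apache.avro.util.Utf8"));
    }
}
```

Running this inside the failing Spark driver/executor classpath would confirm whether the relocated class is missing there, which is what the workaround of dropping a shaded Avro jar into spark/jars papers over.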
[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc application delete records
[ https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated HUDI-5798: -- Summary: spark sql query fail on mor table after flink cdc application delete records (was: spark-sql query fail on mor table after flink cdc application delete records)
[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc delete records
[ https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lrz updated HUDI-5798: -- Summary: spark sql query fail on mor table after flink cdc delete records (was: spark sql query fail on mor table after flink cdc application delete records)
[GitHub] [hudi] hudi-bot commented on pull request #7964: [HUDI-5805] hive query on mor get empty result before compaction
hudi-bot commented on PR #7964: URL: https://github.com/apache/hudi/pull/7964#issuecomment-1431380448 ## CI report: * 6aed8cffab1f915790180de9b49188b0077e0e6a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #7894: [HUDI-5729] Fix RowDataKeyGen method getRecordKey
hudi-bot commented on PR #7894: URL: https://github.com/apache/hudi/pull/7894#issuecomment-1431379719 ## CI report: * ddc28f53801f2e11401738d1c6acb74eec9c8fab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15195)
[GitHub] [hudi] hudi-bot commented on pull request #7964: [HUDI-5805] hive query on mor get empty result before compaction
hudi-bot commented on PR #7964: URL: https://github.com/apache/hudi/pull/7964#issuecomment-1431391906 ## CI report: * 6aed8cffab1f915790180de9b49188b0077e0e6a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15208)
[jira] [Created] (HUDI-5806) hudi-cli should have option to show nearest matching commit
Pramod Biligiri created HUDI-5806: - Summary: hudi-cli should have option to show nearest matching commit Key: HUDI-5806 URL: https://issues.apache.org/jira/browse/HUDI-5806 Project: Apache Hudi Issue Type: Improvement Components: cli Reporter: Pramod Biligiri When searching for a commit timestamp in hudi-cli, there should be an option to display the nearest matching commits if no exact match is found. This will help production support use cases quickly identify the recent commit activity in the period the user is interested in.
[GitHub] [hudi] pramodbiligiri opened a new pull request, #7966: [HUDI-5806] {W-I-P] hudi-cli option to find nearest matching commits
pramodbiligiri opened a new pull request, #7966: URL: https://github.com/apache/hudi/pull/7966 https://issues.apache.org/jira/browse/HUDI-5806 ### Change Logs Add a --nearestMatch boolean flag to "commit showfiles --commit COMMIT_INSTANT" to display nearest matching commit if no exact match found. ### Impact TODO _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) TODO _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update TODO _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist TODO - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-5806) hudi-cli should have option to show nearest matching commit
[ https://issues.apache.org/jira/browse/HUDI-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5806: - Labels: pull-request-available (was: )
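The nearest-match lookup proposed in HUDI-5806 can be sketched over a sorted set of commit instants, since Hudi instant timestamps (e.g. `20230208003459996`) sort lexicographically in time order. The class and method names below are illustrative, not the actual hudi-cli implementation:

```java
import java.util.TreeSet;

// Sketch: return the exact commit instant if present, otherwise the
// numerically closest neighbor among the instants on either side.
public class NearestCommit {
    static String nearest(TreeSet<String> instants, String target) {
        if (instants.contains(target)) {
            return target; // exact match, current behavior
        }
        String before = instants.floor(target);   // greatest instant <= target
        String after = instants.ceiling(target);  // smallest instant >= target
        if (before == null) return after;
        if (after == null) return before;
        // Instants are fixed-width digit strings, so they fit in a long.
        long t = Long.parseLong(target);
        return (t - Long.parseLong(before) <= Long.parseLong(after) - t) ? before : after;
    }

    public static void main(String[] args) {
        TreeSet<String> instants = new TreeSet<>();
        instants.add("20230208003459996");
        instants.add("20230209120000000");
        System.out.println(nearest(instants, "20230208003500000"));
    }
}
```

`TreeSet.floor`/`ceiling` do the neighbor search in O(log n); a CLI could print both neighbors instead of picking one, which may be more useful for the production-support use case the ticket describes.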
[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level
hudi-bot commented on PR #7941: URL: https://github.com/apache/hudi/pull/7941#issuecomment-1431464017 ## CI report: * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197)
[GitHub] [hudi] hudi-bot commented on pull request #7966: [HUDI-5806] {W-I-P] hudi-cli option to find nearest matching commits
hudi-bot commented on PR #7966: URL: https://github.com/apache/hudi/pull/7966#issuecomment-1431477124 ## CI report: * 99ad822ddf41e3de76e1fd716756ef02396ad804 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7966: [HUDI-5806] {W-I-P] hudi-cli option to find nearest matching commits
hudi-bot commented on PR #7966: URL: https://github.com/apache/hudi/pull/7966#issuecomment-1431489630 ## CI report: * 99ad822ddf41e3de76e1fd716756ef02396ad804 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15209)
[GitHub] [hudi] codope opened a new pull request, #7967: [DOCS] Update metadata indexing doc
codope opened a new pull request, #7967: URL: https://github.com/apache/hudi/pull/7967 ### Change Logs Update metadata indexing docs. ### Impact Docs update. ### Risk level (write none, low medium or high below) low
[GitHub] [hudi] hudi-bot commented on pull request #7958: [HUDI-5799] Fix Spark partition validation in TestBulkInsertInternalPartitionerForRows
hudi-bot commented on PR #7958: URL: https://github.com/apache/hudi/pull/7958#issuecomment-1431561986 ## CI report: * 36c706d1bb1a8f793ce874c9316aaf829aecd594 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15199)
[GitHub] [hudi] codope opened a new pull request, #7968: [DOCS] Update hive metastore sync docs
codope opened a new pull request, #7968: URL: https://github.com/apache/hudi/pull/7968 ### Change Logs - Added a brief intro about Hive metastore. - Removed deprecated config. - Added default values and better explanation for rest of the configs. ### Impact Docs update. ### Risk level (write none, low medium or high below) low
[GitHub] [hudi] wqwl611 opened a new issue, #7969: [SUPPORT] data loss in new base file.
wqwl611 opened a new issue, #7969: URL: https://github.com/apache/hudi/issues/7969 **Describe the problem you faced** I found some data loss in the new base file [-9e95-4471-bba0-5604a282aa34-0_0-12-4_20230208003459996.parquet]. I suspect the compaction plan may have missed some delta logs. How can I check the archived compaction plan? (Screenshot: https://user-images.githubusercontent.com/67826098/219074761-6150bcf1-89f5-4333-8eea-960105c07f94.png) **Environment Description** * Spark version: 3.2.0 * Storage (HDFS/S3/GCS..): HDFS * Running on Docker? (yes/no): no
[GitHub] [hudi] jonvex commented on issue #7902: [SUPPORT].UnresolvedUnionException: Not in union exception occurred when writing data through spark
jonvex commented on issue #7902: URL: https://github.com/apache/hudi/issues/7902#issuecomment-1431612529 If you take a look at the code for [UnresolvedUnionException.java](https://github.com/apache/avro/blob/f23eabb42f315b0db9135b075434b8a88680659c/lang/java/avro/src/main/java/org/apache/avro/UnresolvedUnionException.java), the ending item is 'unresolvedDatum'. In the exception you provided, that appears to be KEEP_LATEST_FILE_VERSIONS.
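The Avro source linked above builds the exception message as "Not in union " + unionSchema + ": " + unresolvedDatum, so the offending value is the trailing item. A small helper (an illustration, not Avro API) can recover it from a message string; it assumes the union schema renders as a JSON array ending in "]", which holds for union schemas:

```java
// Extracts the unresolved datum from an Avro UnresolvedUnionException
// message of the form: Not in union <union-schema-json>: <datum>
public class UnionMessageParser {
    static String unresolvedDatum(String message) {
        // The union schema prints as a JSON array, so "]: " separates it
        // from the datum. Fall back to the full message if not found.
        int idx = message.lastIndexOf("]: ");
        return idx >= 0 ? message.substring(idx + 3) : message;
    }

    public static void main(String[] args) {
        String msg = "Not in union [\"null\",\"string\"]: KEEP_LATEST_FILE_VERSIONS";
        System.out.println(unresolvedDatum(msg)); // KEEP_LATEST_FILE_VERSIONS
    }
}
```

In the reported case this points at a cleaner-policy value (`KEEP_LATEST_FILE_VERSIONS`) landing in a field whose union schema does not accept it, which is the clue to chase in the writer config/schema.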
[GitHub] [hudi] jonvex commented on issue #7717: [SUPPORT] org.apache.avro.SchemaParseException: Can't redefine: array When there are Top level variables , Struct and Array[struct] (no complex dataty
jonvex commented on issue #7717: URL: https://github.com/apache/hudi/issues/7717#issuecomment-1431620989 Yes. It is not exactly the same issue. What I meant is I think the root cause is the same, and it can be solved by upgrading parquet-avro.
[GitHub] [hudi] hudi-bot commented on pull request #7959: [HUDI-5800] Fix test failure in TestHoodieMergeOnReadTable
hudi-bot commented on PR #7959: URL: https://github.com/apache/hudi/pull/7959#issuecomment-1431651570 ## CI report: * 480d3a4b17476126e248eddea06713024fae0f2b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15200)
[GitHub] [hudi] hudi-bot commented on pull request #7962: [HUDI-5801] Speed metaTable initializeFileGroups
hudi-bot commented on PR #7962: URL: https://github.com/apache/hudi/pull/7962#issuecomment-1431664630 ## CI report: * bd715641ef0532c50771d1ae02fdeb5f39e6a52c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15202)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7752: [MINOR] De-duplicating Iterator implementations
alexeykudinkin commented on code in PR #7752: URL: https://github.com/apache/hudi/pull/7752#discussion_r1107382524 ## hudi-common/src/main/java/org/apache/hudi/common/util/collection/CloseableMappingIterator.java: ## @@ -22,8 +22,8 @@ import java.util.function.Function; -// TODO java-doc -public class CloseableMappingIterator extends MappingIterator implements ClosableIterator { +public class CloseableMappingIterator extends MappingIterator Review Comment: Not sure i understand what warnings you're referring to
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7672: [HUDI-5557]Avoid converting columns that are not indexed in CSI
alexeykudinkin commented on code in PR #7672: URL: https://github.com/apache/hudi/pull/7672#discussion_r1107399443 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala: ## @@ -209,11 +209,11 @@ class ColumnStatsIndexSupport(spark: SparkSession, // NOTE: We're sorting the columns to make sure final index schema matches layout // of the transposed table val sortedTargetColumnsSet = TreeSet(queryColumns:_*) -val sortedTargetColumns = sortedTargetColumnsSet.toSeq // NOTE: This is a trick to avoid pulling all of [[ColumnStatsIndexSupport]] object into the lambdas' // closures below val indexedColumns = this.indexedColumns +val indexedTargetColumns = sortedTargetColumnsSet.filter(indexedColumns.contains(_)).toSeq Review Comment: Let's de-duplicate filtering and tie it up w/ index schema composition: - Let's make `composeIndexSchema` return (schema, targetIndexedColumns) - Let's move schema composition up here
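The core of the change under review — restrict the sorted query columns to the subset that is actually indexed, so the index schema and the transposed table stay in agreement — can be sketched language-neutrally (the real code above is Scala; this Java sketch is illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: compute the sorted, indexed subset of the query columns once,
// so schema composition and transposition filter consistently.
public class IndexedColumns {
    static List<String> indexedTargetColumns(List<String> queryColumns, Set<String> indexedColumns) {
        TreeSet<String> sorted = new TreeSet<>(queryColumns); // stable sorted layout
        sorted.retainAll(indexedColumns);                     // drop non-indexed columns
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(indexedTargetColumns(Arrays.asList("c", "a", "b"), indexed)); // [a, c]
    }
}
```

The review suggestion amounts to computing this filtered list in one place (e.g. returned alongside the composed schema) rather than filtering independently at each use site.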
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all
alexeykudinkin commented on code in PR #7914: URL: https://github.com/apache/hudi/pull/7914#discussion_r1107408876 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java: ## @@ -246,6 +246,7 @@ protected HoodieWriteMetadata> executeClustering(HoodieC .performClustering(clusteringPlan, schema, instantTime); HoodieData writeStatusList = writeMetadata.getWriteStatuses(); HoodieData statuses = updateIndex(writeStatusList, writeMetadata); +context.putCachedDataIds(config.getBasePath(), instantTime, statuses.getId()); Review Comment: HoodieData is already tightly coupled (1:1) with HoodieEngineContext so there's nothing shady about HD API accepting HEC. Current approach doesn't really make sense as it's extremely brittle -- we can't expect that someone will be aware of needing to register the RDD whenever they persist. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
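For context on the bookkeeping being debated: the PR registers the ids of persisted datasets under a (basePath, instantTime) key so that only those ids are unpersisted later, instead of unpersisting all cached RDDs. The class below is a standalone illustration of that registry (the method name `putCachedDataIds` mirrors the diff, but this is not the actual HoodieEngineContext API):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: track which cached dataset ids belong to which write operation,
// keyed by (basePath, instantTime), so cleanup can be selective.
public class CachedDataRegistry {
    private final Map<String, Set<Integer>> cachedDataIds = new HashMap<>();

    void putCachedDataIds(String basePath, String instantTime, int... ids) {
        Set<Integer> set = cachedDataIds.computeIfAbsent(basePath + "|" + instantTime, k -> new HashSet<>());
        for (int id : ids) {
            set.add(id);
        }
    }

    // Returns exactly the ids to unpersist for this operation (empty if none).
    Set<Integer> removeCachedDataIds(String basePath, String instantTime) {
        Set<Integer> ids = cachedDataIds.remove(basePath + "|" + instantTime);
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }
}
```

The review comment's objection is that this explicit registration is easy to forget at any `persist` site; pushing the tracking down into the engine-context/HoodieData layer would make it automatic.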
[GitHub] [hudi] alexeykudinkin commented on pull request #7678: [HUDI-5562] Add maven wrapper
alexeykudinkin commented on PR #7678: URL: https://github.com/apache/hudi/pull/7678#issuecomment-1431702493 CI is green: https://user-images.githubusercontent.com/428277/219101720-11800f78-d9a9-4558-b5e4-382875a55f13.png https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=15092&view=results
[GitHub] [hudi] alexeykudinkin commented on pull request #7678: [HUDI-5562] Add maven wrapper
alexeykudinkin commented on PR #7678: URL: https://github.com/apache/hudi/pull/7678#issuecomment-1431704512 @wuzhenhua01 let's also update the docs to reflect that now `mvnw` should be invoked when building Hudi. Let's also update the CI scripts (both Github and Azure)
[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI
alexeykudinkin commented on code in PR #7885: URL: https://github.com/apache/hudi/pull/7885#discussion_r1107417607 ## hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java: ## @@ -19,22 +19,39 @@ package org.apache.hudi.common.util; +import com.fasterxml.jackson.databind.SerializationFeature; +import com.fasterxml.jackson.databind.module.SimpleModule; +import com.fasterxml.jackson.databind.util.StdDateFormat; import org.apache.hudi.exception.HoodieIOException; import com.fasterxml.jackson.annotation.JsonAutoDetect; import com.fasterxml.jackson.annotation.PropertyAccessor; import com.fasterxml.jackson.core.JsonProcessingException; import com.fasterxml.jackson.databind.DeserializationFeature; import com.fasterxml.jackson.databind.ObjectMapper; +import org.apache.log4j.LogManager; +import org.apache.log4j.Logger; Review Comment: We're using log4j internally
[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs
hudi-bot commented on PR #7881: URL: https://github.com/apache/hudi/pull/7881#issuecomment-1431755767 ## CI report: * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN * c464e6ae5497b67c2fe3a456cff434114b1f297b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15022) * 4fd80c1f9dee94d53d213069c1ede42b1571858d UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI
hudi-bot commented on PR #7885: URL: https://github.com/apache/hudi/pull/7885#issuecomment-1431755837 ## CI report: * 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN * 02e61304e85a8eb02e30c12e33b044529caac064 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15178) * b909e31094ec9bb91695ab0e34a9c55f2162c192 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1431764617 ## CI report: * c758e27d4d99c5e88b1ab7fe77fb89131aebce4d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15203)
[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI
hudi-bot commented on PR #7885: URL: https://github.com/apache/hudi/pull/7885#issuecomment-1431765840

## CI report:

* 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN
* 02e61304e85a8eb02e30c12e33b044529caac064 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15178)
* b909e31094ec9bb91695ab0e34a9c55f2162c192 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15210)
[jira] [Created] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values
Alexey Kudinkin created HUDI-5807:

Summary: HoodieSparkParquetReader is not appending partition-path values
Key: HUDI-5807
URL: https://issues.apache.org/jira/browse/HUDI-5807
Project: Apache Hudi
Issue Type: Bug
Components: spark
Reporter: Alexey Kudinkin
Fix For: 0.13.1

The current implementation of HoodieSparkParquetReader does not support the case when `hoodie.datasource.write.drop.partition.columns` is set to true. In that case, partition-path values are expected to be parsed from the partition path and injected within the file reader (this is the behavior of Spark's own readers).

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values
Alexey Kudinkin updated HUDI-5807:

Affects Version/s: 0.13.0

> HoodieSparkParquetReader is not appending partition-path values
> Key: HUDI-5807 | Issue Type: Bug | Components: spark | Affects Versions: 0.13.0 | Priority: Blocker | Fix For: 0.13.1
[jira] [Commented] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values
Alexey Kudinkin commented on HUDI-5807:

We should do this by rebasing HoodieSparkFileReader onto ParquetFileFormat (to make sure we're creating readers the same way as we do with Spark itself):

```java
val parquetFileFormat = SparkAdapterSupport$.MODULE$.sparkAdapter()
    // TODO this should be based on the table config
    .createHoodieParquetFileFormat(true)
    .get();
```
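For context on what "parsing partition-path values" in the issue description means, a rough sketch in Python of recovering dropped partition-column values from a Hive-style partition path (a hypothetical helper for illustration, not Hudi's actual implementation, which would also need to handle URL-escaped values and type casting):

```python
def parse_partition_values(partition_path: str) -> dict:
    """Recover partition-column values from a Hive-style partition path
    such as 'region=US/date=2023-02-15'. When the writer drops partition
    columns from the data files, a reader has to inject these values back
    into each row it returns."""
    values = {}
    for segment in partition_path.strip("/").split("/"):
        # Each path segment is 'column=value'; split on the first '='.
        name, _, raw = segment.partition("=")
        values[name] = raw
    return values

print(parse_partition_values("region=US/date=2023-02-15"))
# → {'region': 'US', 'date': '2023-02-15'}
```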
[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs
hudi-bot commented on PR #7881: URL: https://github.com/apache/hudi/pull/7881#issuecomment-1431777039

## CI report:

* c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
* c464e6ae5497b67c2fe3a456cff434114b1f297b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15022)
* 4fd80c1f9dee94d53d213069c1ede42b1571858d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15211)
[GitHub] [hudi] bhasudha opened a new pull request, #7970: [DOCS] Change Config generator to generate subgroups
bhasudha opened a new pull request, #7970: URL: https://github.com/apache/hudi/pull/7970

Also make changes so required configs bubble up to the top of the section.
[GitHub] [hudi] bhasudha commented on pull request #7970: [DOCS] Change Config generator to generate subgroups
bhasudha commented on PR #7970: URL: https://github.com/apache/hudi/pull/7970#issuecomment-1431796407

![config_generator_local_testing_screenshot](https://user-images.githubusercontent.com/2179254/219115743-ab7fdd3f-dea3-441f-8bdc-bf4113bc61d8.png)
[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload
hudi-bot commented on PR #7961: URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431832295

## CI report:

* c8ac28edb845302c9f3afbc980b03782c0605564 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204)
[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client
Raymond Xu updated HUDI-5477:

Fix Version/s: 0.12.3

> Optimize timeline loading in Hudi sync client
> Key: HUDI-5477 | Issue Type: Improvement | Components: archiving, meta-sync | Reporter: Ethan Guo | Assignee: Ethan Guo | Priority: Blocker | Labels: pull-request-available | Fix For: 0.13.0, 0.12.3
>
> The Hudi archived timeline is always loaded during the metastore sync process if the last sync time is given. Besides, the archived timeline is not cached inside the meta client if the start instant time is given. When the archived timeline is huge (e.g., hundreds of log files in the `.hoodie/archived` folder), loading it from storage causes performance issues and read timeouts on cloud storage due to request rate limiting.
[GitHub] [hudi] yihua commented on issue #7969: [SUPPORT] data loss in new base file.
yihua commented on issue #7969: URL: https://github.com/apache/hudi/issues/7969#issuecomment-1431873843

Hi @wqwl611, thanks for raising this. Could you clarify what kind of data loss you observe (missing records, updates not applied, missing columns, etc.)? Also, were there any failures or other table services running before the compaction happened, based on the Hudi timeline? To inspect the compaction plan, you may use the [Hudi CLI compaction commands](https://hudi.apache.org/docs/cli#compactions) or directly check the requested compaction instant under `.hoodie/` (`avrocat <instant>.compaction.requested`). If the compaction commit is archived, you may need to look at the archived files for now to understand the compaction plan.
[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1431876777

@hudi-bot run azure
[GitHub] [hudi] yihua commented on issue #7960: [SUPPORT]
yihua commented on issue #7960: URL: https://github.com/apache/hudi/issues/7960#issuecomment-1431890161

Hi @clp007, thanks for the question. Based on the stacktrace, this issue is not related to BigQuery sync. Some arguments are missing for Deltastreamer, e.g., `--schemaprovider-class` (the absence of which causes `HoodieException: Please provide a valid schema provider class!`) and `--source-class`. Please check [this](https://hudi.apache.org/docs/hoodie_deltastreamer) for Deltastreamer configs.
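As a rough sketch, a Deltastreamer invocation supplying the missing arguments might look like the following. The class names are real Hudi utilities classes, but the jar name, table path, and choice of source/schema provider here are illustrative assumptions, not taken from the issue:

```shell
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --target-base-path s3://bucket/path/to/table \
  --target-table my_table
```

Without `--schemaprovider-class`, Deltastreamer cannot resolve the source/target schemas, which matches the `Please provide a valid schema provider class!` error in the stacktrace.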
[GitHub] [hudi] jonvex opened a new pull request, #7971: add support for kafka offsets
jonvex opened a new pull request, #7971: URL: https://github.com/apache/hudi/pull/7971
[GitHub] [hudi] yihua commented on issue #7953: [SUPPORT]Code pending on writing data to S3 using Flink datastream API,and the target path is empty.
yihua commented on issue #7953: URL: https://github.com/apache/hudi/issues/7953#issuecomment-1431893619

@danny0405 could you help here on the Hudi Flink setup?
[jira] [Created] (HUDI-5808) Add support for kafka offsets in various sources
Jonathan Vexler created HUDI-5808:

Summary: Add support for kafka offsets in various sources
Key: HUDI-5808
URL: https://issues.apache.org/jira/browse/HUDI-5808
Project: Apache Hudi
Issue Type: Improvement
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler

Add support for kafka offsets in AvroKafkaSource and JsonKafkaSource.
[GitHub] [hudi] yihua commented on issue #7909: Failed to create Marker file
yihua commented on issue #7909: URL: https://github.com/apache/hudi/issues/7909#issuecomment-1431901388

@koochiswathiTR this is likely caused by a concurrency bug in handling marker creation requests at the timeline server, which has been fixed by #6383 since the 0.12.1 release. Are you able to try the new release? If the job must remain on the 0.11.1 release, you may set `hoodie.write.markers.type=DIRECT` to get unblocked.
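For reference, the suggested workaround is a write config, so it can be set alongside the job's other Hudi write properties. A minimal sketch (the surrounding properties are illustrative, only the marker setting comes from the comment above):

```properties
# Workaround for 0.11.x: write markers directly to storage instead of
# routing marker creation through the timeline server.
hoodie.write.markers.type=DIRECT
```

Direct markers avoid the timeline-server code path where the concurrency bug occurs, at the cost of more direct file operations against storage.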
[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1431906206

@hudi-bot run azure
[jira] [Updated] (HUDI-5809) Keep RFC-56 early conflict detection update to date
Ethan Guo updated HUDI-5809:

Story Points: 0.5

> Keep RFC-56 early conflict detection update to date
> Key: HUDI-5809 | Issue Type: Improvement | Reporter: Ethan Guo | Assignee: Ethan Guo | Priority: Major | Fix For: 0.13.1
[jira] [Updated] (HUDI-5809) Keep RFC-56 early conflict detection update to date
Ethan Guo updated HUDI-5809:

Fix Version/s: 0.13.1
[jira] [Created] (HUDI-5809) Keep RFC-56 early conflict detection update to date
Ethan Guo created HUDI-5809:

Summary: Keep RFC-56 early conflict detection update to date
Key: HUDI-5809
URL: https://issues.apache.org/jira/browse/HUDI-5809
Project: Apache Hudi
Issue Type: Improvement
Reporter: Ethan Guo