[GitHub] [hudi] codope commented on a diff in pull request #7942: [HUDI-5753] Add docs for record payload

2023-02-15 Thread via GitHub


codope commented on code in PR #7942:
URL: https://github.com/apache/hudi/pull/7942#discussion_r1106780501


##
website/docs/record_payload.md:
##
@@ -0,0 +1,97 @@
+---
+title: Record Payload 
+keywords: [hudi, merge, upsert, precombine]
+---
+
+## Record Payload
+
+One of the core features of Hudi is the ability to incrementally upsert data, deduplicate and merge records on the fly.
+Additionally, users can implement their own logic to merge the incoming records with the record on storage. Record
+payload is an abstract representation of a Hudi record that enables the aforementioned capability. As we shall see below,
+Hudi provides out-of-the-box support for different payloads for different use cases, and a new record merger API for
+optimized payload handling. But first, let us understand how the record payload is used in the Hudi upsert path.
+
+
+
+
+
+The figure above shows the main stages that records go through while being written to the Hudi table. In the precombining
+stage, Hudi performs any deduplication based on the payload implementation and the precombine key configured by the user.
+Further, on index lookup, Hudi identifies which records are being updated, and the record payload implementation tells
+Hudi how to merge the incoming record with the existing record on storage.
+
+### Existing Payloads
+
+ OverwriteWithLatestAvroPayload
+
+This is the default record payload implementation. During precombining, it breaks ties by picking the record with the
+greatest precombine-key value (determined by calling `.compareTo()` on the value of the precombine key); while merging
+with storage, it simply picks the latest record. This gives latest-write-wins style semantics.
+
+ EventTimeAvroPayload
+
+Some use cases require merging records by event time, and thus event time plays the role of an ordering field. This
+payload is particularly useful in the case of late-arriving data. For such use cases, users need to set
+the [payload event time field](/docs/configurations#RECORD_PAYLOAD) configuration.
+
+ ExpressionPayload
+
+This payload is very useful when you want to merge or delete records based on a conditional expression, especially
+when updating records using the [`MERGE INTO`](/docs/quick-start-guide#mergeinto) statement.
+
+ Payload to support partial update
+
+Typically, once the merge step resolves which record to pick, the record on storage is fully replaced by the
+resolved record. But in some cases, the requirement is to update only certain fields and not replace the whole record.
+This is called partial update. `PartialUpdateAvroPayload` in Hudi provides out-of-the-box support for such use cases.
+To illustrate the point, let us look at a simple example.
+
+Let's say the ordering field is `ts` and the schema is:
+
+```
+{
+  "fields": [
+    {"name": "id", "type": "string"},
+    {"name": "ts", "type": "long"},
+    {"name": "name", "type": "string"},
+    {"name": "price", "type": "string"}
+  ]
+}
+```
+
+Current record in storage:
+
+```
+id  ts  name    price
+1   2   name_1  null
+```
+
+Incoming record:
+
+```
+id  ts  name    price
+1   1   null    price_1
+```
+
+Result data after merging using `PartialUpdateAvroPayload`:
+
+```
+id  ts  name    price
+1   2   name_1  price_1
+```

Review Comment:
   `ts` is the ordering field, so the record with the higher value is picked. The null value for the `name` column in the incoming record indeed gets replaced by the value in the existing record.
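The merge behavior described in the review comment above can be illustrated with a small simulation. This is a hedged sketch, not the actual `PartialUpdateAvroPayload` implementation (which operates on Avro records): the record with the higher ordering value wins, and its null fields are backfilled from the other record.

```python
def partial_update_merge(stored, incoming, ordering_field="ts"):
    """Simulate PartialUpdateAvroPayload semantics: the record with the
    higher ordering value wins, but its null fields are filled in from
    the other record instead of being overwritten with null."""
    if incoming[ordering_field] >= stored[ordering_field]:
        winner, loser = incoming, stored
    else:
        winner, loser = stored, incoming
    merged = dict(winner)
    for field, value in merged.items():
        if value is None:
            # backfill nulls from the losing record
            merged[field] = loser.get(field)
    return merged

stored = {"id": "1", "ts": 2, "name": "name_1", "price": None}
incoming = {"id": "1", "ts": 1, "name": None, "price": "price_1"}
print(partial_update_merge(stored, incoming))
# {'id': '1', 'ts': 2, 'name': 'name_1', 'price': 'price_1'}
```

Running this against the example in the diff reproduces the documented result: `ts=2` wins, and `price` is filled from the incoming record.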



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] codope commented on a diff in pull request #7942: [HUDI-5753] Add docs for record payload

2023-02-15 Thread via GitHub


codope commented on code in PR #7942:
URL: https://github.com/apache/hudi/pull/7942#discussion_r1106781029


##
website/docs/record_payload.md:
##
@@ -0,0 +1,97 @@
+---
+title: Record Payload 
+keywords: [hudi, merge, upsert, precombine]
+---
+
+## Record Payload
+
+One of the core features of Hudi is the ability to incrementally upsert data, deduplicate and merge records on the fly.
+Additionally, users can implement their own logic to merge the incoming records with the record on storage. Record
+payload is an abstract representation of a Hudi record that enables the aforementioned capability. As we shall see below,
+Hudi provides out-of-the-box support for different payloads for different use cases, and a new record merger API for
+optimized payload handling. But first, let us understand how the record payload is used in the Hudi upsert path.
+
+
+
+
+
+The figure above shows the main stages that records go through while being written to the Hudi table. In the precombining
+stage, Hudi performs any deduplication based on the payload implementation and the precombine key configured by the user.
+Further, on index lookup, Hudi identifies which records are being updated, and the record payload implementation tells
+Hudi how to merge the incoming record with the existing record on storage.
+
+### Existing Payloads
+
+ OverwriteWithLatestAvroPayload
+
+This is the default record payload implementation. During precombining, it breaks ties by picking the record with the
+greatest precombine-key value (determined by calling `.compareTo()` on the value of the precombine key); while merging
+with storage, it simply picks the latest record. This gives latest-write-wins style semantics.
+
+ EventTimeAvroPayload
+
+Some use cases require merging records by event time, and thus event time plays the role of an ordering field. This
+payload is particularly useful in the case of late-arriving data. For such use cases, users need to set
+the [payload event time field](/docs/configurations#RECORD_PAYLOAD) configuration.
+
+ ExpressionPayload
+
+This payload is very useful when you want to merge or delete records based on a conditional expression, especially

Review Comment:
   Didn't know that this is meant to be used internally. Is there a guard like that on the payload class config? cc @alexeykudinkin






[GitHub] [hudi] codope commented on a diff in pull request #7942: [HUDI-5753] Add docs for record payload

2023-02-15 Thread via GitHub


codope commented on code in PR #7942:
URL: https://github.com/apache/hudi/pull/7942#discussion_r1106781711


##
website/docs/record_payload.md:
##
@@ -0,0 +1,97 @@
+---
+title: Record Payload 
+keywords: [hudi, merge, upsert, precombine]
+---
+
+## Record Payload
+
+One of the core features of Hudi is the ability to incrementally upsert data, deduplicate and merge records on the fly.
+Additionally, users can implement their own logic to merge the incoming records with the record on storage. Record
+payload is an abstract representation of a Hudi record that enables the aforementioned capability. As we shall see below,
+Hudi provides out-of-the-box support for different payloads for different use cases, and a new record merger API for
+optimized payload handling. But first, let us understand how the record payload is used in the Hudi upsert path.
+
+
+
+
+
+The figure above shows the main stages that records go through while being written to the Hudi table. In the precombining
+stage, Hudi performs any deduplication based on the payload implementation and the precombine key configured by the user.
+Further, on index lookup, Hudi identifies which records are being updated, and the record payload implementation tells
+Hudi how to merge the incoming record with the existing record on storage.
+
+### Existing Payloads
+
+ OverwriteWithLatestAvroPayload
+
+This is the default record payload implementation. During precombining, it breaks ties by picking the record with the
+greatest precombine-key value (determined by calling `.compareTo()` on the value of the precombine key); while merging
+with storage, it simply picks the latest record. This gives latest-write-wins style semantics.

Review Comment:
   That's true. But I wanted to keep things simple for the user, as it is a concepts doc. Towards the end, I have pointed to the FAQ, which has more details.
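To make the precombine semantics under discussion concrete, here is an illustrative Python sketch (not Hudi code) of `OverwriteWithLatestAvroPayload`-style behavior: within an incoming batch, ties on the record key are broken by the greatest precombine value; against storage, the incoming record simply replaces the stored one.

```python
def precombine(records, record_key, precombine_key):
    """Deduplicate an incoming batch: for each record key, keep the
    record whose precombine value compares greatest."""
    best = {}
    for rec in records:
        key = rec[record_key]
        if key not in best or rec[precombine_key] > best[key][precombine_key]:
            best[key] = rec
    return list(best.values())

def merge(stored, incoming):
    # Latest-write-wins: the incoming record replaces the stored one.
    return incoming

batch = [
    {"id": "1", "ts": 5, "price": 10.0},
    {"id": "1", "ts": 7, "price": 12.0},
    {"id": "2", "ts": 3, "price": 8.0},
]
deduped = precombine(batch, "id", "ts")
# deduped keeps one record per key: id=1 with ts=7, and id=2 with ts=3
```

Field names (`id`, `ts`) are placeholders; the real payload compares precombine values via `.compareTo()` on Avro values.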






[GitHub] [hudi] clp007 opened a new issue, #7960: [SUPPORT]

2023-02-15 Thread via GitHub


clp007 opened a new issue, #7960:
URL: https://github.com/apache/hudi/issues/7960

   
   **Describe the problem you faced**
   
   There is a problem when synchronizing the Hudi table to BigQuery. I'm not sure what the problem is or how to solve it:
   
   spark-submit --master yarn \
   --packages com.google.cloud:google-cloud-bigquery:2.10.4 \
   --jars /opt/hudi-gcp-bundle-0.12.1.jar \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   /opt/hudi-utilities-bundle_2.12-0.12.1.jar \
   --target-base-path gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
   --target-table bubble-pop-b01a0 \
   --table-type COPY_ON_WRITE \
   --base-file-format PARQUET \
   --enable-sync \
   --sync-tool-classes org.apache.hudi.gcp.bigquery.BigQuerySyncTool \
   --hoodie-conf hoodie.deltastreamer.source.dfs.root=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
   --hoodie-conf hoodie.gcp.bigquery.sync.project_id=transferred \
   --hoodie-conf hoodie.gcp.bigquery.sync.dataset_name=temp_data \
   --hoodie-conf hoodie.gcp.bigquery.sync.dataset_location=us-central1 \
   --hoodie-conf hoodie.gcp.bigquery.sync.table_name=temp_bubble-pop \
   --hoodie-conf hoodie.gcp.bigquery.sync.base_path=gs://transfer-table-data/tmp/temp_bubble-pop/${NOW} \
   --hoodie-conf hoodie.gcp.bigquery.sync.partition_fields=event_date \
   --hoodie-conf hoodie.gcp.bigquery.sync.source_uri=gs://transfer-table-data/incremental/test/bubble-pop-b01a0/event_date=* \
   --hoodie-conf hoodie.gcp.bigquery.sync.source_uri_prefix=gs://transfer-table-data/incremental/test/bubble-pop-b01a0 \
   --hoodie-conf hoodie.gcp.bigquery.sync.use_file_listing_from_metadata=true \
   --hoodie-conf hoodie.gcp.bigquery.sync.assume_date_partitioning=false \
   --hoodie-conf hoodie.datasource.write.recordkey.field=event_timestamp,event_name,user_pseudo_id,user_first_touch_timestamp,advertising_id \
   --hoodie-conf hoodie.datasource.write.partitionpath.field=event_date \
   --hoodie-conf hoodie.datasource.write.precombine.field=event_timestamp \
   --hoodie-conf hoodie.datasource.write.keygenerator.type=COMPLEX \
   --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
   --hoodie-conf hoodie.datasource.write.drop.partition.columns=true \
   --hoodie-conf hoodie.partition.metafile.use.base.format=true \
   --hoodie-conf hoodie.metadata.enable=true
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. An error occurred when I ran the above script
   
   **Environment Description**
   
   * Hudi version : hudi-spark3.2-bundle_2.12:0.12.1
   
   * Spark version :3.1
   
   * Storage (HDFS/S3/GCS..) :GCS
   
   * Running on Docker? (yes/no) :no
   
   **Additional context**
   
   dataproc spark
   
   **Stacktrace**
   
   ```
   ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer: Got error running delta sync once. Shutting down
   org.apache.hudi.exception.HoodieException: Please provide a valid schema provider class!
   at org.apache.hudi.utilities.sources.InputBatch.getSchemaProvider(InputBatch.java:56)
   at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:64)
   at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:468)
   at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:401)
   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:305)
   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
   at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
   ```
[GitHub] [hudi] hudi-bot commented on pull request #7958: [HUDI-5799] Fix Spark partition validation in TestBulkInsertInternalPartitionerForRows

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7958:
URL: https://github.com/apache/hudi/pull/7958#issuecomment-1430939533

   
   ## CI report:
   
   * 36c706d1bb1a8f793ce874c9316aaf829aecd594 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15199)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1430939405

   
   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7959: [HUDI-5800] Fix test failure in TestHoodieMergeOnReadTable

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7959:
URL: https://github.com/apache/hudi/pull/7959#issuecomment-1430939617

   
   ## CI report:
   
   * 480d3a4b17476126e248eddea06713024fae0f2b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15200)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5800) Fix test failure in TestHoodieMergeOnReadTable

2023-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5800:
-
Labels: pull-request-available  (was: )

> Fix test failure in TestHoodieMergeOnReadTable
> --
>
> Key: HUDI-5800
> URL: https://issues.apache.org/jira/browse/HUDI-5800
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>
> The Jira fixes test failure in TestHoodieMergeOnReadTable.testReleaseResource
> {code:java}
> TestHoodieMergeOnReadTable.testReleaseResource:710 expected: <14> but was: 
> <3> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] xushiyan commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all

2023-02-15 Thread via GitHub


xushiyan commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1106832148


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:
##
@@ -246,6 +246,7 @@ protected HoodieWriteMetadata> executeClustering(HoodieC
     .performClustering(clusteringPlan, schema, instantTime);
 HoodieData writeStatusList = writeMetadata.getWriteStatuses();
 HoodieData statuses = updateIndex(writeStatusList, writeMetadata);
+context.putCachedDataIds(config.getBasePath(), instantTime, statuses.getId());

Review Comment:
   I wasn't happy with tracing every persist() call and thought about this approach, but I also wanted to keep the impacted scope narrow. A change to all persist() calls may lead to unexpected side effects.
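The bookkeeping approach discussed here, where the engine context records cached RDD ids per `(basePath, instantTime)` so that cleanup unpersists only the relevant RDDs, can be sketched as follows. This is a Python stand-in for illustration; the method names mirror the PR's `putCachedDataIds`/`getCachedDataIds` but the class is hypothetical.

```python
class CachedDataRegistry:
    """Toy model of per-write RDD-cache bookkeeping: ids are recorded
    under (base_path, instant_time) so releasing resources for one
    write does not unpersist RDDs of other concurrent writes."""

    def __init__(self):
        self._ids = {}

    def put_cached_data_ids(self, base_path, instant_time, *ids):
        self._ids.setdefault((base_path, instant_time), []).extend(ids)

    def get_cached_data_ids(self, base_path, instant_time):
        return list(self._ids.get((base_path, instant_time), []))

    def release(self, base_path, instant_time, unpersist):
        # Unpersist only the RDDs cached for this instant.
        for rdd_id in self._ids.pop((base_path, instant_time), []):
            unpersist(rdd_id)

registry = CachedDataRegistry()
registry.put_cached_data_ids("/tbl", "001", 7)
registry.put_cached_data_ids("/tbl", "002", 9)
released = []
registry.release("/tbl", "001", released.append)
# released == [7]; RDD 9 (instant "002") stays registered
```

The narrow scope the comment argues for corresponds to calling `put_cached_data_ids` only at the few sites that actually persist data, rather than instrumenting every persist() call.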






[GitHub] [hudi] xushiyan commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all

2023-02-15 Thread via GitHub


xushiyan commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1106835666


##
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/TestSparkRDDWriteClient.java:
##
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client;
+
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.testutils.SparkClientFunctionalTestHarness;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.storage.StorageLevel;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.Collections;
+import java.util.List;
+import java.util.Properties;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import static org.apache.hudi.common.testutils.HoodieTestDataGenerator.getCommitTimeAtUTC;
+import static org.apache.hudi.testutils.Assertions.assertNoWriteErrors;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+class TestSparkRDDWriteClient extends SparkClientFunctionalTestHarness {
+
+  static Stream testWriteClientReleaseResourcesShouldOnlyUnpersistRelevantRdds() {
+    return Stream.of(
+        Arguments.of(HoodieTableType.COPY_ON_WRITE, true),
+        Arguments.of(HoodieTableType.MERGE_ON_READ, true),
+        Arguments.of(HoodieTableType.COPY_ON_WRITE, false),
+        Arguments.of(HoodieTableType.MERGE_ON_READ, false)
+    );
+  }
+
+  @ParameterizedTest
+  @MethodSource
+  void testWriteClientReleaseResourcesShouldOnlyUnpersistRelevantRdds(HoodieTableType tableType, boolean shouldReleaseResource) throws IOException {
+    final HoodieTableMetaClient metaClient = getHoodieMetaClient(hadoopConf(), URI.create(basePath()).getPath(), tableType, new Properties());
+    final HoodieWriteConfig writeConfig = getConfigBuilder(true)
+        .withPath(metaClient.getBasePathV2().toString())
+        .withAutoCommit(false)
+        .withReleaseResourceEnabled(shouldReleaseResource)
+        .withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(false).build())
+        .build();
+    HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator(0xDEED);
+
+    String instant0 = getCommitTimeAtUTC(0);
+    List extraRecords0 = dataGen.generateGenericRecords(10);
+    JavaRDD persistedRdd0 = jsc().parallelize(extraRecords0, 2).persist(StorageLevel.MEMORY_AND_DISK());
+    context().putCachedDataIds(writeConfig.getBasePath(), instant0, persistedRdd0.id());
+
+    String instant1 = getCommitTimeAtUTC(1);
+    List extraRecords1 = dataGen.generateGenericRecords(10);
+    JavaRDD persistedRdd1 = jsc().parallelize(extraRecords1, 2).persist(StorageLevel.MEMORY_AND_DISK());
+    context().putCachedDataIds(writeConfig.getBasePath(), instant1, persistedRdd1.id());
+
+    SparkRDDWriteClient writeClient = getHoodieWriteClient(writeConfig);
+    List records = dataGen.generateInserts(instant1, 10);
+    JavaRDD writeRecords = jsc().parallelize(records, 2);
+    writeClient.startCommitWithTime(instant1);
+    List writeStatuses = writeClient.insert(writeRecords, instant1).collect();
+    assertNoWriteErrors(writeStatuses);
+    writeClient.commitStats(instant1, writeStatuses.stream().map(WriteStatus::getStat).collect(Collectors.toList()),
+        Option.empty(), metaClient.getCommitActionType());
+    writeClient.close();
+
+    if (shouldReleaseResource) {
+      assertEquals(Collections.singletonList(persistedRdd0.id()),
+          context().getCachedDataIds(writeConfig.getBasePath(), instant0),
+          "RDDs cached for " + in

[jira] [Created] (HUDI-5801) Speed metaTable initializeFileGroups

2023-02-15 Thread loukey_j (Jira)
loukey_j created HUDI-5801:
--

 Summary: Speed metaTable initializeFileGroups
 Key: HUDI-5801
 URL: https://issues.apache.org/jira/browse/HUDI-5801
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: loukey_j


org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups is too slow when there are many file groups.





[GitHub] [hudi] lokeshj1703 opened a new pull request, #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


lokeshj1703 opened a new pull request, #7961:
URL: https://github.com/apache/hudi/pull/7961

   ### Change Logs
   
   Modify DefaultHoodieRecordPayload to be able to handle a configured delete 
key and marker
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5802) Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5802:
-
Labels: pull-request-available  (was: )

> Allow configuration for deletes in DefaultHoodieRecordPayload
> -
>
> Key: HUDI-5802
> URL: https://issues.apache.org/jira/browse/HUDI-5802
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>
> Modify DefaultHoodieRecordPayload to be able to handle a configured delete 
> key and marker





[jira] [Created] (HUDI-5802) Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread Lokesh Jain (Jira)
Lokesh Jain created HUDI-5802:
-

 Summary: Allow configuration for deletes in 
DefaultHoodieRecordPayload
 Key: HUDI-5802
 URL: https://issues.apache.org/jira/browse/HUDI-5802
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Lokesh Jain
Assignee: Lokesh Jain


Modify DefaultHoodieRecordPayload to be able to handle a configured delete key 
and marker





[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


lokeshj1703 commented on code in PR #7961:
URL: https://github.com/apache/hudi/pull/7961#discussion_r1106842865


##
hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java:
##
@@ -71,18 +73,38 @@ public Option combineAndGetUpdateValue(IndexedRecord currentValue
     /*
      * Now check if the incoming record is a delete record.
      */
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
   }
 
   @Override
   public Option getInsertValue(Schema schema, Properties properties) throws IOException {
-    if (recordBytes.length == 0 || isDeletedRecord) {
+    if (recordBytes.length == 0) {
       return Option.empty();
     }
     GenericRecord incomingRecord = HoodieAvroUtils.bytesToAvro(recordBytes, schema);
     eventTime = updateEventTime(incomingRecord, properties);
 
-    return Option.of(incomingRecord);
+    return isDeleteRecord(incomingRecord, properties) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  /**
+   * @param genericRecord instance of {@link GenericRecord} of interest.
+   * @param properties payload related properties
+   * @return {@code true} if record represents a delete record, {@code false} otherwise.
+   */
+  protected boolean isDeleteRecord(GenericRecord genericRecord, Properties properties) {
+    final String deleteKey = properties.getProperty(DELETE_KEY);
+    if (deleteKey == null) {
+      return super.isDeleteRecord(genericRecord);

Review Comment:
   If the `DELETE_MARKER` property is not set, should we throw an exception here or fall back to the default?
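The trade-off raised in this review, falling back versus failing fast when a delete key is configured without a marker, is easier to see in a small sketch. This is illustrative Python, not the PR's Java; the property names are assumptions modeled on the `DELETE_KEY`/`DELETE_MARKER` constants in the diff, and the default delete field is assumed to be `_hoodie_is_deleted`.

```python
def is_delete_record(record, properties, default_delete_field="_hoodie_is_deleted"):
    """Configurable delete check: if a delete key is configured, the
    record is a delete when that field's value equals the configured
    marker; with no delete key, fall back to the default boolean field."""
    delete_key = properties.get("hoodie.payload.delete.field")  # assumed name
    if delete_key is None:
        # No custom delete key configured: default behavior.
        return record.get(default_delete_field) is True
    delete_marker = properties.get("hoodie.payload.delete.marker")  # assumed name
    if delete_marker is None:
        # The open question in the review: throwing here surfaces the
        # misconfiguration instead of silently keeping the record.
        raise ValueError("delete key configured but no delete marker set")
    return str(record.get(delete_key)) == delete_marker

rec = {"id": "1", "op": "d"}
props = {"hoodie.payload.delete.field": "op", "hoodie.payload.delete.marker": "d"}
print(is_delete_record(rec, props))  # True
```

Throwing (as sketched) makes the half-configured state an explicit error; falling back instead would treat the record as a non-delete, which is harder to debug.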






[GitHub] [hudi] codope commented on a diff in pull request #7929: [HUDI-5754] Add new sources to deltastreamer docs

2023-02-15 Thread via GitHub


codope commented on code in PR #7929:
URL: https://github.com/apache/hudi/pull/7929#discussion_r1106853958


##
website/docs/hoodie_deltastreamer.md:
##
@@ -340,6 +388,26 @@ to trigger/processing of new or changed data as soon as it is available on S3.
 
 Insert code sample from this blog: 
https://hudi.apache.org/blog/2021/08/23/s3-events-source/#configuration-and-setup
 
+### GCS Events
+Google Cloud Storage (GCS) provides an event notification mechanism which will post notifications when certain
+events happen in your GCS bucket. You can read more at [Pub/Sub Notifications](https://cloud.google.com/storage/docs/pubsub-notifications/).
+GCS will put these events in a Cloud Pub/Sub topic. Apache Hudi provides a GcsEventsSource that can read from Cloud Pub/Sub
+to trigger processing of new or changed data as soon as it is available on GCS.
+
+ Setup
+A detailed guide on [How to use the system](https://docs.google.com/document/d/1VfvtdvhXw6oEHPgZ_4Be2rkPxIzE0kBCNUiVDsXnSAA/edit#heading=h.tpmqk5oj0crt) is available.

Review Comment:
   I think we need not put the whole document. We typically assume that users know how to enable event notifications. What we can add here are the two spark-submit command samples for the two sources.






[GitHub] [hudi] xushiyan commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all

2023-02-15 Thread via GitHub


xushiyan commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1106832148


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:
##
@@ -246,6 +246,7 @@ protected HoodieWriteMetadata> executeClustering(HoodieC
     .performClustering(clusteringPlan, schema, instantTime);
 HoodieData writeStatusList = writeMetadata.getWriteStatuses();
 HoodieData statuses = updateIndex(writeStatusList, writeMetadata);
+context.putCachedDataIds(config.getBasePath(), instantTime, statuses.getId());

Review Comment:
   I wasn't happy with tracing every persist() call and thought about this approach, but I also wanted to keep the impacted scope narrow. A change to all persist() calls may lead to unexpected side effects. It also looks a bit weird for a HoodieData to know about any HoodieEngineContext. Having HoodieEngineContext trace every HoodieData from its creation and auto-cache its id makes more sense, but that is a much bigger change wrt this PR's intention.






[GitHub] [hudi] loukey-lj opened a new pull request, #7962: [HUDI-5801] Speed metaTable initializeFileGroups

2023-02-15 Thread via GitHub


loukey-lj opened a new pull request, #7962:
URL: https://github.com/apache/hudi/pull/7962

   ### Change Logs
   
   
org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups 
is too slow when there are many file groups.
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   NA
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5801) Speed metaTable initializeFileGroups

2023-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5801:
-
Labels: pull-request-available  (was: )

> Speed metaTable initializeFileGroups
> 
>
> Key: HUDI-5801
> URL: https://issues.apache.org/jira/browse/HUDI-5801
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: loukey_j
>Priority: Major
>  Labels: pull-request-available
>
> org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#initializeFileGroups 
> is too slow when there are many file groups.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7918:
URL: https://github.com/apache/hudi/pull/7918#issuecomment-1431012374

   
   ## CI report:
   
   * f694a549ea265813f05767d69269fda2bb1ef279 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15161)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15188)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431022184

   
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7962: [HUDI-5801] Speed metaTable initializeFileGroups

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7962:
URL: https://github.com/apache/hudi/pull/7962#issuecomment-1431022243

   
   ## CI report:
   
   * bd715641ef0532c50771d1ae02fdeb5f39e6a52c UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7362:
URL: https://github.com/apache/hudi/pull/7362#issuecomment-1431031174

   
   ## CI report:
   
   * b3e842754a302dc1372b330a8c32298d49732107 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14831)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14867)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15137)
 
   * c758e27d4d99c5e88b1ab7fe77fb89131aebce4d UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7962: [HUDI-5801] Speed metaTable initializeFileGroups

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7962:
URL: https://github.com/apache/hudi/pull/7962#issuecomment-1431033376

   
   ## CI report:
   
   * bd715641ef0532c50771d1ae02fdeb5f39e6a52c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15202)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431033302

   
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7362:
URL: https://github.com/apache/hudi/pull/7362#issuecomment-1431042082

   
   ## CI report:
   
   * b3e842754a302dc1372b330a8c32298d49732107 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14831)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14867)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15137)
 
   * c758e27d4d99c5e88b1ab7fe77fb89131aebce4d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15203)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7955: [HUDI-5649] Unify all the loggers to slf4j

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7955:
URL: https://github.com/apache/hudi/pull/7955#issuecomment-1431043789

   
   ## CI report:
   
   * 8c05730d6eddec29b98d421b2edc95ae616dc29d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15193)
 
   
   
   





[jira] [Created] (HUDI-5803) Support Aliyun DFS Storage

2023-02-15 Thread Ran Tao (Jira)
Ran Tao created HUDI-5803:
-

 Summary: Support Aliyun DFS Storage
 Key: HUDI-5803
 URL: https://issues.apache.org/jira/browse/HUDI-5803
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Ran Tao


Add support for Alibaba Cloud DFS storage.





[jira] [Commented] (HUDI-5803) Support Aliyun DFS Storage

2023-02-15 Thread Ran Tao (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688995#comment-17688995
 ] 

Ran Tao commented on HUDI-5803:
---

[~yanghua] Hi Yang, what do you think? Can you assign this ticket to me?

> Support Aliyun DFS Storage
> --
>
> Key: HUDI-5803
> URL: https://issues.apache.org/jira/browse/HUDI-5803
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ran Tao
>Priority: Major
>
> Add support for Alibaba Cloud DFS storage.





[jira] [Created] (HUDI-5804) hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation

2023-02-15 Thread Pramod Biligiri (Jira)
Pramod Biligiri created HUDI-5804:
-

 Summary: hudi-cli CommitsCommand - some options fail due to typo 
in ShellOption annotation
 Key: HUDI-5804
 URL: https://issues.apache.org/jira/browse/HUDI-5804
 Project: Apache Hudi
  Issue Type: Bug
  Components: cli
Reporter: Pramod Biligiri


In multiple places in CommitsCommand, the ShellOption annotation is missing the 
"--" prefix in its value attribute. One such example, from "commit 
showpartitions", is shown below:

[https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213]
|@ShellOption(value = {"includeArchivedTimeline"}, help = "Include archived 
commits as well", defaultValue = "false") final boolean 
includeArchivedTimeline)|


That should read value=\{"--includeArchivedTimeline"}
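As a toy illustration of why the missing prefix silently suppresses the flag, consider a matcher that compares the registered option value verbatim against the token the user types (a deliberate simplification, not Spring Shell's actual matching code): a value registered without the "--" prefix can then never match what the user enters on the command line.

```java
public class ShellOptionPrefixDemo {
    // Simplified matcher: assumes the shell compares the user's token with the
    // registered option value verbatim. This is an illustrative assumption,
    // not Spring Shell's real implementation.
    static boolean matches(String registeredValue, String userToken) {
        return registeredValue.equals(userToken);
    }

    public static void main(String[] args) {
        String userToken = "--includeArchivedTimeline";
        // Registered without the "--" prefix (the bug): the flag can never be invoked.
        System.out.println(matches("includeArchivedTimeline", userToken));   // false
        // Registered with the prefix (the fix): the flag is recognized.
        System.out.println(matches("--includeArchivedTimeline", userToken)); // true
    }
}
```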

 

 





[jira] [Updated] (HUDI-5804) hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation

2023-02-15 Thread Pramod Biligiri (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pramod Biligiri updated HUDI-5804:
--
Description: 
In multiple places in CommitsCommand, the ShellOption annotation is missing the 
"--" prefix in its value attribute. One such example, from "commit 
showpartitions", is shown below:

[https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213]
|@ShellOption(value = \{"includeArchivedTimeline"}, help = "Include archived 
commits as well", defaultValue = "false") final boolean 
includeArchivedTimeline)|

In the above, it should read 'value=\{"--includeArchivedTimeline"...}'

 

 

  was:
In multiple places in CommitsCommand, the ShellOption annotation is missing the 
"--" prefix in its value attribute. One such example, from "commit 
showpartitions", is shown below:

[https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213]
|@ShellOption(value = {"includeArchivedTimeline"}, help = "Include archived 
commits as well", defaultValue = "false") final boolean 
includeArchivedTimeline)|


That should read value=\{"--includeArchivedTimeline"}

 

 


> hudi-cli CommitsCommand - some options fail due to typo in ShellOption 
> annotation
> -
>
> Key: HUDI-5804
> URL: https://issues.apache.org/jira/browse/HUDI-5804
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Reporter: Pramod Biligiri
>Priority: Minor
>
> In multiple places in CommitsCommand, the ShellOption annotation is missing 
> the "--" prefix in its value attribute. One such example, from 
> "commit showpartitions", is shown below:
> [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213]
> |@ShellOption(value = \{"includeArchivedTimeline"}, help = "Include archived 
> commits as well", defaultValue = "false") final boolean 
> includeArchivedTimeline)|
> In the above, it should read 'value=\{"--includeArchivedTimeline"...}'
>  
>  





[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-143782

   
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201)
 
   * c8ac28edb845302c9f3afbc980b03782c0605564 UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431125334

   
   ## CI report:
   
   * 29189395a4d407c331c89c11b1e70e989d704b20 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15201)
 
   * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204)
 
   
   
   





[GitHub] [hudi] pramodbiligiri opened a new pull request, #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake

2023-02-15 Thread via GitHub


pramodbiligiri opened a new pull request, #7963:
URL: https://github.com/apache/hudi/pull/7963

   https://issues.apache.org/jira/browse/HUDI-5804
   
   ### Change Logs
   Fix typo in use of ShellOption annotation in CommitsCommand class. There 
were a few places where the "--" prefix was missing.
   
   ### Impact
   Makes the following CLI flags actually available to the user. Previously they 
existed in the code but there was no way to invoke them:
   1. commit showpartitions --includeArchivedTimeline
   2. commit show_write_stats --includeArchivedTimeline
   3. commit showfiles --includeArchivedTimeline
   
   ### Risk level (write none, low medium or high below)
   Low. Exposes existing functionality that was getting suppressed by mistake.
   
   ### Documentation Update
   The shell option will show up automatically in the hudi-cli help.
   
   ### Contributor's checklist
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5804) hudi-cli CommitsCommand - some options fail due to typo in ShellOption annotation

2023-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5804:
-
Labels: pull-request-available  (was: )

> hudi-cli CommitsCommand - some options fail due to typo in ShellOption 
> annotation
> -
>
> Key: HUDI-5804
> URL: https://issues.apache.org/jira/browse/HUDI-5804
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Reporter: Pramod Biligiri
>Priority: Minor
>  Labels: pull-request-available
>
> In multiple places in CommitsCommand, the ShellOption annotation is missing 
> the "--" prefix in its value attribute. One such example, from 
> "commit showpartitions", is shown below:
> [https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java#L213]
> |@ShellOption(value = \{"includeArchivedTimeline"}, help = "Include archived 
> commits as well", defaultValue = "false") final boolean 
> includeArchivedTimeline)|
> In the above, it should read 'value=\{"--includeArchivedTimeline"...}'
>  
>  





[GitHub] [hudi] hudi-bot commented on pull request #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7963:
URL: https://github.com/apache/hudi/pull/7963#issuecomment-1431144608

   
   ## CI report:
   
   * f5b811ac7f5fd8278fb1fcb27ce1e17ff05a9750 UNKNOWN
   
   
   





[GitHub] [hudi] lokeshj1703 closed pull request #7878: Dep tree diff 0.12.2 and 0.13.0

2023-02-15 Thread via GitHub


lokeshj1703 closed pull request #7878: Dep tree diff 0.12.2 and 0.13.0
URL: https://github.com/apache/hudi/pull/7878





[GitHub] [hudi] hudi-bot commented on pull request #7952: [MINOR] Fix format name and remove redundant line in examples

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7952:
URL: https://github.com/apache/hudi/pull/7952#issuecomment-1431226426

   
   ## CI report:
   
   * 7b1012695ef498cd5ffadd4e87c58709e782a479 UNKNOWN
   * 8e06386f41311e3780846c5dcb4593e0ed863d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15190)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7963:
URL: https://github.com/apache/hudi/pull/7963#issuecomment-1431226578

   
   ## CI report:
   
   * f5b811ac7f5fd8278fb1fcb27ce1e17ff05a9750 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15205)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7933: [HUDI-5774] Fix prometheus configs for metadata table and support metric labels

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7933:
URL: https://github.com/apache/hudi/pull/7933#issuecomment-1431234245

   
   ## CI report:
   
   * a02b393674ed4ae07d1eed67560f126ac06e178c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15158)
 
   * 638327ab8184a7b40d379bd8591f9e67f7fe70f7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15196)
 
   * 6f36de2e745401f48980bbe71513c40efaa83ac5 UNKNOWN
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7933: [HUDI-5774] Fix prometheus configs for metadata table and support metric labels

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7933:
URL: https://github.com/apache/hudi/pull/7933#issuecomment-1431242582

   
   ## CI report:
   
   * 638327ab8184a7b40d379bd8591f9e67f7fe70f7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15196)
 
   * 6f36de2e745401f48980bbe71513c40efaa83ac5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15206)
 
   
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7940: [HUDI-5787] HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is fal

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7940:
URL: https://github.com/apache/hudi/pull/7940#issuecomment-1431242713

   
   ## CI report:
   
   * 249ffe369a49308e2a65a0ac58389efd5b49d1ad Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15191)
 
   
   
   





[GitHub] [hudi] danny0405 merged pull request #7940: [HUDI-5787] HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false

2023-02-15 Thread via GitHub


danny0405 merged PR #7940:
URL: https://github.com/apache/hudi/pull/7940





[hudi] branch master updated (af61dea6f98 -> 25f6927b47d)

2023-02-15 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from af61dea6f98 [MINOR] Enable Azure CI to publish test results (#7943)
 add 25f6927b47d [HUDI-5787] HMSDDLExecutor should set table type to 
EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync 
config is false (#7940)

No new revisions were added by this update.

Summary of changes:
 .../hudi/table/catalog/TestHoodieHiveCatalog.java  | 18 ++
 .../org/apache/hudi/hive/ddl/HMSDDLExecutor.java   |  2 +-
 .../org/apache/hudi/hive/TestHiveSyncTool.java | 28 ++
 3 files changed, 47 insertions(+), 1 deletion(-)



[jira] [Closed] (HUDI-5787) HMSDDLExecutor should set table type to EXTERNAL_TABLE when hoodie.datasource.hive_sync.create_managed_table of sync config is false

2023-02-15 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-5787.

Fix Version/s: 0.14.0
   Resolution: Fixed

Fixed via master branch: 25f6927b47d5cfa6baad95e09fb88ad7ce2a1402

> HMSDDLExecutor should set table type to EXTERNAL_TABLE when 
> hoodie.datasource.hive_sync.create_managed_table of sync config is false
> 
>
> Key: HUDI-5787
> URL: https://issues.apache.org/jira/browse/HUDI-5787
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.14.0
>
>
> HMSDDLExecutor should set the table type of Hive table to EXTERNAL_TABLE when 
> hoodie.datasource.hive_sync.create_managed_table of sync config is set to 
> false.





[GitHub] [hudi] li36909 closed pull request #7957: [HUDI-5798] fix spark sql query error on mor table after flink cdc delete records

2023-02-15 Thread via GitHub


li36909 closed pull request #7957: [HUDI-5798] fix spark sql query error on mor 
table after flink cdc delete records
URL: https://github.com/apache/hudi/pull/7957





[GitHub] [hudi] hudi-bot commented on pull request #7956: [HUDI-5797] fix use bulk insert error as row

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7956:
URL: https://github.com/apache/hudi/pull/7956#issuecomment-1431296270

   
   ## CI report:
   
   * 5bd4d5c4de8fc54bf93fb7fd252b6e61fda85373 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15194)
 
   
   
   





[jira] [Created] (HUDI-5805) hive query on mor get empty result before compaction

2023-02-15 Thread lrz (Jira)
lrz created HUDI-5805:
-

 Summary: hive query on mor get empty result before compaction
 Key: HUDI-5805
 URL: https://issues.apache.org/jira/browse/HUDI-5805
 Project: Apache Hudi
  Issue Type: Bug
Reporter: lrz
 Attachments: image-2023-02-15-20-48-08-819.png, 
image-2023-02-15-20-48-21-988.png

When a MOR table is written with Flink CDC only, the partition contains only log 
files and no base file before compaction, so Hive queries return empty results 
until compaction runs.

This happens because, when Hive computes splits for a native table, it ignores 
files whose names start with '.'. Since Hudi does not set a storage handler when 
syncing the Hive metastore, Hive treats the table as a native table.

!image-2023-02-15-20-48-08-819.png!

!image-2023-02-15-20-48-21-988.png!
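The hidden-file behavior described above can be reproduced with a small standalone sketch: a filter like Hive's default input-path filter, which skips names starting with '.' or '_', drops the dot-prefixed Hudi log files, so a partition with only log files yields no visible files at all. The file names below are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class HiddenFileFilterDemo {
    // Mirrors Hive's default input-path filter, which skips files whose
    // names start with '.' or '_' (treated as hidden).
    static boolean isHidden(String name) {
        return name.startsWith(".") || name.startsWith("_");
    }

    static List<String> visibleFiles(List<String> names) {
        return names.stream().filter(n -> !isHidden(n)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Before compaction, a Flink-CDC-written MOR partition holds only
        // dot-prefixed log files (names are illustrative).
        List<String> beforeCompaction = List.of(
                ".f1_20230215.log.1_0-1-0",
                ".f1_20230215.log.2_0-1-0");
        System.out.println(visibleFiles(beforeCompaction).size()); // 0 -> empty query result

        // After compaction, a base parquet file exists and is visible.
        List<String> afterCompaction = List.of(
                ".f1_20230215.log.1_0-1-0",
                "f1_0-1-0_20230215.parquet");
        System.out.println(visibleFiles(afterCompaction).size()); // 1
    }
}
```

Registering a storage handler for the table makes Hive stop treating it as a native table, so this hidden-file pruning no longer empties the split list.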





[GitHub] [hudi] li36909 opened a new pull request, #7964: [HUDI-5805] hive query on mor get empty result before compaction

2023-02-15 Thread via GitHub


li36909 opened a new pull request, #7964:
URL: https://github.com/apache/hudi/pull/7964

   ### Change Logs
   When a MOR table is written with Flink CDC only, the partition contains only 
log files and no base file before compaction, so Hive queries return empty 
results until compaction runs.
   
   This happens because, when Hive computes splits for a native table, it 
ignores files whose names start with '.'. Since Hudi does not set a storage 
handler when syncing the Hive metastore, Hive treats the table as a native 
table.
   
   ### Impact
   Sets the storage handler to DefaultStorageHandler when syncing the Hive 
metastore.
   
   ### Risk level (write none, low medium or high below)
   none
   
   ### Documentation Update
   none
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-5805) hive query on mor get empty result before compaction

2023-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5805:
-
Labels: pull-request-available  (was: )

> hive query on mor get empty result before compaction
> 
>
> Key: HUDI-5805
> URL: https://issues.apache.org/jira/browse/HUDI-5805
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-02-15-20-48-08-819.png, 
> image-2023-02-15-20-48-21-988.png
>
>
> When a MOR table is written only by a Flink CDC job, its partitions contain
> only log files and no base file until compaction runs. As a result, Hive
> queries on the table always return an empty result before compaction.
> This happens because, when Hive runs getSplit on a native table, it ignores
> any partition that contains only files whose names start with '.'. Since Hudi
> does not set a storage handler when syncing the Hive metastore, Hive treats
> the table as a native table.
> !image-2023-02-15-20-48-08-819.png!
> !image-2023-02-15-20-48-21-988.png!





[GitHub] [hudi] codope opened a new pull request, #7965: Merge query engine setup and querying data docs

2023-02-15 Thread via GitHub


codope opened a new pull request, #7965:
URL: https://github.com/apache/hudi/pull/7965

   ### Change Logs
   
   * Merge query engine setup docs into querying data docs.
   * Add ClickHouse to the list of supported query engines.
   * Update support matrix.
   
   ### Impact
   
   Public docs change.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   Stated as above. Pages affected:
   https://hudi.apache.org/docs/querying_data
   https://hudi.apache.org/docs/query_engine_setup
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Commented] (HUDI-5798) spark-sql query fail on mor table after flink cdc application delete records

2023-02-15 Thread lrz (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689110#comment-17689110
 ] 

lrz commented on HUDI-5798:
---

I fixed this issue by adding a specially shaded Avro jar under spark/jars, but 
that does not seem like a good thing to introduce into the Hudi project.

> spark-sql query fail on mor table after flink cdc application delete records
> 
>
> Key: HUDI-5798
> URL: https://issues.apache.org/jira/browse/HUDI-5798
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
>
> After a Flink CDC application deletes records from a MOR table, Spark SQL
> queries on the table fail with the exception below:
>  
> Serialization trace:
> orderingVal (org.apache.hudi.common.model.DeleteRecord)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
>     at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
>     at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
>     at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
>     at 
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
>     at 
> org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:104)
>     at 
> org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:78)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.deserialize(HoodieDeleteBlock.java:106)
>     at 
> org.apache.hudi.common.table.log.block.HoodieDeleteBlock.getRecordsToDelete(HoodieDeleteBlock.java:91)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
>     at 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
>     ... 23 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hudi.org.apache.avro.util.Utf8
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>     ... 37 more
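The root cause visible in this trace — Kryo recorded a relocated (shaded) class name that the reader's classpath cannot resolve — can be sketched as a simulation (plain Python, not Kryo; the classpath sets are illustrative):

```python
# Kryo's DefaultClassResolver writes the fully-qualified class name into the
# serialized bytes. If the writer used a shaded (relocated) Avro, the reader
# must have that same relocated class on its classpath, or resolution fails.
WRITER_CLASSPATH = {"org.apache.hudi.org.apache.avro.util.Utf8"}  # shaded Avro
READER_CLASSPATH = {"org.apache.avro.util.Utf8"}                  # plain Avro

def serialize_class_name(cls_name, classpath):
    """The writer records the class name it sees on its own classpath."""
    assert cls_name in classpath
    return cls_name  # the name itself travels with the payload

def resolve_class_name(name, classpath):
    """The reader must resolve the recorded name against its classpath."""
    if name not in classpath:
        raise LookupError(f"ClassNotFoundException: {name}")
    return name

written = serialize_class_name(
    "org.apache.hudi.org.apache.avro.util.Utf8", WRITER_CLASSPATH)
try:
    resolve_class_name(written, READER_CLASSPATH)
except LookupError as e:
    print(e)  # mirrors the ClassNotFoundException in the stack trace above
```

This is why a workaround of dropping a matching shaded jar into spark/jars makes the reader's classpath line up with what the writer recorded.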





[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc application delete records

2023-02-15 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-5798:
--
Summary: spark sql query fail on mor table after flink cdc application 
delete records  (was: spark-sql query fail on mor table after flink cdc 
application delete records)

> spark sql query fail on mor table after flink cdc application delete records
> 
>
> Key: HUDI-5798
> URL: https://issues.apache.org/jira/browse/HUDI-5798
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
>





[jira] [Updated] (HUDI-5798) spark sql query fail on mor table after flink cdc delete records

2023-02-15 Thread lrz (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lrz updated HUDI-5798:
--
Summary: spark sql query fail on mor table after flink cdc delete records  
(was: spark sql query fail on mor table after flink cdc application delete 
records)

> spark sql query fail on mor table after flink cdc delete records
> 
>
> Key: HUDI-5798
> URL: https://issues.apache.org/jira/browse/HUDI-5798
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: lrz
>Priority: Major
>  Labels: pull-request-available
>





[GitHub] [hudi] hudi-bot commented on pull request #7964: [HUDI-5805] hive query on mor get empty result before compaction

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7964:
URL: https://github.com/apache/hudi/pull/7964#issuecomment-1431380448

   
   ## CI report:
   
   * 6aed8cffab1f915790180de9b49188b0077e0e6a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7894: [HUDI-5729] Fix RowDataKeyGen method getRecordKey

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7894:
URL: https://github.com/apache/hudi/pull/7894#issuecomment-1431379719

   
   ## CI report:
   
   * ddc28f53801f2e11401738d1c6acb74eec9c8fab Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15195)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7964: [HUDI-5805] hive query on mor get empty result before compaction

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7964:
URL: https://github.com/apache/hudi/pull/7964#issuecomment-1431391906

   
   ## CI report:
   
   * 6aed8cffab1f915790180de9b49188b0077e0e6a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15208)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-5806) hudi-cli should have option to show nearest matching commit

2023-02-15 Thread Pramod Biligiri (Jira)
Pramod Biligiri created HUDI-5806:
-

 Summary: hudi-cli should have option to show nearest matching 
commit
 Key: HUDI-5806
 URL: https://issues.apache.org/jira/browse/HUDI-5806
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cli
Reporter: Pramod Biligiri


When searching for a commit timestamp in the Hudi CLI, there should be an 
option to display the nearest matching commits if no exact match is found. This 
will help production support use cases quickly determine the recent commit 
activity in the period the user is interested in.
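The requested lookup amounts to a nearest-neighbour search over the sorted commit timeline; a minimal sketch (Python, with made-up instant timestamps — not hudi-cli's actual timeline API):

```python
import bisect

def nearest_commits(timeline, ts, k=2):
    """Return the exact instant if present, else up to k nearest instants."""
    timeline = sorted(timeline)
    if ts in timeline:
        return [ts]
    # Binary search for the insertion point, then rank nearby candidates
    # by numeric distance from the requested timestamp.
    i = bisect.bisect_left(timeline, ts)
    candidates = timeline[max(0, i - k):i + k]
    return sorted(candidates, key=lambda c: abs(int(c) - int(ts)))[:k]

timeline = ["20230214120000", "20230215093000", "20230215180000"]
assert nearest_commits(timeline, "20230215093000") == ["20230215093000"]
assert nearest_commits(timeline, "20230215100000", k=1) == ["20230215093000"]
```

Hooked behind a flag such as the proposed --nearestMatch, this would surface "closest commit activity" instead of an empty result when the exact instant is absent.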





[GitHub] [hudi] pramodbiligiri opened a new pull request, #7966: [HUDI-5806] {W-I-P] hudi-cli option to find nearest matching commits

2023-02-15 Thread via GitHub


pramodbiligiri opened a new pull request, #7966:
URL: https://github.com/apache/hudi/pull/7966

   https://issues.apache.org/jira/browse/HUDI-5806
   
   ### Change Logs
   
   Add a --nearestMatch boolean flag to "commit showfiles --commit 
COMMIT_INSTANT" to display the nearest matching commit if no exact match is 
found.
   
   ### Impact
   TODO
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   TODO
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   TODO
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   TODO
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5806) hudi-cli should have option to show nearest matching commit

2023-02-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5806:
-
Labels: pull-request-available  (was: )

> hudi-cli should have option to show nearest matching commit
> ---
>
> Key: HUDI-5806
> URL: https://issues.apache.org/jira/browse/HUDI-5806
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Pramod Biligiri
>Priority: Major
>  Labels: pull-request-available
>
> When searching for a commit timestamp in the Hudi CLI, there should be an 
> option to display the nearest matching commits if no exact match is found. 
> This will help production support use cases quickly determine the recent 
> commit activity in the period the user is interested in.





[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1431464017

   
   ## CI report:
   
   * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7966: [HUDI-5806] {W-I-P] hudi-cli option to find nearest matching commits

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7966:
URL: https://github.com/apache/hudi/pull/7966#issuecomment-1431477124

   
   ## CI report:
   
   * 99ad822ddf41e3de76e1fd716756ef02396ad804 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7966: [HUDI-5806] {W-I-P] hudi-cli option to find nearest matching commits

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7966:
URL: https://github.com/apache/hudi/pull/7966#issuecomment-1431489630

   
   ## CI report:
   
   * 99ad822ddf41e3de76e1fd716756ef02396ad804 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15209)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] codope opened a new pull request, #7967: [DOCS] Update metadata indexing doc

2023-02-15 Thread via GitHub


codope opened a new pull request, #7967:
URL: https://github.com/apache/hudi/pull/7967

   ### Change Logs
   
   Update metadata indexing docs.
   
   ### Impact
   
   Docs update.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] hudi-bot commented on pull request #7958: [HUDI-5799] Fix Spark partition validation in TestBulkInsertInternalPartitionerForRows

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7958:
URL: https://github.com/apache/hudi/pull/7958#issuecomment-1431561986

   
   ## CI report:
   
   * 36c706d1bb1a8f793ce874c9316aaf829aecd594 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15199)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] codope opened a new pull request, #7968: [DOCS] Update hive metastore sync docs

2023-02-15 Thread via GitHub


codope opened a new pull request, #7968:
URL: https://github.com/apache/hudi/pull/7968

   ### Change Logs
   
   - Added a brief intro about Hive metastore.
   - Removed deprecated config.
   - Added default values and better explanation for rest of the configs.
   
   
   ### Impact
   
   Docs update.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] wqwl611 opened a new issue, #7969: [SUPPORT] data loss in new base file.

2023-02-15 Thread via GitHub


wqwl611 opened a new issue, #7969:
URL: https://github.com/apache/hudi/issues/7969

   **Describe the problem you faced**
   I found some data loss in the new base file 
[-9e95-4471-bba0-5604a282aa34-0_0-12-4_20230208003459996.parquet].
   I suspect the compaction plan may have missed some delta logs.
   How can I check an archived compaction plan?
   https://user-images.githubusercontent.com/67826098/219074761-6150bcf1-89f5-4333-8eea-960105c07f94.png
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version : 3.2.0
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : hdfs
   
   * Running on Docker? (yes/no) :no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   





[GitHub] [hudi] jonvex commented on issue #7902: [SUPPORT].UnresolvedUnionException: Not in union exception occurred when writing data through spark

2023-02-15 Thread via GitHub


jonvex commented on issue #7902:
URL: https://github.com/apache/hudi/issues/7902#issuecomment-1431612529

   If you take a look at the code for 
[UnresolvedUnionException.java](https://github.com/apache/avro/blob/f23eabb42f315b0db9135b075434b8a88680659c/lang/java/avro/src/main/java/org/apache/avro/UnresolvedUnionException.java),
 the final item in the message is the 'unresolvedDatum'. In the exception you 
provided, that appears to be KEEP_LATEST_FILE_VERSIONS.
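For context, UnresolvedUnionException is thrown when a datum matches no branch of an Avro union schema; a rough illustration (a Python stand-in, not the avro library — the union branches are hypothetical):

```python
# An Avro union resolves a datum by finding the first branch whose type
# accepts it; if none does, UnresolvedUnionException carries the datum,
# which is why the offending value appears at the end of the message.
UNION_BRANCHES = [("null", type(None)), ("long", int), ("string", str)]

class UnresolvedUnionException(Exception):
    def __init__(self, datum):
        super().__init__(
            f"Not in union {[n for n, _ in UNION_BRANCHES]}: {datum!r}")
        self.unresolved_datum = datum  # the trailing item in the Java message

def resolve_union(datum):
    """Return the name of the first union branch that accepts the datum."""
    for name, py_type in UNION_BRANCHES:
        if isinstance(datum, py_type):
            return name
    raise UnresolvedUnionException(datum)

assert resolve_union(42) == "long"
try:
    resolve_union(b"KEEP_LATEST_FILE_VERSIONS")  # bytes match no branch here
except UnresolvedUnionException as e:
    print(e.unresolved_datum)
```

So the fix is usually on the writer side: make the value (or its schema) actually conform to one of the declared union branches.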





[GitHub] [hudi] jonvex commented on issue #7717: [SUPPORT] org.apache.avro.SchemaParseException: Can't redefine: array When there are Top level variables , Struct and Array[struct] (no complex dataty

2023-02-15 Thread via GitHub


jonvex commented on issue #7717:
URL: https://github.com/apache/hudi/issues/7717#issuecomment-1431620989

   Yes. It is not exactly the same issue. What I meant is I think the root 
cause is the same, and it can be solved by upgrading parquet-avro.





[GitHub] [hudi] hudi-bot commented on pull request #7959: [HUDI-5800] Fix test failure in TestHoodieMergeOnReadTable

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7959:
URL: https://github.com/apache/hudi/pull/7959#issuecomment-1431651570

   
   ## CI report:
   
   * 480d3a4b17476126e248eddea06713024fae0f2b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15200)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7962: [HUDI-5801] Speed metaTable initializeFileGroups

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7962:
URL: https://github.com/apache/hudi/pull/7962#issuecomment-1431664630

   
   ## CI report:
   
   * bd715641ef0532c50771d1ae02fdeb5f39e6a52c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15202)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7752: [MINOR] De-duplicating Iterator implementations

2023-02-15 Thread via GitHub


alexeykudinkin commented on code in PR #7752:
URL: https://github.com/apache/hudi/pull/7752#discussion_r1107382524


##
hudi-common/src/main/java/org/apache/hudi/common/util/collection/CloseableMappingIterator.java:
##
@@ -22,8 +22,8 @@
 
 import java.util.function.Function;
 
-// TODO java-doc
-public class CloseableMappingIterator extends MappingIterator 
implements ClosableIterator {
+public class CloseableMappingIterator extends MappingIterator

Review Comment:
   Not sure I understand what warnings you're referring to.






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7672: [HUDI-5557]Avoid converting columns that are not indexed in CSI

2023-02-15 Thread via GitHub


alexeykudinkin commented on code in PR #7672:
URL: https://github.com/apache/hudi/pull/7672#discussion_r1107399443


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##
@@ -209,11 +209,11 @@ class ColumnStatsIndexSupport(spark: SparkSession,
 // NOTE: We're sorting the columns to make sure final index schema matches 
layout
 //   of the transposed table
 val sortedTargetColumnsSet = TreeSet(queryColumns:_*)
-val sortedTargetColumns = sortedTargetColumnsSet.toSeq
 
 // NOTE: This is a trick to avoid pulling all of [[ColumnStatsIndexSupport]] object into the lambdas'
 //   closures below
 val indexedColumns = this.indexedColumns
+val indexedTargetColumns = sortedTargetColumnsSet.filter(indexedColumns.contains(_)).toSeq

Review Comment:
   Let's de-duplicate filtering and tie it up w/ index schema composition:
   
- Let's make `composeIndexSchema` return (schema, targetIndexedColumns)
- Let's move schema composition up here 
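The refactoring suggested in this review — have `composeIndexSchema` return both the schema and the filtered target columns, so callers never repeat the filtering — could look roughly like the sketch below. The actual `ColumnStatsIndexSupport` code is Scala; this is a Java illustration, and the type and field names here are hypothetical:

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IndexSchemaSketch {

    // Hypothetical pair type: the composed stats-schema fields together with
    // the target columns that are actually indexed, returned in one shot so
    // callers never repeat the filtering.
    record SchemaWithColumns(List<String> schemaFields, List<String> targetIndexedColumns) {}

    static SchemaWithColumns composeIndexSchema(List<String> queryColumns,
                                                List<String> indexedColumns) {
        // Sort the requested columns so the schema matches the layout of the
        // transposed table, then keep only columns present in the index.
        List<String> targets = new TreeSet<>(queryColumns).stream()
                .filter(indexedColumns::contains)
                .collect(Collectors.toList());
        // Illustrative stat fields per indexed column (min/max only here).
        List<String> fields = targets.stream()
                .flatMap(c -> Stream.of(c + "_minValue", c + "_maxValue"))
                .collect(Collectors.toList());
        return new SchemaWithColumns(fields, targets);
    }

    public static void main(String[] args) {
        SchemaWithColumns result =
                composeIndexSchema(List.of("c", "a", "b"), List.of("a", "c"));
        System.out.println(result.targetIndexedColumns()); // prints [a, c]
    }
}
```

With this shape, the `filter(indexedColumns.contains(_))` step lives in exactly one place and the caller simply consumes the returned pair.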






[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7914: [HUDI-5080] Unpersist only relevant RDDs instead of all

2023-02-15 Thread via GitHub


alexeykudinkin commented on code in PR #7914:
URL: https://github.com/apache/hudi/pull/7914#discussion_r1107408876


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java:
##
@@ -246,6 +246,7 @@ protected HoodieWriteMetadata> executeClustering(HoodieC
 .performClustering(clusteringPlan, schema, instantTime);
 HoodieData writeStatusList = writeMetadata.getWriteStatuses();
 HoodieData statuses = updateIndex(writeStatusList, writeMetadata);
+context.putCachedDataIds(config.getBasePath(), instantTime, statuses.getId());

Review Comment:
   HoodieData is already tightly coupled (1:1) with HoodieEngineContext so 
there's nothing shady about HD API accepting HEC.
   
   Current approach doesn't really make sense as it's extremely brittle -- we 
can't expect that someone will be aware of needing to register the RDD whenever 
they persist.
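The brittleness called out above — callers must remember to register every persisted RDD — can be avoided by letting `persist` do the bookkeeping itself. The class and method names below are purely hypothetical, sketching the shape rather than Hudi's actual API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical engine-context shape: persist() registers the cached data id
// itself, so call sites cannot forget the bookkeeping, and unpersisting
// touches only the ids cached for one (basePath, instantTime) write.
class EngineContextSketch {

    private final Map<String, Set<Integer>> cachedIdsByKey = new HashMap<>();

    void persist(String basePath, String instantTime, int dataId) {
        // Registration is a side effect of persisting, not a separate call.
        cachedIdsByKey
                .computeIfAbsent(basePath + "|" + instantTime, k -> new HashSet<>())
                .add(dataId);
    }

    Set<Integer> unpersistRelevant(String basePath, String instantTime) {
        // Drop and return only this write's ids; RDDs persisted by other
        // concurrent writers stay untouched.
        Set<Integer> ids = cachedIdsByKey.remove(basePath + "|" + instantTime);
        return ids == null ? Set.of() : ids;
    }

    public static void main(String[] args) {
        EngineContextSketch ctx = new EngineContextSketch();
        ctx.persist("/tmp/tbl", "001", 7);
        ctx.persist("/tmp/tbl", "002", 9);
        System.out.println(ctx.unpersistRelevant("/tmp/tbl", "001")); // prints [7]
    }
}
```

Under this shape the explicit `putCachedDataIds` call at the write path disappears; persisting the statuses would register the id as a side effect.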
   






[GitHub] [hudi] alexeykudinkin commented on pull request #7678: [HUDI-5562] Add maven wrapper

2023-02-15 Thread via GitHub


alexeykudinkin commented on PR #7678:
URL: https://github.com/apache/hudi/pull/7678#issuecomment-1431702493

   CI is green:
   
   https://user-images.githubusercontent.com/428277/219101720-11800f78-d9a9-4558-b5e4-382875a55f13.png
   
   
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=15092&view=results
   





[GitHub] [hudi] alexeykudinkin commented on pull request #7678: [HUDI-5562] Add maven wrapper

2023-02-15 Thread via GitHub


alexeykudinkin commented on PR #7678:
URL: https://github.com/apache/hudi/pull/7678#issuecomment-1431704512

   @wuzhenhua01 let's also update the docs to reflect that now `mvnw` should be 
invoked when building Hudi.
   
   Let's also update the CI scripts (both Github and Azure)





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI

2023-02-15 Thread via GitHub


alexeykudinkin commented on code in PR #7885:
URL: https://github.com/apache/hudi/pull/7885#discussion_r1107417607


##
hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java:
##
@@ -19,22 +19,39 @@
 
 package org.apache.hudi.common.util;
 
+import com.fasterxml.jackson.databind.SerializationFeature;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import com.fasterxml.jackson.databind.util.StdDateFormat;
 import org.apache.hudi.exception.HoodieIOException;
 
 import com.fasterxml.jackson.annotation.JsonAutoDetect;
 import com.fasterxml.jackson.annotation.PropertyAccessor;
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.fasterxml.jackson.databind.DeserializationFeature;
 import com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;

Review Comment:
   We're using log4j internally






[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1431755767

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * c464e6ae5497b67c2fe3a456cff434114b1f297b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15022)
 
   * 4fd80c1f9dee94d53d213069c1ede42b1571858d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7885:
URL: https://github.com/apache/hudi/pull/7885#issuecomment-1431755837

   
   ## CI report:
   
   * 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN
   * 02e61304e85a8eb02e30c12e33b044529caac064 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15178)
 
   * b909e31094ec9bb91695ab0e34a9c55f2162c192 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7362:
URL: https://github.com/apache/hudi/pull/7362#issuecomment-1431764617

   
   ## CI report:
   
   * c758e27d4d99c5e88b1ab7fe77fb89131aebce4d Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15203)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7885:
URL: https://github.com/apache/hudi/pull/7885#issuecomment-1431765840

   
   ## CI report:
   
   * 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN
   * 02e61304e85a8eb02e30c12e33b044529caac064 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15178)
 
   * b909e31094ec9bb91695ab0e34a9c55f2162c192 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15210)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Created] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values

2023-02-15 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-5807:
-

 Summary: HoodieSparkParquetReader is not appending partition-path 
values
 Key: HUDI-5807
 URL: https://issues.apache.org/jira/browse/HUDI-5807
 Project: Apache Hudi
  Issue Type: Bug
  Components: spark
Reporter: Alexey Kudinkin
 Fix For: 0.13.1


The current implementation of HoodieSparkParquetReader doesn't support the case 
when "hoodie.datasource.write.drop.partition.columns" is set to true.

In that case, partition-path values are expected to be parsed from the 
partition path and injected w/in the File Reader (this is the behavior of 
Spark's own readers).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values

2023-02-15 Thread Alexey Kudinkin (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-5807:
--
Affects Version/s: 0.13.0

> HoodieSparkParquetReader is not appending partition-path values
> ---
>
> Key: HUDI-5807
> URL: https://issues.apache.org/jira/browse/HUDI-5807
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.13.0
>Reporter: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.13.1
>
>
> Current implementation of HoodieSparkParquetReader isn't supporting the case 
> when "hoodie.datasource.write.drop.partition.columns" is set to true.
> In that case partition-path values are expected to be parsed from 
> partition-path and be injected w/in the File Reader (this is behavior of 
> Spark's own readers)





[jira] [Commented] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values

2023-02-15 Thread Alexey Kudinkin (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689276#comment-17689276
 ] 

Alexey Kudinkin commented on HUDI-5807:
---

We should do this by rebasing HoodieSparkFileReader onto ParquetFileFormat (to 
make sure we're creating readers same way as we do w/ Spark itself)
{code:java}
val parquetFileFormat = SparkAdapterSupport$.MODULE$.sparkAdapter()
  // TODO this should be based on the table config
  .createHoodieParquetFileFormat(true)
  .get();
{code}

> HoodieSparkParquetReader is not appending partition-path values
> ---
>
> Key: HUDI-5807
> URL: https://issues.apache.org/jira/browse/HUDI-5807
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 0.13.0
>Reporter: Alexey Kudinkin
>Priority: Blocker
> Fix For: 0.13.1
>
>
> Current implementation of HoodieSparkParquetReader isn't supporting the case 
> when "hoodie.datasource.write.drop.partition.columns" is set to true.
> In that case partition-path values are expected to be parsed from 
> partition-path and be injected w/in the File Reader (this is behavior of 
> Spark's own readers)





[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7881:
URL: https://github.com/apache/hudi/pull/7881#issuecomment-1431777039

   
   ## CI report:
   
   * c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
   * c464e6ae5497b67c2fe3a456cff434114b1f297b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15022)
 
   * 4fd80c1f9dee94d53d213069c1ede42b1571858d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15211)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] bhasudha opened a new pull request, #7970: [DOCS] Change Config generator to generate subgroups

2023-02-15 Thread via GitHub


bhasudha opened a new pull request, #7970:
URL: https://github.com/apache/hudi/pull/7970

   Also make changes so required configs bubble up to the top of the section.
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] bhasudha commented on pull request #7970: [DOCS] Change Config generator to generate subgroups

2023-02-15 Thread via GitHub


bhasudha commented on PR #7970:
URL: https://github.com/apache/hudi/pull/7970#issuecomment-1431796407

   
![config_generator_local_testing_screenshot](https://user-images.githubusercontent.com/2179254/219115743-ab7fdd3f-dea3-441f-8bdc-bf4113bc61d8.png)
   





[GitHub] [hudi] hudi-bot commented on pull request #7961: [HUDI-5802] Allow configuration for deletes in DefaultHoodieRecordPayload

2023-02-15 Thread via GitHub


hudi-bot commented on PR #7961:
URL: https://github.com/apache/hudi/pull/7961#issuecomment-1431832295

   
   ## CI report:
   
   * c8ac28edb845302c9f3afbc980b03782c0605564 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15204)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-5477) Optimize timeline loading in Hudi sync client

2023-02-15 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5477:
-
Fix Version/s: 0.12.3

> Optimize timeline loading in Hudi sync client
> -
>
> Key: HUDI-5477
> URL: https://issues.apache.org/jira/browse/HUDI-5477
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: archiving, meta-sync
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.3
>
>
> The Hudi archived timeline is always loaded during the metastore sync process 
> if the last sync time is given. Besides, the archived timeline is not cached 
> inside the meta client if the start instant time is given. These cause 
> performance issues and read timeout on cloud storage due to rate limiting on 
> requests because of loading archived timeline from the storage, when the 
> archived timeline is huge, e.g., hundreds of log files in 
> {{.hoodie/archived}} folder.
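The fix described in HUDI-5477 amounts to caching the loaded archived timeline inside the meta client, keyed by the requested start instant, so repeated sync rounds don't rescan storage. A minimal memoization sketch — class and method names are hypothetical, not the actual `HoodieTableMetaClient` API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical meta-client fragment: memoize the archived timeline per start
// instant so repeated sync rounds do not rescan .hoodie/archived on storage.
class ArchivedTimelineCacheSketch {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int loads = 0; // counts the expensive storage scans

    String getArchivedTimeline(String startInstant, Function<String, String> loader) {
        // The loader stands in for reading the archived log files; it runs
        // only on a cache miss for this start instant.
        return cache.computeIfAbsent(startInstant, key -> {
            loads++;
            return loader.apply(key);
        });
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ArchivedTimelineCacheSketch metaClient = new ArchivedTimelineCacheSketch();
        metaClient.getArchivedTimeline("0001", s -> "timeline-from-" + s);
        metaClient.getArchivedTimeline("0001", s -> "timeline-from-" + s);
        System.out.println(metaClient.loadCount()); // prints 1
    }
}
```

With hundreds of log files under `.hoodie/archived`, collapsing repeated loads into one scan per start instant is what avoids the rate-limit-induced read timeouts described above.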





[GitHub] [hudi] yihua commented on issue #7969: [SUPPORT] data loss in new base file.

2023-02-15 Thread via GitHub


yihua commented on issue #7969:
URL: https://github.com/apache/hudi/issues/7969#issuecomment-1431873843

   Hi @wqwl611 thanks for raising this.  Could you clarify what kind of data 
loss you observe (missing records, updates not applied, missing columns, 
etc.)?  Also, were there any failures or other table services running before the 
compaction happened, based on the Hudi timeline?
   
   To inspect the compaction plan, you may use [Hudi CLI compaction 
commands](https://hudi.apache.org/docs/cli#compactions) or directly check the 
requested compaction instant under `.hoodie/`(`avrocat 
.compaction.requested`).  If the compaction commit is archived, 
you may only look at the archived file for now to understand the compaction 
plan.





[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8

2023-02-15 Thread via GitHub


kazdy commented on PR #7922:
URL: https://github.com/apache/hudi/pull/7922#issuecomment-1431876777

   @hudi-bot run azure





[GitHub] [hudi] yihua commented on issue #7960: [SUPPORT]

2023-02-15 Thread via GitHub


yihua commented on issue #7960:
URL: https://github.com/apache/hudi/issues/7960#issuecomment-1431890161

   Hi @clp007 thanks for the question.  Based on the stacktrace, this issue is 
not related to BigQuery sync.  Some arguments are missing for Deltastreamer, 
e.g., `--schemaprovider-class` (which causes `HoodieException: Please provide a 
valid schema provider class!`), `--source-class`.  Please check 
[this](https://hudi.apache.org/docs/hoodie_deltastreamer) for Deltastreamer 
configs.





[GitHub] [hudi] jonvex opened a new pull request, #7971: add support for kafka offsets

2023-02-15 Thread via GitHub


jonvex opened a new pull request, #7971:
URL: https://github.com/apache/hudi/pull/7971

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[GitHub] [hudi] yihua commented on issue #7953: [SUPPORT]Code pending on writing data to S3 using Flink datastream API,and the target path is empty.

2023-02-15 Thread via GitHub


yihua commented on issue #7953:
URL: https://github.com/apache/hudi/issues/7953#issuecomment-1431893619

   @danny0405 could you help here on the Hudi Flink setup?





[jira] [Created] (HUDI-5808) Add support for kafka offsets in various sources

2023-02-15 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-5808:
-

 Summary: Add support for kafka offsets in various sources
 Key: HUDI-5808
 URL: https://issues.apache.org/jira/browse/HUDI-5808
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler


Add support for kafka offsets in AvroKafkaSource and JsonKafkaSource





[GitHub] [hudi] yihua commented on issue #7909: Failed to create Marker file

2023-02-15 Thread via GitHub


yihua commented on issue #7909:
URL: https://github.com/apache/hudi/issues/7909#issuecomment-1431901388

   @koochiswathiTR this is likely caused by a concurrency bug handling marker 
creation requests at the timeline server, which is fixed by #6383, since 0.12.1 
release.  Are you able to try the new release?
   
   If the job remains on 0.11.1 release, you may set 
`hoodie.write.markers.type=DIRECT` to get unblocked.
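For a job pinned to 0.11.1, the workaround is a single writer config override. The config key `hoodie.write.markers.type` is Hudi's; the snippet around it is just an illustrative sketch of supplying it as a write option:

```java
import java.util.Properties;

public class MarkerConfigSketch {
    public static void main(String[] args) {
        // Writer options for a job stuck on 0.11.1: direct markers bypass the
        // timeline server's marker endpoint, sidestepping the concurrent
        // marker-creation bug fixed in 0.12.1.
        Properties writeOpts = new Properties();
        writeOpts.setProperty("hoodie.write.markers.type", "DIRECT");
        System.out.println(writeOpts.getProperty("hoodie.write.markers.type")); // prints DIRECT
    }
}
```

Direct markers create marker files on storage per data file, so expect more storage requests than timeline-server-based markers; it trades throughput for correctness on the affected release.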





[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8

2023-02-15 Thread via GitHub


kazdy commented on PR #7922:
URL: https://github.com/apache/hudi/pull/7922#issuecomment-1431906206

   @hudi-bot run azure





[jira] [Updated] (HUDI-5809) Keep RFC-56 early conflict detection update to date

2023-02-15 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5809:

Story Points: 0.5

> Keep RFC-56 early conflict detection update to date
> ---
>
> Key: HUDI-5809
> URL: https://issues.apache.org/jira/browse/HUDI-5809
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.13.1
>
>






[jira] [Updated] (HUDI-5809) Keep RFC-56 early conflict detection update to date

2023-02-15 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5809:

Fix Version/s: 0.13.1

> Keep RFC-56 early conflict detection update to date
> ---
>
> Key: HUDI-5809
> URL: https://issues.apache.org/jira/browse/HUDI-5809
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.13.1
>
>






[jira] [Created] (HUDI-5809) Keep RFC-56 early conflict detection update to date

2023-02-15 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5809:
---

 Summary: Keep RFC-56 early conflict detection update to date
 Key: HUDI-5809
 URL: https://issues.apache.org/jira/browse/HUDI-5809
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Ethan Guo







