[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-07-03 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1251488891


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java:
##
@@ -227,9 +231,21 @@ public static HoodieWriteResult 
doWriteOperation(SparkRDDWriteClient client, Jav
 }
   }
 
-  public static HoodieWriteResult doDeleteOperation(SparkRDDWriteClient 
client, JavaRDD hoodieKeys,
-  String instantTime) {
-return new HoodieWriteResult(client.delete(hoodieKeys, instantTime));
+  public static HoodieWriteResult doDeleteOperation(SparkRDDWriteClient 
client, JavaRDD>> 
hoodieKeysAndLocations,
+  String instantTime, boolean isPrepped) {
+
+if (isPrepped) {
+  JavaRDD records = hoodieKeysAndLocations.map(tuple -> {
+HoodieRecord record = 
client.getConfig().getRecordMerger().getRecordType() == 
HoodieRecord.HoodieRecordType.AVRO

Review Comment:
   Fixed. Moved client outside of the `map` function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-29 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1247016425


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java:
##
@@ -41,13 +42,30 @@ public Option> 
merge(HoodieRecord older, Schema oldSc
 ValidationUtils.checkArgument(older.getRecordType() == 
HoodieRecordType.SPARK);
 ValidationUtils.checkArgument(newer.getRecordType() == 
HoodieRecordType.SPARK);
 
-if (newer.getData() == null) {
-  // Delete record
-  return Option.empty();
+if (newer instanceof HoodieSparkRecord) {
+  HoodieSparkRecord newSparkRecord = (HoodieSparkRecord) newer;
+  if (newSparkRecord.isDeleted()) {
+// Delete record
+return Option.empty();
+  }
+} else {
+  if (newer.getData() == null) {

Review Comment:
   Test case failures occur in `TestMORDataSource` (`testPayloadDelete` for 
example) where test cases fail with following exception:
   
   ```
   1284819 [Executor task launch worker for task 2.0 in stage 107.0 (TID 136)] 
ERROR org.apache.spark.executor.Executor [] - Exception in task 2.0 in stage 
107.0 (TID 136)
   java.lang.ClassCastException: org.apache.hudi.common.model.HoodieEmptyRecord 
cannot be cast to org.apache.hudi.common.model.HoodieSparkRecord
at 
org.apache.hudi.HoodieSparkRecordMerger.merge(HoodieSparkRecordMerger.java:45) 
~[classes/:?]
at org.apache.hudi.RecordMergingFileIterator.merge(Iterators.scala:241) 
~[classes/:?]
at 
org.apache.hudi.RecordMergingFileIterator.hasNextInternal(Iterators.scala:218) 
~[classes/:?]
at 
org.apache.hudi.RecordMergingFileIterator.doHasNext(Iterators.scala:203) 
~[classes/:?]
at 
org.apache.hudi.util.CachingIterator.hasNext(CachingIterator.scala:36) 
~[classes/:?]
at 
org.apache.hudi.util.CachingIterator.hasNext$(CachingIterator.scala:36) 
~[classes/:?]
at org.apache.hudi.LogFileIterator.hasNext(Iterators.scala:61) 
~[classes/:?]
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown
 Source) ~[?:?]
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source) ~[?:?]
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
 ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) 
~[scala-library-2.12.15.jar:?]
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
 ~[spark-core_2.12-3.3.1.jar:3.3.1]
at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
 ~[spark-core_2.12-3.3.1.jar:3.3.1]
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) 
~[spark-core_2.12-3.3.1.jar:3.3.1]
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) 
~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.scheduler.Task.run(Task.scala:136) 
~[spark-core_2.12-3.3.1.jar:3.3.1]
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
 ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) 
~[spark-core_2.12-3.3.1.jar:3.3.1]
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) 
~[spark-core_2.12-3.3.1.jar:3.3.1]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_372]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_372]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]```

   The HoodieEmptyRecord that is leading to `ClassCastException` is being 
created in `HoodieMergedLogRecordScanner.java` Line 295
   ```// Put the DELETE record
   if (recordType == HoodieRecordType.AVRO) {
 records.put(key, SpillableMapUtils.generateEmptyPayload(key,
 deleteRecord.getPartitionPath(), deleteRecord.getOrderingValue(), 
getPayloadClassFQN()));
   } else {
 HoodieEmptyRecord record = new HoodieEmptyRecord<>(new HoodieKey(key, 
deleteRecord.getPartitionPath()), null, deleteRecord.getOrderingValue(), 
recordType);
 records.put(key, record);
   }
   ```
   
   Based on offline discussion, we decided to continue with `instanceof` check 
before casting to `HoodieSparkRecord`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this 

[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-28 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1245840481


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java:
##
@@ -41,13 +42,30 @@ public Option> 
merge(HoodieRecord older, Schema oldSc
 ValidationUtils.checkArgument(older.getRecordType() == 
HoodieRecordType.SPARK);
 ValidationUtils.checkArgument(newer.getRecordType() == 
HoodieRecordType.SPARK);
 
-if (newer.getData() == null) {
-  // Delete record
-  return Option.empty();
+if (newer instanceof HoodieSparkRecord) {
+  HoodieSparkRecord newSparkRecord = (HoodieSparkRecord) newer;
+  if (newSparkRecord.isDeleted()) {
+// Delete record
+return Option.empty();
+  }
+} else {
+  if (newer.getData() == null) {

Review Comment:
   Not able to see this failure in the usual test cases (TestDelete, 
TestUpdate, TestDeleteFrom) that I am running locally for validating 
functionality. The failure might have been in one of the test cases that are 
run through github. I will revert the change to what I had earlier and see if 
build passes through github.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-28 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1245827119


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkDeletePreppedCommitActionExecutor.java:
##
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.commit;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.io.HoodieWriteHandle;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+
+import java.util.List;
+
+/**
+ * Flink upsert prepped commit action executor.
+ */
+public class FlinkDeletePreppedCommitActionExecutor extends 
BaseFlinkCommitActionExecutor {
+
+  private final List> preppedRecords;
+
+  public FlinkDeletePreppedCommitActionExecutor(HoodieEngineContext context,

Review Comment:
   Created ticket (https://issues.apache.org/jira/browse/HUDI-6455).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-23 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1240446113


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -254,11 +254,26 @@ object HoodieSparkSqlWriter {
 
   val (writeResult, writeClient: SparkRDDWriteClient[_]) =
 operation match {
-  case WriteOperationType.DELETE =>
+  case WriteOperationType.DELETE | WriteOperationType.DELETE_PREPPED =>
 val genericRecords = HoodieSparkUtils.createRdd(df, 
avroRecordName, avroRecordNamespace)
 // Convert to RDD[HoodieKey]
-val keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(new 
TypedProperties(hoodieConfig.getProps))
-val hoodieKeysToDelete = genericRecords.map(gr => 
keyGenerator.getKey(gr)).toJavaRDD()
+val isPrepped = 
hoodieConfig.getBooleanOrDefault(DATASOURCE_WRITE_PREPPED_KEY, false)
+val keyGenerator: Option[BaseKeyGenerator] = if (isPrepped) {
+  None
+} else {
+  Some(HoodieSparkKeyGeneratorFactory.createKeyGenerator(new 
TypedProperties(hoodieConfig.getProps))
+.asInstanceOf[BaseKeyGenerator])
+}
+
+var validatePreppedRecord = true

Review Comment:
   Replaced map with mapPartitions to avoid multiple calls to validate function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1239066715


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java:
##
@@ -153,6 +154,27 @@ public HoodieWriteMetadata> delete(
 return new FlinkDeleteCommitActionExecutor<>(context, writeHandle, config, 
this, instantTime, keys).execute();
   }
 
+  /**
+   * Delete the given prepared records from the Hoodie table, at the supplied 
instantTime.
+   *
+   * This implementation requires that the input records are already 
tagged, and de-duped if needed.
+   *
+   * Specifies the write handle explicitly in order to have fine-grained 
control with
+   * the underneath file.
+   *
+   * @param context {@link HoodieEngineContext}
+   * @param instantTime Instant Time for the action
+   * @param preppedRecords Hoodie records to delete
+   * @return {@link HoodieWriteMetadata}
+   */
+  public HoodieWriteMetadata> deletePrepped(

Review Comment:
   This also applies to other operations such as upsert. One set of methods are 
called from `HoodieFlinkWriteClient` by casting the object to 
`HoodieFlinkTable`. The other set of methods are derived from `HoodieTable`. I 
added this implementation based on existing `upsert`, `upsertPrepped`, and 
`delete` methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1239066715


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java:
##
@@ -153,6 +154,27 @@ public HoodieWriteMetadata> delete(
 return new FlinkDeleteCommitActionExecutor<>(context, writeHandle, config, 
this, instantTime, keys).execute();
   }
 
+  /**
+   * Delete the given prepared records from the Hoodie table, at the supplied 
instantTime.
+   *
+   * This implementation requires that the input records are already 
tagged, and de-duped if needed.
+   *
+   * Specifies the write handle explicitly in order to have fine-grained 
control with
+   * the underneath file.
+   *
+   * @param context {@link HoodieEngineContext}
+   * @param instantTime Instant Time for the action
+   * @param preppedRecords Hoodie records to delete
+   * @return {@link HoodieWriteMetadata}
+   */
+  public HoodieWriteMetadata> deletePrepped(

Review Comment:
   This also applies to other operations such as upsert. One set of methods are 
called from `HoodieFlinkWriteClient` by casting the object to 
`HoodieFlinkTable`. The other set of methods are derived from `HoodieTable`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1239018220


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkMergeOnReadTable.java:
##
@@ -105,6 +106,11 @@ public HoodieWriteMetadata> 
delete(HoodieEngineContext c
 return new 
SparkDeleteDeltaCommitActionExecutor<>((HoodieSparkEngineContext) context, 
config, this, instantTime, keys).execute();
   }
 
+  @Override
+  public HoodieWriteMetadata> 
deletePrepped(HoodieEngineContext context, String instantTime, 
HoodieData> preppedRecords) {

Review Comment:
   Added back `deletePrepped` and `SparkDeletePreppedCommitActionExecutor` as 
`TestSqlStatement` started failing for MOR table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1239017645


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/SparkDeletePreppedDeltaCommitActionExecutor.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.deltacommit;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+
+public class SparkDeletePreppedDeltaCommitActionExecutor
+extends BaseSparkDeltaCommitActionExecutor {
+
+  private final HoodieData> preppedRecords;
+
+  public SparkDeletePreppedDeltaCommitActionExecutor(HoodieSparkEngineContext 
context,

Review Comment:
   Added back `SparkDeletePreppedCommitActionExecutor` as `TestSqlStatement` 
started failing for MOR table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1238060345


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##
@@ -254,11 +254,26 @@ object HoodieSparkSqlWriter {
 
   val (writeResult, writeClient: SparkRDDWriteClient[_]) =
 operation match {
-  case WriteOperationType.DELETE =>
+  case WriteOperationType.DELETE | WriteOperationType.DELETE_PREPPED =>
 val genericRecords = HoodieSparkUtils.createRdd(df, 
avroRecordName, avroRecordNamespace)
 // Convert to RDD[HoodieKey]
-val keyGenerator = 
HoodieSparkKeyGeneratorFactory.createKeyGenerator(new 
TypedProperties(hoodieConfig.getProps))
-val hoodieKeysToDelete = genericRecords.map(gr => 
keyGenerator.getKey(gr)).toJavaRDD()
+val isPrepped = 
hoodieConfig.getBooleanOrDefault(DATASOURCE_WRITE_PREPPED_KEY, false)
+val keyGenerator: Option[BaseKeyGenerator] = if (isPrepped) {
+  None
+} else {
+  Some(HoodieSparkKeyGeneratorFactory.createKeyGenerator(new 
TypedProperties(hoodieConfig.getProps))
+.asInstanceOf[BaseKeyGenerator])
+}
+
+var validatePreppedRecord = true

Review Comment:
   As far as I can tell, an `AtomicBoolean` would probably end up serializing 
parallel execution here. With regular boolean, validation may occur more than 
once but it probably won't occur for every record. If there is no boolean then 
every record will get validated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1238043064


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieCreateRecordUtils.scala:
##
@@ -194,7 +192,7 @@ object HoodieCreateRecordUtils {
 true
   }
 
-  private def validateMetaFieldsInAvroRecords(avroRec: GenericRecord): Unit = {
+  def validateMetaFieldsInAvroRecords(avroRec: GenericRecord): Unit = {

Review Comment:
   For DELETE, this function is being called by `HoodieSparkSqlWriter.scala`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1238042006


##
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieSparkRecordMerger.java:
##
@@ -41,13 +42,30 @@ public Option> 
merge(HoodieRecord older, Schema oldSc
 ValidationUtils.checkArgument(older.getRecordType() == 
HoodieRecordType.SPARK);
 ValidationUtils.checkArgument(newer.getRecordType() == 
HoodieRecordType.SPARK);
 
-if (newer.getData() == null) {
-  // Delete record
-  return Option.empty();
+if (newer instanceof HoodieSparkRecord) {
+  HoodieSparkRecord newSparkRecord = (HoodieSparkRecord) newer;
+  if (newSparkRecord.isDeleted()) {
+// Delete record
+return Option.empty();
+  }
+} else {
+  if (newer.getData() == null) {

Review Comment:
   There was a test case failing because the record here wasn't 
`HoodieSparkRecord`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1238039779


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkMergeOnReadTable.java:
##
@@ -105,6 +106,11 @@ public HoodieWriteMetadata> 
delete(HoodieEngineContext c
 return new 
SparkDeleteDeltaCommitActionExecutor<>((HoodieSparkEngineContext) context, 
config, this, instantTime, keys).execute();
   }
 
+  @Override
+  public HoodieWriteMetadata> 
deletePrepped(HoodieEngineContext context, String instantTime, 
HoodieData> preppedRecords) {

Review Comment:
   Removed `deletePrepped` method here. Also removed 
`SparkDeletePreppedCommitActionExecutor`.



##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/SparkDeletePreppedDeltaCommitActionExecutor.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.deltacommit;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.client.common.HoodieSparkEngineContext;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.HoodieTable;
+import org.apache.hudi.table.action.HoodieWriteMetadata;
+
+public class SparkDeletePreppedDeltaCommitActionExecutor
+extends BaseSparkDeltaCommitActionExecutor {
+
+  private final HoodieData> preppedRecords;
+
+  public SparkDeletePreppedDeltaCommitActionExecutor(HoodieSparkEngineContext 
context,

Review Comment:
   Removed `SparkDeletePreppedCommitActionExecutor`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1238035397


##
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/DeleteHoodieTableCommand.scala:
##
@@ -38,15 +39,14 @@ case class DeleteHoodieTableCommand(dft: DeleteFromTable) 
extends HoodieLeafRunn
 logInfo(s"Executing 'DELETE FROM' command for $tableId")
 
 val condition = sparkAdapter.extractDeleteCondition(dft)
-
-val targetLogicalPlan = stripMetaFieldAttributes(dft.table)
 val filteredPlan = if (condition != null) {
-  Filter(condition, targetLogicalPlan)
+  Filter(condition, dft.table)
 } else {
-  targetLogicalPlan
+  dft.table
 }
 
-val config = buildHoodieDeleteTableConfig(catalogTable, sparkSession)
+var config = buildHoodieDeleteTableConfig(catalogTable, sparkSession)

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] amrishlal commented on a diff in pull request #8978: [HUDI-6315] Optimize DELETE codepath to use meta fields instead of key generation and index lookup

2023-06-22 Thread via GitHub


amrishlal commented on code in PR #8978:
URL: https://github.com/apache/hudi/pull/8978#discussion_r1238036525


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/common/model/HoodieSparkRecord.java:
##
@@ -83,6 +83,7 @@ public class HoodieSparkRecord extends 
HoodieRecord {
*/
   private final transient StructType schema;
 
+  private boolean isDeleted;

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org