yihua commented on code in PR #9083:
URL: https://github.com/apache/hudi/pull/9083#discussion_r1251283969
########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieInternalProxyIndex.java: ##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.index;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.table.HoodieTable;
+
+public class HoodieInternalProxyIndex extends HoodieIndex<Object, Object> {
+
+  /**
+   * Index that does not do tagging. It is used by the Spark SQL MERGE INTO command,
+   * which does not need an index lookup because the record location is obtained
+   * from the meta columns produced by the join.
+   */
+  public HoodieInternalProxyIndex(HoodieWriteConfig config) {
+    super(config);
+  }
+
+  @Override
+  public <R> HoodieData<HoodieRecord<R>> tagLocation(HoodieData<HoodieRecord<R>> records, HoodieEngineContext context, HoodieTable hoodieTable) throws HoodieIndexException {
+    return records;
+  }
+
+  @Override
+  public HoodieData<WriteStatus> updateLocation(HoodieData<WriteStatus> writeStatuses, HoodieEngineContext context, HoodieTable hoodieTable) throws HoodieIndexException {
+    return writeStatuses;
+  }
+
+  @Override
+  public boolean rollbackCommit(String instantTime) {
+    return false;
+  }
+
+  @Override
+  public boolean isGlobal() {
+    return false;

Review Comment:
   If the global version is to be implemented, does the user simply need to set a config so that we return `true` here for the global index? Since the record location is already known from the meta column, how does the global/non-global part come into play here?

########## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/SparkAdapter.scala: ##########
@@ -202,4 +203,12 @@ trait SparkAdapter extends Serializable {
    * Converts instance of [[StorageLevel]] to a corresponding string
    */
  def convertStorageLevelToString(level: StorageLevel): String
+
+  /**
+   * Calls fail analysis for the given attribute and columns during MERGE INTO resolution.
+   */
+  def failAnalysisForMIT(a: Attribute, cols: String): Unit = {}
+
+  def createMITJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression], hint: String): LogicalPlan

Review Comment:
   nit: Put these into `HoodieCatalystPlansUtils`?
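[Editor's note] The pass-through behavior discussed in this thread can be sketched independently of Hudi's types. All names below (`Record`, `Index`, `ProxyIndex`) are simplified, hypothetical stand-ins for illustration, not the actual Hudi API; the point is only that `tagLocation` returns its input untouched because the MERGE INTO join has already populated each record's location from the meta columns:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a Hudi record (not the real HoodieRecord).
class Record {
  final String key;
  String location; // already populated from meta columns by the MERGE INTO join
  Record(String key, String location) { this.key = key; this.location = location; }
}

// Hypothetical, simplified index contract (the real HoodieIndex does far more).
interface Index {
  List<Record> tagLocation(List<Record> records); // normally performs the index lookup
  boolean isGlobal();
}

// Mirrors the proxy-index idea under review: tagging is a no-op.
class ProxyIndex implements Index {
  @Override public List<Record> tagLocation(List<Record> records) {
    return records; // pass-through: no lookup, no copy
  }
  @Override public boolean isGlobal() {
    // Per the review question: global vs. non-global is largely moot here,
    // since locations are already known before the index is consulted.
    return false;
  }
}

public class ProxyIndexSketch {
  public static void main(String[] args) {
    Index index = new ProxyIndex();
    List<Record> input = Arrays.asList(new Record("k1", "file-1"), new Record("k2", "file-2"));
    List<Record> tagged = index.tagLocation(input);
    System.out.println(tagged == input);        // true: the very same collection comes back
    System.out.println(tagged.get(0).location); // file-1: location untouched
  }
}
```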
########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieInternalConfig.java: ##########
@@ -46,6 +46,13 @@ public class HoodieInternalConfig extends HoodieConfig {
       .withDocumentation("For SQL operations, if enables bulk_insert operation, "
           + "this configure will take effect to decide overwrite whole table or partitions specified");
 
+  public static final ConfigProperty<String> SQL_MERGE_INTO_WRITES = ConfigProperty
+      .key("hoodie.internal.sql.merge.into.writes")
+      .defaultValue("false")
+      .markAdvanced()
+      .withDocumentation("For merge into from spark-sql, we need some special handling. for eg, schema "

Review Comment:
   nit: add `.sinceVersion("0.14.0")`

########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieInternalConfig.java: ##########
@@ -46,6 +46,13 @@ public class HoodieInternalConfig extends HoodieConfig {
       .withDocumentation("For SQL operations, if enables bulk_insert operation, "
           + "this configure will take effect to decide overwrite whole table or partitions specified");
 
+  public static final ConfigProperty<String> SQL_MERGE_INTO_WRITES = ConfigProperty
+      .key("hoodie.internal.sql.merge.into.writes")
+      .defaultValue("false")
+      .markAdvanced()
+      .withDocumentation("For merge into from spark-sql, we need some special handling. for eg, schema "
+          + "validation should be disabled for writes from merge into. As well as reuse of meta cols for keygen and skip tagging");

Review Comment:
   Could you add sth around: `This internal config is used by Merge Into SQL logic only to mark such use case and let the core components know it should handle the write differently`?
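[Editor's note] The immutable-builder pattern behind `ConfigProperty` can be illustrated with a minimal, self-contained sketch. `ConfigKey` below is a hypothetical stand-in, not Hudi's class, and the documentation string folds in both review suggestions (`.sinceVersion("0.14.0")` and the "internal config used by Merge Into SQL logic only" wording) purely for illustration:

```java
// Hypothetical, simplified stand-in for an immutable config-property builder.
final class ConfigKey {
  final String key;
  final String defaultValue;
  final String sinceVersion;
  final String doc;
  final boolean advanced;

  private ConfigKey(String key, String defaultValue, String sinceVersion, String doc, boolean advanced) {
    this.key = key; this.defaultValue = defaultValue; this.sinceVersion = sinceVersion;
    this.doc = doc; this.advanced = advanced;
  }

  // Each step returns a new immutable instance, mirroring the builder chain in the diff.
  static ConfigKey key(String k)          { return new ConfigKey(k, null, null, null, false); }
  ConfigKey defaultValue(String v)        { return new ConfigKey(key, v, sinceVersion, doc, advanced); }
  ConfigKey markAdvanced()                { return new ConfigKey(key, defaultValue, sinceVersion, doc, true); }
  ConfigKey sinceVersion(String v)        { return new ConfigKey(key, defaultValue, v, doc, advanced); }
  ConfigKey withDocumentation(String d)   { return new ConfigKey(key, defaultValue, sinceVersion, d, advanced); }
}

public class ConfigSketch {
  // Mirrors the property under review, with both review suggestions folded in (illustrative only).
  static final ConfigKey SQL_MERGE_INTO_WRITES = ConfigKey
      .key("hoodie.internal.sql.merge.into.writes")
      .defaultValue("false")
      .markAdvanced()
      .sinceVersion("0.14.0")
      .withDocumentation("Internal config used by Merge Into SQL logic only, letting core components "
          + "know the write should be handled differently (skip schema validation and tagging, reuse meta columns).");

  public static void main(String[] args) {
    System.out.println(SQL_MERGE_INTO_WRITES.key);          // hoodie.internal.sql.merge.into.writes
    System.out.println(SQL_MERGE_INTO_WRITES.defaultValue); // false
    System.out.println(SQL_MERGE_INTO_WRITES.sinceVersion); // 0.14.0
  }
}
```

The immutability of each builder step is what makes such properties safe to share as `static final` constants across write paths.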