yihua commented on code in PR #9083:
URL: https://github.com/apache/hudi/pull/9083#discussion_r1251283969
########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieInternalProxyIndex.java: ##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.index;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIndexException;
+import org.apache.hudi.table.HoodieTable;
+
+public class HoodieInternalProxyIndex extends HoodieIndex<Object, Object> {
+
+  /**
+   * Index that does not do tagging. It is used by the Spark SQL MERGE INTO command,
+   * which does not need an index lookup because the record location is obtained
+   * from the meta columns produced by the join.
+   */
+  public HoodieInternalProxyIndex(HoodieWriteConfig config) {
+    super(config);
+  }
+
+  @Override
+  public <R> HoodieData<HoodieRecord<R>> tagLocation(HoodieData<HoodieRecord<R>> records, HoodieEngineContext context, HoodieTable hoodieTable) throws HoodieIndexException {
+    return records;
+  }
+
+  @Override
+  public HoodieData<WriteStatus> updateLocation(HoodieData<WriteStatus> writeStatuses, HoodieEngineContext context, HoodieTable hoodieTable) throws HoodieIndexException {
+    return writeStatuses;
+  }
+
+  @Override
+  public boolean rollbackCommit(String instantTime) {
+    return false;
+  }
+
+  @Override
+  public boolean isGlobal() {
+    return false;

Review Comment:
   If the global version is to be implemented, does the user simply need to set a config so that we return `true` here for the global index? Since the record location is already known from the meta column, how does the global/non-global part come into play here?

########## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/hudi/SparkAdapter.scala: ##########
@@ -202,4 +203,12 @@ trait SparkAdapter extends Serializable {
    * Converts instance of [[StorageLevel]] to a corresponding string
    */
  def convertStorageLevelToString(level: StorageLevel): String
+
+  /**
+   * Calls fail analysis for the given attribute and columns during MERGE INTO resolution.
+   */
+  def failAnalysisForMIT(a: Attribute, cols: String): Unit = {}
+
+  def createMITJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression], hint: String): LogicalPlan

Review Comment:
   nit: Put these into `HoodieCatalystPlansUtils`?
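[Editor's note] The pass-through behavior discussed in this thread can be sketched independently of Hudi's types. All names below (`Record`, `Index`, `ProxyIndex`) are simplified, hypothetical stand-ins for illustration, not the actual Hudi API; the point is only that `tagLocation` returns its input untouched because the MERGE INTO join has already populated each record's location from the meta columns:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a Hudi record (not the real HoodieRecord).
class Record {
  final String key;
  String location; // already populated from meta columns by the MERGE INTO join
  Record(String key, String location) { this.key = key; this.location = location; }
}

// Hypothetical, simplified index contract (the real HoodieIndex does far more).
interface Index {
  List<Record> tagLocation(List<Record> records); // normally performs the index lookup
  boolean isGlobal();
}

// Mirrors the proxy-index idea under review: tagging is a no-op.
class ProxyIndex implements Index {
  @Override public List<Record> tagLocation(List<Record> records) {
    return records; // pass-through: no lookup, no copy
  }
  @Override public boolean isGlobal() {
    // Per the review question: global vs. non-global is largely moot here,
    // since locations are already known before the index is consulted.
    return false;
  }
}

public class ProxyIndexSketch {
  public static void main(String[] args) {
    Index index = new ProxyIndex();
    List<Record> input = Arrays.asList(new Record("k1", "file-1"), new Record("k2", "file-2"));
    List<Record> tagged = index.tagLocation(input);
    System.out.println(tagged == input);        // true: the very same collection comes back
    System.out.println(tagged.get(0).location); // file-1: location untouched
  }
}
```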
########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieInternalConfig.java: ##########
@@ -46,6 +46,13 @@ public class HoodieInternalConfig extends HoodieConfig {
       .withDocumentation("For SQL operations, if enables bulk_insert operation, "
           + "this configure will take effect to decide overwrite whole table or partitions specified");
 
+  public static final ConfigProperty<String> SQL_MERGE_INTO_WRITES = ConfigProperty
+      .key("hoodie.internal.sql.merge.into.writes")
+      .defaultValue("false")
+      .markAdvanced()
+      .withDocumentation("For merge into from spark-sql, we need some special handling. for eg, schema "

Review Comment:
   nit: add `.sinceVersion("0.14.0")`

########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieInternalConfig.java: ##########
@@ -46,6 +46,13 @@ public class HoodieInternalConfig extends HoodieConfig {
       .withDocumentation("For SQL operations, if enables bulk_insert operation, "
           + "this configure will take effect to decide overwrite whole table or partitions specified");
 
+  public static final ConfigProperty<String> SQL_MERGE_INTO_WRITES = ConfigProperty
+      .key("hoodie.internal.sql.merge.into.writes")
+      .defaultValue("false")
+      .markAdvanced()
+      .withDocumentation("For merge into from spark-sql, we need some special handling. for eg, schema "
+          + "validation should be disabled for writes from merge into. As well as reuse of meta cols for keygen and skip tagging");

Review Comment:
   Could you add sth around: `This internal config is used by Merge Into SQL logic only to mark such use case and let the core components know it should handle the write differently`?
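[Editor's note] The immutable-builder pattern behind `ConfigProperty` can be illustrated with a minimal, self-contained sketch. `ConfigKey` below is a hypothetical stand-in, not Hudi's class, and the documentation string folds in both review suggestions (`.sinceVersion("0.14.0")` and the "internal config used by Merge Into SQL logic only" wording) purely for illustration:

```java
// Hypothetical, simplified stand-in for an immutable config-property builder.
final class ConfigKey {
  final String key;
  final String defaultValue;
  final String sinceVersion;
  final String doc;
  final boolean advanced;

  private ConfigKey(String key, String defaultValue, String sinceVersion, String doc, boolean advanced) {
    this.key = key; this.defaultValue = defaultValue; this.sinceVersion = sinceVersion;
    this.doc = doc; this.advanced = advanced;
  }

  // Each step returns a new immutable instance, mirroring the builder chain in the diff.
  static ConfigKey key(String k)          { return new ConfigKey(k, null, null, null, false); }
  ConfigKey defaultValue(String v)        { return new ConfigKey(key, v, sinceVersion, doc, advanced); }
  ConfigKey markAdvanced()                { return new ConfigKey(key, defaultValue, sinceVersion, doc, true); }
  ConfigKey sinceVersion(String v)        { return new ConfigKey(key, defaultValue, v, doc, advanced); }
  ConfigKey withDocumentation(String d)   { return new ConfigKey(key, defaultValue, sinceVersion, d, advanced); }
}

public class ConfigSketch {
  // Mirrors the property under review, with both review suggestions folded in (illustrative only).
  static final ConfigKey SQL_MERGE_INTO_WRITES = ConfigKey
      .key("hoodie.internal.sql.merge.into.writes")
      .defaultValue("false")
      .markAdvanced()
      .sinceVersion("0.14.0")
      .withDocumentation("Internal config used by Merge Into SQL logic only, letting core components "
          + "know the write should be handled differently (skip schema validation and tagging, reuse meta columns).");

  public static void main(String[] args) {
    System.out.println(SQL_MERGE_INTO_WRITES.key);          // hoodie.internal.sql.merge.into.writes
    System.out.println(SQL_MERGE_INTO_WRITES.defaultValue); // false
    System.out.println(SQL_MERGE_INTO_WRITES.sinceVersion); // 0.14.0
  }
}
```

The immutability of each builder step is what makes such properties safe to share as `static final` constants across write paths.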