[GitHub] [hive] zhangbutao commented on a diff in pull request #4392: HIVE-27244: Iceberg: Implement LOAD data for unpartitioned table via Append API.

via GitHub Wed, 07 Jun 2023 07:33:50 -0700


zhangbutao commented on code in PR #4392:
URL: https://github.com/apache/hive/pull/4392#discussion_r1221617178



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java:
##########
@@ -295,16 +296,25 @@ private void analyzeLoad(ASTNode ast) throws 
SemanticException {
       throw new SemanticException(ErrorMsg.DML_AGAINST_VIEW.getMsg());
     }
     if (ts.tableHandle.isNonNative()) {
-      // launch a tez job
-      StorageFormatDescriptor ss =
-          
ts.tableHandle.getStorageHandler().getStorageFormatDescriptor(ts.tableHandle.getTTable());
-      if (ss != null) {
-        inputFormatClassName = ss.getInputFormat();
-        serDeClassName = ss.getSerde();
-        reparseAndSuperAnalyze(ts.tableHandle, fromURI);
+      HiveStorageHandler storageHandler = ts.tableHandle.getStorageHandler();
+      if (storageHandler.supportsAppendData(ts.tableHandle.getTTable())) {

Review Comment:
   I am thinking if we should add a propertity to allow loading iceberg data by 
launching tez job? 
   
   Because if the data is huge, loading task maybe consume much HiveServer2's  
resource(memory & cpu) or maybe too time-consuming. In this case, user may want 
to launch a tez task to load huge amounts of data into iceberg table.
   



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveTableUtil.java:
##########
@@ -68,6 +74,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+

Review Comment:
   Is the new line added intentionally?



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java:
##########
@@ -157,6 +159,19 @@ public LoadTableDesc(final Path sourcePath,
     }
   }
 
+  public LoadTableDesc(Path path, Table tableHandle, boolean isOverWrite, 
boolean isIcebergLoad) {
+    super(path, AcidUtils.Operation.NOT_ACID);
+    this.mdTable = tableHandle;
+    this.isIcebergLoad = isIcebergLoad;
+    this.loadFileType = isOverWrite ? LoadFileType.REPLACE_ALL : 
LoadFileType.KEEP_EXISTING;
+    this.table = Utilities.getTableDesc(tableHandle);
+  }
+
+  @Explain(displayName = "Iceberg Table Load")
+  public boolean isIcebergLoad() {

Review Comment:
   And i also think the naming is too iceberg specific.  Although maybe only 
iceberg handler can load  data by this way right now, we can not make sure 
other storage handler can also do this.  Therefore, could you rename it to 
express more generic?



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java:
##########
@@ -157,6 +159,19 @@ public LoadTableDesc(final Path sourcePath,
     }
   }
 
+  public LoadTableDesc(Path path, Table tableHandle, boolean isOverWrite, 
boolean isIcebergLoad) {
+    super(path, AcidUtils.Operation.NOT_ACID);
+    this.mdTable = tableHandle;
+    this.isIcebergLoad = isIcebergLoad;
+    this.loadFileType = isOverWrite ? LoadFileType.REPLACE_ALL : 
LoadFileType.KEEP_EXISTING;
+    this.table = Utilities.getTableDesc(tableHandle);
+  }
+
+  @Explain(displayName = "Iceberg Table Load")

Review Comment:
   
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4392/1/tests/
 Qtests failed with the added displayName ` Iceberg Table Load: false`. IMHO, 
we no need to display the field status, and i think maybe no one care what it 
is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hive] zhangbutao commented on a diff in pull request #4392: HIVE-27244: Iceberg: Implement LOAD data for unpartitioned table via Append API.

Reply via email to