[GitHub] [hive] kasakrisz commented on a diff in pull request #4571: HIVE-27627: Iceberg: Insert into/overwrite partition support

via GitHub Wed, 16 Aug 2023 05:48:34 -0700


kasakrisz commented on code in PR #4571:
URL: https://github.com/apache/hive/pull/4571#discussion_r1295826220



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -7493,9 +7507,17 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
 
       // Add NOT NULL constraint check
       input = genConstraintsPlan(dest, qb, input);
+      if (destinationTable.getStorageHandler() != null && destinationTable.getStorageHandler().alwaysUnpartitioned()) {
+        partSpec = qbm.getPartSpecForAlias(dest);
+      }
 
       if (!qb.getIsQuery()) {
-        input = genConversionSelectOperator(dest, qb, input, destinationTable.getDeserializer(), dpCtx, null, destinationTable);
+        if (!updating(dest) && !deleting(dest)) {
+          input = genConversionSelectOperatorByAddPartition(dest, qb, input,
+                  destinationTable.getDeserializer(), destinationTable, partSpec);
+        }
+        input = genConversionSelectOperator(dest, qb, input, destinationTable.getDeserializer(), dpCtx, null,
+                destinationTable);

Review Comment:
   Two conversion operators are generated if this is neither a delete nor an update statement. Are both necessary?



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -7906,6 +7945,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
     if (!(destType == QBMetaData.DEST_DFS_FILE && qb.getIsQuery())
             && destinationTable != null && destinationTable.getStorageHandler() != null) {
       try {
+        if (!updating(dest) && !deleting(dest)) {
+          input = genConversionSelectOperatorByAddPartition(dest, qb, input, destinationTable.getDeserializer(),
+                  destinationTable, partSpec);
+        }
         input = genConversionSelectOperator(
                 dest, qb, input, tableDescriptor.getDeserializer(conf), dpCtx, null, destinationTable);

Review Comment:
   Two conversion SelectOperators are generated here as well. Are both necessary?



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -7563,9 +7585,26 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
       ltd.setMoveTaskId(moveTaskId);
 
       loadTableWork.add(ltd);
-      if (!outputs.add(new WriteEntity(destinationPartition, determineWriteType(ltd, dest)))) {
-        throw new SemanticException(ErrorMsg.OUTPUT_SPECIFIED_MULTIPLE_TIMES
-            .getMsg(destinationTable.getTableName() + "@" + destinationPartition.getName()));
+
+      if (destinationTable.getStorageHandler() != null && destinationTable.getStorageHandler().alwaysUnpartitioned()) {
+        // HMS does not know about this partition
+        // but the underlying storage format knows about it.
+        DummyPartition dummyPartition;
+        try {
+          String partName = Warehouse.makePartName(partSpec, false);
+          dummyPartition = new DummyPartition(destinationTable, partName, partSpec);
+        } catch (MetaException e) {
+          throw new SemanticException("Unable to construct name for dummy partition due to: ", e);
+        }
+        if (!outputs.add(new WriteEntity(dummyPartition, determineWriteType(ltd, dest)))) {
+          throw new SemanticException(ErrorMsg.OUTPUT_SPECIFIED_MULTIPLE_TIMES
+                  .getMsg(destinationTable.getTableName() + "@" + destinationPartition.getName()));

Review Comment:
   Should this use `dummyPartition` instead of `destinationPartition` here?
   ```
    throw new SemanticException(ErrorMsg.OUTPUT_SPECIFIED_MULTIPLE_TIMES
                     .getMsg(destinationTable.getTableName() + "@" + destinationPartition.getName()));
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -8595,6 +8634,94 @@ These props are now enabled elsewhere (see commit diffs).  It would be better in
     SessionState.get().getConf().set(AcidUtils.CONF_ACID_KEY, "true");
   }
 
+  private Operator genConversionSelectOperatorByAddPartition(String dest, QB qb, Operator input,
+      Deserializer deserializer, Table table, Map<String, String> partitionSpec) throws SemanticException {
+    StructObjectInspector oi = null;
+    try {
+      oi = (StructObjectInspector) deserializer.getObjectInspector();
+    } catch (Exception e) {
+      throw new SemanticException(e);
+    }
+
+    // Check column number
+    List<? extends StructField> tableFields = oi.getAllStructFieldRefs();
+    List<ColumnInfo> rowFields = opParseCtx.get(input).getRowResolver().getColumnInfos();
+    int inColumnCnt = rowFields.size();
+    int outColumnCnt = tableFields.size();
+    List<ExprNodeDesc> expressions = new ArrayList<>(outColumnCnt);
+
+    // if target table is always unpartitioned, then the output object inspector will already contain the partition cols
+    // too, therefore we shouldn't add the partition col num to the output col num
+    boolean alreadyContainsPartCols = Optional.ofNullable(table)
+        .map(Table::getStorageHandler)
+        .map(HiveStorageHandler::alwaysUnpartitioned)
+        .orElse(Boolean.FALSE);

Review Comment:
   Can `table` be `null`?
   
   This method is called like this:
   ```
   input = genConversionSelectOperatorByAddPartition(dest, qb, input, destinationTable.getDeserializer(),
                     destinationTable, partSpec);
   ```
   
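   If `destinationTable` is `null`, the call `destinationTable.getDeserializer()` already throws an NPE before this method is entered, so the `Optional.ofNullable(table)` guard never sees a `null`.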
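
   A minimal sketch of the simplification this implies, assuming `table` is never `null` at this point. `Table` and `StorageHandler` below are hypothetical stand-ins, not the real Hive classes; only the shape of the check is meant to carry over.

```java
class NullCheckSketch {
  // Hypothetical stand-in for HiveStorageHandler; only the method used here.
  interface StorageHandler {
    boolean alwaysUnpartitioned();
  }

  // Hypothetical stand-in for the Hive Table class.
  static class Table {
    private final StorageHandler handler;

    Table(StorageHandler handler) {
      this.handler = handler;
    }

    StorageHandler getStorageHandler() {
      return handler;
    }
  }

  // Assumes table is non-null (the caller has already dereferenced it);
  // only the storage handler may be absent.
  static boolean alreadyContainsPartCols(Table table) {
    StorageHandler handler = table.getStorageHandler();
    return handler != null && handler.alwaysUnpartitioned();
  }
}
```

   If `table` can in fact be `null` on some other call path, the `Optional` chain should stay and the call sites need their own guard instead.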



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -8595,6 +8634,94 @@ These props are now enabled elsewhere (see commit diffs).  It would be better in
     SessionState.get().getConf().set(AcidUtils.CONF_ACID_KEY, "true");
   }
 
+  private Operator genConversionSelectOperatorByAddPartition(String dest, QB qb, Operator input,
+      Deserializer deserializer, Table table, Map<String, String> partitionSpec) throws SemanticException {
+    StructObjectInspector oi = null;
+    try {
+      oi = (StructObjectInspector) deserializer.getObjectInspector();
+    } catch (Exception e) {
+      throw new SemanticException(e);
+    }
+
+    // Check column number
+    List<? extends StructField> tableFields = oi.getAllStructFieldRefs();
+    List<ColumnInfo> rowFields = opParseCtx.get(input).getRowResolver().getColumnInfos();
+    int inColumnCnt = rowFields.size();
+    int outColumnCnt = tableFields.size();
+    List<ExprNodeDesc> expressions = new ArrayList<>(outColumnCnt);
+
+    // if target table is always unpartitioned, then the output object inspector will already contain the partition cols
+    // too, therefore we shouldn't add the partition col num to the output col num
+    boolean alreadyContainsPartCols = Optional.ofNullable(table)
+        .map(Table::getStorageHandler)
+        .map(HiveStorageHandler::alwaysUnpartitioned)
+        .orElse(Boolean.FALSE);
+
+    AtomicBoolean convert = new AtomicBoolean(false);
+
+    if (inColumnCnt < outColumnCnt && alreadyContainsPartCols && partitionSpec != null) {

Review Comment:
   Could you please invert this `if`?
   ```
   if (!(inColumnCnt < outColumnCnt && alreadyContainsPartCols && partitionSpec != null)) {
     return input;
   }
   int rowNum = 0;
   ...
   ```
   It reduces indentation and improves readability.
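
   For illustration, a self-contained sketch of the early-return shape. The method name `addPartitionColumns` and the `String` standing in for the `Operator` are hypothetical; the real method would build a SelectOperator where this sketch just wraps the input.

```java
import java.util.Map;

class GuardClauseSketch {
  // Hypothetical simplification: "input" is a String rather than an Operator.
  static String addPartitionColumns(String input, int inColumnCnt, int outColumnCnt,
      boolean alreadyContainsPartCols, Map<String, String> partitionSpec) {
    // Guard clause: return early when no conversion is needed.
    if (!(inColumnCnt < outColumnCnt && alreadyContainsPartCols && partitionSpec != null)) {
      return input;
    }
    // Main path, now at the top indentation level of the method body.
    return "SEL(" + input + ")";
  }
}
```

   The condition itself is unchanged; only the branch that returns early moves to the top.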



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

