kgyrtkirk commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r611516875
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
##########
@@ -171,7 +171,7 @@ private void prepare(InputInitializerContext initializerContext) throws IOExcept
// perform dynamic partition pruning
if (pruner != null) {
pruner.initialize(getContext(), work, jobConf);
- pruner.prune();
+ pruner.prune(jobConf);
Review comment:
note: `jobConf` was already passed to the `pruner` in the `initialize` call above
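
To make the suggestion concrete: a minimal sketch, assuming `initialize` stores the `JobConf` it receives in a field (the field and the `throws` clause below are illustrative, not the actual code):
```java
// Sketch: if initialize(...) keeps the JobConf, prune() can stay parameterless.
private JobConf jobConf; // hypothetical field, assigned once in initialize()

public void initialize(InputInitializerContext context, MapWork work, JobConf jobConf) {
  this.jobConf = jobConf;
  // ... existing initialization ...
}

public void prune() throws Exception {
  // read configuration from this.jobConf instead of taking it as a parameter
}
```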
##########
File path: iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -250,4 +305,74 @@ static void overlayTableProperties(Configuration configuration, TableDesc tableD
// this is an exception to the interface documentation, but it's a safe operation to add this property
props.put(InputFormatConfig.TABLE_SCHEMA, schemaJson);
}
+
+ /**
+ * Recursively collects the column names from the predicate.
+ * @param node The node we are traversing
+ * @param columns The already collected column names
+ */
+ private void columns(ExprNodeDesc node, Collection<String> columns) {
Review comment:
why is this a `Collection`? Do we need to collect the same column multiple times?
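
For illustration, a sketch of the same collector taking a `Set` instead; the body is a guess at the usual shape of such a traversal (assuming `ExprNodeColumnDesc#getColumn` is the name being collected), not the code under review:
```java
// Sketch: a Set makes the de-duplication explicit.
// Needs java.util.Set plus org.apache.hadoop.hive.ql.plan.{ExprNodeDesc, ExprNodeColumnDesc}.
private void columns(ExprNodeDesc node, Set<String> columns) {
  if (node instanceof ExprNodeColumnDesc) {
    columns.add(((ExprNodeColumnDesc) node).getColumn());
  } else if (node.getChildren() != null) {
    for (ExprNodeDesc child : node.getChildren()) {
      columns(child, columns);
    }
  }
}
```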
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicPartitionPruner.java
##########
@@ -514,4 +569,29 @@ private boolean checkForSourceCompletion(String name) {
}
return false;
}
+
+ /**
+ * Recursively replaces the ExprNodeDynamicListDesc with the list of the actual values. As a result of this call the
+ * original expression is modified so it can be used for pushing down to the TableScan for filtering the data at the
+ * source.
+ * <p>
+ * Please make sure to clone the predicate if needed since the original node
will be modified.
+ * @param node The node we are traversing
+ * @param dynArgs The constant values we are substituting
+ */
+ private void replaceDynamicLists(ExprNodeDesc node, Collection<ExprNodeConstantDesc> dynArgs) {
+ List<ExprNodeDesc> children = node.getChildren();
+ if (children != null && !children.isEmpty()) {
+ ListIterator<ExprNodeDesc> iterator = node.getChildren().listIterator();
+ while (iterator.hasNext()) {
+ ExprNodeDesc child = iterator.next();
+ if (child instanceof ExprNodeDynamicListDesc) {
+ iterator.remove();
+ dynArgs.forEach(iterator::add);
Review comment:
I strongly suspect that this method is problematic; what will happen if you have a filter for 2 different columns or 2 different values?
```
a IN L1 and b IN L2
```
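
Spelling the concern out: the loop splices the same `dynArgs` constants into every `ExprNodeDynamicListDesc` it encounters, so both IN clauses above would receive one merged value list. One possible shape of a fix, keying the substitution per dynamic list (the `Map`-valued parameter is hypothetical, not existing code):
```java
// Sketch: substitute each dynamic list only with the constants produced for it.
// Needs java.util.Map in addition to the imports already in the file.
private void replaceDynamicLists(ExprNodeDesc node,
    Map<ExprNodeDynamicListDesc, Collection<ExprNodeConstantDesc>> dynArgsByList) {
  List<ExprNodeDesc> children = node.getChildren();
  if (children == null || children.isEmpty()) {
    return;
  }
  ListIterator<ExprNodeDesc> iterator = children.listIterator();
  while (iterator.hasNext()) {
    ExprNodeDesc child = iterator.next();
    if (child instanceof ExprNodeDynamicListDesc) {
      Collection<ExprNodeConstantDesc> dynArgs = dynArgsByList.get(child);
      if (dynArgs != null) { // constants gathered for this particular list
        iterator.remove();
        dynArgs.forEach(iterator::add);
      }
    } else {
      replaceDynamicLists(child, dynArgsByList);
    }
  }
}
```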
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
##########
@@ -890,6 +894,24 @@ public static void pushFilters(JobConf jobConf, TableScanOperator tableScan,
Utilities.setColumnTypeList(jobConf, tableScan);
// push down filters
ExprNodeGenericFuncDesc filterExpr = scanDesc.getFilterExpr();
+ String pruningFilter = jobConf.get(TableScanDesc.PARTITION_PRUNING_FILTER);
+ // If we have a pruning filter then combine it with the original
+ if (pruningFilter != null) {
+ ExprNodeGenericFuncDesc pruningExpr = SerializationUtilities.deserializeExpression(pruningFilter);
+ if (filterExpr != null) {
+ // Combine the 2 filters with AND
+ filterExpr = new ExprNodeGenericFuncDesc(TypeInfoFactory.booleanTypeInfo, new GenericUDFOPAnd(), "and",
Review comment:
note: you could probably use `ExprNodeDescUtils#conjunction` (or move
this method there...)
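
For reference, with `ExprNodeDescUtils#conjunction` the combination might shrink to roughly the following (the exact overload and the cast are assumptions, not verified against this branch):
```java
// Sketch: build "filterExpr AND pruningExpr" via the helper instead of
// constructing the GenericUDFOPAnd node by hand. Needs java.util.Arrays.
ExprNodeGenericFuncDesc pruningExpr =
    SerializationUtilities.deserializeExpression(pruningFilter);
filterExpr = (ExprNodeGenericFuncDesc)
    ExprNodeDescUtils.conjunction(Arrays.asList(filterExpr, pruningExpr));
```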