deniskuzZ commented on code in PR #5393:
URL: https://github.com/apache/hive/pull/5393#discussion_r1752283360


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -2069,30 +2070,56 @@ public List<FieldSchema> 
getPartitionKeys(org.apache.hadoop.hive.ql.metadata.Tab
   @Override
   public List<Partition> 
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Table hmsTable, 
ExprNodeDesc desc)
       throws SemanticException {
-    Table icebergTable = IcebergTableUtil.getTable(conf, hmsTable.getTTable());
-    PartitionSpec pSpec = icebergTable.spec();
-    PartitionsTable partitionsTable = (PartitionsTable) MetadataTableUtils
-            .createMetadataTableInstance(icebergTable, 
MetadataTableType.PARTITIONS);
-    SearchArgument sarg = ConvertAstToSearchArg.create(conf, 
(ExprNodeGenericFuncDesc) desc);
-    Expression expression = 
HiveIcebergFilterFactory.generateFilterExpression(sarg);
-    Set<PartitionData> partitionList = Sets.newHashSet();
-    ResidualEvaluator resEval = ResidualEvaluator.of(pSpec, expression, false);
-    try (CloseableIterable<FileScanTask> fileScanTasks = 
partitionsTable.newScan().planFiles()) {
-      fileScanTasks.forEach(task ->
-          
partitionList.addAll(Sets.newHashSet(CloseableIterable.transform(task.asDataTask().rows(),
 row -> {
-            StructProjection data = row.get(IcebergTableUtil.PART_IDX, 
StructProjection.class);
-            return IcebergTableUtil.toPartitionData(data, 
pSpec.partitionType());
-          })).stream()
-             .filter(partitionData -> 
resEval.residualFor(partitionData).isEquivalentTo(Expressions.alwaysTrue()))
-             .collect(Collectors.toSet())));
-
-
-      return partitionList.stream()
-        .map(partitionData -> new DummyPartition(hmsTable, 
pSpec.partitionToPath(partitionData)))
-        .collect(Collectors.toList());
+    return getPartitionsByExpr(hmsTable, desc, true);
+  }
+
+  @Override
+  public List<Partition> 
getPartitionsByExpr(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
+      ExprNodeDesc filter, boolean latestSpecOnly) throws SemanticException {
+    SearchArgument sarg = ConvertAstToSearchArg.create(conf, 
(ExprNodeGenericFuncDesc) filter);
+    Expression exp = HiveIcebergFilterFactory.generateFilterExpression(sarg);
+    Table table = IcebergTableUtil.getTable(conf, hmsTable.getTTable());
+    int tableSpecId = table.spec().specId();
+    List<Partition> partitions = Lists.newArrayList();
+
+    TableScan scan = 
table.newScan().filter(exp).caseSensitive(false).includeColumnStats().ignoreResiduals();
+
+    try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
+      tasks.forEach(task -> {
+        DataFile file = task.file();
+        PartitionSpec spec = task.spec();
+        if ((latestSpecOnly && file.specId() == tableSpecId) || 
(!latestSpecOnly && file.specId() != tableSpecId)) {
+          PartitionData partitionData = 
IcebergTableUtil.toPartitionData(task.partition(), spec.partitionType());
+          String partName = spec.partitionToPath(partitionData);
+          Map<String, String> partSpecMap = Maps.newLinkedHashMap();
+          Warehouse.makeSpecFromName(partSpecMap, new Path(partName), null);
+          DummyPartition partition = new DummyPartition(hmsTable, partName, 
partSpecMap);
+          if (!partitions.contains(partition)) {

Review Comment:
   No, keep `List<Partition> partitions` as you have it, since you are 
checking for duplicates. Also, since you are doing a sort later, maybe use a 
`TreeSet` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to