Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20515 )

Change subject: IMPALA-12243: Add support for DROP PARTITION for Iceberg tables
......................................................................


Patch Set 6:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift@629
PS6, Line 629: struct TIcebergDropPartitionSummary {
nit: I feel that naming this 'Summary' is a bit misleading. My impression for 
such a name would be that it contains e.g. some stats and result codes after 
executing the command. I think instead of 'Summary' we could call this 
'Request'.


http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift@631
PS6, Line 631:   2: optional bool is_truncate
Could you please add a comment about what 'is_truncate' means in the context of 
a drop partition.


http://gerrit.cloudera.org:8080/#/c/20515/6/common/thrift/CatalogObjects.thrift@632
PS6, Line 632:   3: required i64 num_partitions
nit: a short comment here too would be useful.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java:

http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/AlterTableDropPartitionStmt.java@103
PS6, Line 103:       analyzeIceberg();
nit: this could fit into the line above


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
File fe/src/main/java/org/apache/impala/analysis/PartitionSet.java:

http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@68
PS6, Line 68:   private long numberOfDroppedIcebergPartitions_;
PartitionSet is a general purpose class that could be used in various 
scenarios. I feel that the name is too generic for this class 
'numberOfDroppedIcebergPartitions_' in a sense that it assumes that this 
PartitionSet class is used for dropping partitions. 
'numberOfIcebergPartitions()' maybe?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@72
PS6, Line 72:   private boolean isIcebergTruncate_ = false;
Similarly as above, this assumes that this PartitionSet is used for some drop 
or truncate scenario, however, PartitionSpec is much more general purpose.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@144
PS6, Line 144:   public void analyzeIceberg(Analyzer analyzer) throws 
AnalysisException {
I know it was me who gave a comment to move all this logic into PartitionSet, 
but now that I see the code I have feelings against it. Initially I figured 
that there is some overlap between the original analyse() of PartitionSet and 
the AlterTableDropPartitionStmt.analyzeIceberg() but in fact we ended up 
completely separate fields and separate logic between the traditional and the 
Iceberg cases.
Sorry, for the inconvenience but I feel now that this code doesn't really fir 
here naturally, and might have been better in the original place.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@147
PS6, Line 147: IcebergPartitionExpressionRewriter
When looking at this code, for me it is pretty hard to decide the purpose of 
IcebergPartitionExpressionRewriter and  IcebergPartitionPredicateConverter. 
From the callsite you can only guess that these receive some expression tree 
and does some 'rewrite' and 'convert' on them but the naming doesn't really 
help to understand the code.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@152
PS6, Line 152: icebergExpressions
nit: 'icebergPartitionExprs' ?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@160
PS6, Line 160:       catch (ImpalaException e) {
nit: this should go into one line above


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@167
PS6, Line 167:         icebergExpressions, null)) {
Shouldn't we assert that the expressions in 'icebergExpressions' are in fact 
referring to partition columns in Iceberg? Or would this be guaranteed when 
checking the residual()? If the latter, could you add a comment about this 
somewhere around the residual call?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@169
PS6, Line 169:       HashMap<String, Long> icebergPartitionSummary = new 
HashMap<>();
You seem to only use the keys of this mapping. Could this be a HashSet<String> 
instead?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@174
PS6, Line 174:           if (fileScanTask.deletes().isEmpty()) {
For me it seems to add extra complexity to make all these separations of 
ContentFile types so that they can fit into GroupedContentFiles. You only use 
size() and getAllContentFiles() anyway so wouldn't it be possible to collect 
the ContentFiles in a single set instead?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@197
PS6, Line 197:   private class IcebergPartitionExpressionRewriter {
Could you move this into a separate top level class into a separate file? Might 
come handy later on for other features.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@221
PS6, Line 221: rewriteReferences
I'm not sure about the name, or actually with the 'References' part. I don;t 
see what it refers to. Would rewrite() do here?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@222
PS6, Line 222:       if (expr instanceof BinaryPredicate) {
Seeing all these instanceof checks made me wondering if you have considered 
polimorphism instead in Expr and its derived classes to do this rewrite action. 
Or would that be too Iceberg specific logic to add into Expr?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@264
PS6, Line 264:       TIcebergPartitionTransformType transformType = 
partitionExpr.getTransform()
nit: this should be below the literal check


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@312
PS6, Line 312:         analyzer_.addWarning(String.format(
I don't get why we write a warning kind of unconditionally when we have a year 
transform. Is this the case when we receive a "years from epoch year" input 
from the user? Do we want to support this? I'd expect users to provide the 
actual year when using this year transform in DELETE PARTITION.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@318
PS6, Line 318:       if (!IcebergUtil.isDateTimeTransformType(transformType)) 
return;
Shouldn't this be the very first check in this function to exit immediately if 
the transformType is not date time related?
Additionally at the callsite I'd only call this function if the transformType 
implies we should. And this check inside the function could be a precondition 
check instead.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@325
PS6, Line 325:       catch (ImpalaRuntimeException ignore){}
Could you help me understand why this exception is ignored? What cases do we 
ignore here?


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/analysis/PartitionSet.java@345
PS6, Line 345:       org.apache.iceberg.PartitionSpec spec = ((IcebergTable) 
table_).getIcebergApiTable()
You query this partition spec many times. Could this be a member of the class 
instead?
And if this line wasn't needed then this function would be reduced to one line 
so could rather write that one line on the callsite instead of making a 
function call.


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@240
PS6, Line 240:     List<String> paths = 
params.iceberg_drop_partition_summary.paths;
This could be moved into the else branch


http://gerrit.cloudera.org:8080/#/c/20515/6/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@245
PS6, Line 245:       for (String path : paths) {
Instead of for loop "deleteFiles.addAll(param..paths)"?



--
To view, visit http://gerrit.cloudera.org:8080/20515
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a768ba2966f570454687e02e4e6d67df46741f9
Gerrit-Change-Number: 20515
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2023 15:03:14 +0000
Gerrit-HasComments: Yes

Reply via email to