-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72526/#review220896
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 54 (patched)
<https://reviews.apache.org/r/72526/#comment309606>

    Cool comment! :)



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 99 (patched)
<https://reviews.apache.org/r/72526/#comment309617>

    Could you add some more comments within this method (even if they are 
short) just going over the overall workflow?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 104 (patched)
<https://reviews.apache.org/r/72526/#comment309616>

    What if root has multiple inputs? This may not be supported right now, but 
to ensure correctness, add a check on the number of inputs and bail out 
otherwise.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 129 (patched)
<https://reviews.apache.org/r/72526/#comment309614>

    _SourceTable_ class name is a bit misleading. Is this another mapping? 
_SourceTableMapping_?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 204 (patched)
<https://reviews.apache.org/r/72526/#comment309610>

    One way to extend this to more complex expressions is to use a RexVisitor 
to gather the RexTableInputRef(s) in the expression.
    Then if they all come from the same table, you execute logic similar to 
what you are doing now.
    However, the _mapping_ will be from position to RexNode.
    When you are checking whether the keys are present, you would need to check 
for the TABLE_INPUT_REF RexNodes in those targets.
    Finally, a RexShuttle can be used to replace RexTableInputRefs by the 
corresponding RexInputRef after additional join.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 226 (patched)
<https://reviews.apache.org/r/72526/#comment309611>

    It seems problematic do skip the _cast_ here. Can't this potentially 
produce incorrect results?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 250 (patched)
<https://reviews.apache.org/r/72526/#comment309615>

    Nit. Renaming RelNodes to _leftInput_ and _rightInput_ may make things more 
clear.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 283 (patched)
<https://reviews.apache.org/r/72526/#comment309612>

    Doesn't it make things easier if we just keep a MultiMap here? It seems you 
could potentially get rid of the bit sets too.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
Lines 284 (patched)
<https://reviews.apache.org/r/72526/#comment309608>

    These bitsets as well as _targetFields_ below are not actually immutable 
(you are calling _set_ on them). Iirc the _set_ operation actually instantiates 
a new bitset under the hood and returns it, hence I believe this may not be 
correct (the actual bitset is not changing). Even if it does, it will not very 
efficient.
    If you are keeping mutable bitsets, the better choice is probably to use 
the mutable BitSet.
    Otherwise, the best option is to try to make this whole class inmmutable, 
e.g., use a ImmutableBitSet.builder in _getExpressionLineageOf_ and then pass 
an immutablebitset in the constructor.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
Lines 35 (patched)
<https://reviews.apache.org/r/72526/#comment309607>

    We should be able to reuse some of the code below if this rule extends 
_HiveRelFieldTrimmerRule_. You can have a trimmer instance in the super class, 
pass it in the constructor. Then this subclass will only pass the trimmer 
implementation while reusing rest of logic. Does it make sense?



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Lines 2389 (patched)
<https://reviews.apache.org/r/72526/#comment309609>

    Unless I am mistaken, this will execute _HiveCardinalityPreservingJoinRule_ 
at least twice.
    Although maybe this is obvious, is _HiveProjectMerge_ actually needed?
    If it is, a better choice is to use a different partial program for it.


- Jesús Camacho Rodríguez


On May 26, 2020, 6:29 p.m., Krisztian Kasa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72526/
> -----------------------------------------------------------
> 
> (Updated May 26, 2020, 6:29 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-23493
>     https://issues.apache.org/jira/browse/HIVE-23493
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Rewrite plan to join back tables with many projected columns joined multiple 
> times
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8094d28f21 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinOptimization.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveCardinalityPreservingJoinRule.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java
>  19ce3ea223 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 377e8280e5 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query11.q.out 
> 0136ee4bb5 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query4.q.out 
> 987a0f348e 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query74.q.out 
> 289e5d2569 
> 
> 
> Diff: https://reviews.apache.org/r/72526/diff/4/
> 
> 
> Testing
> -------
> 
> mvn test -Dtest.output.overwrite -DskipSparkTests 
> -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile=cardinality_preserving_join_opt.q,cardinality_preserving_join_opt_q4.q,cardinality_preserving_join_opt_q11.q,cardinality_preserving_join_opt_q74.q
>  -pl itests/qtest -Pitests
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>

Reply via email to