Github user jinfengni commented on the pull request:

    https://github.com/apache/drill/pull/444#issuecomment-201430413
  
    I agree that your expectation for RelSubset makes sense. However, that is
    not what happens today. Below is the trace for the query that went through
    planning with this customized rule (I removed some rels).
    
    Set#1, type: RecordType(BIGINT custkey, ANY custAddress)
    ...
    Set#2, type: (DrillRecordRow[*, l_orderkey, l_partkey, l_linenumber])
      rel#273:Subset#2.LOGICAL.ANY([]).[], best=rel#442, importance=0.31381059609000006
        rel#275:AbstractConverter.LOGICAL.ANY([]).[](input=rel#80:Subset#2.ENUMERABLE.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]), rowcount=100.0, cumulative cost={inf}
        rel#442:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, tpch/lineitem.parquet],groupscan=ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]), rowcount=60175.0, cumulative cost={60175.0 rows, 6.0175E8 cpu, 0.0 io, 0.0 network, 0.0 memory}
    Set#3, type: (DrillRecordRow[*, l_orderkey, l_partkey, l_linenumber])
      rel#82:Subset#3.NONE.ANY([]).[], best=null, importance=0.3874204890000001
        rel#345:LogicalFilter.NONE.ANY([]).[[]](input=rel#273:Subset#2.LOGICAL.ANY([]).[],condition=AND(>=($1, 20160101), <=($2, 20160301), OR(=($2, 1), =($2, 2), =($2, 5), =($2, 6)))), rowcount=6.25, cumulative cost={inf}
    
    rel#345:LogicalFilter has a child rel#273 with LOGICAL convention.
    
    As another example, for the following query:
     
       Select n_name, n_nationkey from cp.`tpch/nation.parquet` where n_nationkey > 5
    
    The trace:
    Set#0, type: (DrillRecordRow[*, n_nationkey, n_name])
      ... 
    Set#1, type: (DrillRecordRow[*, n_nationkey, n_name])
      rel#21:Subset#1.NONE.ANY([]).[], best=null, importance=0.81
        rel#20:LogicalFilter.NONE.ANY([]).[](input=rel#19:Subset#0.ENUMERABLE.ANY([]).[],condition=>($1, 5)), rowcount=50.0, cumulative cost={inf}
        rel#37:LogicalFilter.NONE.ANY([]).[[]](input=rel#19:Subset#0.ENUMERABLE.ANY([]).[],condition=>($1, 5)), rowcount=50.0, cumulative cost={inf}
      rel#35:Subset#1.LOGICAL.ANY([]).[], best=rel#60, importance=0.81
        rel#60:DrillFilterRel.LOGICAL.ANY([]).[](input=rel#57:Subset#0.LOGICAL.ANY([]).[],condition=>($1, 5)), rowcount=50.0, cumulative cost={125.0 rows, 250600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
    Set#2, type: RecordType(ANY n_name, ANY n_nationkey)
      rel#23:Subset#2.NONE.ANY([]).[], best=null, importance=0.9
        rel#53:LogicalProject.NONE.ANY([]).[](input=rel#35:Subset#1.LOGICAL.ANY([]).[],n_name=$2,n_nationkey=$1), rowcount=50.0, cumulative cost={inf}
    
    Again, rel#53:LogicalProject has a child rel#35 whose convention is LOGICAL.
    
    I think we end up with such mixed rels because different kinds of rules
    are used in a single Volcano planning phase:
     1) Rules that match only the base classes (Filter, Project, etc.).
     2) Rules that match LogicalFilter, LogicalProject, etc.
     3) Rules that use the copy() method to generate a new rel.
     4) Rules that use a RelFactory to generate a new rel.
     5) Convert rules, which convert from Calcite logical (NONE/Enumerable)
        to Drill logical (LOGICAL).
    
    For instance, ProjectMergeRule, which matches the base Project yet uses
    the default RelFactory, will match both LogicalProject and DrillProject
    but always produce a LogicalProject as its outcome. That causes the mixed
    rels.
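
    To make the factory-versus-copy() difference concrete, here is a minimal
    toy sketch. It does not use Calcite or Drill classes; Project,
    Convention, DEFAULT_FACTORY and the two rewrite methods are hypothetical
    stand-ins invented for illustration only.

```java
import java.util.function.Function;

// Toy model only: hypothetical stand-ins, not Calcite or Drill types.
class MixedConventionSketch {
    enum Convention { NONE, LOGICAL }

    static final class Project {
        final Convention convention;
        final String input;

        Project(Convention convention, String input) {
            this.convention = convention;
            this.input = input;
        }

        // Like copy(): the new rel keeps the convention of the rel it came from.
        Project copy(String newInput) {
            return new Project(convention, newInput);
        }
    }

    // Like a default factory: always builds a Calcite-logical (NONE) rel.
    static final Function<String, Project> DEFAULT_FACTORY =
            input -> new Project(Convention.NONE, input);

    // A rule that matches any Project but builds its result through the factory.
    static Project rewriteWithFactory(Project matched) {
        return DEFAULT_FACTORY.apply(matched.input);
    }

    // The same rule, building its result through copy() instead.
    static Project rewriteWithCopy(Project matched) {
        return matched.copy(matched.input);
    }

    public static void main(String[] args) {
        Project drillProject = new Project(Convention.LOGICAL, "Subset#1.LOGICAL");
        // The factory path drops the LOGICAL convention; the copy path keeps it.
        System.out.println(rewriteWithFactory(drillProject).convention); // NONE
        System.out.println(rewriteWithCopy(drillProject).convention);    // LOGICAL
    }
}
```

    In this toy model, the factory-based rewrite of a LOGICAL project comes
    back in the NONE convention while its input is still a LOGICAL subset,
    which is the same shape as rel#53 in the trace.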
    
    Two things we may consider to fix this:
    1) Separate the convert rules from the other transformation rules. Apply
       the convert rules first, then have the transformation rules match Drill
       logical rels only. That is similar to what other systems (Hive) do.
    2) Go through every rule we use and make sure that the input and output of
       each transformation rule have the same convention; only the convert
       rules should change convention.
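
    As a rough illustration of the check in option 2, the sketch below flags
    any non-convert rule whose output convention differs from the convention
    it matched. It is a toy model, not planner code: RuleOutcome, violations()
    and the rule names "convertRule" and "copyBasedRule" are hypothetical, and
    the entries are illustrative, not measured from Drill.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy audit only: hypothetical types, not Calcite or Drill classes.
class RuleConventionAudit {
    enum Convention { NONE, ENUMERABLE, LOGICAL }

    static final class RuleOutcome {
        final String ruleName;
        final Convention matched;   // convention of the rel the rule matched
        final Convention produced;  // convention of the rel the rule produced
        final boolean convertRule;  // convert rules may change convention

        RuleOutcome(String ruleName, Convention matched,
                    Convention produced, boolean convertRule) {
            this.ruleName = ruleName;
            this.matched = matched;
            this.produced = produced;
            this.convertRule = convertRule;
        }
    }

    // Returns the names of transformation rules that changed the convention.
    static List<String> violations(List<RuleOutcome> outcomes) {
        List<String> bad = new ArrayList<>();
        for (RuleOutcome o : outcomes) {
            if (!o.convertRule && o.matched != o.produced) {
                bad.add(o.ruleName);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<RuleOutcome> outcomes = Arrays.asList(
            // Allowed: a convert rule moving NONE to LOGICAL.
            new RuleOutcome("convertRule", Convention.NONE, Convention.LOGICAL, true),
            // Flagged: matched a LOGICAL rel, produced a NONE rel.
            new RuleOutcome("ProjectMergeRule", Convention.LOGICAL, Convention.NONE, false),
            // Fine: a copy()-based rule that preserves the convention.
            new RuleOutcome("copyBasedRule", Convention.LOGICAL, Convention.LOGICAL, false));
        System.out.println(violations(outcomes)); // [ProjectMergeRule]
    }
}
```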
    
    Both of these would take considerable effort, though.
    
      


