Hi,
As mentioned by Mads in a previous mail, we are working on a SQL-API in Apache 
Wayang.
We are trying to set up experiments with the JOB Benchmark and see that we have 
to rewrite queries to explicit INNER JOINS for them to be parsed correctly.
Since we are planning to do other benchmarks with thousands of queries, 
rewriting is not feasible.

Given this (not-rewritten) query from JOB:

SELECT MIN(chn.name) AS uncredited_voiced_character,
       MIN(t.title) AS russian_movie
FROM postgres.char_name AS chn,
     postgres.cast_info AS ci,
     postgres.company_name AS cn,
     postgres.company_type AS ct,
     postgres.movie_companies AS mc,
     postgres.role_type AS rt,
     postgres.title AS t
WHERE ci.note LIKE '%(voice)%'
  AND ci.note LIKE '%(uncredited)%'
  AND cn.country_code = '[ru]'
  AND rt.role = 'actor'
  AND t.production_year > 2005
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND ci.movie_id = mc.movie_id
  AND chn.id = ci.person_role_id
  AND rt.id = ci.role_id
  AND cn.id = mc.company_id
  AND ct.id = mc.company_type_id;

We use calcite to get the following tree:

LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)], 
russian_movie=[MIN($1)])
  LogicalProject(name=[$1], title=[$31])
    LogicalFilter(condition=[AND(LIKE($11, '%(voice)%'), LIKE($11, 
'%(uncredited)%'), =($16, '[ru]'), =($29, 'actor'), >($34, 2005), =($30, $24), 
=($30, $9), =($9, $24), =($0, $10), =($28, $13), =($14, $25), =($21, $26))])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalJoin(condition=[true], joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner])
            LogicalJoin(condition=[true], joinType=[inner])
              LogicalJoin(condition=[true], joinType=[inner])
                LogicalJoin(condition=[true], joinType=[inner])
                  LogicalTableScan(table=[[postgres, char_name]])
                  LogicalTableScan(table=[[postgres, cast_info]])
                LogicalTableScan(table=[[postgres, company_name]])
              LogicalTableScan(table=[[postgres, company_type]])
            LogicalTableScan(table=[[postgres, movie_companies]])
          LogicalTableScan(table=[[postgres, role_type]])
        LogicalTableScan(table=[[postgres, title]])


I then try to apply the CoreRules.FILTER_INTO_JOIN (tried smart and dumb 
version), in order to avoid the cartesian products, hoping to push the join 
conditions into the respective LogicalJoins.
Heres the resulting tree:

LogicalAggregate(group=[{}], uncredited_voiced_character=[MIN($0)], 
russian_movie=[MIN($1)])
  LogicalProject(name=[$1], title=[$31])
    LogicalJoin(condition=[=($24, $30)], joinType=[inner])
      LogicalJoin(condition=[=($28, $13)], joinType=[inner])
        LogicalJoin(condition=[AND(=($9, $24), =($14, $25), =($21, $26))], 
joinType=[inner])
          LogicalJoin(condition=[true], joinType=[inner])
            LogicalJoin(condition=[true], joinType=[inner])
              LogicalJoin(condition=[=($0, $10)], joinType=[inner])
                LogicalTableScan(table=[[postgres, char_name]])
                LogicalFilter(condition=[AND(LIKE($4, '%(voice)%'), LIKE($4, 
'%(uncredited)%'))])
                  LogicalTableScan(table=[[postgres, cast_info]])
              LogicalFilter(condition=[=($2, '[ru]')])
                LogicalTableScan(table=[[postgres, company_name]])
            LogicalTableScan(table=[[postgres, company_type]])
          LogicalTableScan(table=[[postgres, movie_companies]])
        LogicalFilter(condition=[=($1, 'actor')])
          LogicalTableScan(table=[[postgres, role_type]])
      LogicalFilter(condition=[>($4, 2005)])
        LogicalTableScan(table=[[postgres, title]])

Some of the conditions are pushed down, but we still have remaining cartesian 
products and a multi-condition join.
Looking at the input query, I would expect every Join to have a condition, as 
there are no unspecified joins, right?
What am I missing or what can we do to deconstruct the multi-conditional join 
and avoid cartesian products?

Thanks in advance for any help!

Best,
Juri

Reply via email to