Hi,

I am wondering if Calcite will support "lazy optimization" (execution time
optimization / runtime optimization).

For example, we want to do an inner join between an Elasticsearch table and
a MySQL table, like this:

WITH logic_table_2 AS
  (SELECT _MAP['status'] AS "status",
          _MAP['user'] AS "user"
   FROM "es"."insight-by-sql-v3"
   LIMIT 12345)
SELECT *
FROM "insight_user"."user_tab" AS t1
JOIN logic_table_2 AS t2 ON t1."email" = t2."user"
WHERE t2."status" = 'fail'
LIMIT 10

t2 is a ES table and t1 is a MySQL table, and it may generate a execution
plan like this:

EnumerableProject(id=[$2], name=[$3], email=[$4], desc=[$5], is_super=[$6],
create_time=[$7], has_all_access=[$8], status=[$0], user=[$1])
  EnumerableLimit(fetch=[10])
    EnumerableHashJoin(condition=[=($1, $4)], joinType=[inner])
      ElasticsearchToEnumerableConverter
        ElasticsearchProject(status=[ITEM($0, 'status')], user=[ITEM($0,
'user')])
          ElasticsearchFilter(condition=[=(ITEM($0, 'status'), 'fail')])
            ElasticsearchSort(fetch=[12345])
              ElasticsearchTableScan(table=[[es, insight-by-sql-v3]])
      JdbcToEnumerableConverter
        JdbcTableScan(table=[[insight_user, user_tab]])

since here ES query has a filter, in execution Calcite will do the ES query
first and get the build table, and then do JdbcTableScan and get the probe
table, and do the HashJoin finally.

But, since this is a INNER JOIN, there is an implicit filter on the later
JdbcTableScan:
``` t1.email in (select user from t2 where t2.status='fail') ```, if
applying this implicit filter, the dataset we will handle may become
extremely small (save memory) and running much faster since the full
JdbcTableScan is always time-wasting. But since Calcite do the optimization
in planner phase, this dynamic/lazy optimization seems missed ...

To summarize, serial execution with a "lazy optimization" may be faster and
use less memory than parallel execution with an optimized execution plan
since the former one can reduce dataset we handle.

Any ideas?

Reply via email to