Calcite fullscan vs indexscan

Vladimir Sitnikov Sun, 09 Nov 2014 07:05:47 -0800

Hi,

I am having trouble implementing indexed access via Calcite.
Can you please guide me?


Here's the problem statement:
1) I have "table full scans" working.
2) I want Calcite to transform joins into nested-loops with "lookup by
id" inner loop.

Here's a sample query: https://github.com/vlsi/mat-calcite-plugin#join-sample
explain plan for
 select u."@ID", s."@RETAINED"
   from "java.lang.String" s
   join "java.net.URL" u
     on (s."@ID" = get_id(u.path))

The "@ID" column is a primary key, so I want Calcite to generate the
following plan: Filter(NestedLoops(Scan("java.net.URL" u),
FetchObjectBy(get_id(u.path))), get_class(s)=="java.lang.String")

The current plan is just a join of two "full scans" :(

My "storage engine" is a Java library (Eclipse Memory Analyzer, in
fact), so the ideal generated code would be as follows:
for(IObject url: snapshot.getObjectsByClass("java.net.URL")){
  IObject path = (IObject) url.resolveValue("path");
  pipe row(url.getObjectId(), path.getRetainedHeapSize()); // return results
}
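
To make the difference between the two plans concrete, here is a rough,
self-contained sketch in plain Java. It uses neither the Calcite nor the
MAT API: HeapObject, sampleHeap, indexById and the field names are made
up for illustration only. fullScanJoin mimics the current plan (a join of
two full scans), and lookupJoin mimics the plan I want (scan the URLs,
fetch the inner object by id, then filter by class).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupJoinSketch {
    /** Hypothetical stand-in for a heap object; NOT the MAT IObject API. */
    static final class HeapObject {
        final int id;           // plays the role of "@ID"
        final String className; // class of the object
        final int pathId;       // id of the object referenced by "path", or -1
        final long retained;    // plays the role of "@RETAINED"

        HeapObject(int id, String className, int pathId, long retained) {
            this.id = id;
            this.className = className;
            this.pathId = pathId;
            this.retained = retained;
        }
    }

    /** Tiny made-up heap: two URLs, two Strings, one char[]. */
    static List<HeapObject> sampleHeap() {
        List<HeapObject> heap = new ArrayList<>();
        heap.add(new HeapObject(1, "java.net.URL", 10, 200));
        heap.add(new HeapObject(2, "java.net.URL", 20, 180));
        heap.add(new HeapObject(10, "java.lang.String", -1, 48));
        heap.add(new HeapObject(11, "java.lang.String", -1, 24));
        heap.add(new HeapObject(20, "char[]", -1, 16));
        return heap;
    }

    /** Primary-key index on id, standing in for "fetch object by id". */
    static Map<Integer, HeapObject> indexById(List<HeapObject> heap) {
        Map<Integer, HeapObject> byId = new HashMap<>();
        for (HeapObject o : heap) {
            byId.put(o.id, o);
        }
        return byId;
    }

    /** Current plan: join of two full scans, every URL against every String. */
    static List<long[]> fullScanJoin(List<HeapObject> heap) {
        List<long[]> rows = new ArrayList<>();
        for (HeapObject u : heap) {
            if (!u.className.equals("java.net.URL")) continue;
            for (HeapObject s : heap) {
                if (s.className.equals("java.lang.String") && s.id == u.pathId) {
                    rows.add(new long[] {u.id, s.retained});
                }
            }
        }
        return rows;
    }

    /** Wanted plan: scan URLs, look the inner row up by id, filter by class. */
    static List<long[]> lookupJoin(List<HeapObject> heap,
                                   Map<Integer, HeapObject> byId) {
        List<long[]> rows = new ArrayList<>();
        for (HeapObject u : heap) {
            if (!u.className.equals("java.net.URL")) continue;
            HeapObject s = byId.get(u.pathId); // single-row lookup, no inner scan
            if (s != null && s.className.equals("java.lang.String")) {
                rows.add(new long[] {u.id, s.retained});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<HeapObject> heap = sampleHeap();
        for (long[] row : lookupJoin(heap, indexById(heap))) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Both methods return the same rows; the only difference is that the
inner loop of fullScanJoin is replaced by one map lookup per URL, which
is the effect I am hoping the planner can produce.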

Here's what I did:
1) I found NestedLoopsJoinRule that seems to generate the required
kind of plan. I have no idea why the rule is disabled by default.
2) However, I can find no "EnumerableCorrelatorRel", so it looks like I
would get that "cannotplan" exception even if I create my own
CorrelatorRel("@ID"=get_id) rule.

3) Another idea of mine is to match JoinRel(MyRel, MyRel) and replace the
second argument with a TableFunction, so the final plan would be
Join(Scan("java.net.URL" u), TableFunction("getObject", get_id(u.path))).
However, using the table function machinery to retrieve a single row
looks like overkill.

This leaves me with the following questions:
1) What is the suggested way to implement this kind of optimization?
2) Why is there no such thing as EnumerableCorrelatorRel?

-- 
Regards,
Vladimir Sitnikov
