Re: Embedding Calcite, adjusting convertlets

Gian Merlino Wed, 14 Dec 2016 10:47:12 -0800

I spent some more time looking into (3) and found that when I had things
going through the Planner rather than the JDBC driver, SortRemoveRule was
removing sorts when it shouldn't have been. This happens even for simple
queries like "SELECT dim1 FROM s.foo GROUP BY dim1 ORDER BY dim1
DESC". Removing SortRemoveRule from the planner fixed the broken tests on
my end.


I dug into that a bit and saw that the call to "convert(sort.getInput(),
traits)" in SortRemoveRule was returning a RelSubset that looked a bit
funny in the debugger:

- The RelSubset's "traitSet" _does_ have the proper collation trait.
- But its "set" field points to a RelSet with "rels" that _don't_ have
_any_ collation traits.

>From what I understand that causes Calcite to treat the unsorted and sorted
rels as equivalent when they in fact aren't. I'm still not sure if this is
a Calcite bug or user error on my part… I'll keep looking into it unless
someone has any bright ideas.

fwiw, my Planner construction looks like this:

    final FrameworkConfig frameworkConfig = Frameworks
        .newConfigBuilder()
        .parserConfig(
            SqlParser.configBuilder()
                     .setCaseSensitive(true)
                     .setUnquotedCasing(Casing.UNCHANGED)
                     .build()
        )
        .defaultSchema(rootSchema)
        .traitDefs(ConventionTraitDef.INSTANCE,
RelCollationTraitDef.INSTANCE)
        .programs(Programs.ofRules(myRules))
        .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
        .context(Contexts.EMPTY_CONTEXT)
        .build();

    return Frameworks.getPlanner(frameworkConfig);

Gian

On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <g...@imply.io> wrote:

> Sure, I added those first two to the ticket.
>
> I don't think those are happening with (3) but I'll double check next time
> I take a look at using the Planner.
>
> Gian
>
> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> Can you please add (1) and (2) to https://issues.apache.org/jira
>> /browse/CALCITE-1525 <https://issues.apache.org/jira/browse/CALCITE-1525>,
>> which deals with the whole issue of using “Planner” within the JDBC driver,
>> so we can be consistent.
>>
>> (3) doesn’t look likely to be related. Do your queries have UNION or
>> other set-ops? Are you sorting on columns that do not appear in the final
>> result?
>>
>> Julian
>>
>>
>> > On Nov 28, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
>> >
>> > I traveled a bit down the Frameworks/Planner road and got most of my
>> tests
>> > passing, but ran into some problems getting them all to work:
>> >
>> > (1) "EXPLAIN PLAN FOR" throws NullPointerException during
>> Planner.validate.
>> > It looks like CalcitePrepareImpl has some special code to handle
>> validation
>> > of EXPLAIN, but PlannerImpl doesn't. I'm not sure if this is something I
>> > should be doing on my end, or if it's a bug in PlannerImpl.
>> > (2) I don't see a way to do ?-style prepared statements with bound
>> > variables, which _is_ possible with the JDBC driver route.
>> > (3) Not sure why this is happening, but for some reason ORDER BY / LIMIT
>> > clauses are getting ignored sometimes, even when they work with the JDBC
>> > driver route. This may be something messed up with my rules though and
>> may
>> > not be Calcite's fault.
>> >
>> > Julian, do any of these look like bugs that should be raised in jira, or
>> > are they just stuff I should be dealing with on my side?
>> >
>> > Btw, I do like that the Frameworks/Planner route gives me back the
>> RelNode
>> > itself, since that means I can make the Druid queries directly without
>> > needing to go through the extra layers of the JDBC driver. That part is
>> > nice.
>> >
>> > Gian
>> >
>> > On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <jh...@apache.org> wrote:
>> >
>> >> I don’t know how it’s used outside Calcite. Maybe some others can
>> chime in.
>> >>
>> >> Thanks for the PR. I logged https://issues.apache.org/jira
>> >> /browse/CALCITE-1509 <https://issues.apache.org/jir
>> a/browse/CALCITE-1509>
>> >> for it, and will commit shortly.
>> >>
>> >> Julian
>> >>
>> >>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>
>> >>> Do you know examples of projects that use Planner or PlannerImpl
>> >> currently
>> >>> (from "outside")? As far as I can tell, within Calcite itself it's
>> only
>> >>> used in test code. Maybe that'd be a better entry point.
>> >>>
>> >>> In the meantime I raised a PR here for allowing a convertlet table
>> >> override
>> >>> in a CalcitePrepareImpl: https://github.com/apache/calcite/pull/330.
>> >> That
>> >>> was enough to get the JDBC driver on my end to behave how I want it
>> to.
>> >>>
>> >>> Gian
>> >>>
>> >>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>> >>>
>> >>>> I was wrong earlier… FrameworkConfig already has a getConvertletTable
>> >>>> method. But regarding using FrameworkConfig from within the JDBC
>> driver,
>> >>>> It’s complicated. FrameworkConfig only works if you are “outside”
>> >> Calcite,
>> >>>> whereas CalcitePrepare is when you are customizing from the inside,
>> and
>> >>>> sadly CalcitePrepare does not use a FrameworkConfig.
>> >>>>
>> >>>> Compare and contrast:
>> >>>> * CalcitePrepareImpl.getSqlToRelConverter [
>> https://github.com/apache/
>> >>>> calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/
>> >> src/main/java/
>> >>>> org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 <
>> >>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b82
>> 8d22d7a75
>> >>>> 3e0a2899cc/core/src/main/java/org/apache/calcite/prepare/
>> >>>> CalcitePrepareImpl.java#L1114> ]
>> >>>> * PlannerImpl.rel [ https://github.com/apache/calcite/blob/
>> >>>> 105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/
>> >>>> org/apache/calcite/prepare/PlannerImpl.java#L225 <
>> >>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e12
>> 11d262e48
>> >>>> 86a4a863b7/core/src/main/java/org/apache/calcite/prepare/
>> >>>> PlannerImpl.java#L225> ]
>> >>>>
>> >>>> The latter uses a convertletTable sourced from a FrameworkConfig.
>> >>>>
>> >>>> The ideal thing would be to get CalcitePrepareImpl to use a
>> PlannerImpl
>> >> to
>> >>>> do its dirty work. Then “inside” and “outside” would work the same.
>> >> Would
>> >>>> definitely appreciate that as a patch.
>> >>>>
>> >>>> If you choose to go the JDBC driver route, you could override
>> >>>> Driver.createPrepareFactory to produce a sub-class of CalcitePrepare
>> >> that
>> >>>> works for your environment, one with an explicit convertletTable
>> rather
>> >>>> than just using the default.
>> >>>>
>> >>>> Julian
>> >>>>
>> >>>>
>> >>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>>>
>> >>>>> Hey Julian,
>> >>>>>
>> >>>>> If the convertlets were customizable with a FrameworkConfig, how
>> would
>> >> I
>> >>>>> use that configure the JDBC driver (given that I'm doing it with the
>> >> code
>> >>>>> upthread)? Or would that suggest using a different approach to
>> >> embedding
>> >>>>> Calcite?
>> >>>>>
>> >>>>> Gian
>> >>>>>
>> >>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <jh...@apache.org>
>> wrote:
>> >>>>>
>> >>>>>> Convertlets have a similar effect to planner rules (albeit they
>> act on
>> >>>>>> scalar expressions, not relational expressions) so people should be
>> >>>> able to
>> >>>>>> change the set of active convertlets.
>> >>>>>>
>> >>>>>> Would you like to propose a change that makes the convertlet table
>> >>>>>> pluggable? Maybe as part of FrameworkConfig? Regardless, please
>> log a
>> >>>> JIRA
>> >>>>>> to track this.
>> >>>>>>
>> >>>>>> And by the way, RexImpTable, which defines how operators are
>> >> implemented
>> >>>>>> by generating java code, should also be pluggable. It’s been on my
>> >> mind
>> >>>> for
>> >>>>>> a long time to allow the “engine” — related to the data format, and
>> >> how
>> >>>>>> code is generated to access fields and evaluate expressions and
>> >>>> operators —
>> >>>>>> to be pluggable.
>> >>>>>>
>> >>>>>> Regarding whether the JDBC driver is the right way to embed
>> Calcite.
>> >>>>>> There’s no easy answer. You might want to embed Calcite as a
>> library
>> >> in
>> >>>>>> your own server (as Drill and Hive do). Or you might want to make
>> >>>> yourself
>> >>>>>> just an adapter that runs inside a Calcite JDBC server (as the CSV
>> >>>> adapter
>> >>>>>> does). Or something in the middle, like what Phoenix does: using
>> >> Calcite
>> >>>>>> for JDBC, SQL, planning, but with your own metadata and runtime
>> >> engine.
>> >>>>>>
>> >>>>>> As long as you build the valuable stuff into planner rules, new
>> >>>> relational
>> >>>>>> operators (if necessary) and use the schema SPI, you should be
>> able to
>> >>>>>> change packaging in the future.
>> >>>>>>
>> >>>>>> Julian
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>>>>>
>> >>>>>>> Hey Calcites,
>> >>>>>>>
>> >>>>>>> I'm working on embedding Calcite into Druid (http://druid.io/,
>> >>>>>>> https://github.com/druid-io/druid/pull/3682) and am running into
>> a
>> >>>>>> problem
>> >>>>>>> that is making me wonder if the approach I'm using makes sense.
>> >>>>>>>
>> >>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite has a
>> >>>> standard
>> >>>>>>> convertlet rule "convertExtract" that changes this into some
>> >> arithmetic
>> >>>>>> on
>> >>>>>>> __time casted to an int type. But Druid has some builtin
>> functions to
>> >>>> do
>> >>>>>>> this, and I'd rather use those than arithmetic (for a bunch of
>> >>>> reasons).
>> >>>>>>> Ideally, in my RelOptRules that convert Calcite rels to Druid
>> >> queries,
>> >>>>>> I'd
>> >>>>>>> see the EXTRACT as a normal RexCall with the time flag and an
>> >>>> expression
>> >>>>>> to
>> >>>>>>> apply it to. That's a lot easier to translate than the arithmetic
>> >>>> stuff,
>> >>>>>>> which I'd have to pattern match and undo first before translating.
>> >>>>>>>
>> >>>>>>> So the problem I have is that I want to disable convertExtract,
>> but I
>> >>>>>> don't
>> >>>>>>> see a way to do that or to swap out the convertlet table.
>> >>>>>>>
>> >>>>>>> The code I'm using to set up a connection is:
>> >>>>>>>
>> >>>>>>> public CalciteConnection createCalciteConnection(
>> >>>>>>>   final DruidSchema druidSchema
>> >>>>>>> ) throws SQLException
>> >>>>>>> {
>> >>>>>>> final Properties props = new Properties();
>> >>>>>>> props.setProperty("caseSensitive", "true");
>> >>>>>>> props.setProperty("unquotedCasing", "UNCHANGED");
>> >>>>>>> final Connection connection =
>> >>>>>>> DriverManager.getConnection("jdbc:calcite:", props);
>> >>>>>>> final CalciteConnection calciteConnection =
>> >>>>>>> connection.unwrap(CalciteConnection.class);
>> >>>>>>> calciteConnection.getRootSchema().setCacheEnabled(false);
>> >>>>>>> calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME,
>> >>>>>> druidSchema);
>> >>>>>>> return calciteConnection;
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> This CalciteConnection is then used by the Druid HTTP server to
>> >> offer a
>> >>>>>> SQL
>> >>>>>>> API.
>> >>>>>>>
>> >>>>>>> Is there some way to swap out the convertlet table that I'm
>> missing?
>> >>>>>>>
>> >>>>>>> Also, just in general, am I going about this the right way? Is
>> using
>> >>>> the
>> >>>>>>> JDBC driver the right way to embed Calcite? Or should I be calling
>> >> into
>> >>>>>> it
>> >>>>>>> at some lower level?
>> >>>>>>>
>> >>>>>>> Thanks!
>> >>>>>>>
>> >>>>>>> Gian
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>

Re: Embedding Calcite, adjusting convertlets

Reply via email to