This was totally user error: I was ignoring the "fields" and "collation" of
the RelRoot. Passing "collation" to planner.transform stopped it from
removing the sort.
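For the archives, a rough sketch of what the fix looks like against the
Planner API (variable names and the Bindable convention are placeholders
here, not the exact Druid code):

```java
// Sketch of the fix described above (hypothetical surrounding code).
SqlNode parsed = planner.parse(sql);
SqlNode validated = planner.validate(parsed);
RelRoot root = planner.rel(validated);

// The mistake was requesting traits that lacked the root's collation, so
// SortRemoveRule saw the Sort as redundant. Including root.collation in
// the requested trait set preserves the ORDER BY.
RelTraitSet traits = root.rel.getTraitSet()
    .replace(BindableConvention.INSTANCE)  // placeholder target convention
    .replace(root.collation)
    .simplify();

RelNode result = planner.transform(0, traits, root.rel);
// root.fields must still be consulted to project/rename output columns.
```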

Gian

On Wed, Dec 14, 2016 at 3:21 PM, Gian Merlino <g...@imply.io> wrote:

> I don't think so, since my test query (SELECT dim1 FROM s.foo GROUP BY
> dim1 ORDER BY dim1 DESC) is sorting on a column that is also projected.
>
> Gian
>
> On Wed, Dec 14, 2016 at 11:58 AM, Julian Hyde <jh...@apache.org> wrote:
>
>> Are you running into some variant of the problems that inspired
>> https://issues.apache.org/jira/browse/CALCITE-819 (at the root of the
>> tree, columns that are not projected are removed, and if the desired
>> sort order involves non-projected columns, the desired sort order is
>> forgotten)?
>>
>> > On Dec 14, 2016, at 11:19 AM, Gian Merlino <g...@imply.io> wrote:
>> >
>> > Ah, thanks. So if that sort of thing is not a smoking gun, do you have
>> > an idea about where I should look next? If not I'll keep poking around.
>> >
>> > Gian
>> >
>> > On Wed, Dec 14, 2016 at 11:06 AM, Julian Hyde <jh...@apache.org> wrote:
>> >
>> >>> - But its "set" field points to a RelSet with "rels" that _don't_ have
>> >>> _any_ collation traits.
>> >>
>> >> That’s OK. A “subset” (RelSubset) is a collection of RelNodes that
>> >> are logically and physically equivalent (same results, same physical
>> >> properties), whereas a “set” (RelSet) is a collection of RelNodes
>> >> that are logically equivalent.
>> >>
>> >> A set can therefore be considered to be a collection of subsets, each
>> >> of which contains RelNodes. And it used to be implemented that way,
>> >> but in https://issues.apache.org/jira/browse/CALCITE-88 we introduced
>> >> collation as a trait, and that made subsets non-disjoint (a RelNode
>> >> can be sorted on (x, y), and also on (x), and also on (), and also on
>> >> (z)), so we made RelSubset just a view onto a RelSet, filtering the
>> >> list of RelNodes according to the ones that have (“subsume”) the
>> >> desired traits.
>> >>
>> >> Julian
>> >>
>> >>
>> >>> On Dec 14, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
>> >>>
>> >>> I spent some more time looking into (3) and found that when I had
>> >>> things going through the Planner rather than the JDBC driver,
>> >>> SortRemoveRule was removing sorts when it shouldn't have been. This
>> >>> happens even for simple queries like "SELECT dim1 FROM s.foo GROUP BY
>> >>> dim1 ORDER BY dim1 DESC". Removing SortRemoveRule from the planner
>> >>> fixed the broken tests on my end.
>> >>>
>> >>> I dug into that a bit and saw that the call to
>> >>> "convert(sort.getInput(), traits)" in SortRemoveRule was returning a
>> >>> RelSubset that looked a bit funny in the debugger:
>> >>>
>> >>> - The RelSubset's "traitSet" _does_ have the proper collation trait.
>> >>> - But its "set" field points to a RelSet with "rels" that _don't_ have
>> >>> _any_ collation traits.
>> >>>
>> >>> From what I understand that causes Calcite to treat the unsorted and
>> >>> sorted rels as equivalent when they in fact aren't. I'm still not
>> >>> sure if this is a Calcite bug or user error on my part… I'll keep
>> >>> looking into it unless someone has any bright ideas.
>> >>>
>> >>> fwiw, my Planner construction looks like this:
>> >>>
>> >>>   final FrameworkConfig frameworkConfig = Frameworks
>> >>>       .newConfigBuilder()
>> >>>       .parserConfig(
>> >>>           SqlParser.configBuilder()
>> >>>                    .setCaseSensitive(true)
>> >>>                    .setUnquotedCasing(Casing.UNCHANGED)
>> >>>                    .build()
>> >>>       )
>> >>>       .defaultSchema(rootSchema)
>> >>>       .traitDefs(ConventionTraitDef.INSTANCE, RelCollationTraitDef.INSTANCE)
>> >>>       .programs(Programs.ofRules(myRules))
>> >>>       .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
>> >>>       .context(Contexts.EMPTY_CONTEXT)
>> >>>       .build();
>> >>>
>> >>>   return Frameworks.getPlanner(frameworkConfig);
>> >>>
>> >>> Gian
>> >>>
>> >>> On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>
>> >>>> Sure, I added those first two to the ticket.
>> >>>>
>> >>>> I don't think those are happening with (3) but I'll double check
>> >>>> next time I take a look at using the Planner.
>> >>>>
>> >>>> Gian
>> >>>>
>> >>>>> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>>>
>> >>>>> Can you please add (1) and (2) to
>> >>>>> https://issues.apache.org/jira/browse/CALCITE-1525, which deals
>> >>>>> with the whole issue of using “Planner” within the JDBC driver, so
>> >>>>> we can be consistent.
>> >>>>>
>> >>>>> (3) doesn’t look likely to be related. Do your queries have UNION
>> >>>>> or other set-ops? Are you sorting on columns that do not appear in
>> >>>>> the final result?
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>>
>> >>>>>> On Nov 28, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
>> >>>>>>
>> >>>>>> I traveled a bit down the Frameworks/Planner road and got most of
>> >>>>>> my tests passing, but ran into some problems getting them all to
>> >>>>>> work:
>> >>>>>>
>> >>>>>> (1) "EXPLAIN PLAN FOR" throws NullPointerException during
>> >>>>>> Planner.validate. It looks like CalcitePrepareImpl has some
>> >>>>>> special code to handle validation of EXPLAIN, but PlannerImpl
>> >>>>>> doesn't. I'm not sure if this is something I should be doing on my
>> >>>>>> end, or if it's a bug in PlannerImpl.
>> >>>>>> (2) I don't see a way to do ?-style prepared statements with bound
>> >>>>>> variables, which _is_ possible with the JDBC driver route.
>> >>>>>> (3) Not sure why this is happening, but for some reason ORDER BY /
>> >>>>>> LIMIT clauses are getting ignored sometimes, even when they work
>> >>>>>> with the JDBC driver route. This may be something messed up with
>> >>>>>> my rules though and may not be Calcite's fault.
>> >>>>>>
>> >>>>>> Julian, do any of these look like bugs that should be raised in
>> >>>>>> jira, or are they just stuff I should be dealing with on my side?
>> >>>>>>
>> >>>>>> Btw, I do like that the Frameworks/Planner route gives me back
>> >>>>>> the RelNode itself, since that means I can make the Druid queries
>> >>>>>> directly without needing to go through the extra layers of the
>> >>>>>> JDBC driver. That part is nice.
>> >>>>>>
>> >>>>>> Gian
>> >>>>>>
>> >>>>>> On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>>>>>
>> >>>>>>> I don’t know how it’s used outside Calcite. Maybe some others can
>> >>>>>>> chime in.
>> >>>>>>>
>> >>>>>>> Thanks for the PR. I logged
>> >>>>>>> https://issues.apache.org/jira/browse/CALCITE-1509 for it, and
>> >>>>>>> will commit shortly.
>> >>>>>>>
>> >>>>>>> Julian
>> >>>>>>>
>> >>>>>>>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>>>>>>
>> >>>>>>>> Do you know examples of projects that use Planner or PlannerImpl
>> >>>>>>>> currently (from "outside")? As far as I can tell, within Calcite
>> >>>>>>>> itself it's only used in test code. Maybe that'd be a better
>> >>>>>>>> entry point.
>> >>>>>>>>
>> >>>>>>>> In the meantime I raised a PR here for allowing a convertlet
>> >>>>>>>> table override in a CalcitePrepareImpl:
>> >>>>>>>> https://github.com/apache/calcite/pull/330. That was enough to
>> >>>>>>>> get the JDBC driver on my end to behave how I want it to.
>> >>>>>>>>
>> >>>>>>>> Gian
>> >>>>>>>>
>> >>>>>>>>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>>> I was wrong earlier… FrameworkConfig already has a
>> >>>>>>>>> getConvertletTable method. But regarding using FrameworkConfig
>> >>>>>>>>> from within the JDBC driver, it’s complicated. FrameworkConfig
>> >>>>>>>>> only works if you are “outside” Calcite, whereas CalcitePrepare
>> >>>>>>>>> is for when you are customizing from the inside, and sadly
>> >>>>>>>>> CalcitePrepare does not use a FrameworkConfig.
>> >>>>>>>>>
>> >>>>>>>>> Compare and contrast:
>> >>>>>>>>> * CalcitePrepareImpl.getSqlToRelConverter [
>> >>>>>>>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114
>> >>>>>>>>> ]
>> >>>>>>>>> * PlannerImpl.rel [
>> >>>>>>>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L225
>> >>>>>>>>> ]
>> >>>>>>>>>
>> >>>>>>>>> The latter uses a convertletTable sourced from a FrameworkConfig.
>> >>>>>>>>>
>> >>>>>>>>> The ideal thing would be to get CalcitePrepareImpl to use a
>> >>>>>>>>> PlannerImpl to do its dirty work. Then “inside” and “outside”
>> >>>>>>>>> would work the same. Would definitely appreciate that as a
>> >>>>>>>>> patch.
>> >>>>>>>>>
>> >>>>>>>>> If you choose to go the JDBC driver route, you could override
>> >>>>>>>>> Driver.createPrepareFactory to produce a sub-class of
>> >>>>>>>>> CalcitePrepare that works for your environment, one with an
>> >>>>>>>>> explicit convertletTable rather than just using the default.
>> >>>>>>>>>
>> >>>>>>>>> Julian
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hey Julian,
>> >>>>>>>>>>
>> >>>>>>>>>> If the convertlets were customizable with a FrameworkConfig,
>> >>>>>>>>>> how would I use that to configure the JDBC driver (given that
>> >>>>>>>>>> I'm doing it with the code upthread)? Or would that suggest
>> >>>>>>>>>> using a different approach to embedding Calcite?
>> >>>>>>>>>>
>> >>>>>>>>>> Gian
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <jh...@apache.org> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Convertlets have a similar effect to planner rules (albeit
>> >>>>>>>>>>> they act on scalar expressions, not relational expressions),
>> >>>>>>>>>>> so people should be able to change the set of active
>> >>>>>>>>>>> convertlets.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Would you like to propose a change that makes the convertlet
>> >>>>>>>>>>> table pluggable? Maybe as part of FrameworkConfig?
>> >>>>>>>>>>> Regardless, please log a JIRA to track this.
>> >>>>>>>>>>>
>> >>>>>>>>>>> And by the way, RexImpTable, which defines how operators are
>> >>>>>>>>>>> implemented by generating java code, should also be
>> >>>>>>>>>>> pluggable. It’s been on my mind for a long time to allow the
>> >>>>>>>>>>> “engine” — related to the data format, and how code is
>> >>>>>>>>>>> generated to access fields and evaluate expressions and
>> >>>>>>>>>>> operators — to be pluggable.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regarding whether the JDBC driver is the right way to embed
>> >>>>>>>>>>> Calcite: there’s no easy answer. You might want to embed
>> >>>>>>>>>>> Calcite as a library in your own server (as Drill and Hive
>> >>>>>>>>>>> do). Or you might want to make yourself just an adapter that
>> >>>>>>>>>>> runs inside a Calcite JDBC server (as the CSV adapter does).
>> >>>>>>>>>>> Or something in the middle, like what Phoenix does: using
>> >>>>>>>>>>> Calcite for JDBC, SQL, planning, but with your own metadata
>> >>>>>>>>>>> and runtime engine.
>> >>>>>>>>>>>
>> >>>>>>>>>>> As long as you build the valuable stuff into planner rules,
>> >>>>>>>>>>> new relational operators (if necessary) and use the schema
>> >>>>>>>>>>> SPI, you should be able to change packaging in the future.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Julian
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <g...@imply.io> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Hey Calcites,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'm working on embedding Calcite into Druid
>> >>>>>>>>>>>> (http://druid.io/, https://github.com/druid-io/druid/pull/3682)
>> >>>>>>>>>>>> and am running into a problem that is making me wonder if
>> >>>>>>>>>>>> the approach I'm using makes sense.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite
>> >>>>>>>>>>>> has a standard convertlet rule "convertExtract" that changes
>> >>>>>>>>>>>> this into some arithmetic on __time casted to an int type.
>> >>>>>>>>>>>> But Druid has some builtin functions to do this, and I'd
>> >>>>>>>>>>>> rather use those than arithmetic (for a bunch of reasons).
>> >>>>>>>>>>>> Ideally, in my RelOptRules that convert Calcite rels to
>> >>>>>>>>>>>> Druid queries, I'd see the EXTRACT as a normal RexCall with
>> >>>>>>>>>>>> the time flag and an expression to apply it to. That's a lot
>> >>>>>>>>>>>> easier to translate than the arithmetic stuff, which I'd
>> >>>>>>>>>>>> have to pattern match and undo first before translating.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> So the problem I have is that I want to disable
>> >>>>>>>>>>>> convertExtract, but I don't see a way to do that or to swap
>> >>>>>>>>>>>> out the convertlet table.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The code I'm using to set up a connection is:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> public CalciteConnection createCalciteConnection(
>> >>>>>>>>>>>>     final DruidSchema druidSchema
>> >>>>>>>>>>>> ) throws SQLException
>> >>>>>>>>>>>> {
>> >>>>>>>>>>>>   final Properties props = new Properties();
>> >>>>>>>>>>>>   props.setProperty("caseSensitive", "true");
>> >>>>>>>>>>>>   props.setProperty("unquotedCasing", "UNCHANGED");
>> >>>>>>>>>>>>   final Connection connection =
>> >>>>>>>>>>>>       DriverManager.getConnection("jdbc:calcite:", props);
>> >>>>>>>>>>>>   final CalciteConnection calciteConnection =
>> >>>>>>>>>>>>       connection.unwrap(CalciteConnection.class);
>> >>>>>>>>>>>>   calciteConnection.getRootSchema().setCacheEnabled(false);
>> >>>>>>>>>>>>   calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME,
>> >>>>>>>>>>>>       druidSchema);
>> >>>>>>>>>>>>   return calciteConnection;
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This CalciteConnection is then used by the Druid HTTP
>> >>>>>>>>>>>> server to offer a SQL API.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Is there some way to swap out the convertlet table that I'm
>> >>>>>>>>>>>> missing?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Also, just in general, am I going about this the right way?
>> >>>>>>>>>>>> Is using the JDBC driver the right way to embed Calcite? Or
>> >>>>>>>>>>>> should I be calling into it at some lower level?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks!
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Gian
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>
>> >>
>>
>>
>
