Re: Embedding Calcite, adjusting convertlets

Gian Merlino Wed, 14 Dec 2016 11:19:59 -0800

Ah, thanks. So if that sort of thing is not a smoking gun, do you have an
idea about where I should look next? If not I'll keep poking around.


Gian

On Wed, Dec 14, 2016 at 11:06 AM, Julian Hyde <jh...@apache.org> wrote:

> > - But its "set" field points to a RelSet with "rels" that _don't_ have
> > _any_ collation traits.
>
> That’s OK. A “subset” (RelSubset) is a collection of RelNodes that are
> logically and physically equivalent (same results, same physical
> properties) whereas a “set” (RelSet) is a collection of RelNodes that are
> logically equivalent.
>
> A set can therefore be considered to be a collection of subsets, each of
> which contains RelNodes. And it used to be implemented that way, but in
> https://issues.apache.org/jira/browse/CALCITE-88 <
> https://issues.apache.org/jira/browse/CALCITE-88> we introduced collation
> as a trait, and that made subsets non-disjoint (a RelNode can be sorted on
> (x, y), and also on (x), and also on (), and also on (z)) so we made
> RelSubset just a view onto a RelSet, filtering the list of RelNodes
> according to the ones that have (“subsume”) the desired traits.
>
> Julian
>
>
> > On Dec 14, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
> >
> > I spent some more time looking into (3) and found that when I had things
> > going through the Planner rather than the JDBC driver, SortRemoveRule was
> > removing sorts when it shouldn't have been. This happens even for simple
> > queries like "SELECT dim1 FROM s.foo GROUP BY dim1 ORDER BY dim1
> > DESC". Removing SortRemoveRule from the planner fixed the broken tests on
> > my end.
> >
> > I dug into that a bit and saw that the call to "convert(sort.getInput(),
> > traits)" in SortRemoveRule was returning a RelSubset that looked a bit
> > funny in the debugger:
> >
> > - The RelSubset's "traitSet" _does_ have the proper collation trait.
> > - But its "set" field points to a RelSet with "rels" that _don't_ have
> > _any_ collation traits.
> >
> > From what I understand that causes Calcite to treat the unsorted and
> sorted
> > rels as equivalent when they in fact aren't. I'm still not sure if this
> is
> > a Calcite bug or user error on my part… I'll keep looking into it unless
> > someone has any bright ideas.
> >
> > fwiw, my Planner construction looks like this:
> >
> >    final FrameworkConfig frameworkConfig = Frameworks
> >        .newConfigBuilder()
> >        .parserConfig(
> >            SqlParser.configBuilder()
> >                     .setCaseSensitive(true)
> >                     .setUnquotedCasing(Casing.UNCHANGED)
> >                     .build()
> >        )
> >        .defaultSchema(rootSchema)
> >        .traitDefs(ConventionTraitDef.INSTANCE,
> > RelCollationTraitDef.INSTANCE)
> >        .programs(Programs.ofRules(myRules))
> >        .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
> >        .context(Contexts.EMPTY_CONTEXT)
> >        .build();
> >
> >    return Frameworks.getPlanner(frameworkConfig);
> >
> > Gian
> >
> > On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <g...@imply.io> wrote:
> >
> >> Sure, I added those first two to the ticket.
> >>
> >> I don't think those are happening with (3) but I'll double check next
> time
> >> I take a look at using the Planner.
> >>
> >> Gian
> >>
> >> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <jh...@apache.org> wrote:
> >>
> >>> Can you please add (1) and (2) to https://issues.apache.org/jira
> >>> /browse/CALCITE-1525 <https://issues.apache.org/jir
> a/browse/CALCITE-1525>,
> >>> which deals with the whole issue of using “Planner” within the JDBC
> driver,
> >>> so we can be consistent.
> >>>
> >>> (3) doesn’t look likely to be related. Do your queries have UNION or
> >>> other set-ops? Are you sorting on columns that do not appear in the
> final
> >>> result?
> >>>
> >>> Julian
> >>>
> >>>
> >>>> On Nov 28, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
> >>>>
> >>>> I traveled a bit down the Frameworks/Planner road and got most of my
> >>> tests
> >>>> passing, but ran into some problems getting them all to work:
> >>>>
> >>>> (1) "EXPLAIN PLAN FOR" throws NullPointerException during
> >>> Planner.validate.
> >>>> It looks like CalcitePrepareImpl has some special code to handle
> >>> validation
> >>>> of EXPLAIN, but PlannerImpl doesn't. I'm not sure if this is
> something I
> >>>> should be doing on my end, or if it's a bug in PlannerImpl.
> >>>> (2) I don't see a way to do ?-style prepared statements with bound
> >>>> variables, which _is_ possible with the JDBC driver route.
> >>>> (3) Not sure why this is happening, but for some reason ORDER BY /
> LIMIT
> >>>> clauses are getting ignored sometimes, even when they work with the
> JDBC
> >>>> driver route. This may be something messed up with my rules though and
> >>> may
> >>>> not be Calcite's fault.
> >>>>
> >>>> Julian, do any of these look like bugs that should be raised in jira,
> or
> >>>> are they just stuff I should be dealing with on my side?
> >>>>
> >>>> Btw, I do like that the Frameworks/Planner route gives me back the
> >>> RelNode
> >>>> itself, since that means I can make the Druid queries directly without
> >>>> needing to go through the extra layers of the JDBC driver. That part
> is
> >>>> nice.
> >>>>
> >>>> Gian
> >>>>
> >>>> On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <jh...@apache.org>
> wrote:
> >>>>
> >>>>> I don’t know how it’s used outside Calcite. Maybe some others can
> >>> chime in.
> >>>>>
> >>>>> Thanks for the PR. I logged https://issues.apache.org/jira
> >>>>> /browse/CALCITE-1509 <https://issues.apache.org/jir
> >>> a/browse/CALCITE-1509>
> >>>>> for it, and will commit shortly.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <g...@imply.io> wrote:
> >>>>>>
> >>>>>> Do you know examples of projects that use Planner or PlannerImpl
> >>>>> currently
> >>>>>> (from "outside")? As far as I can tell, within Calcite itself it's
> >>> only
> >>>>>> used in test code. Maybe that'd be a better entry point.
> >>>>>>
> >>>>>> In the meantime I raised a PR here for allowing a convertlet table
> >>>>> override
> >>>>>> in a CalcitePrepareImpl: https://github.com/apache/calcite/pull/330
> .
> >>>>> That
> >>>>>> was enough to get the JDBC driver on my end to behave how I want it
> >>> to.
> >>>>>>
> >>>>>> Gian
> >>>>>>
> >>>>>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <jh...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>>> I was wrong earlier… FrameworkConfig already has a
> getConvertletTable
> >>>>>>> method. But regarding using FrameworkConfig from within the JDBC
> >>> driver,
> >>>>>>> It’s complicated. FrameworkConfig only works if you are “outside”
> >>>>> Calcite,
> >>>>>>> whereas CalcitePrepare is when you are customizing from the inside,
> >>> and
> >>>>>>> sadly CalcitePrepare does not use a FrameworkConfig.
> >>>>>>>
> >>>>>>> Compare and contrast:
> >>>>>>> * CalcitePrepareImpl.getSqlToRelConverter [
> >>> https://github.com/apache/
> >>>>>>> calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/
> >>>>> src/main/java/
> >>>>>>> org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 <
> >>>>>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b82
> >>> 8d22d7a75
> >>>>>>> 3e0a2899cc/core/src/main/java/org/apache/calcite/prepare/
> >>>>>>> CalcitePrepareImpl.java#L1114> ]
> >>>>>>> * PlannerImpl.rel [ https://github.com/apache/calcite/blob/
> >>>>>>> 105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/
> >>>>>>> org/apache/calcite/prepare/PlannerImpl.java#L225 <
> >>>>>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e12
> >>> 11d262e48
> >>>>>>> 86a4a863b7/core/src/main/java/org/apache/calcite/prepare/
> >>>>>>> PlannerImpl.java#L225> ]
> >>>>>>>
> >>>>>>> The latter uses a convertletTable sourced from a FrameworkConfig.
> >>>>>>>
> >>>>>>> The ideal thing would be to get CalcitePrepareImpl to use a
> >>> PlannerImpl
> >>>>> to
> >>>>>>> do its dirty work. Then “inside” and “outside” would work the same.
> >>>>> Would
> >>>>>>> definitely appreciate that as a patch.
> >>>>>>>
> >>>>>>> If you choose to go the JDBC driver route, you could override
> >>>>>>> Driver.createPrepareFactory to produce a sub-class of
> CalcitePrepare
> >>>>> that
> >>>>>>> works for your environment, one with an explicit convertletTable
> >>> rather
> >>>>>>> than just using the default.
> >>>>>>>
> >>>>>>> Julian
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <g...@imply.io> wrote:
> >>>>>>>>
> >>>>>>>> Hey Julian,
> >>>>>>>>
> >>>>>>>> If the convertlets were customizable with a FrameworkConfig, how
> >>> would
> >>>>> I
> >>>>>>>> use that configure the JDBC driver (given that I'm doing it with
> the
> >>>>> code
> >>>>>>>> upthread)? Or would that suggest using a different approach to
> >>>>> embedding
> >>>>>>>> Calcite?
> >>>>>>>>
> >>>>>>>> Gian
> >>>>>>>>
> >>>>>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <jh...@apache.org>
> >>> wrote:
> >>>>>>>>
> >>>>>>>>> Convertlets have a similar effect to planner rules (albeit they
> >>> act on
> >>>>>>>>> scalar expressions, not relational expressions) so people should
> be
> >>>>>>> able to
> >>>>>>>>> change the set of active convertlets.
> >>>>>>>>>
> >>>>>>>>> Would you like to propose a change that makes the convertlet
> table
> >>>>>>>>> pluggable? Maybe as part of FrameworkConfig? Regardless, please
> >>> log a
> >>>>>>> JIRA
> >>>>>>>>> to track this.
> >>>>>>>>>
> >>>>>>>>> And by the way, RexImpTable, which defines how operators are
> >>>>> implemented
> >>>>>>>>> by generating java code, should also be pluggable. It’s been on
> my
> >>>>> mind
> >>>>>>> for
> >>>>>>>>> a long time to allow the “engine” — related to the data format,
> and
> >>>>> how
> >>>>>>>>> code is generated to access fields and evaluate expressions and
> >>>>>>> operators —
> >>>>>>>>> to be pluggable.
> >>>>>>>>>
> >>>>>>>>> Regarding whether the JDBC driver is the right way to embed
> >>> Calcite.
> >>>>>>>>> There’s no easy answer. You might want to embed Calcite as a
> >>> library
> >>>>> in
> >>>>>>>>> your own server (as Drill and Hive do). Or you might want to make
> >>>>>>> yourself
> >>>>>>>>> just an adapter that runs inside a Calcite JDBC server (as the
> CSV
> >>>>>>> adapter
> >>>>>>>>> does). Or something in the middle, like what Phoenix does: using
> >>>>> Calcite
> >>>>>>>>> for JDBC, SQL, planning, but with your own metadata and runtime
> >>>>> engine.
> >>>>>>>>>
> >>>>>>>>> As long as you build the valuable stuff into planner rules, new
> >>>>>>> relational
> >>>>>>>>> operators (if necessary) and use the schema SPI, you should be
> >>> able to
> >>>>>>>>> change packaging in the future.
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <g...@imply.io>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey Calcites,
> >>>>>>>>>>
> >>>>>>>>>> I'm working on embedding Calcite into Druid (http://druid.io/,
> >>>>>>>>>> https://github.com/druid-io/druid/pull/3682) and am running
> into
> >>> a
> >>>>>>>>> problem
> >>>>>>>>>> that is making me wonder if the approach I'm using makes sense.
> >>>>>>>>>>
> >>>>>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite has a
> >>>>>>> standard
> >>>>>>>>>> convertlet rule "convertExtract" that changes this into some
> >>>>> arithmetic
> >>>>>>>>> on
> >>>>>>>>>> __time casted to an int type. But Druid has some builtin
> >>> functions to
> >>>>>>> do
> >>>>>>>>>> this, and I'd rather use those than arithmetic (for a bunch of
> >>>>>>> reasons).
> >>>>>>>>>> Ideally, in my RelOptRules that convert Calcite rels to Druid
> >>>>> queries,
> >>>>>>>>> I'd
> >>>>>>>>>> see the EXTRACT as a normal RexCall with the time flag and an
> >>>>>>> expression
> >>>>>>>>> to
> >>>>>>>>>> apply it to. That's a lot easier to translate than the
> arithmetic
> >>>>>>> stuff,
> >>>>>>>>>> which I'd have to pattern match and undo first before
> translating.
> >>>>>>>>>>
> >>>>>>>>>> So the problem I have is that I want to disable convertExtract,
> >>> but I
> >>>>>>>>> don't
> >>>>>>>>>> see a way to do that or to swap out the convertlet table.
> >>>>>>>>>>
> >>>>>>>>>> The code I'm using to set up a connection is:
> >>>>>>>>>>
> >>>>>>>>>> public CalciteConnection createCalciteConnection(
> >>>>>>>>>>  final DruidSchema druidSchema
> >>>>>>>>>> ) throws SQLException
> >>>>>>>>>> {
> >>>>>>>>>> final Properties props = new Properties();
> >>>>>>>>>> props.setProperty("caseSensitive", "true");
> >>>>>>>>>> props.setProperty("unquotedCasing", "UNCHANGED");
> >>>>>>>>>> final Connection connection =
> >>>>>>>>>> DriverManager.getConnection("jdbc:calcite:", props);
> >>>>>>>>>> final CalciteConnection calciteConnection =
> >>>>>>>>>> connection.unwrap(CalciteConnection.class);
> >>>>>>>>>> calciteConnection.getRootSchema().setCacheEnabled(false);
> >>>>>>>>>> calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME,
> >>>>>>>>> druidSchema);
> >>>>>>>>>> return calciteConnection;
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> This CalciteConnection is then used by the Druid HTTP server to
> >>>>> offer a
> >>>>>>>>> SQL
> >>>>>>>>>> API.
> >>>>>>>>>>
> >>>>>>>>>> Is there some way to swap out the convertlet table that I'm
> >>> missing?
> >>>>>>>>>>
> >>>>>>>>>> Also, just in general, am I going about this the right way? Is
> >>> using
> >>>>>>> the
> >>>>>>>>>> JDBC driver the right way to embed Calcite? Or should I be
> calling
> >>>>> into
> >>>>>>>>> it
> >>>>>>>>>> at some lower level?
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> Gian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>
>
>

Re: Embedding Calcite, adjusting convertlets

Reply via email to