> - But its "set" field points to a RelSet with "rels" that _don't_ have
> _any_ collation traits.

That’s OK. A “subset” (RelSubset) is a collection of RelNodes that are 
logically and physically equivalent (same results, same physical properties) 
whereas a “set” (RelSet) is a collection of RelNodes that are logically 
equivalent.

A set can therefore be considered to be a collection of subsets, each of which 
contains RelNodes. And it used to be implemented that way, but in 
https://issues.apache.org/jira/browse/CALCITE-88 we introduced collation as a 
trait, and that made subsets non-disjoint (a RelNode can be sorted on (x, y), 
and also on (x), and also on (), and also on (z)), so we made RelSubset just a 
view onto a RelSet, filtering the list of RelNodes down to the ones that 
have (“subsume”) the desired traits.
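
In rough pseudo-code (an illustration only, not the actual implementation; 
RelSet’s "rels" list is internal to the volcano planner, and the helper name 
below is made up):

    // A RelSubset behaves like a filtered view over its RelSet: every rel in
    // the set is logically equivalent, and the subset keeps only those whose
    // trait set satisfies the subset's desired traits.
    // (imports: java.util.*, org.apache.calcite.plan.*, org.apache.calcite.rel.RelNode)
    static List<RelNode> relsVisibleInSubset(List<RelNode> relsOfSet,
        RelTraitSet desiredTraits) {
      final List<RelNode> result = new ArrayList<>();
      for (RelNode rel : relsOfSet) {
        if (rel.getTraitSet().satisfies(desiredTraits)) {
          result.add(rel);
        }
      }
      return result;
    }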

Julian


> On Dec 14, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
> 
> I spent some more time looking into (3) and found that when I had things
> going through the Planner rather than the JDBC driver, SortRemoveRule was
> removing sorts when it shouldn't have been. This happens even for simple
> queries like "SELECT dim1 FROM s.foo GROUP BY dim1 ORDER BY dim1
> DESC". Removing SortRemoveRule from the planner fixed the broken tests on
> my end.
> 
> I dug into that a bit and saw that the call to "convert(sort.getInput(),
> traits)" in SortRemoveRule was returning a RelSubset that looked a bit
> funny in the debugger:
> 
> - The RelSubset's "traitSet" _does_ have the proper collation trait.
> - But its "set" field points to a RelSet with "rels" that _don't_ have
> _any_ collation traits.
> 
> From what I understand, that causes Calcite to treat the unsorted and sorted
> rels as equivalent when they in fact aren't. I'm still not sure if this is
> a Calcite bug or user error on my part… I'll keep looking into it unless
> someone has any bright ideas.
> 
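For reference, that convert call is the crux of SortRemoveRule; condensed, its 
onMatch does roughly the following (a paraphrase from memory, not the exact 
source):

    // Paraphrase of SortRemoveRule.onMatch(RelOptRuleCall call);
    // offset/fetch handling omitted.
    final Sort sort = call.rel(0);
    // Express the sort as a collation trait on the input and drop the Sort;
    // the planner is then expected to enforce that collation on whatever rel
    // ends up being chosen from the resulting subset.
    final RelTraitSet traits = sort.getInput().getTraitSet()
        .replace(sort.getCollation());
    call.transformTo(convert(sort.getInput(), traits));

So removing the Sort is only legitimate if the chosen plan in that subset 
really delivers the collation, which is why the subset/set contents you saw in 
the debugger matter.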
> fwiw, my Planner construction looks like this:
> 
>    final FrameworkConfig frameworkConfig = Frameworks
>        .newConfigBuilder()
>        .parserConfig(
>            SqlParser.configBuilder()
>                     .setCaseSensitive(true)
>                     .setUnquotedCasing(Casing.UNCHANGED)
>                     .build()
>        )
>        .defaultSchema(rootSchema)
>        .traitDefs(ConventionTraitDef.INSTANCE, RelCollationTraitDef.INSTANCE)
>        .programs(Programs.ofRules(myRules))
>        .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
>        .context(Contexts.EMPTY_CONTEXT)
>        .build();
> 
>    return Frameworks.getPlanner(frameworkConfig);
> 
> Gian
> 
> On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <g...@imply.io> wrote:
> 
>> Sure, I added those first two to the ticket.
>> 
>> I don't think those are happening with (3) but I'll double check next time
>> I take a look at using the Planner.
>> 
>> Gian
>> 
>> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>>> Can you please add (1) and (2) to
>>> https://issues.apache.org/jira/browse/CALCITE-1525,
>>> which deals with the whole issue of using “Planner” within the JDBC driver,
>>> so we can be consistent.
>>> 
>>> (3) doesn’t look likely to be related. Do your queries have UNION or
>>> other set-ops? Are you sorting on columns that do not appear in the final
>>> result?
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 28, 2016, at 10:45 AM, Gian Merlino <g...@imply.io> wrote:
>>>> 
>>>> I traveled a bit down the Frameworks/Planner road and got most of my
>>> tests
>>>> passing, but ran into some problems getting them all to work:
>>>> 
>>>> (1) "EXPLAIN PLAN FOR" throws NullPointerException during
>>> Planner.validate.
>>>> It looks like CalcitePrepareImpl has some special code to handle
>>> validation
>>>> of EXPLAIN, but PlannerImpl doesn't. I'm not sure if this is something I
>>>> should be doing on my end, or if it's a bug in PlannerImpl.
>>>> (2) I don't see a way to do ?-style prepared statements with bound
>>>> variables, which _is_ possible with the JDBC driver route.
>>>> (3) Not sure why this is happening, but for some reason ORDER BY / LIMIT
>>>> clauses are getting ignored sometimes, even when they work with the JDBC
>>>> driver route. This may be something messed up with my rules though and
>>> may
>>>> not be Calcite's fault.
>>>> 
>>>> Julian, do any of these look like bugs that should be raised in jira, or
>>>> are they just stuff I should be dealing with on my side?
>>>> 
>>>> Btw, I do like that the Frameworks/Planner route gives me back the
>>> RelNode
>>>> itself, since that means I can make the Druid queries directly without
>>>> needing to go through the extra layers of the JDBC driver. That part is
>>>> nice.
>>>> 
>>>> Gian
>>>> 
>>>> On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I don’t know how it’s used outside Calcite. Maybe some others can
>>> chime in.
>>>>> 
>>>>> Thanks for the PR. I logged
>>>>> https://issues.apache.org/jira/browse/CALCITE-1509
>>>>> for it, and will commit shortly.
>>>>> 
>>>>> Julian
>>>>> 
>>>>>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <g...@imply.io> wrote:
>>>>>> 
>>>>>> Do you know examples of projects that use Planner or PlannerImpl
>>>>> currently
>>>>>> (from "outside")? As far as I can tell, within Calcite itself it's
>>> only
>>>>>> used in test code. Maybe that'd be a better entry point.
>>>>>> 
>>>>>> In the meantime I raised a PR here for allowing a convertlet table
>>>>> override
>>>>>> in a CalcitePrepareImpl: https://github.com/apache/calcite/pull/330.
>>>>> That
>>>>>> was enough to get the JDBC driver on my end to behave how I want it
>>> to.
>>>>>> 
>>>>>> Gian
>>>>>> 
>>>>>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>> 
>>>>>>> I was wrong earlier… FrameworkConfig already has a getConvertletTable
>>>>>>> method. But regarding using FrameworkConfig from within the JDBC
>>> driver,
>>>>>>> It’s complicated. FrameworkConfig only works if you are “outside”
>>>>> Calcite,
>>>>>>> whereas CalcitePrepare is when you are customizing from the inside,
>>> and
>>>>>>> sadly CalcitePrepare does not use a FrameworkConfig.
>>>>>>> 
>>>>>>> Compare and contrast:
>>>>>>> * CalcitePrepareImpl.getSqlToRelConverter [
>>>>>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114
>>>>>>> ]
>>>>>>> * PlannerImpl.rel [
>>>>>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L225
>>>>>>> ]
>>>>>>> 
>>>>>>> The latter uses a convertletTable sourced from a FrameworkConfig.
>>>>>>> 
>>>>>>> The ideal thing would be to get CalcitePrepareImpl to use a
>>> PlannerImpl
>>>>> to
>>>>>>> do its dirty work. Then “inside” and “outside” would work the same.
>>>>> Would
>>>>>>> definitely appreciate that as a patch.
>>>>>>> 
>>>>>>> If you choose to go the JDBC driver route, you could override
>>>>>>> Driver.createPrepareFactory to produce a sub-class of CalcitePrepare
>>>>>>> that works for your environment, one with an explicit convertletTable
>>>>>>> rather than just using the default.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <g...@imply.io> wrote:
>>>>>>>> 
>>>>>>>> Hey Julian,
>>>>>>>> 
>>>>>>>> If the convertlets were customizable with a FrameworkConfig, how would
>>>>>>>> I use that to configure the JDBC driver (given that I'm doing it with
>>>>>>>> the code upthread)? Or would that suggest using a different approach
>>>>>>>> to embedding Calcite?
>>>>>>>> 
>>>>>>>> Gian
>>>>>>>> 
>>>>>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <jh...@apache.org>
>>> wrote:
>>>>>>>> 
>>>>>>>>> Convertlets have a similar effect to planner rules (albeit they
>>> act on
>>>>>>>>> scalar expressions, not relational expressions) so people should be
>>>>>>> able to
>>>>>>>>> change the set of active convertlets.
>>>>>>>>> 
>>>>>>>>> Would you like to propose a change that makes the convertlet table
>>>>>>>>> pluggable? Maybe as part of FrameworkConfig? Regardless, please
>>> log a
>>>>>>> JIRA
>>>>>>>>> to track this.
>>>>>>>>> 
>>>>>>>>> And by the way, RexImpTable, which defines how operators are
>>>>> implemented
>>>>>>>>> by generating java code, should also be pluggable. It’s been on my
>>>>> mind
>>>>>>> for
>>>>>>>>> a long time to allow the “engine” — related to the data format, and
>>>>> how
>>>>>>>>> code is generated to access fields and evaluate expressions and
>>>>>>> operators —
>>>>>>>>> to be pluggable.
>>>>>>>>> 
>>>>>>>>> Regarding whether the JDBC driver is the right way to embed
>>> Calcite.
>>>>>>>>> There’s no easy answer. You might want to embed Calcite as a
>>> library
>>>>> in
>>>>>>>>> your own server (as Drill and Hive do). Or you might want to make
>>>>>>> yourself
>>>>>>>>> just an adapter that runs inside a Calcite JDBC server (as the CSV
>>>>>>> adapter
>>>>>>>>> does). Or something in the middle, like what Phoenix does: using
>>>>> Calcite
>>>>>>>>> for JDBC, SQL, planning, but with your own metadata and runtime
>>>>> engine.
>>>>>>>>> 
>>>>>>>>> As long as you build the valuable stuff into planner rules, new
>>>>>>> relational
>>>>>>>>> operators (if necessary) and use the schema SPI, you should be
>>> able to
>>>>>>>>> change packaging in the future.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <g...@imply.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hey Calcites,
>>>>>>>>>> 
>>>>>>>>>> I'm working on embedding Calcite into Druid (http://druid.io/,
>>>>>>>>>> https://github.com/druid-io/druid/pull/3682) and am running into
>>> a
>>>>>>>>> problem
>>>>>>>>>> that is making me wonder if the approach I'm using makes sense.
>>>>>>>>>> 
>>>>>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite has a
>>>>>>>>>> standard convertlet rule "convertExtract" that changes this into
>>>>>>>>>> some arithmetic on __time cast to an int type. But Druid has some
>>>>>>>>>> builtin functions to do this, and I'd rather use those than
>>>>>>>>>> arithmetic (for a bunch of reasons). Ideally, in my RelOptRules
>>>>>>>>>> that convert Calcite rels to Druid queries, I'd see the EXTRACT as
>>>>>>>>>> a normal RexCall with the time flag and an expression to apply it
>>>>>>>>>> to. That's a lot easier to translate than the arithmetic stuff,
>>>>>>>>>> which I'd have to pattern match and undo first before translating.
>>>>>>>>>> 
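As an aside, once convertExtract is out of the picture, recognizing the raw 
EXTRACT call is straightforward. A minimal sketch, assuming the usual rex 
imports (RexCall, RexLiteral, SqlStdOperatorTable, TimeUnitRange); the 
Druid-side helper name is made up:

    // Inside a RexNode translator, roughly:
    if (rexNode instanceof RexCall
        && ((RexCall) rexNode).getOperator() == SqlStdOperatorTable.EXTRACT) {
      final RexCall call = (RexCall) rexNode;
      // operand 0 is the time-unit flag, operand 1 is the expression
      final TimeUnitRange timeUnit =
          (TimeUnitRange) ((RexLiteral) call.getOperands().get(0)).getValue();
      final RexNode arg = call.getOperands().get(1);
      return toDruidTimeExtraction(timeUnit, arg); // hypothetical Druid helper
    }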
>>>>>>>>>> So the problem I have is that I want to disable convertExtract,
>>> but I
>>>>>>>>> don't
>>>>>>>>>> see a way to do that or to swap out the convertlet table.
>>>>>>>>>> 
>>>>>>>>>> The code I'm using to set up a connection is:
>>>>>>>>>> 
>>>>>>>>>> public CalciteConnection createCalciteConnection(
>>>>>>>>>>     final DruidSchema druidSchema
>>>>>>>>>> ) throws SQLException
>>>>>>>>>> {
>>>>>>>>>>   final Properties props = new Properties();
>>>>>>>>>>   props.setProperty("caseSensitive", "true");
>>>>>>>>>>   props.setProperty("unquotedCasing", "UNCHANGED");
>>>>>>>>>>   final Connection connection =
>>>>>>>>>>       DriverManager.getConnection("jdbc:calcite:", props);
>>>>>>>>>>   final CalciteConnection calciteConnection =
>>>>>>>>>>       connection.unwrap(CalciteConnection.class);
>>>>>>>>>>   calciteConnection.getRootSchema().setCacheEnabled(false);
>>>>>>>>>>   calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME, druidSchema);
>>>>>>>>>>   return calciteConnection;
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> This CalciteConnection is then used by the Druid HTTP server to
>>>>> offer a
>>>>>>>>> SQL
>>>>>>>>>> API.
>>>>>>>>>> 
>>>>>>>>>> Is there some way to swap out the convertlet table that I'm
>>> missing?
>>>>>>>>>> 
>>>>>>>>>> Also, just in general, am I going about this the right way? Is
>>> using
>>>>>>> the
>>>>>>>>>> JDBC driver the right way to embed Calcite? Or should I be calling
>>>>> into
>>>>>>>>> it
>>>>>>>>>> at some lower level?
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> 
>>>>>>>>>> Gian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
