Re: Embedding Calcite, adjusting convertlets

2016-11-23 Thread Julian Hyde
I don’t know how it’s used outside Calcite. Maybe some others can chime in.

Thanks for the PR. I logged https://issues.apache.org/jira/browse/CALCITE-1509 
 for it, and will commit 
shortly.

Julian

> On Nov 23, 2016, at 12:32 PM, Gian Merlino  wrote:
> 
> Do you know examples of projects that use Planner or PlannerImpl currently
> (from "outside")? As far as I can tell, within Calcite itself it's only
> used in test code. Maybe that'd be a better entry point.
> 
> In the meantime I raised a PR here for allowing a convertlet table override
> in a CalcitePrepareImpl: https://github.com/apache/calcite/pull/330. That
> was enough to get the JDBC driver on my end to behave how I want it to.
> 
> Gian
> 
> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde  wrote:
> 
>> I was wrong earlier… FrameworkConfig already has a getConvertletTable
>> method. But regarding using FrameworkConfig from within the JDBC driver,
>> it’s complicated. FrameworkConfig only works if you are “outside” Calcite,
>> whereas CalcitePrepare is when you are customizing from the inside, and
>> sadly CalcitePrepare does not use a FrameworkConfig.
>> 
>> Compare and contrast:
>> * CalcitePrepareImpl.getSqlToRelConverter [ https://github.com/apache/calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 ]
>> * PlannerImpl.rel [ https://github.com/apache/calcite/blob/105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L225 ]
>> 
>> The latter uses a convertletTable sourced from a FrameworkConfig.
>> 
>> The ideal thing would be to get CalcitePrepareImpl to use a PlannerImpl to
>> do its dirty work. Then “inside” and “outside” would work the same. Would
>> definitely appreciate that as a patch.
>> 
>> If you choose to go the JDBC driver route, you could override
>> Driver.createPrepareFactory to produce a sub-class of CalcitePrepare that
>> works for your environment, one with an explicit convertletTable rather
>> than just using the default.
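A minimal self-contained Java sketch of the overridable-factory pattern described above. The Calcite types are stubbed with simplified, hypothetical stand-ins (the real `Driver.createPrepareFactory` is protected and returns a `Function0<CalcitePrepare>`, and the real convertlet-table interface is `SqlRexConvertletTable`), so this shows the shape of the approach, not Calcite's actual signatures:

```java
import java.util.function.Supplier;

public class PrepareFactoryDemo {
  // Simplified stand-in for Calcite's convertlet table; the real
  // SqlRexConvertletTable interface is much richer.
  public interface ConvertletTable { String name(); }

  // Stand-in for a CalcitePrepare built around a particular table.
  public static class Prepare {
    public final ConvertletTable convertletTable;
    Prepare(ConvertletTable t) { this.convertletTable = t; }
  }

  // The driver exposes a factory method that subclasses may override,
  // mirroring the shape of Driver.createPrepareFactory.
  public static class BaseDriver {
    public Supplier<Prepare> createPrepareFactory() {
      return () -> new Prepare(() -> "standard");
    }
  }

  // A custom driver swaps in a Prepare with an explicit table,
  // rather than just using the default.
  public static class CustomDriver extends BaseDriver {
    @Override public Supplier<Prepare> createPrepareFactory() {
      ConvertletTable custom = () -> "custom";  // your own convertlets here
      return () -> new Prepare(custom);
    }
  }
}
```

The point of the pattern is that everything downstream of the factory is untouched: the custom driver only changes which prepare (and hence which convertlet table) gets constructed.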
>> 
>> Julian
>> 
>> 
>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino  wrote:
>>> 
>>> Hey Julian,
>>> 
>>> If the convertlets were customizable with a FrameworkConfig, how would I
>>> use that to configure the JDBC driver (given that I'm doing it with the code
>>> upthread)? Or would that suggest using a different approach to embedding
>>> Calcite?
>>> 
>>> Gian
>>> 
>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde  wrote:
>>> 
 Convertlets have a similar effect to planner rules (albeit they act on
 scalar expressions, not relational expressions) so people should be
>> able to
 change the set of active convertlets.
 
 Would you like to propose a change that makes the convertlet table
 pluggable? Maybe as part of FrameworkConfig? Regardless, please log a
>> JIRA
 to track this.
 
 And by the way, RexImpTable, which defines how operators are implemented
 by generating java code, should also be pluggable. It’s been on my mind
>> for
 a long time to allow the “engine” — related to the data format, and how
 code is generated to access fields and evaluate expressions and
>> operators —
 to be pluggable.
 
 Regarding whether the JDBC driver is the right way to embed Calcite.
 There’s no easy answer. You might want to embed Calcite as a library in
 your own server (as Drill and Hive do). Or you might want to make
>> yourself
 just an adapter that runs inside a Calcite JDBC server (as the CSV
>> adapter
 does). Or something in the middle, like what Phoenix does: using Calcite
 for JDBC, SQL, planning, but with your own metadata and runtime engine.
 
 As long as you build the valuable stuff into planner rules, new
>> relational
 operators (if necessary) and use the schema SPI, you should be able to
 change packaging in the future.
 
 Julian
 
 
 
 
> On Nov 17, 2016, at 1:59 PM, Gian Merlino  wrote:
> 
> Hey Calcites,
> 
> I'm working on embedding Calcite into Druid (http://druid.io/,
> https://github.com/druid-io/druid/pull/3682) and am running into a
 problem
> that is making me wonder if the approach I'm using makes sense.
> 
> Consider the expression EXTRACT(YEAR FROM __time). Calcite has a
>> standard
> convertlet rule "convertExtract" that changes this into some arithmetic
 on
> __time casted to an int type. But Druid has some builtin functions to
>> 

[jira] [Created] (CALCITE-1509) Allow overriding the convertlet table in CalcitePrepareImpl

2016-11-23 Thread Julian Hyde (JIRA)
Julian Hyde created CALCITE-1509:


 Summary: Allow overriding the convertlet table in 
CalcitePrepareImpl
 Key: CALCITE-1509
 URL: https://issues.apache.org/jira/browse/CALCITE-1509
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Julian Hyde






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


FOSDEM 2017 HPC, Bigdata and Data Science DevRoom CFP is closing soon

2016-11-23 Thread Roman Shaposhnik
Hi!

apologies for the extra wide distribution (this exhausts my once
a year ASF mail-to-all-bigdata-projects quota ;-)) but I wanted
to suggest that all of you should consider submitting talks
to FOSDEM 2017 HPC, Bigdata and Data Science DevRoom:
https://hpc-bigdata-fosdem17.github.io/

It was a great success this year and we hope to make it an even
bigger success in 2017.

Besides -- FOSDEM is the biggest gathering of open source
developers on the face of the earth -- don't miss it!

Thanks,
Roman.

P.S. If you have any questions -- please email me directly and
see you all in Brussels!


[jira] [Created] (CALCITE-1508) SortJoinTransposeRule can remove the top Sort node if it is a trivial ORDER BY and the non-preserved side of the outer join is count-preserving

2016-11-23 Thread Maryann Xue (JIRA)
Maryann Xue created CALCITE-1508:


 Summary: SortJoinTransposeRule can remove the top Sort node if it 
is a trivial ORDER BY and the non-preserved side of the outer join is 
count-preserving
 Key: CALCITE-1508
 URL: https://issues.apache.org/jira/browse/CALCITE-1508
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.10.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Minor


If the non-preserved side of the outer join is count-preserving, then for each 
row from the preserved side there can be only zero or one matches from the 
non-preserved side, which means the join produces exactly one output row per 
preserved-side row. So it is safe to push a LIMIT and/or an OFFSET through, and 
meanwhile remove the original Sort node if it is a trivial ORDER BY. For example,
{code}
select d.deptno, empno
from sales.dept d
right join sales.emp e using (deptno)
limit 10 offset 2
{code}
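One way to convince yourself this is safe, outside Calcite: when each preserved-side row matches at most one row on the count-preserving side, the join emits exactly one output row per preserved-side row, so a LIMIT/OFFSET applied to the preserved input commutes with the join. A self-contained Java sketch with made-up rows (here `emp` is the preserved side of the RIGHT JOIN and `dept`, with a unique deptno, is the count-preserving side):

```java
import java.util.ArrayList;
import java.util.List;

public class CountPreservingDemo {
  // dept has a unique deptno, so it is count-preserving for a join on deptno.
  static final int[] DEPT = {10, 20, 30};
  // emp rows as {empno, deptno}; emp is the preserved side of the RIGHT JOIN.
  static final int[][] EMP = {{101, 10}, {102, 10}, {105, 30}, {106, 40}};

  static boolean deptExists(int deptno) {
    for (int d : DEPT) {
      if (d == deptno) return true;
    }
    return false;
  }

  // The RIGHT JOIN emits exactly one row {deptno, empno} per emp row;
  // a missing dept match is null-extended (encoded here as -1).
  static List<int[]> join(List<int[]> empRows) {
    List<int[]> out = new ArrayList<>();
    for (int[] e : empRows) {
      out.add(new int[] {deptExists(e[1]) ? e[1] : -1, e[0]});
    }
    return out;
  }

  static List<int[]> empRows() {
    List<int[]> rows = new ArrayList<>();
    for (int[] e : EMP) rows.add(e);
    return rows;
  }

  // LIMIT 2 OFFSET 1 applied after the join ...
  public static List<int[]> limitAfterJoin() {
    return join(empRows()).subList(1, 3);
  }

  // ... versus the same LIMIT/OFFSET pushed below the join, onto emp.
  public static List<int[]> limitBeforeJoin() {
    return join(empRows().subList(1, 3));
  }
}
```

Both methods return the same two rows, {10, 102} and {30, 105}, which is exactly the equivalence the rule relies on.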





[jira] [Created] (CALCITE-1507) OFFSET cannot be pushed through a JOIN if the non-preserved side of outer join is not count-preserving

2016-11-23 Thread Maryann Xue (JIRA)
Maryann Xue created CALCITE-1507:


 Summary: OFFSET cannot be pushed through a JOIN if the 
non-preserved side of outer join is not count-preserving
 Key: CALCITE-1507
 URL: https://issues.apache.org/jira/browse/CALCITE-1507
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.10.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Minor


If the non-preserved side of the outer join is not count-preserving, then for 
each row from the preserved side there can be zero, one, or multiple matches 
from the non-preserved side, which means the join can produce one or more 
output rows per preserved-side row. So it is safe to push a LIMIT through, but 
it is invalid to push an OFFSET through.
Take this query as an example:
{code}
select d.deptno, empno
from sales.dept d
left join sales.emp e using (deptno)
order by d.deptno offset 1
{code}
And rows from "dept" and "emp" tables are like:
{code}
"dept"
  deptno
  10
  20
  30

"emp"
  empno  deptno
  101  10
  102  10
  105  30
{code}
The expected output is:
{code}
d.deptno  e.empno
10  102
20  null
30  105
{code}
While after applying SortJoinTransposeRule, the rel becomes:
{code}
LogicalProject(DEPTNO=[$0], EMPNO=[$2])
  LogicalSort(sort0=[$0], dir0=[ASC], offset=[1])
LogicalJoin(condition=[=($0, $9)], joinType=[left])
  LogicalSort(sort0=[$0], dir0=[ASC], offset=[1])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{code}
And the output will now be:
{code}
d.deptno  e.empno
20  null
30  105
{code}
because deptno "10" has been skipped from the left relation by the 
pushed-through Sort node.
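The failure mode is easy to reproduce outside Calcite with plain collections. The following self-contained Java sketch hardcodes the dept/emp rows from the example and compares OFFSET 1 applied after the left join with OFFSET 1 pushed onto the join's left (preserved) input; the pushed-through version silently drops deptno 10:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetPushdownDemo {
  // Rows from the example: dept deptnos and emp {empno, deptno} pairs.
  static final int[] DEPT = {10, 20, 30};
  static final int[][] EMP = {{101, 10}, {102, 10}, {105, 30}};

  // LEFT JOIN dept to emp on deptno; a null empno is encoded as -1.
  static List<int[]> leftJoin(List<Integer> depts) {
    List<int[]> out = new ArrayList<>();
    for (int d : depts) {
      boolean matched = false;
      for (int[] e : EMP) {
        if (e[1] == d) {
          out.add(new int[] {d, e[0]});
          matched = true;
        }
      }
      if (!matched) {
        out.add(new int[] {d, -1});  // preserved row still emitted once
      }
    }
    return out;
  }

  static List<Integer> deptRows() {
    List<Integer> rows = new ArrayList<>();
    for (int d : DEPT) rows.add(d);
    return rows;
  }

  // Correct plan: join first, then skip one row.
  public static List<int[]> offsetAfterJoin() {
    List<int[]> joined = leftJoin(deptRows());
    return joined.subList(1, joined.size());
  }

  // Transformed plan: skip one dept row, then join.
  public static List<int[]> offsetBeforeJoin() {
    List<Integer> depts = deptRows();
    return leftJoin(depts.subList(1, depts.size()));
  }
}
```

`offsetAfterJoin()` returns the three expected rows starting with {10, 102}; `offsetBeforeJoin()` returns only two rows, starting with {20, null}, matching the wrong output above.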





Re: Embedding Calcite, adjusting convertlets

2016-11-23 Thread Gian Merlino
Do you know examples of projects that use Planner or PlannerImpl currently
(from "outside")? As far as I can tell, within Calcite itself it's only
used in test code. Maybe that'd be a better entry point.

In the meantime I raised a PR here for allowing a convertlet table override
in a CalcitePrepareImpl: https://github.com/apache/calcite/pull/330. That
was enough to get the JDBC driver on my end to behave how I want it to.

Gian

On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde  wrote:

> I was wrong earlier… FrameworkConfig already has a getConvertletTable
> method. But regarding using FrameworkConfig from within the JDBC driver,
> it’s complicated. FrameworkConfig only works if you are “outside” Calcite,
> whereas CalcitePrepare is when you are customizing from the inside, and
> sadly CalcitePrepare does not use a FrameworkConfig.
>
> Compare and contrast:
>  * CalcitePrepareImpl.getSqlToRelConverter [ https://github.com/apache/calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 ]
>  * PlannerImpl.rel [ https://github.com/apache/calcite/blob/105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L225 ]
>
> The latter uses a convertletTable sourced from a FrameworkConfig.
>
> The ideal thing would be to get CalcitePrepareImpl to use a PlannerImpl to
> do its dirty work. Then “inside” and “outside” would work the same. Would
> definitely appreciate that as a patch.
>
> If you choose to go the JDBC driver route, you could override
> Driver.createPrepareFactory to produce a sub-class of CalcitePrepare that
> works for your environment, one with an explicit convertletTable rather
> than just using the default.
>
> Julian
>
>
> > On Nov 17, 2016, at 5:01 PM, Gian Merlino  wrote:
> >
> > Hey Julian,
> >
> > If the convertlets were customizable with a FrameworkConfig, how would I
> > use that to configure the JDBC driver (given that I'm doing it with the code
> > upthread)? Or would that suggest using a different approach to embedding
> > Calcite?
> >
> > Gian
> >
> > On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde  wrote:
> >
> >> Convertlets have a similar effect to planner rules (albeit they act on
> >> scalar expressions, not relational expressions) so people should be
> able to
> >> change the set of active convertlets.
> >>
> >> Would you like to propose a change that makes the convertlet table
> >> pluggable? Maybe as part of FrameworkConfig? Regardless, please log a
> JIRA
> >> to track this.
> >>
> >> And by the way, RexImpTable, which defines how operators are implemented
> >> by generating java code, should also be pluggable. It’s been on my mind
> for
> >> a long time to allow the “engine” — related to the data format, and how
> >> code is generated to access fields and evaluate expressions and
> operators —
> >> to be pluggable.
> >>
> >> Regarding whether the JDBC driver is the right way to embed Calcite.
> >> There’s no easy answer. You might want to embed Calcite as a library in
> >> your own server (as Drill and Hive do). Or you might want to make
> yourself
> >> just an adapter that runs inside a Calcite JDBC server (as the CSV
> adapter
> >> does). Or something in the middle, like what Phoenix does: using Calcite
> >> for JDBC, SQL, planning, but with your own metadata and runtime engine.
> >>
> >> As long as you build the valuable stuff into planner rules, new
> relational
> >> operators (if necessary) and use the schema SPI, you should be able to
> >> change packaging in the future.
> >>
> >> Julian
> >>
> >>
> >>
> >>
> >>> On Nov 17, 2016, at 1:59 PM, Gian Merlino  wrote:
> >>>
> >>> Hey Calcites,
> >>>
> >>> I'm working on embedding Calcite into Druid (http://druid.io/,
> >>> https://github.com/druid-io/druid/pull/3682) and am running into a
> >> problem
> >>> that is making me wonder if the approach I'm using makes sense.
> >>>
> >>> Consider the expression EXTRACT(YEAR FROM __time). Calcite has a
> standard
> >>> convertlet rule "convertExtract" that changes this into some arithmetic
> >> on
> >>> __time casted to an int type. But Druid has some builtin functions to
> do
> >>> this, and I'd rather use those than arithmetic (for a bunch of
> reasons).
> >>> Ideally, in my RelOptRules that convert Calcite rels to Druid queries,
> >> I'd
> >>> see the EXTRACT as a normal RexCall with the time flag and an
> expression
> >> to
> >>> apply it to. That's a lot easier to translate than the arithmetic
> stuff,
> >>> which I'd have to pattern match and undo first before translating.
> 

Re: Find Monotonic Column in GROUP BY

2016-11-23 Thread Julian Hyde
Use collations, which are a kind of metadata:

  RelNode r;
  RelMetadataQuery mq = RelMetadataQuery.instance();
  List<RelCollation> collations = mq.collations(r);

This example creates a RelMetadataQuery instance, but a RelMetadataQuery 
instance is expensive to create, and contains data structures that cache 
intermediate results and prevent cycles.  So, if you already have a 
RelMetadataQuery instance (e.g. if you are implementing a metadata method) then 
use it rather than creating a new one.

There are lots of other kinds of metadata, including lots of statistics. The 
methods on RelMetadataQuery[1] give you an idea of the built-in metadata, and 
you can also add your own metadata types.

Two things make “collations” of streams more complex:

1. It is the validator that determines whether a SQL query is valid. It works 
on the SqlNode tree, using information available from the catalog, before the 
first RelNode is created. The implication is that the monotonicity information 
available to the validator is different (though hopefully not too different).

2. At present, we validate based on “is sorted”. In future, to deal with the 
variety of streaming systems, and even hybrid problems like continuous ETL, we 
will want to validate based on “could be sorted”. For example, if your orderId 
is allocated from parallel sequence generators that are never more than 5 
minutes apart, then someone could say “group by floor(orderId / 1000)” if they 
are prepared for their query to have a 5 minute latency.

These areas both need some work over the next months.

Julian

[1] 
https://calcite.apache.org/apidocs/org/apache/calcite/rel/metadata/RelMetadataQuery.html
 


> On Nov 23, 2016, at 4:04 AM, Chinmay Kolhatkar  wrote:
> 
> Dear Community,
> 
> I'm trying to add support for GROUP BY clause in Apache Apex-Calcite
> integration.
> 
> I am assuming that Calcite knows which column is monotonic, because the
> query fails to parse if there is no monotonic column present in the group set.
> 
> Is there any way to find out which is the monotonic column in the GROUP BY
> clause from Aggregate/LogicalAggregate object?
> 
> Thanks,
> Chinmay.



[jira] [Created] (CALCITE-1506) Push OVER Clause to underlying SQL via JDBC adapter

2016-11-23 Thread Christian Tzolov (JIRA)
Christian Tzolov created CALCITE-1506:

 Summary: Push OVER Clause to underlying SQL via JDBC adapter
 Key: CALCITE-1506
 URL: https://issues.apache.org/jira/browse/CALCITE-1506
 Project: Calcite
  Issue Type: Bug
  Components: jdbc-adapter
Affects Versions: 1.10.0
Reporter: Christian Tzolov
Assignee: Julian Hyde


The JDBC adapter should push down the OVER clause for all dialects that 
support window functions.

At the moment the Rel-to-SQL conversion ignores the 'OVER(...)': the RexOver 
expression is treated as a plain RexCall, and the RexOver#window attribute is 
not converted into SQL.

For example, the following SQL query (using the Postgres dialect):
{code:sql}
SELECT "id", "device_id", "transaction_value", "account_id", "ts_millis", 
MAX("ts_millis") OVER(partition by "device_id") as "last_version_number" 
FROM "HAWQ"."transaction"
WHERE "device_id" = 1445
{code}
is pushed down to the JDBC source like this:
{code:sql}
SELECT "id", "device_id", "transaction_value", "account_id", "ts_millis", 
MAX("ts_millis") AS "last_version_number"
FROM "transaction"
WHERE "device_id" = 1445
{code}
The OVER clause is completely dropped! Here is the plan:
{code}
JdbcToEnumerableConverter
  JdbcProject(id=[$0], device_id=[$1], transaction_value=[$2], account_id=[$3], 
ts_millis=[$4], last_version_number=[MAX($4) OVER (PARTITION BY $1 RANGE 
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)])
JdbcFilter(condition=[=($1, 1445)])
  JdbcTableScan(table=[[HAWQ, transaction]])
{code}






Find Monotonic Column in GROUP BY

2016-11-23 Thread Chinmay Kolhatkar
Dear Community,

I'm trying to add support for GROUP BY clause in Apache Apex-Calcite
integration.

I am assuming that Calcite knows which column is monotonic, because the
query fails to parse if there is no monotonic column present in the group set.

Is there any way to find out which is the monotonic column in the GROUP BY
clause from Aggregate/LogicalAggregate object?

Thanks,
Chinmay.