[jira] [Created] (CALCITE-2630) Convert SqlInOperator to a IN-Function

2018-10-17 Thread pengzhiwei (JIRA)
pengzhiwei created CALCITE-2630:
---

         Summary: Convert SqlInOperator to a IN-Function
             Key: CALCITE-2630
             URL: https://issues.apache.org/jira/browse/CALCITE-2630
         Project: Calcite
      Issue Type: Improvement
      Components: core
Affects Versions: 1.17.0
        Reporter: pengzhiwei
        Assignee: Julian Hyde


Currently Calcite translates an "IN" expression to an "OR" expression when the
number of IN operands is less than "inSubQueryThreshold", and to a "Join" when
the number is greater than "inSubQueryThreshold", to get better performance.

However, this translation to "JOIN" is very complex, especially when the "IN"
expression is located in the "select" list or in a "join on" condition. For example:

 
{code:java}
select case when deptno in (1,2) then 0 else 1 end from emp
{code}
The generated logical plan is as follows:

{code:java}
LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
  LogicalJoin(condition=[=($11, $12)], joinType=[left])
    LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
          LogicalProject(ROW_VALUE=[$0], $f1=[true])
            LogicalValues(tuples=[[{ 1 }, { 2 }]])
    LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
      LogicalProject(ROW_VALUE=[$0], $f1=[true])
        LogicalValues(tuples=[[{ 1 }, { 2 }]])

{code}
The generated logical plan is far too complex for such a simple SQL statement!

 

I think we can treat "IN" as a function, like "plus" and "minus". No translation
effort would be spent on "IN"; we would just keep it as it is. This would be much
clearer in the logical plan!

In the compute stage, we can provide an "InExpression":

 
{code:java}
InExpression(left,condition0,condition1,...){code}
 

We can put all the constant conditions into a "Set". In that way, the
computational complexity of each evaluation can be reduced from O(n) to O(1).

It would be much clearer and would perform well. We have implemented it in
our streaming-sql system.
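
A minimal sketch of the set-based evaluation described above, assuming the IN list holds only constants whose types already match the left operand. The class name follows the proposed "InExpression", but the `eval` method, the `hasNull` flag and the null handling are assumptions made for illustration. This is neither Calcite code nor the reporter's implementation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the proposed InExpression; not part of Calcite. */
final class InExpression {
  // Non-null constants from the IN list, hashed once at construction time.
  private final Set<Object> constants = new HashSet<>();
  // Remembers whether the IN list contained a NULL literal.
  private boolean hasNull;

  InExpression(Object... conditions) {
    for (Object c : conditions) {
      if (c == null) {
        hasNull = true;
      } else {
        constants.add(c);
      }
    }
  }

  /** Returns TRUE, FALSE, or null (standing in for SQL UNKNOWN). */
  Boolean eval(Object left) {
    if (left == null) {
      return null; // NULL IN (...) is UNKNOWN
    }
    if (constants.contains(left)) {
      return Boolean.TRUE; // single hash lookup instead of n comparisons
    }
    // No match: UNKNOWN if the list had a NULL, otherwise FALSE.
    return hasNull ? null : Boolean.FALSE;
  }
}
```

With this sketch, `new InExpression(1, 2).eval(deptno)` costs one hash lookup per row on average, regardless of the list length, whereas an OR chain compares against each constant in turn.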

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Exception-handling in built-in functions

2018-10-17 Thread Jesus Camacho Rodriguez
I do not believe there is enough reason to block CALCITE-525. IMO, CALCITE-525 
describes a problem that some Calcite users are facing and a reasonable 
pluggable solution. We should not be vetoing such a feature without providing 
viable alternatives. (Without having checked the specific implementation 
details, I prefer approach B described below as it is less intrusive. And A 
should be fixed in a different issue.)
I agree with Julian's idea that Calcite is not an RDBMS such as Oracle or 
Postgres, and it has always tried to provide flexibility to underlying engines, 
which is one of the reasons for its wide adoption. In addition, systems are not 
forced to use this feature: it is tagged as experimental, and by default we are 
still running in the same mode. I believe that is sufficient.
Personally, I will not be happy if a developer feels compelled to fork Calcite 
or stop contributing code because we do not accept features such as the one 
described there.

Thanks,
Jesús



Re: Exception-handling in built-in functions

2018-10-17 Thread Michael Mior
My apologies for missing this thread a couple days ago. (Thanks for pinging
it.) Here's my two cents: taking care of contributors to the project is
just as important (if not more important) than taking care of the code. I'm
not saying we should merge terrible code just to keep each other happy, but
I don't think that's the case here. If anyone writes some code which you
disagree with, you should be free to voice your disagreement. However,
especially when the code is from a core contributor and the argument
focuses on potential future problems, I think it's important to consider
that people who have shown dedication to the project over the years are
very likely to be around and willing to fix these problems as they arise.

Code which turns out to cause problems can always be deleted, reverted,
refactored, etc. It's much harder to back out when a contributor is burned
out or interpersonal conflicts get heated.

--
Michael Mior
mm...@apache.org




[jira] [Created] (CALCITE-2629) Unnecessary call to CatalogReader#getAllSchemaObjects in CatalogScope

2018-10-17 Thread Laurent Goujon (JIRA)
Laurent Goujon created CALCITE-2629:
---

         Summary: Unnecessary call to CatalogReader#getAllSchemaObjects in CatalogScope
             Key: CALCITE-2629
             URL: https://issues.apache.org/jira/browse/CALCITE-2629
         Project: Calcite
      Issue Type: Bug
      Components: core
        Reporter: Laurent Goujon
        Assignee: Laurent Goujon


The CatalogScope constructor calls CatalogReader#getAllSchemaObjects and 
stores the result in a private field that has no accessor. This call is 
actually expensive in our system and is causing a performance issue during 
planning.





Calcite-Master - Build # 943 - Still Failing

2018-10-17 Thread Apache Jenkins Server
The Apache Jenkins build system has built Calcite-Master (build #943)

Status: Still Failing

Check console output at https://builds.apache.org/job/Calcite-Master/943/ to 
view the results.

Calcite-Master - Build # 942 - Still Failing

2018-10-17 Thread Apache Jenkins Server
The Apache Jenkins build system has built Calcite-Master (build #942)

Status: Still Failing

Check console output at https://builds.apache.org/job/Calcite-Master/942/ to 
view the results.

Re: Exception-handling in built-in functions

2018-10-17 Thread Julian Hyde
Vladimir,

You’ve made your points. And I hear them.

However I get the impression that you are not open to persuasion. Which means 
that I am wasting my time trying to reach consensus with you. Which means that 
people win arguments not on merit, but based upon who is most persistent.

Here is my point. Calcite's goal is not to re-create what Oracle or PostgreSQL 
did ten years later. It is a platform that allows people to write their own 
data engine. If they want to redefine the “+” operator such that 2 + 2 returns 
5, the platform should allow it.

Certainly if they want to engineer their own error-handling strategy, we should 
let them do it. I didn’t have the energy to find an example of a SQL engine 
that discards rows with divide-by-zero errors, but I believe there is one. I 
suspect that Broadbase, SQLstream and Hive, three SQL engines that I have 
worked on that performed ETL-like tasks, all had that capability. And all ETL 
tools have very flexible error-handling strategies. They are not SQL-based, but 
Calcite is not exclusively for SQL systems.

I have been designing and building world-class data engines for 30 years. 
Please take me on good faith that a flexible error-handling strategy is a good 
idea. Don’t force me to bicker over email for hours and hours. When a long 
discussion leads to the rejection of a contribution, I get considerably closer 
to burning out.

Julian





Re: Exception-handling in built-in functions

2018-10-17 Thread Vladimir Sitnikov
Julian>Hey, folks. We need your input here.

Here are my thoughts:
1) I think the features we add should have at least some level of
consistency
2) It is much safer to adopt well-known features rather than be pioneers in
the field. I do not mean we must wait for someone else to implement and try
out a feature; however, I would not rush to implement a feature that
no-one else explored.

CALCITE-525 has two key points:
A) The current implementation of enumerable factors expressions like 0/0 out
into a static field of the generated code. This causes the generated code to
fail at load time, even before the query is executed.
Of course that is a bug, and I'm even inclined to remove those "static
fields".
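
To illustrate point A, here is a minimal sketch of the underlying Java behaviour, assuming a constant expression has been hoisted into a static field. The class and member names (`StaticInitDemo`, `Generated`, `HOISTED`) are hypothetical and do not reproduce Calcite's actual generated code.

```java
/** Hypothetical illustration of a failure during class initialization. */
final class StaticInitDemo {
  /** Stands in for a generated class with a hoisted constant expression. */
  static final class Generated {
    // Evaluated once, when the class is initialized, not once per row.
    static final int HOISTED = divide(0, 0);

    static int divide(int a, int b) {
      return a / b; // throws ArithmeticException when b == 0
    }

    static int currentRow(int x) {
      return x + HOISTED;
    }
  }

  /** Triggers initialization of Generated and reports what happened. */
  static String tryLoad() {
    try {
      Generated.currentRow(1);
      return "ok";
    } catch (ExceptionInInitializerError e) {
      // The division failed before any row-level code could run.
      return e.getCause().getClass().getSimpleName();
    }
  }
}
```

In this sketch the first call to `StaticInitDemo.tryLoad()` reports the `ArithmeticException` raised by the static initializer, so the failure surfaces before any per-row logic runs.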

B) Someone (Hongze? Julian?) suggests implementing a mode to silently ignore
the error (e.g. by ignoring the row or by returning a default value).
First of all, I don't think "ignore the row" kind of processing would do
any good to the user since it would not be possible to predict the output.
"ignore the row" is very tricky when join/semijoin/antijoin is there.

I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think we
should not rush for it.
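
The behaviours discussed in point B can be sketched as follows. This is only an illustration of the three outcomes under debate (fail, return a default value, drop the row). The `OnError` enum and `divideAll` method are hypothetical names, not the API proposed in CALCITE-525.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of error-handling modes for a failing expression. */
final class ErrorPolicyDemo {
  enum OnError { THROW, DEFAULT_VALUE, SKIP_ROW }

  /** Computes 100 / d for each divisor under the given policy. */
  static List<Integer> divideAll(int[] divisors, OnError policy) {
    List<Integer> out = new ArrayList<>();
    for (int d : divisors) {
      try {
        out.add(100 / d);
      } catch (ArithmeticException e) {
        switch (policy) {
          case THROW:
            throw e; // strict mode: the whole query fails
          case DEFAULT_VALUE:
            out.add(null); // the row is kept with a default (here: null)
            break;
          case SKIP_ROW:
            break; // the row silently disappears from the output
        }
      }
    }
    return out;
  }
}
```

For the divisors 5, 0 and 10, `DEFAULT_VALUE` keeps three rows while `SKIP_ROW` returns only two, which is the change in row count that the concern about joins refers to.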

C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
0)  EMPTY ON ERROR  kind of functions.
That makes it possible to confine the scope of the error; however, I don't think it
would be used often (does that mean one would have to wrap each
expression?), and this "catch error" would be non-trivial to propagate to
the downstream executors.
On top of that, we might end up inventing full-blown try-catch-catch-catch
syntax.

I truly see no business value in implementing B/C, however I do see the
pain it would introduce. It would complicate Calcite maintenance. "B" could
silently produce wrong results, and I'm sure we don't want to get results out
of thin air.

Vladimir


Re: Exception-handling in built-in functions

2018-10-17 Thread Julian Hyde
Hey, folks. We need your input here. This is a crisis.

I do not want us to become a project where people rule by veto, override 
consensus, and (heaven forbid) back out each other’s commits. But this case is 
the third instance where this has happened (2438 and 2458 are the other 
instances, all within the past few weeks) and it is becoming a pattern. I am 
finding it less and less pleasant to interact with this project.

Julian



> On Oct 15, 2018, at 11:10 AM, Julian Hyde wrote:
> 
> There’s a discussion on https://issues.apache.org/jira/browse/CALCITE-525, 
> "Exception-handling in built-in functions", that seems to be heading for 
> stalemate. There is a PR that I am inclined to accept (with some 
> modifications, probably), and based on his comments so far, Vladimir seems 
> to be set against it.
> 
> We had our first commit veto from Vladimir a couple of months ago, and I 
> don’t want to have another.
> 
> How can we work through this and get to consensus?
> 
> Julian
>