[jira] [Created] (CALCITE-4594) Interpreter returns wrong result when Values has zero fields

2021-05-03 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-4594:


 Summary: Interpreter returns wrong result when Values has zero 
fields
 Key: CALCITE-4594
 URL: https://issues.apache.org/jira/browse/CALCITE-4594
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Interpreter returns wrong result when Values has zero fields. It returns zero 
rows rather than N rows of zero columns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: RelFieldTrimmer throws an exception in certain cases

2021-05-03 Thread Julian Hyde
Wow, I just ran into another. ValuesNode.createRows (in the interpreter) [1]. 
What a coincidence!

Julian

[1] 
https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/interpreter/ValuesNode.java#L56
 

 

> On May 3, 2021, at 2:55 PM, Julian Hyde  wrote:
> 
> I am inclined to be brave and idealistic: officially allow records with no 
> fields, and add enough tests that we know that the existing rules can handle 
> them.
> 
> Of course there is a non-zero risk that we will break some things. But these 
> things are already broken, because we zero-field RelNodes can still crop up.
> 
> One place I know that will need to be fixed is 
> 
>ImmutableList RelBuilder.tupleList(int columnCount, Object[] values)
> 
> because is divides by columnCount.
> 
> Julian
> 
> 
>> On May 3, 2021, at 8:55 AM, Konstantin Orlov  wrote:
>> 
>> Hello,
>> 
>> Another case where it will look a bit cleaner with an empty record is an 
>> insert with values. 
>> Currently a query like INSERT INTO PROJECT(projectId, name) VALUES (?, ?) 
>> have a plan like follow:
>> 
>> TableModify(table=[[PUBLIC, PROJECT]], operation=[INSERT], flattened=[false])
>> Project(PROJECTID=[?1], NAME=[?2])
>>   Values(tuples=[[{ 0 }]])
>> 
>> 
>>> @Konstantin how did you come with the empty aggregate? Was it also a result
>>> of trimming?
>> 
>> The empty aggregate is created for query like this: SELECT 42 FROM my_table 
>> HAVING 82 > 42.
>> Although this query is quite synthetic, it’s still grammatically correct, 
>> and we need to be able to handle it.
>> 
>> Personally I tend to think that allow the empty records is a better option. 
>> For now there is a code that supposed
>> to overcome introduced limitation, but causes another issues.
>> 
>> But since the benefits are not clear and the effort required seems to be 
>> bigger, the more rational option is to ban them.
>> 
>> -- 
>> Regards,
>> Konstantin Orlov
>> 
>> 
>> 
>> 
>>> On 2 May 2021, at 01:30, Julian Hyde  wrote:
>>> 
>>> Here's a valid SQL query where an empty projection naturally arises:
>>> 
>>> SELECT d.dname
>>> FROM dept AS d
>>>  JOIN emp AS e ON e.deptno = d.deptno
>>> WHERE d.deptno = 10
>>> 
>>> You want the name of department 10 to be printed N times, where N is
>>> the number of employees in the department, but you don't need any
>>> attributes from the employee table. So it can be optimized to the
>>> following pseudo-SQL:
>>> 
>>> SELECT d.dname
>>> FROM (
>>>  SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
>>> CROSS JOIN (
>>>  SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e
>>> 
>>> If there are 4 employees in department 10, then the emp side of this
>>> query would return 4 empty records:
>>> 
>>> ()
>>> ()
>>> ()
>>> ()
>>> 
>>> An Aggregate with no group keys and no aggregate calls is less
>>> interesting, because it always returns one row, and can therefore be
>>> eliminated.
>>> 
>>> Julian
>>> 
>>> On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis  
>>> wrote:
 
 Hello,
 
 The first operator that comes to mind when we are talking about empty
 records is a projection that projects nothing. In theory and in most
 database books the projection is allowed to have arity 0 but in practice I
 never had to use it; neither aggregate nor any other operator. In fact an
 aggregate with empty group keys and aggregate calls resembles a lot an
 empty projection.
 
 @Konstantin how did you come with the empty aggregate? Was it also a result
 of trimming?
 
 If we have to make a decision now I would prefer to disallow them globally
 unless somebody comes up with a compelling use-case.
 In order to express other query languages (e.g,, datalog, conjunctive
 queries, etc) with the algebraic operators we may need 0-arity operators
 but I don't have an example readily in mind.
 
 Best,
 Stamatis
 
 On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde  wrote:
 
>> So the question is whether an empty record is a legal row type for an
> aggregation node?
> 
> As that comment indicates, we have tried to avoid empty records — that is,
> a relational expression that produces a row type with zero fields — but as
> you have just discovered, we have failed to go all the way.
> 
> Mathematically, it is purer to allow empty records. SQL does not allow
> them, they crop up naturally in quite a lot of corner cases, especially
> after trimming fields.
> 
> Pragmatically, I assumed that quite a lot of code was making the
> assumption that records were not empty. And that empty records are
> sufficiently rare that we would never be able to find all of those places
> via testing.
> 
> Is it time to decide? If we allow 

Re: How to match root node in a HEP Rule?

2021-05-03 Thread Haisheng Yuan
Hi Ian,

Is there any specific reason or use case that you have to match the root node 
and find the parent node in your customized rule?

Thanks,
Haisheng Yuan

On 2021/05/03 20:20:22, Julian Hyde  wrote: 
> > Is there a way to identify a node as being a root node during RelRule.match?
> 
> Not currently. One workaround would be to create 'class MyRoot extends
> SingleRel', add it as the root of the tree, and write your rules to
> match on it.
> 
> > Or more generally, is there a way to collect the parents of an arbitrary 
> > RelNode?
> 
> By design, the only way to find the parents of a RelNode are to write
> a rule that matches those parents. This ensures that rules will work
> in the Volcano planner, where a RelNode can have many parents.
> 
> Julian
> 
> On Mon, May 3, 2021 at 12:52 PM Ian Bertolacci
>  wrote:
> >
> > Hello,
> > I am trying to write a rule that matches (among other things) the root of a 
> > RelNode tree.
> >
> > Unfortunately, it seems that HepRuleCall.parents is only initialized and 
> > populated if the rule extends CommonRelSubExpr, which doesn’t really apply 
> > in my case.
> > (see: 
> > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L526)
> >
> > Is there a way to identify a node as being a root node during 
> > RelRule.match? Or more generally, is there a way to collect the parents of 
> > an arbitrary RelNode?
> >
> > Thanks.
> 


Re: RelFieldTrimmer throws an exception in certain cases

2021-05-03 Thread Julian Hyde
I am inclined to be brave and idealistic: officially allow records with no 
fields, and add enough tests that we know that the existing rules can handle 
them.

Of course there is a non-zero risk that we will break some things. But these 
things are already broken, because we zero-field RelNodes can still crop up.

One place I know that will need to be fixed is 

ImmutableList RelBuilder.tupleList(int columnCount, Object[] values)

because is divides by columnCount.

Julian


> On May 3, 2021, at 8:55 AM, Konstantin Orlov  wrote:
> 
> Hello,
> 
> Another case where it will look a bit cleaner with an empty record is an 
> insert with values. 
> Currently a query like INSERT INTO PROJECT(projectId, name) VALUES (?, ?) 
> have a plan like follow:
> 
> TableModify(table=[[PUBLIC, PROJECT]], operation=[INSERT], flattened=[false])
>  Project(PROJECTID=[?1], NAME=[?2])
>Values(tuples=[[{ 0 }]])
> 
> 
>> @Konstantin how did you come with the empty aggregate? Was it also a result
>> of trimming?
> 
> The empty aggregate is created for query like this: SELECT 42 FROM my_table 
> HAVING 82 > 42.
> Although this query is quite synthetic, it’s still grammatically correct, and 
> we need to be able to handle it.
> 
> Personally I tend to think that allow the empty records is a better option. 
> For now there is a code that supposed
> to overcome introduced limitation, but causes another issues.
> 
> But since the benefits are not clear and the effort required seems to be 
> bigger, the more rational option is to ban them.
> 
> -- 
> Regards,
> Konstantin Orlov
> 
> 
> 
> 
>> On 2 May 2021, at 01:30, Julian Hyde  wrote:
>> 
>> Here's a valid SQL query where an empty projection naturally arises:
>> 
>> SELECT d.dname
>> FROM dept AS d
>>   JOIN emp AS e ON e.deptno = d.deptno
>> WHERE d.deptno = 10
>> 
>> You want the name of department 10 to be printed N times, where N is
>> the number of employees in the department, but you don't need any
>> attributes from the employee table. So it can be optimized to the
>> following pseudo-SQL:
>> 
>> SELECT d.dname
>> FROM (
>>   SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
>> CROSS JOIN (
>>   SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e
>> 
>> If there are 4 employees in department 10, then the emp side of this
>> query would return 4 empty records:
>> 
>> ()
>> ()
>> ()
>> ()
>> 
>> An Aggregate with no group keys and no aggregate calls is less
>> interesting, because it always returns one row, and can therefore be
>> eliminated.
>> 
>> Julian
>> 
>> On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis  
>> wrote:
>>> 
>>> Hello,
>>> 
>>> The first operator that comes to mind when we are talking about empty
>>> records is a projection that projects nothing. In theory and in most
>>> database books the projection is allowed to have arity 0 but in practice I
>>> never had to use it; neither aggregate nor any other operator. In fact an
>>> aggregate with empty group keys and aggregate calls resembles a lot an
>>> empty projection.
>>> 
>>> @Konstantin how did you come with the empty aggregate? Was it also a result
>>> of trimming?
>>> 
>>> If we have to make a decision now I would prefer to disallow them globally
>>> unless somebody comes up with a compelling use-case.
>>> In order to express other query languages (e.g,, datalog, conjunctive
>>> queries, etc) with the algebraic operators we may need 0-arity operators
>>> but I don't have an example readily in mind.
>>> 
>>> Best,
>>> Stamatis
>>> 
>>> On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde  wrote:
>>> 
> So the question is whether an empty record is a legal row type for an
 aggregation node?
 
 As that comment indicates, we have tried to avoid empty records — that is,
 a relational expression that produces a row type with zero fields — but as
 you have just discovered, we have failed to go all the way.
 
 Mathematically, it is purer to allow empty records. SQL does not allow
 them, they crop up naturally in quite a lot of corner cases, especially
 after trimming fields.
 
 Pragmatically, I assumed that quite a lot of code was making the
 assumption that records were not empty. And that empty records are
 sufficiently rare that we would never be able to find all of those places
 via testing.
 
 Is it time to decide? If we allow empty records, we should test that all
 relational operators can handle them. If we ban them, then we should (say)
 throw whenever someone registers a RelNode that has an empty row type.
 
 Julian
 
 
 
 
> On Apr 29, 2021, at 7:25 AM, Konstantin Orlov 
 wrote:
> 
> Hi all.
> 
> I faced a problem preventing certain queries being planned because
 RelFieldTrimmer throws
> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
 for length 0”.
> 
> The problem is here [1]:
> 
>  // If they are asking for no 

Re: How to match root node in a HEP Rule?

2021-05-03 Thread Julian Hyde
> Is there a way to identify a node as being a root node during RelRule.match?

Not currently. One workaround would be to create 'class MyRoot extends
SingleRel', add it as the root of the tree, and write your rules to
match on it.

> Or more generally, is there a way to collect the parents of an arbitrary 
> RelNode?

By design, the only way to find the parents of a RelNode are to write
a rule that matches those parents. This ensures that rules will work
in the Volcano planner, where a RelNode can have many parents.

Julian

On Mon, May 3, 2021 at 12:52 PM Ian Bertolacci
 wrote:
>
> Hello,
> I am trying to write a rule that matches (among other things) the root of a 
> RelNode tree.
>
> Unfortunately, it seems that HepRuleCall.parents is only initialized and 
> populated if the rule extends CommonRelSubExpr, which doesn’t really apply in 
> my case.
> (see: 
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L526)
>
> Is there a way to identify a node as being a root node during RelRule.match? 
> Or more generally, is there a way to collect the parents of an arbitrary 
> RelNode?
>
> Thanks.


Re: Support for JSON path expressions in select queries.

2021-05-03 Thread Julian Hyde
If we add native JSON syntax to SQL, it is very likely that there will
be ambiguities. I think our best path for JSON support is to follow
the SQL standard (and perhaps some extensions made by other major
DBs), which consists of a set of functions to handle JSON. Quite a few
of those functions accept arguments that are JSON paths. For example,
here is Oracle's documentation for the JSON_EXISTS function: [1];
Calcite supports JSON_EXISTS too.

You could try creating a variant of Calcite's SQL parser that accepts
a wider syntax for expressions and desugars to the standard JSON_xxx
functions. If that parser is successful we could perhaps merge it into
the main parser, with the extended syntax enabled by setting a flag.

Julian

[1] 
https://docs.oracle.com/en/database/oracle/oracle-database/18/adjsn/condition-JSON_EXISTS.html

On Mon, May 3, 2021 at 11:25 AM Amrish Lal  wrote:
>
> Hello,
>
> Calcite is used as the parsing layer of a database I am working on. I
> noticed that calcite support dot notation and array subscripts in
> identifiers as in:
>
> SELECT json_column.person.name.email[5] FROM table
>
> This allows for writing rudimentary Json Path expressions. However, some
> support is still missing. For example, the following queries will give
> parsing errors.
>
> SELECT json_column.person.name.email[*] FROM table
> SELECT json_column.person.name.email[:2] FROM table
> SELECT json_column.person.name.email[1,3] FROM table
> SELECT json_column..email FROM table
> SELECT json_column.person..name FROM table
> etc... (more json path examples here
> 
> )
>
> From what I can see, this shouldn't be very complicated to add and will
> mainly require accepting a wider range of characters in SELECT list
> expression values.
>
> We only need basic json path expression support for now (dot operator and
> array subscript operator) which calcite seems to already support, but would
> like to add further json path expression support in future. I am wondering
> if Calcite is open to further supporting json path expressions in SELECT
> and WHERE clause expression list?


How to match root node in a HEP Rule?

2021-05-03 Thread Ian Bertolacci
Hello,
I am trying to write a rule that matches (among other things) the root of a 
RelNode tree.

Unfortunately, it seems that HepRuleCall.parents is only initialized and 
populated if the rule extends CommonRelSubExpr, which doesn’t really apply in 
my case.
(see: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L526)

Is there a way to identify a node as being a root node during RelRule.match? Or 
more generally, is there a way to collect the parents of an arbitrary RelNode?

Thanks.


Support for JSON path expressions in select queries.

2021-05-03 Thread Amrish Lal
Hello,

Calcite is used as the parsing layer of a database I am working on. I
noticed that calcite support dot notation and array subscripts in
identifiers as in:

SELECT json_column.person.name.email[5] FROM table

This allows for writing rudimentary Json Path expressions. However, some
support is still missing. For example, the following queries will give
parsing errors.

SELECT json_column.person.name.email[*] FROM table
SELECT json_column.person.name.email[:2] FROM table
SELECT json_column.person.name.email[1,3] FROM table
SELECT json_column..email FROM table
SELECT json_column.person..name FROM table
etc... (more json path examples here

)

>From what I can see, this shouldn't be very complicated to add and will
mainly require accepting a wider range of characters in SELECT list
expression values.

We only need basic json path expression support for now (dot operator and
array subscript operator) which calcite seems to already support, but would
like to add further json path expression support in future. I am wondering
if Calcite is open to further supporting json path expressions in SELECT
and WHERE clause expression list?


Re: RelFieldTrimmer throws an exception in certain cases

2021-05-03 Thread Konstantin Orlov
Hello,

Another case where it will look a bit cleaner with an empty record is an insert 
with values. 
Currently a query like INSERT INTO PROJECT(projectId, name) VALUES (?, ?) have 
a plan like follow:

TableModify(table=[[PUBLIC, PROJECT]], operation=[INSERT], flattened=[false])
  Project(PROJECTID=[?1], NAME=[?2])
Values(tuples=[[{ 0 }]])


> @Konstantin how did you come with the empty aggregate? Was it also a result
> of trimming?

The empty aggregate is created for query like this: SELECT 42 FROM my_table 
HAVING 82 > 42.
Although this query is quite synthetic, it’s still grammatically correct, and 
we need to be able to handle it.

Personally I tend to think that allow the empty records is a better option. For 
now there is a code that supposed
to overcome introduced limitation, but causes another issues.

But since the benefits are not clear and the effort required seems to be 
bigger, the more rational option is to ban them.

-- 
Regards,
Konstantin Orlov




> On 2 May 2021, at 01:30, Julian Hyde  wrote:
> 
> Here's a valid SQL query where an empty projection naturally arises:
> 
>  SELECT d.dname
>  FROM dept AS d
>JOIN emp AS e ON e.deptno = d.deptno
>  WHERE d.deptno = 10
> 
> You want the name of department 10 to be printed N times, where N is
> the number of employees in the department, but you don't need any
> attributes from the employee table. So it can be optimized to the
> following pseudo-SQL:
> 
>  SELECT d.dname
>  FROM (
>SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
>  CROSS JOIN (
>SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e
> 
> If there are 4 employees in department 10, then the emp side of this
> query would return 4 empty records:
> 
>  ()
>  ()
>  ()
>  ()
> 
> An Aggregate with no group keys and no aggregate calls is less
> interesting, because it always returns one row, and can therefore be
> eliminated.
> 
> Julian
> 
> On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis  wrote:
>> 
>> Hello,
>> 
>> The first operator that comes to mind when we are talking about empty
>> records is a projection that projects nothing. In theory and in most
>> database books the projection is allowed to have arity 0 but in practice I
>> never had to use it; neither aggregate nor any other operator. In fact an
>> aggregate with empty group keys and aggregate calls resembles a lot an
>> empty projection.
>> 
>> @Konstantin how did you come with the empty aggregate? Was it also a result
>> of trimming?
>> 
>> If we have to make a decision now I would prefer to disallow them globally
>> unless somebody comes up with a compelling use-case.
>> In order to express other query languages (e.g,, datalog, conjunctive
>> queries, etc) with the algebraic operators we may need 0-arity operators
>> but I don't have an example readily in mind.
>> 
>> Best,
>> Stamatis
>> 
>> On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde  wrote:
>> 
 So the question is whether an empty record is a legal row type for an
>>> aggregation node?
>>> 
>>> As that comment indicates, we have tried to avoid empty records — that is,
>>> a relational expression that produces a row type with zero fields — but as
>>> you have just discovered, we have failed to go all the way.
>>> 
>>> Mathematically, it is purer to allow empty records. SQL does not allow
>>> them, they crop up naturally in quite a lot of corner cases, especially
>>> after trimming fields.
>>> 
>>> Pragmatically, I assumed that quite a lot of code was making the
>>> assumption that records were not empty. And that empty records are
>>> sufficiently rare that we would never be able to find all of those places
>>> via testing.
>>> 
>>> Is it time to decide? If we allow empty records, we should test that all
>>> relational operators can handle them. If we ban them, then we should (say)
>>> throw whenever someone registers a RelNode that has an empty row type.
>>> 
>>> Julian
>>> 
>>> 
>>> 
>>> 
 On Apr 29, 2021, at 7:25 AM, Konstantin Orlov 
>>> wrote:
 
 Hi all.
 
 I faced a problem preventing certain queries being planned because
>>> RelFieldTrimmer throws
 an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
>>> for length 0”.
 
 The problem is here [1]:
 
   // If they are asking for no fields, we can't give them what they
>>> want,
   // because zero-column records are illegal. Give them the last field,
   // which is unlikely to be a system field.
   if (fieldsUsed.isEmpty()) {
 fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
   }
 
 In case fieldsUsed.isEmpty we returns last field, but it is currently
>>> possible that fieldCount=0 as well.
 
 After some investigation I find out that the reason is empty record
>>> derived as row type for Aggregate.
 It is possible when an aggregate has an empty group key and no aggregate
>>> calls.
 
 So the question is whether an empty record is a legal row type for