[jira] [Comment Edited] (CALCITE-4085) Improve nullability support for fields of structured type

Dawid Wysakowicz (Jira) Wed, 24 Jun 2020 01:10:31 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143625#comment-17143625
 ]


Dawid Wysakowicz edited comment on CALCITE-4085 at 6/24/20, 8:09 AM:
---------------------------------------------------------------------

My end goal is to be able to have a nullable record with not null fields. E.g. 
{{USER(id NOT NULL) <NULLABLE>}}. That is because we might have a UDF that 
expects {{USER(id NOT NULL) <NULLABLE>}}. From that point of view the logic in 
RelDataTypeFactoryImpl[1] is incorrect as it changes the nullability of nested 
fields.

I'd like to be able to have a UDF:

{code}

function(<USER(id INT NOT NULL) NULLABLE>) 

{code}

I fully agree with 

??For expression "[A].[B]", if [A] is nullable then [A].[B] should also be 
nullable.??

I don't think though this should be fixed at the time when the RecordType is 
created, as it makes, in my opinion, a valid use case described in the first 
paragraph impossible. This logic in my understanding/opinion is very similar if 
not exactly the same as {{SqlTypeTransforms#TO_NULLABLE}}, and it does belong 
to every operator that accesses nested fields. Of course you cannot prevent 
people to add new operators, but again in my opinion this belongs to the 
operator logic the same as the mentioned {{SqlTypeTransforms#TO_NULLABLE}}.

This is incorrect:

??You want the int type to be nullable. But in the POJO code, you want to 
translate the id to int instead of integer. In my point, the type inference of 
current Calcite is very close to correct. What we should do is the QE (when we 
translate the POJO, to do a promotion in the code instead) which seems more 
feasible.??

I want the int to be {{NOT NULL}} and be translated to {{int}}. Calcite will 
convert it to {{NULLABLE}} because of the logic in [1].

>From my perspective changes in RexBuilder#makeFieldAccessInternal, 
>SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are necessary because 
>I don't want the logic in the type factory as it breaks my use case from the 
>first paragraph.

The changes in RexBuilder and SqlValidatorImpl are required for a query like 
{{SELECT u.nullable_row.not_null_nested_field FROM 
tableWithNullableRowColumn}}, because those do not use SqlDotOperator, but 
create a {{RexFieldAccess}}.  I split the PR into three commits because I do 
understand the changes in RexBuilder#makeFieldAccessInternal, 
SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are quite invasive and 
you might not agree to incorporate them. So if you are strongly against those I 
will just drop those from the PR.

I'd really like to see the other two commits merged. Still the changes in 
{{AliasNamespace}} make tests in {{TableFunctionTest}} fail imo because of the 
logic in [1].  The tests there join with lateral table generated from a UDF 
which return type is {{IntString(n INT NOT NULL, s String) <NULLABLE>}}. If I 
change the {{AliasNamespace}} to keep the original nullability(which imo is 
correct, I don't see a reason why aliasing should change nullability), then the 
logic in [1] changes the nullability of n which in turn causes mismatch.

[1] 
https://github.com/apache/calcite/blob/0769a8b31cbbeb5bca66ade30cf3710523da4aaa/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactoryImpl.java#L338


was (Author: dawidwys):
My end goal is to be able to have a nullable record with not null fields. E.g. 
{{USER(id NOT NULL) <NULLABLE>}}. That is because we might have a UDF that 
expects {{USER(id NOT NULL) <NULLABLE>}}. From that point of view the logic in 
RelDataTypeFactoryImpl[1] is incorrect as it changes the nullability of nested 
fields.

I fully agree with 

??For expression "[A].[B]", if [A] is nullable then [A].[B] should also be 
nullable.??

I don't think though this should be fixed at the time when the RecordType is 
created, as it makes, in my opinion, a valid use case described in the first 
paragraph impossible. This logic in my understanding/opinion is very similar if 
not exactly the same as {{SqlTypeTransforms#TO_NULLABLE}}, and it does belong 
to every operator that accesses nested fields. Of course you cannot prevent 
people to add new operators, but again in my opinion this belongs to the 
operator logic the same as the mentioned {{SqlTypeTransforms#TO_NULLABLE}}.

This is incorrect:

??You want the int type to be nullable. But in the POJO code, you want to 
translate the id to int instead of integer. In my point, the type inference of 
current Calcite is very close to correct. What we should do is the QE (when we 
translate the POJO, to do a promotion in the code instead) which seems more 
feasible.??

I want the int to be {{NOT NULL}} and be translated to {{int}}. Calcite will 
convert it to {{NULLABLE}} because of the logic in [1].

>From my perspective changes in RexBuilder#makeFieldAccessInternal, 
>SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are necessary because 
>I don't want the logic in the type factory as it breaks my use case from the 
>first paragraph.

The changes in RexBuilder and SqlValidatorImpl are required for a query like 
{{SELECT u.nullable_row.not_null_nested_field FROM 
tableWithNullableRowColumn}}, because those do not use SqlDotOperator, but 
create a {{RexFieldAccess}}.  I split the PR into three commits because I do 
understand the changes in RexBuilder#makeFieldAccessInternal, 
SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are quite invasive and 
you might not agree to incorporate them. So if you are strongly against those I 
will just drop those from the PR.

I'd really like to see the other two commits merged. Still the changes in 
{{AliasNamespace}} make tests in {{TableFunctionTest}} fail imo because of the 
logic in [1].  The tests there join with lateral table generated from a UDF 
which return type is {{IntString(n INT NOT NULL, s String) <NULLABLE>}}. If I 
change the {{AliasNamespace}} to keep the original nullability(which imo is 
correct, I don't see a reason why aliasing should change nullability), then the 
logic in [1] changes the nullability of n which in turn causes mismatch.

[1] 
https://github.com/apache/calcite/blob/0769a8b31cbbeb5bca66ade30cf3710523da4aaa/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactoryImpl.java#L338

> Improve nullability support for fields of structured type
> ---------------------------------------------------------
>
>                 Key: CALCITE-4085
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4085
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in 
> https://lists.apache.org/thread.html/r602ac95fff23dd1ef974fb396df7be061ab861384ec42f5c57ce0bc2%40%3Cdev.calcite.apache.org%3E
>  I would like to change the way a type of a field of a record is derived at 
> couple of locations. This helps frameworks such as Apache Flink to build 
> support for nullable records with not null fields.
> I suggest to change:
> * SqlDotOperator#deriveType
> * SqlItemOperator#inferReturnType
> * AliasNamespace#validateImpl
> * RexBuilder#makeFieldAccessInternal
> * SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (CALCITE-4085) Improve nullability support for fields of structured type

Reply via email to