Hi devs,

I wanted to reach out to the Apache Calcite community about an issue
with nullability of structured types we observed in Apache Flink.

First, a quick introduction to why this is important for us. Eventually
we want to be able to map POJOs to structured types.

For example, having a POJO like

class Address {
  public String street;
  public int zip; // important: it is a primitive int
  public String city;
}

we want to map it to a type

CREATE TYPE Address AS (
  street VARCHAR(20),
  zip INT NOT NULL,
  city VARCHAR(20) NOT NULL);

and with that type we want to create a table

CREATE TABLE Contacts (
  name VARCHAR(20) NOT NULL,
  address Address); -- Address structured type is nullable

Unfortunately, as described in CALCITE-2464[1], a NOT NULL Address type
is created first and then converted to a nullable one via
RelDataTypeFactory#createTypeWithNullability. The way this method works
right now, it changes all the nested fields to nullable if the
structured type is nullable. Therefore we can no longer map this type to
our original POJO, because the zip field now has a nullable INT type.
This in turn makes it impossible to pass that POJO as a parameter to a
UDF:

class UDF extends ScalarFunction {
  public Object eval(Address address) {
    if (address != null) {
      // based on the type I know address.zip is NOT NULL here
      int zip = address.zip;
      return zip;
    }
    return null;
  }
}
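
To make the difference concrete, here is a minimal sketch in plain Java.
It uses a hypothetical mini type model (Field, StructType and the two
asNullable... helpers are made up for illustration and are not Calcite's
RelDataType API). It only models the behaviour described above, where
making the struct nullable makes every nested field nullable, next to the
behaviour we would like to have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini type model for illustration; not Calcite's RelDataType API.
final class NullabilitySketch {

  static final class Field {
    final String name;
    final String sqlType;
    final boolean nullable;

    Field(String name, String sqlType, boolean nullable) {
      this.name = name;
      this.sqlType = sqlType;
      this.nullable = nullable;
    }
  }

  static final class StructType {
    final List<Field> fields;
    final boolean nullable;

    StructType(List<Field> fields, boolean nullable) {
      this.fields = fields;
      this.nullable = nullable;
    }

    Field field(String name) {
      for (Field f : fields) {
        if (f.name.equals(name)) {
          return f;
        }
      }
      throw new IllegalArgumentException("No such field: " + name);
    }
  }

  // Behaviour described in the text: every nested field becomes nullable too.
  static StructType asNullableCascading(StructType type) {
    List<Field> fields = new ArrayList<>();
    for (Field f : type.fields) {
      fields.add(new Field(f.name, f.sqlType, true));
    }
    return new StructType(fields, true);
  }

  // Behaviour we would like: only the struct itself becomes nullable.
  static StructType asNullablePreserving(StructType type) {
    return new StructType(type.fields, true);
  }

  // The Address type from the CREATE TYPE statement, initially NOT NULL.
  static StructType addressType() {
    List<Field> fields = new ArrayList<>();
    fields.add(new Field("street", "VARCHAR(20)", true));
    fields.add(new Field("zip", "INT", false));
    fields.add(new Field("city", "VARCHAR(20)", false));
    return new StructType(fields, false);
  }

  public static void main(String[] args) {
    StructType address = addressType();
    // zip loses its NOT NULL constraint: prints true
    System.out.println(asNullableCascading(address).field("zip").nullable);
    // zip keeps its NOT NULL constraint: prints false
    System.out.println(asNullablePreserving(address).field("zip").nullable);
  }
}
```

With the preserving variant, zip can still be mapped to a primitive int
whenever the address itself is not null.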

We were able to circumvent that to some extent in our version of the
TypeFactory, but this causes problems when accessing fields of such a
type. As far as I can tell, in most places in Calcite the type of an
accessed nested field is simply RelDataType#getField#getType, but this
does not work if the outer type is nullable. Take the above example:

SELECT address.zip FROM Contacts;

This query should produce a column of type INT rather than INT NOT NULL,
because if the column itself (the address) is null we cannot produce a
non-null zip.
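
The rule I have in mind can be sketched in a few lines of plain Java.
FieldAccessSketch and its methods are hypothetical names used only for
illustration, not Calcite code: the accessed field is nullable if the
field itself is nullable or if the enclosing value may be null.

```java
// Hypothetical helper for illustration; not part of Calcite.
final class FieldAccessSketch {

  // Nullability of `outer.field`: nullable if the field is declared nullable
  // or if the outer value itself may be null.
  static boolean accessedFieldNullable(boolean outerNullable, boolean fieldNullable) {
    return outerNullable || fieldNullable;
  }

  // Renders a type the way it is written in the examples in this mail.
  static String render(String sqlType, boolean nullable) {
    return nullable ? sqlType : sqlType + " NOT NULL";
  }

  public static void main(String[] args) {
    // Contacts.address is nullable, Address.zip is declared INT NOT NULL.
    System.out.println(render("INT", accessedFieldNullable(true, false)));  // INT
    // If the address column were NOT NULL, zip would keep its declared type.
    System.out.println(render("INT", accessedFieldNullable(false, false))); // INT NOT NULL
  }
}
```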

I was able to make it work with a few changes to Calcite classes[2]:

* SqlDotOperator - I had to change the deriveType method; the class
already has the needed logic in inferReturnType

* SqlItemOperator - the same change in deriveType

* AliasNamespace - this class strips the original nullability

* FlinkRexBuilder, SqlNameMatcher - those classes are used when
converting SqlIdentifier in a query like SELECT address.zip FROM Contacts;


Here come my questions:

1. Do you have a suggestion for a better way to handle the problem?

2. Would you be willing to accept a contribution of the changes
mentioned above? (I was thinking of the first three classes). We would
like to use vanilla Calcite as much as possible.


I am happy to answer any questions if I did not describe something
clearly enough. I am also looking forward to any comments/suggestions.

Best,

Dawid

[1] https://issues.apache.org/jira/browse/CALCITE-2464

[2] https://github.com/apache/flink/pull/12649/commits/1c33fe9cb5491f47fb19c16f32de5e6aef5089d3
