Hi devs, I wanted to reach out to the Apache Calcite community about an issue with nullability of structured types we observed in Apache Flink.
First, a quick introduction to why this is important for us. Eventually we want to be able to map POJOs to structured types. For example, given a POJO like:

    class Address {
        public String street;
        public int zip; // important: it is a primitive int
        public String city;
    }

we want to map it to the type:

    CREATE TYPE Address AS (
        street VARCHAR(20),
        zip INT NOT NULL,
        city VARCHAR(20) NOT NULL);

and with that type we want to create a table:

    CREATE TABLE Contacts (
        name VARCHAR(20) NOT NULL,
        address Address); -- Address structured type is nullable

Unfortunately, as described in CALCITE-2464 [1], a NOT NULL Address type is created first and is then converted to nullable via RelDataTypeFactory#createTypeWithNullability. The way this method works right now, it changes all the nested fields to nullable if the structured type is nullable. Therefore we can no longer map this type to our original POJO, because the zip field now has a nullable INT type. That in turn makes it impossible to pass the POJO as a parameter to a UDF:

    class UDF extends ScalarFunction {
        public Object eval(Address address) {
            if (address != null) {
                // based on the type I know address.zip is NOT NULL here
                int zip = address.zip;
            }
            return null; // the result is irrelevant for this example
        }
    }

We were able to circumvent that somewhat in our version of the TypeFactory, but this causes problems when accessing fields of such a type. As far as I can tell, in most places in Calcite the type of an accessed nested field is simply RelDataType#getField#getType, but this does not work if the outer type is nullable. Take the above example:

    SELECT address.zip FROM Contacts;

This query should have a column of type INT instead of INT NOT NULL, because if the column itself (the address) is null we cannot produce a non-null zip.
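To make the intended rule concrete, here is a minimal standalone sketch. It does not use Calcite or Flink; every name in it (NullabilitySketch, Type, fieldAccessType, addressType) is hypothetical and exists only for illustration. The assumption is that changing the nullability of a structured type touches only its top-level flag, and that the type of a field access is derived from both the field and its container.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Standalone illustration; none of these names are Calcite or Flink APIs. */
class NullabilitySketch {

    /** Minimal stand-in for a SQL type: a name, a nullability flag, optional fields. */
    static final class Type {
        final String name;
        final boolean nullable;
        final Map<String, Type> fields; // empty for atomic types

        Type(String name, boolean nullable, Map<String, Type> fields) {
            this.name = name;
            this.nullable = nullable;
            this.fields = fields;
        }

        /** Changes only the top-level flag; nested fields keep their declared nullability. */
        Type withNullability(boolean newNullable) {
            return new Type(name, newNullable, fields);
        }

        @Override
        public String toString() {
            return nullable ? name : name + " NOT NULL";
        }
    }

    /** Type of the expression outer.fieldName: nullable if the field or its container is. */
    static Type fieldAccessType(Type outer, String fieldName) {
        Type field = outer.fields.get(fieldName);
        if (field == null) {
            throw new IllegalArgumentException("No field " + fieldName + " in " + outer.name);
        }
        return field.withNullability(field.nullable || outer.nullable);
    }

    /** Builds the Address type from the example with the given top-level nullability. */
    static Type addressType(boolean nullable) {
        Map<String, Type> none = new LinkedHashMap<>();
        Map<String, Type> fields = new LinkedHashMap<>();
        fields.put("street", new Type("VARCHAR(20)", true, none));
        fields.put("zip", new Type("INT", false, none));
        fields.put("city", new Type("VARCHAR(20)", false, none));
        // Create the NOT NULL type first, then adjust the top-level flag only.
        return new Type("Address", false, fields).withNullability(nullable);
    }

    public static void main(String[] args) {
        Type address = addressType(true); // nullable column, as in Contacts
        System.out.println(address.fields.get("zip"));       // prints "INT NOT NULL"
        System.out.println(fieldAccessType(address, "zip")); // prints "INT"
    }
}
```

With this split, the declared field type still matches the primitive int in the POJO, while SELECT address.zip gets a nullable INT because the address column itself may be null.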
I was able to make it work with a few changes to Calcite classes [2]:

* SqlDotOperator - I had to change the deriveType method; this class already has the needed logic in inferReturnType
* SqlItemOperator - the same change in deriveType
* AliasNamespace - this class strips the original nullability
* FlinkRexBuilder, SqlNameMatcher - these classes are used when converting a SqlIdentifier in a query like SELECT address.zip FROM Contacts;

Here are my questions:

1. Do you have a suggestion for a better way to handle the problem?
2. Would you be willing to accept a contribution of the changes mentioned above? (I was thinking of the first three classes.) We would like to use vanilla Calcite as much as possible.

I am happy to answer any questions if anything I described is not clear enough. I am also looking forward to any comments or suggestions.

Best,
Dawid

[1] https://issues.apache.org/jira/browse/CALCITE-2464
[2] https://github.com/apache/flink/pull/12649/commits/1c33fe9cb5491f47fb19c16f32de5e6aef5089d3