Thanks, Timo. 
We may put the NULL related function support to SQL API, but for Scalar 
expression and Boolean expression, it already been supported in Table API, 
without NULL value handling support, query with Scalar expression and Boolean 
expression would fail while encounter NULL value.

Thanks
Chengxiang 

-----Original Message-----
From: Timo Walther [mailto:twal...@apache.org] 
Sent: Wednesday, November 25, 2015 7:33 PM
To: dev@flink.apache.org
Subject: Re: The null in Flink

Hi Chengxiang,

I totally agree that the Table API should fully support NULL values. The Table 
API is a logical API and therefore we should be as close to ANSI SQL as 
possible. Rows need to be nullable in the near future.

2. i, ii, iii and iv sound reasonable. But v, vi and vii sound to much like SQL 
magic. I think all other SQL magic (DBMS specific corner cases) should be 
handled by the SQL API on top of the Table API.

Regards,
Timo


On 25.11.2015 11:31, Li, Chengxiang wrote:
> Hi
> In this mail list, there are some discussions about null value handling in 
> Flink, and I saw several related JIRAs as well(like FLINK-2203, FLINK-2210), 
> but unfortunately, got reverted due to immature design, and no further action 
> since then. I would like to pick this topic up here, as it's quite an 
> important part of data analysis and many features depend on it. Hopefully, 
> through a plenary discussion, we can generate an acceptable solution and move 
> forward. Stephan has explained very clearly about how and why Flink handle 
> "Null values in the Programming Language APIs", so I mainly talk about the 
> second part of "Null values in the high-level (logical) APIs ".
>
> 1. Why should Flink support Null values handling in Table API?
>       i.  Data source may miss column value in many cases, if no Null values 
> handling in Table API, user need to write an extra ETL to handle missing 
> values manually.
>       ii. Some Table API operators generate Null values on their own, like 
> Outer Join/Cube/Rollup/Grouping Set, and so on. Null values handling in Table 
> API is the prerequisite of these features.
>
> 2. The semantic of Null value handling in Table API.
> Fortunately, there are already mature DBMS  standards we can follow for Null 
> value handling, I list several semantic of Null value handling here. To be 
> noted that, this may not cover all the cases, and the semantics may vary in 
> different DBMSs, so it should totally open to discuss.
>       I,  NULL compare. In ascending order, NULL is smaller than any other 
> value, and NULL == NULL return false.
>       ii. NULL exists in GroupBy Key, all NULL values are grouped as a single 
> group.
>       iii. NULL exists in Aggregate columns, ignore NULL in aggregation 
> function.
>                  iv. NULL exists in both side Join key, refer to #i, NULL == 
> NULL return false, no output for NULL Join key.
>                  v.  NULL in Scalar expression, expression within NULL(eg. 1 
> + NULL) return NULL.
>                  vi. NULL in Boolean expression, add an extra result: 
> UNKNOWN, more semantic for Boolean expression in reference #1.
>                  vii. More related function support, like COALESCE, NVL, 
> NANVL, and so on.
>
> 3. NULL value storage in Table API.
>    Just set null to Row field value. To mark NULL value in serialized binary 
> record data, normally it use extra flag for each field to mark whether its 
> value is NULL, which would change the data layout of Row object. So any logic 
> that access serialized Row data directly should updated to sync with new data 
> layout, for example, many methods in RowComparator.
>
> Reference:
> 1. Nulls: Nothing to worry about: 
> http://www.oracle.com/technetwork/issue-archive/2005/05-jul/o45sql-097727.html.
> 2. Null related functions: 
> https://oracle-base.com/articles/misc/null-related-functions
>
> -----Original Message-----
> From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf 
> Of Stephan Ewen
> Sent: Thursday, June 18, 2015 8:43 AM
> To: dev@flink.apache.org
> Subject: Re: The null in Flink
>
> Hi!
>
> I think we actually have two discussions here, both of them important:
>
> --------------------------------------------------------------
> 1) Null values in the Programming Language APIs
> --------------------------------------------------------------
>
> Fields in composite types may simply be null pointers.
>
> In object types:
>    - primitives members are naturally non-nullable
>    - all other members are nullable
>
> => If you want to avoid the overhead of nullability, go with primitive types.
>
> In Tuples, and derives types (Scala case classes):
>    - Fields are non-nullable.
>
> => The reason here is that we initially decided to keep tuples as a very fast 
> data type. Because tuples cannot hold primitives in Java/Scala, we would not 
> have a way to make fast non-nullable fields. The performance of nullable 
> fields affects the key-operations, especially on normalized keys.
> We can work around that with some effort, but have not one it so far.
>
> => In Scala, the Option types is a natural way of elegantly working around 
> that.
>
>
> --------------------------------------------------------------
> 2) Null values in the high-level (logial) APIs
> --------------------------------------------------------------
>
> This is mainly what Ted was referring to, if I understood him correctly.
>
> Here, we need to figure out what form of semantical null values in the Table 
> API and later, in SQL.
>
> Besides deciding what semantics to follow here in the logical APIs, we need 
> to decide what these values confert to/from when switching between 
> logical/physical APIs.
>
>
>
>
>
>
> On Mon, Jun 15, 2015 at 10:07 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> On Mon, Jun 15, 2015 at 8:45 AM, Maximilian Michels <m...@apache.org>
>> wrote:
>>
>>> Just to give an idea what null values could cause in Flink:
>> DataSet.count()
>>> returns the number of elements of all values in a Dataset (null or
>>> not) while #834 would ignore null values and aggregate the DataSet 
>>> without
>> them.
>> Compare R's na.action.
>>
>> http://www.ats.ucla.edu/stat/r/faq/missing.htm
>>

Reply via email to