[ 
https://issues.apache.org/jira/browse/FLINK-22737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354265#comment-17354265
 ] 

Timo Walther commented on FLINK-22737:
--------------------------------------

I'm against returning 0 as the initial watermark for the following reasons:
- This introduces a third value with special meaning next to Long.MIN_VALUE and 
Long.MAX_VALUE.
- It goes against the design of all other components in the code base (from 
DataStream API to Web UI). The ProcessFunction returns Long.MIN_VALUE when 
querying the timer service and the Web UI also checks against Long.MIN_VALUE 
and displays "No watermark received yet.".
- It prevents the processing of historical data. This might not be a strong 
argument, but it could happen that data from 1970 is processed (banks? weather 
data? stock data?). The decision to use Long.MIN_VALUE in streaming operators 
is definitely safer and prevents unintended side effects. Also, as bounded and 
unbounded data processing are combined further, watermarks could describe 
non-real-time data in the near future.
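
To make the second and third points concrete, here is a minimal sketch in plain Java. It is not Flink code: the class and method names are made up for illustration, and treating an event as late when its timestamp is below the watermark is an assumption that mirrors the SQL predicate discussed in this thread.

```java
import java.time.Instant;

// Hypothetical helper for illustration only, not part of Flink.
class InitialWatermarkSketch {
    // Long.MIN_VALUE is the "no watermark yet" sentinel mentioned above.
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Mimics the kind of check the Web UI is described as doing.
    static String describe(long watermark) {
        if (watermark == NO_WATERMARK) {
            return "No watermark received yet.";
        }
        return Instant.ofEpochMilli(watermark).toString();
    }

    // Assumption: an event is late if its timestamp is below the watermark.
    static boolean isLate(long eventTime, long watermark) {
        return eventTime < watermark;
    }

    public static void main(String[] args) {
        long event1969 = Instant.parse("1969-07-20T20:17:00Z").toEpochMilli();

        // With Long.MIN_VALUE the "nothing received" state stays distinguishable.
        System.out.println(describe(NO_WATERMARK));
        // With 0 the initial state looks like a real watermark at the epoch.
        System.out.println(describe(0L));

        // A pre-1970 event is flagged as late before any watermark has arrived
        // if the initial value is 0, but not if it is Long.MIN_VALUE.
        System.out.println(isLate(event1969, 0L));
        System.out.println(isLate(event1969, NO_WATERMARK));
    }
}
```

With 0 as the initial value, the pre-1970 event is classified as late even though no watermark has been emitted, and the "nothing received yet" state can no longer be told apart from a real watermark at the epoch.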

Personally, I would go for Long.MIN_VALUE or NULL. The problem we have in SQL 
is that the new type system limits the precision of years to 4 digits in the 
definition of `TimestampType`, so Long.MIN_VALUE cannot be represented as a 
timestamp there. So NULL is a good alternative.
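
A quick way to see why Long.MIN_VALUE does not fit a timestamp with 4-digit years is to convert it with `java.time`. This is only a sketch: the helper names are hypothetical, and the bounds 0 to 9999 are my reading of "4 digits", not something taken from the `TimestampType` source.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only, not part of Flink.
class MinValueAsTimestampSketch {
    // Calendar year (UTC) of a timestamp given in epoch milliseconds.
    static int yearOf(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getYear();
    }

    // Assumption: "4 digits" means a year in the range 0..9999.
    static boolean fitsFourDigitYear(long epochMillis) {
        int year = yearOf(epochMillis);
        return year >= 0 && year <= 9999;
    }

    public static void main(String[] args) {
        // The year is hundreds of millions of years before the epoch.
        System.out.println(yearOf(Long.MIN_VALUE));
        System.out.println(fitsFourDigitYear(Long.MIN_VALUE));
        // An ordinary timestamp fits without problems.
        System.out.println(fitsFourDigitYear(0L));
    }
}
```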

It is true that it introduces a three-valued logic, but this should not be a 
problem for handling late data. {{FROM T WHERE rowtime < CURRENT_WATERMARK}} 
selects no rows until a watermark arrives (the comparison with NULL is not 
true), which is correct.
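
The behavior of that predicate with a NULL watermark can be sketched in plain Java, using `null` for SQL NULL/UNKNOWN. The names are hypothetical and this only models the standard SQL rules (a comparison with NULL yields UNKNOWN, and WHERE keeps a row only when the predicate is TRUE); it is not how the planner evaluates it.

```java
// Hypothetical model of SQL three-valued logic, for illustration only.
class ThreeValuedFilterSketch {
    // SQL-style "a < b": returns null (UNKNOWN) if either operand is NULL.
    static Boolean lessThan(Long a, Long b) {
        if (a == null || b == null) {
            return null;
        }
        return a < b;
    }

    // WHERE keeps a row only when the predicate is TRUE, not FALSE or UNKNOWN.
    static boolean whereKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        Long rowtime = 1_000L;

        // No watermark yet: the predicate is UNKNOWN, so the row is not selected.
        System.out.println(whereKeeps(lessThan(rowtime, null)));
        // Watermark has passed the row's timestamp: the row is selected as late.
        System.out.println(whereKeeps(lessThan(rowtime, 2_000L)));
        // Watermark is still behind the row's timestamp: the row is not late.
        System.out.println(whereKeeps(lessThan(rowtime, 500L)));
    }
}
```

One thing to keep in mind is that the negated predicate behaves the same way: `NOT (rowtime < CURRENT_WATERMARK)` is also UNKNOWN while the watermark is NULL, so a query that wants the on-time rows has to handle the NULL case explicitly.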

> Add support for CURRENT_WATERMARK to SQL
> ----------------------------------------
>
>                 Key: FLINK-22737
>                 URL: https://issues.apache.org/jira/browse/FLINK-22737
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>            Reporter: David Anderson
>            Assignee: Ingo Bürk
>            Priority: Major
>
> With a built-in function returning the current watermark, one could operate 
> on late events without resorting to using the DataStream API.
> Called with zero parameters, this function returns the current watermark for 
> the current row – if there is an event time attribute. Otherwise, it returns 
> NULL. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
