[jira] [Updated] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2018-02-06 Thread Batkhuyag Batsaikhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Batkhuyag Batsaikhan updated BEAM-2535:
---
Description: 
Today, we have insufficient control over the event time timestamp of elements 
output from a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
 2. For a processing time timer, it is the current input watermark at the time 
of processing.

But for both of these, we may want to reserve the right to output a particular 
time, aka set a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
sure output is not droppable, but does not fully explain window expiration and 
late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, 
timers should be viewed as another channel of input, with a watermark, and 
items on that channel _all need event time timestamps even if they are 
delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be 
separated (with nice defaults) from the specification of the event time of 
resulting outputs. These timestamps will determine a side channel with a new 
"timer watermark" that constrains the output watermark.
 - We still need to fire event time timers according to the input watermark, so 
that event time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of 
the input watermark and the timer watermark. In this way, whenever a timer is 
set, the window is not going to be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is 
expired; this may be as simple as exhausting the timer channel as soon as the 
input watermark indicates expiration of a window

This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems 
reasonable to use timers as an implementation detail (e.g. in runners-core 
utilities) without wanting any of this additional machinery. For example, if 
there is no possibility of output from the timer callback.

  was:
Today, we have insufficient control over the event time timestamp of elements 
output from a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time 
of processing.

But for both of these, we may want to reserve the right to output a particular 
time, aka set a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
sure output is not droppable, but does not fully explain window expiration and 
late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, 
timers should be viewed as another channel of input, with a watermark, and 
items on that channel _all need event time timestamps even if they are 
delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be 
separated (with nice defaults) from the specification of the event time of 
resulting outputs. These timestamps will determine a side channel with a new 
"timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so 
that event time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of 
the input watermark and the timer watermark. In this way, whenever a timer is 
set, the window is not going to be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is 
expired; this may be as simple as exhausting the timer channel as soon as the 
input watermark indicates expiration of a window

This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems 
reasonable to use timers as an implementation detail (e.g. in runners-core 
utilities) without wanting any of this additional machinery. For example, if 
there is no possibility of output from the timer callback.


> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Batkhuyag Batsaikhan
>Priority: Major
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output a 
> part

[jira] [Updated] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2017-06-29 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-2535:
--
Description: 
Today, we have insufficient control over the event time timestamp of elements 
output from a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time 
of processing.

But for both of these, we may want to reserve the right to output a particular 
time, aka set a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
sure output is not droppable, but does not fully explain window expiration and 
late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, 
timers should be viewed as another channel of input, with a watermark, and 
items on that channel _all need event time timestamps even if they are 
delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be 
separated (with nice defaults) from the specification of the event time of 
resulting outputs. These timestamps will determine a side channel with a new 
"timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so 
that event time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of 
the input watermark and the timer watermark. In this way, whenever a timer is 
set, the window is not going to be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is 
expired; this may be as simple as exhausting the timer channel as soon as the 
input watermark indicates expiration of a window

This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It seems 
reasonable to use timers as an implementation detail (e.g. in runners-core 
utilities) without wanting any of this additional machinery. For example, if 
there is no possibility of output from the timer callback.

  was:
Today, we have insufficient control over the event time timestamp of elements 
output from a timer callback.

1. For an event time timer, it is the timestamp of the timer itself.
2. For a processing time timer, it is the current input watermark at the time 
of processing.

But for both of these, we may want to reserve the right to output a particular 
time, aka set a "watermark hold".

A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
sure output is not droppable, but does not fully explain window expiration and 
late data/timer dropping.

In the natural interpretation of a timer as a feedback loop on a transform, 
timers should be viewed as another channel of input, with a watermark, and 
items on that channel _all need event time timestamps even if they are 
delivered according to a different time domain_.

I propose that the specification for when a timer should fire should be 
separated (with nice defaults) from the specification of the event time of 
resulting outputs. These timestamps will determine a side channel with a new 
"timer watermark" that constrains the output watermark.

 - We still need to fire event time timers according to the input watermark, so 
that event time timers fire.
 - Late data dropping and window expiration will be in terms of the minimum of 
the input watermark and the timer watermark. In this way, whenever a timer is 
set, the window is not going to be garbage collected.
 - We will need to make sure we have a way to "wake up" a window once it is 
expired; this may be as simple as exhausting the timer channel as soon as the 
input watermark indicates expiration of a window


> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
> 2. For a processing time timer, it is the current input watermark at the time 
> of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as a