Lateness droppings debugging

2018-02-07 Thread Carlos Alonso
Hi everyone!!

I have a streaming job running with fixed windows of one hour and allowed
lateness of two days and the number of dropped due to lateness elements is
slowly, but continuously growing and I'd like to understand which elements
are those.

I'd like to get the watermark from inside the job to compare it against
each element and write log messages with the ones that will be potentially
discarded Does that approach make any sense? I which case... How can I
get the watermark from inside the job? Any other ideas?

Thanks in advance!!


Re: Lateness droppings debugging

2018-02-07 Thread Raghu Angadi
The watermark is not directly available, you essentially infer from fired
triggers (e.g. fixed windows). I would consider some of these options :
  - Adhoc debugging : if the pipeline is close to realtime, you can just
guess if a element will be dropped based on its timestamp and current time
in the first stage (before first aggregation)
  - Increase allowed lateness (say to 3 days) and drop the elements
yourself you notice are later than 1 day.
  - Place the elements into another window with larger allowed lateness and
log very late elements in another parallel aggregation (see
TriggerExample.java in Beam repo).

On Wed, Feb 7, 2018 at 2:55 PM, Carlos Alonso  wrote:

> Hi everyone!!
>
> I have a streaming job running with fixed windows of one hour and allowed
> lateness of two days and the number of dropped due to lateness elements is
> slowly, but continuously growing and I'd like to understand which elements
> are those.
>
> I'd like to get the watermark from inside the job to compare it against
> each element and write log messages with the ones that will be potentially
> discarded Does that approach make any sense? I which case... How can I
> get the watermark from inside the job? Any other ideas?
>
> Thanks in advance!!
>


Re: Lateness droppings debugging

2018-02-07 Thread Pawel Bartoszek
Hi Raghu,
Can you provide more details about increasing allowed lateness? Even if I
do that I still need to compare event time of record with processing
time(system current time) in my ParDo?

Pawel

On 8 February 2018 at 05:40, Raghu Angadi  wrote:

> The watermark is not directly available, you essentially infer from fired
> triggers (e.g. fixed windows). I would consider some of these options :
>   - Adhoc debugging : if the pipeline is close to realtime, you can just
> guess if a element will be dropped based on its timestamp and current time
> in the first stage (before first aggregation)
>   - Increase allowed lateness (say to 3 days) and drop the elements
> yourself you notice are later than 1 day.
>   - Place the elements into another window with larger allowed lateness
> and log very late elements in another parallel aggregation (see
> TriggerExample.java in Beam repo).
>
> On Wed, Feb 7, 2018 at 2:55 PM, Carlos Alonso 
> wrote:
>
>> Hi everyone!!
>>
>> I have a streaming job running with fixed windows of one hour and allowed
>> lateness of two days and the number of dropped due to lateness elements is
>> slowly, but continuously growing and I'd like to understand which elements
>> are those.
>>
>> I'd like to get the watermark from inside the job to compare it against
>> each element and write log messages with the ones that will be potentially
>> discarded Does that approach make any sense? I which case... How can I
>> get the watermark from inside the job? Any other ideas?
>>
>> Thanks in advance!!
>>
>
>


Re: Lateness droppings debugging

2018-02-07 Thread Carlos Alonso
Cool, I'll try some of these. Thanks Raghu!

On Thu, Feb 8, 2018 at 6:40 AM Raghu Angadi  wrote:

> The watermark is not directly available, you essentially infer from fired
> triggers (e.g. fixed windows). I would consider some of these options :
>   - Adhoc debugging : if the pipeline is close to realtime, you can just
> guess if a element will be dropped based on its timestamp and current time
> in the first stage (before first aggregation)
>   - Increase allowed lateness (say to 3 days) and drop the elements
> yourself you notice are later than 1 day.
>   - Place the elements into another window with larger allowed lateness
> and log very late elements in another parallel aggregation (see
> TriggerExample.java in Beam repo).
>
> On Wed, Feb 7, 2018 at 2:55 PM, Carlos Alonso 
> wrote:
>
>> Hi everyone!!
>>
>> I have a streaming job running with fixed windows of one hour and allowed
>> lateness of two days and the number of dropped due to lateness elements is
>> slowly, but continuously growing and I'd like to understand which elements
>> are those.
>>
>> I'd like to get the watermark from inside the job to compare it against
>> each element and write log messages with the ones that will be potentially
>> discarded Does that approach make any sense? I which case... How can I
>> get the watermark from inside the job? Any other ideas?
>>
>> Thanks in advance!!
>>
>
>