Re: [EXT] Sliding windows

2019-06-04 Thread Craig Knell
Joe

Thanks for the options.  

Kafka Flink option is interesting but to big at the moment. 

I’ll build a custom processor for this occasion.  

Much appreciated!

Craig Knell


> On 4 Jun 2019, at 21:56, Joe Witt  wrote:
> 
> ...from the description it isn't clear what you're trying to achieve so
> lets first try to expand the detail on the use case.
> 
> We should distinguish whether you're wanting to 'combine various objects in
> a data stream together on some time bound' from 'processing various objects
> in a data stream to make some observation over some time bound'.
> 
> If you're wanting to merge data together to make a larger object comprised
> of those smaller objects then MergeContent is your friend.
> 
> If you're wanting to look at a given stream or set of streams at once and
> make a time window based observation over that data then I recommend
> looking at something like Apache Flink which is purpose built for that and
> should be better than NiFi at that part.  If it is a pretty straight
> forward single stream window evaluation and you want to avoid having
> another system in play then I'd just write a little custom processor in
> NiFi for your case.  Once you have a more complex data distribution and
> processing requirement and you want a powerful low latency combination I'd
> say put NiFi, Kafka, and Flink together for a pretty hard to beat combo.
> 
> Thanks
> 
> On Tue, Jun 4, 2019 at 9:51 AM Peter Wicks (pwicks) 
> wrote:
> 
>> Craig,
>> 
>> If you have a timestamp set as an attribute on the processor, then this is
>> kind of possible.
>> 
>> Have a regular MergeContent processor, with "Maximum Group Size" set to 1
>> mb, set "Max Bin Age" to 3 min; you may need to tweak settings to get the
>> right cadence, but these are generally the settings you need to touch. Use
>> the "Merged" relationship for whatever you need. To create the Window, pass
>> the "Original" relationship to a RouteOnAttribute processor.
>> 
>> In the RouteOnAttribute use NiFi Expression Language to calculate how old
>> the FlowFile is (using the timestamp attribute I mentioned). If the
>> FlowFile is older than x, drop it, else send it back to the MergeContent
>> processor.
>> 
>> Using this process, it should be easy to get a 5 min rolling window (drop
>> any FlowFile older than 5 min in RouteOnAttribute).
>> 
>> I don't know that this perfectly answers what you asked, but does it give
>> you a good direction to investigate?
>> 
>> Thanks,
>>  Peter
>> 
>> -Original Message-
>> From: Craig Knell 
>> Sent: Tuesday, June 4, 2019 1:32 AM
>> To: dev@nifi.apache.org
>> Subject: [EXT] Sliding windows
>> 
>> Hi Folks
>> 
>> We have a stream of data that I need to window to 5 minutes and the window
>> is to slide every 3 minutes. Each minute is 1 mb, I therefore have to
>> deliver 5mb per 3 minutes.
>> 
>> What is the best way of achieving this in nifi?
>> 
>> Best regards
>> 
>> Craig
>> 


Re: [EXT] Sliding windows

2019-06-04 Thread Joe Witt
...from the description it isn't clear what you're trying to achieve so
lets first try to expand the detail on the use case.

We should distinguish whether you're wanting to 'combine various objects in
a data stream together on some time bound' from 'processing various objects
in a data stream to make some observation over some time bound'.

If you're wanting to merge data together to make a larger object comprised
of those smaller objects then MergeContent is your friend.

If you're wanting to look at a given stream or set of streams at once and
make a time window based observation over that data then I recommend
looking at something like Apache Flink which is purpose built for that and
should be better than NiFi at that part.  If it is a pretty straight
forward single stream window evaluation and you want to avoid having
another system in play then I'd just write a little custom processor in
NiFi for your case.  Once you have a more complex data distribution and
processing requirement and you want a powerful low latency combination I'd
say put NiFi, Kafka, and Flink together for a pretty hard to beat combo.

Thanks

On Tue, Jun 4, 2019 at 9:51 AM Peter Wicks (pwicks) 
wrote:

> Craig,
>
> If you have a timestamp set as an attribute on the processor, then this is
> kind of possible.
>
> Have a regular MergeContent processor, with "Maximum Group Size" set to 1
> mb, set "Max Bin Age" to 3 min; you may need to tweak settings to get the
> right cadence, but these are generally the settings you need to touch. Use
> the "Merged" relationship for whatever you need. To create the Window, pass
> the "Original" relationship to a RouteOnAttribute processor.
>
> In the RouteOnAttribute use NiFi Expression Language to calculate how old
> the FlowFile is (using the timestamp attribute I mentioned). If the
> FlowFile is older than x, drop it, else send it back to the MergeContent
> processor.
>
> Using this process, it should be easy to get a 5 min rolling window (drop
> any FlowFile older than 5 min in RouteOnAttribute).
>
> I don't know that this perfectly answers what you asked, but does it give
> you a good direction to investigate?
>
> Thanks,
>   Peter
>
> -Original Message-
> From: Craig Knell 
> Sent: Tuesday, June 4, 2019 1:32 AM
> To: dev@nifi.apache.org
> Subject: [EXT] Sliding windows
>
> Hi Folks
>
> We have a stream of data that I need to window to 5 minutes and the window
> is to slide every 3 minutes. Each minute is 1 mb, I therefore have to
> deliver 5mb per 3 minutes.
>
> What is the best way of achieving this in nifi?
>
> Best regards
>
> Craig
>


RE: [EXT] Sliding windows

2019-06-04 Thread Peter Wicks (pwicks)
Craig,

If you have a timestamp set as an attribute on the processor, then this is kind 
of possible.

Have a regular MergeContent processor, with "Maximum Group Size" set to 1 mb, 
set "Max Bin Age" to 3 min; you may need to tweak settings to get the right 
cadence, but these are generally the settings you need to touch. Use the 
"Merged" relationship for whatever you need. To create the Window, pass the 
"Original" relationship to a RouteOnAttribute processor.

In the RouteOnAttribute use NiFi Expression Language to calculate how old the 
FlowFile is (using the timestamp attribute I mentioned). If the FlowFile is 
older than x, drop it, else send it back to the MergeContent processor.

Using this process, it should be easy to get a 5 min rolling window (drop any 
FlowFile older than 5 min in RouteOnAttribute).

I don't know that this perfectly answers what you asked, but does it give you a 
good direction to investigate?

Thanks,
  Peter

-Original Message-
From: Craig Knell  
Sent: Tuesday, June 4, 2019 1:32 AM
To: dev@nifi.apache.org
Subject: [EXT] Sliding windows

Hi Folks

We have a stream of data that I need to window to 5 minutes and the window is 
to slide every 3 minutes. Each minute is 1 mb, I therefore have to deliver 5mb 
per 3 minutes.  

What is the best way of achieving this in nifi?

Best regards

Craig