Posting to the list with permission...

---------- Forwarded message ----------
From: Matthieu Morel <matthieu.mo...@gmail.com>
Date: Wed, Jun 6, 2012 at 7:46 AM
Subject: Re: Few questions.
To: shailendra.mis...@thomsonreuters.com
Cc: leoneume...@gmail.com


Hi Shailendra,

Please don't hesitate to post on the public list; that will be useful
for everyone!

About partitioning:
- you partition data using a KeyFinder. See, for example, the twitter-counter app:
https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=blob;f=test-apps/twitter-counter/src/main/java/org/apache/s4/example/twitter/TwitterCounterApp.java;h=90c31994e20cc311e333ea8eb6bd1485e8b2e857;hb=S4-22#l46
- right now, if you use an adapter application in front of a consumer
application, events are broadcast to all consumer nodes. Maybe
that's what is giving you issues. We'll add a customizable policy,
with round-robin probably being the default.
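
To make the KeyFinder idea concrete, here is a minimal, self-contained
sketch. It does not use the S4 classes: TopicEvent, SketchKeyFinder and
PartitionSketch are hypothetical names invented for illustration, and
hashing the key modulo the partition count is an assumed policy, not
necessarily what S4 does internally. The point is only that events with
the same key always land on the same partition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an event; NOT the real S4 Event class.
final class TopicEvent {
    final String topic;

    TopicEvent(String topic) {
        this.topic = topic;
    }
}

// Hypothetical local interface modeled on the idea of a key finder:
// it extracts the values of an event that make up its partitioning key.
interface SketchKeyFinder<T> {
    List<String> get(T event);
}

final class PartitionSketch {
    // Assumed policy: hash the key and take it modulo the partition count.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(List<String> key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Key on the topic field, so all events for a topic go to one partition.
        SketchKeyFinder<TopicEvent> byTopic = e -> Arrays.asList(e.topic);
        for (String t : new String[] {"s4", "java", "s4"}) {
            int p = partitionFor(byTopic.get(new TopicEvent(t)), 3);
            System.out.println(t + " -> partition " + p);
        }
    }
}
```

With three partitions, both "s4" events are reported on the same partition,
which is the property a keyed stream relies on.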

About windowing:
- the idea is that, as events arrive, you fill a rotating circular
buffer of slots (in piper, you provide your own implementation)
- you always have access to the latest slot, and you place data in that slot
- you define when new slots are generated
- you specify the size of a window, i.e. how many slots per window

In parallel, you can use a trigger to output data that you compute
from the data in the current window. (The trigger interval could
actually be a multiple of the slot duration.)
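
The slot and window mechanics described above can be sketched in plain
Java as follows. This is not the WindowingPE API: SumSlot and
SlidingWindowSketch are hypothetical names, and summing values is just
one possible slot implementation. It shows a buffer that keeps at most
a fixed number of slots, writes to the latest slot, drops the oldest
slot on rotation, and computes an aggregate that a trigger could emit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical slot implementation: accumulates a sum and a count.
final class SumSlot {
    long sum;
    int count;

    void add(long value) {
        sum += value;
        count++;
    }
}

// Hypothetical sliding window holding at most `size` slots.
final class SlidingWindowSketch {
    private final int size;
    private final Deque<SumSlot> slots = new ArrayDeque<>();

    SlidingWindowSketch(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        this.size = size;
        slots.addLast(new SumSlot());
    }

    // Incoming data always goes into the latest slot.
    void onEvent(long value) {
        slots.peekLast().add(value);
    }

    // Called whenever the application decides a new slot should start
    // (e.g. on a timer or every N events); evicts the oldest slot when full.
    void rotate() {
        if (slots.size() == size) {
            slots.removeFirst();
        }
        slots.addLast(new SumSlot());
    }

    // What a trigger might output: an aggregate over the current window.
    long windowSum() {
        long total = 0;
        for (SumSlot s : slots) {
            total += s.sum;
        }
        return total;
    }

    public static void main(String[] args) {
        SlidingWindowSketch w = new SlidingWindowSketch(2);
        w.onEvent(1);
        w.onEvent(2);
        w.rotate();
        w.onEvent(10);
        System.out.println(w.windowSum()); // 13: both slots are in the window
        w.rotate();
        w.onEvent(100);
        System.out.println(w.windowSum()); // 110: the first slot was evicted
    }
}
```

Here rotation is driven by the caller, which matches "you define when new
slots are generated"; a wall-clock or event-count policy would simply call
rotate() on its own schedule.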

We'll add examples and documentation for that.

Hope this helps, and thanks again for the feedback!

Matthieu




On Wed, Jun 6, 2012 at 3:15 PM, <shailendra.mis...@thomsonreuters.com> wrote:
>
> Hi Leo, Matthieu:
>
> Sorry I couldn't attend yesterday's hangout session; I was on a plane. I have
> been trying to code a few quant use cases using S4 and have a few questions:
> - Consider the following topology: Input-Adaptor -> PE1, PE2, PE3 <all three
> are running the same application> -> PrintPE <which outputs the data>
> Ideally, I would like to think of PE1..3 as each processing a specific
> partition of the data, but it looks like there is no obvious way to do it.
> So, I thought I would filter out stuff at the destination based on a
> partition-id. Now I can interrogate ZK and get my process partition (haven't
> tried that but think it is possible). Short of that, is there a cheaper way
> of doing this? Maybe this is not a suitable approach in S4; assuming that to
> be true, let me ask the question: how would you partition data?
> - Now for the second question: so far in my applications I have been using
> the onTime and onTrigger methods to implement windowing, the former for
> wall-clock-time-based windows and the latter for application-time-based
> ones. However, I came across the notion of WindowingPE, which could be used
> instead. Would you have an example showing the use of WindowingPE to model
> what I have been doing using onTime and onTrigger?
> Would greatly appreciate any help.
>
> - Thanks
> - Shailendra




-- 

Leo Neumeyer (@leoneu)
