Posting to the list with permission...
---------- Forwarded message ----------
From: Matthieu Morel <matthieu.mo...@gmail.com>
Date: Wed, Jun 6, 2012 at 7:46 AM
Subject: Re: Few questions.
To: shailendra.mis...@thomsonreuters.com
Cc: leoneume...@gmail.com

Hi Shailendra,

please don't hesitate to post on the public list, that will be useful for everyone!

About partitioning:
- you partition data using a KeyFinder. See for example the twitter example: https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=blob;f=test-apps/twitter-counter/src/main/java/org/apache/s4/example/twitter/TwitterCounterApp.java;h=90c31994e20cc311e333ea8eb6bd1485e8b2e857;hb=S4-22#l46
- right now, if you use an adapter application in front of a consumer application, events are broadcast to all consumer nodes. Maybe that's what is giving you issues. We'll add a customizable policy, with round-robin probably being the default.

About windowing:
- the idea is that, upon reception of events, you fill a circular, rotating buffer of slots (in piper, you provide your own slot implementation)
- you always have access to the latest slot, and you place data in that slot
- you define when new slots are generated
- you specify the size of a window, i.e. how many slots per window

In parallel, you can use a trigger to output data that you compute from the data in the current window. (The trigger interval could actually be a multiple of the slot duration.)

We'll add examples and documentation for that.

Hope this helps, and thanks again for the feedback!

Matthieu

On Wed, Jun 6, 2012 at 3:15 PM, <shailendra.mis...@thomsonreuters.com> wrote:
>
> Hi Leo, Matthieu:
>
> Sorry I couldn't attend yesterday's hangout session, I was on a plane.
> I have been trying to code a few quant use cases using S4 and have a few
> questions:
>
> - Consider the following topology:
>   Input-Adaptor -> PE1, PE2, PE3 <all three are running the same application> -> PrintPE <which outputs the data>
>   Ideally, I would like to think of PE1..3 as each processing a specific
>   partition of the data, but it looks like there is no obvious way to do it.
>   So I thought I would filter out stuff at the destination based on a
>   partition-id. Now, I can interrogate ZK and get my process partition
>   (haven't tried that, but I think it is possible). Short of that, is there a
>   cheaper way of doing this? Maybe this is not a suitable approach in S4.
>   Assuming that to be true, let me ask the question: how would you partition
>   data?
>
> - Now for the second question. So far, for my applications, I have been using
>   the onTime and onTrigger methods to implement windowing: the former for
>   wall-clock-time-based windows and the latter for application-time-based
>   ones. However, I came across the notion of WindowingPE, which could be used
>   instead. Would you have an example showing the use of WindowingPE to model
>   what I have been doing using onTime and onTrigger?
>
> Would greatly appreciate any help.
>
> - Thanks
> - Shailendra
>
> This email was sent to you by Thomson Reuters, the global news and
> information company. Any views expressed in this message are those of the
> individual sender, except where the sender specifically states them to be the
> views of Thomson Reuters.

--
Leo Neumeyer (@leoneu)
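
For readers following the windowing part of the reply, here is a minimal, self-contained sketch in plain Java of the slot mechanics it describes: a rotating buffer of slots, writes always going to the latest slot, the caller deciding when a new slot is generated, and a window size expressed as a number of slots. This is not S4's WindowingPE API. The names SumSlot, SlotWindow, onEvent, rotate and windowMean are hypothetical and chosen only for illustration, and a real PE would presumably call something like rotate() from a timer and something like windowMean() from a trigger.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical slot type: accumulates a count and a sum of the values placed in it. */
final class SumSlot {
    long count;
    double sum;

    void add(double value) {
        count++;
        sum += value;
    }
}

/**
 * Hypothetical rotating window that holds at most windowSize slots.
 * Illustration only; this is not the S4 WindowingPE implementation.
 */
final class SlotWindow {
    private final int windowSize;
    private final Deque<SumSlot> slots = new ArrayDeque<>();

    SlotWindow(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
        slots.addLast(new SumSlot()); // there is always a latest slot to write to
    }

    /** Upon reception of an event, place its data in the latest slot. */
    void onEvent(double value) {
        slots.getLast().add(value);
    }

    /**
     * Generate a new slot. The caller defines when this happens (wall-clock
     * timer, event count, application time). The oldest slot is dropped once
     * the window already holds windowSize slots.
     */
    void rotate() {
        if (slots.size() == windowSize) {
            slots.removeFirst();
        }
        slots.addLast(new SumSlot());
    }

    /** What a trigger might output: the mean over every slot in the current window. */
    double windowMean() {
        long n = 0;
        double total = 0.0;
        for (SumSlot slot : slots) {
            n += slot.count;
            total += slot.sum;
        }
        return n == 0 ? Double.NaN : total / n;
    }
}
```

With a window of two slots, values placed before the second rotation stop contributing to windowMean() once that rotation evicts their slot. Calling rotate() from a wall-clock timer would correspond roughly to the onTime approach in the original question, and calling it from event handling to the application-time approach.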