Update, the myApp and twitter example works now. We still have some pending work. However I wanted to get feedback on Aimee's comment on S4-110.
https://issues.apache.org/jira/browse/S4-110?focusedCommentId=13559905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13559905 The issue is in S4 should we partition a stream or pe. Here is how the system would work for each use case. Option 1: PARTITION STREAMS -- This means all pe's interested in a stream will be partitioned the same way. Aimee's bring out a good point that this may not be appropriate for scenarios where different pe's might have different performance characteristics. For example a simple countPE might be faster than a PE that is doing some machine learning algorithms. By partitioning on stream, we will enforce the parallelism on processing elements interested in the stream. But the good thing about this is Option 2: PARTITION PE's -- Here we will partition based on the PE, for example a countPE might be partitioned 10 ways while a complexPE might be partitioned 100 ways but both listening to same stream. While this provides different parallelism for each PE, the downside is an event will now be sent to multiple destinations thus increasing the network i/o. Additionally we need to add more code to co-locate PE's listening to same stream and has same number of partitions. After thinking a bit more, i am more inclined towards Option 2 but with additional support to co-locate PE's with same number of partitions.Integration with Helix does not really enforce either of the partitioning schemes and also allows us to co-locate the PE's processing same partitions. However in the current S4-110 branch, we have implemented Option 1. What do people think? Thanks, Kishore G On Mon, Feb 11, 2013 at 3:54 AM, Matthieu Morel <mmo...@apache.org> wrote: > Thanks for the summary Kishore! > > > One note about the new design of emitters: before, emitters used be > notified of updates to the cluster, but this is not the case anymore: the > senders are the ones receiving these updates, so they can decide where to > send messages. > > One implication is that the mapping destination/node in the emitter is not > updated automatically. > > Illustration: Suppose a mapping destination1->node1. Node 1 fails. Node 2 > starts and replaces node1. Suppose we have a new mapping destination 1 -> > node2. In the emitter, in the absence of notifications, this mapping has > not been updated yet. This happens with the first message to destination 1 > - which does not reach destination, but clears the old mapping. Next > message for destination 1 will setup the new channels to node 2. > > Once we get the integration complete, we can improve that with something > like keeping the mappings of the emitters in a cache with a TTL or > something like that. > > > The other pending tasks will be porting existing examples (the echo and > the twitter examples) to illustrate the new perspectives offered by this > integration. > > > Regards, > > Matthieu > > > > On Feb 11, 2013, at 11:07 , kishore g wrote: > > > Hi, > > > > This is an update on S4 and Helix integration happening on S4-110-new > > branch. Last week Matthieu and I made good progress. We have the initial > > set of changes that allow us to configure the cluster management mode ( > > existing zk based or helix based). We had to do some changes to the > > existing code to facilitate this. > > > > Significant changes was that Emitters no longer hold the mapping of > > partition to node. The logic of deciding the destination for an event is > > moved to Sender. This allowed us to have clear separation > > of responsibility between Sender and Emitter. > > > > In order to add new helix implementations, we also made changes in some > of > > the bindings like S4Bootstrap, DeploymentManager, cluster, Emittter etc. > > > > We still have the following pending > > > > > > - Tools are broken. they currently support only Helix based mode. We > > need to support additional configuration that indicates the mode(zk or > > helix). > > - Need to add more test cases/documentation for helix related changes. > > - Fix broken test cases for s4 mode. > > - Other minor code clean up/refactoring. > > > > Matthieu, please add items to this list( I am sure I have missed quite a > > few) :-). The code is not yet ready for review, but feel free to comment > on > > design and code structure. > > > > Thanks, > > Kishore G > >