Samza looks very flexible. Let's say I had messages streaming in to a task at a rate of 100k/sec. The task has an SLA to output its results every 0.1 seconds, so in each interval it would have swallowed about 10k messages; of course this will vary based on many factors. When the task streams its aggregated result to task B, will it stop for a moment whilst it does this? A bit like a garbage collection pause, as a metaphor.
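
To make the scenario concrete, here is a minimal sketch of the kind of task I mean. It is plain Python, not Samza code: the names `WindowedAggregator`, `process`, `flush` and `emit` are made up for illustration, `emit` just stands in for "send to task B", and time is passed in as integer microseconds so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WindowedAggregator:
    """Hypothetical stand-in for the task: aggregates messages per fixed window."""
    window_us: int                    # SLA interval in microseconds (0.1 s = 100_000)
    emit: Callable[[dict], None]      # stand-in for handing the result to task B
    window_start_us: int = 0
    count: int = 0
    total: float = 0.0

    def process(self, value: float, now_us: int) -> None:
        # Close every window that has fully elapsed before counting this message.
        # If there is a long gap in the input, this emits empty windows too.
        while now_us - self.window_start_us >= self.window_us:
            self.flush()
        self.count += 1
        self.total += value

    def flush(self) -> None:
        # Synchronous hand-off: nothing else is processed while emit() runs.
        self.emit({
            "window_start_us": self.window_start_us,
            "count": self.count,
            "total": self.total,
        })
        self.window_start_us += self.window_us
        self.count = 0
        self.total = 0.0


# Simulate 0.25 s of input at 100k messages/sec (one message every 10 microseconds).
results = []
agg = WindowedAggregator(window_us=100_000, emit=results.append)
for i in range(25_000):
    agg.process(1.0, now_us=i * 10)

# Two complete windows of 10,000 messages each have been emitted;
# the remaining 5,000 messages are still sitting in the open window.
print(len(results), [r["count"] for r in results], agg.count)
```

The `emit` call is synchronous in this sketch, so `process` cannot run while `flush` is handing over the result. That is the pause I am asking about.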
The reason I ask is that Samza looks nice for lots of dataflow scenarios, but I am wondering about best practice for getting some sort of deterministic QoS out of it in time-sensitive situations, sensor fusion being one and financial trading being another.

Ways I can think of to approach the problem:

1. If a task is having problems keeping up, can I add another CPU core or task instance at runtime?
2. Can I skip some of the roughly 10k messages per interval in real time and give a less accurate result to task B?
3. To alleviate the "stopping while it sends the results" to the next task, could I store the aggregated results in memory and, at hand-off time, have another task do the hand-off, so that the aggregating task never pauses?

The metaphor is a typical function. It has many parameters in its signature and a single return value. Inside the function it loops over the data (the stream), and when it reaches a certain timer point it returns the aggregated result. Then it just keeps going.

Hope this is understandable and you can see what I am getting at.

Ged
