Samza looks very flexible.
Let's say I had messages streaming into a task at a rate of 100k per second.
The task has an SLA to output its results every 0.1 seconds, so it would
have consumed about 10k messages per output interval; of course this will
vary based on many factors.
So when the task streams its aggregated result to task B, will it stop for a
moment while it does this? A bit like a garbage collection pause, as a
metaphor.

The reason I ask is that Samza looks nice for lots of dataflow scenarios,
but I am wondering about the best practice for getting some sort of
deterministic QoS out of it in time-sensitive situations, sensor fusion
being one and financial trading being another.

Ways to approach the problem:
1. If a task is having problems keeping up, can I add another CPU core or
task instance in real time?

2. Can I skip some of the roughly 10k messages per interval in real time and
give a less correct result to task B?

3. To alleviate the "stopping while it sends the results" to the next task,
could I store the aggregated results in memory and, when it is time to hand
them on, have another task do the hand-off, so the aggregating task never
pauses?
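
To make option 3 concrete, here is a rough sketch of what I have in mind, in
plain Python rather than anything Samza-specific. The names start_sender and
hand_off are made up for illustration, and I am assuming a bounded in-memory
queue between the aggregating loop and a separate sender thread. If the queue
is full the result is dropped rather than blocking, which is also one crude
form of option 2.

```python
import queue
import threading
from typing import Any, Callable


def start_sender(outbox: "queue.Queue[Any]",
                 send: Callable[[Any], None]) -> threading.Thread:
    """Start a background thread that forwards queued results via send().

    A None item is used as a shutdown sentinel (an assumption of this sketch).
    """
    def run() -> None:
        while True:
            item = outbox.get()      # blocks only the sender thread
            if item is None:         # sentinel: stop forwarding
                break
            send(item)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def hand_off(outbox: "queue.Queue[Any]", result: Any) -> bool:
    """Queue a result without ever blocking the aggregating loop.

    Returns False if the bounded queue was full and the result was dropped.
    """
    try:
        outbox.put_nowait(result)
        return True
    except queue.Full:
        return False
```

The aggregating loop would only ever call hand_off(), so the cost of actually
sending to task B is paid on the other thread. Whether Samza already does
something like this internally is exactly what I am asking.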

The metaphor is a typical function. It has many parameters in its signature
and a single return value. Inside the function it is looping over data (the
stream), and when it reaches a certain timer point it returns the aggregated
result. Then it just keeps going.
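
As a sketch of that metaphor, again in plain Python and not Samza API: a
generator that loops over the stream and yields an aggregate each time a
window boundary is crossed. The name windowed_sums, the (timestamp_ms, value)
message shape and the use of a simple count and sum as the aggregate are all
assumptions for illustration. Windows are driven by message timestamps in
integer milliseconds, so the behaviour is repeatable.

```python
from typing import Iterable, Iterator, Optional, Tuple


def windowed_sums(messages: Iterable[Tuple[int, float]],
                  window_ms: int = 100) -> Iterator[Tuple[int, int, float]]:
    """Yield (window_end_ms, count, total) for each fixed-size window.

    The first window starts at the first message's timestamp. Windows with
    no messages are still emitted, with count 0 and total 0.0.
    """
    window_end: Optional[int] = None
    count = 0
    total = 0.0
    for ts_ms, value in messages:
        if window_end is None:
            window_end = ts_ms + window_ms
        # Emit every window that closed at or before this message's time.
        while ts_ms >= window_end:
            yield (window_end, count, total)
            count, total = 0, 0.0
            window_end += window_ms
        count += 1
        total += value
    # Flush the last, partially filled window when the input ends.
    if count:
        yield (window_end, count, total)
```

With messages arriving at 100k per second and window_ms=100, each yielded
tuple would cover about 10k messages, which is the case I described above.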

Hope this is understandable and you can see what I am getting at.

Ged
