Hey Alan,
Your summary of the issue is pretty much dead on. In 0.6.0 (the version you're
dealing with), all you can do is configure per-stream throughput with a hard
max, which is not ideal for the reason you describe below. We've seen the exact
same issue with some other jobs we're working on.
I've opened a JIRA with my response:
https://issues.apache.org/jira/browse/SAMZA-2
Could you please add yourself as a watcher, and follow up on my comment?
Thanks!
Chris
________________________________________
From: Alan Li [[email protected]]
Sent: Monday, August 05, 2013 4:54 PM
To: [email protected]
Subject: Feature Request: Fine-grain control over stream consumption
Currently, Samza exposes configuration in the form of
"streams.%s.consumer.max.bytes.per.sec" for throttling the number of bytes the
Task will read from a stream. This is a feature request for programmatic,
fine-grained control over stream consumption. The use case is a Samza task that
consumes multiple streams, where some streams come from live systems that have
stricter SLA requirements and must always be prioritized over other streams
that come from batch systems. The above configuration is not the ideal way to
express this type of stream prioritization, because configuring the "batch"
streams with a low consumption rate will decrease the overall throughput of the
system when there is no data in the "live" streams.
Furthermore, we'll want to throttle each "batch" stream based on external
signals that can change over time. Because of the dynamic nature of these
external signals, we would like to have a programmatic interface that can
dynamically change the prioritization as the signal changes.
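To make the request a bit more concrete, here is a rough sketch of the kind of
interface we have in mind. It is written in plain Python purely for
illustration and is not an existing Samza API: the class name
PriorityStreamChooser and the methods set_priority, offer, and choose are all
hypothetical. The idea is that the highest-priority stream with data is always
served first, lower-priority streams run at full speed whenever the
higher-priority ones are empty, and priorities can be changed at runtime.

```python
from collections import deque


class PriorityStreamChooser:
    """Hypothetical chooser (not a Samza API): serves the highest-priority
    stream that currently has buffered messages."""

    def __init__(self):
        self._priority = {}  # stream name -> priority (higher is served first)
        self._buffers = {}   # stream name -> queue of pending messages

    def set_priority(self, stream, priority):
        # May be called at any time, e.g. when an external signal changes.
        self._priority[stream] = priority
        self._buffers.setdefault(stream, deque())

    def offer(self, stream, message):
        # Buffer a message that has arrived on a stream.
        # Streams without an explicit priority default to 0.
        self._buffers.setdefault(stream, deque()).append(message)
        self._priority.setdefault(stream, 0)

    def choose(self):
        # Return (stream, message) from the highest-priority non-empty
        # stream, or None if nothing is buffered.
        ready = [s for s, buf in self._buffers.items() if buf]
        if not ready:
            return None
        stream = max(ready, key=lambda s: self._priority[s])
        return stream, self._buffers[stream].popleft()
```

With a "live" stream at priority 10 and a "batch" stream at priority 1, choose()
drains "live" first and falls through to "batch" only when "live" has nothing
buffered. Calling set_priority again later changes the order for subsequent
calls. Rate-based throttling of individual batch streams is left out of this
sketch.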
Thanks, Alan