It seems to me that the batch processing fetches many messages at once and then takes a long time to process them. My first thought is that, since we have systems.system-name.samza.fetch.threshold <http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/configuration-table.html>, setting it to a smaller value (the default is 50000) will force the system to fetch messages more frequently.
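
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It is only a simplified model, not Samza's actual fetch logic: it assumes a real-time message that arrives just after a fetch has to wait until the whole buffered batch has been processed. The per-message cost of 4 ms and the function name are made up for illustration; you would need to measure your own job.

```python
# Simplified model (an assumption, not Samza's real scheduling): a real-time
# message that arrives right after a fetch waits until every buffered batch
# message has been processed and the consumer fetches again.

def worst_case_wait_seconds(fetch_threshold: int, per_message_ms: float) -> float:
    """Rough upper bound on the wait, given the buffer size and per-message cost."""
    return fetch_threshold * per_message_ms / 1000.0


# Hypothetical processing cost per batch message, in milliseconds.
PER_MESSAGE_MS = 4.0

# 50000 is the default threshold; the smaller values are only examples.
for threshold in (50000, 5000, 500):
    wait = worst_case_wait_seconds(threshold, PER_MESSAGE_MS)
    print(f"fetch.threshold={threshold:>6}: up to ~{wait:.0f} s of batch work per fetch")
```

Under these assumptions the default of 50000 corresponds to a few minutes of batch work between fetches, which is roughly what TJ describes, and the bound shrinks linearly as the threshold is lowered. The trade-off is more frequent, smaller fetches from Kafka.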

Any other ideas?

Fang, Yan
[email protected]
+1 (206) 849-4108

On Mon, Jul 14, 2014 at 1:12 PM, TJ Giuli <[email protected]> wrote:

> Hi,
>
> I have a stream processor that takes inputs from multiple streams, some
> are more batch, non-latency sensitive and others are real-time,
> infrequently have traffic and should be low-latency. The real-time stream
> helps me interpret the batch stream, so I would ideally like any real-time
> stream envelopes delivered within some maximum latency from the time the
> message enters into a Kafka topic.
>
> I have my stream processor configured to prioritize my real-time streams
> over the batch streams, but I consistently find that the real-time stream
> is delayed by traffic from the batch stream. From tracing the Kafka
> consumer, it looks like my stream processor periodically fetches from
> Kafka, finds that the batch streams have a large chunk of messages waiting,
> doesn’t find anything on the real-time topics, and processes away the batch
> messages for a few minutes. During the batch processing, the Kafka
> consumer does not poll the real-time streams, so if a message is sent to a
> real-time topic, the message effectively doesn’t arrive until the next time
> the Kafka consumer does another fetch. When a real-time message is
> consumed by the Kafka consumer, the TieredPriorityChooser correctly
> prioritizes traffic from the real-time streams over the batch streams.
>
> Does anyone have recommendations on how to get infrequent but important
> messages within some maximum time bound? Thanks!
> -T
