Hey TJ,

This sounds like a bug.

1. Is the task.consumer.batch.size config set for your job?
2. What does this log line say: Building default chooser with:
useBatching=%s, useBootstrapping=%s, usePriority=%s
3. How long is the delay that you're seeing?

Cheers,
Chris

On 7/14/14 4:57 PM, "Yan Fang" <[email protected]> wrote:

>It seems for me that, the batch processing fetches many messages at one
>time and then takes too long time to process. My first thought is that,
>since we have the systems.system-name.samza.fetch.threshold
><http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/configur
>ation-table.html>
>,
>setting this number to a smaller (default is 50000) will force the system
>to fetch the messages more frequently.
>
>Any other ideas?
>
>Fang, Yan
>[email protected]
>+1 (206) 849-4108
>
>
>On Mon, Jul 14, 2014 at 1:12 PM, TJ Giuli <[email protected]>
>wrote:
>
>> Hi,
>>
>> I have a stream processor that takes inputs from multiple streams, some
>> are more batch, non-latency sensitive and others are real-time,
>> infrequently have traffic and should be low-latency.  The real-time
>>stream
>> helps me interpret the batch stream, so I would ideally like any
>>real-time
>> stream envelopes delivered within some maximum latency from the time the
>> message enters into a Kafka topic.
>>
>> I have my stream processor configured to prioritize my real-time streams
>> over the batch streams, but I consistently find that the real-time
>>stream
>> is delayed by traffic from the batch stream.  From tracing the Kafka
>> consumer, it looks like my stream processor periodically fetches from
>> Kafka, finds that the batch streams have a large chunk of messages
>>waiting,
>> doesn¹t find anything on the real-time topics, and processes away the
>>batch
>> messages for a few minutes.  During the batch processing, the Kafka
>> consumer does not poll the real-time streams, so if a message is sent
>>to a
>> real-time topic, the message effectively doesn¹t arrive until the next
>>time
>> the Kafka consumer does another fetch.  When a real-time message is
>> consumed by the Kafka consumer, the TieredPriorityChooser correctly
>> prioritizes traffic from the real-time streams over the batch streams.
>>
>> Does anyone have recommendations on how to get infrequent but important
>> messages within some maximum time bound?  Thanks!
>> ‹T

Reply via email to