[
https://issues.apache.org/jira/browse/SAMZA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078699#comment-14078699
]
Chris Riccomini commented on SAMZA-342:
---------------------------------------
bq. So it does appear that the KafkaSystemConsumer receives the message and
takes 3 seconds to deliver it, correct?
Great, so it sounds like at least part of the latency is introduced between the
KafkaSystemConsumer and your StreamTask (as opposed to between your upstream
producer and the KafkaSystemConsumer).
1. I'm not sure how flexible you are, but can you try running against master,
which has SAMZA-245? This has refactored the way batching/buffering is handled
in the SystemConsumers class, and should force a maximum delay for this part of
the code.
2. Out of curiosity, how long is your process method taking to process
messages? You should see these log lines in your trace logs:
{code}
trace("Processing incoming message envelope for taskName and SSP: %s, %s"
format (taskName, envelope.getSystemStreamPartition))
task.process(envelope, collector, coordinator)
listeners.foreach(_.afterProcess(envelope, config, context))
trace("Updating offset map for taskName, SSP and offset: %s, %s, %s" format
(taskName, envelope.getSystemStreamPartition, envelope.getOffset))
{code}
Can you grep out some of these log lines, and post? I'm curious if the
process() method itself is just slowing things down.
> Priority streams experience large latencies before being consumed by the
> stream processor
> -----------------------------------------------------------------------------------------
>
> Key: SAMZA-342
> URL: https://issues.apache.org/jira/browse/SAMZA-342
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.7.0
> Environment: ubuntu 13.10
> Reporter: TJ Giuli
>
> I have a stream processor that takes inputs from multiple streams, some are
> more batch, non-latency sensitive and others are real-time, infrequently have
> traffic and should be low-latency. The real-time stream helps me interpret
> the batch stream, so I would ideally like any real-time stream envelopes
> delivered within some maximum latency from the time the message enters into a
> Kafka topic.
> I have my stream processor configured to prioritize my real-time streams over
> the batch streams, but I consistently find that the real-time stream is
> delayed by traffic from the batch stream. From tracing the Kafka consumer,
> it looks like my stream processor periodically fetches from Kafka, finds that
> the batch streams have a large chunk of messages waiting, doesn’t find
> anything on the real-time topics, and processes away the batch messages for a
> few minutes. During the batch processing, the Kafka consumer does not poll
> the real-time streams, so if a message is sent to a real-time topic, the
> message effectively doesn’t arrive until the next time the Kafka consumer
> does another fetch. When a real-time message is consumed by the Kafka
> consumer, the TieredPriorityChooser correctly prioritizes traffic from the
> real-time streams over the batch streams.
--
This message was sent by Atlassian JIRA
(v6.2#6252)