[
https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281135#comment-14281135
]
Chris Riccomini commented on SAMZA-503:
---------------------------------------
bq. Chris Riccomini, I think it's ok if it only updates when the main thread
returns control from process(). It should be very rare to block for minutes
processing a single message.
The thing that bugs me about doing this in the container-layer (as opposed to
the Kafka layer) are the weird assumptions that we have to make:
# The offsets are ordered
# The offsets are longs
# The subtraction of the logs is semantically meaningful
None of these are necessarily true. In addition, the mental-model expense of
having to remember that this metric doesn't get updated when your task has slow
process() calls is annoying.
But, I can't figure out a clean way to do this within the Kafka layer. So, if
we accept implementing this in the container-layer, then I think a potentially
good solution would be to add IncomingMessageEnvelope.getMaxOffset(). This is a
fairly big user-facing API change, but it:
# Mirrors Kafka's APIs, which seems like a reasonable thing.
# Allows OffsetManager to store the maxOffset for each SSP.
# Allows OffsetManager to expose a gauge, which is maxOffset -
lastProcessOffset.
# Exposes the max offset to users, so they can make processing decisions based
on their lag (i.e. drop messages to catch up).
> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
> Key: SAMZA-503
> URL: https://issues.apache.org/jira/browse/SAMZA-503
> Project: Samza
> Issue Type: Bug
> Components: metrics
> Affects Versions: 0.8.0
> Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
> Reporter: Roger Hoover
> Assignee: Yan Fang
> Fix For: 0.9.0
>
> Attachments: SAMZA-503.patch
>
>
> For slow jobs, the
> KafkaSystemConsumerMetrics.%s-%s-messages-behind-high-watermark) gauge does
> not get updated very often.
> To reproduce:
> * Create a job that processes one message and sleeps for 5 seconds
> * Create it's input topic but do not populate it yet
> * Start the job
> * Load 1000s of messages to it's input topic. You can keep adding messages
> with a "wait -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG
> time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (a least every few
> seconds)
> I think what's happening is that the BrokerProxy only updates the high
> watermark when a consumer is ready for more messages. When the job is so
> slow, this rarely happens to the metric doesn't get updated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)