[jira] [Commented] (SAMZA-503) Lag gauge very slow to update for slow jobs

Chris Riccomini (JIRA) Fri, 16 Jan 2015 18:03:07 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281135#comment-14281135
 ]


Chris Riccomini commented on SAMZA-503:
---------------------------------------

bq. Chris Riccomini, I think it's ok if it only updates when the main thread 
returns control from process(). It should be very rare to block for minutes 
processing a single message.

The thing that bugs me about doing this in the container-layer (as opposed to 
the Kafka layer) are the weird assumptions that we have to make:

# The offsets are ordered
# The offsets are longs
# The subtraction of the logs is semantically meaningful

None of these are necessarily true. In addition, the mental-model expense of 
having to remember that this metric doesn't get updated when your task has slow 
process() calls is annoying.

But, I can't figure out a clean way to do this within the Kafka layer. So, if 
we accept implementing this in the container-layer, then I think a potentially 
good solution would be to add IncomingMessageEnvelope.getMaxOffset(). This is a 
fairly big user-facing API change, but it:

# Mirrors Kafka's APIs, which seems like a reasonable thing.
# Allows OffsetManager to store the maxOffset for each SSP.
# Allows OffsetManager to expose a gauge, which is maxOffset - 
lastProcessOffset.
# Exposes the max offset to users, so they can make processing decisions based 
on their lag (i.e. drop messages to catch up).

> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
>                 Key: SAMZA-503
>                 URL: https://issues.apache.org/jira/browse/SAMZA-503
>             Project: Samza
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.8.0
>         Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
>            Reporter: Roger Hoover
>            Assignee: Yan Fang
>             Fix For: 0.9.0
>
>         Attachments: SAMZA-503.patch
>
>
> For slow jobs, the 
> KafkaSystemConsumerMetrics.%s-%s-messages-behind-high-watermark) gauge does 
> not get updated very often.
> To reproduce:
> * Create a job that processes one message and sleeps for 5 seconds
> * Create it's input topic but do not populate it yet
> * Start the job
> * Load 1000s of messages to it's input topic.  You can keep adding messages 
> with a "wait -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG 
> time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (a least every few 
> seconds)
> I think what's happening is that the BrokerProxy only updates the high 
> watermark when a consumer is ready for more messages.  When the job is so 
> slow, this rarely happens to the metric doesn't get updated. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SAMZA-503) Lag gauge very slow to update for slow jobs

Reply via email to