[
https://issues.apache.org/jira/browse/SAMZA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281030#comment-14281030
]
Chris Riccomini commented on SAMZA-503:
---------------------------------------
bq. Why does the SimpleConsumer have a separate consumerID? (It does have a
clientID of its own - which is a string). Looks like the older version did not
have an 'int consumerID' param. Pretty weird
No kidding. This is bizarre. [~jjkoshy]/[~guozhang], any idea why
SimpleConsumer.earliestOrLatestOffset takes an int as the consumer id?
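For reference, here is a minimal sketch of the lookup in question, assuming the Kafka 0.8 Scala SimpleConsumer API; the host, port, client id, topic, and the -1 consumer id are illustrative placeholders, not values taken from Samza's code:
{code}
import kafka.api.OffsetRequest
import kafka.common.TopicAndPartition
import kafka.consumer.SimpleConsumer

// The string client id goes to the constructor, yet earliestOrLatestOffset
// separately takes an Int consumer id (placeholder -1 here) - the oddity
// being discussed above.
val consumer = new SimpleConsumer("broker-host", 9092, 30000, 64 * 1024, "samza-admin")
val highWatermark = consumer.earliestOrLatestOffset(
  TopicAndPartition("some-topic", 0),
  OffsetRequest.LatestTime,
  -1)
{code}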
bq. Why is this the case ? Don't offsets imply some kind of ordering in some
sort of collection of objects ? What is the side effect of imposing ordering on
the offsets ? (Also which input system stream does not hold this currently ? )
I think we can define them however we want. If we have a stronger definition of
offsets (e.g. they're ordered longs, not strings), then Samza has much better
control of things. The trade-off is that we could be excluding other systems
that might have unsorted GUIDs or byte arrays as their offsets. I haven't seen
a real-world example of this, but [~martinkl] mentioned recently that
PostgreSQL's changelog replication mechanism uses unordered GUIDs as its
offsets.
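To make the trade-off concrete, here is a hypothetical sketch of the kind of lag arithmetic a stronger offset definition would allow; with opaque GUID offsets this subtraction simply has no meaning:
{code}
// Hypothetical helper: lag is only well-defined when offsets are ordered numbers.
def messagesBehindHighWatermark(highWatermark: Long, nextOffsetToRead: Long): Long =
  math.max(0L, highWatermark - nextOffsetToRead)
{code}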
> Lag gauge very slow to update for slow jobs
> -------------------------------------------
>
> Key: SAMZA-503
> URL: https://issues.apache.org/jira/browse/SAMZA-503
> Project: Samza
> Issue Type: Bug
> Components: metrics
> Affects Versions: 0.8.0
> Environment: Mac OS X, Oracle Java 7, ProcessJobFactory
> Reporter: Roger Hoover
> Assignee: Yan Fang
> Fix For: 0.9.0
>
> Attachments: SAMZA-503.patch
>
>
> For slow jobs, the
> KafkaSystemConsumerMetrics %s-%s-messages-behind-high-watermark gauge does
> not get updated very often.
> To reproduce:
> * Create a job that processes one message and sleeps for 5 seconds (a minimal
> task sketch follows these steps)
> * Create its input topic but do not populate it yet
> * Start the job
> * Load 1000s of messages into its input topic. You can keep adding messages
> with a "watch -n 1 <kafka console producer command>"
> What happens:
> * Run jconsole to view the JMX metrics
> * The %s-%s-messages-behind-high-watermark gauge will stay at 0 for a LONG
> time (~10 minutes?) before finally updating.
> What should happen:
> * The gauge should get updated at a reasonable interval (at least every few
> seconds)
> I think what's happening is that the BrokerProxy only updates the high
> watermark when a consumer is ready for more messages. When the job is this
> slow, that rarely happens, so the metric doesn't get updated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)