Hi Thunder,

> What we believe may be happening is that most of the topics have no
> backlog, but one topic has all the backlog (this is because one of the
> topics accounts for ~60% of the total message rate).  Could there be
> something inducing extra latency on processing the one topic with a backlog
> just having a bunch of other topics with NO backlog?

This seems very similar to this issue:
https://issues.apache.org/jira/browse/SAMZA-1599
This was fixed in https://github.com/apache/samza/pull/436, and the fix
should be available in the 0.14.1 version.
Would it be possible to try upgrading to 0.14.1? It should be backwards
compatible with 0.14.0.

If you'd like to try something without upgrading first, set
"job.container.single.thread.mode" to true. From the configuration reference
<https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html>:
"If set to true, samza will fallback to legacy single-threaded event loop.
Default is false, which enables the multithreading execution."
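
As a rough sketch, assuming your job is configured through the usual
.properties file (I don't know your exact setup, so adjust as needed), the
change would be a single line:

```properties
# Fall back to the legacy single-threaded event loop (the default is false)
job.container.single.thread.mode=true
```

Note that this is a workaround to help confirm the diagnosis; the actual fix
is in 0.14.1.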

Let us know if this doesn't help.

Thanks,
Prateek

On Fri, Jun 8, 2018 at 1:35 PM, Thunder Stumpges <tstump...@ntent.com>
wrote:

> We have a new samza job which we just put into production. This job
> processes many topics (~30) but the total rate is not that high (~1200/sec
> in aggregate). I am unable to get above ~700/sec and have a growing backlog.
>
> We are running samza 0.12 (I have an update to 0.14 that is not tested or
> pushed yet).  When we load tested with a single topic, we could easily do
> several thousand per second. The latency of a single message is about 0.5ms
> as recorded by our timer metric on our 'process' call.
>
> What we believe may be happening is that most of the topics have no
> backlog, but one topic has all the backlog (this is because one of the
> topics accounts for ~60% of the total message rate).  Could there be
> something inducing extra latency on processing the one topic with a backlog
> just having a bunch of other topics with NO backlog?
>
> Some things I have tried:
>
>
>   1.  Increasing thread pool (10->20->30), no change
>   2.  Going from 1 container to 2, no help (the two containers run at half
> the speed and total is the same)
>   3.  Increasing task.max.concurrency from 1 -> 2 -> 3  (this had some
> minor help going from 1 to 2, but not enough)
>   4.  Increasing fetch.threshold.bytes (currently at 100,000 and we have
> pretty small messages)
>
> Some observed metrics:
>
>
>   *   "Pending Messages" are > 0  (15+ on some partitions)
>   *   "Messages in flight" is almost always 0
>   *   Polls rate is ~50/sec
>   *   Message chooser "Choose Obj" is ~680-700/sec like our processing rate
>   *   Message chooser "choose null" is ~50/sec
>
> I'm somewhat at a loss because based on the actual processing latency we
> should easily be able to do 2000+ with just a small handful of threads.
>
> Thanks in advance. This is in production and I really need a solution.
> Thunder
>
>
