[ 
https://issues.apache.org/jira/browse/KAFKA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598938#comment-16598938
 ] 

John Roesler commented on KAFKA-7363:
-------------------------------------

Hi Seweryn,
 
It's a little hard to say. For one thing, extra threads have some overhead of 
their own, but I agree with you that the bulk of the extra memory would come 
from the extra throughput you're able to drive through the application.
 
I haven't done any analysis of this before, so I'm just reasoning about it (as 
opposed to speaking from experience):
 
In the maximum case, doubling your thread count would double your memory usage. 
This is for an "ideal" CPU-bound process. In reality, there are shared 
resources, such as network and disk, that should prevent you from reaching this 
bound.
 
In the minimum case, if the app is already saturating some other resource, like 
network, disk, or even memory, then increasing the thread count would not add 
an appreciable amount of memory. The reason is that if the app is already 
saturating, say, the network, then more threads don't change that fact, and you 
still can't increase the throughput.
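
To make the two bounds above concrete, here is a minimal sketch of the 
arithmetic. The class name, the method names, and the 512 MiB baseline are all 
made up for illustration; the only input taken from the reasoning above is that 
memory stays roughly flat in the minimum case and scales linearly with thread 
count in the maximum case.

```java
public class ThreadMemoryBounds {

    /**
     * Minimum case: some other resource is already saturated, so extra
     * threads add no appreciable memory. This ignores the small fixed
     * overhead each thread has on its own.
     */
    public static long lowerBoundBytes(long singleThreadBytes) {
        return singleThreadBytes;
    }

    /**
     * Maximum case: an "ideal" CPU-bound process, where throughput (and
     * hence memory) grows linearly with the thread count.
     */
    public static long upperBoundBytes(long singleThreadBytes, int threads) {
        return Math.multiplyExact(singleThreadBytes, (long) threads);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long oneThread = 512L * mib; // hypothetical: memory observed with one thread
        int threads = 4;             // hypothetical target thread count

        System.out.printf("Expected range for %d threads: %d MiB to %d MiB%n",
                threads,
                lowerBoundBytes(oneThread) / mib,
                upperBoundBytes(oneThread, threads) / mib);
    }
}
```

The real number should land somewhere between the two, depending on which 
shared resource saturates first.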
 
As far as a concrete answer to your question, I think you're unfortunately the 
only one with enough visibility to predict the memory load. It would be very 
dependent on your machines, network, the number of topics and partitions, the 
size of your records in each partition, what exactly your Streams app does, and 
even your broker configuration.
 
However, I'd propose the following experimental strategy to try and get a 
handle on it:
1. Start with one thread. Observe all the main resources (CPU, network I/O, 
disk I/O), but especially memory. For memory, pay particular attention to the 
memory used immediately after GC. You might want to turn on GC logging to help 
with this.
1b. Observe these metrics for long enough for a stable trend to emerge. This 
might be hours or even a day.
2. Add one more thread. Continue observing all the resources. As I said, in the 
ideal case, this should double your throughput and hence double your memory 
usage. Looking at how much each metric increases when you add the second 
thread should help you start building a model of the increase you should 
expect for each extra thread.
3. Continue the experiment, adding one thread each time. At some point, you'll 
notice that the throughput/memory increase drops off when you add an extra 
thread. This means that you've saturated one or more other resources. The 
metrics for those resources should corroborate this.
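
As a rough illustration of the steps above, the sketch below reads the heap 
usage recorded after the most recent GC through the standard 
java.lang.management API and computes the memory added by each extra thread. 
The class name, method names, and sample numbers are hypothetical. It only 
sees the heap of the JVM it runs in, so in practice you would call it from 
inside the Streams application (or read the same beans over JMX).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Arrays;

public class PostGcMemoryProbe {

    /**
     * Sums the usage that each heap pool reported after its own most recent
     * collection. Pools are collected at different times, so the total is an
     * approximation of "memory used immediately after GC".
     */
    public static long heapUsedAfterLastGcBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /**
     * Given stable post-GC readings for 1, 2, 3, ... threads, returns the
     * increase contributed by each added thread. A sharp drop in these
     * values suggests another resource has become saturated.
     */
    public static long[] marginalIncrease(long[] postGcByThreadCount) {
        long[] deltas = new long[Math.max(0, postGcByThreadCount.length - 1)];
        for (int i = 0; i < deltas.length; i++) {
            deltas[i] = postGcByThreadCount[i + 1] - postGcByThreadCount[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        System.out.println("Heap after last GC (bytes): " + heapUsedAfterLastGcBytes());

        // Hypothetical stable readings in MiB for 1, 2, 3 and 4 threads.
        long[] observed = {400, 780, 1100, 1150};
        System.out.println("Increase per added thread: "
                + Arrays.toString(marginalIncrease(observed)));
    }
}
```

For the GC logging itself, `-Xlog:gc*` (JDK 9 and later) or 
`-XX:+PrintGCDetails` (JDK 8) on the JVM command line should be enough.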
 
Note that, if nothing else, the CPU should become saturated once the number of 
threads is equal to the number of cores. Increasing the thread count much 
beyond this shouldn't help much.
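
If you want to encode that rule of thumb in the application's configuration, 
something like the following sketch would do it. The class and method names 
are hypothetical, and the hard cap at the core count is a simplification of 
"not much beyond"; only the num.stream.threads key comes from Kafka Streams. 
It uses plain java.util.Properties so it runs without the Kafka jars.

```java
import java.util.Properties;

public class StreamThreadConfig {

    /** Clamps the requested thread count to the range [1, cores]. */
    public static int cappedThreads(int requested, int cores) {
        return Math.max(1, Math.min(requested, cores));
    }

    /** Builds a Properties object with num.stream.threads capped at the core count. */
    public static Properties withStreamThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        Properties props = new Properties();
        props.setProperty("num.stream.threads",
                Integer.toString(cappedThreads(requested, cores)));
        return props;
    }

    public static void main(String[] args) {
        // Ask for 16 threads; the result is capped at this machine's core count.
        Properties props = withStreamThreads(16);
        System.out.println("num.stream.threads = "
                + props.getProperty("num.stream.threads"));
    }
}
```

In a real application these properties would be passed to the KafkaStreams 
constructor together with the usual application.id and bootstrap.servers 
settings.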
 
I hope this helps!

> How num.stream.threads in streaming application influence memory consumption?
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-7363
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7363
>             Project: Kafka
>          Issue Type: Task
>          Components: streams
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Major
>
> Dears,
> How does the option _num.stream.threads_ in a streaming application influence 
> memory consumption?
> I see that by increasing num.stream.threads my application needs more memory.
> This is obvious, but it is not obvious how much I need to give it. The trial 
> and error method does not work, as it seems to be highly dependent on the 
> forced throughput.
> I mean: the higher the load, the more memory is needed.
> Thanks for help and regards,
> Seweryn.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
