[ https://issues.apache.org/jira/browse/KAFKA-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598938#comment-16598938 ]
John Roesler commented on KAFKA-7363:
-------------------------------------

Hi Seweryn,

It's a little hard to say. For one thing, extra threads have some overhead of their own, but I agree with you that the bulk of the extra memory would come from the extra throughput you're able to drive through the application.

I haven't done any analysis of this before, so I'm just reasoning about this (as opposed to speaking from experience):

In the maximum case, doubling your thread count would double your memory usage. This is for an "ideal" CPU-bound process. In reality, there are shared resources, such as network and disk, that should prevent you from reaching this bound.

In the minimum case, if the app is already saturating some other resource, like network, disk, or even memory, then increasing the thread count would not add an appreciable amount of memory. The reason is that if the app is already saturating, say, the network, then more threads don't change that fact, and you still can't increase the throughput.

As far as a concrete answer to your question, I think you're unfortunately the only one with enough visibility to predict the memory load. It would be very dependent on your machines, network, the number of topics and partitions, the size of your records in each partition, what exactly your Streams app does, and even your broker configuration.

However, I'd propose the following experimental strategy to try and get a handle on it:

1. Start with one thread. Observe all the main resources (CPU, network I/O, disk I/O), but especially memory. For memory, pay particular attention to the memory used immediately after GC. You might want to turn on GC logging to help with this.

1b. Observe these metrics for long enough for a stable trend to emerge. This might be hours or even a day.

2. Add one more thread. Continue observing all the resources. As I said, in the ideal case, this should double your throughput and hence double your memory usage. Looking at how much all the metrics increase when you add the second thread should help you start building a model of the increase you should expect for each extra thread.

3. Continue the experiment, adding one thread each time. At some point, you'll notice that the throughput/memory increase drops off when you add an extra thread. This means that you've saturated one or more other resources. The metrics for those resources should corroborate this.

Note that, if nothing else, the CPU should become saturated once the number of threads is equal to the number of cores. Increasing the thread count much beyond this shouldn't help much.

I hope this helps!

> How num.stream.threads in streaming application influence memory consumption?
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-7363
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7363
>             Project: Kafka
>          Issue Type: Task
>          Components: streams
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Major
>
> Dears,
> How option _num.stream.threads_ in streaming application influence memory
> consumption?
> I see that by increasing num.stream.threads my application needs more memory.
> This is obvious, but it is not obvious how much I need to give it. Trial and
> error method does not work, as it seems to be highly dependent on forced
> throughput.
> I mean: higher load more memory is needed.
> Thanks for help and regards,
> Seweryn.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
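
For the "memory used immediately after GC" measurement in step 1 of the comment above, one possible approach (besides GC logging) is to read it from inside the JVM. The following is only a minimal sketch, not something taken from the ticket: the class name `PostGcHeap` and the helper `increasePerAddedThread` are hypothetical, while `MemoryPoolMXBean.getCollectionUsage()` is the standard `java.lang.management` call that reports a pool's usage after its most recent collection (it returns null for pools that do not support it).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for the experiment; not part of Kafka Streams.
final class PostGcHeap {

    /** Sums, over all heap pools, the bytes in use after each pool's most recent GC. */
    public static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /**
     * Given stable post-GC readings taken with 1, 2, 3, ... threads (in that order),
     * returns how much each added thread increased the reading.
     */
    public static long[] increasePerAddedThread(long[] postGcBytesByThreadCount) {
        if (postGcBytesByThreadCount.length < 2) {
            return new long[0];
        }
        long[] deltas = new long[postGcBytesByThreadCount.length - 1];
        for (int i = 1; i < postGcBytesByThreadCount.length; i++) {
            deltas[i - 1] = postGcBytesByThreadCount[i] - postGcBytesByThreadCount[i - 1];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // In a real run this would be sampled periodically inside the Streams app.
        System.out.println("Heap used after last GC (bytes): " + usedAfterLastGc());
    }
}
```

Recording one stable reading per thread count and passing them to `increasePerAddedThread` gives the per-thread increase described in step 2. A delta that shrinks sharply would be consistent with the drop-off described in step 3, though the other resource metrics still need to confirm which resource has saturated.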