This is an automated email from the ASF dual-hosted git repository.
bbejeck pushed a commit to branch 2.0
in repository https://gitbox.apache.org/repos/asf/kafka.git
The following commit(s) were added to refs/heads/2.0 by this push:
new a38c654 port paragrpah from CP docs (#7808)
a38c654 is described below
commit a38c65475f17b55ae60afddfdfaa9ca19d1653b0
Author: A. Sophie Blee-Goldman <[email protected]>
AuthorDate: Mon Dec 9 13:35:17 2019 -0800
port paragrpah from CP docs (#7808)
The AK Streams architecture docs should explain how the maximum parallelism
is determined
Reviewers: Bill Bejeck <[email protected]>
---
docs/streams/architecture.html | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/docs/streams/architecture.html b/docs/streams/architecture.html
index 8bc3156..7efd7ea 100644
--- a/docs/streams/architecture.html
+++ b/docs/streams/architecture.html
@@ -66,6 +66,14 @@
</p>
<p>
+ Slightly simplified, the maximum parallelism at which your application
may run is bounded by the maximum number of stream tasks, which itself is
determined by
+ the maximum number of partitions of the input topic(s) the application is
reading from. For example, if your input topic has 5 partitions, then you can
run up to 5
+ application instances. These instances will collaboratively process
the topic’s data. If you run a larger number of app instances than partitions
of the input
+ topic, the “excess” app instances will launch but remain idle;
however, if one of the busy instances goes down, one of the idle instances will
resume the former’s
+ work.
+ </p>
+
+ <p>
It is important to understand that Kafka Streams is not a resource
manager, but a library that "runs" anywhere its stream processing application
runs.
Multiple instances of the application are executed either on the same
machine, or spread across multiple machines and tasks can be distributed
automatically
by the library to those running application instances. The assignment
of partitions to tasks never changes; if an application instance fails, all its
assigned