DaveDuggins commented on code in PR #15809: URL: https://github.com/apache/pulsar/pull/15809#discussion_r922398210
##########
site2/docs/architecture-overview.md:
##########
@@ -0,0 +1,143 @@
+---
+
+id: concepts-architecture-overview
+
+title: Architecture overview
+
+sidebar_label: Concepts
+
+---
+
+The following overview describes the components that make up a Pulsar cluster, from general to specific.
+
+### Instance
+
+***
+
+A Pulsar instance is composed of one or more Pulsar clusters. Clusters within an instance can [replicate](concepts-replication.md) data amongst themselves.
+
+### Cluster
+
+***
+
+In a Pulsar cluster:
+
+* One or more **brokers** handle and load balance incoming messages from **producers**, dispatch **messages** to **consumers**, communicate with the Pulsar **configuration store** to handle various coordination tasks, store messages in BookKeeper instances (aka **bookies**), rely on a cluster-specific ZooKeeper cluster for certain tasks, and more.
+
+* A BookKeeper cluster consisting of one or more bookies handles [persistent storage](#persistent-storage) of messages.
+
+* A ZooKeeper cluster specific to that cluster handles coordination tasks between Pulsar clusters.
+
+An instance-wide ZooKeeper cluster called the Configuration Store handles coordination tasks involving multiple clusters, for example [geo-replication](concepts-replication.md).
+
+For a guide to managing Pulsar clusters, see the [clusters](admin-api-clusters.md) guide.
+
+### Producer
+
+***
+
+A producer is a process that attaches to a topic and publishes messages to a Pulsar [broker](reference-terminology.md#broker). The Pulsar broker processes the messages.
+
+Refer to the [producer](concepts-producer.md) topic for more information.
+
+### Topic
+
+***
+
+As in other pub-sub systems, topics in Pulsar are named channels for transmitting messages from producers to consumers.
Topic names are URLs that have a well-defined structure:
+
+```http
+
+{persistent|non-persistent}://tenant/namespace/topic
+
+```
+
+| Topic name component | Description |
+|:--------------------|:-----------|
+| `persistent` / `non-persistent` | This identifies the type of topic. Pulsar supports two kinds of topics: [persistent](concepts-architecture-overview.md#persistent-storage) and [non-persistent](#non-persistent-topics). The default is persistent, so if you do not specify a type, the topic is persistent. With persistent topics, all messages are durably persisted on disks (if the broker is not standalone, messages are durably persisted on multiple disks), whereas data for non-persistent topics is not persisted to storage disks. |
+| `tenant` | The topic tenant within the instance. Tenants are essential to multi-tenancy in Pulsar, and can be spread across clusters. |
+| `namespace` | The administrative unit of the topic, which acts as a grouping mechanism for related topics. Most topic configuration is performed at the [namespace](#namespaces) level. Each tenant has one or more namespaces. |
+| `topic` | The final part of the name. Topic names have no special meaning in a Pulsar instance. |
+
+Refer to [topic](concepts-topic.md) for more information.
+
+### Consumer
+
+***
+
+A consumer is a process that attaches to a topic via a subscription and then receives messages.
+
+A consumer sends a [flow permit request](developing-binary-protocol.md#flow-control) to a broker to get messages. There is a queue at the consumer side to receive messages pushed from the broker. You can configure the queue size with the [`receiverQueueSize`](client-libraries-java.md#configure-consumer) parameter (the default size is `1000`). Each time `consumer.receive()` is called, a message is dequeued from the buffer.
+
+Refer to the [consumer](concepts-consumer.md) topic for more information.
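As an illustration only (not part of the PR), the topic name structure described in the Topic section above can be sketched as a small parser. The names `TopicName` and `parse_topic_name` are hypothetical helpers, not Pulsar APIs. The sketch assumes a name always carries all three of tenant, namespace, and topic, and it applies the documented default of `persistent` when no type is given.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    """Hypothetical container for the four components of a topic name."""
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(name: str) -> TopicName:
    """Split '{persistent|non-persistent}://tenant/namespace/topic'.

    Assumption: when the type prefix is missing, the topic is treated as
    persistent, as the table above states. Shorter forms are rejected.
    """
    domain, sep, rest = name.partition("://")
    if not sep:
        # No "://" found: fall back to the default topic type.
        domain, rest = "persistent", name
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic type: {domain!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got: {rest!r}")
    tenant, namespace, topic = parts
    return TopicName(domain, tenant, namespace, topic)
```

For example, `parse_topic_name("non-persistent://my-tenant/app1/orders")` yields a `TopicName` whose `namespace` field is `app1`; the tenant, namespace, and topic values here are made up for illustration.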
+
+### Broker
+
+***
+
+The **Pulsar message broker** is a stateless component that's primarily responsible for running two other components:
+
+* An HTTP server that exposes an {@inject: rest:REST:/} API for both administrative tasks and [topic lookup](concepts-clients.md#client-setup-phase) for producers and consumers. The producers connect to the brokers to publish messages and the consumers connect to the brokers to consume the messages.
+
+* A dispatcher, which is an asynchronous TCP server over a custom [binary protocol](developing-binary-protocol.md) used for all data transfers.
+
+Messages are typically dispatched out of a [managed ledger](#managed-ledgers) cache for the sake of performance, *unless* the backlog exceeds the cache size. If the backlog grows too large for the cache, the broker will start reading entries from BookKeeper.
+
+Finally, to support geo-replication on global topics, the broker manages replicators that tail the entries published in the local region and republish them to the remote region using the Pulsar [Java client library](client-libraries-java.md).
+
+> For a guide to managing Pulsar brokers, see the [brokers](admin-api-brokers.md) guide.
+
+### Namespace
+
+***
+
+A namespace is a logical nomenclature within a tenant. A tenant creates multiple namespaces via the [admin API](admin-api-namespaces.md#create). For instance, a tenant with different applications can create a separate namespace for each application. A namespace allows the application to create and manage a hierarchy of topics. For example, `my-tenant/app1` is a namespace for the application `app1` of the tenant `my-tenant`. You can create any number of [topics](#topics) under the namespace.
+
+### Metadata Store
+
+***
+
+The Pulsar metadata store maintains all the metadata of a Pulsar cluster, such as topic metadata, schema, broker load data, and so on.
Pulsar uses [Apache ZooKeeper](https://zookeeper.apache.org/) for metadata storage, cluster configuration, and coordination. The Pulsar metadata store can be deployed on a separate ZooKeeper cluster or an existing ZooKeeper cluster. You can use one ZooKeeper cluster for both Pulsar metadata store and [BookKeeper metadata store](https://bookkeeper.apache.org/docs/latest/getting-started/concepts/#metadata-storage). If you want to deploy Pulsar brokers connected to an existing BookKeeper cluster, you need to deploy separate ZooKeeper clusters for Pulsar metadata store and BookKeeper metadata store respectively.
+
+> Pulsar also supports more metadata backend services, including [ETCD](https://etcd.io/) and [RocksDB](http://rocksdb.org/) (for standalone Pulsar only).

Review Comment:
   It is prominent enough without highlighting. It's the first item on the page and there are several code snippets in the bullet statements, which draw attention. Leaving as is.