momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013655142

##########
site2/docs/schema-overview.md:
##########
@@ -0,0 +1,154 @@
+---
+id: schema-overview
+title: Overview
+sidebar_label: "Overview"
+---
+
+This section introduces the following content:
+* [What is Pulsar Schema](#what-is-pulsar-schema)
+* [Why use it](#why-use-it)
+* [How it works](#how-it-works)
+* [Use case](#use-case)
+* [What's next?](#whats-next)
+
+## What is Pulsar Schema
+
+Pulsar messages are stored as unstructured byte arrays and the data structure (as known as schema) is applied to this data only when it's read. The schema serializes the bytes before they are published to a topic and deserializes them before they are delivered to the consumers, dictating which data types are recognized as valid for a given topic.
+
+Pulsar schema registry is a central repository to store the schema information, which enables producers/consumers to coordinate on the schema of a topic’s data through brokers.
+
+:::note
+
+Currently, Pulsar schema is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+
+:::
+
+## Why use it
+
+Type safety is extremely important in any application built around a messaging and streaming system. Raw bytes are flexible for data transfer, but the flexibility and neutrality come with a cost: you have to overlay data type checking and serialization/deserialization to ensure that the bytes fed into the system can be read and successfully consumed. In other words, you need to make sure the data intelligible and usable to applications.
+
+Pulsar schema resolves the pain points with the following capabilities:
+* enforces the data type safety when a topic has a schema defined. As a result, producers/consumers are only allowed to connect if they are using a “compatible” schema.
+* provides a central location for storing information about the schemas used within your organization, in turn greatly simplifies the sharing of this information across application teams.
+* serves as a single source of truth for all the message schemas used across all your services and development teams, which makes it easier for them to collaborate.
+* keeps data compatibility on-track between schema versions. When new schemas are uploaded, the new versions can be read by old consumers.
+* stored in the existing storage layer BookKeeper, no additional system required.
+
+## How it works

Review Comment:
Makes sense. I think it also helps users understand what schema is. This move can be considered and improved in the content review.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
