[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

GitBox Thu, 03 Nov 2022 23:12:34 -0700


momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013655142



##########
site2/docs/schema-overview.md:
##########
@@ -0,0 +1,154 @@
+---
+id: schema-overview
+title: Overview
+sidebar_label: "Overview"
+---
+
+This section introduces the following content:
+* [What is Pulsar Schema](#what-is-pulsar-schema)
+* [Why use it](#why-use-it)
+* [How it works](#how-it-works)
+* [Use case](#use-case)
+* [What's next?](#whats-next)
+
+## What is Pulsar Schema
+
+Pulsar messages are stored as unstructured byte arrays, and the data structure 
(also known as the schema) is applied to this data only when it's read. The schema 
serializes data into bytes before it is published to a topic and deserializes the 
bytes before they are delivered to consumers, dictating which data types are 
recognized as valid for a given topic.
+
+The Pulsar schema registry is a central repository that stores schema 
information, which enables producers and consumers to coordinate on the schema of a 
topic’s data through brokers.
+
+:::note
+
+Currently, Pulsar schema is only available for the [Java 
client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python 
client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+
+:::
+
+## Why use it
+
+Type safety is extremely important in any application built around a messaging 
and streaming system. Raw bytes are flexible for data transfer, but the 
flexibility and neutrality come with a cost: you have to overlay data type 
checking and serialization/deserialization to ensure that the bytes fed into 
the system can be read and successfully consumed. In other words, you need to 
make sure the data is intelligible and usable to applications.
+
+Pulsar schema resolves the pain points with the following capabilities:
+* enforces data type safety when a topic has a schema defined. As a 
result, producers/consumers are only allowed to connect if they are using a 
“compatible” schema.
+* provides a central location for storing information about the schemas used 
within your organization, which in turn greatly simplifies the sharing of this 
information across application teams.
+* serves as a single source of truth for all the message schemas used across 
all your services and development teams, which makes it easier for them to 
collaborate.
+* keeps data compatibility on track between schema versions. When new schemas 
are uploaded, the new versions can be read by old consumers. 
+* stores schemas in the existing storage layer, BookKeeper, so no additional 
system is required.
+
+## How it works

Review Comment:
   Makes sense. I think it also helps users understand what schema is. This 
move can be considered and improved in the content review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
