RobertIndie commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012464500
##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
sidebar_label: "Get started"
---
-This chapter introduces Pulsar schemas and explains why they are important.
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
-Type safety is extremely important in any application built around a message
bus like Pulsar.
-Producers and consumers need some kind of mechanism for coordinating types at
the topic level to avoid various potential problems arising. For example,
serialization and deserialization issues.
+This hands-on tutorial provides instructions and examples on how to construct
and customize schemas.
-Applications typically adopt one of the following approaches to guarantee type
safety in messaging. Both approaches are available in Pulsar, and you're free
to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java
client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python
client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string
schema](schema-understand.md#primitive-type) and use it to produce and consume
messages in Java.
-### Client-side approach
+1. Create a producer with a string schema and send messages.
-Producers and consumers are responsible for not only serializing and
deserializing messages (which consist of raw bytes) but also "knowing" which
types are being transmitted via which topics.
+ ```java
+ Producer<String> producer = client.newProducer(Schema.STRING).create();
+ producer.newMessage().value("Hello Pulsar!").send();
+ ```
-If a producer is sending temperature sensor data on the topic `topic-1`,
consumers of that topic will run into trouble if they attempt to parse that
data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.
-Producers and consumers can send and receive messages consisting of raw byte
arrays and leave all type safety enforcement to the application on an
"out-of-band" basis.
+ ```java
+ Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+ consumer.receive();
+ ```
-### Server-side approach
+## Construct a key/value schema
-Producers and consumers inform the system which data types can be transmitted
via the topic.
+This example shows how to construct a [key/value
schema](schema-understand.md#keyvalue-schema) and use it to produce and consume
messages in Java.
-With this approach, the messaging system enforces type safety and ensures that
producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
-Pulsar has a built-in **schema registry** that enables clients to upload data
schemas on a per-topic basis. Those schemas dictate which data types are
recognized as valid for that topic.
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.INLINE
+ );
+ ```
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and
sends bytes as outputs. While data has meaning beyond bytes, you need to parse
data and might encounter parse exceptions which mainly occur in the following
situations:
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+ ```
-* The field does not exist
+3. Produce messages using a key/value schema.
-* The field type has changed (for example, `string` is changed to `int`)
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
Review Comment:
```suggestion
```
The schema was already created in the previous step, so it does not need to be repeated here.
##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
sidebar_label: "Get started"
---
-This chapter introduces Pulsar schemas and explains why they are important.
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
-Type safety is extremely important in any application built around a message
bus like Pulsar.
-Producers and consumers need some kind of mechanism for coordinating types at
the topic level to avoid various potential problems arising. For example,
serialization and deserialization issues.
+This hands-on tutorial provides instructions and examples on how to construct
and customize schemas.
-Applications typically adopt one of the following approaches to guarantee type
safety in messaging. Both approaches are available in Pulsar, and you're free
to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java
client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python
client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string
schema](schema-understand.md#primitive-type) and use it to produce and consume
messages in Java.
-### Client-side approach
+1. Create a producer with a string schema and send messages.
-Producers and consumers are responsible for not only serializing and
deserializing messages (which consist of raw bytes) but also "knowing" which
types are being transmitted via which topics.
+ ```java
+ Producer<String> producer = client.newProducer(Schema.STRING).create();
+ producer.newMessage().value("Hello Pulsar!").send();
+ ```
-If a producer is sending temperature sensor data on the topic `topic-1`,
consumers of that topic will run into trouble if they attempt to parse that
data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.
-Producers and consumers can send and receive messages consisting of raw byte
arrays and leave all type safety enforcement to the application on an
"out-of-band" basis.
+ ```java
+ Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+ consumer.receive();
+ ```
-### Server-side approach
+## Construct a key/value schema
-Producers and consumers inform the system which data types can be transmitted
via the topic.
+This example shows how to construct a [key/value
schema](schema-understand.md#keyvalue-schema) and use it to produce and consume
messages in Java.
-With this approach, the messaging system enforces type safety and ensures that
producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
-Pulsar has a built-in **schema registry** that enables clients to upload data
schemas on a per-topic basis. Those schemas dictate which data types are
recognized as valid for that topic.
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.INLINE
+ );
+ ```
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and
sends bytes as outputs. While data has meaning beyond bytes, you need to parse
data and might encounter parse exceptions which mainly occur in the following
situations:
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+ ```
-* The field does not exist
+3. Produce messages using a key/value schema.
-* The field type has changed (for example, `string` is changed to `int`)
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
-There are a few methods to prevent and overcome these exceptions, for example,
you can catch exceptions when parsing errors, which makes code hard to
maintain; or you can adopt a schema management system to perform schema
evolution, not to break downstream applications, and enforces type safety to
max extend in the language you are using, the solution is Pulsar Schema.
+ Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+ .topic(TOPIC)
+ .create();
-Pulsar schema enables you to use language-specific types of data when
constructing and handling messages from simple types like `string` to more
complex application-specific types.
+ final int key = 100;
+ final String value = "value-100";
+
+ // send the key/value message
+ producer.newMessage()
+ .value(new KeyValue(key, value))
+ .send();
+ ```
+
+4. Consume messages using a key/value schema.
+
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+
+ Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+ ...
+ .topic(TOPIC)
+ .subscriptionName(SubscriptionName).subscribe();
+
+ // receive key/value pair
+ Message<KeyValue<Integer, String>> msg = consumer.receive();
+ KeyValue<Integer, String> kv = msg.getValue();
+ ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct
schema](schema-understand.md#struct-schema) and use it to produce and consume
messages using different methods.
+
+````mdx-code-block
+<Tabs
+ defaultValue="static"
+
values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct`
in Go, or classes generated by Avro or Protobuf tools.
+
+**Example**
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro
library. The schema definition is the schema data stored as a part of the
`SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+ ```java
+ @Builder
+ @AllArgsConstructor
+ @NoArgsConstructor
+ public static class User {
+ String name;
+ int age;
+ }
+ ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+ ```java
+ Producer<User> producer =
client.newProducer(Schema.AVRO(User.class)).create();
+
producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+ ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+ ```java
+ Consumer<User> consumer =
client.newConsumer(Schema.AVRO(User.class)).subscribe();
+ User user = consumer.receive().getValue();
+ ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this
method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate
a generic struct using `GenericRecordBuilder` and consume messages into
`GenericRecord`.
**Example**
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+ ```java
+ RecordSchemaBuilder recordSchemaBuilder =
SchemaBuilder.record("schemaName");
+ recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+ SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+ Producer<GenericRecord> producer =
client.newProducer(Schema.generic(schemaInfo)).create();
+ ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+ ```java
+ producer.newMessage().value(schema.newRecordBuilder()
+ .set("intField", 32)
+ .build()).send();
+ ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example**
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+ ```java
+ @Builder
+ @AllArgsConstructor
+ @NoArgsConstructor
+ public static class User {
+ String name;
+ int age;
+ }
+ ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+ ```java
+ SchemaDefinition<User> schemaDefinition =
SchemaDefinition.<User>builder().withPojo(User.class).build();
+ Producer<User> producer =
client.newProducer(Schema.AVRO(schemaDefinition)).create();
+
producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+ ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+ ```java
+ SchemaDefinition<User> schemaDefinition =
SchemaDefinition.<User>builder().withPojo(User.class).build();
+ Consumer<User> consumer =
client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+ User user = consumer.receive().getValue();
+ ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to
transmit it over a Pulsar topic.
```java
-public class User {
- String name;
- int age;
+public class SensorReading {
+ public float temperature;
+
+ public SensorReading(float temperature) {
+ this.temperature = temperature;
+ }
+
+ // A no-arg constructor is required
+ public SensorReading() {
+ }
+
+ public float getTemperature() {
+ return temperature;
+ }
+
+ public void setTemperature(float temperature) {
+ this.temperature = temperature;
+ }
}
```
-When constructing a producer with the _User_ class, you can specify a schema
or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer =
client.newProducer(JSONSchema.of(SensorReading.class))
Review Comment:
```suggestion
Producer<SensorReading> producer =
client.newProducer(AvroSchema.of(SensorReading.class))
```
##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
sidebar_label: "Get started"
---
-This chapter introduces Pulsar schemas and explains why they are important.
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
-Type safety is extremely important in any application built around a message
bus like Pulsar.
-Producers and consumers need some kind of mechanism for coordinating types at
the topic level to avoid various potential problems arising. For example,
serialization and deserialization issues.
+This hands-on tutorial provides instructions and examples on how to construct
and customize schemas.
-Applications typically adopt one of the following approaches to guarantee type
safety in messaging. Both approaches are available in Pulsar, and you're free
to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java
client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python
client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string
schema](schema-understand.md#primitive-type) and use it to produce and consume
messages in Java.
-### Client-side approach
+1. Create a producer with a string schema and send messages.
-Producers and consumers are responsible for not only serializing and
deserializing messages (which consist of raw bytes) but also "knowing" which
types are being transmitted via which topics.
+ ```java
+ Producer<String> producer = client.newProducer(Schema.STRING).create();
+ producer.newMessage().value("Hello Pulsar!").send();
+ ```
-If a producer is sending temperature sensor data on the topic `topic-1`,
consumers of that topic will run into trouble if they attempt to parse that
data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.
-Producers and consumers can send and receive messages consisting of raw byte
arrays and leave all type safety enforcement to the application on an
"out-of-band" basis.
+ ```java
+ Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+ consumer.receive();
+ ```
-### Server-side approach
+## Construct a key/value schema
-Producers and consumers inform the system which data types can be transmitted
via the topic.
+This example shows how to construct a [key/value
schema](schema-understand.md#keyvalue-schema) and use it to produce and consume
messages in Java.
-With this approach, the messaging system enforces type safety and ensures that
producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
-Pulsar has a built-in **schema registry** that enables clients to upload data
schemas on a per-topic basis. Those schemas dictate which data types are
recognized as valid for that topic.
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.INLINE
+ );
+ ```
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and
sends bytes as outputs. While data has meaning beyond bytes, you need to parse
data and might encounter parse exceptions which mainly occur in the following
situations:
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+ ```
-* The field does not exist
+3. Produce messages using a key/value schema.
-* The field type has changed (for example, `string` is changed to `int`)
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
-There are a few methods to prevent and overcome these exceptions, for example,
you can catch exceptions when parsing errors, which makes code hard to
maintain; or you can adopt a schema management system to perform schema
evolution, not to break downstream applications, and enforces type safety to
max extend in the language you are using, the solution is Pulsar Schema.
+ Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+ .topic(TOPIC)
+ .create();
-Pulsar schema enables you to use language-specific types of data when
constructing and handling messages from simple types like `string` to more
complex application-specific types.
+ final int key = 100;
+ final String value = "value-100";
+
+ // send the key/value message
+ producer.newMessage()
+ .value(new KeyValue(key, value))
+ .send();
+ ```
+
+4. Consume messages using a key/value schema.
+
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+
+ Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+ ...
+ .topic(TOPIC)
+ .subscriptionName(SubscriptionName).subscribe();
+
+ // receive key/value pair
+ Message<KeyValue<Integer, String>> msg = consumer.receive();
+ KeyValue<Integer, String> kv = msg.getValue();
+ ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct
schema](schema-understand.md#struct-schema) and use it to produce and consume
messages using different methods.
+
+````mdx-code-block
+<Tabs
+ defaultValue="static"
+
values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct`
in Go, or classes generated by Avro or Protobuf tools.
+
+**Example**
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro
library. The schema definition is the schema data stored as a part of the
`SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+ ```java
+ @Builder
+ @AllArgsConstructor
+ @NoArgsConstructor
+ public static class User {
+ String name;
+ int age;
+ }
+ ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+ ```java
+ Producer<User> producer =
client.newProducer(Schema.AVRO(User.class)).create();
+
producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+ ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+ ```java
+ Consumer<User> consumer =
client.newConsumer(Schema.AVRO(User.class)).subscribe();
+ User user = consumer.receive().getValue();
+ ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this
method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate
a generic struct using `GenericRecordBuilder` and consume messages into
`GenericRecord`.
**Example**
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+ ```java
+ RecordSchemaBuilder recordSchemaBuilder =
SchemaBuilder.record("schemaName");
+ recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+ SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+ Producer<GenericRecord> producer =
client.newProducer(Schema.generic(schemaInfo)).create();
+ ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+ ```java
+ producer.newMessage().value(schema.newRecordBuilder()
+ .set("intField", 32)
+ .build()).send();
+ ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example**
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+ ```java
+ @Builder
+ @AllArgsConstructor
+ @NoArgsConstructor
+ public static class User {
+ String name;
+ int age;
+ }
+ ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+ ```java
+ SchemaDefinition<User> schemaDefinition =
SchemaDefinition.<User>builder().withPojo(User.class).build();
+ Producer<User> producer =
client.newProducer(Schema.AVRO(schemaDefinition)).create();
+
producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+ ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+ ```java
+ SchemaDefinition<User> schemaDefinition =
SchemaDefinition.<User>builder().withPojo(User.class).build();
+ Consumer<User> consumer =
client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+ User user = consumer.receive().getValue();
+ ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to
transmit it over a Pulsar topic.
```java
-public class User {
- String name;
- int age;
+public class SensorReading {
+ public float temperature;
+
+ public SensorReading(float temperature) {
+ this.temperature = temperature;
+ }
+
+ // A no-arg constructor is required
+ public SensorReading() {
+ }
+
+ public float getTemperature() {
+ return temperature;
+ }
+
+ public void setTemperature(float temperature) {
+ this.temperature = temperature;
+ }
}
```
-When constructing a producer with the _User_ class, you can specify a schema
or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer =
client.newProducer(JSONSchema.of(SensorReading.class))
+ .topic("sensor-readings")
+ .create();
+```
+
+The following schema formats are currently available for Java:
+
+* No schema or the byte array schema (which can be applied using
`Schema.BYTES`):
+
+ ```java
+ Producer<byte[]> bytesProducer = client.newProducer(Schema.BYTES)
+ .topic("some-raw-bytes-topic")
+ .create();
+ ```
+
+ Or, equivalently:
+
+ ```java
+ Producer<byte[]> bytesProducer = client.newProducer()
+ .topic("some-raw-bytes-topic")
+ .create();
+ ```
+
+* `String` for normal UTF-8-encoded string data. Apply the schema using
`Schema.STRING`:
+
+ ```java
+ Producer<String> stringProducer = client.newProducer(Schema.STRING)
+ .topic("some-string-topic")
+ .create();
+ ```
+
+* Create JSON schemas for POJOs using `Schema.JSON`. The following is an
example.
+
+ ```java
+ Producer<MyPojo> pojoProducer = client.newProducer(Schema.JSON(MyPojo.class))
+ .topic("some-pojo-topic")
+ .create();
+ ```
+
+* Generate Protobuf schemas using `Schema.PROTOBUF`. The following example
shows how to create the Protobuf schema and use it to instantiate a new
producer:
+
+ ```java
+ Producer<MyProtobuf> protobufProducer =
client.newProducer(Schema.PROTOBUF(MyProtobuf.class))
+ .topic("some-protobuf-topic")
+ .create();
+ ```
+
+* Define Avro schemas with `Schema.AVRO`. The following code snippet
demonstrates how to create and use Avro schema.
-### Without schema
+ ```java
+ Producer<MyAvro> avroProducer = client.newProducer(Schema.AVRO(MyAvro.class))
+ .topic("some-avro-topic")
+ .create();
+ ```
-If you construct a producer without specifying a schema, then the producer can
only produce messages of type `byte[]`. If you have a POJO class, you need to
serialize the POJO into bytes before sending messages.
-**Example**
+### Avro schema using C++
+
+- The following example shows how to create a producer with an Avro schema.
+
+ ```cpp
+ static const std::string exampleSchema =
+ "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
+
"\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
+ Producer producer;
+ ProducerConfiguration producerConf;
+ producerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
+ client.createProducer("topic-avro", producerConf, producer);
+ ```
+
+- The following example shows how to create a consumer with an Avro schema.
+
+ ```cpp
+ static const std::string exampleSchema =
+ "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
+
"\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
+ ConsumerConfiguration consumerConf;
+ Consumer consumer;
+ consumerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
+ client.subscribe("topic-avro", "sub-2", consumerConf, consumer)
+ ```
+
+### ProtobufNative schema using C++
Review Comment:
Do we also need to talk about `ProtobufNative schema using Java`?
##########
site2/docs/client-libraries-cpp.md:
##########
@@ -412,80 +412,4 @@ For complete examples, refer to [C++ client
examples](https://github.com/apache/
## Schema
-This section describes some examples about schema. For more information about
schema, see [Pulsar schema](schema-get-started.md).
-
-### Avro schema
-
-- The following example shows how to create a producer with an Avro schema.
-
- ```cpp
- static const std::string exampleSchema =
- "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
-
"\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
- Producer producer;
- ProducerConfiguration producerConf;
- producerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
- client.createProducer("topic-avro", producerConf, producer);
- ```
-
-- The following example shows how to create a consumer with an Avro schema.
-
- ```cpp
- static const std::string exampleSchema =
- "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
-
"\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
- ConsumerConfiguration consumerConf;
- Consumer consumer;
- consumerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
- client.subscribe("topic-avro", "sub-2", consumerConf, consumer)
- ```
-
-### ProtobufNative schema
-
-The following example shows how to create a producer and a consumer with a
ProtobufNative schema.
-
-1. Generate the `User` class using Protobuf3 or later versions.
-
- ```protobuf
- syntax = "proto3";
-
- message User {
- string name = 1;
- int32 age = 2;
- }
- ```
-
-2. Include the `ProtobufNativeSchema.h` in your source code. Ensure the
Protobuf dependency has been added to your project.
-
- ```cpp
- #include <pulsar/ProtobufNativeSchema.h>
- ```
-
-3. Create a producer to send a `User` instance.
-
- ```cpp
- ProducerConfiguration producerConf;
- producerConf.setSchema(createProtobufNativeSchema(User::GetDescriptor()));
- Producer producer;
- client.createProducer("topic-protobuf", producerConf, producer);
- User user;
- user.set_name("my-name");
- user.set_age(10);
- std::string content;
- user.SerializeToString(&content);
- producer.send(MessageBuilder().setContent(content).build());
- ```
-
-4. Create a consumer to receive a `User` instance.
-
- ```cpp
- ConsumerConfiguration consumerConf;
- consumerConf.setSchema(createProtobufNativeSchema(User::GetDescriptor()));
- consumerConf.setSubscriptionInitialPosition(InitialPositionEarliest);
- Consumer consumer;
- client.subscribe("topic-protobuf", "my-sub", consumerConf, consumer);
- Message msg;
- consumer.receive(msg);
- User user2;
- user2.ParseFromArray(msg.getData(), msg.getLength());
- ```
+To work with [Pulsar schema](schema-overview.md) using C++ clients, see
[Schema - Get started](schema-get-started.md). For specific schema types that
C++ clients support, see
[code](https://github.com/apache/pulsar-client-cpp/blob/main/include/pulsar/Schema.h#L51-L132).
Review Comment:
Can we link to the C++ docs instead? This link is not stable: as the code
changes, the line anchors will point to different lines.
##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
| `keyvalue` | Represents a complex type of a key/value pair. |
| `struct` | Handles structured data. It supports `AvroBaseStructSchema` and
`ProtobufNativeSchema`. |
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value.
+#### `keyvalue` schema
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key
schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value.
Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value
schema together.
-Pulsar provides the following methods to encode a key/value pair in messages:
+You can choose the encoding type when constructing the key/value schema.:
+* `INLINE` - Key/value pairs are encoded together in the message payload.
+* `SEPARATED` - see [Construct a key/value
schema](schema-get-started.md#construct-a-keyvalue-schema).
Review Comment:
Better to briefly explain `SEPARATED` here.
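To make the difference concrete, here is a minimal plain-Java sketch of the two encoding types. It does not use the Pulsar client; the class name `KeyValueEncodingSketch`, its helper methods, and the length-prefixed `INLINE` layout are assumptions for illustration only. The idea it models: with `INLINE`, key and value travel together in the message payload, while with `SEPARATED`, the key is carried as the message key and only the value is in the payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: a plain-Java model of the two encoding types,
// not the Pulsar client API or its exact wire format.
final class KeyValueEncodingSketch {

    // INLINE: key and value are encoded together in the payload.
    // Assumed layout: [key length][key bytes][value length][value bytes].
    static byte[] encodeInline(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(4 + key.length + 4 + value.length);
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
        return buf.array();
    }

    // SEPARATED: element 0 models the message key, element 1 models the payload.
    static byte[][] encodeSeparated(byte[] key, byte[] value) {
        return new byte[][] { key.clone(), value.clone() };
    }

    public static void main(String[] args) {
        byte[] key = ByteBuffer.allocate(4).putInt(100).array();      // INT32 key
        byte[] value = "value-100".getBytes(StandardCharsets.UTF_8);  // STRING value

        byte[] inlinePayload = encodeInline(key, value);
        byte[][] separated = encodeSeparated(key, value);

        System.out.println("INLINE payload length: " + inlinePayload.length);
        System.out.println("SEPARATED key length: " + separated[0].length);
        System.out.println("SEPARATED payload length: " + separated[1].length);
    }
}
```

A one-sentence description along these lines next to the `SEPARATED` bullet would probably be enough for the doc; the sketch is only meant to show what that sentence should convey.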
##########
site2/docs/schema-understand.md:
##########
@@ -43,74 +41,36 @@ This is the `SchemaInfo` of a string.
## Schema type
Pulsar supports various schema types, which are mainly divided into two
categories:
-
-* Primitive type
-
-* Complex type
+* [Primitive type](#primitive-type)
+* [Complex type](#complex-type)
### Primitive type
-Currently, Pulsar supports the following primitive types:
-
-| Primitive Type | Description |
-|---|---|
-| `BOOLEAN` | A binary value |
-| `INT8` | A 8-bit signed integer |
-| `INT16` | A 16-bit signed integer |
-| `INT32` | A 32-bit signed integer |
-| `INT64` | A 64-bit signed integer |
-| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
-| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
-| `BYTES` | A sequence of 8-bit unsigned bytes |
-| `STRING` | A Unicode character sequence |
-| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant
in time with millisecond precision. <br />It stores the number of milliseconds
since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
-| INSTANT | A single instantaneous point on the time-line with nanoseconds
precision|
-| LOCAL_DATE | An immutable date-time object that represents a date, often
viewed as year-month-day|
-| LOCAL_TIME | An immutable date-time object that represents a time, often
viewed as hour-minute-second. Time is represented to nanosecond precision.|
-| LOCAL_DATE_TIME | An immutable date-time object that represents a date-time,
often viewed as year-month-day-hour-minute-second |
-
-For primitive types, Pulsar does not store any schema data in `SchemaInfo`.
The `type` in `SchemaInfo` is used to determine how to serialize and
deserialize the data.
+The following table outlines the primitive types that Pulsar schema supports,
and the conversions between **schema types** and **language-specific primitive
types**.
+
+| Primitive Type | Description | Java Type| Python Type | Go Type |
+|---|---|---|---|---|
+| `BOOLEAN` | A binary value | boolean | bool | bool |
+| `INT8` | A 8-bit signed integer | byte | | int8 |
+| `INT16` | A 16-bit signed integer | short | | int16 |
+| `INT32` | A 32-bit signed integer | int | | int32 |
+| `INT64` | A 64-bit signed integer | long | | int64 |
Review Comment:
```suggestion
| `INT8` | An 8-bit signed integer | byte | int | int8 |
| `INT16` | A 16-bit signed integer | short | int | int16 |
| `INT32` | A 32-bit signed integer | int | int | int32 |
| `INT64` | A 64-bit signed integer | long | int | int64 |
```
##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
sidebar_label: "Get started"
---
-This chapter introduces Pulsar schemas and explains why they are important.
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
-Type safety is extremely important in any application built around a message
bus like Pulsar.
-Producers and consumers need some kind of mechanism for coordinating types at
the topic level to avoid various potential problems arising. For example,
serialization and deserialization issues.
+This hands-on tutorial provides instructions and examples on how to construct
and customize schemas.
-Applications typically adopt one of the following approaches to guarantee type
safety in messaging. Both approaches are available in Pulsar, and you're free
to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java
client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python
client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string
schema](schema-understand.md#primitive-type) and use it to produce and consume
messages in Java.
-### Client-side approach
+1. Create a producer with a string schema and send messages.
-Producers and consumers are responsible for not only serializing and
deserializing messages (which consist of raw bytes) but also "knowing" which
types are being transmitted via which topics.
+ ```java
+ Producer<String> producer = client.newProducer(Schema.STRING).create();
+ producer.newMessage().value("Hello Pulsar!").send();
+ ```
-If a producer is sending temperature sensor data on the topic `topic-1`,
consumers of that topic will run into trouble if they attempt to parse that
data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.
-Producers and consumers can send and receive messages consisting of raw byte
arrays and leave all type safety enforcement to the application on an
"out-of-band" basis.
+ ```java
+ Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+ consumer.receive();
+ ```
-### Server-side approach
+## Construct a key/value schema
-Producers and consumers inform the system which data types can be transmitted
via the topic.
+This example shows how to construct a [key/value
schema](schema-understand.md#keyvalue-schema) and use it to produce and consume
messages in Java.
-With this approach, the messaging system enforces type safety and ensures that
producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
-Pulsar has a built-in **schema registry** that enables clients to upload data
schemas on a per-topic basis. Those schemas dictate which data types are
recognized as valid for that topic.
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.INLINE
+ );
+ ```
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and
sends bytes as outputs. While data has meaning beyond bytes, you need to parse
data and might encounter parse exceptions which mainly occur in the following
situations:
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+ ```
-* The field does not exist
+3. Produce messages using a key/value schema.
-* The field type has changed (for example, `string` is changed to `int`)
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
-There are a few methods to prevent and overcome these exceptions, for example,
you can catch exceptions when parsing errors, which makes code hard to
maintain; or you can adopt a schema management system to perform schema
evolution, not to break downstream applications, and enforces type safety to
max extend in the language you are using, the solution is Pulsar Schema.
+ Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+ .topic(TOPIC)
+ .create();
-Pulsar schema enables you to use language-specific types of data when
constructing and handling messages from simple types like `string` to more
complex application-specific types.
+ final int key = 100;
+ final String value = "value-100";
+
+ // send the key/value message
+ producer.newMessage()
+ .value(new KeyValue(key, value))
+ .send();
+ ```
+
+4. Consume messages using a key/value schema.
+
+ ```java
+ Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+ Schema.INT32,
+ Schema.STRING,
+ KeyValueEncodingType.SEPARATED
+ );
+
+ Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+ ...
+ .topic(TOPIC)
+ .subscriptionName(SubscriptionName).subscribe();
+
+ // receive key/value pair
+ Message<KeyValue<Integer, String>> msg = consumer.receive();
+ KeyValue<Integer, String> kv = msg.getValue();
+ ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct
schema](schema-understand.md#struct-schema) and use it to produce and consume
messages using different methods.
+
+````mdx-code-block
+<Tabs
+ defaultValue="static"
+
values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct`
in Go, or classes generated by Avro or Protobuf tools.
+
+**Example**
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro
library. The schema definition is the schema data stored as a part of the
`SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+ ```java
+ @Builder
+ @AllArgsConstructor
+ @NoArgsConstructor
+ public static class User {
+ String name;
+ int age;
+ }
+ ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+ ```java
+ Producer<User> producer =
client.newProducer(Schema.AVRO(User.class)).create();
+
producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+ ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+ ```java
+ Consumer<User> consumer =
client.newConsumer(Schema.AVRO(User.class)).subscribe();
+ User user = consumer.receive().getValue();
+ ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this
method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate
a generic struct using `GenericRecordBuilder` and consume messages into
`GenericRecord`.
**Example**
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+ ```java
+ RecordSchemaBuilder recordSchemaBuilder =
SchemaBuilder.record("schemaName");
+ recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+ SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+ Producer<GenericRecord> producer =
client.newProducer(Schema.generic(schemaInfo)).create();
+ ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+ ```java
+ producer.newMessage().value(schema.newRecordBuilder()
+ .set("intField", 32)
+ .build()).send();
+ ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example**
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+ ```java
+ @Builder
+ @AllArgsConstructor
+ @NoArgsConstructor
+ public static class User {
+ String name;
+ int age;
+ }
+ ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+ ```java
+ SchemaDefinition<User> schemaDefinition =
SchemaDefinition.<User>builder().withPojo(User.class).build();
+ Producer<User> producer =
client.newProducer(Schema.AVRO(schemaDefinition)).create();
+
producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+ ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+ ```java
+ SchemaDefinition<User> schemaDefinition =
SchemaDefinition.<User>builder().withPojo(User.class).build();
+ Consumer<User> consumer =
client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+ User user = consumer.receive().getValue();
+ ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to
transmit it over a Pulsar topic.
```java
-public class User {
- String name;
- int age;
+public class SensorReading {
+ public float temperature;
+
+ public SensorReading(float temperature) {
+ this.temperature = temperature;
+ }
+
+ // A no-arg constructor is required
+ public SensorReading() {
+ }
+
+ public float getTemperature() {
+ return temperature;
+ }
+
+ public void setTemperature(float temperature) {
+ this.temperature = temperature;
+ }
}
```
-When constructing a producer with the _User_ class, you can specify a schema
or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer =
client.newProducer(JSONSchema.of(SensorReading.class))
+ .topic("sensor-readings")
+ .create();
+```
+
+The following schema formats are currently available for Java:
Review Comment:
This section seems to introduce the Avro schema, so why are other
schemas also introduced here?
If we want to indicate that all these schemas are based on the Avro protocol,
then I think it's better to use `Avro based schema ...` as the title.
Otherwise, it will confuse users, because there is also an `AvroSchema` based on
the Avro protocol.