[GitHub] [pulsar] momo-jun commented on a diff in pull request #18434: [refactor][doc] Improve schema docs

GitBox Wed, 07 Dec 2022 20:55:37 -0800


momo-jun commented on code in PR #18434:
URL: https://github.com/apache/pulsar/pull/18434#discussion_r1042922337



##########
site2/docs/schema-understand.md:
##########
@@ -48,92 +46,227 @@ Pulsar supports various schema types, which are mainly 
divided into two categori
 
 The following table outlines the primitive types that Pulsar schema supports, 
and the conversions between **schema types** and **language-specific primitive 
types**.
 
-| Primitive Type | Description | Java Type| Python Type | Go Type |
-|---|---|---|---|---|
-| `BOOLEAN` | A binary value | boolean | bool | bool |
-| `INT8` | A 8-bit signed integer | int | | int8 |
-| `INT16` | A 16-bit signed integer | int | | int16 |
-| `INT32` | A 32-bit signed integer | int | | int32 |
-| `INT64` | A 64-bit signed integer | int | | int64 |
-| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number | float 
| float | float32 |
-| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | 
double | float | float64|
-| `BYTES` | A sequence of 8-bit unsigned bytes | byte[], ByteBuffer, ByteBuf | 
bytes | []byte |
-| `STRING` | A Unicode character sequence | string | str | string| 
-| `TIMESTAMP` (`DATE`, `TIME`) |  A logic type represents a specific instant 
in time with millisecond precision. <br />It stores the number of milliseconds 
since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |  java.sql.Timestamp 
(java.sql.Time, java.util.Date) | | |
-| INSTANT | A single instantaneous point on the time-line with nanoseconds 
precision| java.time.Instant | | |
-| LOCAL_DATE | An immutable date-time object that represents a date, often 
viewed as year-month-day| java.time.LocalDate | | |
-| LOCAL_TIME | An immutable date-time object that represents a time, often 
viewed as hour-minute-second. Time is represented to nanosecond precision.| 
java.time.LocalDateTime | |
-| LOCAL_DATE_TIME | An immutable date-time object that represents a date-time, 
often viewed as year-month-day-hour-minute-second | java.time.LocalTime | |
+| Primitive Type | Description | Java Type| Python Type | Go Type | C++ Type | 
C# Type|
+|---|---|---|---|---|---|---|
+| `BOOLEAN` | A binary value. | boolean | bool | bool | bool | bool |
+| `INT8` | An 8-bit signed integer. | byte | int | int8 | int8_t | byte |
+| `INT16` | A 16-bit signed integer. | short | int | int16 | int16_t | short |
+| `INT32` | A 32-bit signed integer. | int | int | int32 | int32_t | int |
+| `INT64` | A 64-bit signed integer. | long | int | int64 | int64_t | long |
+| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number. | 
float | float | float32 | float | float |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number. | 
double | float | float64 | double | double |
+| `BYTES` | A sequence of 8-bit unsigned bytes. | byte[], ByteBuffer, ByteBuf 
| bytes | []byte | void * | byte[], `ReadOnlySequence<byte>` |
+| `STRING` | A Unicode character sequence. | string | str | string | 
std::string | string |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific 
instant in time with millisecond precision. <br />It stores the number of 
milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value. | 
java.sql.Timestamp (java.sql.Time, java.util.Date) | N/A | N/A | N/A | 
DateTime, TimeSpan |
+| `INSTANT` | A single instantaneous point on the timeline with nanosecond 
precision. | java.time.Instant | N/A | N/A | N/A | N/A |
+| `LOCAL_DATE` | An immutable date-time object that represents a date, often 
viewed as year-month-day. | java.time.LocalDate | N/A | N/A | N/A | N/A |
+| `LOCAL_TIME` | An immutable date-time object that represents a time, often 
viewed as hour-minute-second. Time is represented to nanosecond precision. | 
java.time.LocalTime | N/A | N/A | N/A | N/A |
+| `LOCAL_DATE_TIME` | An immutable date-time object that represents a 
date-time, often viewed as year-month-day-hour-minute-second. | 
java.time.LocalDateTime | N/A | N/A | N/A | N/A |
+
+:::note
 
-For primitive types, Pulsar does not store any schema data in `SchemaInfo`. 
The `type` in `SchemaInfo` determines how to serialize and deserialize the 
data. 
+Pulsar does not store any schema data in `SchemaInfo` for primitive types. 
Some of the primitive schema implementations can use the `properties` parameter 
to store implementation-specific tunable settings. For example, a string schema 
can use `properties` to store the encoding charset to serialize and deserialize 
strings.
 
-Some of the primitive schema implementations can use `properties` to store 
implementation-specific tunable settings. For example, a `string` schema can 
use `properties` to store the encoding charset to serialize and deserialize 
strings.
+:::
 
-For more instructions, see [Construct a string 
schema](schema-get-started.md#construct-a-string-schema).
+For more instructions and examples, see [Construct a string 
schema](schema-get-started.md#string).
 
 
 ### Complex type
 
-Currently, Pulsar supports the following complex types:
+The following table outlines the complex types that Pulsar schema supports:
 
 | Complex Type | Description |
 |---|---|
-| `KeyValue` | Represents a complex type of a key/value pair. |
-| `Struct` | Handles structured data. It supports `AvroBaseStructSchema` and 
`ProtobufNativeSchema`. |
+| `KeyValue` | Represents a complex key/value pair. |
+| `Struct` | Represents structured data, including `AvroBaseStructSchema`, 
`ProtobufNativeSchema` and `Schema.NATIVE_AVRO`. |
 
 #### `KeyValue` schema
 
-`KeyValue` schema helps applications define schemas for both key and value. 
Pulsar stores the `SchemaInfo` of key schema and the value schema together.
+`KeyValue` schema helps applications define schemas for both key and value. 
Pulsar stores the `SchemaInfo` of the key schema and the value schema together.
 
-You can choose the encoding type when constructing the key/value schema.：
+Pulsar provides the following methods to encode a **single** key/value pair in 
a message:
 * `INLINE` - Key/value pairs are encoded together in the message payload.
-* `SEPARATED` - see [Construct a key/value 
schema](schema-get-started.md#construct-a-keyvalue-schema).
+* `SEPARATED` - The key is stored as the message key, while the value is stored 
as the message payload. See [Construct a key/value 
schema](schema-get-started.md#keyvalue) for more details.
 
 #### `Struct` schema
 
-`Struct` schema supports `AvroBaseStructSchema` and `ProtobufNativeSchema`.
+The following table outlines the `struct` types that Pulsar schema supports:
 
 |Type|Description|
 ---|---|
-`AvroBaseStructSchema`|Pulsar uses [Avro 
Specification](http://avro.apache.org/docs/current/spec.html) to declare the 
schema definition for `AvroBaseStructSchema`, which supports  `AvroSchema`, 
`JsonSchema`, and `ProtobufSchema`. <br /><br />This allows Pulsar:<br />- to 
use the same tools to manage schema definitions<br />- to use different 
serialization or deserialization methods to handle data|
-`ProtobufNativeSchema`|`ProtobufNativeSchema` is based on protobuf native 
Descriptor. <br /><br />This allows Pulsar:<br />- to use native protobuf-v3 to 
serialize or deserialize data<br />- to use `AutoConsume` to deserialize data.
+`AvroBaseStructSchema`|Pulsar uses [Avro 
Specification](http://avro.apache.org/docs/current/spec.html) to declare the 
schema definition for `AvroBaseStructSchema`, which supports `AvroSchema`, 
`JsonSchema`, and `ProtobufSchema`. <br /><br />This allows Pulsar to:<br />- 
use the same tools to manage schema definitions.<br />- use different 
serialization or deserialization methods to handle data. |
+`ProtobufNativeSchema`|`ProtobufNativeSchema` is based on protobuf native 
Descriptor. <br /><br />This allows Pulsar to:<br />- use native protobuf-v3 to 
serialize or deserialize data.<br />- use `AutoConsume` to deserialize data. |
+`Schema.NATIVE_AVRO` | `Schema.NATIVE_AVRO` is used to wrap a native Avro 
schema type `org.apache.avro.Schema`. The result is a schema instance that 
accepts a serialized Avro payload without validating it against the wrapped 
Avro schema. <br /><br />When you migrate or ingest event or messaging data 
from external systems (such as Kafka and Cassandra), the data is often already 
serialized in Avro format. The applications producing the data have typically 
validated the data against their schemas (including compatibility checks) and 
stored the schemas in a database or a dedicated service (such as a schema registry). The 
schema of each serialized data record is usually retrievable by some metadata 
attached to that record. In such cases, a Pulsar producer doesn't need to 
repeat the schema validation when sending the ingested events to a topic. All 
it needs to do is pass each message or event with its schema to Pulsar. |

Review Comment:
   Makes sense. I've added these links.
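
   For reference, the difference between the `INLINE` and `SEPARATED`
   encodings described in the hunk above can be sketched as follows. This is
   a standalone Python illustration, not the Pulsar client API: the function
   names, the dict standing in for a message, and the 4-byte big-endian
   length prefixes are assumptions made for this example.

```python
import struct
from typing import Dict, Tuple


def encode_inline(key: bytes, value: bytes) -> bytes:
    """INLINE-style: key and value travel together in one payload.

    Assumed layout for this sketch: each part is prefixed by its
    length as a 4-byte big-endian signed integer.
    """
    return (
        struct.pack(">i", len(key)) + key
        + struct.pack(">i", len(value)) + value
    )


def decode_inline(payload: bytes) -> Tuple[bytes, bytes]:
    """Reverse of encode_inline: split one payload back into (key, value)."""
    (key_len,) = struct.unpack_from(">i", payload, 0)
    key = payload[4:4 + key_len]
    (value_len,) = struct.unpack_from(">i", payload, 4 + key_len)
    start = 8 + key_len
    value = payload[start:start + value_len]
    return key, value


def encode_separated(key: bytes, value: bytes) -> Dict[str, bytes]:
    """SEPARATED-style: the key goes in the message key field,
    and the payload carries only the value.

    The dict is a hypothetical stand-in for a message object.
    """
    return {"key": key, "payload": value}


inline_payload = encode_inline(b"k1", b"value")
separated_msg = encode_separated(b"k1", b"value")
```

   With `INLINE`, a consumer has to decode the payload to recover the key.
   With `SEPARATED`, the key is available without touching the payload.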



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
