[GitHub] [pulsar] sijie commented on issue #4789: [Doc] Add Schema Chapter
sijie commented on issue #4789: [Doc] Add Schema Chapter
URL: https://github.com/apache/pulsar/issues/4789#issuecomment-514505792

^ @codelipenghui @congbobo184 does the proposed schema documentation structure look good to you?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[pulsar] branch master updated: Add a few recent presentations to the resources page (#4783)
This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git

The following commit(s) were added to refs/heads/master by this push:
     new ecd7357 Add a few recent presentations to the resources page (#4783)
ecd7357 is described below

commit ecd73575d2631a7dc3deab4b85bac3bae4655cf6
Author: Sijie Guo
AuthorDate: Wed Jul 24 14:54:11 2019 +0800

    Add a few recent presentations to the resources page (#4783)

    *Motivation*

    Add a few recent presentations to the resources page. They cover different topics:

    - 2.4.0 release
    - use case
    - serverless
    - spark + pulsar
    - flink + pulsar
---
 site2/website/data/resources.js | 40
 1 file changed, 40 insertions(+)

diff --git a/site2/website/data/resources.js b/site2/website/data/resources.js
index 004d834..9ca95c8 100644
--- a/site2/website/data/resources.js
+++ b/site2/website/data/resources.js
@@ -196,6 +196,46 @@ module.exports = {
   ],
   presentations: [
     {
+      forum: '',
+      forum_link: '',
+      presenter: 'Sijie Guo',
+      date: 'June 2019',
+      title: "What's new in apache pulsar 2.4.0",
+      link: 'https://www.slideshare.net/streamnative/whats-new-in-apache-pulsar-240'
+    },
+    {
+      forum: 'Apache Pulsar Meetup | Shenzhen',
+      forum_link: 'https://www.huodongxing.com/event/9495713659500',
+      presenter: 'Yijie Shen',
+      date: 'June 2019',
+      title: 'A Unified Platform for Real-time Storage and Processing - Apache Pulsar as Stream Storage, Apache Spark for Processing as an example',
+      link: 'https://www.slideshare.net/streamnative/a-unified-platform-for-realtime-storage-and-processing'
+    },
+    {
+      forum: 'Ray Forward Beijing Meetup',
+      forum_link: 'https://tech.antfin.com/community/activities/698',
+      presenter: 'Sijie Guo',
+      date: 'June 2019',
+      title: 'Serverless Event Streaming with Pulsar Functions',
+      link: 'https://www.slideshare.net/streamnative/serverless-event-streaming-with-pulsar-functions'
+    },
+    {
+      forum: 'Flink Forward San Francisco 2019',
+      forum_link: 'https://sf-2019.flink-forward.org/',
+      presenter: 'Sijie Guo',
+      date: 'April 2019',
+      title: 'Elastic Data Processing with Apache Flink and Apache Pulsar',
+      link: 'https://www.slideshare.net/streamnative/elastic-data-processing-with-apache-flink-and-apache-pulsar'
+    },
+    {
+      forum: 'Strata Data San Francisco 2019',
+      forum_link: '',
+      presenter: 'Penghui Li, Sijie Guo',
+      date: 'March 2019',
+      title: 'How Zhaopin built its Event Center using Apache Pulsar',
+      link: 'https://www.slideshare.net/streamnative/how-zhaopin-built-its-event-center-using-apache-pulsar-152691364'
+    },
+    {
       forum: 'Strata San Jose',
       forum_link: 'https://conferences.oreilly.com/strata/strata-ca',
       presenter: 'Matteo Merli',
[GitHub] [pulsar] sijie merged pull request #4783: Add a few recent presentations to the resources page
sijie merged pull request #4783: Add a few recent presentations to the resources page
URL: https://github.com/apache/pulsar/pull/4783
[GitHub] [pulsar] tuteng commented on a change in pull request #4786: Add *Understand Schema* Section
tuteng commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306643869

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | A 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as below.
+
+| Schema Type | Java Type| Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | |
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | string | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+```text
+Producer<String> producer = client.newProducer(Schema.STRING).create();
+producer.newMessage().value("Hello Pulsar!").send();
+```
+
+2. Create a consumer with a string schema and receive messages.
+
+```text
+Consumer<String> consumer = client.newConsumer(Schema.STRING).create();
+consumer.receive();
+```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+`keyvalue` schema helps applications define schemas for both key and value.
+
+For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+
+Pulsar provides two methods to encode a key/value pair in messages:
+
+* **`INLINE`** mode: a key/value pair will be encoded together in the message payload.
+
+* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload.
+
+Users can choose the encoding type when constructing the key/value schema.
+
+**Example**
+
+This example shows how to construct a key/value schema and then use it to produce and consume messages.
+
+1. Construct a key/value schema with `INLINE` encoding type.
+
+```text
+Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+    Schema.INT32,
+    Schema.STRING,
+    KeyValueEncodingType.INLINE
+);
+```
+
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
+
+```text
+Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+    Schema.INT32,
[GitHub] [pulsar] tuteng commented on a change in pull request #4786: Add *Understand Schema* Section
tuteng commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306643532

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | A 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as below.
+
+| Schema Type | Java Type| Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | |
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | string | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+```text
+Producer<String> producer = client.newProducer(Schema.STRING).create();
+producer.newMessage().value("Hello Pulsar!").send();
+```
+
+2. Create a consumer with a string schema and receive messages.
+
+```text
+Consumer<String> consumer = client.newConsumer(Schema.STRING).create();
+consumer.receive();
+```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+`keyvalue` schema helps applications define schemas for both key and value.
+
+For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+
+Pulsar provides two methods to encode a key/value pair in messages:
+
+* **`INLINE`** mode: a key/value pair will be encoded together in the message payload.
+
+* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload.
+
+Users can choose the encoding type when constructing the key/value schema.
+
+**Example**

Review comment: I have already tested INLINE and SEPARATED, and both work without problems.
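
The difference between the two encoding modes under review can be pictured with a small stand-alone Java sketch. It deliberately does not use the Pulsar client: `KeyValueEncodingSketch`, `SketchMessage`, the length-prefixed `INLINE` layout and the Base64 message key are all assumptions made for illustration, not Pulsar's actual wire format. It only shows where the key ends up in each mode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: a plain-Java model of INLINE vs SEPARATED key/value encoding.
// None of these names come from the Pulsar client library.
final class KeyValueEncodingSketch {

    // Hypothetical message model: an optional message key plus a payload.
    static final class SketchMessage {
        final String key;      // null when the message key is unused
        final byte[] payload;

        SketchMessage(String key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Encode an int key as 4 big-endian bytes (assumed layout for this sketch).
    static byte[] encodeKey(int key) {
        return ByteBuffer.allocate(4).putInt(key).array();
    }

    // Encode a string value as UTF-8 bytes.
    static byte[] encodeValue(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // INLINE: key and value are written together into the payload.
    // Each part is length-prefixed so a reader could split them again.
    static SketchMessage inline(int key, String value) {
        byte[] k = encodeKey(key);
        byte[] v = encodeValue(value);
        ByteBuffer buf = ByteBuffer.allocate(4 + k.length + 4 + v.length);
        buf.putInt(k.length).put(k).putInt(v.length).put(v);
        return new SketchMessage(null, buf.array());
    }

    // SEPARATED: the key goes into the message key, the payload holds only the value.
    static SketchMessage separated(int key, String value) {
        String messageKey = Base64.getEncoder().encodeToString(encodeKey(key));
        return new SketchMessage(messageKey, encodeValue(value));
    }

    public static void main(String[] args) {
        SketchMessage a = inline(7, "Hello Pulsar!");
        SketchMessage b = separated(7, "Hello Pulsar!");
        System.out.println("INLINE    key=" + a.key + " payloadBytes=" + a.payload.length);
        System.out.println("SEPARATED key=" + b.key + " payloadBytes=" + b.payload.length);
    }
}
```

In this model the `SEPARATED` payload is smaller because it carries only the value, while the key stays visible in the message key, where key-based features could read it without decoding the payload.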
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306618563

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | A 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as below.
+
+| Schema Type | Java Type| Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | |
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | string | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+```text
+Producer<String> producer = client.newProducer(Schema.STRING).create();
+producer.newMessage().value("Hello Pulsar!").send();
+```
+
+2. Create a consumer with a string schema and receive messages.
+
+```text
+Consumer<String> consumer = client.newConsumer(Schema.STRING).create();
+consumer.receive();
+```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+`keyvalue` schema helps applications define schemas for both key and value.
+
+For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+
+Pulsar provides two methods to encode a key/value pair in messages:
+
+* **`INLINE`** mode: a key/value pair will be encoded together in the message payload.
+
+* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload.
+
+Users can choose the encoding type when constructing the key/value schema.
+
+**Example**

Review comment: @tuteng could you please help verify lines 130 to 188? Many thanks.
[GitHub] [pulsar] Anonymitaet opened a new issue #4789: [Doc] Add Schema Chapter
Anonymitaet opened a new issue #4789: [Doc] Add Schema Chapter
URL: https://github.com/apache/pulsar/issues/4789

I plan to add a Schema chapter with several sections under it. The structure looks like this:

![schema structure](https://user-images.githubusercontent.com/50226895/61762825-d437b180-ae05-11e9-9213-395d90afacd6.png)

Any thoughts, @sijie @tuteng? Comments are welcome.
[GitHub] [pulsar] wolfstudy commented on issue #4720: Compilation errors in `pulsar-client-cpp`
wolfstudy commented on issue #4720: Compilation errors in `pulsar-client-cpp`
URL: https://github.com/apache/pulsar/issues/4720#issuecomment-514459798

Yes, as you said, the required version of `snappy` is not mentioned in [https://pulsar.apache.org/docs/en/develop-cpp/#system-requirements](https://pulsar.apache.org/docs/en/develop-cpp/#system-requirements).

You are using Arch Linux, which is 64-bit by default: if an Arch package name does not contain `lib32`, the package is 64-bit.

In your error message, all the types are reported as wrong or undeclared, so I suggest you check the version of the snappy library, or try a different version of snappy.
[GitHub] [pulsar] sijie commented on issue #4779: Add basic authentication capabilities to Pulsar SQL
sijie commented on issue #4779: Add basic authentication capabilities to Pulsar SQL
URL: https://github.com/apache/pulsar/pull/4779#issuecomment-514453307

run cpp tests
[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section
sijie commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306602777

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,319 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.

Review comment: That's not correct. I would suggest removing it.
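
For readers following the thread, the four `SchemaInfo` fields quoted in the diff can be modelled in a few lines of plain Java. This is a minimal sketch only: `SchemaInfoSketch` and `stringExample` are hypothetical names chosen for illustration and are not the class or methods shipped with the Pulsar client. It rebuilds the string example from the quoted document, where both the schema data and the properties are empty.

```java
import java.util.Collections;
import java.util.Map;

// Illustration only: a hypothetical stand-in for the SchemaInfo structure
// described in the quoted document, not the Pulsar client's own class.
final class SchemaInfoSketch {
    final String name;                    // schema name
    final String type;                    // schema type, e.g. "STRING"
    final byte[] schema;                  // schema data; empty for primitive types
    final Map<String, String> properties; // application-specific key/value pairs

    SchemaInfoSketch(String name, String type, byte[] schema, Map<String, String> properties) {
        this.name = name;
        this.type = type;
        this.schema = schema.clone();     // defensive copy of the raw bytes
        this.properties = Collections.unmodifiableMap(properties);
    }

    // The string example from the quoted doc: no schema data, no properties.
    static SchemaInfoSketch stringExample() {
        return new SchemaInfoSketch(
                "test-string-schema", "STRING", new byte[0], Collections.emptyMap());
    }

    public static void main(String[] args) {
        SchemaInfoSketch info = stringExample();
        System.out.println(info.name + " " + info.type
                + " schemaBytes=" + info.schema.length
                + " properties=" + info.properties);
    }
}
```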
[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section
sijie commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306603212

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,319 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | A 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as below.
+
+| Schema Type | Java Type| Python Type |

Review comment: Pulsar also supports Go as of 2.4.0. @wolfstudy can you help add the corresponding types for Go?
[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section
sijie commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306604369 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | A 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type| Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | string | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** + +`keyvalue` schema helps applications define schemas for both key and value. + +For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together. + +Pulsar provides two methods to encode a key/value pair in messages: + +* **`INLINE`** mode: a key/value pair will be encoded together in the message payload. + +* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload. + +Users can choose the encoding type when constructing the key/value schema. + +**Example** + +This example shows how to construct a key/value schema and then use it to produce and consume messages. + +1. Construct a key/value schema with `INLINE` encoding type. + +```text +Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue( +Schema.INT32, +Schema.STRING, +KeyValueEncodingType.INLINE +); +``` + +2. Optionally, construct a key/value schema with `SEPARATED` encoding type. + +```text +Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue( +Schema.INT32,
[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section
sijie commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306603479 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | An 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type | Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | String | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer<String> producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** Review comment: why not ` KeyValue`? If you use a heading, the framework can generate anchors so that people can use them to quickly locate the content.
[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section
sijie commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306603660 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | An 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type | Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | String | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer<String> producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** + +`keyvalue` schema helps applications define schemas for both key and value. + +For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together. + +Pulsar provides two methods to encode a key/value pair in messages: + +* **`INLINE`** mode: a key/value pair will be encoded together in the message payload. + +* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload. + +Users can choose the encoding type when constructing the key/value schema. + +**Example** Review comment: Have you verified the final rendered result?
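The two key/value encoding modes described in the quoted section can be pictured with a small plain-Java sketch. Everything here is an assumption made for illustration: `KeyValueLayoutSketch`, `encodeInline`, and `encodeSeparated` are hypothetical names, and the length-prefixed layout is one plausible way to pack both parts together, not a confirmed description of Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the byte layouts are illustrative assumptions only.
class KeyValueLayoutSketch {
    // Encode an int key as 4 big-endian bytes.
    private static byte[] intBytes(int key) {
        return ByteBuffer.allocate(4).putInt(key).array();
    }

    // INLINE-style: key and value travel together in one payload,
    // each preceded by a 4-byte length.
    static byte[] encodeInline(int key, String value) {
        byte[] k = intBytes(key);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + k.length + 4 + v.length)
                .putInt(k.length).put(k)
                .putInt(v.length).put(v)
                .array();
    }

    // SEPARATED-style: element 0 would go into the message key,
    // element 1 into the message payload.
    static byte[][] encodeSeparated(int key, String value) {
        return new byte[][] { intBytes(key), value.getBytes(StandardCharsets.UTF_8) };
    }
}
```

In this sketch the inline form costs two extra length fields but keeps the pair in a single buffer, while the separated form leaves the key available on its own, for example to anything that only looks at the message key.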
[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section
sijie commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306604326 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | An 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type | Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | String | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer<String> producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** + +`keyvalue` schema helps applications define schemas for both key and value. + +For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together. + +Pulsar provides two methods to encode a key/value pair in messages: + +* **`INLINE`** mode: a key/value pair will be encoded together in the message payload. + +* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload. + +Users can choose the encoding type when constructing the key/value schema. + +**Example** + +This example shows how to construct a key/value schema and then use it to produce and consume messages. + +1. Construct a key/value schema with `INLINE` encoding type. + +```text +Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue( +Schema.INT32, +Schema.STRING, +KeyValueEncodingType.INLINE +); +``` + +2. Optionally, construct a key/value schema with `SEPARATED` encoding type. + +```text +Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue( +Schema.INT32,
[GitHub] [pulsar] sijie commented on issue #4783: Add a few recent presentations to the resources page
sijie commented on issue #4783: Add a few recent presentations to the resources page URL: https://github.com/apache/pulsar/pull/4783#issuecomment-514450782 run cpp tests
[GitHub] [pulsar] sijie commented on issue #4781: Added Pulsar in Action book to resources
sijie commented on issue #4781: Added Pulsar in Action book to resources URL: https://github.com/apache/pulsar/pull/4781#issuecomment-514450730 run java8 tests run cpp tests
[pulsar] branch master updated: [Doc] Add Schema Chapter and Get Started Section (#4759)
This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/pulsar.git The following commit(s) were added to refs/heads/master by this push: new 84a7257 [Doc] Add Schema Chapter and Get Started Section (#4759) 84a7257 is described below commit 84a72578196ff9bfbfd3f7ea876dcd45756dc6b2 Author: Anonymitaet <50226895+anonymit...@users.noreply.github.com> AuthorDate: Wed Jul 24 10:09:25 2019 +0800 [Doc] Add Schema Chapter and Get Started Section (#4759) Add an independent Chapter for Pulsar Schema. This is the first section: Get started. --- site2/docs/schema-get-started.md | 62 site2/website/sidebars.json | 3 ++ 2 files changed, 65 insertions(+) diff --git a/site2/docs/schema-get-started.md b/site2/docs/schema-get-started.md new file mode 100644 index 000..9c001b4 --- /dev/null +++ b/site2/docs/schema-get-started.md @@ -0,0 +1,62 @@ +--- +id: schema-get-started +title: Get started +sidebar_label: Get started +--- + +When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations: + +* The field does not exist + +* The field type has changed (for example, `string` is changed to `int`) + +There are a few methods to prevent and overcome these exceptions. For example, you can catch exceptions when parsing fails, which makes code hard to maintain; or you can adopt a schema management system that performs schema evolution without breaking downstream applications and enforces type safety to the maximum extent in the language you are using. That solution is Pulsar Schema. + +Pulsar schema enables you to use language-specific types of data when constructing and handling messages, from simple types like `string` to more complex application-specific types. 
+ +**Example** + +You can use the _User_ class to define the messages sent to Pulsar topics. + +``` +public class User { +String name; +int age; +} +``` + +When constructing a producer with the _User_ class, you can specify a schema or not as below. + +## Without schema + +If you construct a producer without specifying a schema, then the producer can only produce messages of type `byte[]`. If you have a POJO class, you need to serialize the POJO into bytes before sending messages. + +**Example** + +``` +Producer<byte[]> producer = client.newProducer() +.topic(topic) +.create(); +User user = new User("Tom", 28); +byte[] message = … // serialize the `user` by yourself; +producer.send(message); +``` +## With schema + +If you construct a producer by specifying a schema, then you can send an object to a topic directly without worrying about how to serialize POJOs into bytes. + +**Example** + +This example constructs a producer with the _JSONSchema_, and you can send a _User_ object to topics directly without worrying about how to serialize it into bytes. + +``` +Producer<User> producer = client.newProducer(JSONSchema.of(User.class)) +.topic(topic) +.create(); +User user = new User("Tom", 28); +producer.send(user); +``` + +## Summary + +When constructing a producer with a schema, you do not need to serialize messages into bytes; instead, Pulsar schema does this job in the background. diff --git a/site2/website/sidebars.json b/site2/website/sidebars.json index 37ad5ec..cc52ceb 100644 --- a/site2/website/sidebars.json +++ b/site2/website/sidebars.json @@ -18,6 +18,9 @@ "concepts-tiered-storage", "concepts-schema-registry" ], +"Pulsar Schema": [ + "schema-get-started" +], "Pulsar Functions": [ "functions-overview", "functions-quickstart",
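The "serialize the `user` by yourself" step in the without-schema example is the part a schema takes off your hands. As a rough sketch of what that hand-written step might look like, the plain-JDK code below round-trips a `User` through bytes. `ManualUserCodec` and its field layout are hypothetical, chosen only for illustration, and the `User` constructor is an addition the committed snippet does not show.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Mirrors the User class from the doc, with a constructor added for the sketch.
class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Hypothetical hand-written codec: the work a schema would otherwise do.
class ManualUserCodec {
    static byte[] encode(User user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(user.name); // 2-byte length prefix plus modified UTF-8
            out.writeInt(user.age);  // 4-byte big-endian int
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static User decode(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            // Fields must be read in exactly the order they were written.
            return new User(in.readUTF(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every producer and consumer would have to agree on this layout by convention, and a renamed or retyped field would surface only as a decode failure at runtime, which is the class of parse exception the doc describes.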
[GitHub] [pulsar] sijie merged pull request #4759: [Doc] Add Schema Chapter and Get Started Section
sijie merged pull request #4759: [Doc] Add Schema Chapter and Get Started Section URL: https://github.com/apache/pulsar/pull/4759
[GitHub] [pulsar] sijie opened a new issue #4788: Improve schema implementation to deal with ByteBuf
sijie opened a new issue #4788: Improve schema implementation to deal with ByteBuf URL: https://github.com/apache/pulsar/issues/4788 **Is your feature request related to a problem? Please describe.** Currently, all the schema implementations deal with `byte[]` only. This introduces many object allocations and produces garbage, which is unfriendly to Java. We should improve the schema interface to handle `ByteBuf` or `ByteBuffer`.
[GitHub] [pulsar] sijie merged pull request #4728: Pulsar SQL supports pulsar's primitive schema
sijie merged pull request #4728: Pulsar SQL supports pulsar's primitive schema URL: https://github.com/apache/pulsar/pull/4728
[pulsar] branch master updated: Pulsar SQL supports pulsar's primitive schema (#4728)
This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/pulsar.git The following commit(s) were added to refs/heads/master by this push: new 1ab35b0 Pulsar SQL supports pulsar's primitive schema (#4728) 1ab35b0 is described below commit 1ab35b01bcf931eb3b65a5660eca62a9ac4ceb1d Author: congbo <39078850+congbobo...@users.noreply.github.com> AuthorDate: Wed Jul 24 09:53:07 2019 +0800 Pulsar SQL supports pulsar's primitive schema (#4728) ### Motivation Continue the PR of #4151 --- .../apache/pulsar/common/schema/SchemaType.java| 42 ++ .../apache/pulsar/sql/presto/PulsarMetadata.java | 97 +++- .../sql/presto/PulsarPrimitiveSchemaHandler.java | 65 + .../pulsar/sql/presto/PulsarRecordCursor.java | 23 +-- .../pulsar/sql/presto/PulsarSchemaHandlers.java| 53 +++ .../org/apache/pulsar/sql/presto/PulsarSplit.java | 17 ++- .../pulsar/sql/presto/PulsarSplitManager.java | 3 +- .../pulsar/sql/presto/TestPulsarConnector.java | 2 +- .../pulsar/sql/presto/TestPulsarMetadata.java | 4 + .../presto/TestPulsarPrimitiveSchemaHandler.java | 162 + 10 files changed, 442 insertions(+), 26 deletions(-) diff --git a/pulsar-client-api/src/main/java/org/apache/pulsar/common/schema/SchemaType.java b/pulsar-client-api/src/main/java/org/apache/pulsar/common/schema/SchemaType.java index 138b3ec..67cb090 100644 --- a/pulsar-client-api/src/main/java/org/apache/pulsar/common/schema/SchemaType.java +++ b/pulsar-client-api/src/main/java/org/apache/pulsar/common/schema/SchemaType.java @@ -175,4 +175,46 @@ public enum SchemaType { default: return NONE; } } + + +public boolean isPrimitive() { +return isPrimitiveType(this); +} + +public boolean isStruct() { +return isStructType(this); +} + +public static boolean isPrimitiveType(SchemaType type) { +switch (type) { +case STRING: +case BOOLEAN: +case INT8: +case INT16: +case INT32: +case INT64: +case FLOAT: +case DOUBLE: +case DATE: +case TIME: +case TIMESTAMP: 
+case BYTES: +case NONE: +return true; +default: +return false; +} + +} + +public static boolean isStructType(SchemaType type) { +switch (type) { +case AVRO: +case JSON: +case PROTOBUF: +return true; +default: +return false; +} +} } diff --git a/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java b/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java index 3647c1b..1ee0177 100644 --- a/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java +++ b/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java @@ -34,9 +34,14 @@ import com.facebook.presto.spi.TableNotFoundException; import com.facebook.presto.spi.connector.ConnectorMetadata; import com.facebook.presto.spi.type.BigintType; import com.facebook.presto.spi.type.BooleanType; +import com.facebook.presto.spi.type.DateType; import com.facebook.presto.spi.type.DoubleType; import com.facebook.presto.spi.type.IntegerType; import com.facebook.presto.spi.type.RealType; +import com.facebook.presto.spi.type.SmallintType; +import com.facebook.presto.spi.type.TimeType; +import com.facebook.presto.spi.type.TimestampType; +import com.facebook.presto.spi.type.TinyintType; import com.facebook.presto.spi.type.Type; import com.facebook.presto.spi.type.VarbinaryType; import com.facebook.presto.spi.type.VarcharType; @@ -55,6 +60,7 @@ import org.apache.pulsar.client.admin.PulsarAdminException; import org.apache.pulsar.client.api.PulsarClientException; import org.apache.pulsar.common.naming.TopicName; import org.apache.pulsar.common.schema.SchemaInfo; +import org.apache.pulsar.common.schema.SchemaType; import javax.inject.Inject; import java.util.HashMap; @@ -296,6 +302,56 @@ public class PulsarMetadata implements ConnectorMetadata { + String.format("%s/%s", namespace, schemaTableName.getTableName()) + ": " + ExceptionUtils.getRootCause(e).getLocalizedMessage(), e); } +List handles = getPulsarColumns( 
+topicName, schemaInfo, withInternalColumns +); + + +return new ConnectorTableMetadata(schemaTableName, handles); +} + +/** + * Convert pulsar schema into presto table metadata. + */ +static List getPulsarColumns(TopicName topicName, + SchemaInfo schemaInfo, +
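The `isPrimitiveType` / `isStructType` helpers added to `SchemaType` in this commit follow a simple switch-based classification. The standalone sketch below mirrors that pattern on a trimmed-down enum so it can be compiled without Pulsar on the classpath. `SketchSchemaType` and `describe` are hypothetical names, and the constant list (including `KEY_VALUE`) is a reduced stand-in, not the full Pulsar enum.

```java
// Hypothetical, reduced stand-in for Pulsar's SchemaType enum.
enum SketchSchemaType {
    NONE, STRING, BOOLEAN, INT8, INT16, INT32, INT64, FLOAT, DOUBLE,
    DATE, TIME, TIMESTAMP, BYTES, AVRO, JSON, PROTOBUF, KEY_VALUE;

    // Same shape as the diff: primitives are listed explicitly, anything else is not primitive.
    boolean isPrimitive() {
        switch (this) {
            case STRING: case BOOLEAN:
            case INT8: case INT16: case INT32: case INT64:
            case FLOAT: case DOUBLE:
            case DATE: case TIME: case TIMESTAMP:
            case BYTES: case NONE:
                return true;
            default:
                return false;
        }
    }

    // Struct schemas carry a field-level definition.
    boolean isStruct() {
        switch (this) {
            case AVRO: case JSON: case PROTOBUF:
                return true;
            default:
                return false;
        }
    }

    // Illustrative dispatch: a caller can branch once on the category
    // instead of repeating the full case list.
    static String describe(SketchSchemaType type) {
        if (type.isPrimitive()) {
            return "primitive";
        }
        return type.isStruct() ? "struct" : "other";
    }
}
```

Note that `NONE` is counted as primitive in the diff, so a topic without a declared schema falls into the same branch as `BYTES` in this sketch.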
[GitHub] [pulsar] sijie merged pull request #4745: Fix:PulsarKafkaProducer is not thread safe
sijie merged pull request #4745: Fix:PulsarKafkaProducer is not thread safe URL: https://github.com/apache/pulsar/pull/4745
[GitHub] [pulsar] sijie closed issue #4707: PulsarKafkaProducer is not thread safe
sijie closed issue #4707: PulsarKafkaProducer is not thread safe URL: https://github.com/apache/pulsar/issues/4707
[pulsar] branch master updated: Fix:PulsarKafkaProducer is not thread safe (#4745)
This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/pulsar.git The following commit(s) were added to refs/heads/master by this push: new 0362944 Fix:PulsarKafkaProducer is not thread safe (#4745) 0362944 is described below commit 0362944f6f7d610dd5b54d3724958e0bb9cb7abc Author: Xiaobing Fang AuthorDate: Wed Jul 24 09:51:38 2019 +0800 Fix:PulsarKafkaProducer is not thread safe (#4745) fix #4707 --- .../clients/producer/PulsarKafkaProducer.java | 4 +- .../kafka/PulsarKafkaProducerThreadSafeTest.java | 61 ++ 2 files changed, 64 insertions(+), 1 deletion(-) diff --git a/pulsar-client-kafka-compat/pulsar-client-kafka/src/main/java/org/apache/kafka/clients/producer/PulsarKafkaProducer.java b/pulsar-client-kafka-compat/pulsar-client-kafka/src/main/java/org/apache/kafka/clients/producer/PulsarKafkaProducer.java index 1c5758f..14dd78b 100644 --- a/pulsar-client-kafka-compat/pulsar-client-kafka/src/main/java/org/apache/kafka/clients/producer/PulsarKafkaProducer.java +++ b/pulsar-client-kafka-compat/pulsar-client-kafka/src/main/java/org/apache/kafka/clients/producer/PulsarKafkaProducer.java @@ -283,7 +283,9 @@ public class PulsarKafkaProducer implements Producer { private org.apache.pulsar.client.api.Producer createNewProducer(String topic) { try { // Add the partitions info for the new topic -cluster = cluster.withPartitions(readPartitionsInfo(topic)); +synchronized (this){ +cluster = cluster.withPartitions(readPartitionsInfo(topic)); +} List wrappedInterceptors = interceptors.stream() .map(interceptor -> new KafkaProducerInterceptorWrapper(interceptor, keySchema, valueSchema, topic)) .collect(Collectors.toList()); diff --git a/tests/pulsar-kafka-compat-client-test/src/test/java/org/apache/pulsar/tests/integration/compat/kafka/PulsarKafkaProducerThreadSafeTest.java 
b/tests/pulsar-kafka-compat-client-test/src/test/java/org/apache/pulsar/tests/integration/compat/kafka/PulsarKafkaProducerThreadSafeTest.java new file mode 100644 index 000..1246b1d --- /dev/null +++ b/tests/pulsar-kafka-compat-client-test/src/test/java/org/apache/pulsar/tests/integration/compat/kafka/PulsarKafkaProducerThreadSafeTest.java @@ -0,0 +1,61 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pulsar.tests.integration.compat.kafka; + +import org.apache.kafka.clients.producer.KafkaProducer; +import org.apache.kafka.clients.producer.Producer; +import org.apache.kafka.clients.producer.ProducerRecord; +import org.apache.kafka.clients.producer.PulsarKafkaProducer; +import org.apache.kafka.common.serialization.IntegerSerializer; +import org.apache.kafka.common.serialization.StringSerializer; +import org.apache.pulsar.tests.integration.suites.PulsarStandaloneTestSuite; +import org.testng.annotations.BeforeTest; +import org.testng.annotations.Test; +import java.util.Properties; +/** + * A test that tests if {@link PulsarKafkaProducer} is thread safe. 
+ */ +public class PulsarKafkaProducerThreadSafeTest extends PulsarStandaloneTestSuite { +private Producer producer; + +private static String getPlainTextServiceUrl() { +return container.getPlainTextServiceUrl(); +} + +@BeforeTest +private void setup() { +Properties producerProperties = new Properties(); +producerProperties.put("bootstrap.servers", getPlainTextServiceUrl()); +producerProperties.put("key.serializer", IntegerSerializer.class.getName()); +producerProperties.put("value.serializer", StringSerializer.class.getName()); +producer = new KafkaProducer<>(producerProperties); +} + +/** + * This test runs 10 times in a thread pool whose size is 5. + * Different threads share the same producer and use different topics, which are based on the thread time. + * This test fails when the producer fails to send, which happens if PulsarKafkaProducer is not thread safe. + */ +@Test(threadPoolSize = 5, invocationCount = 10) +public void testPulsar
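The fix in this commit wraps a read-modify-write of a shared field (`cluster = cluster.withPartitions(...)`) in `synchronized (this)`. The sketch below reproduces that pattern in isolation, assuming nothing about Kafka or Pulsar: `TopicRegistry`, `addTopic`, and `registerConcurrently` are hypothetical names, and an immutable set of topic names stands in for the cluster metadata.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a producer that keeps an immutable metadata snapshot.
class TopicRegistry {
    private volatile Set<String> topics = Collections.emptySet();

    void addTopic(String topic) {
        // Without the lock, two threads could copy the same old snapshot
        // and one update would overwrite the other.
        synchronized (this) {
            Set<String> next = new HashSet<>(topics);
            next.add(topic);
            topics = Collections.unmodifiableSet(next);
        }
    }

    Set<String> snapshot() {
        return topics;
    }

    // Registers threads * topicsPerThread distinct topics from a pool
    // and returns how many ended up in the final snapshot.
    static int registerConcurrently(int threads, int topicsPerThread) {
        TopicRegistry registry = new TopicRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < topicsPerThread; i++) {
                    registry.addTopic("topic-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry.snapshot().size();
    }
}
```

With the `synchronized` block in place, `registerConcurrently(5, 200)` keeps all 1000 entries; removing it turns the copy-and-replace into a race in which some entries can be lost, which is the failure mode the test in this commit is meant to catch.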
[pulsar] branch master updated: [Transaction][Buffer]Add new marker to show which message belongs to transaction (#4776)
This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/pulsar.git The following commit(s) were added to refs/heads/master by this push: new e8a95e5 [Transaction][Buffer]Add new marker to show which message belongs to transaction (#4776) e8a95e5 is described below commit e8a95e5ba57107c1e08a2610b7c55361a09f812b Author: Yong Zhang AuthorDate: Wed Jul 24 09:50:27 2019 +0800 [Transaction][Buffer]Add new marker to show which message belongs to transaction (#4776) * [Transaction][Buffer]Add new marker to show which message belongs to transaction --- *Motivation* Add new message type in the transaction including data and commit and abort maker in the transaction log. *Modifications* Add two new types of transaction messages. TXN_COMMIT is the commit marker of the transaction. TXN_ABORT is the abort marker of the transaction. --- .../apache/pulsar/common/api/proto/PulsarApi.java | 114 + .../pulsar/common/api/proto/PulsarMarkers.java | 6 ++ .../org/apache/pulsar/common/protocol/Markers.java | 45 pulsar-common/src/main/proto/PulsarApi.proto | 4 + pulsar-common/src/main/proto/PulsarMarkers.proto | 2 + .../apache/pulsar/common/protocol/MarkersTest.java | 33 ++ 6 files changed, 204 insertions(+) diff --git a/pulsar-common/src/main/java/org/apache/pulsar/common/api/proto/PulsarApi.java b/pulsar-common/src/main/java/org/apache/pulsar/common/api/proto/PulsarApi.java index 4c156e5..035f754 100644 --- a/pulsar-common/src/main/java/org/apache/pulsar/common/api/proto/PulsarApi.java +++ b/pulsar-common/src/main/java/org/apache/pulsar/common/api/proto/PulsarApi.java @@ -3103,6 +3103,14 @@ public final class PulsarApi { // optional int32 marker_type = 20; boolean hasMarkerType(); int getMarkerType(); + +// optional uint64 txnid_least_bits = 22 [default = 0]; +boolean hasTxnidLeastBits(); +long getTxnidLeastBits(); + +// optional uint64 txnid_most_bits = 23 [default = 0]; +boolean 
hasTxnidMostBits(); +long getTxnidMostBits(); } public static final class MessageMetadata extends org.apache.pulsar.shaded.com.google.protobuf.v241.GeneratedMessageLite @@ -3443,6 +3451,26 @@ public final class PulsarApi { return markerType_; } +// optional uint64 txnid_least_bits = 22 [default = 0]; +public static final int TXNID_LEAST_BITS_FIELD_NUMBER = 22; +private long txnidLeastBits_; +public boolean hasTxnidLeastBits() { + return ((bitField0_ & 0x0001) == 0x0001); +} +public long getTxnidLeastBits() { + return txnidLeastBits_; +} + +// optional uint64 txnid_most_bits = 23 [default = 0]; +public static final int TXNID_MOST_BITS_FIELD_NUMBER = 23; +private long txnidMostBits_; +public boolean hasTxnidMostBits() { + return ((bitField0_ & 0x0002) == 0x0002); +} +public long getTxnidMostBits() { + return txnidMostBits_; +} + private void initFields() { producerName_ = ""; sequenceId_ = 0L; @@ -3463,6 +3491,8 @@ public final class PulsarApi { orderingKey_ = org.apache.pulsar.shaded.com.google.protobuf.v241.ByteString.EMPTY; deliverAtTime_ = 0L; markerType_ = 0; + txnidLeastBits_ = 0L; + txnidMostBits_ = 0L; } private byte memoizedIsInitialized = -1; public final boolean isInitialized() { @@ -3562,6 +3592,12 @@ public final class PulsarApi { if (((bitField0_ & 0x8000) == 0x8000)) { output.writeInt32(20, markerType_); } + if (((bitField0_ & 0x0001) == 0x0001)) { +output.writeUInt64(22, txnidLeastBits_); + } + if (((bitField0_ & 0x0002) == 0x0002)) { +output.writeUInt64(23, txnidMostBits_); + } } private int memoizedSerializedSize = -1; @@ -3651,6 +3687,14 @@ public final class PulsarApi { size += org.apache.pulsar.shaded.com.google.protobuf.v241.CodedOutputStream .computeInt32Size(20, markerType_); } + if (((bitField0_ & 0x0001) == 0x0001)) { +size += org.apache.pulsar.shaded.com.google.protobuf.v241.CodedOutputStream + .computeUInt64Size(22, txnidLeastBits_); + } + if (((bitField0_ & 0x0002) == 0x0002)) { +size += 
org.apache.pulsar.shaded.com.google.protobuf.v241.CodedOutputStream + .computeUInt64Size(23, txnidMostBits_); + } memoizedSerializedSize = size; return size; } @@ -3802,6 +3846,10 @@ public final class PulsarApi { bitField0_ = (bitField0_ & ~0x0002); markerType_ = 0; bitField0_ = (bitField0_ & ~0x0004); +txnidLeastBits_ = 0L; +bitField0_ = (bitField0_ & ~0x0008); +txnidMostBits_ = 0L; +
[GitHub] [pulsar] sijie merged pull request #4776: [Transaction][Buffer]Add new marker to show which message belongs to transaction
sijie merged pull request #4776: [Transaction][Buffer]Add new marker to show which message belongs to transaction URL: https://github.com/apache/pulsar/pull/4776
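The commit for #4776 adds two `uint64` fields, `txnid_least_bits` and `txnid_most_bits`, to `MessageMetadata`, and the generated Java accessors return them as `long`. The diff does not say how the two halves are combined, so the following is only a minimal sketch that assumes they form a single 128-bit identifier. `TxnIdSketch` and `toHex` are hypothetical names for illustration, not part of Pulsar.

```java
/**
 * Hypothetical holder for a transaction id carried as two 64-bit halves.
 * Assumption: the halves together form one 128-bit id; the commit does not state this.
 */
final class TxnIdSketch {
    final long mostBits;
    final long leastBits;

    TxnIdSketch(long mostBits, long leastBits) {
        this.mostBits = mostBits;
        this.leastBits = leastBits;
    }

    /** Renders the id as 32 hex characters, treating both halves as unsigned. */
    String toHex() {
        return String.format("%016x%016x", mostBits, leastBits);
    }

    public static void main(String[] args) {
        // A protobuf uint64 arrives in Java as a signed long, so the largest
        // unsigned value is stored as -1 and must be formatted as unsigned.
        TxnIdSketch id = new TxnIdSketch(1L, -1L);
        System.out.println(id.toHex());
        System.out.println(Long.toUnsignedString(id.leastBits));
    }
}
```

Keeping the halves as plain `long` fields mirrors the generated accessors; only the formatting step needs to be aware that the values are unsigned.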
[GitHub] [pulsar] evanhlavaty opened a new issue #4787: Sink ignores inputSpecs in configuration yaml on create
evanhlavaty opened a new issue #4787: Sink ignores inputSpecs in configuration yaml on create
URL: https://github.com/apache/pulsar/issues/4787

**Describe the bug**
When creating a sink with an accompanying configuration yaml, I want to increase the receiverQueueSize so I have the following set in my configuration yaml

```
inputSpecs:
  sample_topic_name:
    receiverQueueSize: 100
```

however, my inputSpecs configuration settings are ignored after creation of the sink and receiverQueueSize is set to the default of 1000

**To Reproduce**

- **Create sink with pulsar admin cli**

[solr_kerberos_io-2.3.2.yml](https://github.com/apache/pulsar/files/3423473/sink_config.txt)

```
pulsar-admin --admin-url https://sample-admin:6751 sink create \
  --sink-config-file /data/connectors/solr_kerberos_io-2.3.2.yml
```

`"Created Successfully"`

- **Confirm sink settings**

```
pulsar-admin --admin-url https://sample-admin:6751 sink get \
  --tenant public \
  --namespace default \
  --name solr_kerberos
{
  "tenant": "public",
  "namespace": "default",
  "name": "solr_kerberos",
  "className": "org.apache.pulsar.io.solrcloud.SolrCloudSink",
  "inputSpecs": {
    "sample_topic_name": {
      "isRegexPattern": false
    }
  },
  "configs": {
    "solrUrl": "http://sample-solr:8983/solr",
    "loginConfigurationLocation": "/data/connectors/solr_kerberos_io_jaas.conf",
    "collectionName": "sample_collection",
    "documentQueueSize": "5000.0",
    "concurrentThreadCount": "25.0"
  },
  "parallelism": 3,
  "processingGuarantees": "ATLEAST_ONCE",
  "retainOrdering": false,
  "autoAck": true,
  "archive": "builtin://solr_kerberos"
}
```

**Expected behavior**
I expect the receiverQueueSize to be set to what I define it to be in the configuration yaml and not be overridden and/or ignored.

**Desktop (please complete the following information):**
NAME="Oracle Linux Server"
VERSION="7.6"
ID="ol"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Oracle Linux Server 7.6"

**Additional context**
Pulsar Version 2.3.2.
Sink is running on a cluster of function workers/brokers/proxies. When creating a function using the same inputSpecs yaml entry the receiverQueueSize is set correctly. This issue only seems to be related to sinks.
[GitHub] [pulsar] jerrypeng commented on issue #4779: Add basic authentication capabilities to Pulsar SQL
jerrypeng commented on issue #4779: Add basic authentication capabilities to Pulsar SQL URL: https://github.com/apache/pulsar/pull/4779#issuecomment-514296899 rerun java8 tests
[GitHub] [pulsar] mediocregopher commented on issue #4720: Compilation errors in `pulsar-client-cpp`
mediocregopher commented on issue #4720: Compilation errors in `pulsar-client-cpp` URL: https://github.com/apache/pulsar/issues/4720#issuecomment-514260035 Hi @wolfstudy, my snappy version is `1.1.7-1`. It doesn't mention anything about snappy being a requirement in the README so I'm not sure what version it ought to be on. Thanks!
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306351569 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | A 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type| Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | string | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer consumer = client.newConsumer(Schema.STRING).create(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** + +`keyvalue` schema helps applications define schemas for both key and value. + +For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together. + +Pulsar provides two methods to encode a key/value pair in messages: + +* **`INLINE`** mode: a key/value pair will be encoded together in the message payload. + +* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload. + +Users can choose the encoding type when constructing the key/value schema. + +**Example** + +This example shows how to construct a key/value schema and then use it to produce and consume messages. + +1. Construct a key/value schema with `INLINE` encoding type. + +```text +Schema> kvSchema = Schema.KeyValue( +Schema.INT32, +Schema.STRING, +KeyValueEncodingType.INLINE +); +``` + +2. Optionally, construct a key/value schema with `SEPARATED` encoding type. + +```text +Schema> kvSchema = Schema.KeyValue( +Schema.I
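The `INLINE` and `SEPARATED` modes described in the quoted document can be sketched without the Pulsar client. This is an illustration only: the 4-byte length prefixes, the class `KeyValueEncodingSketch`, and its methods are assumptions made for the example and are not taken from the pull request or confirmed as Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the two key/value encoding modes for an INT32 key and a STRING value. */
final class KeyValueEncodingSketch {

    /** INLINE style: key and value share one payload, each preceded by an assumed 4-byte length. */
    static byte[] encodeInline(int key, String value) {
        byte[] k = ByteBuffer.allocate(4).putInt(key).array();
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + k.length + 4 + v.length)
                .putInt(k.length).put(k)
                .putInt(v.length).put(v)
                .array();
    }

    /** SEPARATED style: element 0 stands for the message key, element 1 for the message payload. */
    static byte[][] encodeSeparated(int key, String value) {
        return new byte[][] {
            ByteBuffer.allocate(4).putInt(key).array(),
            value.getBytes(StandardCharsets.UTF_8)
        };
    }

    /** Reads the value back out of an INLINE payload by skipping over the key. */
    static String decodeInlineValue(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int keyLength = buf.getInt();
        buf.position(buf.position() + keyLength);
        byte[] v = new byte[buf.getInt()];
        buf.get(v);
        return new String(v, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] inline = encodeInline(7, "hello");
        System.out.println(decodeInlineValue(inline));
        System.out.println(encodeSeparated(7, "hello")[1].length);
    }
}
```

In the `SEPARATED` sketch the payload holds only the value bytes, so a consumer that ignores the key can read the payload without any framing; in the `INLINE` sketch the reader has to skip the key first.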
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306350907 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | A 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type| Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | string | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer consumer = client.newConsumer(Schema.STRING).create(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** + +`keyvalue` schema helps applications define schemas for both key and value. + +For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together. + +Pulsar provides two methods to encode a key/value pair in messages: + +* **`INLINE`** mode: a key/value pair will be encoded together in the message payload. + +* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload. + +Users can choose the encoding type when constructing the key/value schema. + +**Example** + +This example shows how to construct a key/value schema and then use it to produce and consume messages. + +1. Construct a key/value schema with `INLINE` encoding type. + +```text +Schema> kvSchema = Schema.KeyValue( +Schema.INT32, +Schema.STRING, +KeyValueEncodingType.INLINE +); +``` + +2. Optionally, construct a key/value schema with `SEPARATED` encoding type. + +```text +Schema> kvSchema = Schema.KeyValue( +Schema.I
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306350084 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,321 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. 
+ +### Primitive type + +Currently, Pulsar supports the following primitive types: + +| Primitive Type | Description | +|---|---| +| `BOOLEAN` | A binary value | +| `INT8` | A 8-bit signed integer | +| `INT16` | A 16-bit signed integer | +| `INT32` | A 32-bit signed integer | +| `INT64` | A 64-bit signed integer | +| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number | +| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number | +| `BYTES` | A sequence of 8-bit unsigned bytes | +| `STRING` | A Unicode character sequence | +| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | + +For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. + +Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings. + +The conversions between **Pulsar schema types** and **language-specific primitive types** are as below. + +| Schema Type | Java Type| Python Type | +|---|---|---| +| BOOLEAN | boolean | bool | +| INT8 | byte | | +| INT16 | short | | +| INT32 | int | | +| INT64 | long | | +| FLOAT | float | float | +| DOUBLE | double | float | +| BYTES | byte[], ByteBuffer, ByteBuf | bytes | +| STRING | string | str | +| TIMESTAMP | java.sql.Timestamp | | +| TIME | java.sql.Time | | +| DATE | java.util.Date | | + +**Example** + +This example demonstrates how to use a string schema. + +1. Create a producer with a string schema and send messages. + +```text +Producer producer = client.newProducer(Schema.STRING).create(); +producer.newMessage().value("Hello Pulsar!").send(); +``` + +2. 
Create a consumer with a string schema and receive messages. + +```text +Consumer consumer = client.newConsumer(Schema.STRING).create(); +consumer.receive(); +``` + +### Complex type + +Currently, Pulsar supports the following complex types: + +| Complex Type | Description | +|---|---| +| `keyvalue` | Represents a complex type of a key/value pair. | +| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. | + +* **Complex type 1: `keyvalue`** + +`keyvalue` schema helps applications define schemas for both key and value. + +For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together. + +Pulsar provides two methods to encode a key/value pair in messages: + +* **`INLINE`** mode: a key/value pair will be encoded together in the message payload. + +* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload. + +Users can choose the encoding type when constructing the key/value schema. + +**Example** + +This example shows how to construct a key/value schema and then use it to produce and consume messages. + +1. Construct a key/value schema with `INLINE` encoding type. + +```text +Schema> kvSchema = Schema.KeyValue( +Schema.INT32, +Schema.STRING, +KeyValueEncodingType.INLINE +); +``` + +2. Optionally, construct a key/value schema with `SEPARATED` encoding type. + +```text +Schema> kvSchema = Schema.KeyValue( +Schema.I
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306349822

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | A 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as below.
+
+| Schema Type | Java Type| Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | |
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | string | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+```text
+Producer<String> producer = client.newProducer(Schema.STRING).create();
+producer.newMessage().value("Hello Pulsar!").send();
+```
+
+2. Create a consumer with a string schema and receive messages.
+
+```text
+Consumer<String> consumer = client.newConsumer(Schema.STRING).create();
+consumer.receive();
+```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+`keyvalue` schema helps applications define schemas for both key and value.
+
+For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+
+Pulsar provides two methods to encode a key/value pair in messages:
+
+* **`INLINE`** mode: a key/value pair will be encoded together in the message payload.
+
+* **`SEPARATED`** mode: the key will be encoded in the message key and the value will be encoded in the message payload.
+
+Users can choose the encoding type when constructing the key/value schema.
+
+**Example**
+
+This example shows how to construct a key/value schema and then use it to produce and consume messages.
+
+1. Construct a key/value schema with `INLINE` encoding type.
+
+```text
+Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+Schema.INT32,
+Schema.STRING,
+KeyValueEncodingType.INLINE
+);
+```
+
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
+
+```text
+Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+Schema.I
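A minimal Java sketch of the difference between the `INLINE` and `SEPARATED` modes described in the quoted diff. It deliberately avoids the Pulsar client: the `Encoded` holder, the `inline` and `separated` helpers, and the length-prefixed byte layout are all assumptions made for illustration and are not Pulsar's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyValueEncodingSketch {

    // Hypothetical holder for the two parts of a message; not a Pulsar class.
    static final class Encoded {
        final byte[] messageKey; // null when the key travels inside the payload
        final byte[] payload;

        Encoded(byte[] messageKey, byte[] payload) {
            this.messageKey = messageKey;
            this.payload = payload;
        }
    }

    // INLINE: key and value are written together into the payload.
    // Assumed layout: [key length][key][value length][value bytes].
    static Encoded inline(int key, String value) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + v.length);
        buf.putInt(4).putInt(key).putInt(v.length).put(v);
        return new Encoded(null, buf.array());
    }

    // SEPARATED: the key goes into the message key, the value alone is the payload.
    static Encoded separated(int key, String value) {
        byte[] k = ByteBuffer.allocate(4).putInt(key).array();
        return new Encoded(k, value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Encoded a = inline(1, "Hello Pulsar!");
        Encoded b = separated(1, "Hello Pulsar!");
        System.out.println("inline payload bytes: " + a.payload.length);
        System.out.println("separated key bytes: " + b.messageKey.length
                + ", payload bytes: " + b.payload.length);
    }
}
```

With the 13-byte value `Hello Pulsar!`, the inline payload in this sketch is 25 bytes (three 4-byte integers plus the value), while the separated form carries a 4-byte key and a 13-byte payload.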
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306349543

## File path: site2/docs/schema-understand.md ##
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306349084

## File path: site2/docs/schema-understand.md ##
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306348854

## File path: site2/docs/schema-understand.md ##
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306346703

## File path: site2/docs/schema-understand.md ##
@@ -0,0 +1,319 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+“name”: “test-string-schema”,
+“type”: “STRING”,
+“schema”: “”,
+“properties”: {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories:
+
+* Primitive type
+
+* Complex type
+
+> Note
+>
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | A 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logic type represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `string` schema can use `properties` to store the encoding charset to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as below.
+
+| Schema Type | Java Type| Python Type |

Review comment:
@sijie Does Pulsar support Go here?
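The `TIMESTAMP` row in the quoted table says an instant is stored as milliseconds since the epoch in an `INT64`. A small sketch of that round trip using only `java.time` follows; the class and method names are hypothetical, and this is not Pulsar's schema implementation.

```java
import java.time.Instant;

public class TimestampSketch {

    // Encode an instant as milliseconds since 1970-01-01T00:00:00Z,
    // which fits the INT64 (Java long) representation described above.
    static long toInt64(Instant instant) {
        return instant.toEpochMilli();
    }

    // Decode the INT64 value back into an instant (millisecond precision).
    static Instant fromInt64(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2019-07-24T06:54:11Z");
        long encoded = toInt64(original);
        Instant decoded = fromInt64(encoded);
        System.out.println("encoded millis: " + encoded);
        System.out.println("round trip equal: " + decoded.equals(original));
    }
}
```

Anything finer than a millisecond is dropped by `toEpochMilli()`, which matches the millisecond precision stated in the table.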
[GitHub] [pulsar] Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section
Anonymitaet commented on a change in pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786#discussion_r306345623 ## File path: site2/docs/schema-understand.md ## @@ -0,0 +1,319 @@ +--- +id: schema-understand +title: Understand schema +sidebar_label: Understand schema +--- + +## `SchemaInfo` + +Pulsar schema is defined in a data structure called `SchemaInfo`. + +The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level. + +A `SchemaInfo` consists of the following fields: + +| Field | Description | +|---|---| +| `name` | Schema name (a string). | +| `type` | Schema type, which determines how to interpret the schema data. | +| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and schema-type specific. | +| `properties` | A map of string key/value pairs, which is application-specific. | + +**Example** + +This is the `SchemaInfo` of a string. + +```text +{ +“name”: “test-string-schema”, +“type”: “STRING”, +“schema”: “”, +“properties”: {} +} +``` + +## Schema type + +Pulsar supports various schema types, which are mainly divided into two categories: + +* Primitive type + +* Complex type + +> Note +> +> If you create a schema without specifying a type, producers and consumers can only handle raw bytes. Review comment: @sijie I added this Note based on my understanding. Is it correct?
[GitHub] [pulsar] Anonymitaet opened a new pull request #4786: Add *Understand Schema* Section
Anonymitaet opened a new pull request #4786: Add *Understand Schema* Section URL: https://github.com/apache/pulsar/pull/4786
[GitHub] [pulsar] fxbing commented on issue #4745: Fix:PulsarKafkaProducer is not thread safe
fxbing commented on issue #4745: Fix:PulsarKafkaProducer is not thread safe URL: https://github.com/apache/pulsar/pull/4745#issuecomment-514227515 run java8 tests
[GitHub] [pulsar] epsteina16 commented on a change in pull request #4768: Expose acknowledgment flushing capabilities at the consumer level
epsteina16 commented on a change in pull request #4768: Expose acknowledgment flushing capabilities at the consumer level URL: https://github.com/apache/pulsar/pull/4768#discussion_r306321704 ## File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/Consumer.java ## @@ -255,6 +255,13 @@ * @return a future that can be used to track the completion of the operation */ CompletableFuture<Void> acknowledgeCumulativeAsync(MessageId messageId); + +/** + * Flush all batched acknowledgements and wait until all acknowledgments have been persisted. + * + * Flush acks returns immediately if batching of acks is disabled. + */ +void flushAcknowledgements(); Review comment: It should be a blocking call, in my opinion, since it requires a channel flush on the client ctx, which is a blocking call.
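The blocking-versus-asynchronous question raised in this review can be illustrated with a small sketch. Everything below is hypothetical: `AckFlusherSketch`, `flushAcknowledgementsAsync`, and `NoBatchingFlusher` are invented for illustration and are not part of the Pulsar `Consumer` interface. The sketch only shows one common way to offer a blocking call on top of an asynchronous one, and how "returns immediately if batching is disabled" might look.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical interface for illustration only; not the Pulsar Consumer API.
interface AckFlusherSketch {
    // Asynchronous variant: the future completes once pending acks are flushed.
    CompletableFuture<Void> flushAcknowledgementsAsync();

    // Blocking variant: waits until the asynchronous flush has completed.
    default void flushAcknowledgements() {
        flushAcknowledgementsAsync().join();
    }
}

// With ack batching disabled there is nothing to flush,
// so the future is already complete and the blocking call returns at once.
final class NoBatchingFlusher implements AckFlusherSketch {
    @Override
    public CompletableFuture<Void> flushAcknowledgementsAsync() {
        return CompletableFuture.completedFuture(null);
    }
}
```

Whether the real method should block is exactly what the review comment is debating; the sketch takes no position beyond showing that both shapes can coexist.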
[GitHub] [pulsar] zymap commented on issue #4776: [Transaction][Buffer]Add new marker to show which message belongs to transaction
zymap commented on issue #4776: [Transaction][Buffer]Add new marker to show which message belongs to transaction URL: https://github.com/apache/pulsar/pull/4776#issuecomment-514195683 run java8 tests
[GitHub] [pulsar] codelipenghui opened a new issue #4785: [pulsar-sql] export partitioned topic name by show tables
codelipenghui opened a new issue #4785: [pulsar-sql] export partitioned topic name by show tables URL: https://github.com/apache/pulsar/issues/4785 **Is your feature request related to a problem? Please describe.** Currently, using Pulsar SQL to show the table list returns all topic partition names, so integrations with some UI tools can't select the partitioned topic. I suggest adding the partitioned topic to the Pulsar SQL table list. **Describe the solution you'd like** Add the partitioned topic to the Pulsar SQL table list when getting the table list.
[pulsar] branch asf-site updated: Updated site at revision 2cc34af
This is an automated email from the ASF dual-hosted git repository. mmerli pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/pulsar.git The following commit(s) were added to refs/heads/asf-site by this push: new deba09a Updated site at revision 2cc34af deba09a is described below commit deba09a90df3ee1a3b9f5e781f620ce8a5259e82 Author: jenkins AuthorDate: Tue Jul 23 11:05:06 2019 + Updated site at revision 2cc34af --- content/swagger/2.5.0-SNAPSHOT/swagger.json| 36 +++--- .../swagger/2.5.0-SNAPSHOT/swaggerfunctions.json | 32 +-- 2 files changed, 34 insertions(+), 34 deletions(-) diff --git a/content/swagger/2.5.0-SNAPSHOT/swagger.json b/content/swagger/2.5.0-SNAPSHOT/swagger.json index f0259c9..2f533ac 100644 --- a/content/swagger/2.5.0-SNAPSHOT/swagger.json +++ b/content/swagger/2.5.0-SNAPSHOT/swagger.json @@ -8652,9 +8652,14 @@ "type" : "number", "format" : "double" }, -"lastUpdate" : { - "type" : "integer", - "format" : "int64" +"bandwidthIn" : { + "$ref" : "#/definitions/ResourceUsage" +}, +"bandwidthOut" : { + "$ref" : "#/definitions/ResourceUsage" +}, +"memory" : { + "$ref" : "#/definitions/ResourceUsage" }, "underLoaded" : { "type" : "boolean" @@ -8662,31 +8667,26 @@ "overLoaded" : { "type" : "boolean" }, -"cpu" : { - "$ref" : "#/definitions/ResourceUsage" +"loadReportType" : { + "type" : "string" }, "msgThroughputIn" : { "type" : "number", "format" : "double" }, -"memory" : { - "$ref" : "#/definitions/ResourceUsage" -}, "msgThroughputOut" : { "type" : "number", "format" : "double" }, -"directMemory" : { - "$ref" : "#/definitions/ResourceUsage" -}, -"bandwidthOut" : { +"cpu" : { "$ref" : "#/definitions/ResourceUsage" }, -"bandwidthIn" : { +"directMemory" : { "$ref" : "#/definitions/ResourceUsage" }, -"loadReportType" : { - "type" : "string" +"lastUpdate" : { + "type" : "integer", + "format" : "int64" } } }, @@ -9624,11 +9624,11 @@ "ResourceUnit" : { "type" : "object", "properties" : { -"availableResource" : { - "$ref" : 
"#/definitions/ResourceDescription" -}, "resourceId" : { "type" : "string" +}, +"availableResource" : { + "$ref" : "#/definitions/ResourceDescription" } } }, diff --git a/content/swagger/2.5.0-SNAPSHOT/swaggerfunctions.json b/content/swagger/2.5.0-SNAPSHOT/swaggerfunctions.json index 22d057d..99891e9 100644 --- a/content/swagger/2.5.0-SNAPSHOT/swaggerfunctions.json +++ b/content/swagger/2.5.0-SNAPSHOT/swaggerfunctions.json @@ -1348,9 +1348,6 @@ "Message" : { "type" : "object", "properties" : { -"replicatedFrom" : { - "type" : "string" -}, "replicated" : { "type" : "boolean" }, @@ -1365,17 +1362,6 @@ "topicName" : { "type" : "string" }, -"keyBytes" : { - "type" : "array", - "items" : { -"type" : "string", -"format" : "byte" - } -}, -"sequenceId" : { - "type" : "integer", - "format" : "int64" -}, "orderingKey" : { "type" : "array", "items" : { @@ -1397,12 +1383,26 @@ "format" : "byte" } }, -"producerName" : { - "type" : "string" +"sequenceId" : { + "type" : "integer", + "format" : "int64" }, "messageId" : { "$ref" : "#/definitions/MessageId" }, +"replicatedFrom" : { + "type" : "string" +}, +"producerName" : { + "type" : "string" +}, +"keyBytes" : { + "type" : "array", + "items" : { +"type" : "string", +"format" : "byte" + } +}, "data" : { "type" : "array", "items" : {
[GitHub] [pulsar] Anonymitaet commented on issue #4759: [Doc] Add Schema Chapter and Get Started Section
Anonymitaet commented on issue #4759: [Doc] Add Schema Chapter and Get Started Section URL: https://github.com/apache/pulsar/pull/4759#issuecomment-514140627 @sijie could you please help merge this? Thanks.
[GitHub] [pulsar] wolfstudy commented on issue #4720: Compilation errors in `pulsar-client-cpp`
wolfstudy commented on issue #4720: Compilation errors in `pulsar-client-cpp` URL: https://github.com/apache/pulsar/issues/4720#issuecomment-514134929 @mediocregopher Can you check the version of the `snappy` lib? According to the error message you provided, the version of snappy is incorrect.
[GitHub] [pulsar] zymap commented on a change in pull request #4738: [Transaction][buffer] Add basic operation of transaction
zymap commented on a change in pull request #4738: [Transaction][buffer] Add basic operation of transaction URL: https://github.com/apache/pulsar/pull/4738#discussion_r306206535 ## File path: pulsar-transaction/buffer/src/main/java/org/apache/pulsar/transaction/buffer/impl/InMemTransactionBuffer.java ## @@ -79,6 +81,41 @@ public int numEntries() { } } +@Override +public long committedAtLedgerId() { +return committedAtLedgerId; +} + +@Override +public long committedAtEntryId() { +return committedAtEntryId; +} + +@Override +public long lastSequenceId() { +return entries.lastKey(); +} + +@Override +public CompletableFuture> readEntries(int num, long startSequenceId) { +return FutureUtil.failedFuture(new UnsupportedOperationException()); Review comment: Because it relates to the entry position, and there is no position in memory, I think it has no use in this implementation, right?
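The pattern in the quoted diff, returning an already-failed future for an operation the in-memory buffer does not support, can be sketched with plain JDK classes. The names below are assumptions for illustration: `FailedFutureSketch` is hypothetical, its `failedFuture` helper is a plain-JDK stand-in rather than Pulsar's `FutureUtil`, and `List<byte[]>` is a placeholder element type because the real generic type is not legible in the quoted diff.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; names and the List<byte[]> element type are placeholders.
final class FailedFutureSketch {
    // Plain-JDK way to build a future that is already completed exceptionally.
    static <T> CompletableFuture<T> failedFuture(Throwable cause) {
        CompletableFuture<T> future = new CompletableFuture<>();
        future.completeExceptionally(cause);
        return future;
    }

    // An unsupported read reports the failure through the future
    // instead of throwing synchronously from the method itself.
    static CompletableFuture<List<byte[]>> readEntries(int num, long startSequenceId) {
        return failedFuture(new UnsupportedOperationException());
    }
}
```

A caller that chains on the returned future sees the `UnsupportedOperationException` as the failure cause, so asynchronous call sites handle it the same way as any other read error.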
[GitHub] [pulsar-client-go] wolfstudy opened a new pull request #24: [issue:23]Use golangci-lint to format the code for the master branch
wolfstudy opened a new pull request #24: [issue:23]Use golangci-lint to format the code for the master branch URL: https://github.com/apache/pulsar-client-go/pull/24 Signed-off-by: xiaolong.ran Fixes #23 ### Motivation run `golangci-lint` and format the code for the master branch.
[GitHub] [pulsar] sijie opened a new issue #4784: [presto] pulsar presto connector doesn't have to configure zookeeper manually
sijie opened a new issue #4784: [presto] pulsar presto connector doesn't have to configure zookeeper manually URL: https://github.com/apache/pulsar/issues/4784 **Is your feature request related to a problem? Please describe.** Currently Pulsar configures presto-worker by setting `pulsar.zookeeper-uri` in `conf/presto/catalog/pulsar.properties`. It would be great to deprecate this setting, because the internal cluster configuration can be fetched from the broker's RESTful endpoint.
[GitHub] [pulsar] congbobo184 commented on issue #4728: Pulsar SQL supports pulsar's primitive schema
congbobo184 commented on issue #4728: Pulsar SQL supports pulsar's primitive schema URL: https://github.com/apache/pulsar/pull/4728#issuecomment-514085405 run java8 tests
[GitHub] [pulsar] codelipenghui commented on issue #4621: [PIP-38] Support batch receive in java client.
codelipenghui commented on issue #4621: [PIP-38] Support batch receive in java client. URL: https://github.com/apache/pulsar/pull/4621#issuecomment-514083846 @merlimat Please help take a look at [PIP-38](https://github.com/apache/pulsar/wiki/PIP-38%3A-Batch-Receiving-Messages) when you have time. Here is the discussion thread: https://lists.apache.org/thread.html/3e2a87d31bf8a98142bd68545714cdbf5d87011b4ae3909c5c9f43b9@%3Cdev.pulsar.apache.org%3E Thanks.
[GitHub] [pulsar] zymap commented on issue #4776: [Transaction][Buffer]Add new marker to show which message belongs to transaction
zymap commented on issue #4776: [Transaction][Buffer]Add new marker to show which message belongs to transaction URL: https://github.com/apache/pulsar/pull/4776#issuecomment-514083785 run java8 tests
[GitHub] [pulsar] codelipenghui commented on issue #4760: Allow to configure ack-timeout tick time
codelipenghui commented on issue #4760: Allow to configure ack-timeout tick time URL: https://github.com/apache/pulsar/pull/4760#issuecomment-514082681 @merlimat > Another option I was thinking of, was to just keep a fixed number of time buckets, say like 16 (non configurable). That will automatically tie the precision to the order of magnitude of the ack timeout. I think it can be used as the default configuration, but it's better to let users configure it.
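The fixed-bucket idea quoted above comes down to simple arithmetic: with a constant number of buckets, the tick (bucket width) is the ack timeout divided by that number, so precision scales with the timeout. A minimal sketch, assuming the 16 buckets mentioned in the comment; `AckTimeoutBucketsSketch` and `tickMillis` are hypothetical names, and the 1 ms floor is an added assumption to avoid a zero-width bucket.

```java
// Hypothetical sketch of the fixed-bucket arithmetic; not Pulsar code.
final class AckTimeoutBucketsSketch {
    // Fixed bucket count suggested in the quoted comment.
    static final int NUM_BUCKETS = 16;

    // Width of one time bucket in milliseconds.
    // The 1 ms floor is an assumption added here to avoid a zero tick.
    static long tickMillis(long ackTimeoutMillis) {
        return Math.max(1L, ackTimeoutMillis / NUM_BUCKETS);
    }
}
```

Under these assumptions a 16-second ack timeout gives a 1-second tick and a 1.6-second timeout gives a 100 ms tick, which is the "precision tied to the order of magnitude" effect described in the quote. A user-configurable tick, as proposed in the reply, would replace the division with a supplied value.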
[GitHub] [pulsar-client-go] wolfstudy opened a new issue #23: [Code format]
wolfstudy opened a new issue #23: [Code format] URL: https://github.com/apache/pulsar-client-go/issues/23 Expected behavior `golangci-lint run` success. Actual behavior ``` pkg/compression/compression_test.go:44:8: Using the variable on range scope `p` in function literal (scopelint) pkg/compression/compression_test.go:49:18: Using the variable on range scope `p` in function literal (scopelint) pkg/compression/compression_test.go:50:25: Using the variable on range scope `p` in function literal (scopelint) pulsar/internal/commands.go:61:41: `smm` can be `github.com/golang/protobuf/proto.Message` (interfacer) pulsar/internal/commands.go:71:32: `cmdSend` can be `github.com/golang/protobuf/proto.Message` (interfacer) pulsar/internal/connection.go:464:52: unnecessary conversion (unconvert) pulsar/internal/connection.go:454: G402: TLS InsecureSkipVerify may be true. (gosec) pulsar/impl_client.go:33:2: `options` is unused (structcheck) pulsar/impl_client.go:42:2: `consumerIdGenerator` is unused (structcheck) pulsar/impl_client.go:38:2: `auth` is unused (structcheck) pulsar/impl_client.go:41:2: `producerIdGenerator` is unused (structcheck) perf/perf-consumer.go:94:16: Error return value of `consumer.Ack` is not checked (errcheck) perf/pulsar-perf-go.go:48:17: Error return value of `rootCmd.Execute` is not checked (errcheck) pkg/compression/zlib.go:40:9: Error return value of `w.Write` is not checked (errcheck) pkg/compression/zlib.go:53:8: Error return value of `r.Read` is not checked (errcheck) pulsar/internal/hash.go:35:9: Error return value of `h.Write` is not checked (errcheck) pkg/auth/token.go:34:2: ifElseChain: rewrite if-else to switch statement (gocritic) pulsar/impl_client.go:57:2: ifElseChain: rewrite if-else to switch statement (gocritic) pulsar/internal/batch_builder.go:101:2: ifElseChain: rewrite if-else to switch statement (gocritic) pulsar/impl_producer.go:81:46: loopclosure: loop variable partition captured by func literal (govet) 
pulsar/impl_producer.go:82:23: loopclosure: loop variable partitionIdx captured by func literal (govet) pulsar/impl_client.go:65:10: nilness: impossible condition: nil != nil (govet) pulsar/producer_test.go:46:3: shadow: declaration of "err" shadows declaration at line 33 (govet) pulsar/producer_test.go:136:5: shadow: declaration of "err" shadows declaration at line 122 (govet) pulsar/producer_test.go:168:3: shadow: declaration of "err" shadows declaration at line 153 (govet) pulsar/internal/lookup_service.go:98:9: ineffectual assignment to `err` (ineffassign) pkg/auth/disabled.go:24: File is not `gofmt`-ed with `-s` (gofmt) pkg/auth/provider.go:23: File is not `gofmt`-ed with `-s` (gofmt) perf/perf-consumer.go:26: File is not `gofmt`-ed with `-s` (gofmt) pkg/auth/token.go:24: File is not `goimports`-ed (goimports) perf/perf-consumer.go:25: File is not `goimports`-ed (goimports) pkg/compression/compression_test.go:23: File is not `goimports`-ed (goimports) pkg/auth/disabled.go:42:17: method GetTlsCertificate should be GetTLSCertificate (golint) pkg/auth/tls.go:53:27: method GetTlsCertificate should be GetTLSCertificate (golint) pkg/auth/token.go:48:11: `if` block ends with a `return` statement, so drop this `else` and outdent its block (golint) pkg/auth/token.go:66:11: `if` block ends with a `return` statement, so drop this `else` and outdent its block (golint) pkg/auth/token.go:83:29: method GetTlsCertificate should be GetTLSCertificate (golint) pkg/auth/token.go:91:9: `if` block ends with a `return` statement, so drop this `else` and outdent its block (golint) perf/perf-consumer.go:84:26: should drop = 0 from declaration of var msgReceived; it is the zero value (golint) perf/perf-consumer.go:85:28: should drop = 0 from declaration of var bytesReceived; it is the zero value (golint) perf/perf-producer.go:96:39: should drop = nil from declaration of var rateLimiter; it is the zero value (golint) perf/perf-producer.go:143:4: should replace `messagesPublished += 1` 
with `messagesPublished++` (golint) pulsar/impl_message.go:27:6: type `messageId` should be `messageID` (golint) pulsar/impl_message.go:34:6: func newMessageId should be newMessageID (golint) pulsar/impl_message.go:44:2: var `msgId` should be `msgID` (golint) pulsar/impl_message.go:54:6: func deserializeMessageId should be deserializeMessageID (golint) pulsar/impl_message.go:55:2: var `msgId` should be `msgID` (golint) pulsar/impl_partition_producer.go:52:2: struct field `producerId` should be `producerID` (golint) pulsar/impl_partition_producer.go:54:2: struct field `sequenceIdGenerator` should be `sequenceIDGenerator` (golint) pulsar/impl_partition_producer.go:150:3: var `nextSequenceId` should be `nextSequenceID` (golint) pulsar/impl_partition_producer.go:248:2: var `sequenceId` shoul