martin-g commented on code in PR #451:
URL: https://github.com/apache/avro-rs/pull/451#discussion_r2754545027
##########
avro/src/documentation/primer.rs:
##########

@@ -0,0 +1,105 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! # A primer on Apache Avro
+//!
+//! Avro is a schema based encoding system, like Protobuf. This means that if you have raw Avro data
+//! without a schema, you are unable to decode it. It also means that the format is very space
+//! efficient.
+//!
+//! ## Schemas
+//!
+//! Schemas are defined in JSON and look like this:
+//! ```json
+//! {
+//!     "type": "record",
+//!     "name": "example",
+//!     "fields": [
+//!         {"name": "a", "type": "long", "default": 42},
+//!         {"name": "b", "type": "string"}
+//!     ]
+//! }
+//! ```
+//! For all possible types and extra attributes, see [the schema section of the specification].
+//!
+//! Schemas can depend on each other. For example, the schema defined above can be used again or a
+//! schema can include itself:
+//! ```json
+//! {
+//!     "type": "record",
+//!     "name": "references",
+//!     "fields": [
+//!         {"name": "a", "type": "example"},
+//!         {"name": "b", "type": "bytes"},
+//!         {"name": "recursive", "type": ["null", "references"]}
+//!     ]
+//! }
+//! ```
+//!
+//! Schemas are represented using the [`Schema`](crate::Schema) type.
+//!
+//! [the schema section of the specification]: https://avro.apache.org/docs/++version++/specification/#schema-declaration
+//!
+//! ## Data serialization and deserialization
+//! There are various formats to encode and decode Avro data. Most formats use the Avro binary encoding.
+//!
+//! #### [Object Container File](https://avro.apache.org/docs/++version++/specification/#object-container-files)
+//! This is the most common file format used for Avro, it uses the binary encoding. It includes the
+//! schema in the file, and can therefore be decoded by a reader who doesn't have the schema. It includes
+//! many records in one file.
+//!
+//! This file format can be used via the [`Reader`](crate::Reader) and [`Writer`](crate::Writer) types.
+//!
+//! #### [Single Object Encoding](https://avro.apache.org/docs/++version++/specification/#single-object-encoding)
+//! This file format also uses the binary encoding, but the schema is not included directly. It instead
+//! includes a fingerprint of the schema, which a reader can look up in a schema database or compare
+//! with the fingerprint that the reader is expecting. This file format always contains one record.
+//!
+//! This file format can be used via the [`GenericSingleObjectReader`](crate::GenericSingleObjectReader),
+//! [`GenericSingleObjectWriter`](crate::GenericSingleObjectReader), [`SpecificSingleObjectReader`](crate::SpecificSingleObjectReader),

Review Comment:
```suggestion
//! [`GenericSingleObjectWriter`](crate::GenericSingleObjectWriter), [`SpecificSingleObjectReader`](crate::SpecificSingleObjectReader),
```
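As an aside, the fingerprint this hunk mentions can be computed directly from a `Schema`. A minimal sketch, assuming the `rabin` module and `Schema::fingerprint` keep the shape they have in current `apache-avro` releases (paths may differ after this refactoring):

```rust
use apache_avro::{Schema, rabin::Rabin};

fn main() {
    let schema = Schema::parse_str(r#""string""#).unwrap();
    // Single object encoding prefixes each record with the 64-bit
    // Rabin fingerprint of the writer's schema.
    let fingerprint = schema.fingerprint::<Rabin>();
    println!("{fingerprint}");
}
```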
##########
avro/src/schema/parser.rs:
##########

@@ -539,8 +535,7 @@ impl Parser {
         }
     }
 
-    /// Parse a `serde_json::Value` representing a Avro record type into a
-    /// `Schema`.
+    /// Parse a `serde_json::Value` representing a Avro record type into a `Schema`.

Review Comment:
```suggestion
    /// Parse a `serde_json::Value` representing an Avro record type into a `Schema`.
```

##########
avro/src/schema/mod.rs:
##########

@@ -99,17 +100,20 @@ pub enum Schema {
     /// A `double` Avro schema.
     Double,
     /// A `bytes` Avro schema.
+    ///
     /// `Bytes` represents a sequence of 8-bit unsigned bytes.
     Bytes,
     /// A `string` Avro schema.
+    ///
     /// `String` represents a unicode character sequence.
     String,
-    /// A `array` Avro schema. Avro arrays are required to have the same type for each element.
-    /// This variant holds the `Schema` for the array element type.
+    /// A `array` Avro schema.

Review Comment:
```suggestion
    /// An `array` Avro schema.
```
##########
avro/src/documentation/primer.rs:
##########

@@ -0,0 +1,105 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! # A primer on Apache Avro
+//!
+//! Avro is a schema based encoding system, like Protobuf. This means that if you have raw Avro data
+//! without a schema, you are unable to decode it. It also means that the format is very space
+//! efficient.
+//!
+//! ## Schemas
+//!
+//! Schemas are defined in JSON and look like this:
+//! ```json
+//! {
+//!     "type": "record",
+//!     "name": "example",
+//!     "fields": [
+//!         {"name": "a", "type": "long", "default": 42},
+//!         {"name": "b", "type": "string"}
+//!     ]
+//! }
+//! ```
+//! For all possible types and extra attributes, see [the schema section of the specification].
+//!
+//! Schemas can depend on each other. For example, the schema defined above can be used again or a
+//! schema can include itself:
+//! ```json
+//! {
+//!     "type": "record",
+//!     "name": "references",
+//!     "fields": [
+//!         {"name": "a", "type": "example"},
+//!         {"name": "b", "type": "bytes"},
+//!         {"name": "recursive", "type": ["null", "references"]}
+//!     ]
+//! }
+//! ```
+//!
+//! Schemas are represented using the [`Schema`](crate::Schema) type.
+//!
+//! [the schema section of the specification]: https://avro.apache.org/docs/++version++/specification/#schema-declaration
+//!
+//! ## Data serialization and deserialization
+//! There are various formats to encode and decode Avro data. Most formats use the Avro binary encoding.
+//!
+//! #### [Object Container File](https://avro.apache.org/docs/++version++/specification/#object-container-files)
+//! This is the most common file format used for Avro, it uses the binary encoding. It includes the
+//! schema in the file, and can therefore be decoded by a reader who doesn't have the schema. It includes
+//! many records in one file.
+//!
+//! This file format can be used via the [`Reader`](crate::Reader) and [`Writer`](crate::Writer) types.
+//!
+//! #### [Single Object Encoding](https://avro.apache.org/docs/++version++/specification/#single-object-encoding)
+//! This file format also uses the binary encoding, but the schema is not included directly. It instead
+//! includes a fingerprint of the schema, which a reader can look up in a schema database or compare
+//! with the fingerprint that the reader is expecting. This file format always contains one record.
+//!
+//! This file format can be used via the [`GenericSingleObjectReader`](crate::GenericSingleObjectReader),
+//! [`GenericSingleObjectWriter`](crate::GenericSingleObjectReader), [`SpecificSingleObjectReader`](crate::SpecificSingleObjectReader),
+//! and [`SpecificSingleObjectWriter`](crate::SpecificSingleObjectWriter) types.
+//!
+//! #### Avro datums
+//! This is not really a file format, as it's just the raw Avro binary data. It does not include a
+//! schema and can therefore not be decoded without the reader knowing **exactly** which schema was
+//! used to write it.
+//!
+//! This file format can be used via the [`to_avro_datum`](crate::to_avro_datum), [`from_avro_datum`](crate::from_avro_datum),
+//! [`to_avro_datum_schemata`](crate::to_avro_datum_schemata), [`from_avro_datum_schemata`](crate::from_avro_datum_schemata),
+//! [`from_avro_datum_reader_schemata`](crate::from_avro_datum_reader_schemata), and
+//! [`write_avro_datum_ref`](crate::write_avro_datum_ref) functions.
+//!
+//! #### [Avro JSON](https://avro.apache.org/docs/++version++/specification/#json-encoding)
+//! Not be confused with the schema definition which is also in JSON. This is the Avro data encoded
+//! in JSON.
+//!
+//! It can be used via the [`From<serde_json::Value> for Value`](crate::types::Value) and
+//! [`TryFrom<Value> for serde_json::Value`](crate::types::Value) implementations.
+//!
+//! ## Compression
+//! For records with low entropy it can be useful to compress the encoded data. Using the [#Object Container File]

Review Comment:
```suggestion
//! For records with low entropy it can be useful to compress the encoded data. Using the [#object-container-file]
```
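To make the "Avro datums" subsection concrete: a round-trip sketch using the `to_avro_datum`/`from_avro_datum` functions named in the hunk (signatures assumed to match current `apache-avro` releases):

```rust
use apache_avro::{Schema, from_avro_datum, to_avro_datum, types::Record};

fn main() {
    let schema = Schema::parse_str(
        r#"{"type": "record", "name": "example", "fields": [{"name": "a", "type": "long"}]}"#,
    )
    .unwrap();
    let mut record = Record::new(&schema).unwrap();
    record.put("a", 27i64);

    // Raw binary encoding of a single datum: no header, schema, or sync marker.
    let encoded = to_avro_datum(&schema, record).unwrap();

    // Decoding needs exactly the schema that was used for writing
    // (`None` means no separate reader schema is supplied).
    let mut slice = encoded.as_slice();
    let decoded = from_avro_datum(&schema, &mut slice, None).unwrap();
    println!("{decoded:?}");
}
```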
##########
avro/src/serde/mod.rs:
##########

@@ -15,6 +15,98 @@
 // specific language governing permissions and limitations
 // under the License.
 
+//! # Using Avro in Rust, the Serde way.
+//!
+//! Avro is a schema-based format, this means it requires a few extra steps to use compared to
+//! a data format like JSON.
+//!
+//! ## Schemas
+//! It's strongly recommended to derive the schemas for your types using the [`AvroSchema`] derive macro.
+//! The macro uses the Serde attributes to generate a matching schema and checks that no attributes are
+//! used that are incompatible with the Serde implementation in this crate. See [the trait documentation] for
+//! details on how to change the generated schema.
+//!
+//! Alternatively, you can write your own schema. If you go down this path, it is recommended you start with
+//! the schema derived by [`AvroSchema`] and then modify it to fit your needs.
+//!
+//! #### Performance pitfall
+//! One performance pitfall with Serde is (de)serializing bytes. The implementation of [`Serialize`][`serde::Serialize`]
+//! and [`Deserialize`][`serde::Deserialize`] for types as `Vec<u8>`, `&[u8]` and `Cow<[u8]>` will
+//! all use the array of integers representation. This can normally be fixed using the [`serde_bytes`]
+//! crate, however this crate also needs some extra information. Therefore, you need to use the
+//! [`bytes`], [`bytes_opt`], [`fixed`], [`fixed_opt`], [`mod@slice`], and [`slice_opt`] modules of
+//! this crate instead.
+//!
+//! #### Using existing schemas
+//! If you have schemas that are already being used in other parts of your software stack, generating types
+//! from the schema can be very useful. There is a **third-party** crate [`rsgen-avro`] that implements this.
+//!
+//! ## Serializing data
+//! Writing data is very simple. Use [`T::get_schema()`](AvroSchema::get_schema()) to get the schema
+//! for the type you want to serialize. It is recommended to keep this schema around as long as possible
+//! as generating the schema is quite expensive. Then create a [`Writer`](crate::Writer) with your schema
+//! and use the [`append_ser()`](crate::Writer::append_ser()) function to serialize your data.
+//!
+//! ## Deserializing data
+//! Reading data is both simpler and more complex than writing. On the one hand, you don't need to
+//! generate a schema, as the Avro file has it embedded. But you can't directly deserialize from a
+//! [`Reader`](crate::Reader). Instead, you have to iterate over the [`Value`](crate::types::Value)s
+//! in the reader and deserialize from those via [`from_value`].
+//!
+//! ## Putting it all together
+//!
+//! The following is an example of how to combine everything showed so far and it is meant to be a
+//! quick reference of the Serde interface:
+//!
+//! ```
+//! # use std::io::Cursor;
+//! # use serde::{Serialize, Deserialize};
+//! # use apache_avro::{AvroSchema, Error, Reader, Writer, serde::{from_value, to_value}};
+//! #[derive(AvroSchema, Serialize, Deserialize, PartialEq, Debug)]
+//! struct Foo {
+//!     a: i64,
+//!     b: String,
+//!     // Otherwise it will be serialized as an array of integers
+//!     #[avro(with)]
+//!     #[serde(with = "apache_avro::serde::bytes")]
+//!     c: Vec<u8>,
+//! }
+//!
+//! // Creating this schema is expensive, reuse it as much as possible
+//! let schema = Foo::get_schema();
+//! // A writer needs the schema of the type that is going to be written
+//! let mut writer = Writer::new(&schema, Vec::new())?;
+//!
+//! let foo = Foo {
+//!     a: 42,
+//!     b: "Hello".to_string(),
+//!     c: b"Data".to_vec()
+//! };
+//!
+//! // Serialize as many items as you want.
+//! writer.append_ser(&foo)?;
+//! writer.append_ser(&foo)?;
+//! writer.append_ser(&foo)?;
+//!
+//! // Always flush
+//! writer.flush();

Review Comment:
```suggestion
//! writer.flush().unwrap();
```
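Since the quoted example stops at the flush, here is the read-back half that the "Deserializing data" paragraph describes — a sketch that reuses `Foo`, `schema`, and the flushed `writer` from the snippet above:

```rust
// Consume the writer to get the encoded container file bytes.
let encoded = writer.into_inner().unwrap();

// A Reader picks the schema up from the file itself.
let reader = Reader::new(&encoded[..]).unwrap();
for value in reader {
    // Deserialize each intermediate Value into the Serde type.
    let foo: Foo = from_value(&value.unwrap()).unwrap();
    println!("{foo:?}");
}
```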
##########
avro/src/schema_compatibility.rs:
##########

@@ -15,7 +15,48 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Logic for checking schema compatibility
+//! Check if the reader's schema is compatible with the writer's schema.
+//!
+//! To allow for schema evolution, Avro supports resolving the writer's schema to the reader's schema.
+//! To check if this is possible, [`SchemaCompatibility`] can be used. For the complete rules see
+//! [the specification](https://avro.apache.org/docs/++version++/specification/#schema-resolution).
+//!
+//! There are three levels of compatibility.
+//!
+//! 1. Fully compatible schemas (`Ok(Compatibility::Full)`)
+//!
+//! For example, an integer can always be resolved to a long:
+//!
+//! ```
+//! # use apache_avro::{Schema, schema_compatibility::{Compatibility, SchemaCompatibility}};
+//! let writers_schema = Schema::array(Schema::Int);
+//! let readers_schema = Schema::array(Schema::Long);
+//! assert_eq!(SchemaCompatibility::can_read(&writers_schema, &readers_schema), Ok(Compatibility::Full));
+//! ```
+//!
+//! 2. Incompatible schemas (`Err`)
+//!
+//! For example, a long can never be resolved to a long:

Review Comment:
```suggestion
//! For example, a long can never be resolved to an int:
```
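For completeness, the incompatible case the suggestion describes is just the first example with the schemas swapped — a sketch using the same API quoted in the hunk:

```rust
use apache_avro::{Schema, schema_compatibility::SchemaCompatibility};

fn main() {
    let writers_schema = Schema::array(Schema::Long);
    let readers_schema = Schema::array(Schema::Int);
    // Resolving a long into an int could lose precision, so this is rejected.
    assert!(SchemaCompatibility::can_read(&writers_schema, &readers_schema).is_err());
}
```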
##########
avro_derive/src/lib.rs:
##########

@@ -17,6 +17,14 @@
 #![cfg_attr(nightly, feature(proc_macro_diagnostic))]
 
+//! This crate provides the `AvroSchema` derive macro.
+//! ```no_run
+//! #[derive(AvroSchema)]
+//! ```
+//! Please see the documentation of the [`AvroSchema`] trait for instructions on how to use it.
+//!
+//! [`AvroSchema`]: https://docs.rs/apache-avro/latest/apache_avro/schema/trait.AvroSchema.html

Review Comment:
```suggestion
//! [`AvroSchema`]: https://docs.rs/apache-avro/latest/apache_avro/serde/trait.AvroSchema.html
```

##########
avro/src/documentation/dynamic.rs:
##########

@@ -0,0 +1,279 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! # Using Avro in Rust, the dynamic way.
+//!
+//! ## Creating a schema
+//!
+//! An Avro data cannot exist without an Avro schema. Schemas **must** be used while writing and
+//! **can** be used while reading and they carry the information regarding the type of data we are
+//! handling. Avro schemas are used for both schema validation and resolution of Avro data.
+//!
+//! Avro schemas are defined in **JSON** format and can just be parsed out of a raw string:
+//!
+//! ```
+//! use apache_avro::Schema;
+//!
+//! let raw_schema = r#"
+//!     {
+//!         "type": "record",
+//!         "name": "test",
+//!         "fields": [
+//!             {"name": "a", "type": "long", "default": 42},
+//!             {"name": "b", "type": "string"}
+//!         ]
+//!     }
+//! "#;
+//!
+//! // if the schema is not valid, this function will return an error
+//! let schema = Schema::parse_str(raw_schema).unwrap();
+//!
+//! // schemas can be printed for debugging
+//! println!("{:?}", schema);
+//! ```
+//!
+//! Additionally, a list of definitions (which may depend on each other) can be given and all of
+//! them will be parsed into the corresponding schemas.
+//!
+//! ```
+//! use apache_avro::Schema;
+//!
+//! let raw_schema_1 = r#"{
+//!         "name": "A",
+//!         "type": "record",
+//!         "fields": [
+//!             {"name": "field_one", "type": "float"}
+//!         ]
+//!     }"#;
+//!
+//! // This definition depends on the definition of A above
+//! let raw_schema_2 = r#"{
+//!         "name": "B",
+//!         "type": "record",
+//!         "fields": [
+//!             {"name": "field_one", "type": "A"}
+//!         ]
+//!     }"#;
+//!
+//! // if the schemas are not valid, this function will return an error
+//! let schemas = Schema::parse_list(&[raw_schema_1, raw_schema_2]).unwrap();
+//!
+//! // schemas can be printed for debugging
+//! println!("{:?}", schemas);
+//! ```
+//!
+//! ## Writing data
+//!
+//! Once we have defined a schema, we are ready to serialize data in Avro, validating them against
+//! the provided schema in the process. As mentioned before, there are two ways of handling Avro
+//! data in Rust.
+//!
+//! Given that the schema we defined above is that of an Avro *Record*, we are going to use the
+//! associated type provided by the library to specify the data we want to serialize:
+//!
+//! ```
+//! # use apache_avro::Schema;
+//! use apache_avro::types::Record;
+//! use apache_avro::Writer;
+//! #
+//! # let raw_schema = r#"
+//! #     {
+//! #         "type": "record",
+//! #         "name": "test",
+//! #         "fields": [
+//! #             {"name": "a", "type": "long", "default": 42},
+//! #             {"name": "b", "type": "string"}
+//! #         ]
+//! #     }
+//! # "#;
+//! # let schema = Schema::parse_str(raw_schema).unwrap();
+//! // a writer needs a schema and something to write to
+//! let mut writer = Writer::new(&schema, Vec::new()).unwrap();
+//!
+//! // the Record type models our Record schema
+//! let mut record = Record::new(writer.schema()).unwrap();
+//! record.put("a", 27i64);
+//! record.put("b", "foo");
+//!
+//! // schema validation happens here
+//! writer.append_value(record).unwrap();
+//!
+//! // this is how to get back the resulting Avro bytecode
+//! // this performs a flush operation to make sure data has been written, so it can fail
+//! // you can also call `writer.flush()` yourself without consuming the writer
+//! let encoded = writer.into_inner().unwrap();
+//! ```
+//!
+//! The vast majority of the times, schemas tend to define a record as a top-level container
+//! encapsulating all the values to convert as fields and providing documentation for them, but in
+//! case we want to directly define an Avro value, the library offers that capability via the
+//! `Value` interface.
+//!
+//! ```
+//! use apache_avro::types::Value;
+//!
+//! let mut value = Value::String("foo".to_string());
+//! ```
+//!
+//! ## Reading data
+//!
+//! As far as reading Avro encoded data goes, we can just use the schema encoded with the data to
+//! read them. The library will do it automatically for us, as it already does for the compression
+//! codec:
+//!
+//! ```
+//! use apache_avro::Reader;
+//! # use apache_avro::Schema;
+//! # use apache_avro::types::Record;
+//! # use apache_avro::Writer;
+//! #
+//! # let raw_schema = r#"
+//! #     {
+//! #         "type": "record",
+//! #         "name": "test",
+//! #         "fields": [
+//! #             {"name": "a", "type": "long", "default": 42},
+//! #             {"name": "b", "type": "string"}
+//! #         ]
+//! #     }
+//! # "#;
+//! # let schema = Schema::parse_str(raw_schema).unwrap();
+//! # let mut writer = Writer::new(&schema, Vec::new()).unwrap();
+//! # let mut record = Record::new(writer.schema()).unwrap();
+//! # record.put("a", 27i64);
+//! # record.put("b", "foo");
+//! # writer.append_value(record).unwrap();
+//! # let input = writer.into_inner().unwrap();
+//! // reader creation can fail in case the input to read from is not Avro-compatible or malformed
+//! let reader = Reader::new(&input[..]).unwrap();
+//!
+//! // value is a Result of an Avro Value in case the read operation fails
+//! for value in reader {
+//!     println!("{:?}", value.unwrap());
+//! }
+//! ```
+//!
+//! In case, instead, we want to specify a different (but compatible) reader schema from the schema
+//! the data has been written with, we can just do as the following:
+//! ```
+//! use apache_avro::Schema;
+//! use apache_avro::Reader;
+//! # use apache_avro::types::Record;
+//! # use apache_avro::Writer;
+//! #
+//! # let writer_raw_schema = r#"
+//! #     {
+//! #         "type": "record",
+//! #         "name": "test",
+//! #         "fields": [
+//! #             {"name": "a", "type": "long", "default": 42},
+//! #             {"name": "b", "type": "string"}
+//! #         ]
+//! #     }
+//! # "#;
+//! # let writer_schema = Schema::parse_str(writer_raw_schema).unwrap();
+//! # let mut writer = Writer::new(&writer_schema, Vec::new()).unwrap();
+//! # let mut record = Record::new(writer.schema()).unwrap();
+//! # record.put("a", 27i64);
+//! # record.put("b", "foo");
+//! # writer.append_value(record).unwrap();
+//! # let input = writer.into_inner().unwrap();
+//!
+//! let reader_raw_schema = r#"
+//!     {
+//!         "type": "record",
+//!         "name": "test",
+//!         "fields": [
+//!             {"name": "a", "type": "long", "default": 42},
+//!             {"name": "b", "type": "string"},
+//!             {"name": "c", "type": "long", "default": 43}
+//!         ]
+//!     }
+//! "#;
+//!
+//! let reader_schema = Schema::parse_str(reader_raw_schema).unwrap();
+//!
+//! // reader creation can fail in case the input to read from is not Avro-compatible or malformed
+//! let reader = Reader::with_schema(&reader_schema, &input[..]).unwrap();
+//!
+//! // value is a Result of an Avro Value in case the read operation fails

Review Comment:
```suggestion
//! // value is a Result of an Avro Value in case the read operation fails
```
##########
avro/src/documentation/dynamic.rs:
##########

@@ -0,0 +1,279 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! # Using Avro in Rust, the dynamic way.
+//!
+//! ## Creating a schema
+//!
+//! An Avro data cannot exist without an Avro schema. Schemas **must** be used while writing and
+//! **can** be used while reading and they carry the information regarding the type of data we are
+//! handling. Avro schemas are used for both schema validation and resolution of Avro data.
+//!
+//! Avro schemas are defined in **JSON** format and can just be parsed out of a raw string:
+//!
+//! ```
+//! use apache_avro::Schema;
+//!
+//! let raw_schema = r#"
+//!     {
+//!         "type": "record",
+//!         "name": "test",
+//!         "fields": [
+//!             {"name": "a", "type": "long", "default": 42},
+//!             {"name": "b", "type": "string"}
+//!         ]
+//!     }
+//! "#;
+//!
+//! // if the schema is not valid, this function will return an error
+//! let schema = Schema::parse_str(raw_schema).unwrap();
+//!
+//! // schemas can be printed for debugging
+//! println!("{:?}", schema);
+//! ```
+//!
+//! Additionally, a list of definitions (which may depend on each other) can be given and all of
+//! them will be parsed into the corresponding schemas.
+//!
+//! ```
+//! use apache_avro::Schema;
+//!
+//! let raw_schema_1 = r#"{
+//!         "name": "A",
+//!         "type": "record",
+//!         "fields": [
+//!             {"name": "field_one", "type": "float"}
+//!         ]
+//!     }"#;
+//!
+//! // This definition depends on the definition of A above
+//! let raw_schema_2 = r#"{
+//!         "name": "B",
+//!         "type": "record",
+//!         "fields": [
+//!             {"name": "field_one", "type": "A"}
+//!         ]
+//!     }"#;
+//!
+//! // if the schemas are not valid, this function will return an error
+//! let schemas = Schema::parse_list(&[raw_schema_1, raw_schema_2]).unwrap();
+//!
+//! // schemas can be printed for debugging
+//! println!("{:?}", schemas);
+//! ```
+//!
+//! ## Writing data
+//!
+//! Once we have defined a schema, we are ready to serialize data in Avro, validating them against
+//! the provided schema in the process. As mentioned before, there are two ways of handling Avro
+//! data in Rust.
+//!
+//! Given that the schema we defined above is that of an Avro *Record*, we are going to use the
+//! associated type provided by the library to specify the data we want to serialize:
+//!
+//! ```
+//! # use apache_avro::Schema;
+//! use apache_avro::types::Record;
+//! use apache_avro::Writer;
+//! #
+//! # let raw_schema = r#"
+//! #     {
+//! #         "type": "record",
+//! #         "name": "test",
+//! #         "fields": [
+//! #             {"name": "a", "type": "long", "default": 42},
+//! #             {"name": "b", "type": "string"}
+//! #         ]
+//! #     }
+//! # "#;
+//! # let schema = Schema::parse_str(raw_schema).unwrap();
+//! // a writer needs a schema and something to write to
+//! let mut writer = Writer::new(&schema, Vec::new()).unwrap();
+//!
+//! // the Record type models our Record schema
+//! let mut record = Record::new(writer.schema()).unwrap();
+//! record.put("a", 27i64);
+//! record.put("b", "foo");
+//!
+//! // schema validation happens here
+//! writer.append_value(record).unwrap();
+//!
+//! // this is how to get back the resulting Avro bytecode
+//! // this performs a flush operation to make sure data has been written, so it can fail
+//! // you can also call `writer.flush()` yourself without consuming the writer
+//! let encoded = writer.into_inner().unwrap();
+//! ```
+//!
+//! The vast majority of the times, schemas tend to define a record as a top-level container
+//! encapsulating all the values to convert as fields and providing documentation for them, but in
+//! case we want to directly define an Avro value, the library offers that capability via the
+//! `Value` interface.
+//!
+//! ```
+//! use apache_avro::types::Value;
+//!
+//! let mut value = Value::String("foo".to_string());
+//! ```
+//!
+//! ## Reading data
+//!
+//! As far as reading Avro encoded data goes, we can just use the schema encoded with the data to
+//! read them. The library will do it automatically for us, as it already does for the compression
+//! codec:
+//!
+//! ```
+//! use apache_avro::Reader;
+//! # use apache_avro::Schema;
+//! # use apache_avro::types::Record;
+//! # use apache_avro::Writer;
+//! #
+//! # let raw_schema = r#"
+//! #     {
+//! #         "type": "record",
+//! #         "name": "test",
+//! #         "fields": [
+//! #             {"name": "a", "type": "long", "default": 42},
+//! #             {"name": "b", "type": "string"}
+//! #         ]
+//! #     }
+//! # "#;
+//! # let schema = Schema::parse_str(raw_schema).unwrap();
+//! # let mut writer = Writer::new(&schema, Vec::new()).unwrap();
+//! # let mut record = Record::new(writer.schema()).unwrap();
+//! # record.put("a", 27i64);
+//! # record.put("b", "foo");
+//! # writer.append_value(record).unwrap();
+//! # let input = writer.into_inner().unwrap();
+//! // reader creation can fail in case the input to read from is not Avro-compatible or malformed
+//! let reader = Reader::new(&input[..]).unwrap();
+//!
+//! // value is a Result of an Avro Value in case the read operation fails

Review Comment:
```suggestion
//! // value is a Result of an Avro Value in case the read operation fails
```
##########
avro/src/schema/mod.rs:
##########

@@ -903,8 +938,9 @@ impl Serialize for Schema {
     }
 }
 
-/// Parses a **valid** avro schema into the Parsing Canonical Form.
-/// https://avro.apache.org/docs/current/specification/#parsing-canonical-form-for-schemas
+/// Parses a valid Avro schema into [the Parsing Canonical Form].
+///
+/// [the Parsing Canonical From](https://avro.apache.org/docs/current/specification/#parsing-canonical-form-for-schemas)

Review Comment:
```suggestion
/// [the Parsing Canonical Form](https://avro.apache.org/docs/++version++/specification/#parsing-canonical-form-for-schemas)
```
`++version++` looks worse in the address bar, but here it helps to make the fragment part work. We could also use `1.12.0` instead and update it once in a while.
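For readers following along: the functionality documented here is reachable through `Schema::canonical_form()` in current `apache-avro` releases — a minimal sketch, assuming that method keeps its present shape:

```rust
use apache_avro::Schema;

fn main() {
    let schema = Schema::parse_str(
        r#"{"type": "record", "name": "example", "fields": [{"name": "a", "type": "long", "default": 42}]}"#,
    )
    .unwrap();
    // Attribute order and non-essential attributes (doc, aliases, default, ...)
    // are normalized away, so logically identical schemas compare equal.
    println!("{}", schema.canonical_form());
}
```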
##########
avro/src/schema/parser.rs:
##########

@@ -67,8 +65,9 @@ impl Parser {
         self.parse(&value, &None)
     }
 
-    /// Create an array of `Schema`'s from an iterator of JSON Avro schemas. It is allowed that
-    /// the schemas have cross-dependencies; these will be resolved during parsing.
+    /// Create an array of `Schema`'s from an iterator of JSON Avro schemas.

Review Comment:
```suggestion
    /// Create an array of `Schema`s from an iterator of JSON Avro schemas.
```

##########
avro/src/validator.rs:
##########

@@ -15,6 +15,39 @@
 // specific language governing permissions and limitations
 // under the License.
 
+//! # Custom name validation
+//!
+//! By default, the library follows the rules specified in the [Avro specification](https://avro.apache.org/docs/1.11.1/specification/#names).
+//!
+//! Some of the other Apache Avro language SDKs are more flexible in their name validation. For
+//! interoperability with those SDKs, the library provides a way to customize the name validation.
+//!
+//! ```
+//! use apache_avro::AvroResult;
+//! use apache_avro::schema::Namespace;
+//! use apache_avro::validator::{SchemaNameValidator, set_schema_name_validator};
+//!
+//! struct MyCustomValidator;
+//!
+//! impl SchemaNameValidator for MyCustomValidator {
+//!     fn validate(&self, name: &str) -> AvroResult<(String, Namespace)> {
+//!         todo!()
+//!     }
+//! }
+//!
+//! // don't parse any schema before registering the custom validator(s)!
+//!
+//! set_schema_name_validator(Box::new(MyCustomValidator));

Review Comment:
```suggestion
//! set_schema_name_validator(Box::new(MyCustomValidator)).ok();
```

##########
avro/src/serde/derive.rs:
##########

@@ -22,64 +22,288 @@ use crate::schema::{
 use std::borrow::Cow;
 use std::collections::HashMap;
 
-/// Trait for types that serve as an Avro data model. Derive implementation available
-/// through `derive` feature. Do not implement directly!
-/// Implement [`AvroSchemaComponent`] to get this trait
+/// Trait for types that serve as an Avro data model.
+///
+/// Do not implement directly! Either derive it or implement [`AvroSchemaComponent`] to get this trait
 /// through a blanket implementation.
+///
+/// ## Deriving `AvroSchema`
+///
+/// Using the custom derive requires that you enable the `"derive"` cargo
+/// feature in your `Cargo.toml`:
+///
+/// ```toml
+/// [dependencies]
+/// apache-avro = { version = "..", features = ["derive"] }
+/// ```
+///
+/// Then, you add the `#[derive(AvroSchema)]` annotation to your `struct` and
+/// `enum` type definition:
+///
+/// ```
+/// # use serde::{Serialize, Deserialize};
+/// # use apache_avro::AvroSchema;
+/// #[derive(AvroSchema, Serialize, Deserialize)]
+/// pub struct Foo {
+///     bar: Vec<Bar>,
+/// }
+///
+/// #[derive(AvroSchema, Serialize, Deserialize)]
+/// pub enum Bar {
+///     Spam,
+///     Maps
+/// }
+/// ```
+///
+/// This will implement [`AvroSchemaComponent`] for the type, and `AvroSchema`
+/// through the blanket implementation for `T: AvroSchemaComponent`.
+///
+/// When deriving `struct`s, every member must also implement `AvroSchemaComponent`.
+///
+/// ## Changing the generated schema
+///
+/// The derive macro will read both the `avro` and `serde` attributes to modify the generated schema.
+/// It will also check for compatibility between the various attributes.
+///
+/// #### Container attributes
+///
+/// - `#[serde(rename = "name")]`
+///
+// TODO: Should we check if `name` contains any dots? As that would imply a namespace
+/// Set the `name` of the schema to the given string. Defaults to the name of the type.
+///
+/// - `#[avro(namespace = "some.name.space")]`
+///
+/// Set the `namespace` of the schema. This will be the relative namespace if the schema is included
+/// in another schema.
+///
+/// - `#[avro(doc = "Some documentation")]`
+///
+/// Set the `doc` attribute of the schema. Defaults to the documentation of the type.
+///
+/// - `#[avro(alias = "name")]`
+///
+/// Set the `alias` attribute of the schema. Can be specified multiple times.
+///
+/// - `#[serde(rename_all = "camelCase")]`
+///
+/// Rename all the fields or variants in the schema to follow the given case convention. The possible values
+/// are `"lowercase"`, `"UPPERCASE"`, `"PascalCase"`, `"camelCase"`, `"snake_case"`, `"kebab-case"`,
+/// `"SCREAMING_SNAKE_CASE"`, `"SCREAMING-KEBAB-CASE"`.
+///
+/// - `#[serde(transparent)]`
+///
+/// Use the schema of the inner field directly. Is only allowed on structs with only unskipped field.
+///
+///
+/// #### Variant attributes
+///
+/// - `#[serde(rename = "name")]`
+///
+/// Rename the variant to the given name.
+///
+///
+/// #### Field attributes
+///
+/// - `#[serde(rename = "name")]`
+///
+/// Rename the field name to the given name.
+///
+/// - `#[avro(doc = "Some documentation")]`
+///
+/// Set the `doc` attribute of the field. Defaults to the documentation of the field.
+///
+/// - `#[avro(default = "null")]`
+///
+/// Set the `default` attribute of the field.
+///
+/// _Note:_ This is a JSON value not a Rust value, as this is put in the schema itself.
+///
+/// - `#[serde(alias = "name")]`
+///
+/// Set the `alias` attribute of the field. Can be specified multiple times.
+///
+/// - `#[serde(flatten)]`
+///
+/// Flatten the content of this field into the container it is defined in.
+///
+/// - `#[serde(skip)]`
+///
+/// Do not include this field in the schema.
+///
+/// - `#[serde(skip_serializing)]`
+///
+/// When combined with `#[serde(skip_deserializing)]`, don't include this field in the schema.
+/// Otherwise, it will be included in the schema and the `#[avro(default)]` attribute **must** be
+/// set. That value will be used for serializing.
+///
+/// - `#[serde(skip_serializing_if)]`
+///
+/// Conditionally use the value of the field or the value provided by `#[avro(default)]`. The
+/// `#[avro(default)]` attribute **must** be set.
+///
+/// - `#[avro(with)]` and `#[serde(with = "module")]`
+///
+/// Override the schema used for this field. See [Working with foreign types](#working-with-foreign-types).
+///
+/// #### Incompatible Serde attributes
+///
+/// The derive macro is compatible with most Serde attributes, but it is incompatible with
+/// the following attributes:
+///
+/// - Container attributes
+///   - `tag`
+///   - `content`
+///   - `untagged`
+///   - `variant_identifier`
+///   - `field_identifier`
+///   - `remote`
+///   - `rename_all(serialize = "..", deserialize = "..")` where `serialize` != `deserialize`
+/// - Variant attributes
+///   - `other`
+///   - `untagged`
+/// - Field attributes
+///   - `getter`
+///
+/// ## Working with foreign types
+///
+/// Most foreign types won't have a [`AvroSchema`] implementation. This crate implements it only
+/// for built-in types and [`uuid::Uuid`].
+///
+/// To still be able to derive schemas for fields of foreign types, the `#[avro(with)`]
+/// attribute can be used to get the schema for those fields. It can be used in two ways:
+///
+/// 1. In combination with `#[serde(with = "path::to::module)]`
+///
+/// To get the schema, it will call the functions `fn get_schema_in_ctxt(&mut Names, &Namespace) -> Schema`
+/// and `fn get_record_fields_in_ctxt(&mut Names, &Namespace) -> Schema` in the module provided

Review Comment:
```suggestion
/// and `fn get_record_fields_in_ctxt(first_field_position: usize) -> Option<Vec<RecordField>>` in the module provided
```
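To make the attribute handling documented in this hunk concrete, a small sketch combining the container attributes from the list above (the attribute spellings follow the quoted documentation; the resulting schema shape is my assumption):

```rust
use apache_avro::AvroSchema;
use serde::{Deserialize, Serialize};

/// This doc comment becomes the record's `doc` attribute.
#[derive(AvroSchema, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
#[avro(namespace = "com.example")]
struct UserEvent {
    // appears in the schema as "userId" because of `rename_all`
    user_id: i64,
    // appears as "eventName"
    event_name: String,
}

fn main() {
    // generating the schema is expensive, so cache it where possible
    let schema = UserEvent::get_schema();
    println!("{schema:?}");
}
```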
##########
avro/src/error.rs:
##########

@@ -109,7 +109,9 @@ pub enum Details {
         reason: String,
     },
 
-    #[error("Unable to allocate {desired} bytes (maximum allowed: {maximum})")]
+    #[error(
+        "Unable to allocate {desired} bytes (maximum allowed: {maximum}). Change the limit using `utils::max_allocation_bytes`"

Review Comment:
```suggestion
        "Unable to allocate {desired} bytes (maximum allowed: {maximum}). Change the limit using `apache_avro::util::max_allocation_bytes`"
```

##########
avro_derive/src/lib.rs:
##########

@@ -17,6 +17,14 @@
 #![cfg_attr(nightly, feature(proc_macro_diagnostic))]
 
+//! This crate provides the `AvroSchema` derive macro.
+//! ```no_run
+//! #[derive(AvroSchema)]

Review Comment:
```suggestion
//! #[derive(apache_avro::AvroSchema)]
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
