[
https://issues.apache.org/jira/browse/AVRO-3479?focusedWorklogId=754897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754897
]
ASF GitHub Bot logged work on AVRO-3479:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Apr/22 05:30
Start Date: 09/Apr/22 05:30
Worklog Time Spent: 10m
Work Description: jklamer commented on code in PR #1631:
URL: https://github.com/apache/avro/pull/1631#discussion_r846569639
##########
lang/rust/avro_derive/src/lib.rs:
##########
@@ -0,0 +1,366 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use proc_macro2::{Span, TokenStream, TokenTree};
+use quote::quote;
+
+use syn::{parse_macro_input, Attribute, DeriveInput, Error, Lit, Path, Type, TypePath};
+
+#[proc_macro_derive(AvroSchema, attributes(namespace))]
+// Templated from Serde
+pub fn proc_macro_derive_avro_schema(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
+ let mut input = parse_macro_input!(input as DeriveInput);
+ derive_avro_schema(&mut input)
+ .unwrap_or_else(to_compile_errors)
+ .into()
+}
+
+fn derive_avro_schema(input: &mut DeriveInput) -> Result<TokenStream, Vec<syn::Error>> {
+ let namespace = get_namespace_from_attributes(&input.attrs)?;
+ let full_schema_name = vec![namespace, Some(input.ident.to_string())]
+ .into_iter()
+ .flatten()
+ .collect::<Vec<String>>()
+ .join(".");
+ let schema_def = match &input.data {
+ syn::Data::Struct(s) => {
+ get_data_struct_schema_def(&full_schema_name, s, input.ident.span())?
+ }
+ syn::Data::Enum(e) => get_data_enum_schema_def(&full_schema_name, e, input.ident.span())?,
+ _ => {
+ return Err(vec![Error::new(
+ input.ident.span(),
+ "AvroSchema derive only works for structs and simple enums ",
+ )])
+ }
+ };
+
+ let ty = &input.ident;
+ let (impl_generics, ty_generics, where_clause) = input.generics.split_for_impl();
+ Ok(quote! {
+ impl #impl_generics apache_avro::schema::AvroSchemaWithResolved for #ty #ty_generics #where_clause {
+ fn get_schema_with_resolved(resolved_schemas: &mut HashMap<apache_avro::schema::Name, apache_avro::schema::Schema>) -> apache_avro::schema::Schema {
+ let name = apache_avro::schema::Name::new(#full_schema_name).expect(&format!("Unable to parse schema name {}", #full_schema_name)[..]);
+ if resolved_schemas.contains_key(&name) {
+ resolved_schemas.get(&name).unwrap().clone()
+ }else {
+ resolved_schemas.insert(name.clone(), Schema::Ref{name: name.clone()});
+ #schema_def
+ }
+ }
+ }
+ })
+}
+
+fn get_namespace_from_attributes(attrs: &[Attribute]) -> Result<Option<String>, Vec<Error>> {
+ let namespace_attr_path_constant: Path = syn::parse2::<Path>(quote! {namespace}).unwrap();
+ const NAMESPACE_PARSING_ERROR_CONSTANST: &str =
+ "Namespace attribute must be in form #[namespace = \"com.testing.namespace\"]";
+ // parse out namespace if present. Requires strict syntax
+ for attr in attrs {
+ if namespace_attr_path_constant == attr.path {
+ let mut input_tokens = attr.tokens.clone().into_iter();
+ if let (
+ Some(TokenTree::Punct(punct)),
+ Some(TokenTree::Literal(namespace_literal)),
+ None,
+ ) = (
+ input_tokens.next(),
+ input_tokens.next(),
+ input_tokens.next(),
+ ) {
+ if punct.as_char() == '=' {
+ if let Lit::Str(lit_str) = Lit::new(namespace_literal) {
+ return Ok(Some(lit_str.value()));
+ }
+ }
+ }
+ return Err(vec![Error::new_spanned(
+ &attr.tokens,
+ NAMESPACE_PARSING_ERROR_CONSTANST,
+ )]);
+ }
+ }
+ Ok(None)
+}
+
+fn get_data_struct_schema_def(
+ full_schema_name: &str,
+ s: &syn::DataStruct,
+ error_span: Span,
+) -> Result<TokenStream, Vec<Error>> {
+ let mut record_field_exprs = vec![];
+ match s.fields {
+ syn::Fields::Named(ref a) => {
+ for (position, field) in a.named.iter().enumerate() {
+ let name = field.ident.as_ref().unwrap().to_string(); // we know everything has a name
+ let schema_expr = type_to_schema_expr(&field.ty)?;
+ let position = position;
+ record_field_exprs.push(quote! {
+ apache_avro::schema::RecordField {
+ name: #name.to_string(),
+ doc: Option::None,
+ default: Option::None,
+ schema: #schema_expr,
+ order: apache_avro::schema::RecordFieldOrder::Ignore,
+ position: #position,
+ }
+ });
+ }
+ }
+ syn::Fields::Unnamed(_) => {
+ return Err(vec![Error::new(
+ error_span,
+ "AvroSchema derive does not work for tuple structs",
+ )])
+ }
+ syn::Fields::Unit => {
+ return Err(vec![Error::new(
+ error_span,
+ "AvroSchema derive does not work for unit structs",
+ )])
+ }
+ }
+ Ok(quote! {
+ let schema_fields = vec![#(#record_field_exprs),*];
+ let name = apache_avro::schema::Name::new(#full_schema_name).expect(&format!("Unable to struct name for schema {}", #full_schema_name)[..]);
+ apache_avro::schema::record_schema_for_fields(name, None, None, schema_fields)
+ })
+}
+
+fn get_data_enum_schema_def(
+ full_schema_name: &str,
+ e: &syn::DataEnum,
+ error_span: Span,
+) -> Result<TokenStream, Vec<Error>> {
+ if e.variants.iter().all(|v| syn::Fields::Unit == v.fields) {
+ let symbols: Vec<String> = e
+ .variants
+ .iter()
+ .map(|varient| varient.ident.to_string())
+ .collect();
+ Ok(quote! {
+ apache_avro::schema::Schema::Enum {
+ name: apache_avro::schema::Name::new(#full_schema_name).expect(&format!("Unable to parse enum name for schema {}", #full_schema_name)[..]),
+ aliases: None,
+ doc: None,
+ symbols: vec![#(#symbols.to_owned()),*]
+ }
+ })
+ } else {
+ Err(vec![Error::new(
+ error_span,
+ "AvroSchema derive does not work for enums with non unit structs",
+ )])
+ }
+}
+
+/// Takes in the Tokens of a type and returns the tokens of an expression with return type `Schema`
+fn type_to_schema_expr(ty: &Type) -> Result<TokenStream, Vec<Error>> {
+ if let Type::Path(p) = ty {
+ let type_string = p.path.segments.last().unwrap().ident.to_string();
+
+ let schema = match &type_string[..] {
+ "bool" => quote! {Schema::Boolean},
+ "i8" | "i16" | "i32" | "u8" | "u16" => quote! {apache_avro::schema::Schema::Int},
+ "i64" => quote! {apache_avro::schema::Schema::Long},
Review Comment:
Our current serde implementation for `u32` is:
```
fn serialize_u32(self, v: u32) -> Result<Self::Ok, Self::Error> {
    if v <= i32::MAX as u32 {
        self.serialize_i32(v as i32)
    } else {
        self.serialize_i64(i64::from(v))
    }
}
```
The schema would therefore depend on the value being serialized, so I couldn't
generate a single schema that is guaranteed to work. I could always emit the
union `["int", "long"]`, but that felt like unexpected behavior. Making the
user fall back to a manual schema definition felt better. There are several
ways we could handle this; what do you think?
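To make the value dependence concrete, here is a minimal standalone sketch
that mirrors the branch in `serialize_u32`. The `AvroPrimitive` enum and the
`avro_primitive_for_u32` function are hypothetical names used only for
illustration; they are not part of apache_avro.

```rust
/// Hypothetical stand-in for the Avro primitive a value ends up encoded as.
#[derive(Debug, PartialEq)]
enum AvroPrimitive {
    Int,
    Long,
}

/// Mirrors the branch in `serialize_u32`: values that fit in an `i32` take
/// the int path, larger values take the long path.
fn avro_primitive_for_u32(v: u32) -> AvroPrimitive {
    if v <= i32::MAX as u32 {
        AvroPrimitive::Int
    } else {
        AvroPrimitive::Long
    }
}
```

Two values of the same Rust type therefore land on different Avro types, for
example `avro_primitive_for_u32(42)` versus `avro_primitive_for_u32(u32::MAX)`,
which is why no single non-union schema can be derived for a `u32` field.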
Issue Time Tracking
-------------------
Worklog Id: (was: 754897)
Time Spent: 1h 20m (was: 1h 10m)
> [rust] Derive Avro Schema macro
> -------------------------------
>
> Key: AVRO-3479
> URL: https://issues.apache.org/jira/browse/AVRO-3479
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: Jack Klamer
> Assignee: Jack Klamer
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The tracking Issue for the Avro Derive Feature of the rust SDK.
> Proposal (copied from email):
> Have another rust crate that is importable as a feature on the main crate (in
> the same manner as serde derive), that will provide a derive proc_macro that
> implements a simple trait that returns the schema for the implementing type.
> Right now, schemas must be parsed from strings ( or read from files first),
> and closely coordinated with the associated struct. This makes sense for
> workflows that need to associate the same type across languages. For programs
> that are all within Rust, there are usability advantages of the proc_macro.
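As a rough illustration of the proposal, the sketch below hand-writes what
such a derive might generate. Everything in it is an assumption made for the
example: the simplified `Schema` enum, the `AvroSchema` trait with its
`get_schema` method, the `User` struct, and the `com.example` namespace are
hypothetical and are not the apache_avro types or the macro's actual output.

```rust
/// Hypothetical, heavily simplified schema model (not the apache_avro types).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Long,
    String,
    Record {
        name: String,
        fields: Vec<(String, Schema)>,
    },
}

/// Hypothetical trait that a derive macro could implement for a type.
trait AvroSchema {
    fn get_schema() -> Schema;
}

impl AvroSchema for i64 {
    fn get_schema() -> Schema {
        Schema::Long
    }
}

impl AvroSchema for String {
    fn get_schema() -> Schema {
        Schema::String
    }
}

#[allow(dead_code)]
struct User {
    id: i64,
    name: String,
}

// Hand-written equivalent of what a derive on `User` might expand to:
// one record field per struct field, each delegating to the field type.
impl AvroSchema for User {
    fn get_schema() -> Schema {
        Schema::Record {
            name: "com.example.User".to_string(),
            fields: vec![
                ("id".to_string(), <i64 as AvroSchema>::get_schema()),
                ("name".to_string(), <String as AvroSchema>::get_schema()),
            ],
        }
    }
}
```

With a derive in place, the `impl AvroSchema for User` block is what the user
would no longer have to write or keep in sync with the struct by hand.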
--
This message was sent by Atlassian Jira
(v8.20.1#820001)