[ https://issues.apache.org/jira/browse/AVRO-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507941#comment-17507941 ]
Jack Klamer commented on AVRO-3451:
-----------------------------------

oh just saw your edit. Let me know how it goes.

> fix poor Avro write performance
> -------------------------------
>
>                 Key: AVRO-3451
>                 URL: https://issues.apache.org/jira/browse/AVRO-3451
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: rust
>    Affects Versions: 1.11.0
>         Environment: Mac OS X Big Sur
> {code:java}
> installed toolchains
> --------------------
> stable-x86_64-apple-darwin (default)
> nightly-x86_64-apple-darwin
> active toolchain
> ----------------
> stable-x86_64-apple-darwin (default)
> rustc 1.56.1 (59eed8a2a 2021-11-01) {code}
>            Reporter: Kevin
>            Priority: Major
>         Attachments: Screen Shot 2022-03-14 at 7.30.24 PM.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Rust implementation of Apache Avro library – apache-avro (née avro-rs) –
> demonstrates poor write performance when serializing Rust structures to Avro.
> Profiling indicates that this implementation spends an inordinate amount of
> time in the function {{encode::encode_ref}} performing {{clone()}} and
> {{drop}} operations related to a HashMap<String, Schema> type.
> We modified the function {{encode_ref0}} as follows:
> {code:java}
> -pub fn encode_ref(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
> -    fn encode_ref0(
> +pub fn encode_ref<'a>(value: &Value, schema: &'a Schema, buffer: &mut Vec<u8>) {
> +    fn encode_ref0<'a>(
>          value: &Value,
> -        schema: &Schema,
> +        schema: &'a Schema,
>          buffer: &mut Vec<u8>,
> -        schemas_by_name: &mut HashMap<String, Schema>,
> +        schemas_by_name: &mut HashMap<&'a str, &'a Schema>,
>      ) {
>          match &schema {
>              Schema::Ref { ref name } => {
> -                let resolved = schemas_by_name.get(name.name.as_str()).unwrap();
> +                let resolved = schemas_by_name.get(&name.name as &str).unwrap();
>                  return encode_ref0(value, resolved, buffer, &mut schemas_by_name.clone());
>              }
>              Schema::Record { ref name, .. }
>              | Schema::Enum { ref name, .. }
>              | Schema::Fixed { ref name, .. } => {
> -                schemas_by_name.insert(name.name.clone(), schema.clone());
> +                schemas_by_name.insert(&name.name, &schema);
>              }
>              _ => (),
>          }{code}
> to remove any need for Clone in the {{schemas_by_name}} cache and see a
> notable improvement (factor of 4 to 5) in our application with this change.
> After this change, all Cargo Tests still pass and Benchmarks display a very
> significant improvement in Write performance across the board. Attached below
> is one example benchmark for {{big schema, write 10k records}} with Before on
> the Left and After on the Right.
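
For readers who want to try the borrowing idea in isolation, here is a minimal, self-contained sketch. It is not the apache-avro code: the {{Schema}} enum below is a hypothetical, much-simplified stand-in, and {{collect_names}} and {{resolve}} are illustrative helper names that do not exist in the crate. It only shows how a name cache keyed by {{&'a str}} and holding {{&'a Schema}} can be filled and queried without cloning any schema.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the crate's Schema type
// (assumption: the real enum has many more variants and a richer Name type).
#[derive(Debug)]
enum Schema {
    Int,
    Record { name: String, fields: Vec<Schema> },
    Ref { name: String },
}

// Walk the schema tree and register every named schema by reference.
// Keys and values both borrow from the tree, so nothing is cloned.
fn collect_names<'a>(schema: &'a Schema, by_name: &mut HashMap<&'a str, &'a Schema>) {
    if let Schema::Record { name, fields } = schema {
        by_name.insert(name.as_str(), schema);
        for field in fields {
            collect_names(field, by_name);
        }
    }
}

// Resolve a Ref through the cache; any other schema resolves to itself.
// Returns None when the referenced name has not been registered.
fn resolve<'a>(
    schema: &'a Schema,
    by_name: &HashMap<&'a str, &'a Schema>,
) -> Option<&'a Schema> {
    match schema {
        Schema::Ref { name } => by_name.get(name.as_str()).copied(),
        other => Some(other),
    }
}
```

Because the map stores only references, copying it (as the patched code still does with {{schemas_by_name.clone()}}) copies pointers rather than whole schema trees, which is consistent with the reduction in {{clone()}} and {{drop}} time reported above. The trade-off is the extra lifetime parameter: the cache cannot outlive the schema it borrows from.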