[ https://issues.apache.org/jira/browse/AVRO-1438?focusedWorklogId=744796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-744796 ]
ASF GitHub Bot logged work on AVRO-1438:
----------------------------------------

Author: ASF GitHub Bot
Created on: 20/Mar/22 23:13
Start Date: 20/Mar/22 23:13
Worklog Time Spent: 10m
Work Description:

zcsizmadia commented on pull request #1604:
URL: https://github.com/apache/avro/pull/1604#issuecomment-1073369284

@KyleSchoonover Could you post your before and after measurements?

Here is the diff I applied to `master`:

```
diff --git a/lang/csharp/src/apache/main/Generic/GenericReader.cs b/lang/csharp/src/apache/main/Generic/GenericReader.cs
index f42e572d..45df3b29 100644
--- a/lang/csharp/src/apache/main/Generic/GenericReader.cs
+++ b/lang/csharp/src/apache/main/Generic/GenericReader.cs
@@ -121,6 +121,9 @@ namespace Avro.Generic
         {
             this.ReaderSchema = readerSchema;
             this.WriterSchema = writerSchema;
+
+            if (!ReaderSchema.CanRead(WriterSchema))
+                throw new AvroException("Schema mismatch. Reader: " + ReaderSchema + ", writer: " + WriterSchema);
         }
 
         /// <summary>
@@ -134,9 +137,6 @@ namespace Avro.Generic
         /// <returns>Object read from the decoder.</returns>
         public T Read<T>(T reuse, Decoder decoder)
         {
-            if (!ReaderSchema.CanRead(WriterSchema))
-                throw new AvroException("Schema mismatch. Reader: " + ReaderSchema + ", writer: " + WriterSchema);
-
             return (T)Read(reuse, WriterSchema, ReaderSchema, decoder);
         }
```

Before:

```
$ dotnet run -c Release -f net6.0
type     impl                  action       total_items  batches  batch_size  time(ms)
simple   default_specific      serialize    1000000      1000     1000        500
simple   default_specific      deserialize  1000000      1000     1000        1188
simple   preresolved_specific  serialize    1000000      1000     1000        312
simple   preresolved_specific  deserialize  1000000      1000     1000        328
simple   default_generic       serialize    1000000      1000     1000        391
simple   default_generic       deserialize  1000000      1000     1000        1109
simple   preresolved_generic   serialize    1000000      1000     1000        250
simple   preresolved_generic   deserialize  1000000      1000     1000        469
complex  default_specific      serialize    1000000      1000     1000        3015
complex  default_specific      deserialize  1000000      1000     1000        7391
complex  preresolved_specific  serialize    1000000      1000     1000        2438
complex  preresolved_specific  deserialize  1000000      1000     1000        3406
complex  default_generic       serialize    1000000      1000     1000        2937
complex  default_generic       deserialize  1000000      1000     1000        5735
complex  preresolved_generic   serialize    1000000      1000     1000        2031
complex  preresolved_generic   deserialize  1000000      1000     1000        2641
narrow   default_specific      serialize    1000000      1000     1000        203
narrow   default_specific      deserialize  1000000      1000     1000        547
narrow   preresolved_specific  serialize    1000000      1000     1000        156
narrow   preresolved_specific  deserialize  1000000      1000     1000        157
narrow   default_generic       serialize    1000000      1000     1000        203
narrow   default_generic       deserialize  1000000      1000     1000        562
narrow   preresolved_generic   serialize    1000000      1000     1000        141
narrow   preresolved_generic   deserialize  1000000      1000     1000        219
wide     default_specific      serialize    1000000      1000     1000        2610
wide     default_specific      deserialize  1000000      1000     1000        6593
wide     preresolved_specific  serialize    1000000      1000     1000        2297
wide     preresolved_specific  deserialize  1000000      1000     1000        2141
wide     default_generic       serialize    1000000      1000     1000        2109
wide     default_generic       deserialize  1000000      1000     1000        6235
wide     preresolved_generic   serialize    1000000      1000     1000        1343
wide     preresolved_generic   deserialize  1000000      1000     1000        2766
```

After:

```
$ dotnet run -c Release -f net6.0
type     impl                  action       total_items  batches  batch_size  time(ms)
simple   default_specific      serialize    1000000      1000     1000        531
simple   default_specific      deserialize  1000000      1000     1000        891
simple   preresolved_specific  serialize    1000000      1000     1000        328
simple   preresolved_specific  deserialize  1000000      1000     1000        344
simple   default_generic       serialize    1000000      1000     1000        453
simple   default_generic       deserialize  1000000      1000     1000        844
simple   preresolved_generic   serialize    1000000      1000     1000        281
simple   preresolved_generic   deserialize  1000000      1000     1000        547
complex  default_specific      serialize    1000000      1000     1000        3579
complex  default_specific      deserialize  1000000      1000     1000        7218
complex  preresolved_specific  serialize    1000000      1000     1000        2875
complex  preresolved_specific  deserialize  1000000      1000     1000        3969
complex  default_generic       serialize    1000000      1000     1000        3266
complex  default_generic       deserialize  1000000      1000     1000        4734
complex  preresolved_generic   serialize    1000000      1000     1000        2109
complex  preresolved_generic   deserialize  1000000      1000     1000        2797
narrow   default_specific      serialize    1000000      1000     1000        219
narrow   default_specific      deserialize  1000000      1000     1000        406
narrow   preresolved_specific  serialize    1000000      1000     1000        141
narrow   preresolved_specific  deserialize  1000000      1000     1000        187
narrow   default_generic       serialize    1000000      1000     1000        204
narrow   default_generic       deserialize  1000000      1000     1000        375
narrow   preresolved_generic   serialize    1000000      1000     1000        140
narrow   preresolved_generic   deserialize  1000000      1000     1000        250
wide     default_specific      serialize    1000000      1000     1000        2688
wide     default_specific      deserialize  1000000      1000     1000        4781
wide     preresolved_specific  serialize    1000000      1000     1000        1969
wide     preresolved_specific  deserialize  1000000      1000     1000        2078
wide     default_generic       serialize    1000000      1000     1000        2094
wide     default_generic       deserialize  1000000      1000     1000        4640
wide     preresolved_generic   serialize    1000000      1000     1000        1422
wide     preresolved_generic   deserialize  1000000      1000     1000        2953
```

You are definitely onto something with the CanRead function. E.g. `wide default_generic deserialize 1000000 1000 1000` improved from 6235 ms to 4640 ms, which is about 34% faster (roughly 26% less time) and is massive.

Btw, `PreresolvingDatumReader` does the `if (!ReaderSchema.CanRead(WriterSchema))` check in the constructor and not in the Read function, which seems to be the correct path. However, there are other places where CanRead is called, so caching might make sense for the other types as well.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 744796)
Time Spent: 1h 20m (was: 1h 10m)

> C# reader performance improvement
> ---------------------------------
>
> Key: AVRO-1438
> URL: https://issues.apache.org/jira/browse/AVRO-1438
> Project: Apache Avro
> Issue Type: Improvement
> Components: csharp
> Affects Versions: 1.7.5
> Reporter: David Taylor
> Priority: Minor
> Labels: pull-request-available
> Attachments: RecordSchema.cs.diff
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> GenericReader/SpecificReader spend a lot of time comparing the reader/writer
> schema. Remembering the last good match speeds things up about 15% in my
> tests using the avro.perf project for timings. This does not impact the
> DatumReader implementation.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)