[ 
https://issues.apache.org/jira/browse/AVRO-1438?focusedWorklogId=744796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-744796
 ]

ASF GitHub Bot logged work on AVRO-1438:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Mar/22 23:13
            Start Date: 20/Mar/22 23:13
    Worklog Time Spent: 10m 
      Work Description: zcsizmadia commented on pull request #1604:
URL: https://github.com/apache/avro/pull/1604#issuecomment-1073369284


   @KyleSchoonover Could you post your before and after measurements?
   
   Here is my diff I applied to `master`:
   
   ```
   diff --git a/lang/csharp/src/apache/main/Generic/GenericReader.cs b/lang/csharp/src/apache/main/Generic/GenericReader.cs
   index f42e572d..45df3b29 100644
   --- a/lang/csharp/src/apache/main/Generic/GenericReader.cs
   +++ b/lang/csharp/src/apache/main/Generic/GenericReader.cs
   @@ -121,6 +121,9 @@ namespace Avro.Generic
            {
                this.ReaderSchema = readerSchema;
                this.WriterSchema = writerSchema;
   +
   +            if (!ReaderSchema.CanRead(WriterSchema))
   +                throw new AvroException("Schema mismatch. Reader: " + ReaderSchema + ", writer: " + WriterSchema);
            }
   
            /// <summary>
   @@ -134,9 +137,6 @@ namespace Avro.Generic
            /// <returns>Object read from the decoder.</returns>
            public T Read<T>(T reuse, Decoder decoder)
            {
   -            if (!ReaderSchema.CanRead(WriterSchema))
   -                throw new AvroException("Schema mismatch. Reader: " + ReaderSchema + ", writer: " + WriterSchema);
   -
                return (T)Read(reuse, WriterSchema, ReaderSchema, decoder);
            }
   ```
   
   Before:
   ```
   $ dotnet run -c Release -f net6.0
   type    impl    action  total_items     batches batch_size      time(ms)
   simple  default_specific        serialize       1000000 1000    1000    500
   simple  default_specific        deserialize     1000000 1000    1000    1188
   simple  preresolved_specific    serialize       1000000 1000    1000    312
   simple  preresolved_specific    deserialize     1000000 1000    1000    328
   simple  default_generic serialize       1000000 1000    1000    391
   simple  default_generic deserialize     1000000 1000    1000    1109
   simple  preresolved_generic     serialize       1000000 1000    1000    250
   simple  preresolved_generic     deserialize     1000000 1000    1000    469
   complex default_specific        serialize       1000000 1000    1000    3015
   complex default_specific        deserialize     1000000 1000    1000    7391
   complex preresolved_specific    serialize       1000000 1000    1000    2438
   complex preresolved_specific    deserialize     1000000 1000    1000    3406
   complex default_generic serialize       1000000 1000    1000    2937
   complex default_generic deserialize     1000000 1000    1000    5735
   complex preresolved_generic     serialize       1000000 1000    1000    2031
   complex preresolved_generic     deserialize     1000000 1000    1000    2641
   narrow  default_specific        serialize       1000000 1000    1000    203
   narrow  default_specific        deserialize     1000000 1000    1000    547
   narrow  preresolved_specific    serialize       1000000 1000    1000    156
   narrow  preresolved_specific    deserialize     1000000 1000    1000    157
   narrow  default_generic serialize       1000000 1000    1000    203
   narrow  default_generic deserialize     1000000 1000    1000    562
   narrow  preresolved_generic     serialize       1000000 1000    1000    141
   narrow  preresolved_generic     deserialize     1000000 1000    1000    219
   wide    default_specific        serialize       1000000 1000    1000    2610
   wide    default_specific        deserialize     1000000 1000    1000    6593
   wide    preresolved_specific    serialize       1000000 1000    1000    2297
   wide    preresolved_specific    deserialize     1000000 1000    1000    2141
   wide    default_generic serialize       1000000 1000    1000    2109
   wide    default_generic deserialize     1000000 1000    1000    6235
   wide    preresolved_generic     serialize       1000000 1000    1000    1343
   wide    preresolved_generic     deserialize     1000000 1000    1000    2766
   ```
   
   After:
   ```
   $ dotnet run -c Release -f net6.0
   type    impl    action  total_items     batches batch_size      time(ms)
   simple  default_specific        serialize       1000000 1000    1000    531
   simple  default_specific        deserialize     1000000 1000    1000    891
   simple  preresolved_specific    serialize       1000000 1000    1000    328
   simple  preresolved_specific    deserialize     1000000 1000    1000    344
   simple  default_generic serialize       1000000 1000    1000    453
   simple  default_generic deserialize     1000000 1000    1000    844
   simple  preresolved_generic     serialize       1000000 1000    1000    281
   simple  preresolved_generic     deserialize     1000000 1000    1000    547
   complex default_specific        serialize       1000000 1000    1000    3579
   complex default_specific        deserialize     1000000 1000    1000    7218
   complex preresolved_specific    serialize       1000000 1000    1000    2875
   complex preresolved_specific    deserialize     1000000 1000    1000    3969
   complex default_generic serialize       1000000 1000    1000    3266
   complex default_generic deserialize     1000000 1000    1000    4734
   complex preresolved_generic     serialize       1000000 1000    1000    2109
   complex preresolved_generic     deserialize     1000000 1000    1000    2797
   narrow  default_specific        serialize       1000000 1000    1000    219
   narrow  default_specific        deserialize     1000000 1000    1000    406
   narrow  preresolved_specific    serialize       1000000 1000    1000    141
   narrow  preresolved_specific    deserialize     1000000 1000    1000    187
   narrow  default_generic serialize       1000000 1000    1000    204
   narrow  default_generic deserialize     1000000 1000    1000    375
   narrow  preresolved_generic     serialize       1000000 1000    1000    140
   narrow  preresolved_generic     deserialize     1000000 1000    1000    250
   wide    default_specific        serialize       1000000 1000    1000    2688
   wide    default_specific        deserialize     1000000 1000    1000    4781
   wide    preresolved_specific    serialize       1000000 1000    1000    1969
   wide    preresolved_specific    deserialize     1000000 1000    1000    2078
   wide    default_generic serialize       1000000 1000    1000    2094
   wide    default_generic deserialize     1000000 1000    1000    4640
   wide    preresolved_generic     serialize       1000000 1000    1000    1422
   wide    preresolved_generic     deserialize     1000000 1000    1000    2953
   ```
   
   You are definitely onto something with the `CanRead` function. E.g. `wide default_generic deserialize` (1000000 items, 1000 batches of 1000)
   improved from 6235 ms to 4640 ms, i.e. about 26% less time (roughly 34% higher throughput), which is massive.
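   
   A change between two timings can be quoted either as a reduction in elapsed time or as a gain in throughput, and the two numbers differ. A minimal sketch of both calculations for the `wide default_generic deserialize` row follows; the class and method names are made up for illustration and are not part of the Avro perf project.
   
   ```csharp
   using System;

   // Hypothetical helper for comparing two benchmark timings.
   public static class BenchmarkDelta
   {
       // Fraction of elapsed time saved: (before - after) / before.
       public static double TimeReduction(double beforeMs, double afterMs) =>
           (beforeMs - afterMs) / beforeMs;

       // Throughput gain for the same amount of work: before / after - 1.
       public static double ThroughputGain(double beforeMs, double afterMs) =>
           beforeMs / afterMs - 1.0;

       public static void Main()
       {
           // Timings taken from the "Before" and "After" tables above.
           const double before = 6235, after = 4640;
           Console.WriteLine($"time reduction:  {TimeReduction(before, after):P1}");
           Console.WriteLine($"throughput gain: {ThroughputGain(before, after):P1}");
       }
   }
   ```
   
   For this row the time reduction is about 25.6% and the throughput gain about 34.4%.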
   
   Btw, `PreresolvingDatumReader` does the `if (!ReaderSchema.CanRead(WriterSchema))` check in the constructor and not in the
   `Read` function, which seems to be the correct path.
   
   However, there are other places where `CanRead` is called, so caching might make sense there for the other types as well.
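   
   One possible shape for such caching, as a rough sketch only: memoize the result per (reader, writer) pair, keyed by object reference so the lookup does not itself trigger a deep schema comparison. `CompatibilityCache` and its nested comparer are hypothetical names, not part of the Avro library, and the delegate stands in for a call like `ReaderSchema.CanRead(WriterSchema)`.
   
   ```csharp
   using System;
   using System.Collections.Concurrent;
   using System.Collections.Generic;
   using System.Runtime.CompilerServices;

   // Hypothetical helper: remembers the outcome of an expensive
   // compatibility check for each (reader, writer) pair it has seen.
   public sealed class CompatibilityCache<TSchema> where TSchema : class
   {
       // Compares pairs by reference identity, so no schema Equals() is invoked.
       private sealed class ReferencePairComparer : IEqualityComparer<(TSchema, TSchema)>
       {
           public bool Equals((TSchema, TSchema) x, (TSchema, TSchema) y) =>
               ReferenceEquals(x.Item1, y.Item1) && ReferenceEquals(x.Item2, y.Item2);

           public int GetHashCode((TSchema, TSchema) p) =>
               unchecked((RuntimeHelpers.GetHashCode(p.Item1) * 397)
                         ^ RuntimeHelpers.GetHashCode(p.Item2));
       }

       private readonly Func<TSchema, TSchema, bool> _check;
       private readonly ConcurrentDictionary<(TSchema, TSchema), bool> _results =
           new ConcurrentDictionary<(TSchema, TSchema), bool>(new ReferencePairComparer());

       public CompatibilityCache(Func<TSchema, TSchema, bool> check)
       {
           _check = check ?? throw new ArgumentNullException(nameof(check));
       }

       // Runs the check on first use of a pair and reuses the stored result afterwards.
       public bool CanRead(TSchema reader, TSchema writer) =>
           _results.GetOrAdd((reader, writer), pair => _check(pair.Item1, pair.Item2));
   }

   public static class Demo
   {
       public static void Main()
       {
           int calls = 0;
           // Plain objects stand in for schema instances in this demo.
           var cache = new CompatibilityCache<object>((r, w) => { calls++; return true; });
           object reader = new object(), writer = new object();

           for (int i = 0; i < 1000; i++)
               cache.CanRead(reader, writer);

           Console.WriteLine(calls); // prints 1: the check ran once for this pair
       }
   }
   ```
   
   A cache like this keeps strong references to every schema it has seen, so it would only be appropriate where the set of schemas is small and long-lived.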
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@avro.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 744796)
    Time Spent: 1h 20m  (was: 1h 10m)

> C# reader performance improvement
> ---------------------------------
>
>                 Key: AVRO-1438
>                 URL: https://issues.apache.org/jira/browse/AVRO-1438
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: csharp
>    Affects Versions: 1.7.5
>            Reporter: David Taylor
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: RecordSchema.cs.diff
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> GenericReader/SpecificReader spend a lot of time comparing the reader/writer 
> schema.  Remembering the last good match speeds things up about 15% in my 
> tests using the avro.perf project for timings.  This does not impact the 
> DatumReader implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
