[ 
https://issues.apache.org/jira/browse/AVRO-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991201#comment-12991201
 ] 

Scott Carey commented on AVRO-743:
----------------------------------

With a larger heap, it completes but slowly.  There is still a large 
regression.  Using the new Perf.java, here are full read results with and 
without the AVRO-650 changes.  

args: -nowrite -server -Xmx256m -Xms256m -XX:+UseParallelGC 
-XX:+UseCompressedOops -XX:+DoEscapeAnalysis

Only the generic results are shown below: only the "one time use" reader tests 
are affected, and the other generic tests serve as a good reference.
{code}
                  GenericRead:  1811 ms,      3.680 million entries/sec.    142.805 million bytes/sec
           GenericNested_Read:  3015 ms,      2.211 million entries/sec.     85.801 million bytes/sec
      GenericWithDefault_Read:  3253 ms,      2.049 million entries/sec.     79.532 million bytes/sec
   GenericWithOutOfOrder_Read:  1855 ms,      3.594 million entries/sec.    139.472 million bytes/sec
    GenericWithPromotion_Read:  1962 ms,      3.397 million entries/sec.    131.853 million bytes/sec
GenericOneTimeDecoderUse_Read:  1791 ms,      3.721 million entries/sec.    144.426 million bytes/sec
 GenericOneTimeReaderUse_Read:  6989 ms,      0.954 million entries/sec.     37.014 million bytes/sec
       GenericOneTimeUse_Read:  7373 ms,      0.904 million entries/sec.     35.088 million bytes/sec
{code}

If I revert AVRO-650, I get:
{code}
                  GenericRead:  1808 ms,      3.687 million entries/sec.    143.076 million bytes/sec
           GenericNested_Read:  2872 ms,      2.321 million entries/sec.     90.062 million bytes/sec
      GenericWithDefault_Read:  3389 ms,      1.967 million entries/sec.     76.340 million bytes/sec
   GenericWithOutOfOrder_Read:  1805 ms,      3.693 million entries/sec.    143.319 million bytes/sec
    GenericWithPromotion_Read:  1978 ms,      3.369 million entries/sec.    130.759 million bytes/sec
GenericOneTimeDecoderUse_Read:  1803 ms,      3.696 million entries/sec.    143.443 million bytes/sec
 GenericOneTimeReaderUse_Read:  2289 ms,      2.912 million entries/sec.    113.024 million bytes/sec
       GenericOneTimeUse_Read:  2299 ms,      2.899 million entries/sec.    112.501 million bytes/sec
{code}
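
For reference, the slowdown on the affected tests can be worked out directly 
from the timings in the two result sets above.  The following is only a small 
sketch: the class and method names are made up for illustration, and the 
numbers are copied from the results.

```java
// Illustrative only: computes slowdown factors from the timings reported above.
final class RegressionRatio {

    /** Ratio of elapsed time with AVRO-650 applied to elapsed time with it reverted. */
    static double slowdown(long withChangeMs, long revertedMs) {
        return (double) withChangeMs / revertedMs;
    }

    public static void main(String[] args) {
        // GenericOneTimeReaderUse_Read: 6989 ms with AVRO-650, 2289 ms reverted
        System.out.printf("GenericOneTimeReaderUse_Read: %.2fx%n", slowdown(6989, 2289));
        // GenericOneTimeUse_Read: 7373 ms with AVRO-650, 2299 ms reverted
        System.out.printf("GenericOneTimeUse_Read:       %.2fx%n", slowdown(7373, 2299));
        // GenericRead as an unaffected reference: 1811 ms vs 1808 ms
        System.out.printf("GenericRead (reference):      %.2fx%n", slowdown(1811, 1808));
    }
}
```

That works out to roughly a 3x slowdown on both "one time use" reader tests 
at this heap size, while the reference test is unchanged within noise.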


To keep the case where GenericDatumReaders are created and disposed of rapidly 
from causing this issue, I tried several things.  One was to remove the 
resolver cached in GenericDatumReader entirely and use only the global cache.  
This was surprisingly fast, but slowed all Generic tests by 10% to 15%.
Any variation that creates a new ThreadLocal per instance of GenericDatumReader 
was bad.  An alternate attempt, which instead kept one global ThreadLocal 
WeakReferenceCache with GenericDatumReaders as keys to track the relationship, 
was faster, but still a large memory hog and performance problem.
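
To make the two ThreadLocal variants concrete, here is a rough sketch of their 
shape.  This is not the Avro code: Resolver, PerInstanceReader and 
SharedCacheReader are hypothetical names, and a plain java.util.WeakHashMap 
stands in for the WeakReferenceCache mentioned above.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for an expensive-to-build resolver.
final class Resolver {
    final String origin;
    Resolver(String origin) { this.origin = origin; }
}

// Variant A: every reader instance owns its own ThreadLocal.
// Each short-lived reader that is used adds a new entry to the calling
// thread's ThreadLocal map; stale entries are only cleaned up lazily.
final class PerInstanceReader {
    private final ThreadLocal<Resolver> resolver =
        ThreadLocal.withInitial(() -> new Resolver("per-instance"));

    Resolver resolver() { return resolver.get(); }
}

// Variant B: one static ThreadLocal shared by all readers. Each thread keeps
// a WeakHashMap keyed by reader, so an entry becomes collectable once its
// reader is unreachable. The value must not reference the key strongly.
final class SharedCacheReader {
    private static final ThreadLocal<Map<SharedCacheReader, Resolver>> CACHE =
        ThreadLocal.withInitial(WeakHashMap::new);

    Resolver resolver() {
        return CACHE.get().computeIfAbsent(this, r -> new Resolver("shared-cache"));
    }
}

final class ThreadLocalSketch {
    public static void main(String[] args) {
        // Simulate the "one time use" pattern: a fresh reader for every datum.
        for (int i = 0; i < 100_000; i++) {
            new PerInstanceReader().resolver();
            new SharedCacheReader().resolver();
        }
        // A reused reader gets the same resolver back on the same thread.
        SharedCacheReader reused = new SharedCacheReader();
        System.out.println(reused.resolver() == reused.resolver());
    }
}
```

Both variants still allocate one resolver and one map entry per reader 
instance, which fits the observation above that the shared-cache form helps 
but does not remove the memory and performance cost.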



This is still not 100% thread-safe, but it is no worse than before.  Since we 
allow mutating state in setSchema() and setExpected(), the only way to be 
completely thread-safe is to synchronize those methods as well as every access 
to the state they set.  Performance dropped quite a bit when I did that.  
Longer term we need to make these objects immutable, and use a builder pattern 
when we don't know all the fields prior to construction.
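
As a rough illustration of the immutable-plus-builder direction, something 
along these lines is what I have in mind.  This is a hypothetical sketch, not 
a proposed API: ImmutableReader and its Builder do not exist in Avro, schemas 
are plain Strings for brevity, and defaulting the expected schema to the 
actual one is an assumption made for this example.

```java
// Hypothetical sketch; not Avro's API. Schemas are plain Strings for brevity.
final class ImmutableReader {
    private final String actual;    // schema the data was written with
    private final String expected;  // schema the caller wants to read

    private ImmutableReader(String actual, String expected) {
        this.actual = actual;
        this.expected = expected;
    }

    String actual() { return actual; }
    String expected() { return expected; }

    static Builder builder() { return new Builder(); }

    // Mutable and single-threaded by design; only the built reader is shared.
    static final class Builder {
        private String actual;
        private String expected;

        Builder actual(String schema) { this.actual = schema; return this; }
        Builder expected(String schema) { this.expected = schema; return this; }

        ImmutableReader build() {
            if (actual == null) {
                throw new IllegalStateException("actual schema is required");
            }
            // Assumption for this sketch: fall back to the actual schema.
            return new ImmutableReader(actual, expected != null ? expected : actual);
        }
    }
}
```

Because every field is final and set once in the constructor, a built reader 
can be shared across threads without synchronization; changing a schema means 
building a new reader instead of calling a setter.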



> Java: Performance Regression and memory pressure with GenericDatumReader
> ------------------------------------------------------------------------
>
>                 Key: AVRO-743
>                 URL: https://issues.apache.org/jira/browse/AVRO-743
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>            Reporter: Scott Carey
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> AVRO-650 introduced a large performance regression and memory bloat issue 
> with GenericDatumReader.
> Performance plummets for some Perf.java tests (one test took 1 hour to finish 
> on my laptop).
> Some minor changes I tried result in it passing in a shorter time, but still 
> with an 80% performance degradation.
> This is associated with memory bloat related to ThreadLocals.
> More details provided in comments.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
