stevedlawrence opened a new pull request, #1603:
URL: https://github.com/apache/daffodil/pull/1603

   When we need to parse a specified length string, we currently allocate a 
buffer that can be reused to store the decoded string. The size of this buffer 
is based on the maximumSimpleElementSizeInCharacters tunable, which defaults to 
a fairly large size (1MB). Allocating a buffer that large can be slow and puts 
added pressure on the garbage collector. Fortunately, this buffer is allocated 
using a LocalBuffer, so it is reused during a parse and at worst there is only 
one allocation per parse. 
But when parsing many small files that contain specified length strings, this 
overhead can become noticeable. And 1MB is likely orders of magnitude larger 
than the vast majority of data formats will need for any single string element.
   
   To address this, instead of using maximumSimpleElementSizeInCharacters, we 
calculate how many characters the string could possibly decode to given the 
current bit position, bit limit, and encoding, and use that as the buffer size 
to request. This way we only ever request and allocate a large buffer if one is 
actually needed, which should be rare.
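
   As a rough illustration of the sizing calculation described above, here is 
a minimal sketch. It is not the code in this PR: the class name, the method 
name, the `minBitsPerChar` parameter, and the final cap at the tunable are all 
assumptions made for the example.

```java
// Hypothetical helper for illustration only; not Daffodil's actual API.
final class StringBufferSizing {
    private StringBufferSizing() {}

    /**
     * Upper bound on the number of characters that can be decoded from the
     * bits between bitPos (inclusive) and bitLimit (exclusive).
     *
     * minBitsPerChar is the narrowest character the encoding can produce,
     * e.g. 8 for US-ASCII or UTF-8, 16 for UTF-16, 7 for a packed 7-bit
     * encoding. tunableMaxChars stands in for
     * maximumSimpleElementSizeInCharacters; capping at it is an assumption.
     */
    static long maxDecodableChars(long bitPos, long bitLimit,
                                  int minBitsPerChar, long tunableMaxChars) {
        if (minBitsPerChar <= 0) {
            throw new IllegalArgumentException("minBitsPerChar must be positive");
        }
        if (bitLimit < bitPos) {
            throw new IllegalArgumentException("bitLimit is before bitPos");
        }
        long remainingBits = bitLimit - bitPos;
        // Every decoded character consumes at least minBitsPerChar bits,
        // so integer division gives a safe upper bound.
        long upperBound = remainingBits / minBitsPerChar;
        return Math.min(upperBound, tunableMaxChars);
    }
}
```

   With this sketch, a 10 byte fixed length ASCII field (80 bits remaining, 8 
bits per character) requests a 10 character buffer rather than one sized by the 
1MB default.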
   
   Note that this new logic requires bitLimit as part of specified string 
parsing. That isn't available in the edge case of specified length complex 
nillables. The specified length nil parser is modified to handle this case.
   
   This also modifies the LocalBuffer to allocate buffers of a reasonably large 
minimum size of 1K. This way we will likely only ever need to allocate a single 
buffer rather than allocating small buffers that have to be reallocated as 
larger buffers are needed.
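
   The minimum-size idea can be sketched as follows. This is a simplified 
stand-in, not Daffodil's LocalBuffer: the class name `ReusableCharBuffer`, the 
`getBuf` method, and the `MIN_CAPACITY` constant are hypothetical, and only 
`java.nio.CharBuffer` is a real API.

```java
import java.nio.CharBuffer;

// Hypothetical stand-in for a reusable local buffer; illustration only.
final class ReusableCharBuffer {
    // Assumed minimum capacity, mirroring the 1K minimum described above.
    static final int MIN_CAPACITY = 1024;

    private CharBuffer buffer; // null until the first request

    /** Returns a cleared buffer whose limit is requestedLength characters. */
    CharBuffer getBuf(int requestedLength) {
        if (requestedLength < 0) {
            throw new IllegalArgumentException("requestedLength must be >= 0");
        }
        // Allocate only when there is no buffer yet or it is too small.
        // Rounding small requests up to MIN_CAPACITY means a run of small,
        // slowly growing requests is served by one allocation.
        if (buffer == null || buffer.capacity() < requestedLength) {
            buffer = CharBuffer.allocate(Math.max(requestedLength, MIN_CAPACITY));
        }
        buffer.clear();
        buffer.limit(requestedLength);
        return buffer;
    }
}
```

   In this sketch, requests for 10 and then 500 characters return the same 
underlying buffer, and a new allocation happens only when a request exceeds the 
current capacity.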
   
   In tests with small NITF files (<4000 bytes) that contain lots of fixed 
length strings, this improved performance by about 30% or more. Tested files as 
large as 8000 bytes saw little or no change in performance.
   
   DAFFODIL-2851


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
