stevedlawrence opened a new pull request, #1603:
URL: https://github.com/apache/daffodil/pull/1603
When we need to parse a specified length string, we currently allocate a buffer that is reused to store the decoded string. The size of this buffer is based on the maximumSimpleElementSizeInCharacters tunable, which defaults to a fairly large size (1MB); allocating a buffer that large can be slow and puts added pressure on the garbage collector. Fortunately, this buffer is allocated using a LocalBuffer, so it is reused during a parse and at worst there is only one allocation per parse. But when parsing many small files that contain specified length strings, this overhead can become noticeable. And 1MB is likely orders of magnitude larger than the vast majority of data formats will need for any single string element.

To address this, instead of using maximumSimpleElementSizeInCharacters, we calculate how many characters the string could possibly decode to given the current bit position, bit limit, and encoding, and use that as the buffer size to request (see the first sketch below). This way we only ever request and allocate a large buffer if one is actually needed, which should be rare.

Note that this new logic requires the bitLimit as part of specified length string parsing. That isn't available in the edge case of specified length complex nillables, so the specified length nil parser is modified to handle this case.

This also modifies LocalBuffer to allocate buffers with a reasonably large minimum size of 1K (see the second sketch below). This way we will likely only ever need to allocate a single buffer, rather than allocating small buffers that must be reallocated as larger buffers are needed.

Tested with small NITF files (<4000 bytes) that contain lots of fixed length strings, this change showed performance improvements of about 30% or more. Files as large as 8000 bytes saw little or no change in performance.

DAFFODIL-2851
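For illustration, a minimal sketch of the sizing idea; the names and signature here (`maxPossibleChars`, `minBitsPerChar`) are hypothetical, not Daffodil's actual API:

```scala
// Hypothetical sketch: bound the decoded string length by how many characters
// could possibly fit between the current bit position and the bit limit,
// given the encoding's minimum character width.
object BufferSizing {

  /**
   * Upper bound on the number of characters that could decode from the
   * remaining data. bitPos and bitLimit are bit offsets into the data
   * stream; minBitsPerChar is the smallest encoded width of any character
   * in the current encoding (e.g. 8 for UTF-8, 16 for UTF-16).
   */
  def maxPossibleChars(bitPos: Long, bitLimit: Long, minBitsPerChar: Int): Long = {
    require(bitLimit >= bitPos && minBitsPerChar > 0)
    val remainingBits = bitLimit - bitPos
    // Round down: a trailing partial character cannot decode to a full char
    remainingBits / minBitsPerChar
  }
}
```

With a bound like this, a 4000-byte UTF-8 input can never decode to more than 4000 characters, so there is no reason to request a buffer sized for 1M characters.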

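And a minimal sketch of the minimum-size behavior, assuming a simple grow-on-demand design; `ReusableCharBuffer` and `getBuf` are hypothetical names, not the actual LocalBuffer implementation:

```scala
import java.nio.CharBuffer

// Hypothetical sketch of a parse-local reusable buffer with a minimum
// allocation size of 1K characters.
final class ReusableCharBuffer(minSize: Int = 1024) {
  private var buf: CharBuffer = _

  /** Return a cleared buffer with capacity of at least `requestedSize`. */
  def getBuf(requestedSize: Int): CharBuffer = {
    // Enforce the 1K floor so tiny requests don't trigger repeated
    // grow-and-reallocate cycles as slightly larger strings appear.
    val size = math.max(requestedSize, minSize)
    if (buf == null || buf.capacity < size) {
      buf = CharBuffer.allocate(size) // grow only when a larger buffer is needed
    }
    buf.clear()
    buf
  }
}
```

Because the buffer is reused across elements within a parse, the floor means most parses touch exactly one allocation (e.g. `getBuf(64)` still yields a 1024-capacity buffer), while formats with genuinely large strings grow the buffer on demand.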