stevedlawrence opened a new pull request #81: Reduce memory usage regressions in commit 07ee2434bb
URL: https://github.com/apache/incubator-daffodil/pull/81

The modifications to the IO layer to support streaming substantially increased memory usage. This PR makes the following changes to reduce it:

- No longer save the char iterator state. Saving this state required duplicating a LongBuffer and a CharBuffer, which are two non-trivial allocations/copies for every point of uncertainty. This really adds up for some file types. Instead, when the bit position changes because a mark is reset, we just clear the char iterator state and decode the data again. This does mean some data might be decoded twice if we backtrack, but that should be relatively quick, and it means we only take a hit when we actually backtrack instead of every time there is a point of uncertainty.

- The regexMatch buffers are intentionally large so they can match long patterns. Unfortunately, the PState was changed so that every PState allocated its own regex buffers, which resulted in a lot of large allocations. Instead, modify the DataProcessor to store ThreadLocal state for the regex buffers, and have the PState access that state when necessary. We will now only have large regex buffers for each thread rather than for each call to parse.

- For every file parsed in the CLI performance command, we allocated a new InputSourceDataInputStream before doing any performance testing. So if you wanted to run a performance test on 500,000 files, we would allocate 500,000 InputSourceDataInputStreams immediately. This class isn't huge, but the allocations add up quickly and use a lot of memory. Instead, allocate the InputSourceDataInputStream right before the call to parse so that it can be garbage collected when the parse ends.

DAFFODIL-1966
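The regex buffer change (per-thread buffers owned by the DataProcessor instead of per-parse buffers owned by each PState) can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Daffodil's actual code: `RegexBufferSketch`, `Processor`, `ParseState`, `RegexBuffers`, and `REGEX_BUFFER_CHARS` are hypothetical stand-ins for the DataProcessor, the PState, and the real regexMatch buffers, which are modeled here as a single CharBuffer.

```java
import java.nio.CharBuffer;

/**
 * Hypothetical sketch: large regex-matching buffers live in a ThreadLocal
 * owned by a long-lived processor object, so each thread allocates them once
 * instead of every parse allocating its own copy.
 */
public class RegexBufferSketch {

    // Hypothetical size; the real regexMatch buffer sizes are not shown in the PR text.
    static final int REGEX_BUFFER_CHARS = 1 << 16;

    /** Per-thread buffer used while matching a regex against decoded data. */
    static final class RegexBuffers {
        final CharBuffer chars = CharBuffer.allocate(REGEX_BUFFER_CHARS);
    }

    /** Stands in for the DataProcessor: long-lived and shared by many parses. */
    static final class Processor {
        // One RegexBuffers instance is created lazily per thread on first get().
        private final ThreadLocal<RegexBuffers> regexBuffers =
            ThreadLocal.withInitial(RegexBuffers::new);

        ParseState newParseState() {
            return new ParseState(this);
        }
    }

    /** Stands in for the PState: one per parse call, short-lived, owns no large buffers. */
    static final class ParseState {
        private final Processor processor;

        ParseState(Processor processor) {
            this.processor = processor;
        }

        /** Fetch the calling thread's buffers, cleared so a new match starts fresh. */
        RegexBuffers regexBuffers() {
            RegexBuffers bufs = processor.regexBuffers.get();
            bufs.chars.clear();
            return bufs;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Processor processor = new Processor();

        // Two parses on the same thread reuse the same buffer instance.
        RegexBuffers a = processor.newParseState().regexBuffers();
        RegexBuffers b = processor.newParseState().regexBuffers();
        System.out.println("same thread shares buffers: " + (a == b));

        // A parse on another thread gets its own buffers; join() makes other[0] visible here.
        RegexBuffers[] other = new RegexBuffers[1];
        Thread t = new Thread(() -> other[0] = processor.newParseState().regexBuffers());
        t.start();
        t.join();
        System.out.println("other thread shares buffers: " + (a == other[0]));
    }
}
```

Running `main` prints `true` for the same-thread check and `false` for the cross-thread check. With this design, buffer memory scales with the number of threads that parse rather than the number of parse calls; since a ThreadLocal value lives as long as its thread, a long-lived thread pool keeps one set of buffers per worker.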
