Oh, got it, thanks for re-explaining. Did you try a simple size-estimation heuristic while visiting (3 chars for a number, plain length for a string with no escaping, etc.)? It should cut the visiting off fast enough without hacking any lower-level buffer strategy, and without calling flush and moving the array data too often. Otherwise we can create a custom generator factory from the provider in the mapper, propagating the properties, but I'm less a fan of that option because it complicates user customization of the generator (today we can output formats other than JSON that way).
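To make the heuristic concrete, here is a minimal sketch. It assumes the visited values are plain Java maps, iterables, strings and numbers; SnippetSizeEstimator and its methods are hypothetical names for illustration, not existing Johnzon classes, and the per-token costs are rough guesses rather than exact JSON lengths.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper, not part of Johnzon: keeps a rough running estimate
// of the serialized size so the visit can stop once the budget is reached.
class SnippetSizeEstimator {
    private final int max;
    private int estimated;

    SnippetSizeEstimator(int max) {
        this.max = max;
    }

    /** Returns true while the estimate is still under the budget. */
    boolean visit(Object value) {
        if (value instanceof Map) {
            estimated += 2; // braces
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                // key length plus quotes, colon and comma, no escaping considered
                estimated += String.valueOf(e.getKey()).length() + 4;
                if (estimated >= max || !visit(e.getValue())) {
                    return false; // stop visiting as soon as the budget is hit
                }
            }
        } else if (value instanceof Iterable) {
            estimated += 2; // brackets
            for (Object item : (Iterable<?>) value) {
                estimated += 1; // comma
                if (estimated >= max || !visit(item)) {
                    return false;
                }
            }
        } else if (value instanceof Number) {
            estimated += 3; // flat guess for any number
        } else if (value instanceof CharSequence) {
            estimated += ((CharSequence) value).length() + 2; // length plus quotes
        } else {
            estimated += 4; // true / null; false is one more, close enough
        }
        return estimated < max;
    }

    int estimated() {
        return estimated;
    }

    public static void main(String[] args) {
        SnippetSizeEstimator estimator = new SnippetSizeEstimator(50);
        boolean fits = estimator.visit(Collections.nCopies(1000, "abcdefghij"));
        System.out.println("fits=" + fits + ", estimated=" + estimator.estimated());
    }
}
```

The point is only that the visit stops after a handful of elements instead of walking all 1000, without touching any buffer or calling flush.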

On Mon, Apr 25, 2022 at 19:47, David Blevins <[email protected]> wrote:

>
> On Apr 24, 2022, at 11:09 PM, Romain Manni-Bucau <[email protected]>
> wrote:
> >
> > On Sun, Apr 24, 2022 at 22:30, David Blevins <[email protected]>
> > wrote:
> >
> >> All,
> >>
> >> I added more tests and found that most of the optimizations were not
> >> happening due to buffering.
> >>
> >> Essentially there are two buffers between Snippet.Buffer and
> >> Snippet.SnippetOutputStream. The SnippetOutputStream had the
> >> responsibility to tell the code up the stack when we've reached the max
> >> snippet length. Since all the bytes were buffered, it would see nothing
> >> until the very end and we'd end up serializing the full json text anyway.
> >>
> >> One is the 64k buffer in JsonGeneratorImpl and the other is an 8k buffer
> >> in the JVM implementation code of OutputStreamWriter. Since the
> >> OutputStreamWriter buffer is hardcoded, we can't solve this by adjusting
> >> buffer sizes and have no choice but to aggressively call flush() to ensure
> >> SnippetOutputStream has the bytes and can do its job.
> >>
> >
> > Not sure I get that, since if you close the generator in a finally block,
> > it will flush the actual output and all will be good.
> > But it may be a matter of calling toString() too early rather than a
> > buffering issue.
>
> It's difficult to explain, but I'll do my best and thanks for the patience
> if my attempt is poor.
>
> The code is designed with the assumption that as json is serialized there
> will be write calls made on SnippetOutputStream, which then counts the
> bytes and can eventually tell Snippet.Buffer to stop making more json via
> the 'snippet.terminate()' calls. In practice this doesn't happen.
>
> In practice what does happen is the entire json document, up to a limit of
> 64k, will be created before any calls reach SnippetOutputStream. This is
> because JsonGeneratorImpl is holding a buffer (64k by default) and does not
> call any writes on the Writer instance it's holding until that buffer has
> filled or close is called.
>
> My first instinct was to reduce that 64k buffer to the snippet max length
> and solve this problem that way. The trick with that is there is yet
> another buffer being held internally by OutputStreamWriter and it recreates
> the issue. That buffer is hardcoded to be 8k. So even if we adjust the
> JsonGeneratorImpl buffer size, in practice what happens is the entire json
> document, up to a limit of 8k, will be created before any calls reach
> SnippetOutputStream.
>
> Certainly 8k is better than 64k which is better than potentially 1GB of
> json, but I wanted to try and get close to the spirit of what we were both
> after originally which is that we avoid serializing a lot of json only to
> throw most of it away and show just a chunk of it.
>
> The only way to do that is the flush() calls. That's the only way to
> ensure SnippetOutputStream is getting the json data as we serialize it.
>
> Hope some of this helps.
>
>
> -David
>
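
For reference, the OutputStreamWriter buffering described in the quoted message can be reproduced with plain JDK classes. This is a minimal sketch assuming OpenJDK behavior; CountingOutputStream is a hypothetical stand-in for SnippetOutputStream, not the actual class, and FlushDemo is just a name for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Hypothetical stand-in for SnippetOutputStream: only counts received bytes.
    static class CountingOutputStream extends OutputStream {
        int count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }
    }

    /** Returns the bytes seen by the stream before and after flush(). */
    static int[] bytesSeen(String text) {
        CountingOutputStream out = new CountingOutputStream();
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        try {
            writer.write(text);
            int before = out.count; // encoder still holds the bytes internally
            writer.flush();
            int after = out.count;  // flush() pushes them to the stream
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] seen = bytesSeen("{\"name\":\"value\"}");
        System.out.println("before flush: " + seen[0] + ", after flush: " + seen[1]);
    }
}
```

For a short text, the counting stream sees nothing until flush() is called, which is why a byte counter sitting behind the writer cannot stop serialization early on its own.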
