> On Apr 24, 2022, at 11:09 PM, Romain Manni-Bucau <[email protected]> 
> wrote:
> 
> On Sun, Apr 24, 2022 at 22:30, David Blevins <[email protected]>
> wrote:
> 
>> All,
>> 
>> I added more tests and found that most of the optimizations were not
>> happening due to buffering.
>> 
>> Essentially there are two buffers between Snippet.Buffer and
>> Snippet.SnippetOutputStream.  The SnippetOutputStream had the
>> responsibility to tell the code up the stack when we've reached the max
>> snippet length.  Since all the bytes were buffered, it would see nothing
>> until the very end and we'd end up serializing the full json text anyway.
>> 
>> One is the 64k buffer in JsonGeneratorImpl and the other is an 8k buffer
>> in the JVM implementation code of OutputStreamWriter.  Since the
>> OutputStreamWriter buffer is hardcoded, we can't solve this by adjusting
>> buffer sizes and have no choice but to aggressively call flush() to ensure
>> SnippetOutputStream has the bytes and can do its job.
>> 
> 
> Not sure I get that, since if you close the generator in a finally block, it
> will flush the actual output and all will be good.
> But it could be toString() being called too early rather than a buffering issue.

It's difficult to explain, but I'll do my best, and thanks for your patience if
my attempt is poor.

The code is designed with the assumption that as json is serialized there will 
be write calls made on SnippetOutputStream, which then counts the bytes and can 
eventually tell Snippet.Buffer to stop making more json via the 
'snippet.terminate()' calls.  In practice this doesn't happen.
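
For reference, here is a minimal sketch of the intended design, where write
calls reach the counter directly with nothing buffered in between.  It is not
Johnzon's actual SnippetOutputStream: CountingOutputStream, onLimit and the
snippet() helper are hypothetical names, and a plain Runnable stands in for
the 'snippet.terminate()' call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SnippetSketch {
    /**
     * Hypothetical stand-in for SnippetOutputStream: forwards bytes until
     * `max` have been written, then invokes `onLimit` once and drops the rest.
     */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        private final int max;
        private final Runnable onLimit; // hypothetical hook, e.g. snippet::terminate
        private int count;
        private boolean terminated;

        CountingOutputStream(OutputStream delegate, int max, Runnable onLimit) {
            this.delegate = delegate;
            this.max = max;
            this.onLimit = onLimit;
        }

        @Override
        public void write(int b) throws IOException {
            if (terminated) {
                return; // past the limit: ignore further output
            }
            delegate.write(b);
            count++;
            if (count >= max) {
                terminated = true;
                onLimit.run(); // tell the producer to stop generating json
            }
        }
    }

    /** Writes `json` straight into the counting stream and returns what was kept. */
    static String snippet(String json, int max) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingOutputStream out = new CountingOutputStream(sink, max, () -> { });
        try {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(snippet("{\"name\":\"snippet\",\"size\":42}", 10)); // prints {"name":"s
    }
}
```

With no buffer in between, the counter sees every byte as it is produced, so
the callback fires exactly when the limit is reached.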

In practice, what does happen is that the entire json document, up to a limit
of 64k, is created before any calls reach SnippetOutputStream.  This is because
JsonGeneratorImpl holds a buffer (64k by default) and does not call write on
the Writer instance it wraps until that buffer has filled or close() is called.

My first instinct was to reduce that 64k buffer to the snippet max length and
solve the problem that way.  The catch is that there is yet another buffer held
internally by OutputStreamWriter, and it recreates the issue.  That buffer is
hardcoded to 8k.  So even if we adjust the JsonGeneratorImpl buffer size, in
practice the entire json document, up to a limit of 8k, will be created before
any calls reach SnippetOutputStream.
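
A small, self-contained way to see that second buffer: the sketch below puts a
java.io.BufferedWriter with a deliberately tiny 16-char buffer (an assumed
stand-in for a shrunken JsonGeneratorImpl buffer) on top of an
OutputStreamWriter, with a hypothetical ByteCounter stream in place of
SnippetOutputStream.  Even though the outer buffer hands its chars down every
16 chars, the counter sees nothing for a short document until the writer is
flushed, because OutputStreamWriter keeps the encoded bytes in its own buffer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class BufferLayersDemo {
    /** Hypothetical counter for the bytes that actually reach the bottom of the stack. */
    static final class ByteCounter extends OutputStream {
        int count;

        @Override
        public void write(int b) {
            count++;
        }
    }

    /**
     * Writes `text` through a BufferedWriter of `outerBufferChars` chars
     * (assumed stand-in for JsonGeneratorImpl's buffer) sitting on an
     * OutputStreamWriter, and returns how many bytes the counter saw before
     * any flush.
     */
    static int bytesSeenBeforeFlush(String text, int outerBufferChars) {
        ByteCounter counter = new ByteCounter();
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(counter, StandardCharsets.UTF_8), outerBufferChars)) {
            writer.write(text);
            int seen = counter.count; // sampled before close() flushes both layers
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 100 bytes: drained from the outer buffer, but still parked in OutputStreamWriter
        System.out.println(bytesSeenBeforeFlush("x".repeat(100), 16)); // prints 0
        // 20,000 bytes: more than the inner buffer holds, so it must spill at least once
        System.out.println(bytesSeenBeforeFlush("x".repeat(20_000), 16));
    }
}
```

The exact size of the inner buffer is a JDK implementation detail, so only the
small case has a predictable answer; the point is that shrinking the outer
buffer alone does not make the bytes visible.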

Certainly 8k is better than 64k, which is better than potentially 1GB of json,
but I wanted to get close to the spirit of what we were both after originally:
avoid serializing a lot of json only to throw most of it away and show just a
chunk of it.

The only way to do that is with the flush() calls.  They are the only way to
ensure SnippetOutputStream receives the json data as we serialize it.
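
Here is a sketch of the flush() approach, again under assumptions:
LimitedSink is a hypothetical counting stream with a limitReached flag, and the
loop writing array elements plays the role of Snippet.Buffer checking whether
it should stop.  Flushing the OutputStreamWriter after each element pushes the
encoded bytes down immediately, so the limit is noticed right away.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class FlushDemo {
    /** Hypothetical limit-aware sink: counts bytes and raises a flag at the limit. */
    static final class LimitedSink extends OutputStream {
        private final int max;
        private int count;
        boolean limitReached;

        LimitedSink(int max) {
            this.max = max;
        }

        @Override
        public void write(int b) {
            count++;
            if (count >= max) {
                limitReached = true;
            }
        }
    }

    /**
     * Writes up to `total` json array elements, checking the sink's flag before
     * each one, and returns how many elements were produced before stopping.
     */
    static int elementsProduced(int total, int maxBytes, boolean flushEachElement) {
        LimitedSink sink = new LimitedSink(maxBytes);
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            int produced = 0;
            writer.write('[');
            for (int i = 0; i < total && !sink.limitReached; i++) {
                writer.write((i == 0 ? "" : ",") + "{\"id\":" + i + "}");
                produced++;
                if (flushEachElement) {
                    writer.flush(); // push encoded bytes down so the sink can count them now
                }
            }
            writer.write(']');
            return produced;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with flush:    " + elementsProduced(10_000, 100, true)); // prints 11
        System.out.println("without flush: " + elementsProduced(10_000, 100, false));
    }
}
```

With a 100-byte limit the flushing variant stops after 11 elements; the
non-flushing variant keeps producing elements until the writer's internal
buffer spills, which happens far later and depends on the JDK.  The cost is
more, smaller writes to the underlying stream.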

Hope some of this helps.


-David
