Re: [jackson-user] jsonFactory.createParser eagerly reads from the InputStream to detect the encoding

Artashes Aghajanyan Thu, 18 Apr 2019 07:44:22 -0700

Thank you for the additional insight.

In my specific case I used a different workaround. Because I know the size of
the incoming JSON object, I can slice my original input buffer (creating
another view of the buffer without copying the bytes) and only then wrap the
slice in an input stream. The problem goes away because the parser cannot
consume more from the sliced input stream than the slice contains.
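
A rough sketch of that idea, using only the JDK so it stands alone:
java.nio.ByteBuffer stands in for Netty's ByteBuf, and greedyRead() is a
hypothetical stand-in for a parser that buffers as much as the stream will
give it. The class and method names are made up for illustration, and the
sketch assumes an array-backed (heap) buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SliceBeforeWrapping {

    /**
     * Exposes exactly `length` bytes, starting at the buffer's current
     * position, as an InputStream. No bytes are copied: the stream reads
     * straight from the backing array. The buffer's position is advanced
     * past the exposed region. Assumes a heap buffer (buf.hasArray()).
     */
    static InputStream boundedView(ByteBuffer buf, int length) {
        InputStream view = new ByteArrayInputStream(
                buf.array(), buf.arrayOffset() + buf.position(), length);
        buf.position(buf.position() + length);
        return view;
    }

    /** Stand-in for an eagerly buffering parser: drains the whole stream. */
    static String greedyRead(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "{\"ack\":\"a1\"}{\"ack\":\"b1\"}"
                .getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.wrap(bytes);

        // The first object is known to be 12 bytes long, so expose only those.
        String first = greedyRead(boundedView(buf, 12));
        System.out.println(first);
        System.out.println(buf.remaining() + " bytes still unread in the buffer");
    }
}
```

Even though the reader drains the stream completely, the second object is
still sitting in the original buffer, because the stream never had access
to it.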


For the more general case, one small note: handing buffered but unparsed data
back to the caller with releaseBuffered() is not free, since it involves
copying that data. I understand it's a trade-off. But is it fair to say that
the more performant approach is to not give the parser all the data in the
first place if you don't want it to read and parse everything from the
stream?
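
For comparison, here is a minimal sketch of the releaseBuffered() route,
assuming Jackson 2.x (jackson-core and jackson-databind) is on the classpath.
The class, the Result holder and the method name are hypothetical; the
parser is created explicitly so that it is still reachable after the first
value has been read. The write into the ByteArrayOutputStream is the copy I
am referring to.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class ReleaseBufferedSketch {

    /** Hypothetical holder: the first parsed value plus the recovered bytes. */
    static final class Result {
        final Map<?, ?> first;
        final byte[] leftover;

        Result(Map<?, ?> first, byte[] leftover) {
            this.first = first;
            this.leftover = leftover;
        }
    }

    static Result readFirstAndRecoverRest(ObjectMapper mapper, InputStream in)
            throws IOException {
        // Not closed here: with default settings, closing the parser is
        // expected to close the caller's stream as well.
        JsonParser parser = mapper.getFactory().createParser(in);

        // Reads one root-level value; the parser may have buffered more.
        Map<?, ?> first = mapper.readValue(parser, Map.class);

        // Copies whatever was buffered but not consumed into `leftover`.
        ByteArrayOutputStream leftover = new ByteArrayOutputStream();
        parser.releaseBuffered(leftover);
        return new Result(first, leftover.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "{\"ack\":\"a1\"}{\"ack\":\"b1\"}{\"ack\":\"b3\"}"
                .getBytes(StandardCharsets.UTF_8);
        Result r = readFirstAndRecoverRest(
                new ObjectMapper(), new ByteArrayInputStream(input));
        System.out.println(r.first);
        System.out.println(new String(r.leftover, StandardCharsets.UTF_8));
    }
}
```

The recovered bytes then have to be stitched back in front of whatever is
still unread in the stream before the next consumer can use them, which is
extra bookkeeping on top of the copy.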

On Wednesday, 17 April 2019 21:38:48 UTC-4, Tatu Saloranta wrote:
>
> On Wed, Apr 17, 2019 at 6:34 PM Artashes Aghajanyan
> <artashes....@gmail.com> wrote:
> > 
> > I noticed the following behavior that feels like a bug to me, but I want
> > to confirm that it's not by design before opening an issue on GitHub.
> > 
> > jackson-databind-2.4.3 
> > 
> > Consider the following code fragment: 
> > 
> > // A ByteBuf containing multiple JSON objects, e.g.
> > // {"ack":"a1"}{"ack":"b1"}{"ack":"b3"}
> > ByteBuf in = ...
> > ByteBufInputStream inputStream = new ByteBufInputStream(in);
> > Map<String, String> ackMap = mapper.readValue(inputStream, Map.class);
> > 
> > After this readValue() call, inputStream becomes empty (nothing left to
> > read) but only the first {"ack":"a1"} object is parsed and returned. I
> > debugged it a bit, and here's what's happening:
> > 
> > ObjectMapper.readValue(InputStream src, Class<T> valueType) calls
> > _jsonFactory.createParser(src), which calls
> > ByteSourceJsonBootstrapper.detectEncoding(), which calls
> > ByteSourceJsonBootstrapper.ensureLoaded(4).
> > 
> > ensureLoaded(4) basically tries to read 4k bytes from the stream. 
> > 
> > If the input stream contains multiple small (less than 4k?) JSON
> > objects, it reads everything from the stream, just to detect the encoding!
> > 
> > The problem with this approach is that once the data is read from the
> > stream it is essentially lost to the user of the ObjectMapper. So if we
> > have a stream that contains a series of small JSON strings, it will read
> > all of it just to detect the encoding but will only return the first JSON
> > object from the readValue() call.
> > 
> > Since ObjectMapper doesn't "own" the stream, one may expect that it
> > won't consume more data from the stream than is necessary to parse one
> > JSON object.
> > 
> > I've also tried this with the latest 2.9.8 release and the behavior is
> > the same.
> > 
> > Is this a bug? 
>
> No, it is by design. 
>
> Decoding is most efficient when directly accessing a byte[] for content,
> and the overhead of reads from an InputStream is non-trivial (depending on
> the type of stream). Each read requests a buffer full of content, although
> if the stream returns less, whatever is available is consumed first before
> requesting more.
>
> If buffered content needs to be recovered for some reason, it is
> available through one of two methods:
>
> releaseBuffered(OutputStream) 
> releaseBuffered(Writer) 
>
> which pass along any buffered but unused content; the method to call
> depends on the kind of input source the parser was created with.
>
> -+ Tatu +- 
>
