Hi All,

Starting this thread to get opinions on adding a streaming JSON parser for
converting JSON to a POJO. This parser would be in addition to the databind
parser (com.fasterxml.jackson.databind) we already have.

The advantages of a streaming JSON parser are:

1. The parser need not parse the entire input to set the fields of the POJO.
2. It can be used with multi-line JSON records. E.g. if a user is using the
AbstractFileInputOperator to read a file line by line and a JSON record spans
multiple lines, the existing parser will not work, even if the required
fields are all present in a single line of input.
3. Streaming parsers have the lowest read/write overhead compared to databind
or tree-based parsers.

Please refer to http://wiki.fasterxml.com/JacksonStreamingApi for more details.
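
To make the idea concrete, here is a rough sketch of how a streaming parse
into a POJO could look with jackson-core 2.x. It is only an illustration, not
a proposed operator implementation: the class StreamingPojoSketch, the Person
POJO and its fields "name" and "age" are hypothetical names made up for this
example.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;

public class StreamingPojoSketch {

    // Hypothetical POJO, for illustration only.
    public static class Person {
        public String name;
        public int age;
    }

    // Reads tokens in input order and stops as soon as both fields are set.
    public static Person parse(String json) throws IOException {
        Person person = new Person();
        int remaining = 2; // number of required fields still unset

        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("expected a JSON object");
            }
            while (remaining > 0 && parser.nextToken() == JsonToken.FIELD_NAME) {
                String field = parser.getCurrentName();
                parser.nextToken(); // advance to the field's value
                if ("name".equals(field)) {
                    person.name = parser.getText();
                    remaining--;
                } else if ("age".equals(field)) {
                    person.age = parser.getIntValue();
                    remaining--;
                } else {
                    // Not a field we need: skip nested objects/arrays.
                    // This is a no-op for scalar values.
                    parser.skipChildren();
                }
            }
        }
        return person;
    }
}
```

Once both required fields have been seen, the loop exits and the remaining
tokens are never read, which is the behaviour point 1 relies on. Note that
the fields are handled strictly in the order they appear in the input.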

The disadvantages are (from the documentation):

1. All content to read/write has to be processed in exact same order as
input comes in (or output is to go out) -- for random access, you need to
use Data Binding or Tree Model (which both actually use Streaming Api for
actual JSON reading/writing).
[Dev] This could be tricky if one row of the input goes to one partition of
the parser and the next row goes to another.
[Dev] This also means that we cannot use it with the existing file
splitter, since different splits may not go to the same partition of the
parser.

2. No Java objects are created unless specifically requested; and even then
only very basic types are supported (Strings, byte[] for base64-encoded
binary content)
[Dev] Should be fine for the use-cases we are covering.

Please share your inputs and comments.

Thanks,
Dev
