Multi-line JSON is very common; REST API results are usually formatted that
way.
I think this would be a valuable addition.

Regarding the issues that you mentioned, I think they can be solved by having
a custom file splitter that takes care of splitting on JSON record
boundaries.
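
As a rough sketch of the boundary detection I have in mind (the class and
method names below are hypothetical, not existing Malhar code, and this works
on an in-memory String rather than on file blocks, so a real splitter would
also need to carry a partial record over to the next block):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds complete top-level JSON objects in a chunk of text. */
public class JsonRecordBoundarySplitter {

    /**
     * Returns every complete {...} record found at nesting depth zero.
     * A trailing, incomplete record is dropped here; a real splitter
     * would hand it over to whoever reads the next block.
     */
    public static List<String> splitRecords(String input) {
        List<String> records = new ArrayList<>();
        int depth = 0;            // current nesting depth of '{' ... '}'
        int start = -1;           // index where the current record began
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                // Braces inside string values must not affect the depth count.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                if (depth > 0 && --depth == 0) {
                    records.add(input.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "{\n  \"id\": 1,\n  \"note\": \"a } inside a string\"\n}\n"
                + "{\"id\": 2, \"tags\": {\"k\": \"v\"}}\n"
                + "{\"id\": 3,";
        for (String record : splitRecords(input)) {
            System.out.println(record.replace("\n", " "));
        }
    }
}
```

Each record returned this way is a complete JSON object, so it can go to any
partition of the parser independently of the others.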

+1 for a streaming JSON parser.

~Bhupesh

On Wed, Mar 23, 2016 at 5:22 AM, Devendra Tagare <[email protected]>
wrote:

> Hi All,
>
> Starting this thread to get opinions on adding a streaming JSON parser for
> converting JSON to a POJO. This parser would be in addition to the databind
> parser (com.fasterxml.jackson.databind) we already have.
>
> The advantages of a streaming JSON parser are:
>
> 1. The parser need not parse the entire input to set the fields of the POJO.
> 2. It can be used with multi-line JSON records, e.g. if a user is using the
> AbstractFileInputOperator to read a file line by line and a JSON record spans
> multiple lines, then the existing parser will not work even if the required
> fields are covered in a single line of input.
> 3. These parsers have the least read/write overhead compared to databind
> or tree-based parsers.
>
> Please refer to http://wiki.fasterxml.com/JacksonStreamingApi for more
> details.
>
> The disadvantages are (from the documentation):
>
> 1. All content to read/write has to be processed in exact same order as
> input comes in (or output is to go out) -- for random access, you need to
> use Data Binding or Tree Model (which both actually use Streaming Api for
> actual JSON reading/writing).
> [Dev] This could be tricky if one row of input goes to one partition of the
> parser and the other one goes to another.
> [Dev] This also means that we cannot use it with the existing file
> splitter, since different splits may not go to the same partition of the
> parser.
>
> 2. No Java objects are created unless specifically requested; and even then
> only very basic types are supported (Strings, byte[] for base64-encoded
> binary content)
> [Dev] Should be fine for the use-cases we are covering.
>
> Please send across your inputs and comments.
>
> Thanks,
> Dev
>
