+1. This would be a good value add. It will result in higher throughput for input operators that need to consume only selected JSON fields.
Thx,
Ashish

> On 23-Mar-2016, at 10:31 AM, Bhupesh Chawda <[email protected]> wrote:
>
> A multi-line JSON format is very common and is usually the case with REST
> API results.
> I think this could be a valuable addition.
>
> Regarding the issues that you mentioned, I think it can be solved by having
> a custom file splitter which takes care of splitting on a JSON record
> boundary.
>
> +1 for a streaming JSON parser.
>
> ~Bhupesh
>
> On Wed, Mar 23, 2016 at 5:22 AM, Devendra Tagare <[email protected]>
> wrote:
>
>> Hi All,
>>
>> Starting this thread to get opinions for adding a streaming JSON parser for
>> converting a JSON to POJO. This parser would be in addition to the databind
>> parser (com.fasterxml.jackson.databind) we already have.
>>
>> The advantage of a streaming JSON parser is,
>>
>> 1. The parser need not parse entire input to set the fields of the POJO.
>> 2. Can be used with multiline JSON records. e.g. if a user is using the
>> AbstractFileInputOperator to read a file line by line & a JSON is spanning
>> multiple lines, then the existing parser will not work even if the required
>> fields are covered in the single line input.
>> 3. These parsers have the least read/write overhead as compared to databind
>> or tree based parsers.
>>
>> Please refer http://wiki.fasterxml.com/JacksonStreamingApi for more
>> details.
>>
>> The disadvantages are (from the documentation)
>>
>> 1. All content to read/write has to be processed in exact same order as
>> input comes in (or output is to go out) -- for random access, you need to
>> use Data Binding or Tree Model (which both actually use Streaming Api for
>> actual JSON reading/writing).
>> [Dev] This could be tricky if one row of input goes to one partition of the
>> parser and the other one goes to another.
>> [Dev] This also means that we cannot use it with the existing file
>> splitter, since different splits may not go to the same partition of the
>> parser.
>>
>> 2. No Java objects are created unless specifically requested; and even then
>> only very basic types are supported (Strings, byte[] for base64-encoded
>> binary content)
>> [Dev] Should be fine for the use-cases we are covering.
>>
>> Please send across your inputs and comments.
>>
>> Thanks,
>> Dev
>>
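
For illustration only, the idea of splitting on a JSON record boundary (rather than on newlines) could look roughly like the sketch below. It is not taken from the existing file splitter or from Jackson; the class name JsonRecordSplitter and the method split are hypothetical, and it assumes each record is a top-level JSON object. It tracks brace depth while ignoring braces inside string literals, and it drops a trailing incomplete record, which a real splitter would instead carry over to the next block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a character buffer into complete top-level JSON objects.
class JsonRecordSplitter {

    static List<String> split(String input) {
        List<String> records = new ArrayList<>();
        int depth = 0;            // current nesting level of '{' ... '}'
        int start = -1;           // index where the current record began
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true if the previous char was a backslash in a string

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                // Braces inside strings must not affect the depth count.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    // A full record ends here, regardless of line breaks inside it.
                    records.add(input.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        // Anything after the last complete record is incomplete and is not emitted.
        return records;
    }

    public static void main(String[] args) {
        String data = "{\"id\": 1,\n \"name\": \"a\"}\n{\"id\": 2,\n \"name\": \"b\"}\n";
        for (String record : split(data)) {
            System.out.println("record: " + record.replace("\n", " "));
        }
    }
}
```

Each emitted string is one complete record, so it could be handed to either the databind or a streaming parser on whichever partition receives it.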
