Hi All,

Starting this thread to get opinions on adding a streaming JSON parser for converting JSON to POJOs. This parser would be in addition to the databind-based parser (com.fasterxml.jackson.databind) we already have.
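For reference, here is a minimal sketch of what the conversion could look like with Jackson's streaming JsonParser. It pulls tokens one at a time, fills POJO fields as they appear, and stops as soon as the required fields are set. The Person POJO, its "name"/"age" fields and the StreamingPojoSketch class are hypothetical and only for illustration. It assumes jackson-core 2.x (com.fasterxml.jackson.core) on the classpath, which jackson-databind already depends on.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;

public class StreamingPojoSketch {

    /** Hypothetical POJO, used only for illustration. */
    public static class Person {
        public String name;
        public Integer age;
    }

    private static final JsonFactory FACTORY = new JsonFactory();

    /**
     * Fills a Person from a JSON object using Jackson's streaming JsonParser.
     * Tokens are pulled one at a time; once both required fields ("name" and
     * "age") are set, parsing stops and the remaining tokens are never parsed.
     */
    public static Person parse(String json) throws IOException {
        Person p = new Person();
        try (JsonParser parser = FACTORY.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IOException("expected a JSON object");
            }
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                String field = parser.getCurrentName();
                parser.nextToken(); // advance to the field's value
                if ("name".equals(field)) {
                    p.name = parser.getText();
                } else if ("age".equals(field)) {
                    p.age = parser.getIntValue();
                } else {
                    // Unknown field: for objects/arrays, skipChildren() jumps to the
                    // matching end token; for scalar values it does nothing.
                    parser.skipChildren();
                }
                if (p.name != null && p.age != null) {
                    break; // required fields are set; stop reading
                }
            }
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Only the first line of a record that spans several lines: the object is
        // incomplete, but both required fields appear before the cut-off.
        Person p = parse("{\"id\": 7, \"tags\": [\"a\", \"b\"], \"name\": \"Ann\", \"age\": 30,");
        System.out.println(p.name + " " + p.age);
    }
}
```

Running main should print "Ann 30": the truncated line is accepted because the parser never asks for the tokens after "age". A databind ObjectMapper.readValue on the same fragment would be expected to fail with an unexpected end-of-input error. Note that if the required fields are spread over several lines, this sketch returns a partially filled POJO; a real operator would have to keep the parser state across lines.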
The advantages of a streaming JSON parser are:
1. The parser need not parse the entire input to set the fields of the POJO.
2. It can be used with multiline JSON records. E.g., if a user is using the AbstractFileInputOperator to read a file line by line and a JSON record spans multiple lines, the existing parser will not work even if the required fields are all covered in a single line of input.
3. Streaming parsers have the least read/write overhead compared to databind or tree-based parsers. Please refer to http://wiki.fasterxml.com/JacksonStreamingApi for more details.

The disadvantages are (from the documentation):
1. All content to read/write has to be processed in exact same order as input comes in (or output is to go out) -- for random access, you need to use Data Binding or Tree Model (which both actually use Streaming Api for actual JSON reading/writing).
[Dev] This could be tricky if one line of a multiline record goes to one partition of the parser and the next line goes to another.
[Dev] This also means that we cannot use it with the existing file splitter, since different splits may not go to the same partition of the parser.
2. No Java objects are created unless specifically requested; and even then only very basic types are supported (Strings, byte[] for base64-encoded binary content).
[Dev] Should be fine for the use cases we are covering.

Please send across your inputs and comments.

Thanks,
Dev
