GitHub user ssriniva123 commented on the issue:

    https://github.com/apache/drill/pull/518

    Apologies for getting back to this thread late; I got tied up with some issues at work.

    Paul,
    The JSON parser is not just a tokenizer. It keeps track of the JSON structure and understands aspects of it such as the root and the array/object context, and all parsing is done under that context.

    - We cannot keep track of {} accurately. For example, the counting JSON processor calls parser.skipChildren(), which tries to skip to the end of the JSON, but this can roll over to the next line when the malformed JSON is in the bottom-most JSON sub-object. See the example below (missing " in the last JSON structure). JsonReader behaves the same way.

    {"balance": 1000.0,"num": 100,"is_vip": true,"name": "foo3","curr":{"denom":"pound","test":{"value :false}}}

    - One possible solution is to rewind the input source to reset the stream, which is not recommended, and there is no guarantee that all streams support mark/reset semantics.

    Given where we are, I think the proposed solution works perfectly for almost all malformed JSON.
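To make the rollover described in the comment concrete, here is a minimal sketch in plain Java. It is not Jackson's parser and not Drill's code: the class RolloverDemo, its methods, and the second record (NEXT) are hypothetical and exist only for illustration. The scanner skips one object by counting braces while ignoring anything inside quoted strings, which is roughly what a structure-aware skip has to do. The assumption is that records are newline-delimited, one per line.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Drill or Jackson.
final class RolloverDemo {

    // The record from the comment, with the closing quote on "value" restored.
    static final String GOOD =
        "{\"balance\": 1000.0,\"num\": 100,\"is_vip\": true,\"name\": \"foo3\","
        + "\"curr\":{\"denom\":\"pound\",\"test\":{\"value\":false}}}";

    // The record exactly as given in the comment (missing quote after value).
    static final String BAD =
        "{\"balance\": 1000.0,\"num\": 100,\"is_vip\": true,\"name\": \"foo3\","
        + "\"curr\":{\"denom\":\"pound\",\"test\":{\"value :false}}}";

    // An invented well-formed record standing in for the next line of input.
    static final String NEXT = "{\"balance\": 2000.0,\"name\": \"foo4\"}";

    /**
     * Skips one top-level object by counting braces outside quoted strings.
     * Returns how many newline characters were consumed before the object
     * closed or the input ended.
     */
    static int skipOneObject(Reader in) throws IOException {
        int depth = 0;
        int newlines = 0;
        boolean inString = false;
        boolean escaped = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                newlines++;
            }
            if (inString) {
                if (escaped) {
                    escaped = false;          // character after a backslash
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;         // string ends here
                }
                continue;                     // braces inside strings do not count
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    break;                    // the object we started in is closed
                }
            }
        }
        return newlines;
    }

    // Convenience wrapper for in-memory input.
    static int newlinesConsumed(String input) {
        try {
            return skipOneObject(new StringReader(input));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("well-formed: " + newlinesConsumed(GOOD + "\n" + NEXT));
        System.out.println("malformed:   " + newlinesConsumed(BAD + "\n" + NEXT));
    }
}
```

With the well-formed record the skip stops on the first line and consumes no newline. With the malformed record the unterminated string swallows the closing braces, so the scanner reads past the line break into the next record and never finds a balanced close. Recovering from that point would require rewinding the input, which is the mark/reset dependency the comment warns about.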