Github user ssriniva123 commented on the issue:
https://github.com/apache/drill/pull/518
Apologies for getting back to this thread late; I got tied up with some
issues at work.
Paul,
The JSON parser is not just a tokenizer. It keeps track of the JSON
structure and understands aspects of it such as the root and the
array/object context, and all parsing is done under that context.
- We cannot keep track of {} accurately. For example, the counting JSON
processor calls parser.skipChildren(), which tries to skip to the end of the
JSON object, but this can roll over into the next line when the innermost
JSON sub-object is malformed. See the example below (missing " in the last
JSON structure). JsonReader behaves the same way.
{"balance": 1000.0,"num": 100,"is_vip": true,"name":
"foo3","curr":{"denom":"pound","test":{"value :false}}}
- One possible solution is to rewind the input source to reset the stream,
which is not recommended, and there is no guarantee that all streams support
mark/reset semantics.
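
To make the mark/reset caveat concrete, here is a minimal sketch using only
java.io. It is not Drill code; MarkResetDemo, nonMarkable, ensureMarkable and
peek are hypothetical names made up for illustration. It shows that a plain
InputStream reports markSupported() == false, and that the usual workaround is
to wrap it in a BufferedInputStream, which has to hold up to readLimit bytes
in memory in order to rewind.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of Drill or Jackson.
final class MarkResetDemo {

    /** Builds a stream that inherits InputStream's default markSupported() == false. */
    static InputStream nonMarkable(byte[] data) {
        final ByteArrayInputStream src = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return src.read();
            }
        };
    }

    /** Returns the stream itself if it can rewind, otherwise a buffering wrapper. */
    static InputStream ensureMarkable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to readLimit bytes, then rewinds so the caller can read them again. */
    static String peek(InputStream in, int readLimit) {
        try {
            in.mark(readLimit);
            byte[] buf = new byte[readLimit];
            int total = 0;
            while (total < readLimit) {
                int n = in.read(buf, total, readLimit - total);
                if (n < 0) {
                    break; // end of stream
                }
                total += n;
            }
            in.reset(); // throws IOException if the stream cannot rewind
            return new String(buf, 0, total, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "{\"num\": 100}".getBytes(StandardCharsets.UTF_8);
        InputStream raw = nonMarkable(data);
        System.out.println("raw markSupported: " + raw.markSupported());
        InputStream in = ensureMarkable(raw);
        System.out.println("first peek:  " + peek(in, 6));
        System.out.println("second peek: " + peek(in, 6));
    }
}
```

The buffering wrapper only helps while the malformed record fits within the
read limit, which is one reason rewinding is unattractive as a general fix.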
Given where we are, I think the proposed solution works well for almost
all malformed JSON.
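
As a rough illustration of the first point above (why {} cannot be tracked
reliably on the malformed record), here is a small hand-written scanner. It
is only a sketch and does not reproduce Jackson's or Drill's logic;
BraceDepthDemo, ScanResult and scan are hypothetical names. It counts braces
outside of string literals and shows that, once a closing quote is missing,
the remaining braces on the line are swallowed by the unterminated string, so
the line ends with braces still open.

```java
// Hypothetical illustration; not part of Drill or Jackson.
final class BraceDepthDemo {

    /** Brace depth at the end of the input, and whether a string was left open. */
    static final class ScanResult {
        final int depth;
        final boolean inString;

        ScanResult(int depth, boolean inString) {
            this.depth = depth;
            this.inString = inString;
        }
    }

    /** Counts { and } that appear outside of double-quoted strings. */
    static ScanResult scan(String text) {
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;      // character after a backslash is literal
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;     // closing quote
                }
            } else if (c == '"') {
                inString = true;          // opening quote
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return new ScanResult(depth, inString);
    }

    public static void main(String[] args) {
        String prefix = "{\"balance\": 1000.0,\"num\": 100,\"is_vip\": true,\"name\": "
                + "\"foo3\",\"curr\":{\"denom\":\"pound\",\"test\":";
        String wellFormed = prefix + "{\"value\":false}}}";
        String malformed = prefix + "{\"value :false}}}"; // record from this thread

        ScanResult ok = scan(wellFormed);
        ScanResult bad = scan(malformed);
        System.out.println("well-formed: depth=" + ok.depth + " inString=" + ok.inString);
        System.out.println("malformed:   depth=" + bad.depth + " inString=" + bad.inString);
        // prints:
        // well-formed: depth=0 inString=false
        // malformed:   depth=3 inString=true
    }
}
```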