Github user ssriniva123 commented on the issue:
https://github.com/apache/drill/pull/518
Paul,
Thanks for taking the time to write such a detailed email. Here are
some of my thoughts.
- Drill uses com.fasterxml.jackson.core.json.UTF8StreamJsonParser to parse
JSON records. This parser does not rely on line delimiters as record
separators; instead it uses the JSON structure itself as a natural way to
signal end of record (EOR). The parser has internal methods that check for
line feeds, but they are not exposed to callers.
- The CountingJsonReader uses the parser.skipChildren() method to skip the
rest of the children of the current record, so it is not possible to
accurately count and match the number of braces to cleanly skip a bad record.
- One thought is to tap the input source of the parser on an exception
condition, but that is not encouraged.
My thought process was exactly along the lines you have been thinking. On
an exception, the code attempts to locate a closing brace (}) followed
by an opening brace ({).
This is what is being done in the BaseJsonProcessor.processJSONException
method. Please note that it works in all cases except when the bad record
does not have proper braces to signify the end of a JSON record.
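
To make the idea concrete, here is a minimal sketch of that resynchronization
heuristic on an in-memory string. It is not the actual
BaseJsonProcessor.processJSONException code: the class name RecordResync and
the method findNextRecordStart are hypothetical, and allowing whitespace
between the closing and opening brace is my assumption. It also ignores
braces that appear inside string values.

```java
/** Hypothetical helper illustrating the "find } then {" recovery idea. */
public final class RecordResync {
    private RecordResync() {}

    /**
     * Scans input starting at index from for a '}' that is followed,
     * after optional whitespace (an assumption), by a '{'.
     *
     * @return the index of that '{', i.e. the start of the next record,
     *         or -1 if no such boundary exists in the remaining input
     */
    public static int findNextRecordStart(CharSequence input, int from) {
        boolean sawClose = false; // true while the last non-whitespace char was '}'
        for (int i = Math.max(from, 0); i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '}') {
                sawClose = true;
            } else if (c == '{' && sawClose) {
                return i; // boundary found: "}" then "{"
            } else if (!Character.isWhitespace(c)) {
                sawClose = false; // any other character breaks the "}{" pattern
            }
        }
        return -1;
    }
}
```

Called with the offset where the parse error was detected, this returns the
start of the record after the bad one when the bad record is properly closed.
If the bad record is missing its closing brace, the scan only resynchronizes
at the next "} {" boundary it can find, so the record that follows the bad
one is skipped as well. That is the limitation mentioned above.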
I hope this explanation helps clarify things.