Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/518
As it turns out, the sample code shown was actually tested with a stock
Jackson JSON parser: it does work. No parser changes are needed.
The issue is not whether we can make the parser do what is needed: the code
posted in the comment above demonstrated a solution.
The issue is how we incorporate that code into the JSON parser to clean up
partial records and prevent schema changes. When I have time, I'll investigate
that question in greater depth.
IMHO, without a proper fix, we should simply state that Drill does not
support malformed JSON. If an input file might be incorrect, run it through a
clean-up step before allowing Drill to scan it. Otherwise, we are opening the
door to many hard-to-resolve bugs when people ask Drill to scan corrupt JSON:
the result, without a proper fix, would be undefined -- which is worse than the
current behavior that simply fails the scan with an error.
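
To make the clean-up step concrete, here is a minimal sketch of a pre-scan
filter. It is not part of Drill or of this PR. It assumes the input holds one
JSON object per line, and `clean_json_lines` is a hypothetical helper name.
It keeps the lines that parse as complete objects and reports the line numbers
of the ones that do not.

```python
import json
from typing import Iterable, List, Tuple


def clean_json_lines(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Return (well-formed records, line numbers of rejected lines).

    Assumption: one JSON object per input line (hypothetical layout).
    """
    good: List[str] = []
    bad: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are neither records nor errors
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            bad.append(lineno)  # malformed or truncated record
            continue
        if not isinstance(record, dict):
            bad.append(lineno)  # valid JSON, but not a record object
            continue
        good.append(json.dumps(record))  # re-serialize the clean record
    return good, bad


sample = [
    '{"a": 1, "b": "x"}',
    '{"a": 2, "b": ',  # truncated record
    '{"a": 3, "b": "z"}',
]
good, bad = clean_json_lines(sample)
```

Only the records in `good` would then be written to the file handed to Drill,
while `bad` gives the user a list of lines to inspect or repair.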
Let's follow up again once I (or someone else) have had a chance to figure out
whether we can undo a partially built record. If we can, then we've got a
path to a clean solution: recover the parser (as shown earlier) and discard the
in-flight record (which we still need to research).