GitHub user tzolov commented on the pull request: https://github.com/apache/incubator-hawq/pull/302#issuecomment-178608613

@hornn, @adamjshook As an experiment I've reimplemented/replaced the `JsonRecordReader` & `JsonStreamReader` with ideas and code borrowed from the [json-mapreduce](https://github.com/alexholmes/json-mapreduce) project. I really like the result. It comes with a (sort of) JsonLexer/Parser that solves some of the shortcomings in the current implementation. For example, it handles the case where the `identifier` appears in the values of nested objects.

Furthermore, the way the identifier is used is slightly different and more powerful. The identifier refers to a member name, which the reader uses to determine the encapsulating object to return. This is more capable than what we have at the moment, because you can point to multi-line JSON objects that don't have a parent identifier.

Shall I add this change to this PR or to a separate one after this has been merged? Personally I think it is better to add it now, as it changes the `identifier` semantics and it would be inconvenient for potential users to learn one behaviour and then unlearn it. Alternatively we can drop the current `JsonRecordReader` from this PR and introduce the new code as an extension in another one. What do you think?
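To make the proposed `identifier` semantics concrete, here is a minimal Java sketch of the idea: treat the identifier as a member name and return the object that directly encloses it. This is only an illustration under my own assumptions. The class `MemberObjectExtractor` and its `extract` method are hypothetical names, not code from this PR or from json-mapreduce, and the sketch works on an in-memory string rather than a Hadoop input split.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper for illustration only; not the PR's JsonRecordReader.
class MemberObjectExtractor {

    /** Returns every JSON object in text that directly contains a member named memberName. */
    static List<String> extract(String text, String memberName) {
        List<String> records = new ArrayList<>();
        Deque<Integer> starts = new ArrayDeque<>();   // start offsets of currently open objects
        Deque<Boolean> matched = new ArrayDeque<>();  // does each open object contain the member?
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '"') {
                int end = endOfString(text, i);
                if (end < 0) break;                   // unterminated string: stop scanning
                // Raw comparison: escape sequences inside the name are not decoded.
                String literal = text.substring(i + 1, end);
                int next = skipWhitespace(text, end + 1);
                // A string is a member name only if a ':' follows it; string values are ignored.
                boolean isMemberName = next < text.length() && text.charAt(next) == ':';
                if (isMemberName && literal.equals(memberName) && !matched.isEmpty()) {
                    matched.pop();
                    matched.push(true);
                }
                i = end + 1;
                continue;
            }
            if (c == '{') {
                starts.push(i);
                matched.push(false);
            } else if (c == '}' && !starts.isEmpty()) {
                int start = starts.pop();
                if (matched.pop()) {
                    records.add(text.substring(start, i + 1));
                }
            }
            i++;
        }
        return records;
    }

    // Index of the closing quote for the string opened at openQuote, or -1 if none.
    private static int endOfString(String text, int openQuote) {
        for (int j = openQuote + 1; j < text.length(); j++) {
            char c = text.charAt(j);
            if (c == '\\') {
                j++;                                  // skip the escaped character
            } else if (c == '"') {
                return j;
            }
        }
        return -1;
    }

    private static int skipWhitespace(String text, int from) {
        int j = from;
        while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
            j++;
        }
        return j;
    }

    public static void main(String[] args) {
        String json = "{\"meta\":{\"note\":\"created_at\"},\n"
                + " \"record\":{\"created_at\":\"2016\",\n \"user\":{\"name\":\"a\"}}}";
        for (String record : extract(json, "created_at")) {
            System.out.println(record);
        }
    }
}
```

In the `main` example, the string value `"created_at"` inside `meta` is skipped because no colon follows it, while the multi-line `record` object is returned whole even though the identifier names one of its members and not the object itself.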