[
https://issues.apache.org/jira/browse/SOLR-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428179#comment-16428179
]
Dawid Weiss commented on SOLR-12094:
------------------------------------
I understand the concept of "streaming" imports, but this just seems wrong to
me. An analogy would be XSLT or other technologies where the implementation
permits an efficient "streaming" mode in certain cases, unless the input makes
it impossible.

I perceive a similar situation here: the parser should handle the input
efficiently when possible, but should also be able to process any type of
input, even input that cannot be processed without keeping some history. Sure,
an abuse case of millions of split nodes awaiting a single attribute is
possible, but even then it would be simpler to just say "yeah, buffer up until
you can emit the output" than to modify the structure of such a JSON document
(that is, write a converter so that the nested nodes are always placed at the
end of the parent).

[~awislowski] do you think you'd be able to modify the patch so that it accepts
an argument and switches between a 'strict streaming' mode and a 'relaxed'
mode? In 'strict streaming' mode there should be no buffering, and the parser
should throw an exception if it encounters extra nodes after the split. In
'relaxed' mode the parser should buffer the information until it is complete
and can be emitted.
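
To make the two modes concrete, here is a rough sketch in Python. It is not
the SOLR-12094 patch and not JsonRecordReader's API: the names split_records,
strict and FieldAfterSplitError are made up for illustration, and json.loads
(which keeps keys in document order) stands in for a real streaming parser.

```python
import json
from typing import Any, Dict, Iterator, List


class FieldAfterSplitError(ValueError):
    """Hypothetical error: a root field follows the split array (strict mode)."""


def split_records(text: str, split_key: str, strict: bool) -> Iterator[Dict[str, Any]]:
    # json.loads preserves the key order of the document, which imitates the
    # order in which a streaming parser would encounter the root fields.
    root = json.loads(text)
    parent: Dict[str, Any] = {}
    buffered: List[Dict[str, Any]] = []
    split_seen = False

    for key, value in root.items():
        if key == split_key:
            split_seen = True
            for child in value:
                if strict:
                    # Strict streaming: emit at once, no buffering, using
                    # only the parent fields seen before the split.
                    yield {**parent, **child}
                else:
                    # Relaxed: hold the child until the parent is complete.
                    buffered.append(child)
        elif strict and split_seen:
            # Strict streaming: complain instead of silently dropping data.
            raise FieldAfterSplitError(
                f"field {key!r} appears after split {split_key!r}"
            )
        else:
            parent[key] = value

    # Relaxed mode only: the parent is complete, so emit the buffered children.
    for child in buffered:
        yield {**parent, **child}
```

With the document from the issue description and split_key "exams", the
relaxed call yields both exam records with "after" included, while the strict
call emits the two records and then raises on "after". The cost of the relaxed
mode is that every child of the split stays in memory until the parent ends.
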
> JsonRecordReader ignores root record fields after the split point
> -----------------------------------------------------------------
>
> Key: SOLR-12094
> URL: https://issues.apache.org/jira/browse/SOLR-12094
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrJ
> Affects Versions: master (8.0)
> Reporter: Przemysław Szeremiota
> Priority: Major
> Attachments: SOLR-12094.patch, SOLR-12094.patch,
> json-record-reader-bug.patch
>
>
> JsonRecordReader, when configured with a split other than the top-level one,
> ignores all top-level JSON nodes that follow the end of the split, for example:
> {code}
> {
>   "first": "John",
>   "last": "Doe",
>   "grade": 8,
>   "exams": [
>     {
>       "subject": "Maths",
>       "test": "term1",
>       "marks": 90
>     },
>     {
>       "subject": "Biology",
>       "test": "term1",
>       "marks": 86
>     }
>   ],
>   "after": "456"
> }
> {code}
> Node "after" won't be visible in SolrInputDocument constructed from
> /update/json/docs.