[ https://issues.apache.org/jira/browse/SOLR-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459631#comment-16459631 ]
mosh edited comment on SOLR-12298 at 5/1/18 12:04 PM:
------------------------------------------------------

Approach:

I see [~janhoy]'s [proposal|http://lucene.472066.n3.nabble.com/nesting-Any-way-to-return-the-whole-hierarchical-structure-when-doing-Block-Join-queries-td4265933.html#a4380320], which addresses most of the problems, and [this|https://www.youtube.com/watch?v=qV0fIg-LGBE] talk from Solr Revolution 2016, "Working with Deeply Nested Documents in Apache Solr", as the starting points for this issue.

Firstly, the way a nested document is indexed has to be changed. I propose we add the following fields:
# __parent__
# __level__
# __path__

_parent_: This field will store the document's parent docId, to be used for building the whole hierarchy with a new document transformer, as suggested by Jan on the mailing list.

_level_: This field will store the level of the specified field in the document, as an int value. It can be used for the parentFilter, eliminating the need to provide one explicitly; by default the parentFilter will be set to "_level_:queriedFieldLevel".

_path_: This field will contain the full path, separated by a specific reserved char, e.g. '.', for example: "first.second.third". This will enable users to search for a specific path, or provide a regular expression to search for fields sharing the same name at different levels of the document, filtering by the _level_ key if needed.

To make this happen at index time, changes have to be made to the JSON loader, which will add the above fields, as well as the _root_ field, which holds the document's topmost-level docId. This will only happen when a specified parameter is added to the update request, e.g. "nested=true".

The new child doc transformer will be able to either reassemble the whole document structure, or do so from a specific level, if specified.
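As a rough illustration of the index-time step (not the actual JSON loader change), the following Python sketch flattens a nested document and attaches the proposed fields. The name flatten_nested, the choice of level 0 for the top-level document, and the convention that child documents are dict-valued (or list-of-dict-valued) keys carrying their own "id" are assumptions made for this example; the field names follow the descriptions above.

```python
from typing import Any, Dict, Iterator, Optional

PATH_SEPARATOR = "."  # reserved path character, as in the "first.second.third" example


def flatten_nested(doc: Dict[str, Any],
                   root_id: Optional[str] = None,
                   parent_id: Optional[str] = None,
                   level: int = 0,
                   path: str = "") -> Iterator[Dict[str, Any]]:
    """Yield one flat document per nested object, children before their parent."""
    doc_id = doc["id"]
    root_id = root_id if root_id is not None else doc_id
    # Assumption: the top-level document is level 0 and has no _parent_/_path_.
    flat: Dict[str, Any] = {"_root_": root_id, "_level_": level}
    if parent_id is not None:
        flat["_parent_"] = parent_id
        flat["_path_"] = path
    for key, value in doc.items():
        candidates = value if isinstance(value, list) else [value]
        if candidates and all(isinstance(c, dict) for c in candidates):
            # Dict values are treated as child documents nested under `key`.
            child_path = key if not path else path + PATH_SEPARATOR + key
            for child in candidates:
                yield from flatten_nested(child, root_id, doc_id,
                                          level + 1, child_path)
        else:
            # Everything else is an ordinary field of this document.
            flat[key] = value
    yield flat


if __name__ == "__main__":
    nested = {
        "id": "1",
        "title": "root",
        "first": {"id": "2",
                  "second": [{"id": "3",
                              "third": {"id": "4", "name": "leaf"}}]},
    }
    for flat_doc in flatten_nested(nested):
        print(flat_doc)
```

Children are emitted before their parent here, which mirrors the block layout that block-join queries rely on (the parent is the last document of its block).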
Full hierarchy reconstruction can be done relatively cheaply, using the _root_ field to get to the highest level document, and querying the block for its children, ordering the query by the _level_ field.

> Index Full nested document Hierarchy For Queries (umbrella issue)
> -----------------------------------------------------------------
>
>                 Key: SOLR-12298
>                 URL: https://issues.apache.org/jira/browse/SOLR-12298
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: mosh
>            Priority: Major
>
> Solr ought to have the ability to index deeply nested objects, while storing
> the original document hierarchy.
> Currently the client has to index the child document's full path and level to
> manually reconstruct the original document structure, since the children are
> flattened and returned in the reserved "_childDocuments_" key.
> Ideally you could index a nested document, having Solr transparently add the
> required fields while providing a document transformer to rebuild the
> original document's hierarchy.
>
> This issue is an umbrella issue for the particular tasks that will make it
> all happen – either subtasks or issue linking.
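The reconstruction step described in the comment (take the documents of one block, order them by _level_, and attach each one to its _parent_) can be sketched in Python as follows. This is a minimal illustration over plain dicts, not the proposed document transformer: rebuild_hierarchy is a hypothetical name, and the sketch assumes every flat document carries "id" and "_level_", and that non-root documents also carry "_parent_" and "_path_".

```python
from typing import Any, Dict, List

RESERVED = {"_root_", "_parent_", "_level_", "_path_"}


def rebuild_hierarchy(flat_docs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reassemble one nested document from the flat documents of a single block."""
    nodes: Dict[str, Dict[str, Any]] = {}
    root = None
    # Ordering by _level_ ensures a parent node exists before its children arrive.
    for flat in sorted(flat_docs, key=lambda d: d["_level_"]):
        node = {k: v for k, v in flat.items() if k not in RESERVED}
        nodes[flat["id"]] = node
        if "_parent_" not in flat:
            root = node  # the top-level document has no parent
            continue
        parent = nodes[flat["_parent_"]]
        # The last path segment is the key the child was nested under.
        key = flat["_path_"].rsplit(".", 1)[-1]
        parent.setdefault(key, []).append(node)
    if root is None:
        raise ValueError("no top-level document found in block")
    return root


if __name__ == "__main__":
    block = [
        {"id": "4", "name": "leaf", "_root_": "1", "_parent_": "3",
         "_level_": 3, "_path_": "first.second.third"},
        {"id": "3", "_root_": "1", "_parent_": "2",
         "_level_": 2, "_path_": "first.second"},
        {"id": "2", "_root_": "1", "_parent_": "1",
         "_level_": 1, "_path_": "first"},
        {"id": "1", "title": "root", "_root_": "1", "_level_": 0},
    ]
    print(rebuild_hierarchy(block))
```

Note that this sketch always stores children as lists, so it does not distinguish a single nested object from a one-element array; a real transformer would need extra information to preserve that.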