[jira] [Commented] (SOLR-12768) Determine how _nest_path_ should be analyzed to support various use-cases

David Smiley (JIRA) Fri, 28 Dec 2018 21:43:56 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-12768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730574#comment-16730574
 ]


David Smiley commented on SOLR-12768:
-------------------------------------

I toyed with creating a custom field type subclass but wound up thinking it's 
rather unnecessary... if only Solr had some notion of implicit field types that 
you could refer to without having to explicitly define them (as I mentioned on 
the dev list).  This isn't strictly necessary; I'm trying to both (a) avoid 
implementation detail & bloat in users' schemas, and (b) use an approach 
that could easily change in the future using luceneMatchVersion if we change 
our mind on the default.

So I added this mechanism, with {{_nest_path_}} being an implicitly defined 
field that is registered on demand at first use via 
{{IndexSchema.createImplicitFieldType()}}.  I decided to factor out a static 
method to create this specific field type and put it into 
NestedUpdateProcessorFactory so as to keep related code together.  I had to 
make FieldType.setArgs public, which seemed fine.  Eventually it'd be nice to 
see primitive field types implicitly declared, which would be done directly in 
the switch statement I added in createImplicitFieldType.  I did _not_ enhance 
the REST Schema mutation API to use this mechanism -- that's a follow-on TODO 
and would need its own test.
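
To illustrate the on-demand registration idea in isolation, here is a minimal 
sketch.  It is _not_ the code in the patch: ImplicitTypeRegistry, its methods, 
and the String stand-in for a field type are all hypothetical names made up 
for this example, and no Solr classes are used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a schema that registers some field types lazily. */
class ImplicitTypeRegistry {
    // Maps a type name to its definition; a String stands in for a real field type.
    private final Map<String, String> fieldTypes = new HashMap<>();

    /** Returns the named type, creating and registering it on first use if it is implicit. */
    String getFieldType(String name) {
        String existing = fieldTypes.get(name);
        if (existing != null) {
            return existing;
        }
        String created = createImplicitFieldType(name);
        if (created != null) {
            fieldTypes.put(name, created); // registered only when first asked for
        }
        return created; // null when the name is neither declared nor implicit
    }

    /** Names that need no explicit declaration; more cases could be added to the switch. */
    private static String createImplicitFieldType(String name) {
        switch (name) {
            case "_nest_path_":
                return "implicit:" + name;
            default:
                return null;
        }
    }

    /** Number of types registered so far. */
    int size() {
        return fieldTypes.size();
    }
}
```

Nothing is registered until the name is first requested, so a schema that 
never uses nested documents carries no extra definitions.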

About the new analysis: TestChildDocTransformerHierarchy has many test 
methods that fail, and I have yet to update them to pass.  Many fail because 
of a combination of two factors: (a) they use an experimental syntax for the 
"childFilter" parameter that is defined in 
org.apache.solr.response.transform.ChildDocTransformerFactory#processPathHierarchyQueryString
  that assumes a tokenization that allows certain queries to match, and (b) I 
changed the text analysis in this patch and thus (a)'s assumption no longer 
holds.  I think this can be fixed by a simple adjustment to the code building 
the query: insert a leading wildcard if the input/query does not start with a 
'/'.  Of course leading wildcard queries are slow, but if the total number of 
unique paths is very small (as I expect it should be) then it's fine.  
CC [~moshebla]
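
A rough sketch of that adjustment follows.  NestPathQuerySketch and 
toPathPattern are hypothetical names, and the exact wildcard prefix ("*/") is 
an assumption for illustration; the real change would go in the code that 
builds the query and may look different.

```java
/** Hypothetical helper; not part of Solr. */
final class NestPathQuerySketch {
    private NestPathQuerySketch() {}

    /**
     * Returns the path unchanged when it is anchored at the root (starts with '/'),
     * otherwise prefixes a wildcard so it can match at any depth.
     */
    static String toPathPattern(String input) {
        if (input.startsWith("/")) {
            return input;
        }
        return "*/" + input; // assumed prefix form; leading wildcards are slow on large term sets
    }
}
```

Under this assumption, an input of "/a/b" is left as is, while "b" becomes 
"*/b".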

> Determine how _nest_path_ should be analyzed to support various use-cases
> -------------------------------------------------------------------------
>
>                 Key: SOLR-12768
>                 URL: https://issues.apache.org/jira/browse/SOLR-12768
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Blocker
>             Fix For: master (8.0)
>
>         Attachments: SOLR-12768.patch
>
>
> We know we need {{\_nest\_path\_}} in the schema for the new nested documents 
> support, and we loosely know what goes in it.  From a DocValues perspective, 
> we've got it down; though we might tweak it.  From an indexing (text 
> analysis) perspective, we're not quite sure yet, though we've got a test 
> schema, {{schema-nest.xml}} with a decent shot at it.  Ultimately, how we 
> index it will depend on the query/filter use-cases we need to support.  So 
> we'll review some of them here.
> TBD: Not sure if the outcome of this task is just a "decide" or whether we 
> also potentially add a few tests for some of these cases, and/or if we also 
> add a FieldType to make declaring it as easy as a one-liner.  A FieldType 
> would have other benefits too once we're ready to make querying on the path 
> easier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
