[jira] [Commented] (SOLR-12625) Combine SolrDocumentFetcher and RetrieveFieldsOptimizer

Erick Erickson (JIRA) Tue, 07 Aug 2018 11:39:25 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572134#comment-16572134
 ]


Erick Erickson commented on SOLR-12625:
---------------------------------------

Here's a PoC; it still has nocommits etc. [~dsmiley] and [~caomanhdat2], I'm 
particularly interested in whether you think it is a good approach.

Really, I hacked something together quickly to see if it makes architectural 
sense. In this version, the usage pattern is:
{code:java}
Object fetchOptimizer = docFetcher.optimizeForFetchingTheseFields(srf);
while (more docs) {
  List<String> vals = docFetcher.getValsAsStrings(
      docFetcher.docOptimizedFetch(docId, fetchOptimizer), uniqFieldName);

  // Do whatever you want with the returned values.
  // Single-valued fields will return a 1-element list.
}
{code}
Notes:

Since SolrDocumentFetcher is reused, I elected to have callers get back an 
object (intended to be opaque, currently a RetrieveFieldsOptimizer) that they 
then pass back in to fetch the actual document. It's a little clumsy, but...
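
For illustration only, here is a minimal sketch of the opaque-handle pattern 
described above. It does not use the real Solr classes: FetchPlan and 
SketchFetcher are hypothetical stand-ins, documents are modeled as plain maps, 
and the field set is an assumption standing in for whatever srf carries. The 
point it tries to show is that per-request state lives in the handle, so the 
shared fetcher itself stays stateless.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the opaque object handed back to callers.
final class FetchPlan {
    final Set<String> fields;

    FetchPlan(Set<String> fields) {
        // Defensive copy so later changes by the caller do not alter the plan.
        this.fields = Collections.unmodifiableSet(new LinkedHashSet<>(fields));
    }
}

// Hypothetical stand-in for a shared, reusable document fetcher.
final class SketchFetcher {
    private final Map<Integer, Map<String, Object>> index;

    SketchFetcher(Map<Integer, Map<String, Object>> index) {
        this.index = index;
    }

    // Decide once which fields to retrieve. Returned as Object so that
    // callers treat the result as opaque.
    Object optimizeForFetchingTheseFields(Set<String> requestedFields) {
        return new FetchPlan(requestedFields);
    }

    // Fetch one document using a plan produced by the method above.
    Map<String, Object> docOptimizedFetch(int docId, Object fetchOptimizer) {
        if (!(fetchOptimizer instanceof FetchPlan)) {
            throw new IllegalArgumentException(
                "expected the object returned by optimizeForFetchingTheseFields");
        }
        FetchPlan plan = (FetchPlan) fetchOptimizer;
        Map<String, Object> all =
            index.getOrDefault(docId, Collections.<String, Object>emptyMap());
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : plan.fields) {
            if (all.containsKey(field)) {
                out.put(field, all.get(field));
            }
        }
        return out;
    }
}
```

The trade-off in this sketch is the one noted above: the caller has to carry 
an extra object around, and a wrong object is only caught at run time.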

The names I've given the new methods are clumsy; suggestions welcome. I do 
want the names to indicate that we're fetching things in an optimized fashion, 
though, to draw attention to it.

Finally, I find the necessity of having _calling_ code try to resolve objects 
error-prone and unnecessarily complex when you just want a value, man. So I 
added
{code:java}
public List<String> getValsAsStrings(SolrDocument sdoc, String field) {
{code}
to SolrDocumentFetcher. It does leave it up to the caller to make the 
distinction between single-valued and multi-valued fields when it's important 
though.
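
As a rough sketch of the idea (not the code in the patch), a helper of this 
shape could flatten whatever a field holds into a list of strings. The 
document is modeled here as a plain Map rather than a SolrDocument, and 
ValsAsStringsSketch is a hypothetical name. The assumption is that a 
multi-valued field arrives as a Collection and a single-valued field as a bare 
object. A real implementation would presumably also need to unwrap any 
field objects before converting them; this sketch just uses String.valueOf.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the Map stands in for a fetched document.
final class ValsAsStringsSketch {
    private ValsAsStringsSketch() {}

    static List<String> getValsAsStrings(Map<String, Object> doc, String field) {
        Object raw = doc.get(field);
        if (raw == null) {
            // Field absent: return an empty list rather than null.
            return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        if (raw instanceof Collection) {
            // Multi-valued field: one string per non-null value.
            for (Object value : (Collection<?>) raw) {
                if (value != null) {
                    out.add(String.valueOf(value));
                }
            }
        } else {
            // Single-valued field: a 1-element list.
            out.add(String.valueOf(raw));
        }
        return out;
    }
}
```

With this shape the caller never inspects the value's type. It only checks 
the list size where the single-valued versus multi-valued distinction matters.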

I'll look this all over several more times before it's ready to commit of 
course, but this is the first thing that I tried that seems to work. All tests 
pass (but I haven't run precommit yet).

Speaking of tests, I've commented out everything in 
*TestRetrieveFieldsOptimizer* since this is a PoC. If I go forward with this 
I'll have to figure out how to test it. I'm not quite sure how to verify that 
we really do get values only from DV fields, only from stored fields, or from 
a combination, as appropriate. We can, of course, check that we get the 
results back in the docs, but that doesn't tell us where we get them _from_, 
stored or DV.

> Combine SolrDocumentFetcher and RetrieveFieldsOptimizer
> -------------------------------------------------------
>
>                 Key: SOLR-12625
>                 URL: https://issues.apache.org/jira/browse/SOLR-12625
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: SOLR-12625.patch
>
>
> We have SolrDocumentFetcher and RetrieveFieldsOptimizer. The
> relationship between the two is unclear at first glance. Using
> SolrDocumentFetcher by itself is (or can be) inefficient.
> WDYT about combining the two? Is there a good reason you would want to
> use SolrDocumentFetcher _instead_ of RetrieveFieldsOptimizer?
> Ideally I'd want to be able to write code like:
> solrDocumentFetcher.fillDocValuesMostEfficiently
> That created an optimizer and "did the right thing".
> Assigning to myself to keep track, but if anyone feels motivated feel free to 
> take it over.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
