[
https://issues.apache.org/jira/browse/SOLR-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572134#comment-16572134
]
Erick Erickson commented on SOLR-12625:
---------------------------------------
Here's a PoC; it still has nocommits etc. [~dsmiley] and [~caomanhdat2], I'm
particularly interested in whether you think it is a good approach.
Really, I hacked something together quickly to see if it makes architectural
sense. In this version, the usage pattern is:
{code:java}
Object fetchOptimizer = docFetcher.optimizeForFetchingTheseFields(srf);
while (more docs) { // pseudocode: loop over the docIds to fetch
  List<String> vals = docFetcher.getValsAsStrings(
      docFetcher.docOptimizedFetch(docId, fetchOptimizer), uniqFieldName);
  // Do whatever you want with the returned values.
  // Single-valued fields will return a 1-element list.
}
{code}
Notes:
Since SolrDocumentFetcher is reused I elected to require that the callers get
back an object (intended to be opaque) that's currently a
RetrieveFieldsOptimizer that then needs to be passed in to fetch the actual
document. It's a little clumsy, but...
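To make the opaque-object idea concrete, here is a minimal, self-contained sketch. It does not use Solr classes: SketchFetcher, FieldPlan, and the in-memory index map are hypothetical stand-ins, and only the two method names are borrowed from the pattern above. The point it illustrates is that the shared fetcher keeps no per-request state, so the caller has to hand the plan back on every fetch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the shared, reused fetcher; NOT the real SolrDocumentFetcher.
class SketchFetcher {
  // Private plan type: callers only ever see it as Object.
  private static final class FieldPlan {
    final Set<String> fields;
    FieldPlan(Collection<String> fields) { this.fields = new LinkedHashSet<>(fields); }
  }

  // Fake per-document storage standing in for the index (assumption for the sketch).
  private final Map<Integer, Map<String, Object>> index = new LinkedHashMap<>();

  void add(int docId, Map<String, Object> fields) { index.put(docId, fields); }

  // Build the per-request plan once; the fetcher itself stores nothing about the request.
  Object optimizeForFetchingTheseFields(Collection<String> wanted) {
    return new FieldPlan(wanted);
  }

  // Fetch only the planned fields; reject tokens this class did not create.
  Map<String, Object> docOptimizedFetch(int docId, Object token) {
    if (!(token instanceof FieldPlan)) {
      throw new IllegalArgumentException("token was not created by optimizeForFetchingTheseFields");
    }
    Map<String, Object> stored = index.getOrDefault(docId, Collections.emptyMap());
    Map<String, Object> out = new LinkedHashMap<>();
    for (String f : ((FieldPlan) token).fields) {
      if (stored.containsKey(f)) out.put(f, stored.get(f));
    }
    return out;
  }
}

class SketchFetcherDemo {
  public static void main(String[] args) {
    SketchFetcher fetcher = new SketchFetcher();
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("id", "1");
    doc.put("title", "hello");
    doc.put("body", "large text that was not requested");
    fetcher.add(1, doc);

    Object token = fetcher.optimizeForFetchingTheseFields(Arrays.asList("id", "title"));
    // Only the planned fields come back; "body" is skipped.
    System.out.println(fetcher.docOptimizedFetch(1, token).keySet());
  }
}
```

Typing the token as Object keeps callers from depending on the plan class, at the cost of a runtime check instead of a compile-time one. That trade-off is probably where the clumsiness comes from.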
The names I've given the new methods are clumsy, suggestions welcome. I do want
something to indicate that we want to get things in an optimized fashion to
draw attention to it though.
Finally, I find the necessity of having _calling_ code try to resolve objects
error-prone and unnecessarily complex when you just want a value, man. So I
added
{code:java}
public List<String> getValsAsStrings(SolrDocument sdoc, String field) {
{code}
to SolrDocumentFetcher. It does leave it up to the caller to make the
distinction between single-valued and multi-valued fields when it's important
though.
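As a rough illustration of what such a helper does for the caller, here is a sketch that works on a plain Map rather than a SolrDocument. The Map-based document model, the use of String.valueOf for conversion, and the empty list for a missing field are all assumptions made for the sketch; the method in the patch has to deal with the actual object types Solr returns.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ValsAsStringsSketch {
  // Assumption: a document is modeled as field name -> value, where the value is
  // either a single Object (single-valued) or a Collection (multi-valued).
  static List<String> getValsAsStrings(Map<String, Object> doc, String field) {
    Object raw = doc.get(field);
    if (raw == null) {
      return Collections.emptyList(); // assumption: missing field yields an empty list
    }
    List<String> out = new ArrayList<>();
    if (raw instanceof Collection) {
      // Multi-valued: convert each element in iteration order.
      for (Object v : (Collection<?>) raw) {
        out.add(String.valueOf(v));
      }
    } else {
      // Single-valued: wrap the one value in a 1-element list.
      out.add(String.valueOf(raw));
    }
    return out;
  }
}
```

The caller always gets a List<String> and never has to inspect the raw value's type, but it still has to know whether one element or several is the expected shape for that field.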
I'll look this all over several more times before it's ready to commit of
course, but this is the first thing that I tried that seems to work. All tests
pass (but I haven't run precommit yet).
Speaking of tests, I've commented out everything in *TestRetrieveFieldsOptimizer*
since this is a PoC. If I go forward with this I'll have to figure out how to
test this. I'm not quite sure how to determine that we really do only get
values from DV fields or only stored fields or a combination as appropriate. We
can, of course, figure out that we get the results back in the docs, but that
doesn't tell us where we get them _from_, stored or DV.
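One possible direction, sketched here with entirely hypothetical stand-ins (RecordingSource and SourceRoutingSketch are not Solr classes, and nothing like this is in the patch), is to have each value source record the fields read from it so a test can assert on where the values came from as well as on the values themselves. Whether the real stored-field and docValues readers can be wrapped this way is an open question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical value source that records which fields were read from it.
class RecordingSource {
  private final Map<String, String> values;
  final List<String> reads = new ArrayList<>();

  RecordingSource(Map<String, String> values) { this.values = values; }

  String read(String field) {
    reads.add(field); // remember the access so a test can inspect it later
    return values.get(field);
  }
}

class SourceRoutingSketch {
  // Route each requested field to docValues when it is listed in dvFields,
  // otherwise to stored. This routing rule is an assumption for the sketch.
  static Map<String, String> fetch(List<String> wanted, Set<String> dvFields,
                                   RecordingSource stored, RecordingSource docValues) {
    Map<String, String> out = new LinkedHashMap<>();
    for (String f : wanted) {
      out.put(f, dvFields.contains(f) ? docValues.read(f) : stored.read(f));
    }
    return out;
  }
}
```

A test would then check both the returned map and the reads list of each source, which separates "we got the right values" from "we got them from the right place".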
> Combine SolrDocumentFetcher and RetrieveFieldsOptimizer
> -------------------------------------------------------
>
> Key: SOLR-12625
> URL: https://issues.apache.org/jira/browse/SOLR-12625
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-12625.patch
>
>
> We have SolrDocumentFetcher and RetrieveFieldsOptimizer. The
> relationship between the two is unclear at first glance. Using
> SolrDocumentFetcher by itself is (or can be) inefficient.
> WDYT about combining the two? Is there a good reason you would want to
> use SolrDocumentFetcher _instead_ of RetrieveFieldsOptimizer?
> Ideally I'd want to be able to write code like:
> solrDocumentFetcher.fillDocValuesMostEfficiently
> That created an optimizer and "did the right thing".
> Assigning to myself to keep track, but if anyone feels motivated feel free to
> take it over.