[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

Erick Erickson (JIRA) Mon, 21 Dec 2015 21:16:47 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067566#comment-15067566
 ]


Erick Erickson commented on SOLR-8220:
--------------------------------------

WARNING: I'm stealing this code and back-porting it to 4.x for my own purposes, 
so this may not pertain to 5.x. I'm also not very up on the low-level details.

But this loop for reading multiValued fields puts _all_ the values of the 
multiValued field, from _all_ the docs on the shard, into each doc:

{code}
if (values != null && DocValues.getDocsWithField(atomicReader, fieldName).get(docid)) {
    values.setDocument(docid);
    if (values.getValueCount() > 0) {
        List<Object> outValues = new LinkedList<Object>();
        // Iterates more than just this doc, I think all of them!
        for (int i = 0; i < values.getValueCount(); i++) {
{code}

I had more luck with:
{code}
if (values != null) {
    values.setDocument(docid);
    List<Object> outValues = new LinkedList<Object>();
    for (int ord = (int) values.nextOrd();
         ord != SortedSetDocValues.NO_MORE_ORDS;
         ord = (int) values.nextOrd()) {
{code}

I also think the following check in the if test is unnecessary, since the loop 
above doesn't do anything bad when a doc has no values for the field:
{code}
DocValues.getDocsWithField(atomicReader, fieldName).get(docid)
{code}

I changed it to just if (values != null). I did, however, have to test 
outValues.size() > 0 before calling addField after the loop, or I got empty 
braces in the output doc.
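
To make the difference between the two loops concrete, here is a minimal, 
self-contained sketch. FakeSortedSet is a hypothetical stand-in I made up for 
illustration; it is NOT the Lucene SortedSetDocValues class, it only mimics 
setDocument/nextOrd/getValueCount/lookupOrd. The sketch assumes that 
getValueCount() is the number of unique values in the whole segment, and that 
the body of the first loop looks up each index as an ordinal (the snippet above 
is cut off, so that part is a guess).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a hypothetical stand-in, not the Lucene API. */
public class OrdIterationSketch {
    static final long NO_MORE_ORDS = -1L; // sentinel, mirrors the Lucene constant

    /** Hypothetical reader: one sorted term dictionary plus per-doc ordinals. */
    static final class FakeSortedSet {
        private final String[] terms;    // unique values for the whole segment
        private final long[][] docOrds;  // ordinals held by each doc
        private long[] current = new long[0];
        private int pos;

        FakeSortedSet(String[] terms, long[][] docOrds) {
            this.terms = terms;
            this.docOrds = docOrds;
        }

        void setDocument(int docId) { current = docOrds[docId]; pos = 0; }
        long nextOrd() { return pos < current.length ? current[pos++] : NO_MORE_ORDS; }
        long getValueCount() { return terms.length; } // segment-wide, not per doc
        String lookupOrd(long ord) { return terms[(int) ord]; }
    }

    /** First loop: walks 0..getValueCount(), so every doc gets every value. */
    static List<String> readWithValueCount(FakeSortedSet values, int docId) {
        List<String> out = new ArrayList<>();
        values.setDocument(docId);
        for (long ord = 0; ord < values.getValueCount(); ord++) {
            out.add(values.lookupOrd(ord));
        }
        return out;
    }

    /** Second loop: asks for this doc's ordinals until the sentinel shows up. */
    static List<String> readWithNextOrd(FakeSortedSet values, int docId) {
        List<String> out = new ArrayList<>();
        if (values == null) {
            return out;
        }
        values.setDocument(docId);
        for (long ord = values.nextOrd(); ord != NO_MORE_ORDS; ord = values.nextOrd()) {
            out.add(values.lookupOrd(ord));
        }
        return out; // may be empty; the caller should skip adding the field then
    }

    public static void main(String[] args) {
        FakeSortedSet values = new FakeSortedSet(
            new String[] {"blue", "green", "red"},
            new long[][] {{0, 2}, {}, {1}});
        System.out.println(readWithValueCount(values, 0)); // [blue, green, red]
        System.out.println(readWithNextOrd(values, 0));    // [blue, red]
        System.out.println(readWithNextOrd(values, 1));    // []
    }
}
```

Doc 1 has no values, so the second loop returns an empty list without any extra 
docs-with-field check, which is why the size test before addField is still 
needed.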

Again, let me emphasize that:
1> I don't know this code well, so take this with a grain of salt.
2> I needed this for a one-off on the 4.x code line, and this may work with 5.x 
just fine as-is. Needless to say, what I'm doing will never make it into the 
official project....

But this saved me a TON of work, glad you're tackling this!

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues. 
> Facets, analytics, streaming, etc. each seem to do it their own way; 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> 2a) fl="docValueField"
>   -- return the field from docValues if it is not stored but has docValues; 
> if the field is stored, return it from stored fields
> 2b) fl="*"
>   -- return only stored fields
> 2c) fl="+"
>   -- return stored fields and docValue fields
> 2a would be the easiest implementation and might be sufficient for a first 
> pass. 2b is the current behavior.
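
The fl behavior proposed in the quoted description can be sketched roughly as 
follows. This is only an illustration of the proposal as worded there, not the 
patch and not Solr code: FieldInfo, Source and resolve are hypothetical names, 
and the sketch handles a single fl token rather than a comma-separated list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: hypothetical names, not Solr's schema or searcher API. */
public class FlResolutionSketch {
    /** Hypothetical per-field flags, standing in for a schema field. */
    static final class FieldInfo {
        final boolean stored;
        final boolean docValues;

        FieldInfo(boolean stored, boolean docValues) {
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    enum Source { STORED, DOC_VALUES }

    /** Decides, for one fl token, which fields come back and from where. */
    static Map<String, Source> resolve(String fl, Map<String, FieldInfo> schema) {
        Map<String, Source> out = new LinkedHashMap<>();
        boolean star = fl.equals("*"); // stored fields only
        boolean plus = fl.equals("+"); // stored fields and docValue fields
        for (Map.Entry<String, FieldInfo> e : schema.entrySet()) {
            String name = e.getKey();
            FieldInfo f = e.getValue();
            if (!star && !plus && !fl.equals(name)) {
                continue; // an explicit field name was given and this is not it
            }
            if (f.stored) {
                out.put(name, Source.STORED);     // stored always wins
            } else if (f.docValues && !star) {
                out.put(name, Source.DOC_VALUES); // fall back to docValues
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, FieldInfo> schema = new LinkedHashMap<>();
        schema.put("id", new FieldInfo(true, false));
        schema.put("price", new FieldInfo(false, true));
        schema.put("title", new FieldInfo(true, true));
        System.out.println(resolve("price", schema)); // {price=DOC_VALUES}
        System.out.println(resolve("*", schema));     // {id=STORED, title=STORED}
        System.out.println(resolve("+", schema));     // {id=STORED, price=DOC_VALUES, title=STORED}
    }
}
```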



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

