[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

Ishan Chattopadhyaya (JIRA) Wed, 18 Nov 2015 10:13:36 -0800

    [ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011570#comment-15011570 ]


Ishan Chattopadhyaya commented on SOLR-8220:
--------------------------------------------

Thanks for the review, Keith.

bq. 1. [...] This could potentially be very expensive to compute for every 
single field for every single document and also add unnecessary GC pressure by 
creating a new HashSet for all the fields for every single document.

I was aware of this, and wanted to fix this as part of the "cleanup / 
refactoring" I promised.

bq. {{doc.getField(fieldName)==null}} the doc fields are a list so this will be 
O( n ) for each lookup.

I used that to ensure we're not adding unstored docvalues a second time to 
the same document. This is necessary so that we don't re-add such fields to a 
document that was obtained from the documentCache and already has all unstored 
docvalues in it. I could create a set of field names inside the 
{{StoredDocument}} class so that a hasField lookup can be sped up. However, 
given that it is a Lucene class, I have left this be. Any suggestions?
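
To make the set idea concrete, here is a minimal sketch. It does not touch 
Lucene's {{StoredDocument}}; the class {{FieldNameIndexSketch}} and its methods 
are hypothetical stand-ins that only illustrate keeping a name set next to the 
field list so the "already present?" check is O(1) on average instead of O(n).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a document that keeps its fields in a list.
// This is NOT Lucene's StoredDocument; it only illustrates the lookup idea.
final class FieldNameIndexSketch {
    private final List<String[]> fields = new ArrayList<>(); // {name, value} pairs
    private final Set<String> fieldNames = new HashSet<>();  // names seen so far

    void add(String name, String value) {
        fields.add(new String[] {name, value});
        fieldNames.add(name);
    }

    // Hash lookup instead of scanning the field list.
    boolean hasField(String name) {
        return fieldNames.contains(name);
    }

    // Adds the field only if the document does not already carry it,
    // e.g. because it came from a cache with docvalues already attached.
    boolean addIfAbsent(String name, String value) {
        if (hasField(name)) {
            return false;
        }
        add(name, value);
        return true;
    }

    int size() {
        return fields.size();
    }

    public static void main(String[] args) {
        FieldNameIndexSketch doc = new FieldNameIndexSketch();
        doc.add("id", "1");
        System.out.println(doc.addIfAbsent("price_dv", "9.99"));
        System.out.println(doc.addIfAbsent("price_dv", "9.99"));
    }
}
```

The trade-off is one extra set per document, which is the same kind of 
allocation the review was worried about, so it may only pay off for documents 
with many fields.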

bq. 3) Re multivalued fields: doing introspection for every single value for 
field for every document is not fast.

I think it shouldn't be a problem. In modern JVMs, {{instanceof}} has 
negligible cost. However, I will do the check once per multivalued field in my 
next patch.
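
A rough sketch of what "once per multivalued field" could look like is below. 
It is not the patch code: {{MultiValuedDecodeSketch}} and {{decode}} are 
hypothetical names, and it assumes that all values of one field share the 
runtime type of the first value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or of the attached patches.
final class MultiValuedDecodeSketch {

    // Converts the values of one multivalued field to strings.
    // Assumption: every value in the field has the same runtime type,
    // so the instanceof check is done once, on the first value only.
    static List<String> decode(List<?> values) {
        List<String> out = new ArrayList<>(values.size());
        if (values.isEmpty()) {
            return out;
        }
        boolean numeric = values.get(0) instanceof Number;
        for (Object v : values) {
            out.add(numeric ? Long.toString(((Number) v).longValue()) : v.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode(Arrays.asList(3L, 1L, 2L)));
        System.out.println(decode(Arrays.asList("b", "a")));
    }
}
```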

bq. 4) {{SchemaField schemaField = schema.getField(fieldName);}} this throws an 
exception if the field name is not in the schema (think typos in FL)

If it is a dynamic field, it will still work; a wrong field name won't work 
here. Shouldn't a wrong field name throw an exception, rather than being 
silently dropped? I am split either way.
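
The two options can be sketched side by side. This is a toy schema, not 
Solr's {{IndexSchema}}: {{SchemaLookupSketch}}, its suffix-only dynamic field 
matching, and both method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy schema, NOT Solr's IndexSchema. Dynamic fields are modelled as
// plain suffix matches (e.g. "_dv"), which is a simplification.
final class SchemaLookupSketch {
    private final Map<String, String> explicitFields = new HashMap<>();
    private final Map<String, String> dynamicSuffixes = new HashMap<>();

    void addField(String name, String type) {
        explicitFields.put(name, type);
    }

    void addDynamicSuffix(String suffix, String type) {
        dynamicSuffixes.put(suffix, type);
    }

    // Lenient lookup: returns null for an unknown name so the caller
    // can decide to skip the field silently.
    String getFieldTypeOrNull(String name) {
        String type = explicitFields.get(name);
        if (type != null) {
            return type;
        }
        for (Map.Entry<String, String> e : dynamicSuffixes.entrySet()) {
            if (name.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    // Strict lookup: an unknown name (for example a typo) is an error.
    String getFieldType(String name) {
        String type = getFieldTypeOrNull(name);
        if (type == null) {
            throw new IllegalArgumentException("undefined field: " + name);
        }
        return type;
    }

    public static void main(String[] args) {
        SchemaLookupSketch schema = new SchemaLookupSketch();
        schema.addField("id", "string");
        schema.addDynamicSuffix("_dv", "long");
        System.out.println(schema.getFieldType("count_dv"));
        System.out.println(schema.getFieldTypeOrNull("cuont"));
    }
}
```

With the strict variant a typo in the field list fails the whole request; with 
the lenient one the field is simply missing from the response.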

bq. This creates a whole bunch of new objects which could be slow and cause a 
lot of GC pressure, although it may not be an issue.

I think this creates at most only the value source object, which isn't too bad. 
Internally, it uses the docvalues API.

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues: 
> facets, analytics, streaming, etc. all seem to do it in their own way. 
> Perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a would be the easiest implementation and might be sufficient for a first 
> pass; 2b is the current behavior.
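
The proposed fl rules quoted above can be read as a small decision function. 
The sketch below is only one interpretation of that proposal: 
{{FlResolutionSketch}}, {{FieldInfo}} and {{resolve}} are hypothetical names, 
and it handles a single fl token rather than a full field list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the proposed fl behaviour, not Solr code.
final class FlResolutionSketch {

    static final class FieldInfo {
        final String name;
        final boolean stored;
        final boolean docValues;

        FieldInfo(String name, boolean stored, boolean docValues) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    // Returns the names of the fields a single fl token would select.
    static List<String> resolve(String fl, List<FieldInfo> schema) {
        List<String> out = new ArrayList<>();
        for (FieldInfo f : schema) {
            boolean include;
            if ("*".equals(fl)) {
                include = f.stored;                      // stored fields only
            } else if ("+".equals(fl)) {
                include = f.stored || f.docValues;       // stored and docValue fields
            } else {
                // Named field: readable from stored fields or, failing that, docValues.
                include = f.name.equals(fl) && (f.stored || f.docValues);
            }
            if (include) {
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldInfo> schema = Arrays.asList(
                new FieldInfo("id", true, false),
                new FieldInfo("price", false, true));
        System.out.println(resolve("*", schema));
        System.out.println(resolve("+", schema));
    }
}
```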


