[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067566#comment-15067566 ]
Erick Erickson commented on SOLR-8220:
--------------------------------------

WARNING: I'm stealing this code and back-porting it to 4.x for my own purposes, so this may not pertain to 5.x. And I'm not very up on the low-level details. But this loop for reading multiValued fields puts _all_ the values of the multiValued field for _all_ the docs in the segment into each doc:

{code}
if (values != null && DocValues.getDocsWithField(atomicReader, fieldName).get(docid)) {
  values.setDocument(docid);
  if (values.getValueCount() > 0) {
    List<Object> outValues = new LinkedList<Object>();
    // getValueCount() is the number of unique values in the whole segment, not in this doc,
    // so this iterates over more than just this doc: it visits all of them!
    for (int i = 0; i < values.getValueCount(); i++) {
{code}

I had more luck with:

{code}
if (values != null) {
  values.setDocument(docid);
  List<Object> outValues = new LinkedList<Object>();
  // nextOrd() walks only the current doc's ords; ords are longs, so don't narrow them to int
  for (long ord = values.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = values.nextOrd()) {
{code}

Note that I also think this check in the if test is unnecessary. The loop above doesn't do anything bad for docs that have no value in the field:

{code}
DocValues.getDocsWithField(atomicReader, fieldName).get(docid)
{code}

I changed the test to just if (values != null). But I did have to check outValues.size() > 0 before calling addField after the loop, or I got empty braces in the output doc.

Again, let me emphasize that:
1> I don't know this code well, so take this with a grain of salt.
2> I needed this for a one-off on the 4.x code line, and the current code may work with 5.x just fine as-is.

Needless to say, what I'm doing will never make it into the official project.... But this saved me a TON of work, and I'm glad you're tackling this!
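The following self-contained Java sketch makes the difference between the two loops concrete. `FakeSortedSetDocValues` is a hypothetical in-memory stand-in that only mimics the `setDocument()`/`nextOrd()`/`getValueCount()`/`lookupOrd()` pattern from the snippets above. It is not Lucene's class, and its `lookupOrd` returns a `String` rather than a `BytesRef`. The helper names `readAllOrds` and `readDocOrds` are also made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiValuedDocValuesSketch {
    /** Sentinel returned by nextOrd() once the current doc has no more ords (mirrors NO_MORE_ORDS = -1). */
    static final long NO_MORE_ORDS = -1;

    /**
     * Hypothetical stand-in for per-segment SortedSetDocValues: one sorted, de-duplicated
     * term dictionary shared by every doc, plus each doc's ordinals into that dictionary.
     */
    static final class FakeSortedSetDocValues {
        private final String[] terms;   // unique values for the whole segment
        private final long[][] docOrds; // docOrds[docid] = that doc's ords, ascending
        private long[] current = new long[0];
        private int pos;

        FakeSortedSetDocValues(String[] terms, long[][] docOrds) {
            this.terms = terms;
            this.docOrds = docOrds;
        }

        void setDocument(int docid) { current = docOrds[docid]; pos = 0; }
        long nextOrd() { return pos < current.length ? current[pos++] : NO_MORE_ORDS; }
        long getValueCount() { return terms.length; } // segment-wide, NOT per document
        String lookupOrd(long ord) { return terms[(int) ord]; }
    }

    /** Problem pattern: walks every ord in the segment, so each doc receives every value. */
    static List<String> readAllOrds(FakeSortedSetDocValues values, int docid) {
        values.setDocument(docid);
        List<String> out = new ArrayList<>();
        for (long i = 0; i < values.getValueCount(); i++) {
            out.add(values.lookupOrd(i));
        }
        return out;
    }

    /** Per-document pattern: only this doc's ords; an empty list if the doc has no value. */
    static List<String> readDocOrds(FakeSortedSetDocValues values, int docid) {
        values.setDocument(docid);
        List<String> out = new ArrayList<>();
        for (long ord = values.nextOrd(); ord != NO_MORE_ORDS; ord = values.nextOrd()) {
            out.add(values.lookupOrd(ord));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three docs: doc 0 = {blue, red}, doc 1 = {green}, doc 2 = no value.
        FakeSortedSetDocValues dv = new FakeSortedSetDocValues(
                new String[] {"blue", "green", "red"},
                new long[][] {{0, 2}, {1}, {}});
        for (int doc = 0; doc < 3; doc++) {
            List<String> vals = readDocOrds(dv, doc);
            // Only add the field when there is something to add, so no empty list is emitted.
            if (!vals.isEmpty()) {
                System.out.println(doc + " -> " + vals + " (all-ords loop gave " + readAllOrds(dv, doc) + ")");
            }
        }
    }
}
```

When you run `main`, docs 0 and 1 print only their own values, while the all-ords loop returns `[blue, green, red]` for both. Doc 2 is skipped because its list is empty, which corresponds to the `outValues.size() > 0` check described above.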
> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true", which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc.),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat I can see is for multiValued fields, since their values
> would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation of this as a field transform, but I think
> it should live closer to where stored fields are loaded in
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues;
> facets, analytics, streaming, etc. all seem to do it their own way. Perhaps
> some of this logic should be centralized.
> 2) What will the API behavior be?
> (Below is my proposed implementation)
> Parameters for fl:
> - 2a) fl="docValueField"
> -- return the field from docValues if the field is not stored and has docValues;
> if the field is stored, return it from stored fields
> - 2b) fl="*"
> -- return only stored fields
> - 2c) fl="+"
> -- return stored fields and docValues fields
> 2a would be the easiest implementation and might be sufficient for a first
> pass. 2b is the current behavior.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
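The following is a minimal Java sketch of the proposed fl rules. It only decides which fields would be returned and where each value would be read from; it does not read any values. `FieldInfo` and `Source` are hypothetical types invented for illustration and are not Solr's `SchemaField` or `ReturnFields` APIs. The sketch also assumes that a field which is both stored and has docValues is read from stored fields, as the proposal describes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlResolutionSketch {
    /** Where a returned field's value would be read from. */
    enum Source { STORED, DOC_VALUES }

    /** Hypothetical per-field schema flags (not Solr's SchemaField API). */
    static final class FieldInfo {
        final String name;
        final boolean stored;
        final boolean docValues;

        FieldInfo(String name, boolean stored, boolean docValues) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    /** Stored fields are read as before; non-stored fields fall back to docValues, if any. */
    static Source sourceFor(FieldInfo f) {
        if (f.stored) {
            return Source.STORED;
        }
        return f.docValues ? Source.DOC_VALUES : null; // null: nothing can be returned
    }

    /** Applies the proposed rules for a single fl value: "*", "+", or one field name. */
    static Map<String, Source> resolve(String fl, List<FieldInfo> schema) {
        Map<String, Source> out = new LinkedHashMap<>();
        for (FieldInfo f : schema) {
            Source src = sourceFor(f);
            if (src == null) {
                continue;
            }
            boolean include;
            switch (fl) {
                case "*":
                    include = src == Source.STORED; // current behavior: stored fields only
                    break;
                case "+":
                    include = true;                 // stored fields plus docValues-only fields
                    break;
                default:
                    include = f.name.equals(fl);    // explicitly requested field name
                    break;
            }
            if (include) {
                out.put(f.name, src);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldInfo> schema = List.of(
                new FieldInfo("id", true, true),     // stored and docValues: read from stored
                new FieldInfo("price", false, true), // docValues only
                new FieldInfo("body", true, false)); // stored only
        for (String fl : List.of("*", "+", "price")) {
            System.out.println("fl=" + fl + " -> " + resolve(fl, schema));
        }
    }
}
```

With this example schema, `*` returns only `id` and `body`, and `+` adds `price` from docValues. Naming `price` alone also reads it from docValues. A real implementation would additionally have to handle the caveat above, namely that multiValued values come back sorted when they are read from docValues.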