Re: BinaryResponseWriter fetches unnecessary fields?
: Thanks Chris! Is RetrieveFieldsOptimizer a new functionality introduced in
: 7.x? Our observation is with both 5.4 & 6.4. I have created a jira for
: the issue:

The same basic code path (related to stored fields) probably existed
largely as is in 5.x and 6.x and was then later refactored into
RetrieveFieldsOptimizer, where it knows about things like the
useDocValuesAsStored option/optimization.

-Hoss
http://www.lucidworks.com/
Re: BinaryResponseWriter fetches unnecessary fields?
Thanks Chris! Is RetrieveFieldsOptimizer new functionality introduced in 7.x? Our observation is with both 5.4 & 6.4. I have created a jira for the issue:

https://issues.apache.org/jira/browse/SOLR-11891

I am also wondering how enableLazyFieldLoading affects this case, but haven't tested yet. Please let us know if you catch anything.

Thanks,
Wei

On Mon, Jan 22, 2018 at 3:15 PM, Chris Hostetter wrote:

> : Inside convertLuceneDocToSolrDoc():
> :
> : https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182
> :
> : for (IndexableField f : doc.getFields())
> :
> : I am a bit puzzled why we need to iterate through all the fields in the
> : document. Why can’t we just iterate through the requested fields in fl?
> : Specifically:
>
> I have a hunch here -- but i haven't verified it.
>
> First of all: the specific code in question that you mention assumes it
> doesn't *need* to filter out the result of "doc.getFields()" based on the
> 'fl' because at the point in the processing where the DocsStreamer is
> looping over the result of "doc.getFields()" the "Document" object it's
> dealing with *should* only contain the specific (subset of stored) fields
> requested by the fl param -- this is handled by RetrieveFieldsOptimizer &
> SolrDocumentFetcher that the DocsStreamer builds up according to the
> results of ResultContext.getReturnFields() when asking the
> SolrIndexSearcher to fetch the doc()
>
> But i think what's happening here is that because of the documentCache,
> there are cases where the SolrIndexSearcher is not actually using
> a SolrDocumentStoredFieldVisitor to limit what's requested from the
> IndexReader, and the resulting Document contains all fields -- which is
> then compounded by code that loops over every field.
>
> At a quick glance, I'm a little fuzzy on how exactly
> enableLazyFieldLoading may/may-not be affecting things here, but either
> way I think you are correct -- we can/should make this overall stack of
> code smarter about looping over fields we know we want, vs looping over
> all fields in the doc.
>
> Can you please file a jira for this?
>
> -Hoss
> http://www.lucidworks.com/
Re: BinaryResponseWriter fetches unnecessary fields?
: Inside convertLuceneDocToSolrDoc():
:
: https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182
:
: for (IndexableField f : doc.getFields())
:
: I am a bit puzzled why we need to iterate through all the fields in the
: document. Why can’t we just iterate through the requested fields in fl?
: Specifically:

I have a hunch here -- but i haven't verified it.

First of all: the specific code in question that you mention assumes it
doesn't *need* to filter out the result of "doc.getFields()" based on the
'fl' because at the point in the processing where the DocsStreamer is
looping over the result of "doc.getFields()" the "Document" object it's
dealing with *should* only contain the specific (subset of stored) fields
requested by the fl param -- this is handled by RetrieveFieldsOptimizer &
SolrDocumentFetcher that the DocsStreamer builds up according to the
results of ResultContext.getReturnFields() when asking the
SolrIndexSearcher to fetch the doc()

But i think what's happening here is that because of the documentCache,
there are cases where the SolrIndexSearcher is not actually using
a SolrDocumentStoredFieldVisitor to limit what's requested from the
IndexReader, and the resulting Document contains all fields -- which is
then compounded by code that loops over every field.

At a quick glance, I'm a little fuzzy on how exactly
enableLazyFieldLoading may/may-not be affecting things here, but either
way I think you are correct -- we can/should make this overall stack of
code smarter about looping over fields we know we want, vs looping over
all fields in the doc.

Can you please file a jira for this?

-Hoss
http://www.lucidworks.com/
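The hunch in the message above is explicitly unverified, so the following is only a toy model of it, not SolrIndexSearcher code. The class `DocumentCacheSketch`, its `doc()` method, and the map-based "index" are hypothetical stand-ins invented for illustration. The sketch assumes that a cached entry holds every stored field (so it can serve any later fl), while the uncached path filters fields as they are read.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the hunch; none of this is Solr or Lucene API.
class DocumentCacheSketch {
    // Stand-in for the index: docId -> all stored fields of that document.
    private final Map<Integer, Map<String, String>> stored;
    // Stand-in for the documentCache; null means the cache is disabled.
    private final Map<Integer, Map<String, String>> cache;

    DocumentCacheSketch(Map<Integer, Map<String, String>> stored, boolean useCache) {
        this.stored = stored;
        this.cache = useCache ? new HashMap<Integer, Map<String, String>>() : null;
    }

    /** Fetches a document; 'wanted' plays the role of the fields requested via fl. */
    Map<String, String> doc(int docId, Set<String> wanted) {
        if (cache != null) {
            // Assumption: the cached copy contains all stored fields,
            // so 'wanted' is ignored on this path.
            return cache.computeIfAbsent(docId, id -> new LinkedHashMap<>(stored.get(id)));
        }
        // Uncached path: keep only the requested fields, as a field visitor would.
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.get(docId).entrySet()) {
            if (wanted.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");
        fields.put("title", "hello");
        fields.put("body", "a long stored value");
        Map<Integer, Map<String, String>> stored = new HashMap<>();
        stored.put(1, fields);

        Set<String> fl = Collections.singleton("id");
        // With the cache, the caller sees every stored field despite fl=id.
        System.out.println(new DocumentCacheSketch(stored, true).doc(1, fl).keySet());
        // Without the cache, only the requested field comes back.
        System.out.println(new DocumentCacheSketch(stored, false).doc(1, fl).keySet());
    }
}
```

Under this model, any code that loops over every field of the returned document does work proportional to the full document whenever the cached path is taken.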
BinaryResponseWriter fetches unnecessary fields?
Hi all,

We observe that Solr query time increases significantly with the number of rows requested, even though all we retrieve for each document is fl=id,score. I debugged a bit and saw that most of the increased time was spent in BinaryResponseWriter, converting the Lucene document into a SolrDocument.

Inside convertLuceneDocToSolrDoc():

https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182

    for (IndexableField f : doc.getFields())

I am a bit puzzled why we need to iterate through all the fields in the document. Why can’t we just iterate through the requested fields in fl? Specifically:

https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156

if we change

    sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema())

to

    sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)

and just iterate through fnames in convertLuceneDocToSolrDoc(), there is a significant performance boost in our case: the increase in query time from rows=128 to rows=500 is much smaller.

Am I missing something here?

Thanks,
Wei
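The change proposed above can be sketched in isolation. This is a minimal sketch under stated assumptions: `Field`, `convertAll`, and `convertRequested` are hypothetical stand-ins, not Solr or Lucene classes, and the schema handling that the real convertLuceneDocToSolrDoc() performs is left out. It only contrasts looping over every field in the document with looping over the names in fnames.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; not Solr or Lucene API.
class FlIterationSketch {

    /** A stored field as a name/value pair (loosely modelled on IndexableField). */
    static final class Field {
        final String name;
        final Object value;

        Field(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Behaviour described in the thread: visit every field in the document. */
    static Map<String, List<Object>> convertAll(List<Field> doc) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Field f : doc) {
            out.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
        return out;
    }

    /** Proposed behaviour: iterate over fnames and copy only the matching fields. */
    static Map<String, List<Object>> convertRequested(List<Field> doc, Set<String> fnames) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (String name : fnames) {
            for (Field f : doc) {
                if (f.name.equals(name)) {
                    out.computeIfAbsent(name, k -> new ArrayList<>()).add(f.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> doc = Arrays.asList(
            new Field("id", "42"),
            new Field("title", "hello"),
            new Field("body", "first"),
            new Field("body", "second"));
        Set<String> fnames = new LinkedHashSet<>(Arrays.asList("id"));

        // All field names end up in the converted document.
        System.out.println(convertAll(doc).keySet());
        // Only the field requested through fnames is copied.
        System.out.println(convertRequested(doc, fnames));
    }
}
```

In this sketch the lookup for each name still scans the document, so any saving comes from skipping the per-field conversion of unrequested fields. Whether that matches where the time goes in a real Solr node is something only profiling can confirm.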