[ https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338342#comment-16338342 ]
Hoss Man commented on SOLR-11891:
---------------------------------

{quote}convertLuceneDocToSolrDoc should be innocent enough; it merely iterates the IndexableField instances in a collection and potentially calls name() on them, which is always a simple getter. ...{quote}

That iteration over all fields (real and lazy) is itself the problem being reported. If the docs contain 1000 stored fields but only 1 is requested in the 'fl', the current code still loops over all 1000 fields in every doc, even though it knows exactly which fields it needs. Even if no disk reads are involved for some lazy fields, it's still a wasteful iteration whose cost is multiplicative in the # of fields in the docs and the number of docs in the response, regardless of how small the fl is.

{quote}...We can't avoid putting all fields on the SolrDocument because of the potential for a document transformer to need it, and that's not known by RetrieveFieldsOptimizer.{quote}

IIUC we *can* avoid it, and RetrieveFieldsOptimizer *does* know that. As I mentioned in my response to mk, that's the entire point of {{DocTransformer.getExtraRequestFields()}} (see the javadocs), which is used to build up the list returned by {{SolrReturnFields.getLuceneFieldNames()}}.

{quote}But it should be innocent enough because if no such transformer requests the value, then it shouldn't actually be loaded (it's lazy).{quote}

Even if the {{Document}} field values are lazy, the existing code that loops over all of them still builds up a {{SolrDocument}} containing all of those (lazy) fields, wasting time and a small amount of space. And that assumes they are all lazy: lazy loading is an option that may be off for some people, and when the fields aren't lazy, reading them from disk takes up even more time & space.

----

I think the ideal "fix" is for {{SolrReturnFields.getLuceneFieldNames()}} to get passed all the way down into {{convertLuceneDocToSolrDoc}} (or whatever we refactor it into), so that we can do a runtime check of which list is smaller, {{SolrReturnFields.getLuceneFieldNames()}} or {{Document.getFields()}}, and then loop over that (smallest) list.

Regardless of what changes we make, we should have a whitebox test of {{convertLuceneDocToSolrDoc}} (or whatever we refactor it into) confirming that:
* the resulting {{SolrDocument}} doesn't contain *any* fields that aren't needed
* some explicitly un-requested "lazy" IndexableFields in the input Document are still "lazy" (ie: not "actualized") when the method returns (ie: we didn't do a disk read we didn't need)

> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
> Key: SOLR-11891
> URL: https://issues.apache.org/jira/browse/SOLR-11891
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Response Writers
> Affects Versions: 5.4, 6.4.2, 6.6.2
> Reporter: wei wang
> Priority: Major
>
> We observe that Solr query time increases significantly with the number of
> rows requested, even when all we retrieve for each document is fl=id,score.
> We debugged a bit and saw that most of the increased time was spent in
> BinaryResponseWriter, converting the Lucene document into a SolrDocument.
> Inside convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>
> I am a bit puzzled why we need to iterate through all the fields in the
> document. Why can't we just iterate through the requested field list?
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>
> e.g. when we pass in the field list as
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in
> our case.
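The "loop over whichever list is smaller" idea from the comment can be sketched as a self-contained Java toy. This is a sketch under stated assumptions, not Solr's API. {{SmallestListConvert}}, {{LazyField}}, {{convert}} and {{copy}} are hypothetical names. The stored document is modeled as a field name -> values map, which assumes values can be looked up by name cheaply, and the size check compares distinct field names. Nothing here claims that Lucene's {{Document}} offers an equivalent lookup. The sketch is built to satisfy the two whitebox checks listed in the comment: only requested fields end up in the output, and un-requested lazy fields are never loaded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

public class SmallestListConvert {

    /** Hypothetical stand-in for a stored, possibly lazy, IndexableField. */
    static final class LazyField {
        final String name;
        private final Supplier<Object> loader;
        private Object value;
        private boolean loaded;

        LazyField(String name, Supplier<Object> loader) {
            this.name = name;
            this.loader = loader;
        }

        boolean isLoaded() {
            return loaded;
        }

        /** First call simulates the disk read that "actualizes" the field. */
        Object value() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    /**
     * Hypothetical stand-in for convertLuceneDocToSolrDoc. The stored document is
     * modeled as field name -> values; the SolrDocument as a name -> values map.
     */
    static Map<String, List<Object>> convert(Map<String, List<LazyField>> doc, Set<String> wanted) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        if (wanted.size() < doc.size()) {
            // Requested names are the smaller list: drive the loop from fl.
            for (String name : wanted) {
                List<LazyField> fields = doc.get(name);
                if (fields != null) { // e.g. "score" is not a stored field
                    copy(fields, out);
                }
            }
        } else {
            // Stored names are the smaller list: drive the loop from the document.
            for (Map.Entry<String, List<LazyField>> e : doc.entrySet()) {
                if (wanted.contains(e.getKey())) {
                    copy(e.getValue(), out);
                }
            }
        }
        return out;
    }

    /** Loads only the given (requested) fields and adds their values to the output. */
    private static void copy(List<LazyField> fields, Map<String, List<Object>> out) {
        for (LazyField f : fields) {
            out.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value());
        }
    }

    public static void main(String[] args) {
        // 1000 stored fields, as in the comment's example; fl only asks for id and score.
        Map<String, List<LazyField>> doc = new LinkedHashMap<>();
        List<LazyField> all = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            LazyField f = new LazyField(n == 0 ? "id" : "f" + n, () -> "v" + n);
            doc.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f);
            all.add(f);
        }
        System.out.println(convert(doc, Set.of("id", "score")));
        System.out.println("loaded=" + all.stream().filter(LazyField::isLoaded).count());
    }
}
```

Running {{main}} prints {{\{id=[v0]\}}} and then {{loaded=1}}. Only the requested stored field is materialized, and the other 999 lazy fields are never read. Because of the size check, the per-document cost is bounded by the shorter of the two name lists rather than always by the number of stored fields.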