Varun Thacker created SOLR-9978:
-----------------------------------

             Summary: Reduce collapse query memory usage
                 Key: SOLR-9978
                 URL: https://issues.apache.org/jira/browse/SOLR-9978
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Varun Thacker
            Assignee: Varun Thacker

- Single-shard test with one replica.
- 10M documents, 9M of which are unique. The test collapsed on a string field.
- The collapse query parser creates two arrays:
  - an int array for the unique documents (9M in this case)
  - a float array for the corresponding scores (9M in this case)
- It goes through all documents and puts a document in the array if its score is better than the previously stored score.
- So collapse creates a lot of garbage when the total number of documents is high and the number of duplicates is very low.
- Even for a query like {{q={!cache=false}*:*&fq={!collapse field=collapseField_s cache=false}&sort=id desc}}, which has a top-level sort, the collapse query parser creates the score array and scores every document.

Indexing script used to generate dummy data:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

// "client" is an already-constructed SolrClient; "ct" is the target collection.
// Index 10M documents; every tenth document reuses the collapse value of the previous one.
List<SolrInputDocument> docs = new ArrayList<>(1000);
for (int i = 0; i < 1000 * 1000 * 10; i++) {
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", i);
  if (i % 10 == 0 && i != 0) {
    // Duplicate: same collapse value as document i-1.
    doc.addField("collapseField_s", i - 1);
  } else {
    doc.addField("collapseField_s", i);
  }
  docs.add(doc);
  // Send to Solr in batches of 1000.
  if (docs.size() == 1000) {
    client.add("ct", docs);
    docs.clear();
  }
}
client.commit("ct");
{code}

Query:

{{q={!cache=false}*:*&fq={!collapse field=collapseField_s cache=false}&sort=id desc}}

Improvements

- We currently default to the SCORE implementation if no min|max|sort param is provided in the collapse query. Instead, check whether a global sort is provided; if so, don't score documents and pick the first occurrence of each unique value.
- Instead of creating an array for the unique documents, use a bitset.
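
To make the difference between the two approaches concrete, here is a minimal standalone sketch. It is not the actual CollapsingQParserPlugin code. The class name {{CollapseSketch}}, both method names, and the {{groupOrd}} input (one ordinal per document identifying its collapse value) are hypothetical and chosen only for illustration. It also assumes that "first occurrence" means first in document-id order, which the description above does not specify.

```java
import java.util.Arrays;
import java.util.BitSet;

public class CollapseSketch {

    /**
     * Score-based collapse as described above: one int slot and one float
     * slot per unique collapse value. Returns the best-scoring doc per group.
     */
    public static int[] collapseByScore(int[] groupOrd, float[] score, int numGroups) {
        int[] bestDoc = new int[numGroups];
        float[] bestScore = new float[numGroups];
        Arrays.fill(bestDoc, -1); // -1 marks "no document seen yet for this group"
        for (int doc = 0; doc < groupOrd.length; doc++) {
            int ord = groupOrd[doc];
            if (bestDoc[ord] == -1 || score[doc] > bestScore[ord]) {
                bestDoc[ord] = doc;
                bestScore[ord] = score[doc];
            }
        }
        return bestDoc;
    }

    /**
     * First-occurrence collapse (assumed reading of the proposed improvement):
     * no scoring, one bit per group to remember whether it was already seen.
     * Returns a bitset of the documents that survive the collapse.
     */
    public static BitSet collapseFirstOccurrence(int[] groupOrd, int numGroups) {
        BitSet seenGroups = new BitSet(numGroups);
        BitSet keptDocs = new BitSet(groupOrd.length);
        for (int doc = 0; doc < groupOrd.length; doc++) {
            int ord = groupOrd[doc];
            if (!seenGroups.get(ord)) {
                seenGroups.set(ord);
                keptDocs.set(doc);
            }
        }
        return keptDocs;
    }

    public static void main(String[] args) {
        // Five documents in three groups; docs 0 and 4 share a group, as do 1 and 2.
        int[] groupOrd = {0, 1, 1, 2, 0};
        float[] score = {1.0f, 0.5f, 2.0f, 1.0f, 3.0f};
        System.out.println(Arrays.toString(collapseByScore(groupOrd, score, 3)));
        System.out.println(collapseFirstOccurrence(groupOrd, 3));
    }
}
```

For the 9M unique values in this test, an int array plus a float array at 4 bytes per entry comes to roughly 72 MB allocated per query, while a bitset over the same 9M values needs about 1.1 MB. That is a back-of-the-envelope figure for this sketch only, not a measurement of Solr itself.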