[ https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Isaac Hebsh closed SOLR-5611.
-----------------------------
    Resolution: Not A Problem

Oops. I missed the {{shards.rows}} parameter.

> When documents are uniformly distributed over shards, enable returning
> approximated results in distributed query
> ----------------------------------------------------------------------
>
>                 Key: SOLR-5611
>                 URL: https://issues.apache.org/jira/browse/SOLR-5611
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Isaac Hebsh
>              Labels: distributed_search, shard, solrcloud
>             Fix For: 4.7
>
>
> A query with rows=1000 that is sent to a collection of 100 shards (with the
> default shard key behaviour, based on a hash of the unique key) generates 100
> shard requests, each with rows=1000.
> This results in a total of rows*numShards unique keys being retrieved.
> The overhead gets worse as numShards grows.
> If the documents are uniformly distributed over the shards, the expected
> number of top documents on each shard is ~ rows/numShards. Obviously, there
> might be extreme cases in which all of the top X documents are on a single
> shard.
> I suggest adding an optional parameter, say approxResults=true, which decides
> whether the rows in the shard requests should be limited to rows/numShards or
> not. Moreover, we can add a numeric parameter that raises this limit, to be
> more accurate.
> For example, the query {{approxResults=true&approxResults.factor=1.5}} will
> retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and
> rows=1000, each shard will return 15 documents.
> Furthermore, this can reduce the problem of deep paging, because the same
> idea can be applied there. When start=100000 is requested, Solr creates shard
> requests with start=0 and rows=START+ROWS. In the approximated approach, the
> start parameter (in the shard requests) can be set to 100000/numShards. The
> idea of the approxResults.factor creates some difficulties here, though.
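The arithmetic in the description can be sketched as follows. This is a minimal illustration, not Solr code: the function name `shard_request_params` is hypothetical, `approx` and `factor` stand in for the proposed (never implemented) `approxResults` and `approxResults.factor` parameters, and rounding up with `ceil` is an assumption. Because the description notes that the factor is awkward to combine with `start`, the sketch applies it to `rows` only.

```python
import math


def shard_request_params(start, rows, num_shards, approx=False, factor=1.0):
    """Return the start/rows values sent to each shard (illustrative only)."""
    if not approx:
        # Exact behaviour described in the issue: each shard is asked for
        # its own top start+rows documents, starting from 0.
        return {"start": 0, "rows": start + rows}
    # Proposed approximation: assume the top documents are spread evenly,
    # so each shard only needs about 1/num_shards of the window.
    return {
        "start": start // num_shards,
        "rows": math.ceil(factor * rows / num_shards),
    }


exact = shard_request_params(0, 1000, 100)
approx = shard_request_params(0, 1000, 100, approx=True, factor=1.5)

print(exact["rows"] * 100)   # 100000 keys fetched across all shards
print(approx["rows"] * 100)  # 1500 keys fetched across all shards
```

With the numbers from the description (100 shards, rows=1000, factor 1.5), each shard is asked for 15 documents instead of 1000. As the resolution notes, the existing {{shards.rows}} parameter already lets a client lower the per-shard row count by hand.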