[ 
https://issues.apache.org/jira/browse/SOLR-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isaac Hebsh closed SOLR-5611.
-----------------------------

    Resolution: Not A Problem

Oops. I missed the {{shards.rows}} parameter.

> When documents are uniformly distributed over shards, enable returning 
> approximated results in distributed query
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5611
>                 URL: https://issues.apache.org/jira/browse/SOLR-5611
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Isaac Hebsh
>              Labels: distributed_search, shard, solrcloud
>             Fix For: 4.7
>
>
> A query with rows=1000 that is sent to a collection of 100 shards (with the 
> default shard key behaviour, based on a hash of the unique key) generates 100 
> shard requests, each with rows=1000.
> This results in a total of rows*numShards unique keys being retrieved. 
> This behaviour gets worse as numShards grows.
> If the documents are uniformly distributed over the shards, the expected 
> number of top documents coming from each shard is ~ rows/numShards. Obviously, 
> there might be extreme cases in which all of the top X documents are on one 
> specific shard.
> I suggest adding an optional parameter, say approxResults=true, which decides 
> whether to limit the rows in the shard requests to rows/numShards or 
> not. Moreover, we can add a numeric parameter that raises this limit, to 
> be more accurate.
> For example, the query {{approxResults=true&approxResults.factor=1.5}} will 
> retrieve 1.5*rows/numShards from each shard. In the case of 100 shards and 
> rows=1000, each shard will return 15 documents.
> Furthermore, this can reduce the problem of deep paging, because the same 
> idea can be applied there. When start=100000 is requested, Solr creates shard 
> requests with start=0 and rows=START+ROWS. In the approximated approach, the 
> start parameter (in the shard requests) can be set to 100000/numShards. The 
> approxResults.factor idea creates some difficulties here, though.
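
The arithmetic in the description above can be sketched as follows. This is only an illustration, not Solr code: the function names are hypothetical, rounding up with ceil is an assumption (the description does not say how to round), and the last helper assumes that {{shards.rows}} caps the rows requested from each shard, as the closing comment suggests.

```python
import math
from urllib.parse import urlencode


def exact_keys_retrieved(rows: int, num_shards: int) -> int:
    """Unique keys fetched when every shard is asked for the full `rows`."""
    return rows * num_shards


def approx_shard_rows(rows: int, num_shards: int, factor: float = 1.0) -> int:
    """Per-shard row limit from the proposal: factor * rows / numShards.

    Rounding up is an assumption made here so a shard is never asked
    for fewer rows than the formula yields.
    """
    if rows < 0 or num_shards <= 0 or factor <= 0:
        raise ValueError("rows must be >= 0; num_shards and factor must be > 0")
    return math.ceil(factor * rows / num_shards)


def approx_shard_start(start: int, num_shards: int) -> int:
    """Per-shard start offset for the deep-paging variant: start / numShards."""
    if start < 0 or num_shards <= 0:
        raise ValueError("start must be >= 0 and num_shards must be > 0")
    return start // num_shards


def query_string(rows: int, num_shards: int, factor: float = 1.0) -> str:
    """Build request parameters that pass the per-shard limit via shards.rows.

    The q value is a placeholder; only `rows` and `shards.rows` matter here.
    """
    params = {
        "q": "*:*",
        "rows": rows,
        "shards.rows": approx_shard_rows(rows, num_shards, factor),
    }
    return urlencode(params)


if __name__ == "__main__":
    # The example from the description: 100 shards, rows=1000, factor=1.5.
    print(exact_keys_retrieved(1000, 100))
    print(approx_shard_rows(1000, 100, factor=1.5))
    print(approx_shard_start(100000, 100))
    print(query_string(1000, 100, factor=1.5))
```

With the numbers from the description (100 shards, rows=1000, factor 1.5), the per-shard limit comes out to 15 instead of 1000, at the cost of possibly missing top documents when the distribution is skewed.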



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
