[jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud

Andy Laird (JIRA) Wed, 13 Jun 2012 23:44:47 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294837#comment-13294837 ]


Andy Laird commented on SOLR-2592:
----------------------------------

I have tried out Michael's patch and would like to provide some feedback to the 
community.  We are using a very recent build from the 4x branch, but I grabbed 
this patch from trunk and tried it out anyway.

Our needs were driven by the fact that, currently, the counts returned when 
using field collapse are only accurate when the documents getting collapsed 
together are all on the same shard (see comments for 
https://issues.apache.org/jira/browse/SOLR-2066).  For our case we collapse on 
a field, xyz, so we need to ensure that all documents with the same value for 
xyz are on the same shard (overall distribution is not a problem here) if we 
want counting to work.

I grabbed the latest patch (dbq_fix.patch) in hopes of finding a solution to 
our problem.  The great news is that Michael's patch worked like a charm for 
what we needed -- thank you kindly, Michael, for this effort!  The not-so-good 
news is that for our particular issue we needed a way to get at data other than 
the uniqueKey (the only data available with ShardKeyParser) -- in our case we 
need access to the xyz field data.  Since this implementation provides nothing 
but the uniqueKey, we had to encode the xyz data in our uniqueKey (e.g. 
newUniqueKey = what-used-to-be-our-uniqueKey + xyz), which is certainly less 
than ideal and adds unsavory coupling.
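
To make the workaround concrete, here is a minimal sketch of the idea, not the 
code from the patch.  The class name, the '!' delimiter, and the use of 
String.hashCode() are all assumptions made for illustration; the actual 
ShardKeyParser interface in the patch may look quite different.  The point is 
only that the shard is chosen from the xyz portion of the composite key, so 
documents sharing an xyz value always land on the same shard.

```java
// Hypothetical helper; none of these names come from the SOLR-2592 patch.
final class CompositeKeyRouting {
    // Assumed delimiter; it must never occur inside an xyz value.
    private static final char DELIMITER = '!';

    private CompositeKeyRouting() {}

    // newUniqueKey = what-used-to-be-our-uniqueKey + xyz
    static String composeKey(String originalId, String xyz) {
        if (xyz.indexOf(DELIMITER) >= 0) {
            throw new IllegalArgumentException("xyz must not contain " + DELIMITER);
        }
        return originalId + DELIMITER + xyz;
    }

    // Recover the xyz portion: everything after the last delimiter.
    // Keys without a delimiter are routed on the whole key.
    static String routingPart(String uniqueKey) {
        int i = uniqueKey.lastIndexOf(DELIMITER);
        return i < 0 ? uniqueKey : uniqueKey.substring(i + 1);
    }

    // Pick a shard index in [0, numShards) from the xyz portion only.
    static int shardFor(String uniqueKey, int numShards) {
        return Math.floorMod(routingPart(uniqueKey).hashCode(), numShards);
    }

    public static void main(String[] args) {
        String a = composeKey("doc-1", "groupA");
        String b = composeKey("doc-2", "groupA");
        // Both documents share xyz = "groupA", so they map to the same shard.
        System.out.println(shardFor(a, 4) == shardFor(b, 4));
    }
}
```

The coupling mentioned above shows up directly here: anything that reads or 
writes the uniqueKey now has to know about the delimiter and the embedded xyz 
value.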

Nonetheless, as a fix to a last-minute gotcha (our counts with field collapse 
need to be accurate in a multi-shard environment), I was happily surprised at 
how easy it was to find a solution to our particular problem with this patch.  
I would definitely like to see a second iteration that incorporates the ability 
to get at other document data; then you could do whatever you want by looking 
at dates and other fields, though I understand that this probably goes quite a 
bit deeper into the codebase, especially with distributed search.

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, 
> pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value, etc.), it will be easy to narrow the search down to a smaller 
> subset of shards and, in effect, achieve more efficient search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
