[ https://issues.apache.org/jira/browse/SOLR-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167034#comment-15167034 ]

Tom Winch commented on SOLR-8234:
---------------------------------

That would be another approach, I guess. You'd still have to write the (custom) merge code, and the approach of this JIRA means you get back SOLR results as per usual, and it's a plugin that makes use of the existing distributed search mechanisms for requesting the top N unique ids from each server and merge-ranking them etc.

> Federated Search (new) - DJoin
> ------------------------------
>
>                 Key: SOLR-8234
>                 URL: https://issues.apache.org/jira/browse/SOLR-8234
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tom Winch
>            Priority: Minor
>              Labels: federated_search
>             Fix For: 4.10.3
>
>         Attachments: SOLR-8234.patch, SOLR-8234.patch, SOLR-8234.patch
>
>
> This issue describes a MergeStrategy implementation (DJoin) to facilitate
> federated search - that is, distributed search over documents stored in
> separated instances of SOLR (for example, one server per continent), where a
> single document (identified by an agreed, common unique id) may be stored in
> more than one server instance, with (possibly) differing fields and data.
> When the MergeStrategy is used in a request handler (via the included
> QParser) in combination with distributed search (shards=), documents having
> an id that has already been seen are not discarded (as per the default
> behaviour) but, instead, are collected and returned as a group of documents
> all with the same id taking a single position in the result set (this is
> implemented using parent/child documents, with an indicator field in the
> parent - see example output, below).
> Documents are sorted in the result set based on the highest ranking document
> with the same id. It is possible for a document ranking high in one shard to
> rank very low on another shard.
> As a consequence of this, all shards must be
> asked to return the fields for every document id in the result set (not just
> of those documents they returned), so that all the component parts of each
> document in the search result set are returned.
> As usual, search parameters are passed on to each shard. So that the shards
> do not need any additional configurations in their definition of the /select
> request handler, we use the FilterQParserSearchComponent which is configured
> to filter out the {!djoin} search parameter - otherwise, the target request
> handler complains about the missing query parser definition. See the example
> config, below.
> This issue combines with others to provide full federated search support. See
> also SOLR-8235 and SOLR-8236.
> Note that this is part of a new implementation of federated search as opposed
> to the older issues SOLR-3799 through SOLR-3805.
> --
> Example request handler configuration:
> {code:xml}
>   <searchComponent name="filter" class="org.apache.solr.search.federated.FilterDJoinQParserSearchComponent" />
>
>   <queryParser name="djoin" class="org.apache.solr.search.federated.DJoinQParserPlugin" />
>   <requestHandler name="djoin" class="solr.SearchHandler">
>     <lst name="defaults">
>       <str name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core</str>
>       <bool name="shards.tolerant">true</bool>
>       <str name="rq">{!djoin}</str>
>     </lst>
>     <arr name="last-components">
>       <str>filter</str>
>     </arr>
>   </requestHandler>
> {code}
> Example output:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">33</int>
>     <lst name="params">
>       <str name="q">*:*</str>
>       <str name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core</str>
>       <str name="shards.tolerant">true</str>
>       <str name="wt">xml</str>
>       <str name="rq">{!djoin}</str>
>       <str name="fl">*,[shard]</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="5" start="0" maxScore="1.0">
>     <doc>
>       <bool name="__merge_parent__">true</bool>
>       <doc>
>         <int name="id">200</int>
>         <int name="value">1973</int>
>         <str name="[shard]">http://shard2/solr/core</str>
>         <long name="_version_">1515645309629235200</long>
>       </doc>
>       <doc>
>         <int name="id">200</int>
>         <int name="value">2015</int>
>         <str name="[shard]">http://shard1/solr/core</str>
>         <long name="_version_">1515645309682712576</long>
>       </doc>
>     </doc>
>     <doc>
>       <bool name="__merge_parent__">true</bool>
>       <doc>
>         <int name="id">100</int>
>         <int name="value">873</int>
>         <str name="[shard]">http://shard1/solr/core</str>
>         <long name="_version_">1515645309629254124</long>
>       </doc>
>       <doc>
>         <int name="id">100</int>
>         <int name="value">2001</int>
>         <str name="[shard]">http://shard3/solr/core</str>
>         <long name="_version_">1515645309682792852</long>
>       </doc>
>     </doc>
>     <doc>
>       <bool name="__merge_parent__">true</bool>
>       <doc>
>         <int name="id">300</int>
>         <int name="value">1492</int>
>         <str name="[shard]">http://shard2/solr/core</str>
>         <long name="_version_">1515645309629251252</long>
>       </doc>
>     </doc>
>   </result>
> </response>
> {code}
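
For readers who want to see the grouping and ranking rule from the issue description in isolation, here is a minimal Python sketch. It is not the code from the attached patch (which is a Java MergeStrategy). The function name `djoin_merge`, the `docs` key, and the score values are all hypothetical, chosen only for illustration. The ids, values, shard URLs and the `__merge_parent__` indicator field come from the example output above. The sketch also assumes each shard's list already holds every copy of a document that shard stores, which is what the second "return the fields for every document id" round is described as guaranteeing.

```python
MERGE_PARENT_FIELD = "__merge_parent__"  # indicator field name from the example output


def djoin_merge(shard_results, rows=10):
    """Group documents by id across shards and rank groups by their best score.

    shard_results maps a shard URL to a list of dicts, each with at least
    "id" and "score". This input shape is an assumption of the sketch.
    """
    groups = {}
    for shard, docs in shard_results.items():
        for doc in docs:
            child = dict(doc)
            child["[shard]"] = shard  # record where this copy came from
            # Do not discard ids that were already seen; collect them instead.
            groups.setdefault(doc["id"], []).append(child)

    # A group takes the position of its highest-ranking member.
    ranked = sorted(
        groups.values(),
        key=lambda children: max(c["score"] for c in children),
        reverse=True,
    )
    return [{MERGE_PARENT_FIELD: True, "docs": children} for children in ranked[:rows]]


# Hypothetical scores; ids and values mirror the example output.
shard_results = {
    "http://shard1/solr/core": [
        {"id": 200, "value": 2015, "score": 0.4},
        {"id": 100, "value": 873, "score": 0.7},
    ],
    "http://shard2/solr/core": [
        {"id": 200, "value": 1973, "score": 0.9},
        {"id": 300, "value": 1492, "score": 0.2},
    ],
    "http://shard3/solr/core": [
        {"id": 100, "value": 2001, "score": 0.1},
    ],
}

merged = djoin_merge(shard_results)
for group in merged:
    print(group["docs"][0]["id"], [d["[shard]"] for d in group["docs"]])
```

With these made-up scores, id 200 ranks first because its shard2 copy scores highest, even though its shard1 copy scores below id 100. That is the "ranking high in one shard, low on another" case the description mentions.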