Joel Bernstein created SOLR-4787:
------------------------------------
Summary: Join Contrib
Key: SOLR-4787
URL: https://issues.apache.org/jira/browse/SOLR-4787
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
Fix For: 4.2.1
This contrib provides a place where different join implementations can be
contributed to Solr. It currently includes two join implementations.
PostFilterJoinQParserPlugin aka "pjoin"
The pjoin provides a join implementation that filters results in one core based
on the results of a search in another core. This is similar to the join
implementation in the JoinQParserPlugin but differs in a couple of important
ways.
The first difference is that the pjoin is designed to work with integer join
keys only. So, in order to use pjoin, integer join keys must be present in both
the "to" core and the "from" core.
The second difference is that the pjoin builds memory structures that can be
used to quickly connect the join keys. It also uses a custom SolrCache named
"join" to hold intermediate DocSets which are needed to build the join memory
structures. So, the pjoin will need more memory than the JoinQParserPlugin to
perform the join.
The main advantage of the pjoin is that it can scale to join millions of keys
between cores.
Because it's a PostFilter, it only needs to join records that match the main
query.
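
The post-filter idea described above can be illustrated with a small
sketch. This is not the contrib's actual code: the sorted-list structure, the
function names, and the sample data are all assumptions made for illustration.
It only shows why integer keys allow a compact lookup structure and why a
post filter inspects only the documents that already matched the main query.

```python
from bisect import bisect_left


def build_key_index(from_keys):
    """Return a sorted, de-duplicated list of integer join keys."""
    return sorted(set(from_keys))


def contains(key_index, key):
    """Binary-search membership test on the sorted key list."""
    pos = bisect_left(key_index, key)
    return pos < len(key_index) and key_index[pos] == key


def post_filter(main_matches, key_index):
    """Keep only main-query matches whose "to" key appears in the index.

    main_matches holds (doc_id, to_key) pairs for documents that already
    matched the main query, so no other documents are ever inspected.
    """
    return [doc_id for doc_id, to_key in main_matches
            if contains(key_index, to_key)]


# Hypothetical data: join keys collected from the fromCore search ...
key_index = build_key_index([42, 7, 19, 7])
# ... and (doc_id, to_key) pairs for documents matching the main query.
main_matches = [(100, 7), (101, 8), (102, 42)]
filtered = post_filter(main_matches, key_index)
```

Here only three key lookups are performed, one per main-query match,
regardless of how many documents the main core holds.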
The syntax of the pjoin is the same as the JoinQParserPlugin except that the
plugin is referenced by the string "pjoin" rather than "join".
fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1
The example filter query above will search the fromCore (collection2) for
"user:customer1". This query will generate a list of values from the "from"
field that will be used to filter the main query. Only records from the main
query whose "to" field value is present in the "from" list will be included in
the results.
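
When the filter query is sent over HTTP, the braces, spaces, and other
special characters in the local-params syntax need to be URL-encoded. A
minimal sketch of assembling the request parameters follows; the main query
"*:*" is an assumption chosen for illustration and does not come from this
issue.

```python
from urllib.parse import urlencode, parse_qs

params = {
    "q": "*:*",  # assumed main query, for illustration only
    "fq": "{!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1",
}

# urlencode percent-escapes the braces and other reserved characters.
query_string = urlencode(params)

# Decoding the string again recovers the original filter query unchanged.
decoded = parse_qs(query_string)
```

The resulting query_string can be appended to the select handler URL of the
main query core.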
The solrconfig.xml in the main query core must contain the reference to the
pjoin.
<queryParser name="pjoin"
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
And the join contrib jars must be registered in the solrconfig.xml.
<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
The solrconfig.xml in the fromCore must have the "join" SolrCache configured.
<cache name="join"
class="solr.LRUCache"
size="4096"
initialSize="1024"
/>
JoinValueSourceParserPlugin aka "vjoin"
The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This
implements a ValueSource function query that can return values from a second
core based on join keys. This allows relevance data to be stored in a separate
core and then joined in the main query.
The vjoin is called using the "vjoin" function query. For example:
bf=vjoin(fromCore, fromKey, fromVal, toKey)
This example shows "vjoin" being called by the edismax boost function
parameter. This example will return the "fromVal" from the "fromCore". The
"fromKey" and "toKey" are used to link the records from the main query to the
records in the "fromCore".
As with the "pjoin", both the fromKey and toKey must be integers. Also like
the pjoin, the "join" SolrCache is used to hold the join memory structures.
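
The lookup that the vjoin performs can be pictured with the following sketch.
It is a conceptual model only, not the plugin's implementation: the field
names, the sample documents, and the default of 0.0 for documents without a
matching key are all assumptions. The last line adds the joined value to the
score, which mirrors how an additive boost function such as edismax "bf" is
applied.

```python
def build_value_map(from_docs, from_key, from_val):
    """Map each integer join key in the "from" core to its stored value."""
    return {doc[from_key]: doc[from_val] for doc in from_docs}


def vjoin(value_map, doc, to_key, default=0.0):
    """Return the joined value for a main-core document.

    The default used when no key matches is an assumption of this sketch.
    """
    return value_map.get(doc[to_key], default)


# Hypothetical "from" core holding relevance data keyed by an integer id.
from_docs = [
    {"id_i": 1, "popularity_f": 2.5},
    {"id_i": 2, "popularity_f": 0.5},
]
value_map = build_value_map(from_docs, "id_i", "popularity_f")

# Hypothetical main-query matches with their base scores.
main_docs = [{"id_i": 1, "score": 1.0}, {"id_i": 3, "score": 1.2}]

# Additive boost: base score plus the value joined in from the other core.
boosted = [(d["id_i"], d["score"] + vjoin(value_map, d, "id_i"))
           for d in main_docs]
```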
To configure the vjoin you must register the ValueSource plugin in the
solrconfig.xml as follows:
<valueSourceParser name="vjoin"
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />