Joel Bernstein created SOLR-4787:
------------------------------------

             Summary: Join Contrib
                 Key: SOLR-4787
                 URL: https://issues.apache.org/jira/browse/SOLR-4787
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 4.2.1
            Reporter: Joel Bernstein
            Priority: Minor
             Fix For: 4.2.1



This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

PostFilterJoinQParserPlugin aka "pjoin"

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin but differs in a couple of important 
ways.

The first difference is that the pjoin is designed to work with integer join 
keys only. So, in order to use the pjoin, integer join keys must be present in 
both the "to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
Query.

The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above will search the fromCore (collection2) for 
"user:customer1". This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query whose "to" field value is present in the "from" list will be included in 
the results.
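
The filtering described above can be sketched in Python, with in-memory lists 
standing in for the two cores. This is only an illustration of the join 
semantics, not the contrib's implementation: the function name pjoin_filter, 
the sample documents and the "title" field are assumptions made for the 
example.

```python
# Illustrative sketch only: plain lists of dicts stand in for Solr cores.
# pjoin_filter and the sample data are hypothetical, not part of the contrib.

def pjoin_filter(main_results, from_core_docs, from_query, from_field, to_field):
    """Keep main-query results whose to_field value occurs among the
    from_field values of the from-core documents matching from_query."""
    # Step 1: search the from core and collect the integer join keys.
    from_keys = {
        doc[from_field]
        for doc in from_core_docs
        if from_field in doc and from_query(doc)
    }
    # Step 2: post-filter. Only documents that already matched the main
    # query are checked against the collected keys.
    return [doc for doc in main_results if doc.get(to_field) in from_keys]


# Stand-in for the fromCore (collection2).
collection2 = [
    {"id_i": 1, "user": "customer1"},
    {"id_i": 2, "user": "customer2"},
    {"id_i": 3, "user": "customer1"},
]

# Stand-in for the documents matched by the main query.
main_results = [
    {"id_i": 1, "title": "a"},
    {"id_i": 2, "title": "b"},
    {"id_i": 3, "title": "c"},
    {"id_i": 4, "title": "d"},
]

filtered = pjoin_filter(
    main_results,
    collection2,
    lambda doc: doc.get("user") == "customer1",  # plays the role of user:customer1
    from_field="id_i",
    to_field="id_i",
)
print([doc["id_i"] for doc in filtered])  # prints [1, 3]
```

Because the key set is only consulted for documents that already matched the 
main query, the cost of the second step grows with the size of the main result 
set rather than with the size of the index.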

The solrconfig.xml in the main query core must contain the reference to the 
pjoin.

<queryParser name="pjoin" 
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml.

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the fromCore must have the "join" SolrCache configured.

 <cache name="join"
              class="solr.LRUCache"
              size="4096"
              initialSize="1024"
              />


JoinValueSourceParserPlugin aka "vjoin"

The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the "vjoin" function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows "vjoin" being called by the edismax boost function 
parameter. This example will return the "fromVal" from the "fromCore". The 
"fromKey" and "toKey" are used to link the records from the main query to the 
records in the "fromCore".

As with the "pjoin", both the fromKey and toKey must be integers. Also like
the pjoin, the "join" SolrCache is used to hold the join memory structures.
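
The lookup performed by the vjoin can be sketched the same way in Python. This 
is a rough model of the behaviour described above, not the contrib's code: the 
helper name vjoin_values, the field name "popularity_f" and the default of 0.0 
for main-query records with no matching key are all assumptions made for the 
example.

```python
# Illustrative sketch only: a list of dicts stands in for the fromCore.
# vjoin_values, "popularity_f" and the 0.0 default are hypothetical.

def vjoin_values(from_core_docs, from_key, from_val, to_key, default=0.0):
    """Return a function that maps a main-query document to the from_val
    of the from-core document whose from_key equals the document's to_key."""
    # Build the integer key -> value lookup once; in the contrib this role
    # is played by the join memory structures held in the "join" cache.
    lookup = {
        doc[from_key]: doc[from_val]
        for doc in from_core_docs
        if from_key in doc and from_val in doc
    }

    def value_for(main_doc):
        # Assumption: documents with no match fall back to the default.
        return lookup.get(main_doc.get(to_key), default)

    return value_for


# Stand-in for a separate core that stores relevance data.
relevance_core = [
    {"id_i": 1, "popularity_f": 2.5},
    {"id_i": 3, "popularity_f": 0.5},
]

main_docs = [{"id_i": 1}, {"id_i": 2}, {"id_i": 3}]

boost = vjoin_values(relevance_core, "id_i", "popularity_f", "id_i")
print([boost(doc) for doc in main_docs])  # prints [2.5, 0.0, 0.5]
```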

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

 <valueSourceParser name="vjoin" 
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />






