[jira] [Commented] (SOLR-4787) Join Contrib

Joel Bernstein (JIRA) Tue, 23 Jul 2013 05:22:06 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716342#comment-13716342
 ]


Joel Bernstein commented on SOLR-4787:
--------------------------------------

Kranti,

Let me know how the pjoin is performing for you. I'm going to be testing out 
some different data structures for the pjoin to see if I can get better 
performance.
                
> Join Contrib
> ------------
>
>                 Key: SOLR-4787
>                 URL: https://issues.apache.org/jira/browse/SOLR-4787
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 4.2.1
>            Reporter: Joel Bernstein
>            Priority: Minor
>             Fix For: 4.4
>
>         Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
> <queryParser name="pjoin" 
> class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
>  <cache name="join"
>               class="solr.LRUCache"
>               size="4096"
>               initialSize="1024"
>               />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
> <valueSourceParser name="vjoin" 
> class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4787) Join Contrib

Reply via email to