Does this also apply to the hjoin?
Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa

On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) <j...@apache.org> wrote:

>
>      [ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Joel Bernstein updated SOLR-4787:
> ---------------------------------
>
>     Attachment: SOLR-4787.patch
>
> Resolved a memory leak when the bjoin is used with cache autowarming.
>
>
> > Join Contrib
> > ------------
> >
> >                 Key: SOLR-4787
> >                 URL: https://issues.apache.org/jira/browse/SOLR-4787
> >             Project: Solr
> >          Issue Type: New Feature
> >          Components: search
> >    Affects Versions: 4.2.1
> >            Reporter: Joel Bernstein
> >            Priority: Minor
> >             Fix For: 4.7
> >
> >         Attachments: SOLR-4787-deadlock-fix.patch,
> > SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch,
> > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> > SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> > SOLR-4797-hjoin-multivaluekeys-trunk.patch
> >
> >
> > This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API, this patch will only build with Solr 4.2 or above.
> >
> > *HashSetJoinQParserPlugin aka hjoin*
> >
> > The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways.
> >
> > The first difference is that the hjoin is designed to work with int and long join keys only. So, in order to use the hjoin, int or long join keys must be included in both the to and from cores.
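A minimal plain-Java sketch of the hash-set idea behind the hjoin, assuming the join keys have already been pulled out of both cores into long arrays. The class, method and array names are hypothetical, and no Solr or Lucene API is used: the fromIndex keys go into a HashSet, and a main-query document is kept only if its "to" key is in that set. The expectedSize argument is meant to play the role of the *size* local parameter described further down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

final class HashSetJoinSketch {

    // Collect every "from" key returned by the fromIndex search.
    // HashSet (default load factor 0.75) rehashes once it passes 75% full, so the
    // capacity is derived from expectedSize to avoid resizing (the intent of "size").
    static Set<Long> buildFromKeys(long[] fromKeys, int expectedSize) {
        Set<Long> keys = new HashSet<>((int) (expectedSize / 0.75f) + 1);
        for (long key : fromKeys) {
            keys.add(key);
        }
        return keys;
    }

    // Keep a main-query document only if its "to" key is in the "from" set.
    // toKeyByDoc[doc] holds the "to" key of document number doc (hypothetical layout).
    static int[] filterMainDocs(long[] toKeyByDoc, Set<Long> fromKeys) {
        return IntStream.range(0, toKeyByDoc.length)
                .filter(doc -> fromKeys.contains(toKeyByDoc[doc]))
                .toArray();
    }

    public static void main(String[] args) {
        long[] fromKeys = {3L, 7L, 42L};            // keys found in collection2
        long[] toKeyByDoc = {1L, 7L, 3L, 99L, 42L}; // "to" key of main docs 0..4
        Set<Long> keys = buildFromKeys(fromKeys, fromKeys.length);
        System.out.println(Arrays.toString(filterMainDocs(toKeyByDoc, keys)));
    }
}
```

Running main prints [1, 2, 4]: main-query docs 1, 2 and 4 carry the "to" keys 7, 3 and 42, which are all in the "from" set. The set is what costs the extra memory compared with the JoinQParserPlugin.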
> >
> > The second difference is that the hjoin builds in-memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join.
> >
> > The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query.
> >
> > The hjoin supports the following features:
> >
> > 1) Both Lucene query and PostFilter implementations. A *"cost"* greater than 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down.
> >
> > 2) With the Lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The "threads" parameter turns on threading. For example, *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created, the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter.
> >
> > 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex, then you can avoid hashset resizing, which improves performance.
> >
> > 4) Nested filter queries. The local parameter "fq" can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins.
> >
> > 5) Full caching support for the Lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins.
> > Only the queryResultCache comes into play with the PostFilter implementation, because PostFilters are not cacheable in the filterCache.
> >
> > The syntax of the hjoin is similar to the JoinQParserPlugin, except that the plugin is referenced by the string "hjoin" rather than "join".
> >
> > fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5
> >
> > The example filter query above will search the fromIndex (collection2) for "user:customer1", applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will generate a list of values from the "from" field that will be used to filter the main query. Only records from the main query where the "to" field is present in the "from" list will be included in the results.
> >
> > The solrconfig.xml in the main query core must contain the reference to the hjoin.
> >
> > <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> >
> > The join contrib lib jars must also be registered in the solrconfig.xml.
> >
> > <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
> >
> > After issuing the "ant dist" command from inside the solr directory, the joins contrib jar will appear in the solr/dist directory. Place the solr-joins-4.*-.jar in the WEB-INF/lib directory of the Solr web application. This will ensure that the top-level Solr classloader loads these classes rather than the core's classloader.
> >
> > *BitSetJoinQParserPlugin aka bjoin*
> >
> > The bjoin behaves exactly like the hjoin but uses a BitSet instead of a HashSet to perform the underlying join. Because of this, the bjoin is much faster and can provide sub-second response times on result sets of tens of millions of records from the fromIndex and hundreds of millions of records from the main query.
> >
> > But there are limitations to how the bjoin can be used.
> > The bjoin treats the join keys as addresses in a BitSet and uses the Lucene OpenBitSet implementation, which performs very well but is not sparse. So the BitSet memory is dictated by the size of the join keys. For example, a bitset with a max join key of 200,000,000 will need 25 MB of memory. For this reason the BitSet join does not support long join keys. In order to keep memory usage down, the join keys should also be packed at the low end, for example from 1 to 50,000,000.
> >
> > Below is a sample bjoin:
> >
> > fq={!bjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5
> >
> > To register the bjoin, the solrconfig.xml in the main query core must contain the reference to the bjoin.
> >
> > <queryParser name="bjoin" class="org.apache.solr.joins.BitSetJoinQParserPlugin"/>
> >
> > *ValueSourceJoinParserPlugin aka vjoin*
> >
> > The third implementation is the ValueSourceJoinParserPlugin aka "vjoin". This implements a ValueSource function query that can return a value from a second core based on join keys and a limiting query. The limiting query can be used to select a specific subset of data from the join core. This allows customer-specific relevance data to be stored in a separate core and then joined in the main query.
> >
> > The vjoin is called using the "vjoin" function query. For example:
> >
> > bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> >
> > This example shows "vjoin" being called by the edismax boost function parameter. This example will return the "fromVal" from the "fromCore". The "fromKey" and "toKey" are used to link the records from the main query to the records in the "fromCore". The "query" is used to select a specific set of records to join with in fromCore.
> >
> > Currently the fromKey and toKey must be longs, but this will change in future versions. Like the pjoin, the "join" SolrCache is used to hold the join memory structures.
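To picture what the vjoin computes, a minimal plain-Java sketch, assuming the fromCore records are already in memory: the limiting query (modelled here as a Predicate) selects which records take part, those records are indexed by fromKey, and each main-query document looks up its toKey to get a fromVal. The record type, field names and the 0 returned for a missing key are all hypothetical; the description above does not say what the vjoin returns when no record matches, and this is not the plugin's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ValueJoinSketch {

    // A record in the hypothetical "fromCore": join key, value, and a field
    // the limiting query can match on.
    record FromRecord(long fromKey, float fromVal, String customer) {}

    // Build the join structure fromKey -> fromVal from the records that
    // match the limiting query. If a key repeats, the last record wins.
    static Map<Long, Float> buildJoinMap(List<FromRecord> fromCore,
                                         Predicate<FromRecord> limitingQuery) {
        Map<Long, Float> joined = new HashMap<>();
        for (FromRecord r : fromCore) {
            if (limitingQuery.test(r)) {
                joined.put(r.fromKey(), r.fromVal());
            }
        }
        return joined;
    }

    // Value for one main-query document, looked up by its toKey.
    // ASSUMPTION: 0 when nothing was joined; the real vjoin may behave differently.
    static float value(Map<Long, Float> joined, long toKey) {
        return joined.getOrDefault(toKey, 0f);
    }

    public static void main(String[] args) {
        List<FromRecord> fromCore = List.of(
                new FromRecord(1L, 2.5f, "customer1"),
                new FromRecord(2L, 9.0f, "customer2"),
                new FromRecord(3L, 1.0f, "customer1"));
        // Limiting query: only customer1's relevance data is joined.
        Map<Long, Float> joined =
                buildJoinMap(fromCore, r -> r.customer().equals("customer1"));
        System.out.println(value(joined, 1L)); // 2.5
        System.out.println(value(joined, 2L)); // 0.0, excluded by the limiting query
    }
}
```

The map built once per limiting query is the kind of structure the "join" SolrCache would hold, so repeated boosts with the same query can skip the rebuild.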
> >
> > To configure the vjoin, you must register the ValueSource plugin in the solrconfig.xml as follows:
> >
> > <valueSourceParser name="vjoin" class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />
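Going back to the bjoin memory figure quoted above (a max join key of 200,000,000 needing 25 MB), a minimal Java sketch of a bit-per-key join. It uses java.util.BitSet as a stand-in for Lucene's OpenBitSet (a swap, not what the contrib uses), and the class and method names are made up. Because the key itself is the bit address, memory follows the largest key rather than the number of keys, which is why keys have to be non-negative ints and are best packed at the low end.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

final class BitSetJoinSketch {

    // Bytes for a fixed-size, non-sparse bitset covering keys 0..maxKey:
    // one bit per possible key, rounded up to whole bytes.
    static long bitsetBytes(long maxKey) {
        return (maxKey + 1 + 7) / 8;
    }

    // Set one bit per key returned by the fromIndex search.
    // The key is used directly as the bit index, so it must be an int >= 0.
    static BitSet buildFromBits(int[] fromKeys) {
        BitSet bits = new BitSet();
        for (int key : fromKeys) {
            bits.set(key);
        }
        return bits;
    }

    // Keep a main-query document only if the bit for its "to" key is set.
    static int[] filterMainDocs(int[] toKeyByDoc, BitSet fromBits) {
        return IntStream.range(0, toKeyByDoc.length)
                .filter(doc -> fromBits.get(toKeyByDoc[doc]))
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(bitsetBytes(200_000_000L));
        BitSet bits = buildFromBits(new int[] {3, 7, 42});
        System.out.println(Arrays.toString(filterMainDocs(new int[] {1, 7, 3, 99, 42}, bits)));
    }
}
```

main prints 25000001 (about 25 MB, in line with the figure above) and then [1, 2, 4]. A membership test is a single bit lookup with no hashing or boxing, which is where the speed-up over the hash set would come from.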