Thanks Joel. I shall look into that.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa

On Mon, Jan 27, 2014 at 10:19 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Kranti,
>
> The memory leak in the bjoin dealt with the multi-value field joins, specifically how the new UninvertedIntField cache was used in the bjoin. In a quick review of the hjoin I'm not seeing the same issue but it would be good to confirm through testing.
>
> Joel
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
> On Mon, Jan 27, 2014 at 10:06 AM, Kranti Parisa <kranti.par...@gmail.com> wrote:
>
>> Is this also applicable to the hjoin?
>>
>> Thanks,
>> Kranti K. Parisa
>> http://www.linkedin.com/in/krantiparisa
>>
>> On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) <j...@apache.org> wrote:
>>
>>> [ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Joel Bernstein updated SOLR-4787:
>>> ---------------------------------
>>>
>>> Attachment: SOLR-4787.patch
>>>
>>> Resolved a memory leak when the bjoin is used with cache autowarming.
>>>
>>> > Join Contrib
>>> > ------------
>>> >
>>> > Key: SOLR-4787
>>> > URL: https://issues.apache.org/jira/browse/SOLR-4787
>>> > Project: Solr
>>> > Issue Type: New Feature
>>> > Components: search
>>> > Affects Versions: 4.2.1
>>> > Reporter: Joel Bernstein
>>> > Priority: Minor
>>> > Fix For: 4.7
>>> >
>>> > Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>>> >
>>> > This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag.
>>> > Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above.
>>> > *HashSetJoinQParserPlugin aka hjoin*
>>> > The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways.
>>> > The first difference is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from cores.
>>> > The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join.
>>> > The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query.
>>> > The hjoin supports the following features:
>>> > 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down.
>>> > 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The "threads" parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed thread pool with six threads to handle all hjoin requests. Once the thread pool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter.
>>> > 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance.
>>> > 4) Nested filter queries. The local parameter "fq" can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins.
>>> > 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache.
>>> > The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string "hjoin" rather than "join".
>>> > fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5
>>> > The example filter query above will search the fromIndex (collection2) for "user:customer1", applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the "from" field that will be used to filter the main query. Only records from the main query where the "to" field is present in the "from" list will be included in the results.
>>> > The solrconfig.xml in the main query core must contain the reference to the hjoin.
>>> > <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
>>> > And the join contrib lib jars must be registered in the solrconfig.xml.
>>> > <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
>>> > After issuing the "ant dist" command from inside the solr directory the joins contrib jar will appear in the solr/dist directory.
>>> > Place the solr-joins-4.*-.jar in the WEB-INF/lib directory of the solr web application. This will ensure that the top level Solr classloader loads these classes rather than the core's classloader.
>>> > *BitSetJoinQParserPlugin aka bjoin*
>>> > The bjoin behaves exactly like the hjoin but uses a BitSet instead of a HashSet to perform the underlying join. Because of this the bjoin is much faster and can provide sub-second response times on result sets of tens of millions of records from the fromIndex and hundreds of millions of records from the main query.
>>> > But there are limitations to how the bjoin can be used. The bjoin treats the join keys as addresses in a BitSet and uses the Lucene OpenBitSet implementation, which performs very well but is not sparse. So the BitSet memory is dictated by the size of the join keys. For example a bitset with a max join key of 200,000,000 will need 25 MB of memory. For this reason the BitSet join does not support long join keys. In order to keep memory usage down the join keys should also be packed at the low end, for example from 1 to 50,000,000.
>>> > Below is a sample bjoin:
>>> > fq={!bjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5
>>> > To register the bjoin the solrconfig.xml in the main query core must contain the reference to the bjoin.
>>> > <queryParser name="bjoin" class="org.apache.solr.joins.BitSetJoinQParserPlugin"/>
>>> > *ValueSourceJoinParserPlugin aka vjoin*
>>> > The third implementation is the ValueSourceJoinParserPlugin aka "vjoin". This implements a ValueSource function query that can return a value from a second core based on join keys and a limiting query. The limiting query can be used to select a specific subset of data from the join core. This allows customer specific relevance data to be stored in a separate core and then joined in the main query.
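As an inline illustration of the vjoin idea quoted just above, here is a minimal sketch in plain Java. It only mimics the concept (select a subset of the join core with a limiting query, then look values up by join key); it does not use the Solr ValueSource API, and every name in it (VjoinSketch, FromDoc, buildJoinMap, valueFor, the default value for unmatched keys) is hypothetical rather than taken from the SOLR-4787 patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch only: plain collections stand in for Solr cores.
// None of these names come from the SOLR-4787 patch.
final class VjoinSketch {

    // Hypothetical document from the join ("from") core.
    static final class FromDoc {
        final long key;     // join key
        final float value;  // value handed back to the main query
        final String user;  // field the limiting query filters on

        FromDoc(long key, float value, String user) {
            this.key = key;
            this.value = value;
            this.user = user;
        }
    }

    // Keep only documents accepted by the limiting query, indexed by join key.
    static Map<Long, Float> buildJoinMap(List<FromDoc> fromCore,
                                         Predicate<FromDoc> limitingQuery) {
        Map<Long, Float> joinMap = new HashMap<>();
        for (FromDoc doc : fromCore) {
            if (limitingQuery.test(doc)) {
                joinMap.put(doc.key, doc.value);
            }
        }
        return joinMap;
    }

    // Value for a main-query document with the given join key.
    // Returning a caller-supplied default for unmatched keys is an assumption.
    static float valueFor(Map<Long, Float> joinMap, long toKey, float defaultValue) {
        return joinMap.getOrDefault(toKey, defaultValue);
    }

    public static void main(String[] args) {
        List<FromDoc> fromCore = Arrays.asList(
            new FromDoc(1L, 2.5f, "customer1"),
            new FromDoc(2L, 9.0f, "customer2"));
        // The limiting query selects only customer1's relevance data.
        Map<Long, Float> joinMap =
            buildJoinMap(fromCore, doc -> doc.user.equals("customer1"));
        System.out.println(valueFor(joinMap, 1L, 0f));
        System.out.println(valueFor(joinMap, 2L, 0f));
    }
}
```

In the sketch the map is rebuilt on every call; the quoted description suggests the real plugin keeps its join structures in a SolrCache instead.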
>>> > The vjoin is called using the "vjoin" function query. For example:
>>> > bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
>>> > This example shows "vjoin" being called by the edismax boost function parameter. This example will return the "fromVal" from the "fromCore". The "fromKey" and "toKey" are used to link the records from the main query to the records in the "fromCore". The "query" is used to select a specific set of records to join with in fromCore.
>>> > Currently the fromKey and toKey must be longs but this will change in future versions. Like the pjoin, the "join" SolrCache is used to hold the join memory structures.
>>> > To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows:
>>> > <valueSourceParser name="vjoin" class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />
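
For anyone comparing the hjoin and bjoin descriptions quoted above, here is a rough sketch of the two strategies and of the bjoin memory arithmetic. It is only an illustration under stated assumptions: it uses java.util.HashSet and java.util.BitSet rather than whatever structures the patch actually uses (the issue text mentions Lucene's OpenBitSet), and the class and method names (JoinSketch, hashSetJoin, bitSetJoin, bytesForBits) are made up for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names are hypothetical, not from the patch.
final class JoinSketch {

    // hjoin-style: collect the "from" keys in a hash set, then keep the
    // "to" keys that are present in it.
    static List<Long> hashSetJoin(long[] fromKeys, long[] toKeys) {
        // Sized up front so java.util.HashSet (load factor 0.75) does not
        // resize while the from keys are added.
        Set<Long> keys = new HashSet<>((int) (fromKeys.length / 0.75f) + 1);
        for (long k : fromKeys) {
            keys.add(k);
        }
        List<Long> matches = new ArrayList<>();
        for (long k : toKeys) {
            if (keys.contains(k)) {
                matches.add(k);
            }
        }
        return matches;
    }

    // bjoin-style: treat each non-negative int key as a bit address.
    static List<Integer> bitSetJoin(int[] fromKeys, int[] toKeys) {
        BitSet bits = new BitSet();
        for (int k : fromKeys) {
            bits.set(k); // throws IndexOutOfBoundsException for negative keys
        }
        List<Integer> matches = new ArrayList<>();
        for (int k : toKeys) {
            if (bits.get(k)) {
                matches.add(k);
            }
        }
        return matches;
    }

    // A non-sparse bit set needs one bit per addressable key, so memory is
    // driven by the largest key rather than by the number of keys.
    static long bytesForBits(long numBits) {
        return (numBits + 7) / 8;
    }

    public static void main(String[] args) {
        System.out.println(hashSetJoin(new long[] {1, 5, 9}, new long[] {5, 6, 9, 10}));
        System.out.println(bitSetJoin(new int[] {1, 5, 9}, new int[] {5, 6, 9, 10}));
        // 200,000,000 bits / 8 = 25,000,000 bytes, the 25 MB figure quoted above.
        System.out.println(bytesForBits(200_000_000L));
    }
}
```

The arithmetic also shows why packing keys at the low end matters: a max key of 50,000,000 needs roughly a quarter of the memory of a max key of 200,000,000, regardless of how many keys are actually set.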