[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677082#comment-13677082
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/6/13 3:06 PM:
--

New patch. 


JoinValueSourceParserPlugin2 has been renamed the ValueSourceJoinParserPlugin. 
It also now supports long join keys only. This will be changed soon to support 
both int and long join keys.

A README.txt has been added which explains the setup.

Many other changes and as well that will be reflected in the ticket description.




  was (Author: joel.bernstein):
New patch. 


JoinValueSourceParserPlugin2 has been renamed the ValueSourcParserPlugin. It 
also now supports long join keys only. This will be changed soon to support 
both int and long join keys.

A README.txt has been added which explains the setup.

Many other changes and as well that will be reflected in the ticket description.



  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.2.1 tag. Because of changes 
> in the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *JoinValueSourceParserPlugin aka vjoin*
> The second implementation is the JoinValueSourceParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return values from a 
> second core based on join keys. This allows relevance data to be stored in a 
> separate core and then joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore".
> As with the "pjoin", both the fromKey and toKey must be integers. Also like 
> the pjoin, the "join" SolrCache is used to hold the join memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>  class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
> *JoinValueSourceParserPlugin2 aka vjoin2 aka Personalized ValueSource Join*
> vjoin2 supports "person

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677082#comment-13677082
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/6/13 3:07 PM:
--

New patch. 


JoinValueSourceParserPlugin2 has been renamed the ValueSourceJoinParserPlugin. 
It also now supports long join keys only. This will be changed soon to support 
both int and long join keys.

A README.txt has been added which explains the setup.

Many other changes as well that will be reflected in the ticket description.




  was (Author: joel.bernstein):
New patch. 


JoinValueSourceParserPlugin2 has been renamed the ValueSourceJoinParserPlugin. 
It also now supports long join keys only. This will be changed soon to support 
both int and long join keys.

A README.txt has been added which explains the setup.

Many other changes and as well that will be reflected in the ticket description.



  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.2.1 tag. Because of changes 
> in the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *JoinValueSourceParserPlugin aka vjoin*
> The second implementation is the JoinValueSourceParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return values from a 
> second core based on join keys. This allows relevance data to be stored in a 
> separate core and then joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore".
> As with the "pjoin", both the fromKey and toKey must be integers. Also like 
> the pjoin, the "join" SolrCache is used to hold the join memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>  class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
> *JoinValueSourceParserPlugin2 aka vjoin2 aka Personalized ValueSource Join*
> vjoin2 supports "perso

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677149#comment-13677149
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/6/13 3:39 PM:
--

Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. Java's array.sort() can sort a random array of integers much 
faster then the LongToInt hashmap impl can load keys.

For personalized relevance though this should be plenty because each user will 
have a custom set of data to join to. You don't want or need to have a 
relevance record for each document. If a document is not available in the 
relevance core, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.





  was (Author: joel.bernstein):
Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. Java's array.sort() can sort a random array of integers much 
faster then the LongToInt hashmap impl used by vjoin.

For personalized relevance though this should be plenty because each user will 
have a custom set of data to join to. You don't want or need to have a 
relevance record for each document. If a document is not available in the 
relevance core, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.




  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677149#comment-13677149
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/6/13 3:48 PM:
--

Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. vjoin needs a map or parallel arrays. Arrays.sort() can sort 
a random array of ints much faster then the LongToInt hashmap impl can load 
keys. The SorterTemplate was 8x slower at sorting parallel arrays then sorting 
a single array with Arrays.sort(). 

For personalized relevance a couple hundred thousand keys should be plenty 
because each user will have a custom set of data to join to. You don't want or 
need to have a relevance record for each document. If a document is not 
available in the relevance core, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.





  was (Author: joel.bernstein):
Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. Java's array.sort() can sort a random array of integers much 
faster then the LongToInt hashmap impl can load keys.

For personalized relevance though this should be plenty because each user will 
have a custom set of data to join to. You don't want or need to have a 
relevance record for each document. If a document is not available in the 
relevance core, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.




  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the r

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677149#comment-13677149
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/6/13 4:02 PM:
--

Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. vjoin needs a map or parallel arrays. Arrays.sort() can sort 
a random array of ints much faster then the LongToInt hashmap impl can load 
keys. The SorterTemplate was 8x slower at sorting parallel arrays then sorting 
a single array with Arrays.sort(). 

For personalized relevance a couple hundred thousand keys should be enough 
because each user will have a custom set of data to join to. You don't need to 
have a custom relevance record for each document. If a document is not 
available in the custom relevance set, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.





  was (Author: joel.bernstein):
Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. vjoin needs a map or parallel arrays. Arrays.sort() can sort 
a random array of ints much faster then the LongToInt hashmap impl can load 
keys. The SorterTemplate was 8x slower at sorting parallel arrays then sorting 
a single array with Arrays.sort(). 

For personalized relevance a couple hundred thousand keys should be plenty 
because each user will have a custom set of data to join to. You don't want or 
need to have a relevance record for each document. If a document is not 
available in the relevance core, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.




  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> fie

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-10 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677149#comment-13677149
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/10/13 2:21 PM:
---

Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
load 2-3 million key/pairs in around 200 milliseconds. 


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.





  was (Author: joel.bernstein):
Kranti,

The vjoin has two performance hotspots:

1) The creation of the HashMap for the hashjoin. My testing shows that it can 
support a couple hundred thousand keys before performance becomes an issue. The 
pjoin is much more scable in this area and can support millions of keys. The 
reason for this is that the pjoin only needs a sorted array to perform  binary 
searches against. vjoin needs a map or parallel arrays. Arrays.sort() can sort 
a random array of ints much faster then the LongToInt hashmap impl can load 
keys. The SorterTemplate was 8x slower at sorting parallel arrays then sorting 
a single array with Arrays.sort(). 

For personalized relevance a couple hundred thousand keys should be enough 
because each user will have a custom set of data to join to. You don't need to 
have a custom relevance record for each document. If a document is not 
available in the custom relevance set, vjoin will return a neutral boost of 1.


2) The hash key lookup each time the vjoin is called. This will be called for 
each document that is scored in the result set. This should scale to support 
result sets into the millions. I tested with 4,000,000 results and had 
excellent performance.




  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueS

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-10 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679527#comment-13679527
 ] 

Joel Bernstein edited comment on SOLR-4787 at 6/10/13 2:22 PM:
---

Switched from fastutil to hppc for the primitive map used for the hashjoin.

Also, further performance testing has shown that the performance on loading the 
LongInt hash map was much better then I initially thought. The vjoin will scale 
comfortably into the millions of keys as well.

  was (Author: joel.bernstein):
Switched from fastutil to hppc for the primitive map used for the hashjoin.

Also, further performance testing has shown that the performance on loading the 
LongInt hash map was much better then I initially thought. The hash joins will 
scale comfortably into the millions of keys as well.
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>  class="or

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-06-27 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694902#comment-13694902
 ] 

Kranti Parisa edited comment on SOLR-4787 at 6/27/13 5:40 PM:
--

if we have the following case

PARENT CORE

ID Field
--
p1
p2
p3

CHILD CORE

ID Field|   Parent ID field  |  ValueToFetch Field
-
c1 | p1  |   1234
c2  |p1   |  3456


Then ValueToFetch will be the last one (in this case: 3456)

May be we should think of doing 

1. Allow sort parameter to the vjoin function

2. consider the Sort when executing the query

3. for each PARENT ID, pick up the first ValueToFetch and ignore the rest (as 
we are specifying the sort preference, we are saying to collect the top 
document)

4. sort param could have multiple values like general solr sort

so vjoin will look like 
vjoin(joinCore, foreignKey, foreignVal, primaryKey, query, $vSort)

&vSort=field1 desc, field2 asc

5. if no sort param is specified then current implementation works (picking the 
the value from the last document)

Joel, your ideas? 

  was (Author: krantiparisa):
if we have the following case

PARENT CORE

ID Field
--
p1
p2
p3

CHILD CORE

ID Field  Parent ID field   ValueToFetch Field
-
c1  p1 1234
c2  p1 3456


Then ValueToFetch will be the last one (in this case: 3456)

May be we should think of doing 

1. Allow sort parameter to the vjoin function

2. consider the Sort when executing the query

3. for each PARENT ID, pick up the first ValueToFetch and ignore the rest (as 
we are specifying the sort preference, we are saying to collect the top 
document)

4. sort param could have multiple values like general solr sort

so vjoin will look like 
vjoin(joinCore, foreignKey, foreignVal, primaryKey, query, $vSort)

&vSort=field1 desc, field2 asc

5. if no sort param is specified then current implementation works (picking the 
the value from the last document)

Joel, your ideas? 
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contr

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-07-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715925#comment-13715925
 ] 

Kranti Parisa edited comment on SOLR-4787 at 7/23/13 12:39 AM:
---

Joel,

I wanted to try implementing pjoin for the following use case.

masterCore = 1M keys (id field, long)
childCore = 5M documents (with parentid field, long, whose values are equal to 
the values of id field in the masterCore

And syntax looks like
http://localhost:8180/solr/masterCore/select?q=title:a&fq=({!pjoin%20fromIndex=childCore%20from=parentid%20to=id%20v=$childQ})&childQ=(fieldOne:somevalue
 AND fieldTwo:[1 TO 100])

I am getting SyntaxError, but the same syntax works with normal "join". any 
ideas?

Also it seems currently pjoin supports only int keys, can you please update 
pjoin to allow long keys

-
Kranti



  was (Author: krantiparisa):
Joel,

I wanted to try implementing pjoin for the following use case.

masterCore = 1M keys (id field, long)
childCore = 5M documents (with parentid field, long, whose values are equal to 
the values of id field in the masterCore

And syntax looks like
http://localhost:8180/solr/masterCore/select?q=title:a&fq=({!pjoin%20fromIndex=childCore%20from=parentid%20to=id%20v=$childQ})&childQ=(fieldOne:somevalue
 AND fieldTwo:[1 TO 100])

I am getting SyntaxError, but the same syntax works with normal "join". any 
ideas?

Also it seems currently pjoin supports only int keys, can you please update 
pjoin to allow for longs

-
Kranti


  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called us

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-07-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715944#comment-13715944
 ] 

Kranti Parisa edited comment on SOLR-4787 at 7/23/13 12:59 AM:
---

I did review the PostFilterJoinQParserPlugin code and found that it was 
expecting "fromCore" instead of "fromIndex" (like in normal Join)

after trying "fromCore" now getting the expected error related to INT keys, as 
my keys are Long.

- 
Kranti

  was (Author: krantiparisa):
I did review the PostFilterJoinQParserPlugin code and found that it was 
expecting "fromCore" instead of "fromIndex" (like in normal Join)

after trying "fromCore" now getting the expected error related to INT keys, as 
my keys are Long.
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>  class="org.apache.solr.joins.ValueSourceJoinParserPlu

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-07-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715987#comment-13715987
 ] 

Kranti Parisa edited comment on SOLR-4787 at 7/23/13 2:53 AM:
--

Attached the patch file (SOLR-4787-pjoin-long-keys.patch) for 
PostFilterJoinQParserPlugin to support LONG keys

  was (Author: krantiparisa):
Update PostFilterJoinQParserPlugin to support LONG keys
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>  class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To u

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-10-25 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805485#comment-13805485
 ] 

Kranti Parisa edited comment on SOLR-4787 at 10/25/13 5:29 PM:
---

I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
fq=$BJoinQ,$AlocalFQ})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin like how solr supports multiple FQ params within the same request.

Any feedback for the syntax? is comma separated FQs (eg: 
*fq=$BJoinQ,$AlocalFQ*) sounds ok?


was (Author: krantiparisa):
I have recently extended the hjoin further to support multiple FQs separated by 
comma (,)

{code}
/masterCore/select?q=*:*&fq=({!hjoin fromIndex=ACore from=parentid to=id v=$aQ 
*fq=$BJoinQ,$AlocalFQ*})&aQ=(f1:false)&BJoinQ=({!hjoin fromIndex=BCore from=bid 
to=aid}tag:abc)&AlocalFQ=(fieldName:value)
{code}

This will allow using the filter caches for multiple nested queries while using 
the hjoin like how solr supports multiple FQ params within the same request.

Any feedback for the syntax? is comma separated FQs sounds ok?

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.6
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-08-21 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747147#comment-13747147
 ] 

David Smiley edited comment on SOLR-4787 at 8/22/13 1:53 AM:
-

Excellent suggestion [~krantiparisa] on supporting filter queries on the 
from-side of the join.  I implemented a custom join query and it has a 
from-side filter query feature.  It can help a great deal with performance.  It 
wasn't that hard to add, either.

  was (Author: dsmiley):
Excellent suggestion [~kranti2006] on supporting filter queries on the 
from-side of the join.  I implemented a custom join query and it has a 
from-side filter query feature.  It can help a great deal with performance.  It 
wasn't that hard to add, either.
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follo

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-08-21 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747170#comment-13747170
 ] 

Joel Bernstein edited comment on SOLR-4787 at 8/22/13 2:33 AM:
---

Yeah, that's the exact approach I took David.

This was added to support nested joins but I see how the caching could really 
speed up the whole join.



  was (Author: joel.bernstein):
Yeah, that's the exact approach I took David.


  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>  class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-08-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747175#comment-13747175
 ] 

Kranti Parisa edited comment on SOLR-4787 at 8/22/13 3:59 PM:
--

Nested Joins! That's exactly what I am trying :) and thought about adding fq to 
the solr joins.

Using local-param in the first join:{code} {!join fromIndex=a from=f1 to=f2 
v=$joinQ}&joinQ=(field:123 AND _query_={another join}){code}. So here "another 
join" could be passed as a FQ and it should get results faster!!

  was (Author: krantiparisa):
Nested Joins! That's exactly what I am trying :) and thought about adding 
fq to the solr joins.

Using local-param in the first join:{code} {!joinfromIndex=a from=f1 to=f2 
v=$joinQ}&joinQ=(field:123 AND _query_={another join}){code}. So here "another 
join" could be passed as a FQ and it should get results faster!!
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold th

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-08-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747175#comment-13747175
 ] 

Kranti Parisa edited comment on SOLR-4787 at 8/22/13 3:59 PM:
--

Nested Joins! That's exactly what I am trying :) and thought about adding fq to 
the solr joins.

Using local-param in the first join:{code} {!joinfromIndex=a from=f1 to=f2 
v=$joinQ}&joinQ=(field:123 AND _query_={another join}){code}. So here "another 
join" could be passed as a FQ and it should get results faster!!

  was (Author: krantiparisa):
Nested Joins! That's exactly what I am trying :) and thought about adding 
fq to the solr joins.

Using local-param in the first join: "v=$joinQ"&joinQ=(field:123 AND 
_query_={another join}). So here "another join" could be passed as a FQ and it 
should get results faster!!
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future 
> versions. Like the pjoin, the "join" SolrCache is used to hold the join 
> memory structures.
> To configure 

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-08-22 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747175#comment-13747175
 ] 

Kranti Parisa edited comment on SOLR-4787 at 8/22/13 9:21 PM:
--

Nested Joins! That's exactly what I am trying :) and thought about adding fq to 
the solr joins.

Using local-param in the first join:{code} {!join fromIndex=a from=f1 to=f2 
v=$joinQ}&joinQ=(field:123 AND _query_={another join}){code}. So here "another 
join" could be passed as a FQ and it should get results faster!! Hence the 
above, query would look like,

{code} {!join fromIndex=a from=f1 to=f2 v=$joinQ 
fq=$joinFQ}&joinQ=(field:123)&joinFQ={another join}{code}

  was (Author: krantiparisa):
Nested Joins! That's exactly what I am trying :) and thought about adding 
fq to the solr joins.

Using local-param in the first join:{code} {!join fromIndex=a from=f1 to=f2 
v=$joinQ}&joinQ=(field:123 AND _query_={another join}){code}. So here "another 
join" could be passed as a FQ and it should get results faster!!
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *ValueSourceJoinParserPlugin aka vjoin*
> The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return a value from a 
> second core based on join keys and limiting query. The limiting query can be 
> used to select a specific subset of data from the join core. This allows 
> customer specific relevance data to be stored in a separate core and then 
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore". The "query" is used to select a specific set of 
> records to join with in fromCo

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-08-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751206#comment-13751206
 ] 

Joel Bernstein edited comment on SOLR-4787 at 8/27/13 12:11 PM:


Right now there aren't really efficient memory structures to perform the joins 
on multi-value fields. Our best bet right now would be the SORTED_SET docValues 
described http://wiki.apache.org/solr/DocValues. But this is not really 
designed for integers or longs.

I think the best way to handle this is to fully normalize the data so that the 
join keys are single valued. Basically model the data the way you would in a 
relational database then use the nested joins to join the normalized indexes 
together. 

  was (Author: joel.bernstein):
Right now there aren't really efficient memory structures to perform the 
joins on multi-value fields. Our best bet right now would be the SORTED_SET 
docValues described http://wiki.apache.org/solr/DocValues. But this is not 
really designed for integers or longs.

I think the best way to handle is to fully normalize the data so that the join 
keys are single valued. Basically model the data the way you would in a 
relational database then use the nested joins to join the normalized indexes 
together. 
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hj

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-09-13 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766446#comment-13766446
 ] 

Joel Bernstein edited comment on SOLR-4787 at 9/13/13 12:46 PM:


Kranti, the bjoin now supports multi-value fields. I'll work on getting the 
patch up here today.

  was (Author: joel.bernstein):
Kranti, the bjoin now supports multi-value joins. I'll work on getting the 
patch up here today.
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
>  
>  
> *BitSetJoinQParserPlugin aka bj

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-09-13 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767361#comment-13767361
 ] 

Kranti Parisa edited comment on SOLR-4787 at 9/14/13 4:16 AM:
--

Something is missing in the Patch? I am seeing ByteArray compilation problem. 
Also does bjoin needs any specific types of field configs in schema.xml ?

  was (Author: krantiparisa):
Something missing in the Patch? I am seeing ByteArray compilation problem. 
Also does bjoin needs any specific types of field configs in schema.xml ?
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.HashSetJoinQPar

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-09-15 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768078#comment-13768078
 ] 

Kranti Parisa edited comment on SOLR-4787 at 9/16/13 5:38 AM:
--

I have implemented multi-value keys for hjoin using a new field 
UnIvertedLongField. Sanity checks looks good. Also tested with FQs (nested 
Joins). I will run some performance tests and prepare the patch sometime 
tomorrow.

  was (Author: krantiparisa):
I have implemented multi-value keys for hjoin using a new field 
UnIvertedLongField. Sanity checks looks good. Also tested with FQs (nested 
Joins). I will prepare a patch sometime tomorrow and post here.
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The so

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-02-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916095#comment-13916095
 ] 

Joel Bernstein edited comment on SOLR-4787 at 2/28/14 5:56 PM:
---

Hi Alexander,

This ticket has not been committed. There are two joins described on the list 
of QParserPlugins here:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers

Joel


was (Author: joel.bernstein):
Hi Alexander,

This ticket has not been committed. There are two joins described on the list 
of QParserPlugins here:

https://cwiki.apache.org/confluence/display/solr/Other+Parser

Joel

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in th

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-04-15 Thread Arul Kalaipandian (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969379#comment-13969379
 ] 

Arul Kalaipandian edited comment on SOLR-4787 at 4/15/14 9:44 AM:
--

New patch(SOLR-4787-with-testcase-fix.patch  for Solr-4.2.1) with following fix 
& improvement,

* Testcase thread leaks: SolrCore(fromcore) released on 'finally'  block.
*  'fromIndex' is optional as like the standard 'join', i.e we can do join 
across cores or in single core('self-join').
*  BitSetJoinQParserPlugin, field validation added(From & to fields must be an 
'int').
*  Basic testcases added.



was (Author: arulck):
New patch with following fix & improvement,

* Testcase thread leaks: SolrCore(fromcore) released on 'finally'  block.
*  'fromIndex' is optional as like the standard 'join', i.e we can do join 
across cores or in single core('self-join').
*  BitSetJoinQParserPlugin, field validation added(From & to fields must be an 
'int').
*  Basic testcases added.

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=grou

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-05-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650063#comment-13650063
 ] 

Joel Bernstein edited comment on SOLR-4787 at 5/6/13 8:16 PM:
--

Changed the BSearch class to use the SorterTemplate rather then 
Collections.sort. Much more efficient inplace sorting. SorterTemplate builds 
with Solr 4.2.1. Will need to get this working with trunk as well using the new 
Sorter class.

Thanks David and Adrien for tips on this.

Found major bug in my original logic for how segment level readers were being 
used between the join cores and fixed that as well.

  was (Author: joel.bernstein):
Changed the BSearch class to use the SorterTemplate rather then 
Collections.sort. Much more efficient inplace sorting. SorterTemplate builds 
with Solr 4.2.1. Will need to get this working with trunk as well using the new 
Sorter class.

Found major bug in my original logic for how segment level readers were being 
used between the join cores and fixed that as well.
  
> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.2.1
>
> Attachments: SOLR-4787.patch, SOLR-4787.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 2 join implementations. 
> The initial patch was generated from the Solr 4.2.1 tag. Because of changes 
> in the FieldCache API this patch will only build with Solr 4.2 or above.
> *PostFilterJoinQParserPlugin aka "pjoin"*
> The pjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the pjoin is designed to work with integer join keys 
> only. So, in order to use pjoin, integer join keys must be included in both 
> the to and from core.
> The second difference is that the pjoin builds memory structures that are 
> used to quickly connect the join keys. It also uses a custom SolrCache named 
> "join" to hold intermediate DocSets which are needed to build the join memory 
> structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
> perform the join.
> The main advantage of the pjoin is that it can scale to join millions of keys 
> between cores.
> Because it's a PostFilter, it only needs to join records that match the main 
> query.
> The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
> plugin is referenced by the string "pjoin" rather then "join".
> fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
> The example filter query above will search the fromCore (collection2) for 
> "user:customer1". This query will generate a list of values from the "from" 
> field that will be used to filter the main query. Only records from the main 
> query, where the "to" field is present in the "from" list will be included in 
> the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> pjoin.
>  class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
> And the join contrib jars must be registed in the solrconfig.xml.
> 
> The solrconfig.xml in the fromcore must have the "join" SolrCache configured.
> class="solr.LRUCache"
>   size="4096"
>   initialSize="1024"
>   />
> *JoinValueSourceParserPlugin aka vjoin*
> The second implementation is the JoinValueSourceParserPlugin aka "vjoin". 
> This implements a ValueSource function query that can return values from a 
> second core based on join keys. This allows relevance data to be stored in a 
> separate core and then joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey)
> This example shows "vjoin" being called by the edismax boost function 
> parameter. This example will return the "fromVal" from the "fromCore". The 
> "fromKey" and "toKey" are used to link the records from the main query to the 
> records in the "fromCore".
> As with the "pjoin", both the fromKey and toKey must be integers. Also like 
> the pjoin, the "join" SolrCache is used to hold the join memory structures.
> To configure the vjoin you must register the ValueSource plugin in the 
> solrconfig.xml as follows:
>   class="org.apache.solr.joins.JoinValueSourceParserPlugin" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact yo

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-03-05 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920798#comment-13920798
 ] 

Alexander S. edited comment on SOLR-4787 at 3/5/14 12:36 PM:
-

Hi Joel, thanks, I seems need to perform a nested join inside a single 
collection, but need fq inside join as it is shown here: 
https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a field type which determines the kind of 
document. 3 types of documents: Profile, Site, and SiteSource.
When searching for Profiles I have to look in SiteSource content, so I need 
something like this:
{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → 
Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # 
Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = "type:Site"
joinFilter2 = "type:SiteSource"
{code}

Right now this works only partially, fq inside \{!join\} is ignored.
When to expect this patch to be merged? Also, will it work in the way I've 
explained or do I understand it wrong?

Thank you,
Alex


was (Author: aheaven):
Hi Joel, thanks, I seems need to perform a nested join inside a single 
collection, but need fq inside join as it is shown here: 
https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a field type which determines the kind of 
document. 3 types of documents: Profile, Site, and SiteSource.
When searching for Profiles I have to look in SiteSource content, so I need 
something like this:
{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → 
Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # 
Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = "type:Site"
joinFilter2 = "type:SiteSource"
{code}

Right now this works only partially, fq inside {!join} is ignored.
When to expect this patch to be merged? Also, will it work in the way I've 
explained or do I understand it wrong?

Thank you,
Alex

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six 

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-03-05 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921113#comment-13921113
 ] 

Kranti Parisa edited comment on SOLR-4787 at 3/5/14 5:58 PM:
-

Alex,

You may try the Patch 
(https://issues.apache.org/jira/secure/attachment/12632860/SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch)
 for Nested Joins.




was (Author: krantiparisa):
support nested joins

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.7
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib li

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-03-19 Thread Gopal Patwa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941359#comment-13941359
 ] 

Gopal Patwa edited comment on SOLR-4787 at 3/20/14 4:54 AM:


I am trying to use this patch (27/Jan/14 12:26) using hjoin and mismatch type 
issue was resolved it was my bad, I had join id with different type.

Is it possible to collect data from hjoin collection i.e fromIndex and append 
to main query result? In my usecase I need to use hjoin and also show fields 
from fromIndex.

 



was (Author: gpatwa):
I am trying to use this patch (27/Jan/14 12:26) but getting below exception 
using hjoin

Solr 4.6

Schema.xml id field between parent and child


Caused by: java.lang.IllegalStateException: Type mismatch: id was indexed as 
SORTED
at 
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:896)
at 
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:875)
at 
org.apache.solr.joins.HashSetJoinQParserPlugin$LongJoinRunner.(HashSetJoinQParserPlugin.java:504)
at 
org.apache.solr.joins.HashSetJoinQParserPlugin$LongHashSetJoinQuery.createWeight(HashSetJoinQParserPlugin.java:313)

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
>

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-04-07 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961733#comment-13961733
 ] 

Alexander S. edited comment on SOLR-4787 at 4/7/14 9:10 AM:


@Kranti Parisa, hi, any luck with this?


was (Author: aheaven):
@Kranti Parisa, hi, any lick with this?

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> then the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then the you can avoid hashset resizing which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather then "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
>  class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars must be registed in the solrconfig.xml.
>  
> After issuing the "ant dist" command from inside the so

Re: [jira] [Comment Edited] (SOLR-4787) Join Contrib

2013-09-14 Thread Erick Erickson
DId you apply to trunk (future 5.0) or the 4x branch? Work is often
done on the trunk and backported to 4x, don't know what's up with
this particular patch though.

Best,
Erick


On Sat, Sep 14, 2013 at 12:17 AM, Kranti Parisa (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767361#comment-13767361]
>
> Kranti Parisa edited comment on SOLR-4787 at 9/14/13 4:16 AM:
> --
>
> Something is missing in the Patch? I am seeing ByteArray compilation
> problem. Also does bjoin needs any specific types of field configs in
> schema.xml ?
>
>   was (Author: krantiparisa):
> Something missing in the Patch? I am seeing ByteArray compilation
> problem. Also does bjoin needs any specific types of field configs in
> schema.xml ?
>
> > Join Contrib
> > 
> >
> > Key: SOLR-4787
> > URL: https://issues.apache.org/jira/browse/SOLR-4787
> > Project: Solr
> >  Issue Type: New Feature
> >  Components: search
> >Affects Versions: 4.2.1
> >Reporter: Joel Bernstein
> >Priority: Minor
> > Fix For: 4.5, 5.0
> >
> > Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch,
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4787-pjoin-long-keys.patch
> >
> >
> > This contrib provides a place where different join implementations can
> be contributed to Solr. This contrib currently includes 3 join
> implementations. The initial patch was generated from the Solr 4.3 tag.
> Because of changes in the FieldCache API this patch will only build with
> Solr 4.2 or above.
> > *HashSetJoinQParserPlugin aka hjoin*
> > The hjoin provides a join implementation that filters results in one
> core based on the results of a search in another core. This is similar in
> functionality to the JoinQParserPlugin but the implementation differs in a
> couple of important ways.
> > The first way is that the hjoin is designed to work with int and long
> join keys only. So, in order to use hjoin, int or long join keys must be
> included in both the to and from core.
> > The second difference is that the hjoin builds memory structures that
> are used to quickly connect the join keys. So, the hjoin will need more
> memory then the JoinQParserPlugin to perform the join.
> > The main advantage of the hjoin is that it can scale to join millions of
> keys between cores and provide sub-second response time. The hjoin should
> work well with up to two million results from the fromIndex and tens of
> millions of results from the main query.
> > The hjoin supports the following features:
> > 1) Both lucene query and PostFilter implementations. A *"cost"* > 99
> will turn on the PostFilter. The PostFilter will typically outperform the
> Lucene query when the main query results have been narrowed down.
> > 2) With the lucene query implementation there is an option to build the
> filter with threads. This can greatly improve the performance of the query
> if the main query index is very large. The "threads" parameter turns on
> threading. For example *threads=6* will use 6 threads to build the filter.
> This will setup a fixed threadpool with six threads to handle all hjoin
> requests. Once the threadpool is created the hjoin will always use it to
> build the filter. Threading does not come into play with the PostFilter.
> > 3) The *size* local parameter can be used to set the initial size of the
> hashset used to perform the join. If this is set above the number of
> results from the fromIndex then the you can avoid hashset resizing which
> improves performance.
> > 4) Nested filter queries. The local parameter "fq" can be used to nest a
> filter query within the join. The nested fq will filter the results of the
> join query. This can point to another join to support nested joins.
> > 5) Full caching support for the lucene query implementation. The
> filterCache and queryResultCache should work properly even with deep
> nesting of joins. Only the queryResultCache comes into play with the
> PostFilter implementation because PostFilters are not cacheable in the
> filterCache.
> > The syntax of the hjoin is similar to the JoinQParserPlugin except that
> the plugin is referenced by the string "hjoin" rather then "join".
> > fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6
> fq=$qq\}user:customer1&qq=group:5
> > The example filter query above will search the fromIndex (collection2)
> for "user:customer1" applying the local fq parameter to filter the results.
> The lucene filter query will be built using 6 threads. This query will
> generate a list of values from the "from" field that will be used to filter
> the main query.