Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Mikhail Khludnev
On Fri, Dec 5, 2014 at 10:44 PM, Darin Amos dari...@gmail.com wrote:

 public Scorer scorer(){
 TermsWithScoreCollector collector = new
 TermsWithScoreCollector();
 JoinQuery.this.s.search(JoinQuery.this.q,
 collector);

 //do the rest..

 }


Darin,
I can hardly follow, but this approach is either inefficient or does not
work at all. In general a join is an O(n^2) operation, which most
implementations try to reduce. Weight.scorer() is invoked once per segment,
and the scorer yields results only from that particular segment. However,
fromQuery has to run across all segments. Hence TermsWithScoreCollector
would collect the IDs globally again and again, once per segment.
As you can see, the current JoinUtil design is much more efficient: it
reuses one global hash of IDs across the searches of all "to" segments.
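
The cost difference can be illustrated with a minimal sketch in plain Java. No Lucene classes are involved: the class PerSegmentJoinCost, its methods, and the "segments as lists of join keys" model are all hypothetical stand-ins, chosen only to count how often the "from" side gets collected in each design.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: each segment is just a list of join keys.
class PerSegmentJoinCost {
    static int fromQueryRuns = 0;

    // Stand-in for running fromQuery with a terms collector over ALL segments.
    static Set<String> collectFromTerms(List<List<String>> segments) {
        fromQueryRuns++;
        Set<String> terms = new HashSet<>();
        for (List<String> segment : segments) {
            terms.addAll(segment);
        }
        return terms;
    }

    // Collection placed inside scorer(): repeated for every segment.
    static int collectInsideScorer(List<List<String>> segments) {
        fromQueryRuns = 0;
        for (List<String> segment : segments) {
            Set<String> terms = collectFromTerms(segments); // global work, per segment
        }
        return fromQueryRuns;
    }

    // Collection done once up front: every per-segment pass reuses the same set.
    static int collectOnce(List<List<String>> segments) {
        fromQueryRuns = 0;
        Set<String> shared = collectFromTerms(segments);
        int matches = 0;
        for (List<String> segment : segments) {
            for (String key : segment) {
                if (shared.contains(key)) matches++; // per-segment pass only reads
            }
        }
        return fromQueryRuns;
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(List.of("a"), List.of("b"), List.of("c"));
        System.out.println(collectInsideScorer(segments)); // prints 3
        System.out.println(collectOnce(segments));         // prints 1
    }
}
```

With S segments the first variant performs S full collections, while the second performs one, which is the point made above about reusing the global hash.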


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Michael Sokolov
I get the impression there was a concern that the caller could hold on 
to the query generated by JoinUtil for too long - eg across requests in 
Solr. I'm not sure why the OP thinks that would happen, though.


-Mike


Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Mikhail Khludnev
On Mon, Dec 8, 2014 at 5:38 PM, Michael Sokolov 
msoko...@safaribooksonline.com wrote:

 I get the impression there was a concern that the caller could hold on to
 the query generated by JoinUtil for too long - eg across requests in Solr.

Michael, if you are still concerned about this, SOLR-6234
https://issues.apache.org/jira/browse/SOLR-6234 is free from this issue.
The cache keys (Queries) are fairly small and GC-friendly.


 I'm not sure why the OP thinks that would happen, though.

Could you please expand OP? I didn't get it.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Darin Amos
Hi Mikhail,

I was merely posing a thought in an effort to continue to learn and educate 
myself. Your point about Weight.scorer() being called per segment helps my 
understanding. I am in the middle of building a POC for a customer of mine that 
I pointed out in this thread on Dec 5th (shortly after noon). I have spent 
countless hours over the weekend continuing to try and learn the internals of 
SOLR and Lucene.

Thanks

Darin




Re: Anti-Pattern in lucent-join jar?

2014-12-08 Thread Michael Sokolov


Right - allowing Solr to manage these queries (SOLR-6234) seems like the 
way to go


 ... OP == original poster (I lost track of who started the discussion)


-Mike


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Mikhail Khludnev
Thanks Roman! Let's expand on it for the sake of completeness.
Such an issue is not possible in Solr, because caches are associated with
the searcher. As long as you follow this design (see Solr's userCache) and
never update an entry once it is cached, there is no chance of shooting
yourself in the foot.
There were a few caches inside Lucene (the old FieldCache,
CachingWrapperFilter, ExternalFileField, etc.), but they are properly keyed
on segments, which excludes such leakage across different searchers.
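
To illustrate the "caches are associated with the searcher" design, here is a minimal sketch in plain Java. It is not Solr's cache implementation: PerSearcherCache and its get method are hypothetical, and any object can play the role of the searcher. The only idea it shows is that a cached entry is looked up by searcher first, so a newly opened searcher never sees entries computed for an older one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical cache: one inner map per searcher, dropped when the searcher is unreachable.
class PerSearcherCache<S, K, V> {
    // Weak keys, so entries for a closed and unreferenced searcher can be garbage collected.
    // Cached values must not hold a strong reference back to the searcher.
    private final Map<S, Map<K, V>> bySearcher = new WeakHashMap<>();

    synchronized V get(S searcher, K query, Function<K, V> compute) {
        return bySearcher
                .computeIfAbsent(searcher, s -> new HashMap<>())
                .computeIfAbsent(query, compute); // computed once per (searcher, query)
    }

    public static void main(String[] args) {
        PerSearcherCache<Object, String, Integer> cache = new PerSearcherCache<>();
        Object oldSearcher = new Object();
        Object newSearcher = new Object();
        int[] computations = {0};

        cache.get(oldSearcher, "type:child", q -> ++computations[0]);
        cache.get(oldSearcher, "type:child", q -> ++computations[0]); // cache hit
        cache.get(newSearcher, "type:child", q -> ++computations[0]); // recomputed

        System.out.println(computations[0]); // prints 2
    }
}
```

Entries are written once and never updated, which matches the "don't update what's cached" condition above.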


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
Hi Mikhail, I think you are right, it won't be a problem for SOLR, but it is
likely an antipattern inside a Lucene component, because custom components
may create join queries, hold on to them, and then execute them much later
against a different searcher. One approach would be to postpone term
collection until the query actually runs. I looked far and wide for an
appropriate place, but only found createWeight() - but at least it gives
developers NO opportunity to shoot themselves in the foot! ;-)
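
The idea of postponing collection can be sketched in plain Java as follows. This is not the code behind the link below and it uses no Lucene types: LazyJoinQuery, JoinWeight, and matches are hypothetical names, and a "searcher" is modelled as whatever object the collection function accepts. The sketch only shows that terms are gathered when the weight is created, against the searcher that actually runs the query.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical query that holds no collected terms of its own.
class LazyJoinQuery<S> {
    // Stand-in for "run fromQuery on this searcher and collect the join terms".
    private final Function<S, Set<String>> collectTerms;

    LazyJoinQuery(Function<S, Set<String>> collectTerms) {
        this.collectTerms = collectTerms;
    }

    // Stand-in for createWeight(searcher): called once per search, not once per segment.
    JoinWeight createWeight(S searcher) {
        return new JoinWeight(collectTerms.apply(searcher));
    }

    static final class JoinWeight {
        private final Set<String> terms;

        JoinWeight(Set<String> terms) {
            this.terms = terms;
        }

        // Stand-in for per-segment scoring: only reads the terms collected above.
        boolean matches(String toFieldValue) {
            return terms.contains(toFieldValue);
        }
    }

    public static void main(String[] args) {
        // Each "searcher" here is simply the set of join keys visible to it.
        Set<String> oldSearcher = Set.of("1");
        Set<String> newSearcher = Set.of("1", "2");
        LazyJoinQuery<Set<String>> query = new LazyJoinQuery<>(s -> new HashSet<>(s));

        System.out.println(query.createWeight(oldSearcher).matches("2")); // prints false
        System.out.println(query.createWeight(newSearcher).matches("2")); // prints true
    }
}
```

Because the query object itself carries no collected state, holding on to it and running it later against a different searcher cannot return stale terms.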

Since it may serve as an inspiration to someone, here is a link:
https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101

roman


Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Darin Amos
Couldn’t you just keep passing the wrapped query and searcher down to 
Weight.scorer()?

This would allow you to wait until the query is executed to do term collection. 
If you want to protect against creating and executing the query with different 
searchers, you would have to make the query factory (or constructor) only 
visible to the query parser or parser plugin?

I might not have followed you; this discussion challenges my understanding of 
Lucene and SOLR.

Darin




Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
Not sure I understand. It is the searcher which executes the query, so how
would you 'convince' it to pass the query? First the Weight is created, then
the weight instance creates the scorer - you would have to change the API to
do the passing (or maybe not...?)
In my case, the relationships were across index segments, so I had to
collect them first - but in some other situations, when you look only at
the data inside one index segment, it _might_ be better to wait.




Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Darin Amos
In this case I was thinking about something like the following.. if you changed 
the Query implementation or created your own similar query:

If you consider this query: q={!scorejoin from=parent to=id}type:child

public class ScoreJoinQuery extends Query {

    private Query q = null;          // the wrapped "from" query
    private IndexSearcher s = null;  // the searcher used to run it

    public ScoreJoinQuery(Query q, IndexSearcher s) {
        this.q = q;   // this is the term query type:child
        this.s = s;
    }

    .
    .
    .
    public Weight createWeight(…..) {
        return new Weight() {
            .
            .
            .
            public Scorer scorer() {
                // term collection is deferred until the query is executed
                TermsWithScoreCollector collector = new TermsWithScoreCollector();
                ScoreJoinQuery.this.s.search(ScoreJoinQuery.this.q, collector);

                //do the rest..

            }

        };
    }
}

This is what I was thinking in my head…. but I don’t really believe it offers 
any value above how the scorejoin query works today.




Re: Anti-Pattern in lucent-join jar?

2014-12-04 Thread Mikhail Khludnev
Hello,

I wonder if you have seen https://issues.apache.org/jira/browse/SOLR-6234,
which solves this problem.
The queryResultCache is useless for joins, because it carries cropped
results. Potentially you can hit the filter cache by wrapping fromQuery into
this monster bridge:
new FilteredQuery(new MatchAllDocsQuery(),
filterCache.get(fromQuery).getTopFilter())
However, you refer to TermsWithScoreCollector, and filterCache doesn't
store scores.
fromQuery is usually not a hotspot for JoinQuery (I spoke about it at the
last LuceneRevolution).
FWIW, it's common to have heavy processing at the Lucene level, e.g. see
RangeQuery. The idea is to cache the result of query execution (but not the
intermediate data) at the levels above, as is done by Solr's filterCache or
queryResultCache.
Hope it helps


On Thu, Dec 4, 2014 at 6:49 PM, Darin Amos dari...@gmail.com wrote:

 Hello All,

 I have been doing a lot of research in building some custom queries and I
 have been looking at the Lucene Join library as a reference. I noticed
 something that I believe could actually have a negative side effect.

 Specifically I was looking at the JoinUtil.createJoinQuery(…) method and
 within that method you see the following code:

 TermsWithScoreCollector termsWithScoreCollector =
 TermsWithScoreCollector.create(fromField,
 multipleValuesPerDocument, scoreMode);
 fromSearcher.search(fromQuery, termsWithScoreCollector);

 As you can see, when the JoinQuery is being built, the code is executing
 the query that it wraps with its own collector to collect all the scores.
 If I were to write a query parser using this library (which someone has
 done here), doesn’t this reduce the benefit of the SOLR query cache? The
 wrapped query is being executed when the Join Query is being constructed,
 not when it is executed.

 Thanks

 Darin





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Anti-Pattern in lucent-join jar?

2014-12-04 Thread Roman Chyla
+1, additionally (as follows from your observation) the query can get
out of sync with the index if, e.g., it was saved for later use and run
against a newly opened searcher

Roman