RE: Reduce QueryComponent prepare time

2012-11-21 Thread Markus Jelsma
Hi Mikhail,

Thanks for sharing your experiences. I'll look into the flexible query parser.

Markus
 
 

RE: Reduce QueryComponent prepare time

2012-11-20 Thread Markus Jelsma
Hi,

Profiling pointed me directly to the method I already suspected:
ExtendedDismaxQParser.parse(). I added manual timers to parts of the method and
made sure the timers add up to the QueryComponent prepare time. After starting
Solr there's one small part taking almost 100ms on a fast machine with lots of
memory; fortunately this happens only once. KStemmer, the loading of the
KStemData and the ThaiWordFilter's init take the bulk of it.

  ExtendedSolrQueryParser up =
    new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
  up.addAlias(IMPOSSIBLE_FIELD_NAME,
    tiebreaker, queryFields);
  addAliasesFromRequest(up, tiebreaker);
  up.setPhraseSlop(qslop); // slop for explicit user phrase queries
  up.setAllowLeadingWildcard(true);
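
A minimal sketch of the kind of manual timing described above, assuming each part of the method can be wrapped in a Runnable. SectionTimer and its methods are hypothetical helpers written for illustration, not Solr classes; the point is only that the per-section times should add up to the total being investigated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Accumulates elapsed time per named section (hypothetical helper, not part of Solr). */
final class SectionTimer {
  private final Map<String, Long> nanos = new LinkedHashMap<>();

  /** Runs the task and adds its elapsed time to the named section. */
  void time(String section, Runnable task) {
    long start = System.nanoTime();
    try {
      task.run();
    } finally {
      nanos.merge(section, System.nanoTime() - start, Long::sum);
    }
  }

  /** Sum over all sections; compare this with the reported prepare time. */
  long totalNanos() {
    long total = 0;
    for (long n : nanos.values()) {
      total += n;
    }
    return total;
  }

  Map<String, Long> sections() {
    return nanos;
  }

  public static void main(String[] args) {
    SectionTimer timer = new SectionTimer();
    // The section names and the busy loops are placeholders for the real code paths.
    timer.time("parse user query", () -> { for (int i = 0; i < 1_000_000; i++) { Math.sqrt(i); } });
    timer.time("phrase fields", () -> { for (int i = 0; i < 2_000_000; i++) { Math.sqrt(i); } });
    for (Map.Entry<String, Long> e : timer.sections().entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue() / 1_000_000.0 + " ms");
    }
    System.out.println("total: " + timer.totalNanos() / 1_000_000.0 + " ms");
  }
}
```

If the total is well below the prepare time Solr reports, some part of the method is still untimed.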

Once it has been running for some time, two parts continue to take a lot of
time: parsing the query

  if (parsedUserQuery == null) {
    sb = new StringBuilder();
    for (Clause clause : clauses) {

      [...]

    if (parsedUserQuery instanceof BooleanQuery) {
      BooleanQuery t = new BooleanQuery();
      SolrPluginUtils.flattenBooleanQuery(t, (BooleanQuery)parsedUserQuery);
      SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
      parsedUserQuery = t;
    }
  }

and handling the phrase fields (pf, pf2, pf3):

  if (allPhraseFields.size() > 0) {
    // full phrase and shingles
    for (FieldParams phraseField: allPhraseFields) {
      Map<String,Float> pf = new HashMap<String,Float>(1);
      pf.put(phraseField.getField(),phraseField.getBoost());
      addShingledPhraseQueries(query, normalClauses, pf,
          phraseField.getWordGrams(),tiebreaker, phraseField.getSlop());
    }
  }

The problem is significant when there are a lot of fields: the prepare time is
usually higher than the process times of query, highlight and facet combined.
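
A rough back-of-the-envelope sketch of why the phrase-field loop grows with the number of fields. It assumes that a word-gram size of 0 means one full-phrase query, that a size of k produces one query per sliding window of k terms, and that every field is listed in pf, pf2 and pf3. ShingleCount, the five-term query and the hundred fields are illustrative assumptions, not measurements from this cluster.

```java
/** Estimates how many phrase queries the pf/pf2/pf3 loop has to build (illustrative only). */
final class ShingleCount {

  /** Sliding windows of wordGrams terms over termCount terms; 0 stands for the full phrase. */
  static int windows(int termCount, int wordGrams) {
    if (termCount < 2) {
      return 0; // assumption: a single term yields no phrase query
    }
    if (wordGrams == 0) {
      return 1; // full phrase (pf)
    }
    return Math.max(0, termCount - wordGrams + 1);
  }

  /** Phrase queries built when each of the given fields is used with every word-gram size. */
  static int phraseQueries(int termCount, int fields, int... wordGramSizes) {
    int perField = 0;
    for (int grams : wordGramSizes) {
      perField += windows(termCount, grams);
    }
    return perField * fields;
  }

  public static void main(String[] args) {
    // 5 terms, 100 fields, pf (0), pf2 (2), pf3 (3): (1 + 4 + 3) * 100
    System.out.println(phraseQueries(5, 100, 0, 2, 3)); // prints 800
  }
}
```

Under these assumptions the work is linear in the number of fields and in the query length, which fits the observation that it dominates once about a hundred fields are involved.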


 

Re: Reduce QueryComponent prepare time

2012-11-20 Thread Mikhail Khludnev
Markus,

It seems you've run into the challenge of optimizing the complex eDisMax code for
your particular use case, which is not so common. I can't help with that
coding, but I can share some experience: we have mind-blowing queries too;
they span many fields and enumerate many phrase shingles. We have a similar
counterintuitive hot spot: query parsing takes more time than searching and
faceting. In our case, though, dictionary lookups, i.e. term substitutions
and transformations, are the main CPU consumers. We built our own query
parser with something like
http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html.
This approach, where you represent the core query structure as a skeleton of
DOM-like nodes and then transform it into particular query instances, *might
be more performant* (and *might not be* for you) than the current eDismax.
Nothing more useful from me.
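
A minimal sketch of the node-skeleton idea, assuming the user query is parsed once into a field-independent tree that is then rendered per field. QuerySkeleton, Node, Term and AnyOf are hypothetical names; this does not use the Lucene flexible query parser API, and it renders plain strings where a real parser would build query objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Parses the user query once, then expands the same skeleton for every field. */
final class QuerySkeleton {

  /** Field-independent node of the parsed user query. */
  interface Node {
    String render(String field);
  }

  static final class Term implements Node {
    private final String text;
    Term(String text) { this.text = text; }
    @Override public String render(String field) { return field + ":" + text; }
  }

  static final class AnyOf implements Node {
    private final List<Node> children;
    AnyOf(List<Node> children) { this.children = children; }
    @Override public String render(String field) {
      List<String> parts = new ArrayList<>();
      for (Node child : children) {
        parts.add(child.render(field));
      }
      return "(" + String.join(" OR ", parts) + ")";
    }
  }

  /** The only parsing step: whitespace-separated terms (a deliberate simplification). */
  static Node parse(String userQuery) {
    List<Node> terms = new ArrayList<>();
    for (String word : userQuery.trim().split("\\s+")) {
      if (!word.isEmpty()) {
        terms.add(new Term(word));
      }
    }
    return new AnyOf(terms);
  }

  /** Reuses the parsed skeleton for each field instead of re-parsing per field. */
  static List<String> expand(Node skeleton, List<String> fields) {
    List<String> out = new ArrayList<>();
    for (String field : fields) {
      out.add(skeleton.render(field));
    }
    return out;
  }

  public static void main(String[] args) {
    Node skeleton = parse("solr query");
    for (String q : expand(skeleton, Arrays.asList("title", "body"))) {
      System.out.println(q);
    }
    // prints:
    // (title:solr OR title:query)
    // (body:solr OR body:query)
  }
}
```

The design choice is that the per-field work is a cheap tree walk, while the string parsing happens once per request.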

Bye.



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com

RE: Reduce QueryComponent prepare time

2012-11-19 Thread Markus Jelsma
I'd also like to know which parts of the entire query constitute the prepare 
time and if it would matter significantly if we extend the edismax plugin and 
hardcode the parameters we pass into (reusable) objects.
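
A minimal sketch of what passing parameters into reusable objects could look like, assuming the field lists arrive as "field^boost" strings that rarely change between requests. FieldBoostCache is a hypothetical helper, not an existing Solr class; whether this saves a noticeable share of the prepare time would have to be measured.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Parses each distinct field/boost spec once and hands out the same immutable map afterwards. */
final class FieldBoostCache {
  private final ConcurrentHashMap<String, Map<String, Float>> cache = new ConcurrentHashMap<>();

  /** Parses specs such as "title^2 body"; fields without a boost get 1.0. */
  static Map<String, Float> parse(String spec) {
    Map<String, Float> boosts = new LinkedHashMap<>();
    for (String part : spec.trim().split("\\s+")) {
      if (part.isEmpty()) {
        continue;
      }
      int caret = part.indexOf('^');
      if (caret < 0) {
        boosts.put(part, 1.0f);
      } else {
        boosts.put(part.substring(0, caret), Float.parseFloat(part.substring(caret + 1)));
      }
    }
    return Collections.unmodifiableMap(boosts);
  }

  /** Returns the cached map for this spec, parsing it only on first use. */
  Map<String, Float> get(String spec) {
    return cache.computeIfAbsent(spec, FieldBoostCache::parse);
  }

  public static void main(String[] args) {
    FieldBoostCache cache = new FieldBoostCache();
    Map<String, Float> first = cache.get("title^2 body");
    Map<String, Float> second = cache.get("title^2 body");
    System.out.println(first);           // prints {title=2.0, body=1.0}
    System.out.println(first == second); // prints true
  }
}
```

The maps are unmodifiable so that sharing them across concurrent requests is safe.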

Thanks,
Markus
 


Re: Reduce QueryComponent prepare time

2012-11-19 Thread Mikhail Khludnev
Markus,

It's hard to suggest anything until you provide a profiler snapshot that
shows where prepare spends its time. As far as I know, prepare parses the
queries; e.g. we have really heavy query parsers, but I don't think that's
very common.



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Reduce QueryComponent prepare time

2012-11-16 Thread Markus Jelsma
Hi,

We're seeing high prepare times for the QueryComponent, obviously due to the
vast number of fields and queries. It's common to have a prepare time of 70-80ms
while the process times drop significantly due to warmed searchers, OS cache
etc. The prepare time is a recurring issue and I hope there are people
here who can share some thoughts or hints.

We're using a recent checkout on a 10-node test cluster with SSDs (although
this is not an IO issue) and edismax on about a hundred different fields; this
includes phrase searches over most of those fields and SpanFirst queries on
about 25 fields. We'd like to see how we can avoid doing the same prepare
procedure over and over again ;)

Thanks,
Markus