RE: Reduce QueryComponent prepare time
From: Markus Jelsma markus.jel...@openindex.io

Hi Mikhail,

Thanks for sharing your experiences. I'll look into the flexible query parser.

Markus
RE: Reduce QueryComponent prepare time
From: Markus Jelsma markus.jel...@openindex.io
Sent: Tue, Nov 20, 2012 at 7:01 PM

Hi,

Profiling pointed me directly to the method I already suspected: ExtendedDismaxQParser.parse(). I added manual timers to parts of the method and made sure the timers add up to the QueryComponent prepare time. After starting Solr, one small part takes almost 100ms on a fast machine with lots of memory; fortunately this happens only once. KStemmer, the loading of the KStemData, and the ThaiWordFilter's init take the bulk of it.

    ExtendedSolrQueryParser up = new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
    up.addAlias(IMPOSSIBLE_FIELD_NAME, tiebreaker, queryFields);
    addAliasesFromRequest(up, tiebreaker);
    up.setPhraseSlop(qslop); // slop for explicit user phrase queries
    up.setAllowLeadingWildcard(true);

After it has been running for some time, two parts continue to take a lot of time. The first is parsing the query:

    if (parsedUserQuery == null) {
      sb = new StringBuilder();
      for (Clause clause : clauses) {
        if (parsedUserQuery instanceof BooleanQuery) {
          BooleanQuery t = new BooleanQuery();
          SolrPluginUtils.flattenBooleanQuery(t, (BooleanQuery)parsedUserQuery);
          SolrPluginUtils.setMinShouldMatch(t, minShouldMatch);
          parsedUserQuery = t;
        }
      }

The second is handling the phrase fields (pf, pf2, pf3):

    if (allPhraseFields.size() > 0) {
      // full phrase and shingles
      for (FieldParams phraseField : allPhraseFields) {
        Map<String,Float> pf = new HashMap<String,Float>(1);
        pf.put(phraseField.getField(), phraseField.getBoost());
        addShingledPhraseQueries(query, normalClauses, pf,
            phraseField.getWordGrams(), tiebreaker, phraseField.getSlop());
      }
    }

The problem is significant when there are a lot of fields: the prepare time is usually higher than the process times of query, highlight and facet combined.
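The manual timing described in this message could be sketched roughly as follows. This is only an illustration of the technique, not the code that was actually added to ExtendedDismaxQParser.parse(); the class SectionTimer and the section names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: accumulates elapsed nanoseconds per named section. */
final class SectionTimer {
    private final Map<String, Long> totals = new LinkedHashMap<String, Long>();

    /** Runs the task and adds its elapsed wall-clock time to the named section. */
    void time(String section, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            record(section, System.nanoTime() - start);
        }
    }

    /** Adds an already measured duration to the named section. */
    void record(String section, long nanos) {
        Long previous = totals.get(section);
        totals.put(section, (previous == null ? 0L : previous.longValue()) + nanos);
    }

    /** Accumulated nanoseconds for one section, or 0 if it was never timed. */
    long nanos(String section) {
        Long value = totals.get(section);
        return value == null ? 0L : value.longValue();
    }

    /** Sum over all sections; this is the figure to compare with the reported prepare time. */
    long totalNanos() {
        long sum = 0L;
        for (long value : totals.values()) {
            sum += value;
        }
        return sum;
    }
}
```

Each suspected block would be wrapped in a call such as `timer.time("phraseFields", ...)`. If `totalNanos()` comes out close to the prepare time that Solr reports, the timed sections account for essentially all of the work.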
Re: Reduce QueryComponent prepare time
From: Mikhail Khludnev mkhlud...@griddynamics.com
Sent: Tue 20-Nov-2012 19:53

Markus,

It seems you are facing the challenge of optimizing complex eDisMax code for your particular use case, which is not so common. I cannot help with the coding; I can only share some experience. We have mind-blowing queries too: they span many fields and enumerate many phrase shingles. We have a similar counterintuitive hot spot: query parsing takes more time than searching and faceting. In our case, though, dictionary lookups (i.e. term substitutions and transformations) are the main CPU consumer.

We build our own query parser with something like http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html. This approach, where you represent the core query structure as a DOM-like skeleton of nodes and then transform the nodes into particular query instances, *might be more performant* (and *might not be* for you) than the current eDisMax.

Nothing more useful from me. Bye.

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
http://www.griddynamics.com
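The idea of a DOM-like node skeleton can be illustrated with a small sketch. It does not use the Lucene flexible query parser API, and it emits query strings instead of Lucene Query objects; all class names here (QueryNode, TermNode, PhraseNode, AnyOfNode, SkeletonExpander) are hypothetical. The point is only that the user input is turned into a field-independent tree once, and that tree is then expanded per field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical field-independent node of a parsed user query. */
interface QueryNode {
    /** Renders this node as a query string against one concrete field. */
    String toQuery(String field);
}

final class TermNode implements QueryNode {
    private final String text;
    TermNode(String text) { this.text = text; }
    public String toQuery(String field) { return field + ":" + text; }
}

final class PhraseNode implements QueryNode {
    private final List<String> words;
    PhraseNode(String... words) { this.words = Arrays.asList(words); }
    public String toQuery(String field) {
        return field + ":\"" + String.join(" ", words) + "\"";
    }
}

final class AnyOfNode implements QueryNode {
    private final List<QueryNode> children;
    AnyOfNode(QueryNode... children) { this.children = Arrays.asList(children); }
    public String toQuery(String field) {
        List<String> parts = new ArrayList<String>();
        for (QueryNode child : children) {
            parts.add(child.toQuery(field));
        }
        return "(" + String.join(" OR ", parts) + ")";
    }
}

final class SkeletonExpander {
    /** Expands one skeleton, built once per request, into one query per field. */
    static List<String> expand(QueryNode skeleton, List<String> fields) {
        List<String> queries = new ArrayList<String>();
        for (String field : fields) {
            queries.add(skeleton.toQuery(field));
        }
        return queries;
    }
}
```

Whether this is faster than eDisMax depends on where the time actually goes, as the message itself cautions. The sketch shows the structure, not a measured gain.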
RE: Reduce QueryComponent prepare time
From: Markus Jelsma markus.jel...@openindex.io
Sent: Mon, Nov 19, 2012 at 3:08 PM

I'd also like to know which parts of the entire query make up the prepare time, and whether it would matter significantly if we extended the edismax plugin and hardcoded the parameters we pass into (reusable) objects.

Thanks,
Markus
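One way to picture "reusable objects" is a cache that parses a field/boost parameter string once and hands back the same immutable map on later requests. This is a sketch under assumptions: FieldBoostCache is a hypothetical class, not part of Solr, and the `field^boost` syntax is modelled only loosely on edismax's qf parameter. Whether this step contributes meaningfully to the prepare time is exactly the open question in this message.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cache: parses "field^boost field2" specs once and reuses the result. */
final class FieldBoostCache {
    private final ConcurrentHashMap<String, Map<String, Float>> cache =
            new ConcurrentHashMap<String, Map<String, Float>>();
    private final AtomicInteger parses = new AtomicInteger();

    /** Returns the parsed boosts for this spec, parsing only on first use. */
    Map<String, Float> get(String spec) {
        return cache.computeIfAbsent(spec, this::parse);
    }

    /** Number of times a spec actually had to be parsed. */
    int parseCount() {
        return parses.get();
    }

    private Map<String, Float> parse(String spec) {
        parses.incrementAndGet();
        Map<String, Float> boosts = new LinkedHashMap<String, Float>();
        for (String token : spec.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty spec yields an empty map
            }
            int caret = token.indexOf('^');
            if (caret < 0) {
                boosts.put(token, 1.0f); // assumption: default boost of 1.0
            } else {
                boosts.put(token.substring(0, caret),
                        Float.parseFloat(token.substring(caret + 1)));
            }
        }
        // Immutable so the same instance can be shared safely across requests.
        return Collections.unmodifiableMap(boosts);
    }
}
```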
Re: Reduce QueryComponent prepare time
From: Mikhail Khludnev mkhlud...@griddynamics.com
Sent: Mon 19-Nov-2012 12:52

Markus,

It's hard to suggest anything until you provide a profiler snapshot that shows where the time in prepare is spent. As far as I know, prepare is where queries are parsed. For example, we have really heavy query parsers, but I don't think that's common.

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Reduce QueryComponent prepare time
From: Markus Jelsma markus.jel...@openindex.io
Sent: Fri 16-Nov-2012 15:57

Hi,

We're seeing high prepare times for the QueryComponent, obviously due to the vast number of fields and queries. It's common to have a prepare time of 70-80ms, while the process times drop significantly due to warmed searchers, OS cache etc. The prepare time is a recurring issue and I hope there are people here who can share some thoughts or hints.

We're using a recent checkout on a 10-node test cluster with SSDs (although this is not an IO issue) and edismax on about a hundred different fields; this includes phrase searches over most of those fields and SpanFirst queries on about 25 fields. We'd like to see how we can avoid doing the same prepare procedure over and over again ;)

Thanks,
Markus
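To get a feel for why phrase searches over many fields add up, here is a rough back-of-the-envelope count of per-field phrase queries. It is an estimate under stated assumptions, not a description of what edismax does in every case: it assumes that pf yields one whole-phrase query per field, that pf2 and pf3 yield one query per word bigram or trigram per field, and that single-term input yields no phrase queries. PhraseClauseEstimate and its parameters are hypothetical.

```java
/** Hypothetical estimator for the number of per-field phrase queries. */
final class PhraseClauseEstimate {

    /**
     * Number of shingles for a query of the given length.
     * wordGrams 0 means the whole phrase (pf), 2 bigrams (pf2), 3 trigrams (pf3).
     */
    static int shingles(int terms, int wordGrams) {
        if (terms < 2) {
            return 0; // assumption: a single term produces no phrase query
        }
        if (wordGrams == 0) {
            return 1; // one phrase over all terms
        }
        return Math.max(0, terms - wordGrams + 1);
    }

    /** Shingles multiplied by the number of fields configured for each parameter. */
    static int perFieldPhraseQueries(int terms, int pfFields, int pf2Fields, int pf3Fields) {
        return pfFields * shingles(terms, 0)
                + pf2Fields * shingles(terms, 2)
                + pf3Fields * shingles(terms, 3);
    }
}
```

With illustrative numbers, a 5-term query and 100 fields in each of pf, pf2 and pf3, `perFieldPhraseQueries(5, 100, 100, 100)` gives 100 + 400 + 300 = 800 phrase queries to build on every request, before any searching starts.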