Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Apologies for the late response; this mail got lost somewhere in filters. The issue was that CommonGramsQueryFilterFactory should be used for searching and CommonGramsFilterFactory for indexing. We were using CommonGramsFilterFactory for both, which is why it was not dropping the single tokens for common grams in a phrase query. I will go through the link you sent and see if it needs any explanation. Thanks!
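For readers landing on this thread, a minimal sketch of the corrected field type might look like the following. It reuses the tokenizer, filters, and commonwords.txt from the configuration posted in this thread; splitting the analyzer into type="index" and type="query" sections is the assumed change, and the rest of the schema is not shown.

```xml
<!-- Sketch only: same chain as posted in the thread, split by analyzer type. -->
<fieldType name="text" class="solr.TextField">
  <!-- Index time: emit both the single tokens and the common-gram bigrams. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <!-- Query time: keep the bigram and drop the single tokens it covers. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With this split, the phrase query for "only be" from the thread would be expected to parse to Contents:only_be rather than to a MultiPhraseQuery that also carries the single tokens, matching the 1.4.1 output in the original post. The index-time chain is unchanged, so this change alone should not require reindexing.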
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Thanks!! Using CommonGramsQueryFilter resolved the issue. This was not there in 1.4.1 and also for some reason was not there in SOLR 4 Release Notes that we studied before upgrading.

On Tue, Dec 10, 2013 at 9:55 AM, Ahmet Arslan iori...@yahoo.com wrote:
> Hi Salman, I never used the common grams filter but I remember there are two classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It seems that CommonGramsQueryFilter is what you are after.
> http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html
> http://khaidoan.wikidot.com/solr-common-gram-filter

--
Regards,
Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I personally do not perform stopword removal. So are you saying CommonGramsFilter is not useful without CommonGramsQueryFilter? If yes, do you want to add a comment to Confluence explaining this? https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-CommonGramsFilter

On Tuesday, December 10, 2013 1:17 PM, Salman Akram salman.ak...@northbaysolutions.net wrote:
> Thanks!! Using CommonGramsQueryFilter resolved the issue. This was not there in 1.4.1 and also for some reason was not there in SOLR 4 Release Notes that we studied before upgrading.
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I am confused because with surround no analysis is applied at query time. I suspect that the surround query parser is not kicking in. You should see SrndQuery or something like it in the parsed query section.

On Monday, December 9, 2013 6:24 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:
> All, I posted this sub-issue with another issue a few days back but maybe it was not obvious, so I am posting it on a separate thread.
>
> We recently migrated to SOLR 4.6. We use Common Grams but queries with words in the CG list have slowed down. On debugging we found that for CG words the parser is adding individual tokens of those words in the query too, which ends up slowing it. Below is an example:
>
> Query = only be
>
> Here is what debug shows. I have highlighted the red part which is different in both versions, i.e. SOLR 4.6 is making it a multiphrasequery and adding individual tokens too. Can someone help?
>
> SOLR 4.6 (takes 20 secs)
>   <str name="rawquerystring">{!surround}Contents:only be</str>
>   <str name="querystring">{!surround}Contents:only be</str>
>   <str name="parsedquery">MultiPhraseQuery(Contents:(only only_be) be)</str>
>   <str name="parsedquery_toString">Contents:(only only_be) be</str>
>
> SOLR 1.4.1 (takes 1 sec)
>   <str name="rawquerystring">{!surround}Contents:only be</str>
>   <str name="querystring">{!surround}Contents:only be</str>
>   <str name="parsedquery">Contents:only_be</str>
>   <str name="parsedquery_toString">Contents:only_be</str>
>
> --
> Regards,
> Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Yup, on debugging I found that it's coming from the Analyzer. We are using Standard Analyzer. It seems to be a SOLR 4 issue with Common Grams. Not sure if it's a bug or I am missing some config.

On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan iori...@yahoo.com wrote:
> Hi Salman, I am confused because with surround no analysis is applied at query time. I suspect that the surround query parser is not kicking in. You should see SrndQuery or something like it in the parsed query section.

--
Regards,
Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
But again, as Ahmet mentioned… it doesn't look like the surround query parser is actually being used. The debug output also mentions the query parser used, but that part wasn't provided below.

One thing to note here: the surround query parser is not available in 1.4.1. It also looks like you're surrounding your query with angle brackets, as it says the query string is {!surround}Contents:only be, which is not correct syntax. And one of the most important things to note here is that the surround query parser does NOT use the analysis chain of the field, see http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short, you're going to have to do some work to get common grams factored into a surround query (such as maybe calling the analysis request handler to parse the query before sending it to the surround query parser).

Erik

On Dec 9, 2013, at 9:36 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:
> Yup, on debugging I found that it's coming from the Analyzer. We are using Standard Analyzer. It seems to be a SOLR 4 issue with Common Grams. Not sure if it's a bug or I am missing some config.
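Erik's suggestion of running the query text through the analysis request handler before handing it to the surround parser could start from a handler registration like the one below. This is a sketch for solrconfig.xml, assuming the stock solr.FieldAnalysisRequestHandler; the handler path /analysis/field, the core name mycore, and the request shown in the comment are assumptions and were not posted in the thread.

```xml
<!-- solrconfig.xml sketch: expose field analysis over HTTP. -->
<requestHandler name="/analysis/field"
                class="solr.FieldAnalysisRequestHandler"
                startup="lazy"/>
<!--
  Hypothetical request against a core named "mycore", analysing the query
  text with the query-time chain of the "text" field type:
    /solr/mycore/analysis/field?analysis.fieldtype=text&analysis.query=only+be
  A client would read the tokens produced by the last filter stage in the
  response and build the surround query from those tokens.
-->
```

This only helps if the query-time chain of the field type actually produces the tokens you want, so it would normally be combined with a query analyzer that uses CommonGramsQueryFilterFactory.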
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
We used that syntax in 1.4.1 when Surround was not part of SOLR and we had to register it. Didn't know that it is now part of SOLR. Anyway, this is a red herring since I have totally removed Surround and the issue remains. Below is the debug info when I give a simple phrase query having common words with the default Query Parser. What I don't understand is why it is including single tokens as well. I have also included the relevant config part below.

  "rawquerystring": "Contents:\"only be\"",
  "querystring": "Contents:\"only be\"",
  "parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")",
  "parsedquery_toString": "Contents:\"(only only_be) be\"",
  "QParser": "LuceneQParser",

  <fieldtype name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldtype>

On Mon, Dec 9, 2013 at 7:46 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
> But again, as Ahmet mentioned… it doesn't look like the surround query parser is actually being used. The debug output also mentions the query parser used, but that part wasn't provided below.
>
> One thing to note here: the surround query parser is not available in 1.4.1. It also looks like you're surrounding your query with angle brackets, as it says the query string is {!surround}Contents:only be, which is not correct syntax. And one of the most important things to note here is that the surround query parser does NOT use the analysis chain of the field, see http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short, you're going to have to do some work to get common grams factored into a surround query (such as maybe calling the analysis request handler to parse the query before sending it to the surround query parser).
>
> Erik

--
Regards,
Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I never used the common grams filter but I remember there are two classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It seems that CommonGramsQueryFilter is what you are after.
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html
http://khaidoan.wikidot.com/solr-common-gram-filter

On Tuesday, December 10, 2013 6:43 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:
> We used that syntax in 1.4.1 when Surround was not part of SOLR and we had to register it. Didn't know that it is now part of SOLR. Anyway, this is a red herring since I have totally removed Surround and the issue remains. Below is the debug info when I give a simple phrase query having common words with the default Query Parser. What I don't understand is why it is including single tokens as well. I have also included the relevant config part below.
>
>   "rawquerystring": "Contents:\"only be\"",
>   "querystring": "Contents:\"only be\"",
>   "parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")",
>   "parsedquery_toString": "Contents:\"(only only_be) be\"",
>   "QParser": "LuceneQParser",
>
>   <fieldtype name="text" class="solr.TextField">
>     <analyzer>
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StandardFilterFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
>     </analyzer>
>   </fieldtype>
SOLR 4 - Query Issue in Common Grams with Surround Query Parser
All,

I posted this sub-issue with another issue few days back but maybe it was not obvious so posting it on a separate thread.

We recently migrated to SOLR 4.6. We use Common Grams but queries with words in the CG list have slowed down. On debugging we found that for CG words the parser is adding individual tokens of those words in the query too which ends up slowing it. Below is an example:

Query = only be

Here is what debug shows. I have highlighted the red part which is different in both versions i.e. SOLR 4.6 is making it a multiphrasequery and adding individual tokens too. Can someone help?

SOLR 4.6 (takes 20 secs)

  <str name="rawquerystring">{!surround}Contents:only be</str>
  <str name="querystring">{!surround}Contents:only be</str>
  <str name="parsedquery">MultiPhraseQuery(Contents:(only only_be) be)</str>
  <str name="parsedquery_toString">Contents:(only only_be) be</str>

SOLR 1.4.1 (takes 1 sec)

  <str name="rawquerystring">{!surround}Contents:only be</str>
  <str name="querystring">{!surround}Contents:only be</str>
  <str name="parsedquery">Contents:only_be</str>
  <str name="parsedquery_toString">Contents:only_be</str>

--
Regards,
Salman Akram