Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2014-01-22 Thread Salman Akram

Apologies for the late response; this mail got lost somewhere in my filters.

The issue was that CommonGramsQueryFilterFactory should be used for searching
and CommonGramsFilterFactory for indexing. We were using
CommonGramsFilterFactory for both, which is why the single tokens were not
being dropped for common grams in a phrase query.
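
For anyone landing on this thread later, here is a minimal sketch of how the
corrected field type might look. It assumes the same field type name,
tokenizer, filters and commonwords.txt as the config posted in this thread.
The split into type="index" and type="query" analyzers is an assumption about
how the fix is wired up, not a copy of the schema actually deployed.

```xml
<!-- Sketch only: names and word list are taken from the config in this
     thread; the index/query split is an assumed way of applying the fix. -->
<fieldType name="text" class="solr.TextField">
  <!-- Index time: emit both the single tokens and the common-word bigrams. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"
            ignoreCase="true"/>
  </analyzer>
  <!-- Query time: same chain, but the query variant of the filter drops
       single tokens that are already covered by a bigram. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With a setup along these lines, a phrase query on "only be" should parse to
the single term only_be, as in the 1.4.1 debug output, instead of a
MultiPhraseQuery. The index-side chain is unchanged in this sketch, so only
the query-time analysis differs.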

I will go through the link you sent and see if it needs any explanation.
Thanks!

Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-10 Thread Salman Akram

Thanks!! Using CommonGramsQueryFilter resolved the issue.

This was not there in 1.4.1, and for some reason it was also not mentioned in
the SOLR 4 release notes that we studied before upgrading.

--
Regards,

Salman Akram

Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-10 Thread Ahmet Arslan

Hi Salman,

I personally do not perform stopword removal. So are you saying that
CommonGramsFilter is not useful without CommonGramsQueryFilter? If so,
would you like to add a comment to Confluence explaining this?

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-CommonGramsFilter


Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-09 Thread Ahmet Arslan

Hi Salman,
I am confused, because with surround no analysis is applied at query time. I
suspect that the surround query parser is not kicking in. You should see
SrndQuery or something like it in the parsed query section.




Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-09 Thread Salman Akram

Yup, on debugging I found that it's coming from the analyzer. We are using
Standard Analyzer. It seems to be a SOLR 4 issue with Common Grams. Not sure
if it's a bug or if I am missing some config.


-- 
Regards,

Salman Akram

Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-09 Thread Erik Hatcher

But again, as Ahmet mentioned, it doesn't look like the surround query parser
is actually being used. The debug output also reports which query parser was
used, but that part wasn't provided below. One thing to note here: the
surround query parser is not available in 1.4.1. It also looks like you're
surrounding your query with angle brackets, as it says the query string is
{!surround}Contents:only be, which is not correct syntax. And one of the
most important things to note here is that the surround query parser does NOT
use the analysis chain of the field; see
http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short,
you're going to have to do some work to get common grams factored into a
surround query (such as maybe calling the analysis request handler to parse
the query before sending it to the surround query parser).

Erik



Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-09 Thread Salman Akram

We used that syntax in 1.4.1, when Surround was not part of SOLR and we had
to register it ourselves. I didn't know that it is now part of SOLR. Anyway,
this is a red herring, since I have removed Surround entirely and the issue
remains.

Below is the debug info for a simple phrase query containing common words,
using the default query parser. What I don't understand is why it is
including the single tokens as well. I have also included the relevant config
part below.

"rawquerystring": "Contents:\"only be\"",
"querystring": "Contents:\"only be\"",
"parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")",
"parsedquery_toString": "Contents:\"(only only_be) be\"",

"QParser": "LuceneQParser",

<fieldtype name="text" class="solr.TextField">
<analyzer>
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.StandardFilterFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt"
ignoreCase="true"/>
</analyzer>
</fieldtype>



-- 
Regards,

Salman Akram

Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-09 Thread Ahmet Arslan

Hi Salman,

I have never used the common grams filter, but I remember there are two
classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It
seems that CommonGramsQueryFilter is what you are after.

http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html

http://khaidoan.wikidot.com/solr-common-gram-filter


SOLR 4 - Query Issue in Common Grams with Surround Query Parser

2013-12-08 Thread Salman Akram

All,

I posted this sub-issue along with another issue a few days back, but maybe
it was not obvious, so I am posting it in a separate thread.

We recently migrated to SOLR 4.6. We use Common Grams, but queries with
words in the CG list have slowed down. On debugging, we found that for CG
words the parser is also adding the individual tokens of those words to the
query, which ends up slowing it down. Below is an example:

Query = only be

Here is what the debug output shows. I have highlighted in red the part that
differs between the two versions, i.e. SOLR 4.6 is making it a
MultiPhraseQuery and adding the individual tokens too. Can someone help?

SOLR 4.6 (takes 20 secs)
<str name="rawquerystring">{!surround}Contents:only be</str>
<str name="querystring">{!surround}Contents:only be</str>
<str name="parsedquery">MultiPhraseQuery(Contents:(only only_be) be)</str>
<str name="parsedquery_toString">Contents:(only only_be) be</str>

SOLR 1.4.1 (takes 1 sec)
<str name="rawquerystring">{!surround}Contents:only be</str>
<str name="querystring">{!surround}Contents:only be</str>
<str name="parsedquery">Contents:only_be</str>
<str name="parsedquery_toString">Contents:only_be</str>

--


Regards,

Salman Akram
