Re: EdgeNGram relevancy

2010-11-16 Thread Robert Gründler
thanks for the explanation.

the results for the autocompletion are pretty good now, but we still have a 
small problem. 

When there are hits in the edgytext2 fields, results which only have hits in 
the edgytext field
should not be returned at all.

Example:

Query: Martin Sco

Current Results (in that order):

- Martin Scorsese
- Martin Lawrence
- Joseph Martin

However, in an autocompletion context, only Martin Scorsese makes sense; the 
2 others are logically
not correct.

I'm not sure if this can be solved on the solr side, or if we should implement 
the logic in the
application.


thanks!

-robert










Re: EdgeNGram relevancy

2010-11-16 Thread Robert Gründler
it seems adding the '+' (required) operator to each term in a multi-term query 
does the trick:

http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#+

ie: edgytext2:(+Martin +Sco)


-robert
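
To illustrate the effect of marking every term as required, here is a minimal Python sketch. It is not Solr code: it assumes whitespace-tokenized, lowercased edge n-gram terms as in the edgytext field type, and the helper names (edge_ngrams, index_terms, search) are hypothetical.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Leading-edge n-grams of a token, e.g. 'sco' -> {'s', 'sc', 'sco'}."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def index_terms(name):
    """Terms indexed for a name: edge n-grams of each whitespace token."""
    terms = set()
    for token in name.lower().split():
        terms |= edge_ngrams(token)
    return terms


def search(names, query, require_all):
    """Return names matching any query term (OR) or every term ('+' on each)."""
    query_terms = query.lower().split()
    combine = all if require_all else any
    return [n for n in names
            if combine(t in index_terms(n) for t in query_terms)]


NAMES = ["Martin Scorsese", "Martin Lawrence", "Joseph Martin"]

# Optional terms: every name containing "martin" is returned.
print(search(NAMES, "Martin Sco", require_all=False))
# Required terms: only the name matching both "martin" and "sco" remains.
print(search(NAMES, "Martin Sco", require_all=True))
```

With optional terms all three names come back; with every term required only Martin Scorsese is left, which is the behaviour wanted for autocompletion.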






EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
Hi,

consider the following fieldtype (used for autocompletion):

  <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z])" replacement="" replace="all"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
              maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z])" replacement="" replace="all"/>
    </analyzer>
  </fieldType>


This works fine as long as the query string is a single word. For multiple 
words, the ranking is weird though.

Example:

Query String: Bill Cl

Result (in that order):

- Clyde Phillips
- Clay Rogers
- Roger Cloud
- Bill Clinton

Bill Clinton should have the highest rank in that case.  

Does anyone have an idea how to configure this fieldtype so that matches in 
both tokens rank higher than those that match in only one token?


thanks!


-robert





Re: EdgeNGram relevancy

2010-11-11 Thread Ahmet Arslan
You can add an additional field that uses KeywordTokenizerFactory instead of 
WhitespaceTokenizerFactory, and query both fields with an OR operator. 

edgytext:(Bill Cl) OR edgytext2:"Bill Cl"

You can even apply a boost so that "begins with" matches come first.






Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
thanks a lot, that setup works pretty well now.

the only problem now is that the StopWords do not work that well anymore. I'll 
provide an example, but first the 2 fieldtypes:

  <!-- autocomplete field which finds matches inside strings ("scor" matches 
       "Martin Scorsese") -->

  <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z])" replacement="" replace="all"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
              maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z])" replacement="" replace="all"/>
    </analyzer>
  </fieldType>

  <!-- autocomplete field which finds startsWith matches only ("scor" matches 
       only "Scorpio", but not "Martin Scorsese") -->

  <fieldType name="edgytext2" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z])" replacement="" replace="all"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
              maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z])" replacement="" replace="all"/>
    </analyzer>
  </fieldType>


This setup now causes trouble regarding StopWords, here's an example:

Let's say the index contains 2 Strings: "Mr Martin Scorsese" and "Martin 
Scorsese". "Mr" is in the stopword list.

Query: edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0

This way, the only result i get is "Mr Martin Scorsese", because the strict 
field edgytext2 is boosted by 2.0. 

Any idea why in this case "Martin Scorsese" is not in the result at all?


thanks again!


-robert
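
The difference between the two field types can be sketched in Python as follows. This only imitates the tokenizer, lowercase, pattern-replace and edge n-gram steps; the stop filter is left out, and all function names are hypothetical.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=25):
    """Leading-edge n-grams of a token."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def normalize(token):
    """LowerCaseFilter followed by PatternReplaceFilter ([^a-z] removed)."""
    return re.sub(r"[^a-z]", "", token.lower())


def index_terms_whitespace(text):
    """edgytext: one set of edge n-grams per whitespace-separated token."""
    terms = set()
    for token in text.split():
        terms |= edge_ngrams(normalize(token))
    return terms


def index_terms_keyword(text):
    """edgytext2: the whole string is one token, so only prefixes of the
    full (space-stripped) string are indexed."""
    return edge_ngrams(normalize(text))


# "scor" is found inside the string only by the whitespace-tokenized field.
print("scor" in index_terms_whitespace("Martin Scorsese"))  # True
print("scor" in index_terms_keyword("Martin Scorsese"))     # False
print("scor" in index_terms_keyword("Scorpio"))             # True

# A whole-phrase query is normalized to a single term at query time.
print(normalize("Martin Sco") in index_terms_keyword("Martin Scorsese"))  # True
```

Because the keyword-tokenized field indexes prefixes of the whole string, it only matches when the query is analyzed as a single token as well.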









Re: EdgeNGram relevancy

2010-11-11 Thread Ahmet Arslan
> This setup now makes troubles regarding StopWords, here's
> an example:
> 
> Let's say the index contains 2 Strings: "Mr Martin
> Scorsese" and "Martin Scorsese". "Mr" is in the stopword
> list.
> 
> Query: edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0
> 
> This way, the only result i get is "Mr Martin Scorsese",
> because the strict field edgytext2 is boosted by 2.0. 
> 
> Any idea why in this case "Martin Scorsese" is not in the
> result at all?

Did you run your query without using the () and "" operators? If yes, can you 
try this?
q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0

If not, can you paste the output of debugQuery=on?


  


Re: EdgeNGram relevancy

2010-11-11 Thread Nick Martin


This would still not deal with the problem of removing stop words from the 
indexing and query analysis stages.

I really need something that will allow that and give a single token as in the 
example below.

Best

Nick

Re: EdgeNGram relevancy

2010-11-11 Thread Andy
Could anyone help me understand why "Clyde Phillips" appears in the 
results for "Bill Cl"?

"Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so 
why is it even in the results?

Thanks.






Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
according to the fieldtype i posted previously, i think it's because of:

1. WhitespaceTokenizer splits the String "Clyde Phillips" into 2 tokens: 
"Clyde" and "Phillips"
2. EdgeNGramFilter gets the 2 tokens, and creates an EdgeNGram for each token: 
"C" "Cl" "Cly" ...   AND  "P" "Ph" "Phi" ...

The Query String "Bill Cl" gets split up in 2 Tokens "Bill" and "Cl" by the 
WhitespaceTokenizer.

This creates a match between the 2nd token "Cl" of the query and one of the 
subtokens the EdgeNGramFilter created: "Cl".


-robert
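
The steps described above can be reproduced with a small Python imitation of the edgytext analyzer chain. This is a sketch, not Solr's implementation: the stopword set is an assumed stand-in for stopwords.txt, and analyze and matching_query_terms are hypothetical helper names.

```python
import re

STOPWORDS = {"mr"}  # assumption: stand-in for the contents of stopwords.txt


def edge_ngrams(token, min_gram=1, max_gram=25):
    """Leading-edge n-grams of a token: c, cl, cly, ..."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def analyze(text, index_time):
    """Rough imitation of the edgytext index/query analyzers."""
    tokens = text.split()                                # WhitespaceTokenizer
    tokens = [t.lower() for t in tokens]                 # LowerCaseFilter
    tokens = [t for t in tokens if t not in STOPWORDS]   # StopFilter
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]  # PatternReplaceFilter
    if index_time:                                       # EdgeNGramFilter is
        tokens = [g for t in tokens for g in edge_ngrams(t)]  # index-only
    return tokens


def matching_query_terms(doc, query):
    """Query terms that hit at least one indexed term of the document."""
    indexed = set(analyze(doc, index_time=True))
    return [t for t in analyze(query, index_time=False) if t in indexed]


for name in ["Clyde Phillips", "Clay Rogers", "Roger Cloud", "Bill Clinton"]:
    print(name, matching_query_terms(name, "Bill Cl"))
```

Every name matches on the query term "cl" alone, and only Bill Clinton matches both terms. With the default OR operator a single matching term is enough for a document to be returned.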







Re: EdgeNGram relevancy

2010-11-11 Thread Andy
Ah I see. Thanks for the explanation.

Could you set the defaultOperator to AND? That way both "Bill" and "Cl" must 
match, and that would exclude "Clyde Phillips".







Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
 
> Did you run your query without using () and "" operators? If yes can you try 
> this?
> q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0

I didn't use () and "" in my query before. Using the query with those operators
works now, stopwords are thrown out as they should, thanks.

However, i don't understand how the () and "" operators affect the 
StopWordFilter.

Could you give a brief explanation for the above example?

thanks!


-robert






Re: EdgeNGram relevancy

2010-11-11 Thread Jonathan Rochkind
Without the parens, the edgytext: prefix applied only to "Mr"; the default 
field still applied to "Scorsese".


The double quotes are necessary in the second case (rather than parens) 
because, on a non-tokenized field, the standard query parser will 
pre-tokenize on whitespace before sending the individual whitespace-separated 
words to match the index. If the index includes multi-word 
tokens with internal whitespace, they will never match. But with a 
double-quoted phrase, the standard query parser doesn't pre-tokenize like 
this; it passes the whole phrase to the index intact.
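
The field scoping described above can be illustrated with a toy Python model. It is not Lucene's actual query parser: it only handles the simple clause shapes used in this thread, and the default field name "text" is an assumption, as the thread does not say which default field is configured.

```python
import re

# One clause: optional "field:" prefix, then a (group), a "phrase" or a bare
# term, optionally followed by a boost such as ^2.0.
CLAUSE = re.compile(r'(?:(\w+):)?(\([^)]*\)|"[^"]*"|[^\s()"^]+)(?:\^[\d.]+)?')


def scope_terms(query, default_field="text"):
    """Return (field, text) pairs showing which field each piece is sent to."""
    pairs = []
    for field, body in CLAUSE.findall(query):
        field = field or default_field
        if body in ("AND", "OR", "NOT"):      # boolean operators, not terms
            continue
        if body.startswith("("):              # group: field covers each word
            pairs += [(field, term) for term in body[1:-1].split()]
        elif body.startswith('"'):            # phrase: passed on as one unit
            pairs.append((field, body[1:-1]))
        else:                                 # bare term
            pairs.append((field, body))
    return pairs


print(scope_terms('edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0'))
print(scope_terms('edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0'))
```

In the first query "Scorsese" ends up in the default field both times. In the second, both words are scoped to edgytext and the quoted phrase reaches edgytext2 as a single unit.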

