Re: Problems with exact matces on non-tokenized fields...

2002-11-14 Thread Stefanos Karasavvidis
> Doesn't that one do just that - treats fields differently, based on
> their name?

Yes, it does, but look at the question's title:
"How do I write my own Analyzer?"

Someone who has a problem with a non-tokenized field (which was the
problem in the mail thread that started this) doesn't know that he has
to write a custom analyzer, so he won't be able to find the correct FAQ
entry.

Moreover, the second solution Doug proposed suits some cases better and
should be included, too. (Doug wrote these solutions in a mail to the
users list on 27/9/2002, 9:24 p.m.)

I still think that there should be a FAQ entry as I proposed in my
previous email.

Moreover, there should be an addition to the FAQ entry
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15

It states that it is important to use the same analyzer during
indexing and searching. Again, this may lead to problems if a field is
not tokenized: during indexing it will _not_ get passed through the
analyzer, but during searching it does get passed through. If the
analyzer does not treat that field as a special case, there will be a
problem.

I don't know, maybe I'm missing something here, but it seems obvious to
me that non-tokenized fields in conjunction with analyzers produce
problems which should be mentioned in the documentation, FAQ, etc.

Stefanos


Re: Problems with exact matces on non-tokenized fields...

2002-11-13 Thread Otis Gospodnetic
Not sure which FAQ entry you are referring to.
This one http://www.jguru.com/faq/view.jsp?EID=1006122 ?

Doesn't that one do just that - treats fields differently, based on
their name?

Otis


--
To unsubscribe, e-mail:   
For additional commands, e-mail: 




Re: Problems with exact matces on non-tokenized fields...

2002-11-13 Thread Stefanos Karasavvidis
I came across the same problem, and I think the FAQ entry you (Otis)
propose should get a better title so that users can more easily find an
answer to this problem.

Correct me if I'm wrong (and please forgive any wrong assumptions I may
have made), but the problem is "how to query on a non-tokenized field?"

Problem explanation:
If a field is not tokenized, then it is not passed through the analyzer,
regardless of which analyzer is used (that's what I understand from
looking into DocumentWriter.invertDocument()).
If you construct a query on this field with a given analyzer (for
example with QueryParser.parse(query, field, analyzer)), the query
parser does not know that the field is not tokenized and passes it
through the analyzer. The analyzer may alter the query term (for example
if the analyzer has a stemming algorithm), and then the document no
longer matches the query.

The solution:
The solution is to make sure that fields that aren't tokenized during
indexing are not passed through the analyzer during searching. This can
be done in two ways: either by writing an analyzer that handles this
according to the field name, or by constructing a TermQuery for this
field and adding it to the rest of the query.

Example:
put here the 2 examples from Doug
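
Until Doug's two Lucene examples are pasted in, the two ways can be
modelled in plain Java. This is only a rough sketch: the class and
method names below are hypothetical, the "analyzer" is a bare
lowercasing function, and none of it is Lucene API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model only; these names are hypothetical and are not Lucene API. */
final class PerFieldAnalysisDemo {

    private final Set<String> untokenizedFields;

    PerFieldAnalysisDemo(Set<String> untokenizedFields) {
        this.untokenizedFields = new HashSet<>(untokenizedFields);
    }

    /** Stand-in for the default analyzer: it only lowercases. */
    static String defaultAnalyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    /** Way 1: the analysis depends on the field name. */
    String analyze(String field, String text) {
        // Fields that were indexed untokenized are passed through unchanged.
        return untokenizedFields.contains(field) ? text : defaultAnalyze(text);
    }

    /** Way 2: analyze the user's part, then append the exact clause as-is. */
    static String appendRawClause(String userQuery, String field, String value) {
        return defaultAnalyze(userQuery) + " +" + field + ":" + value;
    }

    public static void main(String[] args) {
        PerFieldAnalysisDemo demo =
                new PerFieldAnalysisDemo(Collections.singleton("element"));
        System.out.println(demo.analyze("element", "POST"));   // POST
        System.out.println(demo.analyze("contents", "POST"));  // post
        System.out.println(appendRawClause("Some Text", "element", "POST"));
        // some text +element:POST
    }
}
```

In both ways the value of the untokenized field reaches the index
lookup exactly as it was stored; they differ only in whether the user
types the clause (way 1) or the program adds it (way 2).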

Stefanos 







Re: Problems with exact matces on non-tokenized fields...

2002-10-01 Thread Otis Gospodnetic

Thanks, it's a FAQ entry now:

How do I write my own Analyzer?
http://www.jguru.com/faq/view.jsp?EID=1006122

Otis






Re: Problems with exact matces on non-tokenized fields...

2002-10-01 Thread karl øie

It works :-) When I see this I understand that the term being parsed by
the QueryParser is sent through the analyzer as well... thanks!

Regards, karl øie





RE: Problems with exact matces on non-tokenized fields...

2002-09-27 Thread Alex Murzaku

Thanks! Now that I think of it, I was searching the documentation for
a method to reset the document 'd' to "empty" once it is indexed so that
it could be reused, but I didn't find one, and then the bug slipped
through. I was afraid that all these objects might not be garbage
collected in time. In a test much smaller than infinite:

for (i = 0; i <= 1; i++) {
    Document d = new Document();
    d.add(Field.Keyword("nr", Integer.toString(i)));
    d.add(Field.Keyword("element", "POST"));
    writer.addDocument(d);
}

I very soon got a java.lang.OutOfMemoryError but, by just forcing garbage
collection at the end of each cycle, the memory usage is now a very flat
line... Sorry for bothering you.





Re: Problems with exact matces on non-tokenized fields...

2002-09-27 Thread Doug Cutting

Alex Murzaku wrote:
> I was trying this as well but now I get something I can't understand:
> My query (Query: +element:POST +nr:3) is supposed to match only one
> record. Indeed Lucene returns that record with the highest score but it
> also returns others that shouldn't be there at all even if it was an OR
> query. Another observation: it returns all records where "nr" >= 3.
> Notice the last record returned doesn't contain neither "POST" nor "3".
> I am attaching a self contained running example with this problem and
> would appreciate any comment.
>  
> 0.6869936 Keyword Keyword
> 0.63916886 Keyword Keyword
> 0.6044586 Keyword Keyword
> 0.5773442 Keyword Keyword
> 0.56318253 Keyword Keyword
> 0.54449975 Keyword Keyword
> 0.5247468 Keyword Keyword
> 0.45054603 Keyword Keyword

Phew!  It took me a while to spot this one...

The bug is with your test program.  You keep adding fields to the same 
document instance.  If you change your program to print the entire 
document, you'll see:

Query: +element:POST +nr:3
0.6869936 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword>
0.63916886 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword>
0.6044586 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword>
0.5773442 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword>
0.56318253 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword>
0.54449975 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword>
0.5247468 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword>
0.45054603 Document Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword 
Keyword Keyword Keyword Keyword>

So you need to create a new document instance each time.  I've attached 
a modified version of your test program that does this and gives the 
results you desire:

Query: +element:POST +nr:3
1.0 Document Keyword>
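
The effect of reusing one document instance can be reproduced without
Lucene. The sketch below is only a toy model: ToyDocument and
docsMatching are hypothetical stand-ins, not Lucene classes, and the
"index" is just a list of field snapshots. It shows why a query for
nr:3 hits every record from 3 onwards when the same instance keeps
accumulating fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; ToyDocument is a stand-in, not Lucene's Document class. */
final class DocumentReuseDemo {

    static final class ToyDocument {
        final List<String> fields = new ArrayList<>();
        void add(String name, String value) { fields.add(name + ":" + value); }
    }

    /**
     * "Indexes" docs documents and returns the numbers of those that
     * contain the given term. With reuseInstance == true a single
     * instance is shared by all iterations, as in the buggy test program.
     */
    static List<Integer> docsMatching(boolean reuseInstance, int docs, String term) {
        List<List<String>> index = new ArrayList<>();
        ToyDocument d = new ToyDocument();
        for (int i = 0; i < docs; i++) {
            if (!reuseInstance) {
                d = new ToyDocument();           // the fix: a fresh instance each time
            }
            d.add("nr", Integer.toString(i));
            d.add("element", "POST");
            index.add(new ArrayList<>(d.fields)); // snapshot of what gets indexed
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            if (index.get(i).contains(term)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(docsMatching(true, 6, "nr:3"));   // [3, 4, 5]
        System.out.println(docsMatching(false, 6, "nr:3"));  // [3]
    }
}
```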

Doug



TestField.java
Description: Binary data



RE: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread Alex Murzaku

I was trying this as well, but now I get something I can't understand:
my query (Query: +element:POST +nr:3) is supposed to match only one
record. Indeed, Lucene returns that record with the highest score, but it
also returns others that shouldn't be there at all, even if it were an OR
query. Another observation: it returns all records where "nr" >= 3.
Notice that the last record returned contains neither "POST" nor "3".
I am attaching a self-contained running example with this problem and
would appreciate any comments.
 
0.6869936 Keyword Keyword
0.63916886 Keyword Keyword
0.6044586 Keyword Keyword
0.5773442 Keyword Keyword
0.56318253 Keyword Keyword
0.54449975 Keyword Keyword
0.5247468 Keyword Keyword
0.45054603 Keyword Keyword





TestField.java
Description: Binary data



Re: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread Doug Cutting

karl øie wrote:
> I have a Lucene Document with a field named "element" which is stored 
> and indexed but not tokenized. The value of the field is "POST" 
> (uppercase). But the only way i can match the field is by entering 
> "element:POST?" or "element:POST*" in the QueryParser class.

There are two ways to do this.

If this must be entered by users in the query string, then you need to 
use a non-lowercasing analyzer for this field.  The way to do this if 
you're currently using StandardAnalyzer, is to do something like:

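   import java.io.Reader;
   import org.apache.lucene.analysis.Analyzer;
   import org.apache.lucene.analysis.CharTokenizer;
   import org.apache.lucene.analysis.TokenStream;
   import org.apache.lucene.analysis.standard.StandardAnalyzer;

   public class MyAnalyzer extends Analyzer {
     private Analyzer standard = new StandardAnalyzer();

     public TokenStream tokenStream(String field, final Reader reader) {
       if ("element".equals(field)) {          // don't tokenize
         // every character counts as a token character, so nothing is split
         return new CharTokenizer(reader) {
           protected boolean isTokenChar(char c) { return true; }
         };
       } else {                                // use standard analyzer
         return standard.tokenStream(field, reader);
       }
     }
   }

   Analyzer analyzer = new MyAnalyzer();
   Query query = queryParser.parse("... +element:POST", analyzer);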

Alternately, if this query field is added by a program, then this can be 
done by bypassing the analyzer for this class, building this clause 
directly instead:

   Analyzer analyzer = new StandardAnalyzer();
   BooleanQuery query = (BooleanQuery)queryParser.parse("...", analyzer);

   // now add the element clause
   query.add(new TermQuery(new Term("element", "POST")), true, false);

Perhaps this should become an FAQ...

Doug






Re: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread Dave Peixotto

I have also observed this behavior.





RE: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread Alex Murzaku

Sorry about that - it was early in the morning...
My guess is that the analyzer you are passing to the QueryParser lowercases
"POST" but not "POST*" or "POST?". Could you check what your query looks
like when it goes to the searcher?
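
If that guess is right, the behaviour can be modelled in a few lines
of plain Java. This is a sketch under that assumption only: analyze
and search are hypothetical stand-ins for a lowercasing analyzer and
for the query parser, not Lucene API, and only the trailing "*" case
is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model only; none of this is Lucene API. */
final class LowercaseMismatchDemo {

    /** Stand-in for a lowercasing analyzer. */
    static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    /**
     * Stand-in for the query parser: plain terms go through the analyzer,
     * while terms ending in '*' are treated as prefix queries and used as typed.
     */
    static List<String> search(List<String> indexedTerms, String queryTerm) {
        List<String> hits = new ArrayList<>();
        if (queryTerm.endsWith("*")) {
            String prefix = queryTerm.substring(0, queryTerm.length() - 1);
            for (String t : indexedTerms) {
                if (t.startsWith(prefix)) {
                    hits.add(t);
                }
            }
        } else {
            String analyzed = analyze(queryTerm);
            for (String t : indexedTerms) {
                if (t.equals(analyzed)) {
                    hits.add(t);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // The non-tokenized field value was stored verbatim at index time.
        List<String> index = Arrays.asList("POST");
        System.out.println(search(index, "POST"));   // [] because the term became "post"
        System.out.println(search(index, "POST*"));  // [POST]
    }
}
```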





Re: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread karl øie

Hm... a misunderstanding: I don't create the field with the value
"POST?"; I create it with "POST". "element:POST?" and "element:POST*" are
the strings I send to the QueryParser for searching.

Regards, Karl Øie





RE: Problems with exact matces on non-tokenized fields...

2002-09-26 Thread Alex Murzaku

But indeed "POST" does not match "POST?". If you are not tokenizing
the field, the character "?" remains there together with everything
else.





Problems with exact matces on non-tokenized fields...

2002-09-26 Thread karl øie

Hi, I have a problem getting an exact match on a non-tokenized
field.

I have a Lucene Document with a field named "element" which is stored
and indexed but not tokenized. The value of the field is "POST"
(uppercase). But the only way I can match the field is by entering
"element:POST?" or "element:POST*" in the QueryParser class.

Has anyone here run into this problem?

I am using the 1.2 release version of Lucene.

Regards, Karl Øie

