Re: Not able to search on UN_TOKENIZED fields

Erick Erickson Thu, 05 Apr 2007 15:14:17 -0700

See below

On 4/5/07, Ryan O'Hara <[EMAIL PROTECTED]> wrote:


Hey Erick,

Thanks for the quick response.  I need a truly exact match.  What I
ended up doing was using a TOKENIZED field, but altering the
StandardAnalyzer's stop word list to include only the word/letter
'a'.  Below is my searching code:




You're heading for a world of hurt here. I flat guarantee that you'll
spend a LOT of time tweaking this to get it right, because you're
not doing what you said you wanted to do, that is match exactly.
You said it yourself, you're "removing stop words" which is NOT exact
matching.

You need to make a clear statement of what you're trying to do
before too much longer. An example or two would help.

String[] stopWords = {"a"};

             StandardAnalyzer sa = new StandardAnalyzer(stopWords);
             QueryParser qp = new QueryParser("symbol", sa);
             Query queryPhrase = qp.parse(symbol.toUpperCase());
             Hits hits = searcher.search(queryPhrase);
             String hit;
             if(hits.length() > 0){
                 hit = hits.doc(0).get("count");
                 count = Integer.parseInt(hit);
             }



You're uppercasing your search string. Did you index things
uppercased? Your QueryParser is trying to work on a field called
"symbol". Is that what you intend? I'm a bit confused when you
use the same word for the field and its value.



Is the reason it wasn't working due to the fact that I'm passing in a

StandardAnalyzer?




I have no idea why it isn't working. You haven't provided the
query.toString() output. There's no evidence you've used
Luke to look at what you've indexed to know what to expect.
Without those two things, or a self-contained example I can only
guess.




I thought that maybe the searching mechanisms

would be able to use or not use an analyzer according to what the
field.index value is.



No, you can't do this. You can make a PerFieldAnalyzerWrapper to
process different *fields* with different analyzers, but not different
field *values*. Step back and consider that the purpose of an
Analyzer is to break up the token stream. Attaching
meaning to those tokens isn't part of an Analyzer's job. You
could create your own analyzer (see Lucene In Action) if you
wanted to do special operations on the tokens, but I really don't
think you need to go there.


One other question that you may have an answer to:  I'm eventually

going to need to alter the stop word list to include all default stop
words, except those that match certain criteria.  Can this be done?



Sure, this can be done. Again you've already done it with your
StandardAnalyzer constructor. But you better have *indexed*
the data with the same stop words removed or you'll be disappointed.
Again, you're doing something that directly contradicts
what you stated you are trying to do. So can you come up with
a better problem statement?


I'd also recommend that you use one of the other analyzers for a while
until you get a better feel for what Lucene is doing. StopAnalyzer comes
to mind.

Thanks,

Ryan


On Apr 5, 2007, at 3:08 PM, Erick Erickson wrote:

> Yes, you can search on UN_TOKENIZED fields, but they're exact,
> really, really exact <G>.
>
> I'd recommend that you get a copy of Luke (google lucene luke) and
> examine your index to see what you actually have in your index.
>
> Also, you haven't provided us a clue what the actual query is. I'd
> use Query.toString().
>
> I suspect that the query is getting tokenized if you're using one
> of the normal Analyzers...
>
> Erick
>
> On 4/5/07, Ryan O'Hara <[EMAIL PROTECTED]> wrote:
>>
>> Hey,
>>
>> I was just wondering if you are supposed to be able to search on
>> UN_TOKENIZED fields?  It seems like you can from the docs, but I have
>> been unsuccessful.  I want to do exact string matching on a certain
>> field without analyzer interference.
>>
>> Thanks,
>> Ryan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Not able to search on UN_TOKENIZED fields

Reply via email to