See below On 4/5/07, Ryan O'Hara <[EMAIL PROTECTED]> wrote:
Hey Erick, Thanks for the quick response. I need a truly exact match. What I ended up doing was using a TOKENIZED field, but altering the StandardAnalyzer's stop word list to include only the word/letter 'a'. Below is my searching code:
You're heading for a world of hurt here. I flat guarantee that you'll spend a LOT of time tweaking this to get it right, because you're not doing what you said you wanted to do, that is match exactly. You said it yourself, you're "removing stop words" which is NOT exact matching. You need to make a clear statement of what you're trying to do before too much longer. An example or two would help. String[] stopWords = {"a"};
StandardAnalyzer sa = new StandardAnalyzer(stopWords); QueryParser qp = new QueryParser("symbol", sa); Query queryPhrase = qp.parse(symbol.toUpperCase()); Hits hits = searcher.search(queryPhrase); String hit; if(hits.length() > 0){ hit = hits.doc(0).get("count"); count = Integer.parseInt(hit); }
You're uppercasing your search string. Did you index things uppercased? Your QueryParser is trying to work on a field called "symbol". Is that what you intend? I'm a bit confused when you use the same word for the field and its value. Is the reason it wasn't working due to the fact that I'm passing in a
StandardAnalyzer?
I have no idea why it isn't working. You haven't provided the query.toString() output. There's no evidence you've used Luke to look at what you've indexed to know what to expect. Without those two things, or a self-contained example I can only guess. I thought that maybe the searching mechanisms
would be able to use or not use an analyzer according to what the field.index value is.
No, you can't do this. You can make a PerFieldAnalyzerWrapper to process different *fields* with different analyzers, but not different field *values*. Step back and consider that the purpose of an Analyzer is to break up the token stream. Attaching meaning to those tokens isn't part of an Analyzer's job. You could create your own analyzer (see Lucene In Action) if you wanted to do special operations on the tokens, but I really don't think you need to go there. One other question that you may have an answer to: I'm eventually
going to need to alter the stop word list to include all default stop words, except those that match certain criteria. Can this be done?
Sure, this can be done. Again you've already done it with your StandardAnalyzer constructor. But you better have *indexed* the data with the same stop words removed or you'll be disappointed. Again, you're doing something that directly contradicts what you stated you are trying to do. So can you come up with a better problem statement? I'd also recommend that you use one of the other analyzers for a while until you get a better feel for what Lucene is doing. StopAnalyzer comes to mind. Thanks,
Ryan On Apr 5, 2007, at 3:08 PM, Erick Erickson wrote: > Yes, you can search on UN_TOKENIZED fields, but they're exact, > really, really exact <G>. > > I'd recommend that you get a copy of Luke (google lucene luke) and > examine your index to see what you actually have in your index. > > Also, you haven't provided us a clue what the actual query is. I'd > use Query.toString(). > > I suspect that the query is getting tokenized if you're using one > of the normal Analyzers... > > Erick > > On 4/5/07, Ryan O'Hara <[EMAIL PROTECTED]> wrote: >> >> Hey, >> >> I was just wondering if you are supposed to be able to search on >> UN_TOKENIZED fields? It seems like you can from the docs, but I have >> been unsuccessful. I want to do exact string matching on a certain >> field without analyzer interference. >> >> Thanks, >> Ryan >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]