When searching for phrases, what matters is the position of each token/word extracted by the Analyzer. WhitespaceAnalyzer/LowerCaseFilter leave the positional information untouched. Is there nothing else in your Analyzer?
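To illustrate why positions matter, here is a minimal plain-Java sketch. It does not use the Lucene API; the class and method names (PhrasePositionsSketch, tokenize, matchesPhrase) are made up for this example. It assumes a whitespace split plus lowercasing, with every token advancing the position by one, and treats a phrase as a match only when its terms sit at consecutive positions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: this is NOT Lucene code, just the idea behind it.
public class PhrasePositionsSketch {

    /** A term plus the position it occupies in the token stream. */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Splits on whitespace and lowercases; each token advances the position by one. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields a single empty string from split()
            }
            tokens.add(new Token(raw.toLowerCase(Locale.ROOT), position++));
        }
        return tokens;
    }

    /** True if the phrase terms occur in the text at consecutive positions. */
    static boolean matchesPhrase(String text, String phrase) {
        // Map each term to every position at which it occurs in the text.
        Map<String, List<Integer>> positions = new HashMap<>();
        for (Token t : tokenize(text)) {
            positions.computeIfAbsent(t.term, k -> new ArrayList<>()).add(t.position);
        }

        List<Token> query = tokenize(phrase);
        if (query.isEmpty()) {
            return false;
        }
        List<Integer> starts = positions.get(query.get(0).term);
        if (starts == null) {
            return false;
        }
        // Try each occurrence of the first term as the start of the phrase.
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < query.size() && ok; i++) {
                List<Integer> p = positions.get(query.get(i).term);
                ok = p != null && p.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Il gatto nero dorme sul divano";
        System.out.println(tokenize(text));
        System.out.println(matchesPhrase(text, "gatto nero")); // true
        System.out.println(matchesPhrase(text, "nero gatto")); // false
    }
}
```

Both words are present in the second query, so an AND search would still hit, but the phrase fails because the positions are not consecutive. If something in the analysis chain shifts or drops positions, a phrase can fail in the same way while the AND query keeps working.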
In any case, the following should help you see what your Analyzer is doing:

http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

You can augment the code there to print positional information, too.

Otis

--- Peter Posselt Vestergaard <[EMAIL PROTECTED]> wrote:

> Hi
>
> I am building an index of texts, each related to a unique id. The
> unique ids might contain a number of underscores, which will make the
> StandardAnalyzer shorten them after it sees the second underscore in
> a row. Furthermore, many of the texts I am indexing are in Italian, so
> the removal of 'trivial' words done by the StandardAnalyzer is not
> necessarily meaningful for these texts. Therefore I am instead using
> an analyzer made from the WhitespaceTokenizer and the LowerCaseFilter.
>
> This works fine for me until I try searching for a phrase. I am
> searching for a simple phrase containing two words and with
> double-quotes around it. I have found the phrase in one of the texts,
> so I know it should return at least one result, but none is found. If
> I remove the double-quotes and search for the 2 words with AND
> between them, I do find the story.
>
> Can anyone tell me if this is an obvious (side-)effect of not using
> the StandardAnalyzer? And is there a better solution to my problem
> than using the very simple analyzer?
>
> Best regards
> Peter Vestergaard
>
> PS: I use the same analyzer for both searching and indexing (of
> course).