Hi again Thanks for your answer, Otis. My analyzer did not do anything else than using the WhitespaceAnalyzer/LowerCaseFilter. However I found out that I got problems with characters such as ",.:" when searching because of my simple analyzer. (E.g. I would not be able to search for "world" in the string "Hello world." as . became part of the last word).
Therefore I turned back to the standard analyzer and now do some replacing of the underscores in my ID string to avoid my original problem. This solved my phrase problem so that I can now search for phrases. However I still have the problem with ",.:" described above. As far as I can see the StandardAnalyzer (the StandardTokenizer that is) should tokenize words without the ",.:" characters. Am I mistaken? Is there a tokenizer that will do this? Thanks for the help! Regards Peter > Date: Mon, 20 Dec 2004 08:19:42 -0800 (PST) > From: Otis Gospodnetic <[EMAIL PROTECTED]> > Subject: analyzer effecting phrases? > Content-Type: text/plain; charset=us-ascii > > > When searching for phrases, what's important is the position of each > token/word extracted by the Analyzer. > WhitespaceAnalyzer/LowerCaseFilter don't do anything with the > positional information. There is nothing else in your Analyzer? > > In any case, the following should help you see what your Analyzer is > doing: > http://wiki.apache.org/jakarta-lucene/AnalysisParalysis and you can > augment the code there to provide positional information, too. > > Otis > > -----Original Message----- > From: Peter Posselt Vestergaard [mailto:[EMAIL PROTECTED] > Sent: 20. december 2004 15:24 > To: '[EMAIL PROTECTED]' > Subject: analyzer effecting phrases? > > Hi > I am building an index of texts, each related to a unique id. > The unique ids might contain a number of underscores which > will make the standardanalyzer shorten them after it sees the > second underscore in a row. Furthermore many of the texts I > am indexing is in Italian so the removal of 'trivial' words > done by the standard analyzer is not necessarily meaningful > for these texts. Therefore I am instead using an analyzer > made from the WhitespaceTokenizer and the LowerCaseFilter. > This works fine for me until I try searching for a phrase. I > am searching for a simple phrase containing two words and > with double-quotes around it. I have found the phrase in one > of the texts so I know it should return at least one result, > but none is found. If I remove the double-quotes and searches > for the 2 words with AND between them I do find the story. > Can anyone tell me if this is an obvious (side-)effect of not > using the standard analyzer? And is there a better solution > to my problem than using the very simple analyzer? > Best regards > Peter Vestergaard > PS: I use the same analyzer for both searching and indexing > (of course). --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]