Hi Donna - See previous post below that may help. Tom //////////////////////////////////////////////////////// Hi,
In case this is of help to others: Crux of problem: I wanted numbers and characters such as # and + to be considered. Solution: implement a LowercaseWhitespaceAnalyzer and a LowercaseWhitespaceTokenizer. i.e. IndexWriter writer = new IndexWriter(INDEX_DIR, new LowercaseWhitespaceAnalyzer(), true); Tom ======================================================================= Diagnostics: StandardAnalyzer ---------------- Enter Querystring: (C++ AND C#) Searching for: +c +c Enter Querystring: (C\+\+ AND C\#) Searching for: +c +c Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net" Searching for: ("moss 2007" "sharepoint 2007") asp.net SimpleAnalyser -------------- Enter Querystring: C++ Searching for: c Enter Querystring: C# Searching for: c Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net" Searching for: (moss or sharepoint) and "asp net" WhitespaceAnalyzer ------------------ Enter Querystring: (C++ AND C#) Searching for: +C++ +C# Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net" Searching for: ("moss 2007" or "sharepoint 2007") and asp.net KeywordAnalyzer --------------- Enter Querystring: (C++ AND C#) Searching for: +C++ +C# Enter Querystring: ("moss 2007" or "sharepoint 2007") and "asp.net" Searching for: (moss 2007 or sharepoint 2007) and asp.net StopAnalyzer ------------ Enter Querystring: (C\++ AND C\#) Searching for: +c +c Enter Querystring: ("MOSS 2007" or "SHAREPOINT 2007") and "ASP.NET" Searching for: (moss sharepoint) "asp net" --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -----Original Message----- From: Donna L Gresh [mailto:[EMAIL PROTECTED] Sent: 04 March 2008 19:22 To: java-user@lucene.apache.org Subject: C++ as token in StandardAnalyzer? I saw some discussion in the archives some time ago about the fact that C++ is tokenized as C in the StandardAnalyzer; this seems to still be C++ the case; I was wondering if there is a simple way for me to get the behavior I want for C++ (that it is tokenized as C++) in particular, and perhaps for other more ideosyncratic terms I may have in my own application-- Thanks Donna --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]