Erik, I think there may be a typo in the website.
When I run the AnalyzerDemo:

Analzying "xy&z corporation - [EMAIL PROTECTED]"
org.apache.lucene.analysis.standard.StandardAnalyzer:
[xy&z] [corporation] [EMAIL PROTECTED]

Your website says:

org.apache.lucene.analysis.standard.StandardAnalyzer:
[xy&z] [corporation] [EMAIL PROTECTED] [com]

When I run it, it keeps the entire email '[EMAIL PROTECTED]', but according to your website it separates the '[EMAIL PROTECTED]' from the 'com'.

Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Plus, I think what I want is a StandardAnalyzer with a little tweaking. The simple one was fine until I realized that it doesn't do numbers, which I need as part of my search, since numbers are important for what I'm doing. The Standard does numbers, but I need it to be a little different, of course.

Thanks for the site.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results

On Tuesday, November 25, 2003, at 12:11 PM, Pleasant, Tracy wrote:

> The documents I have indexed contain information regarding file names
> also.
>
> For instance 'return_results.pl' or something like that may be in the
> document fields.
>
> I am not understanding Lucene's way of searching:
>
> 1. If I search for 'return_results', the search does not return anything
> 2. If I search for 'results' or 'return', the search does not return anything
> 3. If I search for 'results.pl', the search does return the document
>    containing 'return_results.pl'
> 4. If I search for 'results~', the search does return the document
>    containing 'return_results.pl'
> 5. If I search for 'return_results~', the search does not return anything
>
> What is going on?
>
> I want it to return the document in all of the situations.
>
> I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :) Analysis!

Please refer to my article at java.net:

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code. Copy it over and try it out on the text you're using and the Analyzer you're using. The bracketed pieces of text that come out are the "tokens" that you can search on. It is very important to understand this process and to know exactly which terms come out of the text you hand it. Otherwise it is a mystery why some things can be found and some things cannot, despite your expectations to the contrary.

A follow-up to analysis is querying, and QueryParser has its own set of quirks and caveats related to how things are tokenized/analyzed. And I've got just the follow-up article handy for you:

http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (the analysis one first, please), then I think a lot of the questions that get asked on this list will be implicitly answered. Understanding analysis is key.

Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
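
To make the point about tokens in the reply above concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not reproduce StandardAnalyzer; the class name TokenSketch and its methods are hypothetical, made up for illustration. It only shows how the choice of splitting rule (letters only versus letters and digits) changes which bracketed terms come out of the same text, and therefore which terms a search could match. To see what a real Analyzer produces, run the demo from the article against that Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer; NOT part of Lucene.
class TokenSketch {

    // Split text into lowercased runs of letters (and digits, if keepDigits is true).
    // Every other character acts as a token boundary and is dropped.
    static List<String> tokens(String text, boolean keepDigits) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keep = Character.isLetter(c) || (keepDigits && Character.isDigit(c));
            if (keep) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    // Format tokens the way the demo output in this thread shows them: [a] [b] [c]
    static String bracketed(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('[').append(t).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "return_results.pl", "Lucene 1.3rc2" };
        for (String text : samples) {
            System.out.println("Text: \"" + text + "\"");
            System.out.println("  letters only:       " + bracketed(tokens(text, false)));
            System.out.println("  letters and digits: " + bracketed(tokens(text, true)));
        }
    }
}
```

With the letters-only rule, 'return_results.pl' comes out as [return] [results] [pl], and the digits in 'Lucene 1.3rc2' are lost entirely ([lucene] [rc]). With digits kept, the second sample becomes [lucene] [1] [3rc2]. A different rule would give different terms again, which is why checking the actual tokens is the first step when a search does not match.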