Thanks for your help. As I stated before, the numbers, whether pure or not, are indexed: I can find them with Luke. But supposing what you're saying were the case, the search for "10-year" should return 4 items (the number of occurrences Luke finds). The problem is that 6 documents are returned, because the search ignored the "10" and searched for "-year".

Xavier Tô
B.Sc. in Computer Science and Software Engineering
[EMAIL PROTECTED]
(450)434-8905

----- Original Message -----
From: Mark Miller <[EMAIL PROTECTED]>
Date: Monday, February 5, 2007 11:11 am
Subject: Re: Problem with a search engine

> StandardAnalyzer does not index pure numbers. It will index alphanumeric
> tokens and numbers that are connected with one of: "_"|"-"|"/"|"."|","
> If you wish to index pure numbers you might want to add another regex to
> StandardAnalyzer that recognizes a series of digits - don't forget to add
> the new token type to the grammar lower in the StandardTokenizer.jj file.
>
> - Mark
>
> On 2/5/07, Xavier To <[EMAIL PROTECTED]> wrote:
> >
> > Thanks for taking time to answer me. The problem is that I'm not allowed
> > to post code due to a confidentiality contract that I was required to sign.
> > I'll try to see if I can get special permission to post code, since I'm
> > wasting so much time trying to find the answer to this.
> >
> > I tried looking at each place where the query is touched, and the numbers
> > are still present in the query. I don't know if it's the analyzer, but if
> > it was, wouldn't the numbers be cut out of the index completely? As I said
> > in my 1st post, they are "findable" with Lukeall. If I read right, the
> > FrenchAnalyzer included in Lucene is supposed to be based on
> > StandardAnalyzer, so I really fail to see what is going wrong. Might it be
> > the fact that the tokenizer used is StringTokenizer and not TokenStream?
> > The numbers are tokenized, and in the returned query they are present...
> > I really don't know where they get zapped out of existence...
> >
> > Thanks again for helping.
> >
> > Xavier Tô
> > B.Sc. in Computer Science and Software Engineering
> > [EMAIL PROTECTED]
> > (450)434-8905
> >
> > --------------------------------------------------------------------
> >
> > Hard to tell without seeing any code. Perhaps numbers are being removed
> > from the query string during search.
> > Make sure the same or at least a "compatible" Analyzer is used during both
> > indexing and querying.
> > Grab the code from Lucene in Action .... hm, lucenebook.com may be down at
> > the moment, but that's where you can get the code normally. The code
> > includes some classes that let you run a query string through a set of
> > Analyzers and see how each of them behaves and what it does to a query.
> >
> > Otis
> >
> > ----- Original Message ----
> > From: "To, Xavier" <[EMAIL PROTECTED]>
> > To: java-user@lucene.apache.org
> > Sent: Wednesday, January 31, 2007 12:21:27 AM
> > Subject: Problem with a search engine
> >
> > Hi, I recently started an internship and I've been asked to fix their
> > search engine so numbers are searched. At first, I thought it was the
> > Analyzer that wasn't working right, but we're using StandardAnalyzer and
> > the numbers are indexed (I checked with Lukeall). Then I thought they
> > were not tokenized during the search, but they are. They just seem to be
> > ignored for some reason. Has anyone experienced something similar? If
> > so, how can I fix this? It's probably something that would jump in my
> > face if it was alive, but I just can't see it. Can anyone help me? It
> > would be very much appreciated.
> >
> > Xavier Tô
> > Intern
> > Development - Maintenance & Evolution
> > AXA Canada Tech
> > 2020, rue University, bureau 700
> > Montréal (Québec) H3A 2A5
> > Tel.: (514) 282-6817, ext. 2224
> > Fax: (514) 282-6017
> > Email: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> > _____
> >
> > "This e-mail message is confidential, for the exclusive use of the
> > addressee and its contents shall not constitute a commitment by AXA,
> > except as otherwise specifically provided in writing by AXA. Any
> > unauthorized disclosure, use or dissemination, either whole or partial,
> > is prohibited. If you are not the intended recipient of the message,
> > please notify the sender immediately."

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]