Hi Min,

Try other analyzers (such as WhitespaceAnalyzer).
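For illustration, here is a rough, self-contained C# model of the behaviour described in this thread. It is not Lucene.Net code. It assumes, based on the Lucene 2.x StandardTokenizer grammar as I recall it, that the characters '_', '-', '/', '.' and ',' can join alphanumeric segments into one "number-like" token when every other segment contains a digit. That rule exists for things like "10,000", serial numbers and model numbers. The names TokenizerSketch, StandardLikeSplit and WhitespaceSplit are hypothetical. The model ignores stop-word removal and the real tokenizer's exact longest-match rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of two tokenizers; NOT the Lucene.Net implementation.
static class TokenizerSketch
{
    // Punctuation assumed to join segments into a single number-like token.
    static readonly char[] Joiners = { '_', '-', '/', '.', ',' };

    // WhitespaceAnalyzer-like behaviour: only whitespace separates tokens, no lowercasing.
    public static List<string> WhitespaceSplit(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).ToList();

    // Rough approximation of StandardAnalyzer: a chunk of segments joined by the
    // punctuation above stays whole if every other segment contains a digit;
    // otherwise it is split at that punctuation. Tokens are lowercased.
    public static List<string> StandardLikeSplit(string text)
    {
        var tokens = new List<string>();
        foreach (string chunk in WhitespaceSplit(text))
        {
            string[] segments = chunk.Split(Joiners, StringSplitOptions.RemoveEmptyEntries);
            bool keepWhole = segments.Length > 1 &&
                (EveryOtherHasDigit(segments, 0) || EveryOtherHasDigit(segments, 1));
            if (keepWhole)
                tokens.Add(chunk.ToLowerInvariant());
            else
                tokens.AddRange(segments.Select(s => s.ToLowerInvariant()));
        }
        return tokens;
    }

    // True if segments[start], segments[start + 2], ... all contain at least one digit.
    static bool EveryOtherHasDigit(string[] segments, int start)
    {
        for (int i = start; i < segments.Length; i += 2)
            if (!segments[i].Any(char.IsDigit)) return false;
        return true;
    }

    static void Main()
    {
        string[] samples = { "hello,alison29,there", "hello alison29 there", "deskbar-abc", "deskbar-abc288" };
        foreach (string text in samples)
        {
            Console.WriteLine(text);
            Console.WriteLine("  standard-like: " + string.Join(" | ", StandardLikeSplit(text)));
            Console.WriteLine("  whitespace:    " + string.Join(" | ", WhitespaceSplit(text)));
        }
    }
}
```

Under these assumptions, the whitespace-only split keeps both "deskbar-abc" and "deskbar-abc288" whole, which gives the consistent dash handling asked about. It also keeps "hello,alison29,there" as a single token, though, so comma-separated words still need spaces (or another separator) either way.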
DIGY

-----Original Message-----
From: Min Yin [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 09, 2008 2:38 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: Strange Indexing Problem with letter-number combination

Hello,

Thanks for the reply! I've found that the problem is caused by the commas that separate the words: if I change the commas to spaces or semicolons, it works fine. Commas also work as long as the words don't contain any digits. Maybe it has something to do with "10,000" or that sort of thing?

I have a second, somewhat related question. If I index the text "deskbar-abc", it is indexed as "deskbar" and "abc", but if I have "deskbar-abc288" instead, it is treated as one word. Is there a way to make this behave consistently? For example, always keep the dash and never split the word?

Many thanks in advance!

Min

DIGY wrote:
> 1.
> I tried your case with the following code and everything worked as expected.
>
> Test(new Lucene.Net.Analysis.Standard.StandardAnalyzer(), "hello alison20 there", "alison20");
>
> void Test(Lucene.Net.Analysis.Analyzer analyzer, string stringToIndex, string stringToSearch)
> {
>     Lucene.Net.Store.RAMDirectory dir = new Lucene.Net.Store.RAMDirectory();
>     Lucene.Net.Index.IndexWriter writer = new Lucene.Net.Index.IndexWriter(dir, analyzer);
>     Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
>     Lucene.Net.Documents.Field field = new Lucene.Net.Documents.Field("field1", stringToIndex,
>         Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.TOKENIZED);
>     doc.Add(field);
>     writer.AddDocument(doc);
>     writer.Close();
>
>     Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);
>     Lucene.Net.QueryParsers.QueryParser qp = new Lucene.Net.QueryParsers.QueryParser("field1", analyzer);
>     Lucene.Net.Search.Query q = qp.Parse(stringToSearch);
>     Lucene.Net.Search.Hits hits = searcher.Search(q);
>     Console.WriteLine(hits.Length().ToString() + " hit(s)");
> }
>
>
> 2.
> Using StandardAnalyzer, the tokens of the string "hello alison20 there" are "hello" and "alison20" (as expected).
>
> TokenizeString(new Lucene.Net.Analysis.Standard.StandardAnalyzer(), "hello alison20 there");
>
> void TokenizeString(Lucene.Net.Analysis.Analyzer analyzer, string s)
> {
>     Lucene.Net.Analysis.TokenStream ts = analyzer.TokenStream("", new System.IO.StringReader(s));
>     for (Lucene.Net.Analysis.Token t = ts.Next(); t != null; t = ts.Next())
>     {
>         Console.WriteLine(t.TermText() + " " + t.Type());
>     }
> }
>
>
> DIGY
>
> -----Original Message-----
> From: yin [mailto:[EMAIL PROTECTED]
> Sent: Saturday, January 05, 2008 2:43 AM
> To: lucene-net-user@incubator.apache.org
> Subject: Strange Indexing Problem with letter-number combination
>
> Hello there!
>
> I see a very strange indexing problem that I hope someone can shed some light on.
>
> I have a StandardAnalyzer (the default one, no special configuration). It works great until it hits a file that contains a letter-number combination word such as "alison29". I checked the index with Luke, and here's the strange thing:
>
> For the text "how are you", I got three index entries, "how", "are", and "you", while for the text "hello alison20 there", I got only one index entry, "hello,alison29,there". As a consequence, none of the searches for "alison29", "hello", or "there" returns anything; it only works if I search precisely for "hello,alison29,there".
>
> I can pad both my index and my search keywords, but I'm not very comfortable with that. The issue seems too obvious to be an overlooked bug, so more likely I missed something, perhaps some parameter setting in Lucene's StandardAnalyzer? Any idea? Thank you very much for your help!
>
> Regards,
>
> Min