RE: Strange Indexing Problem with letter-number combination

DIGY Tue, 08 Jan 2008 21:55:48 -0800

Hi Min,

Try a different analyzer, such as WhitespaceAnalyzer.
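A rough plain-C# approximation (not Lucene.Net code) of what WhitespaceAnalyzer does: it splits only on whitespace and, unlike StandardAnalyzer, does not lowercase or drop stop words. This makes "deskbar-abc" and "deskbar-abc288" behave the same way (both stay whole). Note, though, that "hello,alison29,there" would also stay a single token, so the commas would still have to become spaces. The helper name `WhitespaceTokens` is hypothetical.

```csharp
using System;

// Hypothetical helper approximating WhitespaceAnalyzer: split on whitespace
// only, with no lowercasing and no stop-word removal.
// An empty separator array tells String.Split to split on white space.
static string[] WhitespaceTokens(string text) =>
    text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

foreach (var s in new[] { "deskbar-abc", "deskbar-abc288", "hello,alison29,there", "hello alison29 there" })
{
    // Only the last input contains spaces, so only it is split into several tokens.
    Console.WriteLine(string.Join(" | ", WhitespaceTokens(s)));
}
```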


DIGY

-----Original Message-----
From: Min Yin [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 09, 2008 2:38 AM
To: lucene-net-user@incubator.apache.org
Subject: Re: Strange Indexing Problem with letter-number combination

Hello,

Thanks for the reply! I've found that the problem is caused by the
commas that separate the words: if I change the commas to spaces or
semicolons, it works fine. Commas also work as long as the words
contain no digits. Maybe it has something to do with numbers like
"10,000"?
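Since replacing the commas with spaces fixes the problem, one workaround is to normalize the text before indexing it (and to apply the same step to query strings). A minimal sketch, assuming that a comma sitting between two digits, as in "10,000", should be preserved: every other comma becomes a space. The helper name `SplitListCommas` and the regex are illustrative assumptions, not part of Lucene.Net.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pre-indexing step: turn list-separator commas into spaces,
// but keep a comma that is both preceded and followed by a digit (e.g. "10,000").
static string SplitListCommas(string text) =>
    Regex.Replace(text, @"(?<!\d),|,(?!\d)", " ");

Console.WriteLine(SplitListCommas("hello,alison29,there"));
Console.WriteLine(SplitListCommas("price 10,000"));
```

A word such as "alison29,30" would still keep its comma under this rule, so it may need adjusting for real data.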

I also have a second, somewhat related question. If I index the text
"deskbar-abc", it is indexed as "deskbar" and "abc", but if I index
"deskbar-abc288" instead, it is treated as one word. Is there a way to
make this consistent? For example, always keep the dash and never split
the word?
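Both observations fit the NUM rule in the StandardTokenizer grammar of the Lucene 2.x era (ported to Lucene.Net): letters and digits joined by `_`, `-`, `/`, `.` or `,` appear to be kept as one token when every other segment contains a digit, which is meant for values like "10,000", serial numbers and model numbers. DIGY's TokenizeString helper should report such tokens with the type `<NUM>`. The sketch below is a simplified approximation of that rule for a single whitespace-free word only; the real tokenizer does longest-match scanning and has other token types, and `KeptWhole` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical approximation of the StandardTokenizer NUM rule for one
// whitespace-free word: segments separated by _ - / . , stay together when
// there are at least two segments and every other segment (all even-indexed
// or all odd-indexed ones) contains a digit.
static bool KeptWhole(string word)
{
    string[] parts = word.Split('_', '-', '/', '.', ',');
    // Every segment must be a non-empty run of letters and/or digits.
    if (parts.Length < 2 || parts.Any(p => p.Length == 0 || !p.All(char.IsLetterOrDigit)))
        return false;

    bool HasDigit(int i) => parts[i].Any(char.IsDigit);
    var indices = Enumerable.Range(0, parts.Length);
    bool evenHaveDigits = indices.Where(i => i % 2 == 0).All(HasDigit);
    bool oddHaveDigits = indices.Where(i => i % 2 == 1).All(HasDigit);
    return evenHaveDigits || oddHaveDigits;
}

foreach (var w in new[] { "deskbar-abc", "deskbar-abc288", "hello,alison29,there", "10,000" })
    Console.WriteLine($"{w}: {(KeptWhole(w) ? "one token" : "split")}");
```

Under this rule the digits in "abc288" are what keep "deskbar-abc288" together, which is why the two dashed words behave differently.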

Many thanks in advance!
Min

DIGY wrote:
> 1.
> I tried your case with the following code and everything worked as
> expected.
>
>       Test(new Lucene.Net.Analysis.Standard.StandardAnalyzer(),
>            "hello alison20 there", "alison20");
>
>       // Indexes stringToIndex into an in-memory index, then searches it
>       // for stringToSearch and prints the number of hits.
>       void Test(Lucene.Net.Analysis.Analyzer analyzer,
>                 string stringToIndex, string stringToSearch)
>       {
>           // Build a one-document index in RAM.
>           Lucene.Net.Store.RAMDirectory dir = new Lucene.Net.Store.RAMDirectory();
>           Lucene.Net.Index.IndexWriter writer =
>               new Lucene.Net.Index.IndexWriter(dir, analyzer);
>           Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
>           Lucene.Net.Documents.Field field = new Lucene.Net.Documents.Field(
>               "field1", stringToIndex,
>               Lucene.Net.Documents.Field.Store.YES,
>               Lucene.Net.Documents.Field.Index.TOKENIZED);
>           doc.Add(field);
>           writer.AddDocument(doc);
>           writer.Close();
>
>           // Parse the query with the same analyzer used at index time.
>           Lucene.Net.Search.IndexSearcher searcher =
>               new Lucene.Net.Search.IndexSearcher(dir);
>           Lucene.Net.QueryParsers.QueryParser qp =
>               new Lucene.Net.QueryParsers.QueryParser("field1", analyzer);
>           Lucene.Net.Search.Query q = qp.Parse(stringToSearch);
>           Lucene.Net.Search.Hits hits = searcher.Search(q);
>           Console.WriteLine(hits.Length().ToString() + " hit(s)");
>           searcher.Close();
>       }
>
>
> 2.
> Using StandardAnalyzer, the tokens of the string "hello alison20 there" are
> "hello" and "alison20", as expected ("there" is dropped because it is in
> StandardAnalyzer's default English stop-word list).
>
>       TokenizeString(new Lucene.Net.Analysis.Standard.StandardAnalyzer(),
>                      "hello alison20 there");
>
>       // Prints each token produced by the analyzer, followed by its type.
>       void TokenizeString(Lucene.Net.Analysis.Analyzer analyzer, string s)
>       {
>           Lucene.Net.Analysis.TokenStream ts =
>               analyzer.TokenStream("", new System.IO.StringReader(s));
>           for (Lucene.Net.Analysis.Token t = ts.Next(); t != null; t = ts.Next())
>           {
>               Console.WriteLine(t.TermText() + " " + t.Type());
>           }
>           ts.Close();
>       }
>
>
> DIGY
>
> -----Original Message-----
> From: yin [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, January 05, 2008 2:43 AM
> To: lucene-net-user@incubator.apache.org
> Subject: Strange Indexing Problem with letter-number combination
>
> Hello there!
>
> I see a very strange indexing problem that I hope someone can shed some
> light on.
>
> I have a StandardAnalyzer (the default one, no special configuration). It
> works great until it hits a file that contains a letter-number combination
> word such as "alison29". I checked the index with Luke, and here's the
> strange thing:
>
> For the text "how are you", I got three index entries: "how", "are", and
> "you", whereas for the text "hello alison20 there", I got only one index
> entry, "hello,alison29,there". As a consequence, none of the searches for
> "alison29", "hello", or "there" returns anything; it only works if I
> search for exactly "hello,alison29,there".
>
> I could pad both my index and my search keywords, but I'm not very
> comfortable with that, and I feel the issue is too obvious to be an
> overlooked bug. More likely I missed something, perhaps some parameter
> setting in Lucene's StandardAnalyzer? Any ideas? Thank you very much for
> your help!
>
> Regards,
>
> Min
>
>