When indexing, you could normalise the product ids down to a standard format without spaces or hyphens, but searching is much harder if you really can't pick out likely product ids within user queries. You could try making triplets of adjacent query words with the spaces and hyphens removed: "CRX USB-2.0 16GB" ==> CRXUSB2.016GB, but also "some random words" ==> somerandomwords. The latter wouldn't match anything; the former would, if it was a valid id.
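A minimal sketch of the triplet idea in plain Java. It assumes the product keys have already been indexed in normalised form, with spaces and hyphens stripped. Upper-casing them is an extra assumption of mine, so that "crxusb2.016gb" also matches. `ProductKeyShingles`, `normalise`, `candidateKeys` and `matchingKeys` are hypothetical names, not Lucene API, and a `Set` stands in for a lookup against the index. Inside a Lucene analyzer chain, `ShingleFilter` with an empty token separator might be a starting point for producing the same joined word runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Lucene API: joins adjacent query words into
// candidate product keys and looks them up among normalised keys.
final class ProductKeyShingles {

    private ProductKeyShingles() {}

    // Removes whitespace and hyphens and upper-cases (the case folding is an
    // assumption), e.g. "CRX USB-2.0 16GB" -> "CRXUSB2.016GB".
    static String normalise(String text) {
        return text.replaceAll("[\\s-]+", "").toUpperCase(Locale.ROOT);
    }

    // Joins every run of 1..maxRun adjacent words and normalises the result;
    // maxRun = 3 gives the "triplets". Insertion order is kept for readability.
    static Set<String> candidateKeys(String query, int maxRun) {
        if (maxRun < 1) {
            throw new IllegalArgumentException("maxRun must be at least 1");
        }
        Set<String> candidates = new LinkedHashSet<>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return candidates;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder joined = new StringBuilder();
            for (int end = start; end < words.length && end - start < maxRun; end++) {
                joined.append(words[end]);
                String key = normalise(joined.toString());
                if (!key.isEmpty()) { // skip runs made only of hyphens
                    candidates.add(key);
                }
            }
        }
        return candidates;
    }

    // Stand-in for an index lookup: keeps only candidates present among the
    // normalised product keys (in Lucene this could be a term lookup on a
    // dedicated product id field).
    static List<String> matchingKeys(String query, Set<String> normalisedKeys, int maxRun) {
        List<String> hits = new ArrayList<>();
        for (String candidate : candidateKeys(query, maxRun)) {
            if (normalisedKeys.contains(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

With `maxRun = 3`, "CRX USB-2.0 16GB" yields six candidates. One of them is CRXUSB2.016GB, which matches a catalogue key stored as `normalise("CRXUSB2.0-16GB")`. "some random words" yields SOMERANDOMWORDS and similar terms, which simply find nothing. Keys that users split into more than three words would need a larger `maxRun`, at the cost of more query terms.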
Some form of synonym analysis/injection at indexing time would be better if you could do that: CRXUSB2.016GB ==> "CRX USB2.0 16GB", to be indexed as well as the base value. If you can neither have a dedicated product id search field nor standardise the product ids, this is going to be hard.

-- 
Ian

On Tue, Jan 3, 2012 at 8:44 AM, Christoph Kaser <lucene_l...@iconparc.de> wrote:
> Hello,
>
> we use lucene as search engine in an online shop. The products in this shop
> often contain product keys like CRXUSB2.0-16GB.
> We would like our customers to be able to find products by entering their
> key. The problem is that product keys sometimes contain spaces or dashes and
> customers sometimes don't enter these whitespaces correctly. On the other
> hand, some customers enter whitespaces where there are none. Is there an
> analyzer or some other method that allows us to find the product if the user
> enters things like:
> - "CRX USB2.0 16GB"
> - "CRXUSB2.016GB"
> - "CRX USB-2.0 16GB"
> ...
>
> The problem is that the product keys don't all have a common format and are
> contained in the normal text, so we don't have an easy way to treat them
> different to the rest of the text.
>
> Any help would be great!
>
> Best regards,
> Christoph
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org