Some time ago I needed to tune our home-grown, Lucene-based search engine to
perform well on product searches. In product search, users enter part of a
product name and we have to find the product.

The problem here is that users don't provide the full model name. For instance,
if the product model name is "Sony PRS-A9000QF", users frequently search for
"PRS 9000", "9000QF", etc.

The simple and straightforward solution to this problem is to tokenize model
names at character-type boundaries (letter/digit transitions, punctuation and
whitespace). So for "Sony PRS-A9000QF" we get 5 terms: "sony", "prs", "a",
"9000", "qf". This solution can dramatically increase search sensitivity (many
more documents match a given query, which is not a good thing in general
search), but it works well in specialized indexes.
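To illustrate the splitting rule, here is a minimal plain-Java sketch. It is
not my actual filter and has no Lucene dependency; in a real analysis chain
this logic would sit inside a TokenFilter. The class and method names are
hypothetical, and the rules (letters and digits form separate terms,
punctuation and whitespace are dropped, terms are lowercased) are assumptions
inferred from the example above. The containsAll() check is only a crude
stand-in for an AND query against the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits a model name into terms at character-type boundaries.
public class CharTypeSplitter {

    private enum CharType { LETTER, DIGIT, OTHER }

    private static CharType typeOf(char c) {
        if (Character.isLetter(c)) return CharType.LETTER;
        if (Character.isDigit(c)) return CharType.DIGIT;
        return CharType.OTHER; // punctuation, whitespace, etc.
    }

    // Emits a new term whenever the character type changes; OTHER chars are discarded.
    public static List<String> split(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharType currentType = CharType.OTHER;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharType type = typeOf(c);
            // Type changed: flush the term collected so far (if any).
            if (type != currentType && current.length() > 0) {
                terms.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
            if (type != CharType.OTHER) {
                current.append(c);
            }
            currentType = type;
        }
        if (current.length() > 0) {
            terms.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> indexed = split("Sony PRS-A9000QF");
        System.out.println(indexed); // [sony, prs, a, 9000, qf]
        // Partial queries users actually type: every query term must be in the indexed terms.
        for (String query : new String[] {"PRS 9000", "9000QF"}) {
            List<String> q = split(query);
            System.out.println(query + " -> " + q + " matches: " + indexed.containsAll(q));
        }
    }
}
```

Both sample queries from above ("PRS 9000" and "9000QF") produce terms that
are all present among the indexed terms, which is why this works for product
search but is too loose for a general-purpose index.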

So I developed such a token filter. My question is: is there any interest in
this solution from the community, and does it make sense to contribute it back?
---
Denis Bazhenov <dot...@gmail.com>






---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org