It seems so. Interestingly, I can't find any mention of WordDelimiterTokenFilter using Google. Is it part of the Solr codebase?

On 13.06.2011, at 21:49, Em wrote:
> Hi,
>
> sounds like the WordDelimiterTokenFilter from Solr, doesn't it?
>
> Regards,
> Em
>
> Am 13.06.2011 12:06, schrieb Denis Bazhenov:
>> Some time ago I needed to tune our home-grown search engine, based on
>> Lucene, to perform well on product searches. Product search is search
>> where users come with part of a product name and we should find the
>> product.
>>
>> The problem here is that users don't provide the full model name. For
>> instance, if the product model name is "Sony PRS-A9000QF", users
>> frequently search for "PRS 9000", "9000QF", etc.
>>
>> The simple and straightforward solution to this problem is to tokenize
>> model names on character-type boundaries. So for "Sony PRS-A9000QF" we
>> will have five terms: "sony", "prs", "a", "9000", "qf". This solution
>> can dramatically increase search sensitivity (which is not a good thing
>> in a general search), but it works well in specialized indexes.
>>
>> So I developed such a token filter. My question is: is there any
>> interest in this solution for the community, and does it make sense to
>> contribute it back?
>> ---
>> Denis Bazhenov <dot...@gmail.com>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org

---
Denis Bazhenov <dot...@gmail.com>
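For readers following the thread: the splitting rule Denis describes (break a model name wherever the character class changes, e.g. letter to digit) can be sketched in plain Java as below. This is a hypothetical illustration, not Denis's actual filter nor Solr's WordDelimiterFilter; the class name `CharTypeSplitter` is made up for the example, and a real Lucene integration would wrap this logic in a `TokenFilter`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of character-type-boundary tokenization: start a new token
// whenever the character class (letter vs. digit) changes, or when a
// non-alphanumeric separator such as '-' or ' ' is seen.
public class CharTypeSplitter {

    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prevType = -1; // 0 = digit, 1 = letter, -1 = none yet

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Separator: emit whatever we have accumulated.
                flush(tokens, current);
                prevType = -1;
                continue;
            }
            int type = Character.isDigit(c) ? 0 : 1;
            if (prevType != -1 && type != prevType) {
                // Boundary between letters and digits: emit a token.
                flush(tokens, current);
            }
            current.append(Character.toLowerCase(c));
            prevType = type;
        }
        flush(tokens, current);
        return tokens;
    }

    private static void flush(List<String> tokens, StringBuilder sb) {
        if (sb.length() > 0) {
            tokens.add(sb.toString());
            sb.setLength(0);
        }
    }

    public static void main(String[] args) {
        // "Sony PRS-A9000QF" -> [sony, prs, a, 9000, qf]
        System.out.println(split("Sony PRS-A9000QF"));
    }
}
```

With all five sub-terms indexed, queries like "PRS 9000" or "9000QF" (itself split into "9000" and "qf") can match the product, which is exactly the behavior described in the thread.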