WhitespaceAnalyzer vs StandardAnalyzer

raghavendra.k.rao Fri, 15 Nov 2013 12:23:12 -0800

Hi,

I implemented my Lucene solution using StandardAnalyzer for both indexing and 
searching. While testing, I noticed that special characters such as hyphens 
and forward slashes are dropped by this Analyzer.


In plain English, the requirement is to search for individual words; in Lucene 
terms, whitespace should be the only token delimiter. Also, no part of the text 
should be modified or omitted.

For example: ModelNumber: ABC/x:123
Here there should be only 2 tokens, "ModelNumber:" and "ABC/x:123".
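
To make the expected output concrete, here is a minimal plain-Java sketch of 
the tokenization I am after. It does not use Lucene at all; the class 
WhitespaceTokens and its tokenize method are hypothetical names made up for 
this illustration, and the sketch only assumes that tokens are separated by 
runs of whitespace and are otherwise left untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: it only illustrates the desired
// behaviour (split on whitespace, keep every other character as-is).
class WhitespaceTokens {

    // Splits the text on runs of whitespace; nothing is lowercased,
    // stripped, or otherwise modified.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {   // skip the empty piece produced by blank input
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ModelNumber:, ABC/x:123]
        System.out.println(tokenize("ModelNumber: ABC/x:123"));
    }
}
```

Note that the colon, slash and case of each token are preserved, which is the 
behaviour I would need from the Analyzer at both index and search time.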

Based on what I read about WhitespaceAnalyzer, it sounds as though it can do 
exactly what I am looking for. Before I make this big decision, I also wanted 
to run this by you folks to check if there are any side-effects of switching 
the Analyzer - keeping in mind my requirements.

Any suggestions as always would be greatly appreciated.

Regards,
Raghu


