Using StandardTokenizer instead of WhitespaceTokenizer should remove the punctuation as well, so “Doe,” would be indexed without the trailing comma.
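
To illustrate the difference, here is a rough sketch in plain Java. It is not Lucene code: the class and method names (TokenizerSketch, whitespaceTokens, wordTokens) are made up for this example, and java.text.BreakIterator is only a stand-in for the kind of word-boundary splitting StandardTokenizer performs. Lucene's actual rules may differ in edge cases. The first method splits on whitespace only and lowercases, as in your current analyzer. The second splits on word boundaries and drops segments that contain no letters or digits.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {

    // Mimics whitespace-only splitting plus lowercasing:
    // punctuation stays attached to the neighbouring word.
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Approximates word-boundary tokenization (an assumption, not Lucene's
    // implementation): keep only segments containing a letter or digit.
    public static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "John Doe, bob smith, and jane go into a bar.";
        System.out.println("whitespace: " + whitespaceTokens(text));
        System.out.println("word-based: " + wordTokens(text));
    }
}
```

With whitespace splitting, the comma stays glued to the token, so an exact search for “doe” has nothing to match. With word-boundary splitting, the comma is discarded and the plain term is indexed.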

Alan Woodward
www.flax.co.uk


> On 28 Nov 2016, at 16:06, Thomas Johnson <tjohn...@paperhost.com> wrote:
> 
> We are using Lucene 5.0. Some of our documents are getting indexed with a 
> comma after the value. For example, “John Doe, bob smith, and jane go into a 
> bar.” We are using a WhitespaceTokenizer and a LowerCaseFilter as the 
> analyzer. If we search for “Doe”, nothing is found because the value in the 
> index is “Doe,”. I was wondering if there was a way to get the reader to 
> ignore the comma. The current workaround is to have the user do their search 
> with * at the end. This is slow and also returns unwanted values such as 
> “Does” when we search for “Doe*”.
>  
> Thank you.
>  
>  
> Thomas W. Johnson, Senior Programmer
> 678-397-1663
> tjohn...@paperhost.com <mailto:tjohn...@paperhost.com>                
