Using StandardTokenizer should remove punctuation as well.

Alan Woodward
www.flax.co.uk
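To see why the trailing comma breaks the "Doe" search, and what switching tokenizers changes, here is a stdlib-only Java sketch. It does not use Lucene. The class and method names (`TokenizeDemo`, `whitespaceLower`, `wordLower`) are hypothetical. `whitespaceLower` roughly imitates WhitespaceTokenizer + LowerCaseFilter. `wordLower` crudely imitates a punctuation-stripping tokenizer followed by lowercasing. Treat it as an approximation only: StandardTokenizer follows the Unicode UAX #29 word-break rules, so it handles cases such as apostrophes ("don't") differently from this regex split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stdlib-only illustration; this is NOT Lucene code.
public class TokenizeDemo {

    // Roughly imitates WhitespaceTokenizer + LowerCaseFilter: split on runs of
    // whitespace only, so punctuation such as "," stays attached to the word.
    static List<String> whitespaceLower(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) { // "".split(...) yields one empty string
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Crudely imitates a punctuation-stripping tokenizer + lowercasing: split on
    // runs of characters that are neither letters nor digits. (StandardTokenizer
    // uses UAX #29 word-break rules instead, e.g. it keeps "don't" together.)
    static List<String> wordLower(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "John Doe, bob smith, and jane go into a bar.";
        List<String> ws = whitespaceLower(doc);
        List<String> words = wordLower(doc);
        System.out.println(ws);    // [john, doe,, bob, smith,, and, jane, go, into, a, bar.]
        System.out.println(words); // [john, doe, bob, smith, and, jane, go, into, a, bar]
        // An exact term lookup for "doe" only succeeds against the second list.
        System.out.println(ws.contains("doe"));    // false
        System.out.println(words.contains("doe")); // true
    }
}
```

In Lucene itself, the same analyzer should be used at index time and at query time, and documents that were already indexed keep their old terms (such as "doe,") until they are reindexed with the new analyzer.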
> On 28 Nov 2016, at 16:06, Thomas Johnson <tjohn...@paperhost.com> wrote:
>
> We are using Lucene 5.0. Some of our documents are getting indexed with a
> comma after the value. For example, “John Doe, bob smith, and jane go into a
> bar.” We are using a WhitespaceTokenizer and a LowerCaseFilter as the
> analyzer. If we search for “Doe”, nothing is found because the value in the
> index is “Doe,”. I was wondering if there is a way to get the reader to
> ignore the comma. The current workaround is to have the user do their search
> with a * at the end. This is slow and also returns unwanted values such as
> “Does” when we search for “Doe*”.
>
> Thank you.
>
> Thomas W. Johnson, Senior Programmer
> 678-397-1663
> tjohn...@paperhost.com