URL Tokenization

Sudha Verma Wed, 23 Jun 2010 11:07:00 -0700

Hi,

I am new to Lucene and am using version 3.0.2.


I am using Lucene to parse text that may contain URLs. I noticed that
StandardTokenizer keeps email addresses in one token, but splits URLs into
several. I also looked at the Solr wiki pages, and although the page for
solr.StandardTokenizerFactory says it keeps track of the URL token type, that
does not seem to be the case.

Is there an Analyzer implementation that keeps each URL intact as a single
token? Or does anyone have an example of that for Solr or Lucene?
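
To make the question concrete, here is a rough sketch of the behavior I am
after. It is plain Java using java.util.regex, not a Lucene or Solr API; the
class name UrlAwareSplitter and the URL pattern are assumptions made up for
illustration, and the pattern is deliberately naive (it would, for example,
keep trailing punctuation attached to a URL).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: shows the desired token boundaries.
public class UrlAwareSplitter {

    // First alternative: a URL (scheme, "://", then any run of non-whitespace).
    // Second alternative: an ordinary word made of letters and digits.
    // The URL alternative is listed first so it wins where both could match.
    private static final Pattern TOKEN =
        Pattern.compile("(?:https?|ftp)://\\S+|[\\p{L}\\p{N}]+");

    // Returns the tokens in order of appearance, each URL kept whole.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [See, http://lucene.apache.org/java/docs/, for, details]
        System.out.println(
            tokenize("See http://lucene.apache.org/java/docs/ for details"));
    }
}
```

What I am looking for is an Analyzer or Tokenizer that produces the same
boundaries inside Lucene, ideally with the URL marked by its own token type.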

Thanks much,
Sudha
