Hi Sudha,

In the past, I've built regexes to recognize URLs using the information here:

http://www.foad.org/~abigail/Perl/url2.html

The above, however, is currently a dead link. Here's the Internet Archive's WayBack Machine's cache of this page from August 2007:

<http://web.archive.org/web/20070807114147/http://www.foad.org/~abigail/Perl/url2.html>

Here's the same content, of unknown vintage, as a text file (even though it has a .html extension):

http://nerxs.com/mirrorpages/urlregex.html

Also, Jeffrey Friedl's book "Mastering Regular Expressions", 2nd edition (but not the 1st edition), has a section on recognizing URLs in Chapter 5.

Steve

On 11/19/2009 at 12:58 AM, Sudha Verma wrote:
> Hi,
>
> I am using lucene 2-9-1.
>
> I am reading in free text documents which I index using lucene and the
> StandardAnalyzer at the moment.
>
> The StandardAnalyzer keeps email addresses intact and does not tokenize
> them. Is there something similar for
> URLs? This seems like a common need. So, I thought I'd check if there
> is anything out there that does it already.
>
> I'd appreciate any help.
>
> Thanks,
> sudha
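
For illustration, here is a minimal sketch of the regex approach in Java (Lucene's language). The pattern is a deliberately simplified assumption, not the regex from Abigail's page or from Friedl's book, and the names UrlFinder and findUrls are hypothetical. It only pulls http/https URLs out of free text. Keeping them as single tokens at index time would still need a custom tokenizer or a pre-processing step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlFinder {
    // Simplified, assumed pattern: "http" or "https", then "://", then a run of
    // characters that are not whitespace, angle brackets, or double quotes.
    private static final Pattern URL =
        Pattern.compile("\\bhttps?://[^\\s<>\"]+", Pattern.CASE_INSENSITIVE);

    // Returns every URL-like substring found in the text, in order of appearance.
    public static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // Drop trailing characters that are more likely sentence punctuation
            // than part of the URL itself.
            String url = m.group().replaceAll("[.,;:!?)]+$", "");
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See http://lucene.apache.org/java/docs/, or <https://example.com/a?b=c>.";
        for (String url : findUrls(text)) {
            System.out.println(url);
        }
    }
}
```

A stricter pattern would follow the URL grammar more closely, at the cost of a much longer regex. The trailing-punctuation trim is a heuristic and will also cut a URL that legitimately ends in one of those characters.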