Hello,

Use Droids; it's much simpler than Nutch or Heritrix:
http://incubator.apache.org/droids/

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch


----- Original Message ----
> From: Phan The Dai <[email protected]>
> To: [email protected]
> Sent: Sat, January 16, 2010 2:20:47 AM
> Subject: A way to download URLs and index better?
>
> Hi everyone, please help me with this question:
> I need to download some webpages from a list of URLs (about 200 links) and
> then index them with Lucene.
> This list is not fixed, because it depends on how my process is defined.
> Currently, in my web application, I wrote a class for downloading, but the
> download time is too long.
>
> Please recommend a Java library suited to my situation for optimizing
> the downloads.
> Examples would be wonderful (INPUT: list of URLs; OUTPUT: webpage
> content, or an indexed repository).
> Thank you very much.
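Independent of Droids, the slow download step in the question can often be improved by fetching the ~200 URLs in parallel instead of one after another. The sketch below is a minimal illustration using only the JDK (it assumes Java 11+ for java.net.http.HttpClient) and does not use the Droids API. The class name ParallelFetcher, the pool size of 16, the timeouts, and the "skip failed URLs" policy are all illustrative assumptions. The fetch function is passed in so it can be swapped or faked in tests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper class (not part of Droids, Nutch, or Lucene).
public class ParallelFetcher {

    /**
     * Fetches every URL on a fixed-size thread pool and returns url -> page body.
     * URLs whose fetch throws or returns null are left out of the result.
     */
    public static Map<String, String> fetchAll(List<String> urls, int threads,
                                               Function<String, String> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, String> pages = new ConcurrentHashMap<>();
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String url : urls) {
            tasks.add(() -> {
                try {
                    String body = fetcher.apply(url);
                    if (body != null) {
                        pages.put(url, body); // ConcurrentHashMap rejects null values
                    }
                } catch (RuntimeException e) {
                    // Skip this URL; a real crawler would log it and perhaps retry.
                }
                return null;
            });
        }
        try {
            pool.invokeAll(tasks); // blocks until every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag; return what finished
        } finally {
            pool.shutdownNow();
        }
        return pages;
    }

    /** Fetcher backed by the JDK 11+ HttpClient; non-200 responses yield null. */
    public static Function<String, String> httpFetcher() {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        return url -> {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(20))
                    .GET()
                    .build();
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                return response.statusCode() == 200 ? response.body() : null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) {
        // Pass the URLs on the command line; 16 worker threads is an arbitrary choice.
        Map<String, String> pages = fetchAll(Arrays.asList(args), 16, httpFetcher());
        pages.forEach((url, body) -> System.out.println(url + " -> " + body.length() + " chars"));
    }
}
```

The returned map (URL to page body) could then be fed to a Lucene IndexWriter, one Document per page. The right pool size depends on bandwidth and on how politely the target hosts should be treated, so treat 16 as a starting point only.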
