Re: A way to download URLs and index better ?

Ahmet Arslan Fri, 15 Jan 2010 23:27:00 -0800

> Hi everyone, please help me with this question:
> I need to download some web pages from a list of URLs (about 200 links) and
> then index them with Lucene.
> This list is not fixed, because it depends on how my process is defined.
> Currently, in my web application, I wrote a class for downloading, but its
> download time is too long.
> 
> Please recommend a Java library suited to my situation to optimize
> downloading.
> Examples would also be very welcome (INPUT: list of URLs; OUTPUT: web page
> content, or an indexed repository).
> Thank you very much.


Probably the most famous ones are Nutch and Heritrix:

http://lucene.apache.org/nutch/
http://crawler.archive.org/



      
