Thank much to Ahmet Arslan, Although I have known about Nutch and Heritrix, when I read Lucene in Action book but I am a newer of using them I am not sure they are support for my situation.
I need an API allowing my application automatically fetching URLs and download them into a Directory (as Directory of Lucene). Detail question: I need 2 progresses running sequentially written in Java. Progress1 generates a list of URLs. Now I want a module (progress 1) download URLs' content to index them directly. Nutch and Heritrix can support libraries to do it ? An example or other comment ! Thank you again for answering this question. On Sat, Jan 16, 2010 at 4:26 PM, Ahmet Arslan <iori...@yahoo.com> wrote: > > Hi everyone, please help me this > > question: > > I need downloading some webpages from a list of URLs (about > > 200 links) and > > then index them by Lucene. > > This list is not fixed, because it depends on definition of > > my process. > > Currently, in my web application, I wrote class for > > downloading, but it > > download time is too long. > > > > Please recommend me a Java library suitable with my > > situation for optimize > > downloading. > > More its examples are very wonderful (INPUT: list of URLs; > > OUTPUT: webpages > > content, or indexed repository) > > Thank you very much. > > Probably most famous ones : > > http://lucene.apache.org/nutch/ > http://crawler.archive.org/ > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >