So, is there no way to crawl sites that block crawlers? I have one idea, though it may be foolish (it may not work, or I may have to modify the whole architecture), but I'll mention it anyway: what if I use an HTML parser (Jsoup) instead of the fetcher? An HTML parser can easily retrieve all the contents of a web page. Can I do this? I think I would have to rewrite the rest of the parts (segments, updater, indexer, parser), and I suspect the HTML parser will not work with the already existing components if I simply replace the fetcher with it.
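For reference, here is a minimal sketch of what Jsoup itself can do. It parses a static HTML snippet here; for a live page you would call `Jsoup.connect(url).get()` instead, which fetches and parses in one step but does not handle robots.txt, politeness delays, or link scheduling the way a crawler's fetcher does. The HTML string is an invented example, not from the original message.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSketch {
    public static void main(String[] args) {
        // Invented example page; a real use would fetch with
        // Jsoup.connect("http://example.com").get() instead.
        String html = "<html><head><title>Example</title></head>"
                    + "<body><p>Hello</p><a href=\"/next\">next</a></body></html>";

        // Parse the HTML into a DOM-like Document.
        Document doc = Jsoup.parse(html);

        System.out.println(doc.title());        // page title
        System.out.println(doc.body().text());  // visible text of the body

        // Extract outlinks, which a crawler would feed back into its queue.
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("href"));
        }
    }
}
```

Note that this only covers the fetch-and-parse step: Jsoup gives you no segments, no crawl database updates, and no indexing, which is why swapping it in for the fetcher would still leave the rest of the pipeline to be rebuilt.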
-- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607833p4078228.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.