Gaurang,

About those AVG alerts: you are fetching web pages together with whatever viruses they may be infected with. Of course antivirus software will scream about it.
I wouldn't run any software of that kind on a crawling machine.

Respectfully,
Davit Jashi

On Tue, Oct 6, 2009 at 12:36, Gaurang Patel <[email protected]> wrote:
> Hey,
>
> Can anyone tell what could be the reason for the following, which happened while
> fetching data using bin/nutch fetch:
>
> My AVG Antivirus is detecting virus threats while Nutch fetches pages from
> available urls of *crawldb.* I injected DMOZ Open Directory urls to crawldb.
> Antivirus already detected 4 threats within only half an hour after start of
> fetching.
>
> Is there any other way (any source other than DMOZ) to get list of whole web
> urls? Or is there an automatic way to avoid such harmful urls from being
> fetched? Let me know asap.
>
> Regards,
> Gaurang
