The DMOZ RDF dump is the best free source of URLs on the Internet that comes
in a single file. It's just meant as a starting point to feed your Nutch DB:
after you fetch all (or some) of that list, the links found on those pages
will grow the collection of links in your DB. In theory, starting with that
dump you should be able to fetch any linked web page on the Internet. Just
don't expect to finish that anytime soon, like within your lifetime.
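
If you want to build the seed list yourself, here is a rough sketch in Python
of pulling page URLs out of the dump. It assumes each listed page shows up in
the RDF as an <ExternalPage about="URL"> element; the function name seed_urls
and the keep_one_in parameter are made up for this example and are not part
of Nutch.

```python
import re
import zlib
from html import unescape
from typing import Iterable, Iterator

# Assumption: each listed page appears in the dump as <ExternalPage about="URL">.
EXTERNAL_PAGE = re.compile(r'<ExternalPage about="([^"]+)"')


def seed_urls(lines: Iterable[str], keep_one_in: int = 1) -> Iterator[str]:
    """Yield page URLs from DMOZ RDF lines, keeping roughly one in keep_one_in."""
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if match is None:
            continue
        # The dump is XML, so entities such as &amp; need decoding.
        url = unescape(match.group(1))
        # A stable checksum gives the same subset on every run.
        if zlib.crc32(url.encode("utf-8")) % keep_one_in == 0:
            yield url
```

Feed it the open dump file line by line and write the results one URL per
line into a text file; a larger keep_one_in gives a smaller seed list, which
is handy for a first test crawl.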
----- Original Message ----
From: Shrinivas Patwardhan <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, January 13, 2007 1:30:47 AM
Subject: alternative for dmoz rdf ?
>
> Hello,
> I am using the DMOZ RDF file to inject the DB. Are there any other files
> (lists of URLs) available on the web?
> My question would be: does DMOZ cover the entire web? I don't think so.
> Then how do I get my crawler to crawl the entire web?
>
>
> --
> Shrinivas Patwardhan
>
>