There is a way, but you would need to check to make sure you're not breaking any 
agreements.
 
Domain registrars that are directly ICANN-accredited can usually request the 
complete zone file from the registry operator for the specific top-level 
domains they handle (e.g. .com). You can then parse it to extract all the 
domains, and inject them into the Nutch DB.
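
As a rough illustration of the parsing step, here is a minimal Python sketch. It assumes the zone file is in the usual master-file text layout (owner, optional numeric TTL, optional class, record type, data) and that every delegated domain has at least one NS record. The function names (record_type, domains_from_zone, seed_urls) are made up for this example, and a real registry dump may need more careful handling.

```python
def record_type(fields):
    """Return the record type, skipping an optional numeric TTL and class."""
    for field in fields:
        upper = field.upper()
        if upper.isdigit() or upper in ("IN", "CH", "HS"):
            continue
        return upper
    return None


def domains_from_zone(lines, tld="com"):
    """Collect the unique delegated domain names (owners of NS records)."""
    tld = tld.lower().strip(".")
    domains = set()
    owner = None
    for raw in lines:
        text = raw.split(";", 1)[0].rstrip()  # drop trailing comments
        if not text.strip() or text.lstrip().startswith("$"):
            continue  # blank line or a directive such as $ORIGIN / $TTL
        fields = text.split()
        if not raw[0].isspace():
            # A line that starts with whitespace reuses the previous owner.
            owner = fields[0]
            fields = fields[1:]
        if owner is None or record_type(fields) != "NS":
            continue
        name = owner.lower()
        if name == "@":
            continue  # the zone apex itself, not a registered domain
        # Absolute names end with a dot; relative names get the TLD appended.
        name = name[:-1] if name.endswith(".") else name + "." + tld
        if name != tld and name.endswith("." + tld):
            domains.add(name)
    return sorted(domains)


def seed_urls(domains):
    """Turn domain names into one seed URL per domain."""
    return ["http://%s/" % domain for domain in domains]


sample = [
    "; sample data, not a real zone",
    "$ORIGIN COM.",
    "EXAMPLE NS NS1.EXAMPLE.COM.",
    "EXAMPLE NS NS2.EXAMPLE.COM.",
    "foo.com. 172800 in ns a.iana-servers.net.",
    "NS1.EXAMPLE A 192.0.2.1",
]
for url in seed_urls(domains_from_zone(sample)):
    print(url)
```

If you write those URLs one per line to a text file in a directory (say urls/), the usual inject step (bin/nutch inject with your crawl DB and that directory) should pick them up. Check the exact arguments against your Nutch version.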
 
It's usually offered for research and testing purposes, so I'm not sure 
using it for a commercial purpose would be acceptable. Plus, you would need 
to become an accredited registrar, and that's not cheap.


----- Original Message ----
From: Iain <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, January 13, 2007 11:05:38 AM
Subject: RE: alternative for dmoz rdf ?


Is there any way (free or commercial) to get the domain names within a domain
(e.g. *.com)? More specifically, to get all the allocated URLs on the
Internet?


Iain
-----Original Message-----
From: Sean Dean [mailto:[EMAIL PROTECTED] 
Sent: 13 January 2007 07:23
To: [email protected]
Subject: Re: alternative for dmoz rdf ?

The DMOZ RDF dump is the best free source of URLs on the Internet, within one
file.

It's just meant as a starting point to feed your Nutch DB, and after you
fetch all (or some) of that list you will grow your collection of links in
the DB. In theory, starting with that dump you should be able to fetch any
linked web page on the Internet. Just don't expect to finish that anytime
soon, like within your lifetime.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
