Hi All,

I have some beginner Nutch questions that I am hoping to get some insight on.

I want to start at Dmoz.org and follow links for entertainment (concerts, art gallery events, etc.), examining each link to decide whether I should collect data about it and from the page it points to.

My questions:

1. Can Nutch start at a given URL and examine every link (based on my criteria)? (Obviously I can write the Case, If/Else, or While logic for this myself.)

2. If I find a link containing keywords of interest, can I follow that link and get information from the page?

3. How do I get the information about the link of interest, and the content of interest on its page, into a MySQL database? (I know ColdFusion, MySQL, and PHP.) I think what I am asking is: how do I get data from the crawler back into my database?

4. I know Nutch is written in Java, which is fine, so I assume I will need Tomcat running, etc. Are there other Java app servers available for OS X as well?

5. Does anyone have deployment instructions for OS X?

Am I making any sense?

-Jason
