Hi Sandeep, would you be interested in joining my open source project?
https://github.com/tribbloid/spookystuff

IMHO Spark is indeed not for general-purpose crawling, where the distributed workload is highly homogeneous. But it is good enough for directional scraping, which involves heterogeneous input and deep graph following & extraction.

Please drop me a line if you have a use case, and I'll try to integrate it as a feature.

Yours,
Peng
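P.S. For the curious, here is a minimal sketch of what directional scraping could look like on plain Spark. This is not SpookyStuff's actual API; fetch and extractLinks below are hypothetical stand-ins for a real HTTP client and HTML parser, and the seed URLs are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object DirectionalScrapingSketch {
  // Hypothetical helpers -- a real job would use an HTTP client and an HTML parser.
  def fetch(url: String): String = scala.io.Source.fromURL(url).mkString
  def extractLinks(html: String): Seq[String] =
    """href="(http[^"]+)""".r.findAllMatchIn(html).map(_.group(1)).toSeq

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("directional-scraping"))

    // Heterogeneous input: each seed can lead to a very different page graph.
    var frontier = sc.parallelize(Seq("http://example.com/a", "http://example.com/b"))

    // Deep graph following: expand the frontier a fixed number of hops,
    // so the job stays a bounded, directed traversal.
    for (_ <- 1 to 2)
      frontier = frontier.flatMap(url => extractLinks(fetch(url))).distinct()

    // Extraction: pull (url, title) pairs from the final frontier.
    frontier.map { url =>
      val title = """<title>([^<]*)</title>""".r
        .findFirstMatchIn(fetch(url)).map(_.group(1)).getOrElse("")
      (url, title)
    }.collect().foreach(println)

    sc.stop()
  }
}

The point of the sketch is the shape of the job: a bounded traversal from heterogeneous seeds with extraction at the end, rather than the open-ended, homogeneous workload of a general-purpose crawler.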