Hi Sandeep, would you be interested in joining my open source project?
https://github.com/tribbloid/spookystuff

IMHO Spark is indeed not for general-purpose crawling, where the distributed workload is highly homogeneous. But it is good enough for directional scraping, which involves heterogeneous input and deep graph following & extraction.

Please drop me a line if you have a use case, and I'll try to integrate it as a feature.

Yours,
Peng
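P.S. For the curious, here is a minimal sketch of what directional scraping could look like on plain Spark. This is not SpookyStuff's actual API; fetch and extractLinks below are hypothetical stand-ins for a real HTTP client and HTML parser, and the seed URLs are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object DirectionalScrapingSketch {
  // Hypothetical helpers -- a real job would use an HTTP client and an HTML parser.
  def fetch(url: String): String = scala.io.Source.fromURL(url).mkString
  def extractLinks(html: String): Seq[String] =
    """href="(http[^"]+)""".r.findAllMatchIn(html).map(_.group(1)).toSeq

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("directional-scraping"))

    // Heterogeneous input: each seed can lead to a very different page graph.
    var frontier = sc.parallelize(Seq("http://example.com/a", "http://example.com/b"))

    // Deep graph following: expand the frontier a fixed number of hops,
    // so the job stays a bounded, directed traversal.
    for (_ <- 1 to 2)
      frontier = frontier.flatMap(url => extractLinks(fetch(url))).distinct()

    // Extraction: pull (url, title) pairs from the final frontier.
    frontier.map { url =>
      val title = """<title>([^<]*)</title>""".r
        .findFirstMatchIn(fetch(url)).map(_.group(1)).getOrElse("")
      (url, title)
    }.collect().foreach(println)

    sc.stop()
  }
}

The point of the sketch is the shape of the job: a bounded traversal from heterogeneous seeds with extraction at the end, rather than the open-ended, homogeneous workload of a general-purpose crawler.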