Is Spark streaming suitable for our architecture?

2014-10-23 Thread Albert Vila
Hi, I'm evaluating Spark Streaming to see whether it fits our need to scale our current architecture. We are currently downloading and processing 6M documents per day from online and social media. We have a different workflow for each type of document, but some of the steps are keyword extraction, language
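A quick back-of-the-envelope sizing of the 6M documents/day figure can help frame the question. This is a minimal sketch that assumes documents arrive uniformly over 24 hours, which real crawl and social-media traffic will not do, so peak rates will be higher. The batch intervals below are hypothetical values chosen only to illustrate how Spark Streaming's micro-batch interval translates into per-batch volume.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(docs_per_day: int) -> float:
    """Average documents per second, assuming uniform arrival over the day."""
    return docs_per_day / SECONDS_PER_DAY


def docs_per_batch(docs_per_day: int, batch_interval_s: float) -> float:
    """Expected documents in one micro-batch of the given interval (seconds)."""
    return average_rate(docs_per_day) * batch_interval_s


if __name__ == "__main__":
    rate = average_rate(6_000_000)
    print(f"average rate: {rate:.1f} docs/s")
    # Hypothetical batch intervals, in seconds.
    for interval in (1, 10, 30):
        print(f"{interval:>3} s batch -> ~{round(docs_per_batch(6_000_000, interval))} docs")
```

At roughly 69 documents per second on average, even a 30 s batch interval carries only about two thousand documents, so the per-document processing cost (keyword extraction, language detection, and so on) is likely to matter more than raw ingest volume.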

Re: Is Spark streaming suitable for our architecture?

2014-10-23 Thread Jayant Shekhar
Hi Albert, I have a couple of questions:
- You mentioned near real-time. What exactly is your SLA for processing each document?
- Which crawler are you using, and are you looking to bring Hadoop into your overall workflow? You might want to read up on how network traffic is

Re: Is Spark streaming suitable for our architecture?

2014-10-23 Thread Albert Vila
Hi Jayant, On 23 October 2014 11:14, Jayant Shekhar jay...@cloudera.com wrote: Hi Albert, Have a couple of questions: - You mentioned near real-time. What exactly is your SLA for processing each document? The lower the better :). Right now it's between 30 s and 5 min, but I would like to