Hi
I'm evaluating Spark Streaming to see whether it is a good fit for scaling our
current architecture.
We are currently downloading and processing 6M documents per day from
online and social media. We have a different workflow for each type of
document, but some of the steps are keyword extraction, language
Hi Albert,
Have a couple of questions:
- You mentioned near real-time. What exactly is your SLA for processing
each document?
- Which crawler are you using, and are you looking to bring Hadoop
into your overall workflow? You might want to read up on how network
traffic is
Hi Jayant,
On 23 October 2014 11:14, Jayant Shekhar jay...@cloudera.com wrote:
Hi Albert,
Have a couple of questions:
- You mentioned near real-time. What exactly is your SLA for
processing each document?
The lower the better :). Right now it's between 30 s and 5 min, but I would like
to