We are building a real-time stream processing system with Spark Streaming
that applies a large number (millions) of analytic models to RDDs across
many different types of streams. Since we do not know which Spark node
will process a specific RDD, we need to make these models available on
each node.
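One common Spark approach for making read-only data available on every worker is a broadcast variable (`sc.broadcast(models)` on the driver, `models_bc.value` on the workers). Below is a minimal plain-Python sketch of that pattern, so it stays self-contained without a Spark installation; the model names and scoring functions are hypothetical stand-ins.

```python
# Sketch of the broadcast-variable pattern: models are built once on the
# driver and shipped read-only to every worker. In real Spark this would
# be models_bc = sc.broadcast(models), and workers would read
# models_bc.value; here plain Python stands in for the cluster machinery.

def load_models():
    # Hypothetical stand-in for the (millions of) analytic models:
    # model id -> scoring function.
    return {
        "anomaly": lambda x: x > 10,
        "double": lambda x: x * 2,
    }

def score_partition(records, models):
    # Runs on a worker: every record in the partition is scored against
    # the shared, read-only model set.
    return [{mid: m(r) for mid, m in models.items()} for r in records]

models = load_models()   # built once on the driver
partition = [5, 20]      # records of one RDD partition
results = score_partition(partition, models)
```

With a real broadcast variable, each executor receives the model set once per JVM rather than once per task, which matters at this scale; models too large to broadcast would instead need an external store (e.g. a shared key-value service) that workers query.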
Hi,
I am trying to get a sense of the number of streams we can process in
parallel on a Spark Streaming cluster (Hadoop YARN).
Is there an existing benchmark for this?
We need a large number of streams (original + transformed) to be processed
in parallel.
The number is approximately 30.