We are building a real-time stream processing system with Spark Streaming that applies a large number (millions) of analytic models to RDDs across many different types of streams. Since we do not know in advance which Spark node will process a given RDD, the models must be available to every Spark compute node. We are planning to use Redis as an in-memory cache alongside the Spark cluster to serve these models to the compute nodes. Is this the right approach? We cannot cache all of the models locally on every Spark compute node.
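One common pattern for this (a sketch, not a tested design): each executor keeps a small LRU cache of hot models and pulls misses from the shared Redis store on demand, so no node ever holds all millions of models. The snippet below illustrates the caching logic with a plain dict standing in for Redis; the names `MODEL_STORE` and `LRUModelCache` are hypothetical, and in a real job the cache would live inside `mapPartitions`/`foreachPartition` so one cache instance and one Redis connection are reused per partition.

```python
from collections import OrderedDict

# Hypothetical stand-in for Redis: a shared key/value store of models keyed
# by model id. In a real deployment this would be a Redis client fetching
# and deserializing the model bytes.
MODEL_STORE = {f"model:{i}": (lambda x, i=i: x * i) for i in range(1000)}

class LRUModelCache:
    """Per-executor cache: keep only the hottest N models in local memory,
    fetching misses from the shared store (Redis in the proposed design)."""
    def __init__(self, fetch, capacity=100):
        self.fetch = fetch            # function: model_id -> model
        self.capacity = capacity
        self.cache = OrderedDict()    # insertion-ordered, used as an LRU

    def get(self, model_id):
        if model_id in self.cache:
            self.cache.move_to_end(model_id)   # hit: mark recently used
            return self.cache[model_id]
        model = self.fetch(model_id)           # miss: pull from shared store
        self.cache[model_id] = model
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least-recently-used
        return model

# Inside a Spark job this would be created once per partition, e.g. in
# rdd.mapPartitions, so records in the partition share the cache.
cache = LRUModelCache(fetch=MODEL_STORE.__getitem__, capacity=100)
score = cache.get("model:42")(3)   # fetch model 42, apply it to a record
```

The trade-off is an extra network round trip to Redis on each cache miss, so the approach works best when model access across a stream is skewed enough that the per-node LRU absorbs most lookups.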
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-with-Redis-Working-with-large-number-of-model-objects-at-spark-compute-nodes-tp7663.html Sent from the Apache Spark User List mailing list archive at Nabble.com.