Recommended way to run spark streaming in production in EMR
All, We have a use case with two Spark Streaming jobs running on the same EMR cluster. I am thinking of allowing multiple streaming contexts and running them as two separate spark-submit invocations with wait-for-app-completion set to false. With this approach, failure detection and monitoring seem opaque, and it doesn't look like a sound option for production. Is there a recommended strategy for running this in production on EMR with an appropriate failure detection and monitoring setup? -- Thanks, Pandeeswaran
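One way to keep the two jobs independently tracked is to submit each as its own EMR step rather than as bare spark-submit processes, so each job gets its own step state and logs. A minimal sketch, assuming an EMR release that supports the Spark step type; the cluster id, class names, and S3 jar paths below are placeholders:

```shell
# Hypothetical example: submit two independent streaming jobs as separate
# EMR steps. Each step gets its own status and log directory, which makes
# per-job failure detection visible in the EMR console/API.
# ActionOnFailure=CONTINUE keeps one job's failure from cancelling the other.
aws emr add-steps --cluster-id j-XXXXXXXX --steps \
  'Name=stream-job-1,Type=Spark,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,com.example.StreamJob1,s3://my-bucket/jobs/stream-job-1.jar]' \
  'Name=stream-job-2,Type=Spark,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,com.example.StreamJob2,s3://my-bucket/jobs/stream-job-2.jar]'
```

Polling each step's status (e.g. with `aws emr describe-step`) then gives a per-job failure signal that a single detached spark-submit does not.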
Re: Spark Streaming in Production
Run the Spark cluster managed by Apache Mesos. Mesos can run in high-availability mode, in which multiple Mesos masters run simultaneously. - Software Developer SigmoidAnalytics, Bangalore -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-in-Production-tp20644p20651.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark Streaming in Production
Thanks for the reply. I might be misunderstanding something basic. As far as I can tell, the cluster manager (e.g. Mesos) manages the master and worker nodes, but not the drivers or receivers; those are external to the Spark cluster: http://spark.apache.org/docs/latest/cluster-overview.html I know that the spark-submit script has a --deploy-mode cluster option. Does this mean that the receiver will be managed on the cluster? Thanks
Re: Spark Streaming in Production
IIUC, receivers run on workers, colocated with other tasks. The driver, on the other hand, can run either on the submitting machine (client mode) or on one of the workers (cluster mode). — FG On Fri, Dec 12, 2014 at 4:49 PM, twizansk twiza...@gmail.com wrote: Does this mean that the receiver will be managed on the cluster?
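The distinction above can be made concrete with the submission flags. A sketch, with placeholder master URL, class, and jar: in cluster deploy mode the driver itself is launched inside the cluster, and on the standalone manager the --supervise flag additionally restarts the driver if it exits abnormally.

```shell
# Sketch: run the driver inside the cluster so it is not tied to the
# machine that ran spark-submit. --supervise (standalone cluster mode)
# restarts a driver that exits with a non-zero code.
# Master URL, class name, and jar path are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.StreamingApp \
  hdfs:///jobs/streaming-app.jar
```

Since receivers are started by the streaming job as long-running tasks on the workers, putting the driver under cluster management in this way covers both halves of the question.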
Spark Streaming in Production
Hi, I'm looking for resources and examples for deploying Spark Streaming in production. Specifically, I would like to know how high availability and fault tolerance of receivers are typically achieved. The workers are managed by the Spark framework and are therefore fault tolerant out of the box, but it seems like receiver deployment and management are up to me. Is that correct? Thanks
Re: Spark Streaming in Production
Spark Streaming takes care of restarting receivers if they fail. Regarding the fault-tolerance properties and deployment options, we made some improvements in the upcoming Spark 1.2. Here is a staged version of the Spark Streaming programming guide that you can read for the up-to-date explanation of streaming fault-tolerance semantics. http://people.apache.org/~tdas/spark-1.2-temp/ On Thu, Dec 11, 2014 at 4:03 PM, twizansk twiza...@gmail.com wrote: I would like to know how high availability and fault tolerance of receivers is typically achieved.
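The driver-side half of this story is checkpoint-based recovery, which the programming guide describes: the application writes its streaming state to a reliable filesystem, and on restart rebuilds the context from the checkpoint instead of creating a fresh one. A minimal sketch, assuming a reachable HDFS checkpoint path and a socket source; the paths, host names, and DStream logic are placeholders:

```scala
// Sketch of driver fault tolerance via checkpointing: on a clean start,
// createContext() builds a new StreamingContext; after a driver restart,
// StreamingContext.getOrCreate recovers the context (and pending state)
// from the checkpoint directory instead.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableStream {
  val checkpointDir = "hdfs:///checkpoints/recoverable-stream"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("RecoverableStream")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)                         // enable checkpointing
    val lines = ssc.socketTextStream("stream-source-host", 9999)
    lines.count().print()                                 // placeholder computation
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint if one exists; otherwise build anew.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Combined with a supervisor (cluster deploy mode with driver restart), this is what lets a streaming job survive a driver failure without losing its windowed or updateStateByKey state.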