Recommended way to run spark streaming in production in EMR

2016-10-11 Thread pandees waran
All,

We have a use case in which two Spark Streaming jobs run in the same EMR cluster.

I am thinking of running them as two separate spark-submit invocations, each with
its own streaming context, and with wait-for-app-completion set to false.
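
Roughly, I imagine the two submissions would look something like this (class names
and jar paths are just placeholders; spark.yarn.submit.waitAppCompletion=false is
the Spark-on-YARN property that makes spark-submit return without waiting for the
application to finish):

  spark-submit --master yarn --deploy-mode cluster \
    --conf spark.yarn.submit.waitAppCompletion=false \
    --class com.example.StreamingJob1 streaming-job-1.jar

  spark-submit --master yarn --deploy-mode cluster \
    --conf spark.yarn.submit.waitAppCompletion=false \
    --class com.example.StreamingJob2 streaming-job-2.jar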

With this approach, failure detection and monitoring seem opaque, and it doesn't
look like a suitable option for production.

Is there any recommended strategy to execute this in production in EMR with
appropriate failure detection and monitoring setup?
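
For the monitoring part, one thing I am considering inside each job is a
StreamingListener that reports receiver errors and batch delays, so failures at
least show up in the driver logs or can be forwarded to an external alerting
system. A rough sketch (the app name and plain println logging are illustrative):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.scheduler._

  object MonitoredStreamingJob {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("streaming-job-1")
      val ssc = new StreamingContext(conf, Seconds(10))

      // Surface receiver failures and batch processing delays from inside the app.
      ssc.addStreamingListener(new StreamingListener {
        override def onReceiverError(e: StreamingListenerReceiverError): Unit =
          println(s"Receiver error: ${e.receiverInfo}")
        override def onBatchCompleted(b: StreamingListenerBatchCompleted): Unit =
          println(s"Batch completed, processing delay: ${b.batchInfo.processingDelay}")
      })

      // ... define input DStreams and output operations here ...

      ssc.start()
      ssc.awaitTermination()
    }
  }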

-- 
Thanks,
Pandeeswaran


Re: Spark Streaming in Production

2014-12-12 Thread rahulkumar-aws
Run the Spark cluster managed by Apache Mesos. Mesos can run in high-availability
mode, in which multiple Mesos masters run simultaneously.
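
In that setup the masters coordinate through ZooKeeper, and the application just
points at the ZooKeeper quorum, so a master failover does not require resubmitting
the job. A minimal sketch, assuming a three-node ZooKeeper ensemble (hostnames are
placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object StreamingOnMesos {
    def main(args: Array[String]): Unit = {
      // mesos://zk://... tells Spark to discover the current leading Mesos master
      // via ZooKeeper instead of relying on a single, fixed master address.
      val conf = new SparkConf()
        .setAppName("streaming-on-mesos")
        .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
      val ssc = new StreamingContext(conf, Seconds(10))
      // ... define DStreams, then ssc.start() and ssc.awaitTermination() ...
    }
  }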



-
Software Developer
SigmoidAnalytics, Bangalore




Re: Spark Streaming in Production

2014-12-12 Thread twizansk
Thanks for the reply.  I might be misunderstanding something basic. As far
as I can tell, the cluster manager (e.g. Mesos) manages the master and
worker nodes but not the drivers or receivers; those are external to the
Spark cluster:

http://spark.apache.org/docs/latest/cluster-overview.html


I know that the spark-submit script has a --deploy-mode cluster option. 
Does this mean that the receiver will be managed on the cluster?

Thanks






Re: Spark Streaming in Production

2014-12-12 Thread francois . garillot
IIUC, Receivers run on workers, colocated with other tasks.

The driver, on the other hand, can run either on the submitting machine (client
mode) or on a node inside the cluster (cluster mode).
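
For example, the same application can be submitted either way (paths and class
names are placeholders; receivers run on executors in both cases):

  # client mode: the driver runs on the machine that invoked spark-submit
  spark-submit --deploy-mode client --class com.example.App app.jar

  # cluster mode: the driver runs on a node inside the cluster
  spark-submit --deploy-mode cluster --class com.example.App app.jar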



—
FG

On Fri, Dec 12, 2014 at 4:49 PM, twizansk twiza...@gmail.com wrote:

 Thanks for the reply.  I might be misunderstanding something basic. As far
 as I can tell, the cluster manager (e.g. Mesos) manages the master and
 worker nodes but not the drivers or receivers, those are external to the
 spark cluster:
 http://spark.apache.org/docs/latest/cluster-overview.html
 I know that the spark-submit script has a --deploy-mode cluster option. 
 Does this mean that the receiver will be managed on the cluster?
 Thanks

Spark Streaming in Production

2014-12-11 Thread twizansk
Hi,

I'm looking for resources and examples for the deployment of Spark Streaming
in production.  Specifically, I would like to know how high availability and
fault tolerance of receivers is typically achieved.

The workers are managed by the Spark framework and are therefore fault-tolerant
out of the box, but it seems like receiver deployment and management is up to me.
Is that correct?

Thanks






Re: Spark Streaming in Production

2014-12-11 Thread Tathagata Das
Spark Streaming takes care of restarting receivers if they fail.
Regarding the fault-tolerance properties and deployment options, we
have made some improvements in the upcoming Spark 1.2. Here is a staged
version of the Spark Streaming programming guide that you can read for
the up-to-date explanation of streaming fault-tolerance semantics.

http://people.apache.org/~tdas/spark-1.2-temp/
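
For the driver side specifically, the usual pattern is to create the context
through StreamingContext.getOrCreate with a checkpoint directory, and (from 1.2)
to enable the receiver write-ahead log. A rough sketch, with a placeholder
checkpoint path:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object FaultTolerantStreamingJob {
    val checkpointDir = "hdfs:///checkpoints/my-streaming-app"  // placeholder path

    // All DStream definitions must live inside this function so they can be
    // recovered from the checkpoint after a driver restart.
    def createContext(): StreamingContext = {
      val conf = new SparkConf()
        .setAppName("fault-tolerant-streaming")
        // Write-ahead log for received data (new in Spark 1.2)
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... define input DStreams and output operations here ...
      ssc
    }

    def main(args: Array[String]): Unit = {
      // Recover the context from the checkpoint if one exists, otherwise create it.
      val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
      ssc.start()
      ssc.awaitTermination()
    }
  }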

On Thu, Dec 11, 2014 at 4:03 PM, twizansk twiza...@gmail.com wrote:
 Hi,

 I'm looking for resources and examples for the deployment of spark streaming
 in production.  Specifically, I would like to know how high availability and
 fault tolerance of receivers is typically achieved.

 The workers are managed by the spark framework and are therefore fault
 tolerant out of the box but it seems like the receiver deployment and
 management is up to me.  Is that correct?

 Thanks




