Hi All,

Apologies if this question has been asked before. I'd like to know if there
are any downsides to running Spark on YARN with the --master yarn-cluster
option versus using a separate Spark standalone cluster to execute jobs.

We're looking at either installing an HDFS/Hadoop cluster with Ambari and
submitting jobs to that cluster through YARN, or having an Ambari-managed
Hadoop cluster plus a separate standalone Spark cluster that runs the Spark
jobs on data stored in HDFS.

With YARN, will we still get all the benefits of Spark?

Will it be possible to process streaming data when running on YARN?

Many thanks in advance for any responses.

Cheers!
