Hi All, Apologies if this question has been asked before. I'd like to know whether there are any downsides to running Spark on YARN with the --master yarn-cluster option, compared with running jobs on a separate Spark standalone cluster.
We're looking at two options: installing an HDFS/Hadoop cluster with Ambari and submitting Spark jobs to it through YARN, or having the Ambari-managed cluster plus a separate standalone Spark cluster that runs the Spark jobs on data in HDFS. With YARN, will we still get all the benefits of Spark? Will it be possible to process streaming data? Many thanks in advance for any responses. Cheers!