Thank you, Hemant and Andrew, I got it working.

On Mon, Sep 21, 2015 at 11:48 PM, Andrew Or <and...@databricks.com> wrote:
> Hi Joshua,
>
> What cluster manager are you using, standalone or YARN? (Note that
> standalone here does not mean local mode.)
>
> If standalone, you need to do `setMaster("spark://[CLUSTER_URL]:7077")`,
> where CLUSTER_URL is the machine that started the standalone Master. If
> YARN, you need to do `setMaster("yarn")`, assuming that all the Hadoop
> configuration files such as core-site.xml are already set up properly.
>
> -Andrew
>
> 2015-09-21 8:53 GMT-07:00 Hemant Bhanawat <hemant9...@gmail.com>:
>
>> When you specify the master as local[2], it starts the Spark components
>> in a single JVM. You need to specify the master correctly.
>>
>>> I have a default AWS EMR cluster (1 master, 1 slave) with Spark. When I
>>> run a Spark process, it works fine -- but only on the master, as if it
>>> were standalone.
>>>
>>> The web UI and logging code show only 1 executor, the localhost.
>>>
>>> How can I diagnose this?
>>>
>>> (I create SparkConf, in Python, with setMaster('local[2]').)
>>>
>>> (Strangely, though I don't think this causes the problem, there is
>>> almost nothing Spark-related on the slave machine: /usr/lib/spark has
>>> a few jars, but that's it: datanucleus-api-jdo.jar, datanucleus-core.jar,
>>> datanucleus-rdbms.jar, spark-yarn-shuffle.jar. But this is an AWS EMR
>>> cluster as created by create-cluster, so I would assume that the slave
>>> and master are configured OK out of the box.)
>>>
>>> Joshua
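For reference, here is a minimal PySpark sketch of the fix Andrew describes:
replacing local[2] with a real master URL. The host name "spark-master-host"
and the app name are placeholders, not values from this thread, and the YARN
variant assumes the Hadoop config files (core-site.xml etc.) are in place.

from pyspark import SparkConf, SparkContext

# Placeholder host: replace "spark-master-host" with the machine that
# started the standalone Master (the spark:// URL shown in its web UI).
conf = SparkConf() \
    .setAppName("cluster-check") \
    .setMaster("spark://spark-master-host:7077")

# On YARN (e.g. an EMR cluster), use conf.setMaster("yarn") instead, per
# Andrew's note, assuming the Hadoop configuration files are set up.

sc = SparkContext(conf=conf)

# A trivial job; with a correct master URL it runs on the cluster's
# executors rather than in a single local JVM, so more than one executor
# should appear in the web UI.
print(sc.parallelize(range(1000)).count())
sc.stop()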