EC2 node submit jobs to separate Spark Cluster

2013-11-18 Thread Matt Cheah
Hi,

I'm working with an infrastructure that already has its own web server set up 
on EC2. I would like to set up a separate spark cluster on EC2 with the spark-ec2 
scripts and have the web server submit jobs to this spark cluster.

Is it possible to do this? I'm getting some errors when running the spark shell 
from the web server: "Initial job has not accepted any resources; check your 
cluster UI to ensure that workers are registered and have sufficient memory." 
I have heard that it's not possible for an arbitrary local computer to connect 
to the spark cluster, but I was wondering if other EC2 nodes could have their 
firewalls configured to allow this.
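For the firewall side of this, one possible sketch is to open the standalone Master's ports between the two security groups. Everything here is a placeholder and an assumption: the group IDs, the AWS CLI invocation style, and the default ports (7077 for the Master's service port, 8080 for its web UI) as used by Spark's standalone mode in this era.

```shell
# HYPOTHETICAL sketch, not a verified recipe: allow the web server's
# security group to reach the Spark Master. Group IDs are placeholders,
# and exact flag spellings may vary by AWS CLI version.
aws ec2 authorize-security-group-ingress \
  --group-id sg-spark-cluster \
  --protocol tcp --port 7077 \
  --source-group sg-web-server      # Master's cluster service port
aws ec2 authorize-security-group-ingress \
  --group-id sg-spark-cluster \
  --protocol tcp --port 8080 \
  --source-group sg-web-server      # Master's web UI
```

Note that the Workers also connect back to the driver (the web server, in this setup), so traffic must be allowed in that direction as well; pinning the driver's port (e.g. via the spark.driver.port property, if available in your Spark version) makes that rule possible to write.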

We don't want to deploy the web server on the master node of the spark cluster.

Thanks,

-Matt Cheah




Re: EC2 node submit jobs to separate Spark Cluster

2013-11-18 Thread Aaron Davidson
The main issue with running a spark-shell locally is that it orchestrates
the actual computation, so you want it to be close to the actual Worker
nodes for latency reasons. Running a spark-shell on EC2 in the same region
as the Spark cluster avoids this problem.

The error you're seeing seems to indicate a different issue. Check the
Master web UI (accessible on port 8080 at the master's IP address) to make
sure that Workers are successfully registered and they have the expected
amount of memory available to Spark. You can also check to see how much
memory your spark-shell is trying to get per executor. A couple common
problems are (1) an abandoned spark-shell is holding onto all of your
cluster's resources or (2) you've manually configured your spark-shell to
try to get more memory than your Workers have available. Both of these
should be visible in the web UI.
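One hedged example of problem (2): if each Worker advertises, say, 2 GB in the Master UI but the shell asks for more per executor, no Worker can satisfy the request and the job sits unscheduled. In Spark of this era the per-executor request could be lowered with the SPARK_MEM environment variable (later superseded by the spark.executor.memory property); the hostname below is a placeholder.

```shell
# Hypothetical sketch: cap the per-executor memory the shell requests
# so it fits within what each Worker shows in the Master web UI.
SPARK_MEM=2g MASTER=spark://ec2-master-host:7077 ./spark-shell
```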
