Re: Network requirements between Driver, Master, and Slave

2014-09-12 Thread Akhil Das
Hi Jim,

This approach will not work right out of the box; there are a few things to
understand. The driver program and the master communicate with each other, so
you need to open up certain ports on your public IP (read about port
forwarding at http://portforward.com/). You also need to set
*spark.driver.host* and *spark.driver.port* (random by default) in the
driver's Spark configuration, pointing to your public IP and the port you
opened up.
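
A minimal sketch of what that might look like in the driver program (the
master address, public IP, and port below are placeholders, not values from
this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical values: substitute your EC2 master's address, your own
    // public IP, and a port you have actually forwarded to the driver machine.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .setMaster("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077")
      .set("spark.driver.host", "203.0.113.10")  // your public IP (placeholder)
      .set("spark.driver.port", "51000")         // the forwarded port (placeholder)

    val sc = new SparkContext(conf)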


Thanks
Best Regards

On Thu, Sep 11, 2014 at 11:52 PM, Jim Carroll jimfcarr...@gmail.com wrote:

 Hello all,

 I'm trying to run a Driver on my local network with a deployment on EC2 and
 it's not working. I was wondering if either the master or slave instances
 (in standalone) connect back to the driver program.

 I outlined the details of my observations in a previous post but here is
 what I'm seeing:

 I have v1.1.0 installed (the new tag) on EC2 using the spark-ec2 script.
 I have the same version of the code built locally.
 I edited the master security group to allow inbound access from anywhere to
 7077 and 8080.
 I see a connection take place.
 I see the workers fail with a timeout when any job is run.
 The master eventually removes the driver's job.

 I suppose this makes sense if there's a requirement for either the worker
 or the master to be on the same network as the driver. Is that the case?

 Thanks
 Jim








Re: Network requirements between Driver, Master, and Slave

2014-09-12 Thread Jim Carroll
Hi Akhil,

Thanks! I guess, in short, that means the master (or the slaves?) connects back
to the driver. This seems like a really odd way to work, given that the driver
already needs to connect to the master on port 7077. I would have thought that
if the driver could initiate a connection to the master, that would be all
that's required.

Can you describe what it is about the architecture that requires the master
to connect back to the driver even when the driver initiates a connection to
the master? Just curious.

Thanks anyway.
Jim
 






Re: Network requirements between Driver, Master, and Slave

2014-09-12 Thread Mayur Rustagi
In standalone mode the driver needs a consistent connection to the master, since a
whole bunch of client-side work happens on the driver. Calls like parallelize send
data from the driver out to the cluster, and collect sends data from the cluster
back to the driver.

If you are looking to avoid that connection, you can look into the embedded driver
model in YARN (yarn-cluster mode), where the driver also runs inside the cluster, so
reliability and connectivity are a given.
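
A rough illustration of that data movement (a placeholder app, assuming the
master URL is supplied via spark-submit or spark-defaults):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("collect-demo"))
    val rdd = sc.parallelize(1 to 100000)  // data shipped from the driver out to the cluster
    val doubled = rdd.map(_ * 2)           // computed on the executors
    val results = doubled.collect()        // results pulled back to the driver
    println(results.length)                // the driver must stay reachable for this round trip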
-- 
Regards,
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi

On Fri, Sep 12, 2014 at 6:46 PM, Jim Carroll jimfcarr...@gmail.com
wrote:

 Hi Akhil,
 Thanks! I guess, in short, that means the master (or the slaves?) connects back
 to the driver. This seems like a really odd way to work, given that the driver
 already needs to connect to the master on port 7077. I would have thought that
 if the driver could initiate a connection to the master, that would be all
 that's required.
 Can you describe what it is about the architecture that requires the master
 to connect back to the driver even when the driver initiates a connection to
 the master? Just curious.
 Thanks anyway.
 Jim
  

Network requirements between Driver, Master, and Slave

2014-09-11 Thread Jim Carroll
Hello all,

I'm trying to run a Driver on my local network with a deployment on EC2 and
it's not working. I was wondering if either the master or slave instances
(in standalone) connect back to the driver program.

I outlined the details of my observations in a previous post but here is
what I'm seeing:

I have v1.1.0 installed (the new tag) on EC2 using the spark-ec2 script.
I have the same version of the code built locally.
I edited the master security group to allow inbound access from anywhere to
7077 and 8080.
I see a connection take place.
I see the workers fail with a timeout when any job is run.
The master eventually removes the driver's job.

I suppose this makes sense if there's a requirement for either the worker
or the master to be on the same network as the driver. Is that the case?

Thanks
Jim



