Re: Network requirements between Driver, Master, and Slave
Hi Jim,

This will not work right out of the box; there are a few things to understand. The driver program and the master communicate with each other, so you need to open certain ports for your public IP (read about port forwarding: http://portforward.com/). On the cluster side you also need to set *spark.driver.host* and *spark.driver.port* (random by default) to point at your public IP and the port you opened.

Thanks
Best Regards

On Thu, Sep 11, 2014 at 11:52 PM, Jim Carroll jimfcarr...@gmail.com wrote:

> Hello all,
>
> I'm trying to run a driver on my local network against a deployment on
> EC2, and it's not working. I was wondering whether the master or slave
> instances (in standalone mode) connect back to the driver program. I
> outlined the details of my observations in a previous post, but here is
> what I'm seeing:
>
> I have v1.1.0 (the new tag) installed on EC2 using the spark-ec2 script,
> and the same version of the code built locally. I edited the master
> security group to allow inbound access from anywhere to ports 7077 and
> 8080. I see a connection take place, but the workers fail with a timeout
> whenever a job is run, and the master eventually removes the driver's job.
>
> I suppose this makes sense if there's a requirement for either the
> workers or the master to be on the same network as the driver. Is that
> the case?
>
> Thanks
> Jim

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Network-requirements-between-Driver-Master-and-Slave-tp13997.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
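For reference, the two settings Akhil mentions could be supplied in conf/spark-defaults.conf (or via --conf on spark-submit); the IP and port below are placeholders for your own public address and forwarded port:

```
# spark-defaults.conf — values are placeholders, substitute your own
spark.driver.host   203.0.113.10   # public IP the cluster nodes can reach
spark.driver.port   51000          # fixed port you have forwarded/opened
```

Pinning spark.driver.port to a fixed value matters here because the default is a random port chosen at startup, which cannot be forwarded ahead of time.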
Re: Network requirements between Driver, Master, and Slave
Hi Akhil,

Thanks! I guess, in short, that means the master (or the slaves?) connects back to the driver. This seems like a really odd way to work, given that the driver already needs to connect to the master on port 7077. I would have thought that if the driver could initiate a connection to the master, that would be all that's required.

Can you describe what it is about the architecture that requires the master to connect back to the driver, even when the driver initiates a connection to the master? Just curious.

Thanks anyway,
Jim

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Network-requirements-between-Driver-Master-and-Slave-tp13997p14086.html
Re: Network requirements between Driver, Master, and Slave
The driver needs a consistent connection to the master in standalone mode, as a whole bunch of client work happens on the driver: calls like parallelize send data from the driver to the master, and collect sends data from the master back to the driver. If you are looking to avoid that connection, you can look into the embedded driver model in YARN, where the driver also runs inside the cluster, so reliable connectivity is a given.

--
Regards,
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi

On Fri, Sep 12, 2014 at 6:46 PM, Jim Carroll jimfcarr...@gmail.com wrote:

> Hi Akhil,
>
> Thanks! I guess, in short, that means the master (or the slaves?)
> connects back to the driver. This seems like a really odd way to work,
> given that the driver already needs to connect to the master on port
> 7077. I would have thought that if the driver could initiate a
> connection to the master, that would be all that's required.
>
> Can you describe what it is about the architecture that requires the
> master to connect back to the driver, even when the driver initiates a
> connection to the master? Just curious.
>
> Thanks anyway,
> Jim
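The connect-back requirement being discussed can be sketched with plain TCP sockets (a toy illustration, not Spark's actual wire protocol; the port choice and message are made up): the driver must accept inbound connections from cluster nodes, and that inbound leg is what times out when the driver sits behind an unforwarded NAT.

```python
import socket
import threading

# Toy sketch (not Spark's real protocol): the "driver" LISTENS, and a
# "worker" initiates a connection back to it — mirroring why
# spark.driver.host / spark.driver.port must be reachable from the cluster.

driver = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
driver.bind(("127.0.0.1", 0))   # port 0: OS picks one, like the random spark.driver.port default
driver.listen(1)
host, port = driver.getsockname()

def worker():
    # In real life this runs on a cluster node; it dials the driver, not
    # the other way around. Behind NAT with no port forwarding, this is
    # the connection that fails with a timeout.
    with socket.create_connection((host, port)) as s:
        s.sendall(b"task result")

t = threading.Thread(target=worker)
t.start()
conn, _ = driver.accept()
data = conn.recv(1024)          # driver receives results, as with collect()
t.join()
conn.close()
driver.close()
print(data.decode())            # prints: task result
```

Opening 7077 on the master covers only the driver-to-master leg; the sketch above is the reverse leg that also has to work.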
Network requirements between Driver, Master, and Slave
Hello all,

I'm trying to run a driver on my local network against a deployment on EC2, and it's not working. I was wondering whether the master or slave instances (in standalone mode) connect back to the driver program. I outlined the details of my observations in a previous post, but here is what I'm seeing:

I have v1.1.0 (the new tag) installed on EC2 using the spark-ec2 script, and the same version of the code built locally. I edited the master security group to allow inbound access from anywhere to ports 7077 and 8080. I see a connection take place, but the workers fail with a timeout whenever a job is run, and the master eventually removes the driver's job.

I suppose this makes sense if there's a requirement for either the workers or the master to be on the same network as the driver. Is that the case?

Thanks
Jim