Re: Spark Driver behind NAT
From what I can tell, this isn't a firewall issue per se; it's how the Remoting service binds to an IP given the command-line parameters. If I have a VM (or an OpenStack or EC2 instance) running on a private network, where the IP address is 192.168.X.Y, I can't tell the workers to reach me on any other IP, because the Remoting service binds to the interface passed in those parameters. So if my public IP is a routable address, but the one the VM sees is the 192.168.X.Y address, it appears I can't do any kind of port forwarding from the external address to the internal one. Is that correct?

If I set the spark.driver.host and spark.driver.port properties at the command line, the driver actually tries to bind to that IP, rather than just telling the workers to reach back on it. Is there a way around this? Is there a way to tell the workers which IP address to use WITHOUT binding to it? Maybe allow the Remoting service to bind to the internal IP, but advertise a different one?
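For concreteness, here is a minimal sketch of the configuration in question, equivalent to passing --conf spark.driver.host=... and --conf spark.driver.port=... on the command line. The master URL, the floating IP 203.0.113.10, and port 51000 are placeholders, not our real values:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder addresses: 203.0.113.10 stands in for the routable floating IP,
    // spark://master.example.com:7077 for the cluster master.
    val conf = new SparkConf()
      .setMaster("spark://master.example.com:7077")
      .setAppName("driver-behind-nat")
      .set("spark.driver.host", "203.0.113.10") // address the workers are told to connect back to
      .set("spark.driver.port", "51000")        // pinned so a NAT rule could forward it
    // The driver attempts to *bind* a listening socket to spark.driver.host, so on
    // a host that only has 192.168.X.Y configured this fails at startup instead of
    // merely advertising the floating IP to the workers.
    val sc = new SparkContext(conf)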
Re: Spark Driver behind NAT
Found the issue in JIRA: https://issues.apache.org/jira/browse/SPARK-4389?jql=project%20%3D%20SPARK%20AND%20text%20~%20NAT
Re: Spark Driver behind NAT
Thanks for the link! However, from reviewing the thread, it appears you cannot have a NAT/firewall between the cluster and the spark-driver/shell; is that correct? When the shell starts up, it binds to the internal IP (e.g. 192.168.x.y), not the external floating IP, which is the address routable from the cluster. When I did set a static port for spark.driver.port and set spark.driver.host to the floating IP address, I got the same exception (Caused by: java.net.BindException: Cannot assign requested address: bind), because of the use of the InetAddress.getHostAddress method call.
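The same failure mode can be reproduced outside of Spark: binding a server socket to an address that NAT maps to the VM, but that is not configured on any local interface, throws the identical exception. A minimal sketch (203.0.113.10 and port 51000 are placeholders for our floating IP and driver port):

    import java.net.{InetAddress, InetSocketAddress, ServerSocket}

    // 203.0.113.10 stands in for the floating IP: NAT forwards it to the VM,
    // but it never appears on any of the VM's own network interfaces.
    val floatingIp = InetAddress.getByName("203.0.113.10")
    val sock = new ServerSocket()                        // created unbound
    sock.bind(new InetSocketAddress(floatingIp, 51000))  // throws java.net.BindException:
                                                         // Cannot assign requested address

Cheers,
Aaron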
Re: Spark Driver behind NAT
You can have a look at this discussion: http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html

Thanks
Best Regards
Spark Driver behind NAT
Hello there, I was wondering if there is a way to have the spark-shell (or pyspark) sit behind a NAT when talking to the cluster? Basically, we have OpenStack instances that run with internal IPs, and we assign floating IPs as needed. Since the workers make direct TCP connections back, the spark-shell is binding to the internal IP..not the floating. Our other use case is running Vagrant VMs on our local machines..but, we don't have those VMs' NICs setup in bridged mode..it too has an internal IP. I tried using the SPARK_LOCAL_IP, and the various --conf spark.driver.host parameters...but it still get's angry. Any thoughts/suggestions? Currently our work around is to VPNC connection from inside the vagrant VMs or Openstack instances...but, that doesn't seem like a long term plan. Thanks in advance! Cheers, Aaron
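As far as I can tell, when neither SPARK_LOCAL_IP nor spark.driver.host is set, the driver derives its address from the JVM's local host lookup, which on these VMs resolves to the internal 192.168.x.y address. A quick way to see what a JVM on the VM will pick, and which addresses are actually bindable (a rough sketch, not Spark's exact resolution logic; runnable in any Scala REPL):

    import java.net.{InetAddress, NetworkInterface}
    import scala.collection.JavaConverters._

    // What InetAddress.getLocalHost resolves to -- on a NATed VM this is
    // typically the internal 192.168.x.y address, never the floating IP.
    println(s"getLocalHost: ${InetAddress.getLocalHost.getHostAddress}")

    // Enumerate every address actually configured on the VM's interfaces;
    // the floating IP does not appear here, which is why binding to it fails.
    for {
      iface <- NetworkInterface.getNetworkInterfaces.asScala
      addr  <- iface.getInetAddresses.asScala
    } println(s"${iface.getName}: ${addr.getHostAddress}")

Thanks in advance!

Cheers,
Aaron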