Re: Spark Driver behind NAT

2015-01-06 Thread Aaron
From what I can tell, this isn't a firewall issue per se; it's how the
Remoting Service binds to an IP given the command-line parameters.  So, if I
have a VM (or OpenStack or EC2 instance) running on a private network, where
the IP address is 192.168.X.Y, I can't simply tell the Workers which IP to
reach me on, because the Remoting Service binds to whatever address is passed
in those parameters.

So, if my public IP is a routable address, but the one the VM sees is the
192.168.X.Y address, it appears I can't do any kind of port forwarding from
the external address to the internal one... is this correct?

If I set the spark.driver.host and spark.driver.port properties at the command
line, the driver actually tries to bind to that IP rather than just telling the
workers to reach back to it.  Is there a way around this?  Is there a way to
tell the workers which IP address to use WITHOUT binding to it?  Maybe allow
the Remoting Service to bind to the internal IP, but advertise a different one?
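
To make the ask concrete, the kind of split I'm imagining would look roughly
like the pyspark sketch below.  The master URL and addresses are placeholders,
and the spark.driver.bindAddress property is hypothetical -- today
spark.driver.host controls both what the driver binds to and what it
advertises:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master.internal:7077")         # placeholder master URL
            .set("spark.driver.host", "203.0.113.10")          # floating IP to ADVERTISE to workers
            .set("spark.driver.port", "51000")                 # fixed port so it can be forwarded through the NAT
            .set("spark.driver.bindAddress", "192.168.0.10"))  # hypothetical: internal IP to actually BIND
    sc = SparkContext(conf=conf)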





Re: Spark Driver behind NAT

2015-01-06 Thread Aaron
Found the issue in JIRA:

https://issues.apache.org/jira/browse/SPARK-4389?jql=project%20%3D%20SPARK%20AND%20text%20~%20NAT





Re: Spark Driver behind NAT

2015-01-05 Thread Aaron
Thanks for the link!  However, from reviewing that thread, it appears you
cannot have a NAT/firewall between the cluster and the Spark driver/shell...
is that correct?

When the shell starts up, it binds to the internal IP (e.g. 192.168.x.y), not
the external floating IP, which is the address routable from the cluster.

When I set a static port via spark.driver.port and set spark.driver.host to
the floating IP address, I get the same exception (Caused by:
java.net.BindException: Cannot assign requested address: bind), because of the
use of the InetAddress.getHostAddress method call.
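
For reference, this is roughly the shape of what I was running when I hit that
exception (master URL and addresses are placeholders):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master.internal:7077")   # placeholder master URL
            .set("spark.driver.port", "51000")           # static driver port
            .set("spark.driver.host", "203.0.113.10"))   # floating IP, not assigned to any local interface
    # SparkContext startup fails with java.net.BindException: Cannot assign requested address,
    # because the driver tries to bind spark.driver.host instead of just advertising it.
    sc = SparkContext(conf=conf)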


Cheers,
Aaron





Re: Spark Driver behind NAT

2015-01-05 Thread Akhil Das
You can have a look at this discussion:
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html

Thanks
Best Regards



Spark Driver behind NAT

2015-01-05 Thread Aaron
Hello there, I was wondering if there is a way to have the spark-shell (or
pyspark) sit behind a NAT when talking to the cluster?

Basically, we have OpenStack instances that run with internal IPs, and we
assign floating IPs as needed.  Since the workers make direct TCP connections
back to the driver, the spark-shell is binding to the internal IP, not the
floating one.  Our other use case is running Vagrant VMs on our local
machines; we don't have those VMs' NICs set up in bridged mode, so they too
only have internal IPs.

I tried using SPARK_LOCAL_IP and the various --conf spark.driver.host
parameters, but it still gets angry.
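
Concretely, the attempts look roughly like the following (addresses and master
URL are placeholders):

    # Attempt 1: export SPARK_LOCAL_IP=<floating IP> in the shell before launching pyspark.
    # Attempt 2: point spark.driver.host at the floating IP from the conf:
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("spark://master.internal:7077")   # placeholder master URL
            .set("spark.driver.host", "203.0.113.10"))   # floating IP (placeholder)
    sc = SparkContext(conf=conf)
    # In both cases the driver either fails to bind the floating IP or keeps
    # advertising the internal address to the workers.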

Any thoughts/suggestions?

Currently our workaround is a VPNC connection from inside the Vagrant VMs or
OpenStack instances, but that doesn't seem like a long-term plan.

Thanks in advance!

Cheers,
Aaron