Re: Using Spark with a SOCKS proxy

2015-03-18 Thread Akhil Das
Did you try SSH tunneling instead of a SOCKS proxy?
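
(Editorial note: the tunneling suggested here is presumably plain SSH local port
forwarding rather than a dynamic SOCKS proxy. A minimal sketch of that idea,
assuming the standalone master listens on its default port 7077; the forwarded
port, app name, and host placeholder are illustrative only:

    // Hypothetical local port forward, run in a shell before starting the driver:
    //   ssh -L 7077:localhost:7077 <master node public name>
    //
    // The driver then targets the forwarded local port instead of the private IP:
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("TunnelSketch")              // illustrative name
      .setMaster("spark://localhost:7077")     // goes through the SSH tunnel
    val sc = new SparkContext(conf)

Whether the standalone master accepts a connection addressed to localhost, and
whether the executors can then reach back to the driver, are exactly the kinds
of things that may still fail, so this only sketches the suggestion.)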

Thanks
Best Regards

On Wed, Mar 18, 2015 at 5:45 AM, Kelly, Jonathan jonat...@amazon.com
wrote:



Using Spark with a SOCKS proxy

2015-03-17 Thread Kelly, Jonathan
I'm trying to figure out how I might be able to use Spark with a SOCKS proxy.
That is, my dream is to be able to write code in my IDE and then run it without
much trouble on a remote cluster that is accessible only via a SOCKS proxy
between the local development machine and the master node of the cluster
(ignoring, for now, any dependencies that would need to be transferred--assume
it's a very simple app with no dependencies that aren't part of the Spark
classpath on the cluster).  This is possible with Hadoop by setting
hadoop.rpc.socket.factory.class.default to
org.apache.hadoop.net.SocksSocketFactory and hadoop.socks.server to
localhost:<port>, where <port> is a local port on which a SOCKS proxy has been
opened via ssh -D to the master node.  However, I can't seem to find anything
like this for Spark, and I see only a few mentions of it on the user list and
on Stack Overflow, with no real answers.  (See links below.)
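
(For reference, here is roughly what the working Hadoop-side setup amounts to.
Normally the two properties just go in core-site.xml; the same thing expressed
in code, assuming a SOCKS proxy opened locally on port 2600 as in the ssh -D
example below, and with a purely hypothetical namenode address, is:

    // Sketch of the Hadoop configuration described above, assuming a SOCKS
    // proxy has been opened with: ssh -D 2600 <master node public name>
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.rpc.socket.factory.class.default",
      "org.apache.hadoop.net.SocksSocketFactory")
    hadoopConf.set("hadoop.socks.server", "localhost:2600") // local end of ssh -D

    // Hadoop RPC now goes through the proxy, e.g. listing HDFS on the cluster
    // (the namenode URI is a placeholder):
    val fs = FileSystem.get(
      new java.net.URI("hdfs://namenode.cluster.internal:8020"), hadoopConf)
    fs.listStatus(new Path("/")).foreach(status => println(status.getPath))

The question is whether a Spark-side analogue of these two properties exists.)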

I thought I might be able to use the JVM's -DsocksProxyHost and
-DsocksProxyPort system properties, but that does not seem to work either.
That is, if I start a SOCKS proxy to my master node using something like
"ssh -D 2600 <master node public name>" and then run a simple Spark app that
calls SparkConf.setMaster("spark://<master node private IP>:7077"), passing in
JVM args of -DsocksProxyHost=localhost and -DsocksProxyPort=2600, the driver
hangs for a while before finally giving up ("Application has been killed.
Reason: All masters are unresponsive! Giving up.").  It seems like it is not
even attempting to use the SOCKS proxy.  Do -DsocksProxyHost/-DsocksProxyPort
simply not work for Spark?
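
(To make the failing attempt concrete, this is essentially what is being run;
the app name is arbitrary and the placeholders match the ones above. A sketch:

    // SOCKS proxy opened beforehand with: ssh -D 2600 <master node public name>
    // The -D properties were passed as JVM arguments to the driver; setting
    // them programmatically before any sockets are opened is equivalent here:
    import org.apache.spark.{SparkConf, SparkContext}

    System.setProperty("socksProxyHost", "localhost")
    System.setProperty("socksProxyPort", "2600")

    val conf = new SparkConf()
      .setAppName("SocksProxyTest")                        // arbitrary name
      .setMaster("spark://<master node private IP>:7077")  // placeholder IP
    val sc = new SparkContext(conf)

    // Never gets here: "Application has been killed. Reason: All masters are
    // unresponsive! Giving up."
    sc.parallelize(1 to 10).count()

If it helps explain the behaviour: the socksProxyHost/socksProxyPort properties
only affect sockets opened through java.net.Socket, and NIO socket channels
(which the Akka/Netty transport Spark uses for the master connection opens)
bypass them, which may be why the proxy is never consulted.)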

http://stackoverflow.com/questions/28047000/connect-to-spark-through-a-socks-proxy
 (unanswered similar question from somebody else about a month ago)
https://issues.apache.org/jira/browse/SPARK-5004 (unresolved, somewhat related 
JIRA from a few months ago)

Thanks,
Jonathan