Did you try plain SSH tunneling (e.g. local port forwarding with ssh -L) instead of a SOCKS proxy?
Thanks
Best Regards
On Wed, Mar 18, 2015 at 5:45 AM, Kelly, Jonathan <jonat...@amazon.com> wrote:
I'm trying to figure out how I might be able to use Spark with a SOCKS
proxy. That is, my dream is to be able to write code in my IDE and then run it
without much trouble on a remote cluster, accessible only via a SOCKS proxy
between the local development machine and the master node of the
cluster (ignoring, for now, any dependencies that would need to be
transferred--assume it's a very simple app with no dependencies that aren't
part of the Spark classpath on the cluster). This is possible with Hadoop
by setting hadoop.rpc.socket.factory.class.default to
org.apache.hadoop.net.SocksSocketFactory and hadoop.socks.server to
localhost:<port>, where <port> is a local port on which a SOCKS proxy has
been opened via ssh -D to the master node. (A quick sketch of that working
Hadoop-side setup follows.)
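For concreteness, here is roughly what that looks like, expressed in Scala
against the Hadoop Configuration API rather than core-site.xml (2600 is just
the local port I happen to open with ssh -D):

    import org.apache.hadoop.conf.Configuration

    // Route Hadoop RPC through the local SOCKS proxy opened via
    // `ssh -D 2600 <master node public name>`. These are the same two
    // properties one would normally set in core-site.xml.
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.rpc.socket.factory.class.default",
      "org.apache.hadoop.net.SocksSocketFactory")
    hadoopConf.set("hadoop.socks.server", "localhost:2600")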
However, I can't seem to find anything like this for Spark, and I see only a
few mentions of it on the user list and on Stack Overflow, with no real
answers. (See links below.)
I thought I might be able to use the JVM's socksProxyHost and socksProxyPort
system properties, but that does not seem to work either. That is, if I start
a SOCKS proxy to my master node using something like
ssh -D 2600 <master node public name>, then run a simple Spark app that calls
SparkConf.setMaster("spark://<master node private IP>:7077") while passing in
the JVM args -DsocksProxyHost=localhost -DsocksProxyPort=2600, the driver
hangs for a while before finally giving up ("Application has been killed.
Reason: All masters are unresponsive! Giving up."). It seems like it is not
even attempting to use the SOCKS proxy. Do
-DsocksProxyHost/-DsocksProxyPort simply not work for Spark?
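For reference, here is a minimal sketch of the kind of app I'm testing with
(the master IP is a placeholder, and the object/app names are arbitrary):

    import org.apache.spark.{SparkConf, SparkContext}

    // Launched with: -DsocksProxyHost=localhost -DsocksProxyPort=2600
    // after opening the proxy with `ssh -D 2600 <master node public name>`.
    object SocksProxyTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SocksProxyTest")
          .setMaster("spark://<master node private IP>:7077") // placeholder
        val sc = new SparkContext(conf)
        // Never reached: the driver hangs and eventually reports
        // "All masters are unresponsive! Giving up."
        println(sc.parallelize(1 to 100).count())
        sc.stop()
      }
    }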
http://stackoverflow.com/questions/28047000/connect-to-spark-through-a-socks-proxy
(unanswered similar question from somebody else about a month ago)
https://issues.apache.org/jira/browse/SPARK-5004 (unresolved, somewhat
related JIRA from a few months ago)
Thanks,
Jonathan