Howdy Andrew,

I think I am running into the same issue [1] as you.  It appears that Spark
opens dynamic / ephemeral [2] ports for each job on the shell and the
workers.  As you are finding out, this makes securing and managing the
network for Spark very difficult.

> Any idea how to restrict the 'Workers' port range?
The port range can be found by running:
   $ sysctl net.ipv4.ip_local_port_range
   net.ipv4.ip_local_port_range = 32768 61000

With that being said, a couple of avenues you may try:
   1. Limit the dynamic ports [3] to a smaller range and open all of
      those ports on your firewall; obviously, this might have
      unintended consequences like port exhaustion.
   2. Secure the network another way, e.g. through a private VPN; this
      may reduce Spark's performance.
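
A rough sketch of the first option (the range and the iptables rule below
are only example values; adjust them for your environment, and both
commands need root):

```shell
# Narrow the kernel's ephemeral port range (example range; takes effect
# immediately, add the same line to /etc/sysctl.conf to persist it)
sysctl -w net.ipv4.ip_local_port_range="32768 33000"

# Allow that same range through the firewall between driver and workers
# (iptables shown as an example; use whatever firewall tool you run)
iptables -A INPUT -p tcp --dport 32768:33000 -j ACCEPT
```

Keep in mind the ephemeral range is shared by every outgoing connection
on the host, which is where the port-exhaustion risk comes from.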

If you have other workarounds, I am all ears --- please let me know!
Jacob

[1]
http://apache-spark-user-list.1001560.n3.nabble.com/Securing-Spark-s-Network-tp4832p4984.html
[2] http://en.wikipedia.org/wiki/Ephemeral_port
[3]
http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075



From:   Andrew Lee <alee...@hotmail.com>
To:     "user@spark.apache.org" <user@spark.apache.org>
Date:   05/02/2014 03:15 PM
Subject:        RE: spark-shell driver interacting with Workers in YARN mode -
            firewall blocking communication



Hi Yana,

I did. I configured the port in spark-env.sh; the problem is not the
driver port, which is fixed.
It's the Workers' ports that are dynamic every time they are launched in
the YARN container. :-(

Any idea how to restrict the 'Workers' port range?

Date: Fri, 2 May 2014 14:49:23 -0400
Subject: Re: spark-shell driver interacting with Workers in YARN mode -
firewall blocking communication
From: yana.kadiy...@gmail.com
To: user@spark.apache.org

I think what you want to do is set spark.driver.port to a fixed port.
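
(For reference, one way to pin it, assuming a Spark version that reads
conf/spark-defaults.conf; the port number here is only an illustration:

```shell
# conf/spark-defaults.conf -- fix the driver's listening port
# 51000 is an example value; pick any port your firewall allows
spark.driver.port   51000
```

As Andrew notes above, this only fixes the driver side, not the Workers'
dynamically assigned ports.)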


On Fri, May 2, 2014 at 1:52 PM, Andrew Lee <alee...@hotmail.com> wrote:
      Hi All,

      I encountered this problem when the firewall is enabled between the
      spark-shell and the Workers.

      When I launch spark-shell in yarn-client mode, I notice that Workers
      on the YARN containers are trying to talk to the driver
      (spark-shell), however, the firewall is not opened and caused
      timeout.

      For each Worker, it seems to open a listening port in the 54xxx
      range. Is the port random in this case?
      What would be a better way to predict the ports so I can configure
      the firewall correctly between the driver (spark-shell) and the
      Workers? Is there a range of ports we can specify in the
      firewall/iptables?

      Any ideas?
