Howdy Andrew,

This is a standalone cluster.  And, yes, if my understanding of Spark
terminology is correct, you are correct about the port ownerships.

Jacob

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075



From:   Andrew Ash <and...@andrewash.com>
To:     user@spark.apache.org
Date:   05/28/2014 05:18 PM
Subject:        Re: Comprehensive Port Configuration reference?



Hmm, those do look like 4 listening ports to me.  PID 3404 is an executor
and PID 4762 is a worker?  This is a standalone cluster?


On Wed, May 28, 2014 at 8:22 AM, Jacob Eisinger <jeis...@us.ibm.com> wrote:
  Howdy Andrew,

  Here is what I ran before an application context was created (other
  services have been deleted):


        # netstat -l -t tcp -p  --numeric-ports

        Active Internet connections (only servers)

        Proto Recv-Q Send-Q Local Address           Foreign Address
        State       PID/Program name
        tcp6       0      0 10.90.17.100:8888       :::*
        LISTEN      4762/java
        tcp6       0      0 :::8081                 :::*
        LISTEN      4762/java

  And, then while the application context is up:
        # netstat -l -t tcp -p  --numeric-ports

        Active Internet connections (only servers)

        Proto Recv-Q Send-Q Local Address           Foreign Address
        State       PID/Program name
        tcp6       0      0 10.90.17.100:8888       :::*
        LISTEN      4762/java
        tcp6       0      0 :::57286                :::*
        LISTEN      3404/java
        tcp6       0      0 10.90.17.100:38118      :::*
        LISTEN      3404/java
        tcp6       0      0 10.90.17.100:35530      :::*
        LISTEN      3404/java
        tcp6       0      0 :::60235                :::*
        LISTEN      3404/java
        tcp6       0      0 :::8081                 :::*
        LISTEN      4762/java

  My understanding is that this shows four open ports for the
  application.  Are 57286 and 60235 not being used?


  Jacob

  Jacob D. Eisinger
  IBM Emerging Technologies
  jeis...@us.ibm.com - (512) 286-6075



  From: Andrew Ash <and...@andrewash.com>
  To: user@spark.apache.org
  Date: 05/25/2014 06:25 PM

  Subject: Re: Comprehensive Port Configuration reference?



  Hi Jacob,

  The config option spark.history.ui.port is new for 1.0.  The problem
  the History Server solves is that in non-standalone deployment modes
  (Mesos and YARN) there is no long-lived Spark Master that can store
  logs and statistics about an application after it finishes.  The
  History Server is the UI that renders logged data from applications
  after they complete.

  Read more here: https://issues.apache.org/jira/browse/SPARK-1276 and
  https://github.com/apache/spark/pull/204

  As for the two vs. four dynamic ports: are those all listening ports?
  I did observe four ports in use, but only two of them were listening.
  The other two were the random source ports used for outbound
  connections, i.e. the srcPort in the (srcIP, srcPort, dstIP, dstPort)
  tuple that uniquely identifies a TCP connection.

  http://unix.stackexchange.com/questions/75011/how-does-the-server-find-out-what-client-port-to-send-to
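  The distinction can be seen with a minimal stdlib sketch (not Spark
  code): only the server socket is ever in LISTEN state, while the
  client side merely borrows a random ephemeral source port for the
  duration of the connection.

  ```python
  import socket

  # Server socket: the only one that will show up as LISTENing.
  server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
  server.listen(1)
  server_port = server.getsockname()[1]

  # Client socket: connecting consumes a random ephemeral *source* port,
  # completing the (srcIP, srcPort, dstIP, dstPort) tuple, but nothing
  # new starts listening.
  client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  client.connect(("127.0.0.1", server_port))
  client_port = client.getsockname()[1]

  print("listening on", server_port)
  print("ephemeral source port", client_port)

  conn, _ = server.accept()
  for s in (conn, client, server):
      s.close()
  ```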


  Thanks for taking a look through!

  I also realized I had made a couple of mistakes with the 0.9 to 1.0
  transition, so those are now documented in the updated PR as well.

  Cheers!
  Andrew



  On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger <jeis...@us.ibm.com>
  wrote:
        Howdy Andrew,

        I noticed you have a configuration item that we were not aware of:
        spark.history.ui.port .  Is that new for 1.0?

        Also, we noticed that the Workers and the Drivers were opening up
        four dynamic ports per application context.  It looks like you were
        seeing two.
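
        For firewall rules, here is a sketch of a spark-defaults.conf
        pinning down the ports that were configurable at the time
        (property names taken from the 1.0 docs; the dynamic per-context
        ports discussed above had no config knob yet, so verify against
        your version):

        ```properties
        # spark-defaults.conf (sketch; verify names against your Spark version)
        spark.driver.port        51000    # driver RPC port, otherwise random
        spark.ui.port            4040     # driver web UI
        spark.history.ui.port    18080    # history server UI (new in 1.0)
        ```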

        Everything else looks like it aligns!
        Jacob




        Jacob D. Eisinger
        IBM Emerging Technologies
        jeis...@us.ibm.com - (512) 286-6075


        From: Andrew Ash <and...@andrewash.com>
        To: user@spark.apache.org
        Date: 05/23/2014 10:30 AM
        Subject: Re: Comprehensive Port Configuration reference?





        Hi everyone,

        I've also been interested in better understanding what ports are
        used where and the direction the network connections go.  I've
        observed a running cluster and read through code, and came up with
        the below documentation addition.

        https://github.com/apache/spark/pull/856

        Scott and Jacob -- it sounds like you two have pulled together some
        of this yourselves for writing firewall rules.  Would you mind
        taking a look at this pull request and confirming that it matches
        your observations?  Wrong documentation is worse than no
        documentation, so I'd like to make sure this is right.

        Cheers,
        Andrew


        On Wed, May 7, 2014 at 10:19 AM, Mark Baker <dist...@acm.org>
        wrote:
              On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger <
              jeis...@us.ibm.com> wrote:
              > In a nutshell, Spark opens up a couple of well-known
              ports.  And then the workers and the shell open up dynamic
              ports for each job.  These dynamic ports make securing the
              Spark network difficult.

              Indeed.

              Judging by the frequency with which this topic arises, this
              is a concern for many (myself included).

              I couldn't find anything in JIRA about it, but I'm curious
              to know whether the Spark team considers this a problem in
              need of a fix?

              Mark.
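
              The reason the dynamic ports are hard to firewall can be
              sketched in a few lines of stdlib Python (an illustration
              of the pattern, not Spark code): binding to port 0 asks
              the kernel for any free port, so the numbers differ on
              every run and no static rule matches them.

              ```python
              import socket

              # Each bind to port 0 yields a fresh, unpredictable port,
              # much like the per-job services the workers and shell
              # open; close them and the next run gets different ones.
              socks, ports = [], []
              for _ in range(3):
                  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                  s.bind(("127.0.0.1", 0))
                  s.listen(1)
                  socks.append(s)
                  ports.append(s.getsockname()[1])

              print("dynamic ports this run:", ports)
              for s in socks:
                  s.close()
              ```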






