Re: SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Wenchen Fan
Just remove `val sparkConf = sc.getConf`; `RDD.sparkContext` is already defined. On Thu, Mar 1, 2018 at 3:13 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote: > Wenchen, thank you very much for your prompt reply and pointer! As I think through, it makes sense that since
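A minimal sketch of the suggested fix, with made-up class and method names (not the original poster's code): drop the captured conf field and, on the driver, go through the `sparkContext` method that `RDD` already provides, while `compute` only touches serializable values.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative partition type for the sketch.
class TrivialPartition(override val index: Int) extends Partition

class TrivialRDD(sc: SparkContext, numSlices: Int) extends RDD[Int](sc, Nil) {

  // Note: no `val sparkConf = sc.getConf` field kept on the RDD.

  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices)(i => new TrivialPartition(i))

  // Runs on the executors; only uses the (serializable) partition object.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.single(split.index)

  // Driver-side helper: fetch the conf on demand via the inherited method.
  def confDump: String = sparkContext.getConf.toDebugString
}
```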

[ANNOUNCE] Announcing Apache Spark 2.3.0

2018-02-28 Thread Sameer Agarwal
Hi all, Apache Spark 2.3.0 is the fourth major release in the 2.x line. This release adds support for continuous processing in structured streaming along with a brand new Kubernetes scheduler backend. Other major updates include the new data source and structured streaming v2 APIs, a standard

Re: SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Thakrar, Jayesh
Wenchen, thank you very much for your prompt reply and pointer! As I think through, it makes sense that since my custom RDD is instantiated on the driver, I should get whatever I need from the SparkContext there and assign it to instance variables. However, the `RDD.sparkContext` and the Scala magic
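For instance, the pattern described above might look roughly like this (hypothetical field names; the class is left abstract so the fragment stays compilable without a full getPartitions/compute):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

abstract class MyCustomRDD(sc: SparkContext) extends RDD[String](sc, Nil) {
  // Pulled out of the driver-side SparkContext at construction time; plain
  // strings serialize fine and are all that compute() would need later.
  private val appName: String = sc.getConf.get("spark.app.name")
  private val appId: String = sc.applicationId

  // Usable on either the driver or the executors: both fields are just Strings.
  override def toString: String = s"MyCustomRDD(app=$appName, id=$appId)"
}
```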

Re: SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Wenchen Fan
My understanding: an RDD is also a driver-side construct, like SparkContext; it works as a handle to your distributed data on the cluster. However, `RDD.compute` (which defines how to produce the data for each partition) needs to be executed on the remote nodes. It's more convenient to make RDD serializable, and
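A quick driver-side illustration of that split (just a sketch; the object name, master URL, and sizes are arbitrary): the RDD object round-trips through plain Java serialization, which is what lets Spark ship it to the executors to run `compute`, while the SparkContext itself stays behind on the driver.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import org.apache.spark.SparkContext

object RddVsSparkContextDemo {
  // Serialize an object with plain Java serialization and return the byte count.
  def javaSerializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "rdd-vs-sparkcontext")
    val rdd = sc.parallelize(1 to 100).map(_ * 2)

    // The RDD (a small handle plus its lineage) serializes without trouble.
    println(s"RDD serializes to ${javaSerializedSize(rdd)} bytes")

    // javaSerializedSize(sc) would throw NotSerializableException:
    // the SparkContext is driver-only state.

    sc.stop()
  }
}
```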

SparkContext - parameter for RDD, but not serializable, why?

2018-02-28 Thread Thakrar, Jayesh
Hi All, I was just toying with creating a very rudimentary RDD data source to understand the inner workings of RDDs. It seems that one of the constructors for RDD has a parameter of type SparkContext, but it (apparently) exists on the driver only and is not serializable. Consequently, any

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Marcelo Vanzin
The mechanism to close idle connections already exists. It doesn't mean you can just use it as is in existing connections. So if you want to go and fix that, you're going to have to figure out that part. Or figure out a different solution. Either way, file a bug so that this is properly tracked.

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Sandeep Katta
Yes, the monitor is present, but in some cases, such as long-running jobs, I found the App Master is idle, so it will end up closing the App Master's channel and the job will not complete. So we need a mechanism to close only invalid connections. On Wed, 28 Feb 2018 at 10:54 PM, Marcelo Vanzin

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Marcelo Vanzin
Spark already has code to monitor idle connections and close them. That's in TransportChannelHandler.java. If there's anything to do here, it's to allow all users of the transport library to support the "close idle connections" feature of that class. On Wed, Feb 28, 2018 at 9:07 AM,
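For reference, the general Netty pattern behind that feature looks roughly like the sketch below (illustrative only; Spark's actual logic in TransportChannelHandler.java is more careful, e.g. it only closes channels with no outstanding requests):

```scala
import io.netty.channel.{ChannelDuplexHandler, ChannelHandlerContext}
import io.netty.channel.socket.SocketChannel
import io.netty.handler.timeout.{IdleStateEvent, IdleStateHandler}

// Closes a channel once Netty reports it has been idle for the configured time.
class CloseIdleConnectionHandler extends ChannelDuplexHandler {
  override def userEventTriggered(ctx: ChannelHandlerContext, evt: AnyRef): Unit = evt match {
    case _: IdleStateEvent => ctx.close() // no reads or writes within the window
    case other             => ctx.fireUserEventTriggered(other)
  }
}

object IdleConnectionPipeline {
  // Wire the idle detector and the closer into a channel pipeline.
  def install(ch: SocketChannel, allIdleTimeoutSeconds: Int): Unit = {
    ch.pipeline()
      .addLast("idleStateHandler", new IdleStateHandler(0, 0, allIdleTimeoutSeconds))
      .addLast("idleCloser", new CloseIdleConnectionHandler)
  }
}
```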

[Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread sandeep_katta
In client mode the App Master and the Driver run in different JVM processes, and the port opened by the Driver is vulnerable to flooding attacks because it does not close idle connections. I am thinking of fixing this issue with the mechanism below: 1. Expose configuration to close the idle connections as
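Purely as an illustration of the kind of configuration the proposal asks for (both property names below are hypothetical and do not exist in Spark at the time of this thread):

```scala
import org.apache.spark.SparkConf

object HypotheticalIdleConfig {
  val conf: SparkConf = new SparkConf()
    .set("spark.driver.rpc.closeIdleConnections", "true")  // hypothetical on/off switch
    .set("spark.driver.rpc.idleConnectionTimeout", "120s") // hypothetical timeout
}
```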

Re: Please keep s3://spark-related-packages/ alive

2018-02-28 Thread Marton, Elek
2. *Apache mirrors are inconvenient to use.* When you download something from an Apache mirror, you get a link like this one. Instead of automatically redirecting you to your download, though,