Re: Dataframe multiple joins with same dataframe not able to resolve correct join columns

2018-07-11 Thread Ben White
Sounds like the same root cause as SPARK-14948 or SPARK-10925. A workaround is to "clone" df3 like this: val df3clone = df3.toDF(df3.schema.fieldNames:_*) Then use df3clone in place of df3 in the second join. On Wed, Jul 11, 2018 at 2:52 PM Nirav Patel wrote: > I am trying to join df1 with
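A minimal, untested sketch of that workaround, using toy stand-ins for the poster's DataFrames (contents invented for illustration; the second drop is an assumption, since the original snippet is truncated after the first):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("join-clone").getOrCreate()
import spark.implicits._

// Toy stand-ins for the poster's DataFrames; contents are invented.
val df1 = Seq((1L, 10L, "a")).toDF("PARTICIPANT_ID", "BUSINESS_ID", "X")
val df2 = Seq((1L, 10L, "b")).toDF("PARTICIPANT_ID", "BUSINESS_ID", "Y")

// First join, dropping df1's copies of the join columns.
val df3 = df1
  .join(df2, df1("PARTICIPANT_ID") === df2("PARTICIPANT_ID") &&
             df1("BUSINESS_ID") === df2("BUSINESS_ID"))
  .drop(df1("BUSINESS_ID"))
  .drop(df1("PARTICIPANT_ID"))

// "Clone" df3 so its columns get fresh attribute IDs and no longer alias df2's.
val df3clone = df3.toDF(df3.schema.fieldNames: _*)

// Use the clone in place of df3 in the second join; the condition now resolves unambiguously.
val df4 = df3clone.join(df2,
  df3clone("PARTICIPANT_ID") === df2("PARTICIPANT_ID"))
df4.show()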

Dataframe multiple joins with same dataframe not able to resolve correct join columns

2018-07-11 Thread Nirav Patel
I am trying to join df1 with df2, and then join the result again with df2. df2 is a common dataframe. val df3 = df1 .join(df2, df1("PARTICIPANT_ID") === df2("PARTICIPANT_ID") and df1("BUSINESS_ID") === df2("BUSINESS_ID")) .drop(df1("BUSINESS_ID")) //dropping

CVE-2018-8024 Apache Spark XSS vulnerability in UI

2018-07-11 Thread Sean Owen
Severity: Medium Vendor: The Apache Software Foundation Versions Affected: Spark versions through 2.1.2; Spark 2.2.0 through 2.2.1; Spark 2.3.0 Description: In Apache Spark up to and including 2.1.2, 2.2.0 to 2.2.1, and 2.3.0, it's possible for a malicious user to construct a URL pointing to a

CVE-2018-1334 Apache Spark local privilege escalation vulnerability

2018-07-11 Thread Sean Owen
Severity: High Vendor: The Apache Software Foundation Versions affected: Spark versions through 2.1.2; Spark 2.2.0 to 2.2.1; Spark 2.3.0 Description: In Apache Spark up to and including 2.1.2, 2.2.0 to 2.2.1, and 2.3.0, when using PySpark or SparkR, it's possible for a different local user to

Spark accessing fakes3

2018-07-11 Thread Patrick Roemer
Hi, does anybody know if (and how) it's possible to get a (dev-local) Spark installation to talk to fakes3 for s3[n|a]:// URLs? I have managed to connect to AWS S3 from my local installation by adding hadoop-aws and aws-java-sdk to jars, using s3:// URLs as arguments for SparkContext#textFile(), but
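For reference, a rough sketch of pointing the s3a connector at a local S3-compatible endpoint; the port, credentials and bucket name below are placeholders for whatever fakes3 exposes, and hadoop-aws plus aws-java-sdk still need to be on the classpath:

import org.apache.spark.sql.SparkSession

// Rough sketch: point the s3a connector at a local fakes3 endpoint.
// Endpoint, credentials and bucket are placeholders, not tested values.
val spark = SparkSession.builder()
  .appName("fakes3-test")
  .master("local[*]")
  .config("spark.hadoop.fs.s3a.endpoint", "localhost:4567")
  .config("spark.hadoop.fs.s3a.access.key", "dummy")
  .config("spark.hadoop.fs.s3a.secret.key", "dummy")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
  .getOrCreate()

val lines = spark.sparkContext.textFile("s3a://my-bucket/some-file.txt")
println(lines.count())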

Re: Spark on Mesos - Weird behavior

2018-07-11 Thread Pavel Plotnikov
Oh, sorry, I missed that you use Spark without dynamic allocation. Anyway, I don't know whether these parameters work without dynamic allocation. On Wed, Jul 11, 2018 at 5:11 PM Thodoris Zois wrote: > Hello, > > Yeah, you are right, but I think that works only if you use Spark dynamic > allocation.

Re: Spark on Mesos - Weird behavior

2018-07-11 Thread Thodoris Zois
Hello, Yeah, you are right, but I think that works only if you use Spark dynamic allocation. Am I wrong? -Thodoris > On 11 Jul 2018, at 17:09, Pavel Plotnikov wrote: > > Hi Thodoris, > You can configure resources per executor and manipulate the number of > executors instead of using

Re: Spark on Mesos - Weird behavior

2018-07-11 Thread Pavel Plotnikov
Hi Thodoris, You can configure resources per executor and manipulate the number of executors instead of using spark.cores.max. I think the spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors configuration values can help you. On Tue, Jul 10, 2018 at 5:07 PM Thodoris Zois
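As an illustration only (all values are placeholders, and dynamic allocation additionally requires the external shuffle service to be available on the Mesos agents), the relevant settings might look like:

import org.apache.spark.sql.SparkSession

// Illustration: bound the number of executors with dynamic allocation
// instead of relying on spark.cores.max alone. Values are placeholders.
val spark = SparkSession.builder()
  .appName("mesos-executor-bounds")
  .config("spark.executor.cores", "2")
  .config("spark.executor.memory", "4g")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "10")
  .getOrCreate()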

Re: DataTypes of an ArrayType

2018-07-11 Thread Patrick McCarthy
Arrays need to be a single type; I think you're looking for a Struct column. See: https://medium.com/@mrpowers/adding-structtype-columns-to-spark-dataframes-b44125409803 On Wed, Jul 11, 2018 at 6:37 AM, dimitris plakas wrote: > Hello everyone, > > I am new to PySpark and I would like to ask if
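A rough sketch of the difference, written in Scala (the PySpark API mirrors it); the field and column names are invented for illustration:

import org.apache.spark.sql.types._

// An ArrayType has a single element type, while a StructType column can hold
// fields of different types.
val schema = StructType(Seq(
  StructField("id", LongType),
  // every element of this array must be a Double
  StructField("measurements", ArrayType(DoubleType)),
  // a struct can mix types, one per named field
  StructField("mixed", StructType(Seq(
    StructField("name", StringType),
    StructField("count", IntegerType),
    StructField("score", DoubleType)
  )))
))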

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-11 Thread Tien Dat
Thanks for your suggestion. I have been checking Spark-jobserver. Just an off-topic question about this project: does the Apache Spark project have any support for, or connection to, the Spark-jobserver project? I noticed that they do not have a release for the newest version of Spark (e.g., 2.3.1). As you

DataTypes of an ArrayType

2018-07-11 Thread dimitris plakas
Hello everyone, I am new to PySpark and I would like to ask if there is any way to have a DataFrame column which is an ArrayType with a different DataType for each element of the array. For example, to have something like: StructType([StructField("Column_Name",