Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Takeshi Yamamuro
If there is no objection in the following responses, I'll wait one more week while watching that PR's progress. Once that PR is merged, I'll start preparing the next vote.
On Tue, Jan 29, 2019 at 4:57 AM Jungtaek Lim wrote:
> Regarding PR 23634, it is waiting for getting consensus on the approach
>

Re: Make .unpersist() non-blocking by default?

2019-01-28 Thread Sean Owen
Slight update: broadcast doesn't block by default, but GraphX does:
- RDD: true
- Broadcast: false
- Dataset / DataFrame: false
- Graph (in GraphX): true
- Pyspark RDD: (no option)
- Pyspark Broadcast: false
- Pyspark DataFrame: false
(Scala) RDD is the odd one out, with its subclasses in

Re: [build system] speeding up maven build building only changed modules compared to master branch

2019-01-28 Thread Reynold Xin
This might be useful to do. BTW, based on my experience with different build systems in the past few years (extensively SBT/Maven/Bazel, and to a lesser extent Gradle/Cargo), I think the longer-term solution is to move to Bazel. It is so much easier to understand and use, and also much more

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Jungtaek Lim
Regarding PR 23634, it is waiting on consensus about the approach for the fix. It also needs some time for code cleanup and for addressing backward compatibility. I'm postponing that work since I haven't reached consensus on the approach. So it may take some

CVE-2018-11760: Apache Spark local privilege escalation vulnerability

2019-01-28 Thread Imran Rashid
Severity: Important
Vendor: The Apache Software Foundation
Versions affected:
All Spark 1.x, Spark 2.0.x, and Spark 2.1.x versions
Spark 2.2.0 to 2.2.2
Spark 2.3.0 to 2.3.1
Description: When using PySpark, it's possible for a different local user to connect to the Spark application and

DataSourceV2 sync notes

2019-01-28 Thread Ryan Blue
Hi everyone, here are notes from the last DSv2 sync on 23 January 2019. Here are the highlights:
- Agreed that using v2 should not change behavior for file sources. (Cannot make this guarantee for all v1 sources)
- Consensus for the approach proposed on the dev list for identifying

Re: Make .unpersist() non-blocking by default?

2019-01-28 Thread Reynold Xin
Seems to make sense to have it false by default. (I agree this deserves a dev list mention though, even if there is easy consensus.) We should make sure we mark the Jira with releasenotes so we can add it to the upgrade guide.
On Mon, Jan 28, 2019 at 8:47 AM Sean Owen wrote:
> Interesting notion at

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Sean Owen
More analysis at https://github.com/apache/spark/pull/23634
It's not a regression, though it does relate to correctness, albeit in a somewhat niche case. TD, Jose et al, is this a Blocker? And is the fix probably reliable enough to commit now?
On Mon, Jan 28, 2019 at 10:59 AM Sandeep Katta wrote:
>
> I

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Sandeep Katta
I feel this https://issues.apache.org/jira/browse/SPARK-26154 bug should be fixed in this release, as it is related to data correctness.
On Mon, 28 Jan 2019 at 17:55, Takeshi Yamamuro wrote:
> Hi, all
>
> I checked the two issues below had been resolved and there is no blocker
> for branch-2.3

Make .unpersist() non-blocking by default?

2019-01-28 Thread Sean Owen
Interesting notion at https://github.com/apache/spark/pull/23650 : .unpersist() takes an optional 'blocking' argument. If true, the call waits until the resource is freed. Otherwise it doesn't. The default looks pretty inconsistent:
- RDD: true
- Broadcast: true
- Dataset / DataFrame: false
-
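The difference between a blocking and a non-blocking unpersist can be sketched with a toy model. This is only an illustration of the semantics described in the message above, not Spark's implementation: the `ToyCache` class, its `freed` flag, and the `cleanup_seconds` parameter are all hypothetical names invented for this example.

```python
import threading
import time


class ToyCache:
    """Hypothetical stand-in for a cached resource; NOT Spark's implementation."""

    def __init__(self, cleanup_seconds=0.2):
        self.cleanup_seconds = cleanup_seconds
        self.freed = threading.Event()  # set once the resource is released

    def _free(self):
        time.sleep(self.cleanup_seconds)  # simulate slow removal of cached data
        self.freed.set()

    def unpersist(self, blocking=False):
        # Cleanup always runs in the background; 'blocking' only decides
        # whether the caller waits for it to finish before returning.
        worker = threading.Thread(target=self._free)
        worker.start()
        if blocking:
            worker.join()
        return self


# blocking=True: by the time the call returns, the resource is freed.
blocking_cache = ToyCache().unpersist(blocking=True)
freed_after_blocking_call = blocking_cache.freed.is_set()

# blocking=False: the call returns immediately, cleanup is still in flight.
non_blocking_cache = ToyCache().unpersist(blocking=False)
freed_after_non_blocking_call = non_blocking_cache.freed.is_set()

# Wait for the background cleanup so the script exits cleanly.
non_blocking_cache.freed.wait()
```

In this model the only observable difference is when the caller regains control, which is why an inconsistent default across APIs can be surprising: the same call either stalls the caller or returns while cleanup is still pending.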

Re: Why outdated third-parties exist on documentation?

2019-01-28 Thread Sean Owen
Nobody's actively monitoring the list or anything. It's also not clear when something has been discontinued; it may still be usable and widely used with no recent project activity. For example ganglia is still, I think, widely used. If you see a project that has totally disappeared or formally

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Takeshi Yamamuro
Hi, all
I checked that the two issues below have been resolved and that there is no blocker for branch-2.3 now, so I'll start preparing RC2 tomorrow.
https://issues.apache.org/jira/browse/SPARK-26682
https://issues.apache.org/jira/browse/SPARK-26709
If there are some blockers and critical issues in

Why outdated third-parties exist on documentation?

2019-01-28 Thread Moein Hosseini
Hi everyone, I was taking a look at the Spark documentation about third-party projects and monitoring, and realized that many of the listed projects are discontinued. For example BlinkDB

Re: [build system] speeding up maven build building only changed modules compared to master branch

2019-01-28 Thread Gabor Somogyi
Do you have some numbers on how much faster this is? I'm asking because I previously evaluated another plugin and found the following:
- Incremental build didn't bring much benefit, even in projects bigger than Spark
- Incremental test was buggy and sometimes the required tests were not executed