Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-08-13 Thread Ryan Blue
Reynold, did you get a chance to look at my response about using `Expression`? I think that it's okay since it is already exposed in the v2 data source API. Plus, I wouldn't want to block this on building a public expression API that is more stable. I think that's the only objection to this SPIP.

[discuss][minor] impending python 3.x jenkins upgrade... 3.5.x? 3.6.x?

2018-08-13 Thread shane knapp
hey everyone! i was checking out the EOL/release cycle for python 3.5 and it looks like we'll have 3.5.6 released in early 2019. this got me to thinking: instead of 3.5, what about 3.6? i looked around, and according to the 'docs' and 'collective wisdom of the internets', 3.5 and 3.6 should be

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Tom Graves
> I mean, what are concrete steps beyond saying this is a problem? That's the important thing to discuss.
Sorry, I'm a bit confused by your statement but also think I agree. I started this thread for this reason. I pointed out that I thought it was a problem and also brought up things I

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-08-13 Thread Marcelo Vanzin
On this topic... when I worked on 2.3.1 and caused this breakage by deleting an old release, I tried to write some code to make this more automatic: https://github.com/vanzin/spark/tree/SPARK-24532 I just found that the code was a little too large and hacky for what it does (find out the latest

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-13 Thread Xingbo Jiang
I'm working on the fix for SPARK-23243 and should be able to push another commit in 1~2 days. More detailed discussions can go to the PR. Thanks for pushing this issue forward! I really appreciate everyone's efforts in submitting PRs or getting involved in the discussions

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Sean Owen
Generally: if someone thinks correctness fix X should be backported further, I'd say just do it, if it's to an active release branch (see below). Anything that important has to outweigh most any other concern, like behavior changes. On Mon, Aug 13, 2018 at 11:08 AM Tom Graves wrote: > I'm not

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Imran Rashid
I don't think we've been great about backporting correctness issues. This is one example which comes to mind (not to point fingers, just the one I know of immediately): https://issues.apache.org/jira/browse/SPARK-23207 we also let another related issue slide for quite a while:

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Tom Graves
Not a specific jira but was looking at all the recent jiras with the "correctness" label and things are definitely being handled inconsistently in my opinion (https://issues.apache.org/jira/issues/?jql=labels+%3D+correctness). The inconsistencies are in the things I've mentioned above.

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-13 Thread Tom Graves
I agree with Imran; we need to fix SPARK-23243, and any correctness issues for that matter. Tom On Wednesday, August 8, 2018, 9:06:43 AM CDT, Imran Rashid wrote: On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan wrote: SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect

CVE-2018-11770: Apache Spark standalone master, Mesos REST APIs not controlled by authentication

2018-08-13 Thread Sean Owen
Severity: Medium Vendor: The Apache Software Foundation Versions Affected: Spark versions from 1.3.0, running standalone master with REST API enabled, or running Mesos master with cluster mode enabled Description: From version 1.3.0 onward, Spark's standalone master exposes a REST API for job

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Sean Owen
I doubt the question is whether people want to take such issues seriously -- all else equal, of course everyone does. A JIRA label plus place in the release notes sounds like a good concrete step that isn't happening consistently now. That's a clear flag that at least one person believes issue X

[DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Tom Graves
Hello all, I've noticed some inconsistencies in the way we are handling data loss/correctness issues. I think we need to take these very seriously as they could be costing businesses real money and impacting real decisions and business logic. I would like to discuss how we can make sure