Re: [DISCUSSION]JDK11 for Apache 2.x?

2019-08-27 Thread Sean Owen
lve to Spark 3, we could combine it with Java 11. > > On the other hand, not everybody may think this way and it may slow down the > adoption of Spark 3… > > However, I concur with Sean, I don’t think another 2.x is needed for Java 11. > > > On Aug 27, 2019, at 3:09 PM, Sean Owen

Re: [DISCUSSION]JDK11 for Apache 2.x?

2019-08-27 Thread Sean Owen
I think one of the key problems here is the required dependency upgrades. It would mean many minor breaking changes and a few bigger ones, notably around Hive, and forces a Scala 2.12-only update. I think my question is whether that even makes sense as a minor release? it wouldn't be backwards

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-27 Thread Sean Owen
+1 - license and signature looks OK, the docs look OK, the artifacts seem to be in order. Tests passed for me when building from source with most common profiles set. On Mon, Aug 26, 2019 at 3:28 PM Kazuaki Ishizaki wrote: > > Please vote on releasing the following candidate as Apache Spark

Re: [VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Sean Owen
+1 as per response to RC1. The existing issues identified there seem to have been fixed. On Mon, Aug 26, 2019 at 2:45 AM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 29th 1AM PST and passes if a

Re: JDK11 Support in Apache Spark

2019-08-26 Thread Sean Owen
Bringing a side conversation back to main: good news / bad news. We most definitely want one build to run on JDK 8 and JDK 11. That is actually what both of the JDK 11 jobs do right now, so I believe the passing Jenkins job suggests that already works. The downside is I think we haven't

Re: How to load Python Pickle File in Spark Data frame

2019-08-26 Thread Sean Owen
Yes, this does not read raw pickle files. It reads files written in the standard Spark/Hadoop form for binary objects (SequenceFiles) but uses Python pickling for the serialization. See the docs, which say this reads what saveAsPickleFile() writes. On Mon, Aug 26, 2019 at 12:23 AM hxngillani
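The distinction above can be sketched with the standard library alone: a raw pickle file is just the output of pickle.dump, readable with pickle.load, while SparkContext.pickleFile expects the Hadoop SequenceFile wrapper that RDD.saveAsPickleFile writes. The Spark calls in the trailing comments are illustrative, assuming a live SparkContext named `sc`.

```python
import os
import pickle
import tempfile

# A raw pickle file is just pickle.dump output -- readable with pickle.load,
# but NOT with SparkContext.pickleFile, which expects a Hadoop SequenceFile
# whose values are pickled batches (what RDD.saveAsPickleFile writes).
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)
assert loaded == records

# To get the same records into Spark (sketch, assuming a live SparkContext `sc`):
#   rdd = sc.parallelize(loaded)   # load the raw pickle locally, then parallelize
#   rdd.saveAsPickleFile("out")    # Spark's own pickle-in-SequenceFile format
#   sc.pickleFile("out")           # reads back what saveAsPickleFile wrote
```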

Re: Unmarking most things as experimental, evolving for 3.0?

2019-08-22 Thread Sean Owen
> >> +1 for unmarking old ones (made in `2.3.x` and before). >> Thank you, Sean. >> >> Bests, >> Dongjoon. >> >> On Wed, Aug 21, 2019 at 6:46 PM Sean Owen wrote: >>> >>> There are currently about 130 things marked as 'experimental'

Unmarking most things as experimental, evolving for 3.0?

2019-08-21 Thread Sean Owen
There are currently about 130 things marked as 'experimental' in Spark, and some have been around since Spark 1.x. A few may be legitimately still experimental (e.g. barrier mode), but, would it be safe to say most of these annotations should be removed for 3.0? What's the theory for evolving vs

Re: [VOTE] Release Apache Spark 2.4.4 (RC1)

2019-08-20 Thread Sean Owen
Sounds fine, we probably needed SPARK-28775 anyway. I merged that and SPARK-28749. It looks like it's just the one you're talking about right now, SPARK-28699. The rest of the tests seemed to pass OK, release looks good, but bears more testing by everyone out there before a next RC. On Tue, Aug

Re: [VOTE] Release Apache Spark 2.4.4 (RC1)

2019-08-19 Thread Sean Owen
Things are looking pretty good so far, but a few notes: I thought we might need this PR to make the 2.12 build of 2.4.x not try to build Kafka 0.8 support, but, I'm not seeing that 2.4.x + 2.12 builds or tests it? https://github.com/apache/spark/pull/25482 I can merge this to 2.4 shortly anyway,

Re: Release Spark 2.3.4

2019-08-16 Thread Sean Owen
I think it's fine to do these in parallel, yes. Go ahead if you are willing. On Fri, Aug 16, 2019 at 9:48 AM Kazuaki Ishizaki wrote: > > Hi, All. > > Spark 2.3.3 was released six months ago (15th February, 2019) at > http://spark.apache.org/news/spark-2-3-3-released.html. And, about 18 months

Re: Ask for ARM CI for spark

2019-08-15 Thread Sean Owen
en-hadoop-2.6-ubuntu-testing/lastBuild/consoleFull > > Best regards > > ZhaoBo

Re: Release Apache Spark 2.4.4

2019-08-15 Thread Sean Owen
While we're on the topic: In theory, branch 2.3 is meant to be unsupported as of right about now. There are 69 fixes in branch 2.3 since 2.3.3 was released in February: https://issues.apache.org/jira/projects/SPARK/versions/12344844 Some look moderately important. Should we also, or first,

Re: Ask for ARM CI for spark

2019-08-15 Thread Sean Owen
nlab/spark/pull/17/ , there are several things I > want to talk about: > > First, about the failed tests: > 1.we have fixed some problems like > https://github.com/apache/spark/pull/25186 and > https://github.com/apache/spark/pull/25279, thanks Sean Owen and others > for helping us. >

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Sean Owen
Seems fine to me if there are enough valuable fixes to justify another release. If there are any other important fixes imminent, it's fine to wait for those. On Tue, Aug 13, 2019 at 6:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today

Re: My curation of pending structured streaming PRs to review

2019-08-13 Thread Sean Owen
General tips: - dev@ is not usually the right place to discuss _specific_ changes except once in a while to call attention - Ping the authors of the code being changed directly - Tighten the change if possible - Tests, reproductions, docs, etc help prove the change - Bugs are more important than

Re: [SPARK-23207] Repro

2019-08-09 Thread Sean Owen
Interesting but I'd put this on the JIRA, and also test vs master first. It's entirely possible this is something else that was subsequently fixed, and maybe even backported for 2.4.4. (I can't quite reproduce it - just makes the second job fail, which is also puzzling) On Fri, Aug 9, 2019 at

Re: Recognizing non-code contributions

2019-08-06 Thread Sean Owen
On Tue, Aug 6, 2019 at 11:45 AM Myrle Krantz wrote: > I had understood your position to be that you would be willing to make at > least some non-coding contributors to committers but that your "line" is > somewhat different than my own. My response to you assumed that position on > your

Re: Recognizing non-code contributions

2019-08-06 Thread Sean Owen
On Tue, Aug 6, 2019 at 10:46 AM Myrle Krantz wrote: >> You can tell there's a range of opinions here. I'm probably less >> 'conservative' about adding committers than most on the PMC, right or >> wrong, but more conservative than some at the ASF. I think there's >> room to inch towards the middle

Re: Recognizing non-code contributions

2019-08-06 Thread Sean Owen
On Tue, Aug 6, 2019 at 1:14 AM Myrle Krantz wrote: > If someone makes a commit who you are not expecting to make a commit, or in > an area you weren't expecting changes in, you'll notice that, right? Not counterarguments, but just more color on the hesitation: - Probably, but it's less obvious

Re: Recognizing non-code contributions

2019-08-05 Thread Sean Owen
On Mon, Aug 5, 2019 at 3:50 AM Myrle Krantz wrote: > So... events coordinators? I'd still make them committers. I guess I'm > still struggling to understand what problem making people VIP's without > giving them committership is trying to solve. We may just agree to disagree, which is fine,

Re: Recognizing non-code contributions

2019-08-04 Thread Sean Owen
On Sun, Aug 4, 2019 at 11:21 AM Myrle Krantz wrote: > Let me make a guess at what you are trying to accomplish with it. Correct me > please if I'm wrong: > * You want to encourage contributions that aren't just code contributions. > You recognize for example that good documentation is

Re: Recognizing non-code contributions

2019-08-02 Thread Sean Owen
to justify the process, and may wade too deeply into controversies about whether this is just extra gatekeeping vs something helpful. On Thu, Aug 1, 2019 at 11:09 PM Sean Owen wrote: > > (Let's move this thread to dev@ now as it is a general and important > community question. This was

Re: Recognizing non-code contributions

2019-08-01 Thread Sean Owen
019 at 6:13 PM Hyukjin Kwon wrote: >>>> >>>> I agree with Sean in general, in particular, commit bit. >>>> >>>> Personal thought: >>>> I think committer should at least be used to the dev at some degree as >>>> primary. >>>&

Fwd: The result of Math.log(3.0) is different on x86_64 and aarch64?

2019-07-29 Thread Sean Owen
and aarch64? To: Sean Owen Sorry to disturb you, I forward the jdk-dev email to you, maybe you are interested :) -- Forwarded message - From: Pengfei Li (Arm Technology China) Date: Mon, Jul 29, 2019 at 5:52 PM Subject: RE: The result of Math.log(3.0) is different on x86_64

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-29 Thread Sean Owen
TL;DR is: take the below as feedback to consider, and proceed as you see fit. Nobody's suggesting you can't do this. On Mon, Jul 29, 2019 at 2:58 AM Lars Francke wrote: > The way I read your point is that anyone can publish material (which includes > source code) under the ALv2 outside of the

Re: Ask for ARM CI for spark

2019-07-27 Thread Sean Owen
Great thanks - we can take this to JIRAs now. I think it's worth changing the implementation of atanh if the test value just reflects what Spark does, and there's evidence it's a little bit inaccurate. There's an equivalent formula which seems to have better accuracy. On Fri, Jul 26, 2019 at 10:02
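The "equivalent formula" idea can be sketched in pure Python. These identities are standard numerics, not necessarily the exact formulas from the JIRA: the textbook form 0.5*ln((1+x)/(1-x)) loses precision for tiny |x| because (1+x) rounds away most of x, while the log1p form keeps full precision near zero.

```python
import math

def atanh_naive(x):
    # Textbook identity: atanh(x) = 0.5 * ln((1 + x) / (1 - x)).
    # For tiny |x|, computing (1 + x) discards most of x's low bits
    # before the log is ever taken.
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def atanh_stable(x):
    # Equivalent identity using log1p, accurate near 0:
    # atanh(x) = 0.5 * (log1p(x) - log1p(-x))
    return 0.5 * (math.log1p(x) - math.log1p(-x))

x = 1e-10  # atanh(x) ~= x for tiny x
err_naive = abs(atanh_naive(x) - math.atanh(x)) / x
err_stable = abs(atanh_stable(x) - math.atanh(x)) / x
assert err_stable <= 1e-12      # log1p form stays accurate
assert err_stable <= err_naive  # naive form loses digits here
```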

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-26 Thread Sean Owen
On Fri, Jul 26, 2019 at 4:01 PM Lars Francke wrote: > I understand why it might be seen that way and we need to make sure to point > out that we have no intention of becoming "The official Apache Spark > training" because that's not our intention at all. Of course that's the intention; the

Re: Apache Training contribution for Spark - Feedback welcome

2019-07-26 Thread Sean Owen
Generally speaking, I think we want to encourage more training and tutorial content out there, for sure, so, the more the merrier. My reservation here is that as an Apache project, it might appear to 'bless' one set of materials as authoritative over all the others out there. And there are

Re: Ask for ARM CI for spark

2019-07-26 Thread Sean Owen
ob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L764 >> >> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L792 >> the test passed, and there are more than one executor up, not sure whether >&g

Re: Ask for ARM CI for spark

2019-07-17 Thread Sean Owen
On Wed, Jul 17, 2019 at 6:28 AM Tianhua huang wrote: > Two failed and the reason is 'Can't find 1 executors before 1 > milliseconds elapsed', see below, then we try increase timeout the tests > passed, so wonder if we can increase the timeout? and here I have another > question about >

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-09 Thread Sean Owen
We will certainly want a 2.4.4 release eventually. In fact I'd expect 2.4.x gets maintained for longer than the usual 18 months, as it's the last 2.x branch. It doesn't need to happen before 3.0, but could. Usually maintenance releases happen 3-4 months apart and the last one was 2 months ago. If

Opinions wanted: how much to match PostgreSQL semantics?

2019-07-08 Thread Sean Owen
See the particular issue / question at https://github.com/apache/spark/pull/24872#issuecomment-509108532 and the larger umbrella at https://issues.apache.org/jira/browse/SPARK-27764 -- Dongjoon rightly suggests this is a broader question.

Re: Sample date_trunc error for webpage (https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc )

2019-07-07 Thread Sean Owen
! On Sun, Jul 7, 2019 at 1:56 PM Russell Spitzer wrote: > The args look like they are in the wrong order in the doc > > On Sun, Jul 7, 2019, 1:50 PM Sean Owen wrote: > >> binggan1989, I don't see any problem in that snippet. What are you >> referring to? >> >>

Re: Sample date_trunc error for webpage (https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc )

2019-07-07 Thread Sean Owen
binggan1989, I don't see any problem in that snippet. What are you referring to? On Sun, Jul 7, 2019, 2:22 PM Chris Lambertus wrote: > Spark, > > We received this message. I have not ACKd it. > > -Chris > INFRA > > > Begin forwarded message: > > *From: *"binggan1989" > *Subject: **Sample

Re: Disabling `Merge Commits` from GitHub Merge Button

2019-07-01 Thread Sean Owen
I'm using the merge script in both repos. I think that was the best practice? So, sure, I'm fine with disabling it. On Mon, Jul 1, 2019 at 3:53 PM Dongjoon Hyun wrote: > > Hi, Apache Spark PMC members and committers. > > We are using GitHub `Merge Button` in `spark-website` repository > because

Re: Timeline for Spark 3.0

2019-06-28 Thread Sean Owen
That's a good question. Although we had penciled in 'middle of the year' I don't think we're in sight of a QA phase just yet, as I believe some key items are still in progress. I'm thinking of the Hive update, and DS v2 work (?). I'm also curious to hear what broad TODOs people see for 3.0? we

Re: Jackson version updation

2019-06-28 Thread Sean Owen
https://github.com/apache/spark/blob/branch-2.4/pom.xml#L161 Correct, because it would introduce behavior changes. On Fri, Jun 28, 2019 at 3:54 AM Pavithra R wrote: > In spark master branch, the version of Jackson jars have been upgraded to > 2.9.9 > > >

Re: Ask for ARM CI for spark

2019-06-26 Thread Sean Owen
;> ARM-related issues. I'd be happy to help if you like. And could you give >>>>> the trace link of this issue, then I can check it is fixed or not, thank >>>>> you. >>>>> As far as I know the old versions of spark support ARM, and now the new >>

Re: Java version for building Spark

2019-06-24 Thread Sean Owen
"The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.5.4 and Java 8." It doesn't depend on a particular version of Java 8. Installing it is platform-dependent. On Mon, Jun 24, 2019 at 6:43 PM Valeriy Trofimov wrote: > > Hi All, > > What

Re: sparkmaster-test-sbt-hadoop-2.7 failing RAT check

2019-06-24 Thread Sean Owen
(We have two PRs to patch it up anyway already) On Mon, Jun 24, 2019 at 11:39 AM shane knapp wrote: > > i'm aware and will be looking in to this later today. > > see: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/6043/console > > -- > Shane Knapp > UC Berkeley

Re: Ask for ARM CI for spark

2019-06-19 Thread Sean Owen
I'd begin by reporting and fixing ARM-related issues in the build. If they're small, of course we should do them. If it requires significant modifications, we can discuss how much Spark can support ARM. I don't think it's yet necessary for the Spark project to run these CI builds until that point,

Re: Spark 2.4.3 source download is a dead link

2019-06-18 Thread Sean Owen
Huh, I don't know how long that's been a bug, but the JS that creates the filename with .replace doesn't seem to have ever worked? https://github.com/apache/spark-website/pull/207 On Tue, Jun 18, 2019 at 4:07 AM Olivier Girardot wrote: > > Hi everyone, > FYI the spark source download link on

jQuery 3.4.1 update

2019-06-14 Thread Sean Owen
Just surfacing this change as it's probably pretty good to go, but, a) I'm not a jQuery / JS expert and b) we don't have comprehensive UI tests. https://github.com/apache/spark/pull/24843 I'd like to get us up to a modern jQuery for 3.0, to keep up with security fixes (which was the minor

Master maven build failing for 6 days -- may need some more eyes

2019-05-30 Thread Sean Owen
I might need some help figuring this out. The master Maven build has been failing for almost a week, and I'm having trouble diagnosing why. Of course, the PR builder has been fine. First one seems to be:

Re: Should python-2 be supported in Spark 3.0?

2019-05-29 Thread Sean Owen
Deprecated -- certainly and sooner than later. I don't have a good sense of the overhead of continuing to support Python 2; is it large enough to consider dropping it in Spark 3.0? On Wed, May 29, 2019 at 11:47 PM Xiangrui Meng wrote: > > Hi all, > > I want to revive this old thread since no

Re: Interesting implications of supporting Scala 2.13

2019-05-29 Thread Sean Owen
I think the particular issue here isn't resolved by scala-collection-compat: TraversableOnce goes away. However I hear that maybe Scala 2.13 retains it as a deprecated alias, which might help. On Wed, May 29, 2019 at 4:59 PM antonkulaga wrote: > > There is

Re: RDD object Out of scope.

2019-05-21 Thread Sean Owen
I'm not clear what you're asking. An RDD itself is just an object in the JVM. It will be garbage collected if there are no references. What else would there be to clean up in your case? ContextCleaner handles cleanup of persisted RDDs, etc. On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris

Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-21 Thread Sean Owen
it but i am > not too familiar with the pom file. > > regarding jline you only run into this if you use spark-shell (and it isnt > always reproducible it seems). see SPARK-25783 > best, > koert > > > > > On Mon, May 20, 2019 at 5:43 PM Sean Owen wrote: &g

Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-20 Thread Sean Owen
Re: 1), I think we tried to fix that on the build side and it requires flags that not all tar versions (i.e. OS X) have. But that's tangential. I think the Avro + Parquet dependency situation is generally problematic -- see JIRA for some details. But yes I'm not surprised if Spark has a different

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Sean Owen
am good with 'Incomplete' too. >>>> >>>> On Thu, May 16, 2019 at 11:24 AM, Hyukjin Kwon wrote: >>>> >>>>> I actually recently used 'Incomplete' a bit when the JIRA is >>>>> basically too poorly formed (like just copying and pasting an error) .

Re: Signature verification failed

2019-05-18 Thread Sean Owen
Moving to dev@ Ah, looks like it was added to https://dist.apache.org/repos/dist/dev/spark/KEYS but not the final dist KEYS file. I just copied it over now. On Sat, May 18, 2019 at 7:12 AM Andreas Költringer wrote: > > Hi, > > I wanted to download and verify (via signature) an Apache Spark

Re: Access to live data of cached dataFrame

2019-05-17 Thread Sean Owen
A cached DataFrame isn't supposed to change, by definition. You can re-read each time or consider setting up a streaming source on the table which provides a result that updates as new data comes in. On Fri, May 17, 2019 at 1:44 PM Tomas Bartalos wrote: > > Hello, > > I have a cached dataframe:
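The snapshot-vs-live distinction above can be mirrored with a standard-library analogy (this is plain Python memoization, not the Spark API): once a cached result is materialized, later writes to the underlying store are not visible through it until the cache is invalidated and the data re-read.

```python
from functools import lru_cache

# A toy "table" backed by a mutable store, with a cached reader. This mirrors
# the semantics described in the thread: a cached result is a snapshot by
# definition, not a live view of the source.
store = {"rows": [1, 2, 3]}

@lru_cache(maxsize=1)
def read_cached():
    return tuple(store["rows"])  # materialize once

assert read_cached() == (1, 2, 3)

store["rows"].append(4)              # new data arrives after caching
assert read_cached() == (1, 2, 3)    # the cached view is unchanged

read_cached.cache_clear()            # analogous to unpersisting and re-reading
assert read_cached() == (1, 2, 3, 4)
```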

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Sean Owen
t;Timed Out" / "Cannot Reproduce", not "Fixed"). Using a label >> makes it easier to audit what was closed, simplifying the process of >> identifying and re-opening valid issues caught in our dragnet. >> >> >> On Wed, May 15, 2019 at 7:19 AM Se

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Sean Owen
I gave up looking through JIRAs a long time ago, so, big respect for continuing to try to triage them. I am afraid we're missing a few important bug reports in the torrent, but most JIRAs are not well-formed, just questions, stale, or simply things that won't be added. I do think it's important to

Re: adding shutdownmanagerhook to spark.

2019-05-13 Thread Sean Owen
Spark just adds a hook to the mechanism that Hadoop exposes. You can do the same. You shouldn't use Spark's. On Mon, May 13, 2019 at 6:11 PM Nasrulla Khan Haris wrote: > > HI All, > > > > I am trying to add shutdown hook, but looks like shutdown manager object > requires the package to be spark

Re: Interesting implications of supporting Scala 2.13

2019-05-11 Thread Sean Owen
ave to deal with all the other points you raised >> when we do cross that bridge, but hopefully those are things we can cover in >> a minor release. >> >> On Fri, May 10, 2019 at 2:31 PM Sean Owen wrote: >>> >>> I really hope we don't have to have sepa

Re: Interesting implications of supporting Scala 2.13

2019-05-10 Thread Sean Owen
eat idea to make changes in Spark 3.0 to prepare for Scala > 2.13 upgrade. > > Are there breaking changes that would require us to have two different > source code for 2.12 vs 2.13? > > > On Fri, May 10, 2019 at 11:41 AM, Sean Owen wrote: > >> While that's not happ

Interesting implications of supporting Scala 2.13

2019-05-10 Thread Sean Owen
While that's not happening soon (2.13 isn't out), note that some of the changes to collections will be fairly breaking changes. https://issues.apache.org/jira/browse/SPARK-25075 https://docs.scala-lang.org/overviews/core/collections-migration-213.html Some of this may impact a public API, so may

Re: SparkR latest API docs missing?

2019-05-08 Thread Sean Owen
I think the SparkR release always trails a little bit due to the additional CRAN processes. On Wed, May 8, 2019 at 11:23 AM Shivaram Venkataraman wrote: > > I just noticed that the SparkR API docs are missing at > https://spark.apache.org/docs/latest/api/R/index.html --- It looks > like they

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-03 Thread Sean Owen
Hadoop 3 has not been supported in 2.4.x. 2.12 has been since 2.4.0, and 2.12 artifacts have always been released where available. What are you referring to? On Fri, May 3, 2019 at 9:28 AM antonkulaga wrote: > > Can you provide a release version for Hadoop 3 and Scala 2.12 this time? >

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-01 Thread Sean Owen
+1 from me. There is little change from 2.4.2 anyway, except for the important change to the build script that should build pyspark with Scala 2.11 jars. I verified that the package contains the _2.11 Spark jars, but have a look! I'm still getting this weird error from the Kafka module when

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-29 Thread Sean Owen
I think this is a reasonable idea; I know @vanzin had suggested it was simpler to use the latest in case a bug was found in the release script and then it could just be fixed in master rather than back-port and re-roll the RC. That said I think we did / had to already drop the ability to build <=

Re: Spark build can't find javac

2019-04-29 Thread Sean Owen
Your JAVA_HOME is pointing to a JRE rather than JDK installation. Or you've actually installed the JRE. Only the JDK has javac, etc. On Mon, Apr 29, 2019 at 4:36 PM Shmuel Blitz wrote: > Hi, > > Trying to build Spark on Manjaro with OpenJDK version 1.8.0_212, and I'm > getting the following
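The JRE-vs-JDK diagnosis above is easy to check programmatically: a JDK ships bin/javac, a bare JRE does not. This helper is illustrative, not part of Spark's build, and the simulated directory layouts are assumptions for the demo.

```python
import os
import tempfile

def looks_like_jdk(java_home):
    """Heuristic: a JDK installation ships bin/javac; a bare JRE does not.
    (Illustrative helper for diagnosing a JAVA_HOME misconfiguration.)"""
    for name in ("javac", "javac.exe"):
        if os.path.isfile(os.path.join(java_home, "bin", name)):
            return True
    return False

# Simulate a JDK layout and a JRE layout in temp directories.
jdk_home = tempfile.mkdtemp()
os.makedirs(os.path.join(jdk_home, "bin"))
open(os.path.join(jdk_home, "bin", "javac"), "w").close()

jre_home = tempfile.mkdtemp()
os.makedirs(os.path.join(jre_home, "bin"))  # bin exists, but no javac

assert looks_like_jdk(jdk_home)
assert not looks_like_jdk(jre_home)
```

In practice the one-liner check is simply whether `$JAVA_HOME/bin/javac` exists and runs.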

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
brew formula depends on the apache-spark Homebrew > formula. > > Using Scala 2.12 in the binary distribution for Spark 2.4.2 was > unintentional and never voted on. There was a successful vote to default > to Scala 2.12 in Spark version 3.0. > >michael > > > On

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
To be clear, what's the nature of the problem there... just Pyspark apps that are using a Scala-based library? Trying to make sure we understand what is and isn't a problem here. On Fri, Apr 26, 2019 at 9:44 AM Michael Heuer wrote: > This will also cause problems in Conda builds that depend on

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-26 Thread Sean Owen
Re: .NET, what's the particular issue in there that it's causing? 2.4.2 still builds for 2.11. I'd imagine you'd be pulling dependencies from Maven central (?) or if needed can build for 2.11 from source. I'm more concerned about pyspark because it builds in 2.12 jars. On Fri, Apr 26, 2019 at

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Sean Owen
One minor comment: for 2.4.1 we had a couple JIRAs marked 'release-notes': https://issues.apache.org/jira/browse/SPARK-27198?jql=project%20%3D%20SPARK%20and%20fixVersion%20%20in%20(2.4.1%2C%202.4.2)%20and%20labels%20%3D%20%27release-notes%27 They should be mentioned in

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-20 Thread Sean Owen
+1 from me too. It seems like there is support for merging the Jackson change into 2.4.x (and, I think, a few more minor dependency updates) but this doesn't have to go into 2.4.2. That said, if there is another RC for any reason, I think we could include it. Otherwise can wait for 2.4.3. On

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
/SPARK-27469 https://issues.apache.org/jira/browse/SPARK-27470 On Fri, Apr 19, 2019 at 11:13 AM Sean Owen wrote: > > All: here is the backport of changes to update to 2.9.8 from master back to > 2.4. > https://github.com/apache/spark/pull/24418 > > master has been on 2.9

Re: Spark 2.4.2

2019-04-19 Thread Sean Owen
made earlier today in a PR... >>> >>> (Context- in many cases Spark has light or indirect dependencies but >>> bringing them into the process breaks users code easily) >>> >>> >>> >>> From: Michael Heuer

Re: Spark 2.4.2

2019-04-18 Thread Sean Owen
there might have an opinion. I'm not pushing for it necessarily. On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin wrote: > > For Jackson - are you worrying about JSON parsing for users or internal Spark > functionality breaking? > > On Wed, Apr 17, 2019 at 6:02 PM Sean Owen wrote: >

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
SPARK-25250. Is there any other > ongoing bug fixes we want to include in 2.4.2? If no I'd like to start the > release process today (CST). > > Thanks, > Wenchen > > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen wrote: >> >> I think the 'only backport bug fixes to bran

Re: Spark 2.4.2

2019-04-17 Thread Sean Owen
I think the 'only backport bug fixes to branches' principle remains sound. But what's a bug fix? Something that changes behavior to match what is explicitly supposed to happen, or implicitly supposed to happen -- implied by what other similar things do, by reasonable user expectations, or simply

Re: JDK vs JRE in Docker Images

2019-04-17 Thread Sean Owen
I confess I don't know, but I don't think scalac or janino need javac and related tools, and those are the only things that come to mind. If the tests pass without a JDK, that's good evidence. On Wed, Apr 17, 2019 at 8:49 AM Rob Vesse wrote: > > Folks > > > > For those using the Kubernetes

Re: pyspark.sql.functions ide friendly

2019-04-17 Thread Sean Owen
I use IntelliJ and have never seen an issue parsing the pyspark functions... you're just saying the linter has an optional inspection to flag it? just disable that? I don't think we want to complicate the Spark code just for this. They are declared at runtime for a reason. On Wed, Apr 17, 2019 at
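The "declared at runtime" mechanism can be sketched as follows. pyspark.sql.functions has historically generated many of its thin wrappers in a loop at import time; the names and pattern here are illustrative, not the exact pyspark source. Static linters cannot see names created this way, which is exactly the inspection warning being discussed.

```python
# Generate functions at import time, as pyspark.sql.functions does for many
# of its wrappers (illustrative pattern; the real wrappers delegate to the
# JVM via py4j).
def _make_wrapper(name):
    def wrapper(value):
        return (name, value)  # stand-in body for the demo
    wrapper.__name__ = name
    return wrapper

for _name in ("upper_", "lower_", "sqrt_"):
    globals()[_name] = _make_wrapper(_name)

# The generated functions work at runtime even though no `def upper_` ever
# appears in the source -- which is what trips up static analysis.
assert upper_("x") == ("upper_", "x")
assert sqrt_(4) == ("sqrt_", 4)
```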

Is there value in publishing nightly snapshots?

2019-04-16 Thread Sean Owen
I noticed recently ... https://github.com/apache/spark-website/pull/194/files#diff-d95d573366135f01d4fbae2d64522500R466 ... that we stopped publishing nightly releases a long while ago. That's fine. What about turning off the job that builds -SNAPSHOTs of the artifacts each night? does anyone

Re: Antlr plugin for sql/catalyst project

2019-04-14 Thread Sean Owen
aven 3.3.9 while Spark's >> master branch requires maven 3.6.0. To be honest, failing an action silently >> should be an IntelliJ bug. But a note to guide spark developers to set the >> maven version for IntelliJ should be helpful. I just created a JIRA >> (https://is

Re: Antlr plugin for sql/catalyst project

2019-04-14 Thread Sean Owen
For IntelliJ, in the Maven pane, there's a button to generate all sources and resources that the build creates. That's the easier option. You can open a PR to add a note about it along with other docs for IntelliJ users. On Sun, Apr 14, 2019 at 4:24 AM William Wong wrote: > > Dear all, > > I

Re: Which parts of a parquet read happen on the driver vs the executor?

2019-04-11 Thread Sean Owen
Spark is a distributed compute framework of course, so things you do with Spark operations like map, filter, groupBy, etc do not happen on the driver. The function is serialized to the executors. The error here just indicates you are making some function that references things that can't be
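The same failure mode can be reproduced with plain pickle as an analogy (this is not Spark's serializer, which uses its own closure-cleaning machinery): an object that holds an unserializable resource such as a lock, socket, or open connection cannot be shipped from a driver to a worker.

```python
import pickle
import threading

# Spark ships the functions you pass to map/filter from the driver to the
# executors by serializing them. A closure capturing an unserializable
# resource fails in the same way plain pickle does here.
lock = threading.Lock()

serializable_ok = True
try:
    pickle.dumps(lock)  # raises TypeError: cannot pickle a lock object
except TypeError:
    serializable_ok = False

assert not serializable_ok
# Ordinary data serializes fine, so it can be shipped to executors.
assert pickle.loads(pickle.dumps([1, 2, 3])) == [1, 2, 3]
# The usual fix mirrors the advice for Spark closures: create the resource
# inside the function, on the worker side, instead of capturing it.
```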

Raise Jenkins test timeout? with alternatives

2019-04-11 Thread Sean Owen
I have a big PR that keeps failing because it hits the 300-minute build timeout: https://github.com/apache/spark/pull/24314 https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4703/console It's because it touches so much code that all tests run including things like Kinesis. It

Re: [SPARK-25079] moving from python 3.4 to python 3.6.8, impacts all active branches

2019-04-10 Thread Sean Owen
ckporting. > > the bigger question i have is how far back, branch-wise, are we willing to > support w/regards to tests? > > On Wed, Apr 10, 2019 at 12:16 PM Sean Owen wrote: >> >> In theory Spark 2.4 supports Python 3.4; would this mean it's now just >> tested vs 3.6? that's

Re: [SPARK-25079] moving from python 3.4 to python 3.6.8, impacts all active branches

2019-04-10 Thread Sean Owen
In theory Spark 2.4 supports Python 3.4; would this mean it's now just tested vs 3.6? that's not out of the question, but can the older branches continue to test on older versions or is that super complex? On Wed, Apr 10, 2019 at 1:37 PM shane knapp wrote: > > details here (see most recent

Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Sean Owen
Hadoop 3 isn't supported yet, not quite even in master. I think the profile there exists for testing at the moment. Others may know a way that it can work, but I don't think it would out of the box. On Fri, Apr 5, 2019 at 12:53 PM akirillov wrote: > > Hi there! I'm trying to run Spark unit tests

Re: Spark 2.4.0 tests fail with hadoop-3.1 profile: NoClassDefFoundError org.apache.hadoop.hive.conf.HiveConf

2019-04-05 Thread Sean Owen
Yes, you can try it, though I doubt that will 100% work. Have a look at the "hadoop 3" JIRAs and PRs still in progress on master. On Fri, Apr 5, 2019 at 1:14 PM Anton Kirillov wrote: > > Marcelo, Sean, thanks for the clarification. So in order to support Hadoop 3+ > the preferred way would be

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Sean Owen
On Tue, Apr 2, 2019 at 12:23 PM Vinoo Ganesh wrote: > @Sean – To the point that Ryan made, it feels wrong that stopping a session > force stops the global context. Building in the logic to only stop the > context when the last session is stopped also feels like a solution, but the > best way I

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Sean Owen
g a context would, well, stop the context. > > If stopping the session is expected to stop the context, what's the intended > usage of clearing the active / default session? > > Vinoo > > On 4/2/19, 10:57, "Sean Owen" wrote: > > What are you expecting ther

Re: Closing a SparkSession stops the SparkContext

2019-04-02 Thread Sean Owen
What are you expecting there ... that sounds correct? something else needs to be closed? On Tue, Apr 2, 2019 at 9:45 AM Vinoo Ganesh wrote: > > Hi All - > >I’ve been digging into the code and looking into what appears to be a > memory leak (https://jira.apache.org/jira/browse/SPARK-27337)

Re: Do you use single-quote syntax for the DataFrame API?

2019-03-31 Thread Sean Owen
FWIW I use "foo" in Pyspark or col("foo") where necessary, and $"foo" in Scala On Sun, Mar 31, 2019 at 1:58 AM Reynold Xin wrote: > As part of evolving the Scala language, the Scala team is considering > removing single-quote syntax for representing symbols. Single-quote syntax > is one of the

Re: [Spark SQL]: looking for place operators apply on the dataset / dataframe

2019-03-28 Thread Sean Owen
I'd suggest loading the source in an IDE if you want to explore the code base. It will let you answer this in one click. Here it's Dataset, as a DataFrame is a Dataset[Row]. On Thu, Mar 28, 2019 at 9:21 AM ehsan shams wrote: > > Hi > > I would like to know where exactly(which class/function)

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-27 Thread Sean Owen
+1 from me - same as last time. On Wed, Mar 27, 2019 at 1:31 PM DB Tsai wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.1. > > The vote is open until March 30 PST and passes if a majority +1 PMC votes are > cast, with > a minimum of 3 +1 votes. > > [ ]

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Sean Owen
I don't know a lot about Arrow here, but seems reasonable. Is this for Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3 seems right. On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon wrote: > > Hi all, > > We really need to upgrade the minimal version soon. It's actually slowing >

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Sean Owen
ease? Thanks! > > Sincerely, > > DB Tsai > -- > Web: https://www.dbtsai.com > PGP Key ID: 42E5B25A8F7A82C1 > > On Sun, Mar 24, 2019 at 8:19 PM Sean Owen wrote: > > > > Still waiting on a successful test - hope

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Sean Owen
This last test failed again, but, I claim we've actually seen it pass: https://github.com/apache/spark/pull/24126#issuecomment-476410462 Would anybody else endorse merging it into 2.4 to proceed? I'll kick off one more test for good measure. On Mon, Mar 25, 2019 at 4:33 PM Sean Owen wrote

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Sean Owen
-- > Web: https://www.dbtsai.com > PGP Key ID: 42E5B25A8F7A82C1 > > On Sun, Mar 24, 2019 at 8:19 PM Sean Owen wrote: > > > > Still waiting on a successful test - hope this one works. > > > > On Sun, Mar 24, 2019, 10:13 PM DB Tsai wrote: > >

Scala 2.11 support removed for Spark 3.0.0

2019-03-25 Thread Sean Owen
I merged https://github.com/apache/spark/pull/23098 . "-Pscala-2.11" won't work anymore in master. I think this shouldn't be a surprise or disruptive as 2.12 is already the default. The change isn't big and I think pretty reliable, but keep an eye out for issues. Shane you are welcome to remove

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-24 Thread Sean Owen
erely, > > DB Tsai > -- > Web: https://www.dbtsai.com > PGP Key ID: 42E5B25A8F7A82C1 > > On Sat, Mar 23, 2019 at 12:04 PM Sean Owen wrote: > > > > I think we can/should get in SPARK-26961 too; it's all but ready to > commit. > > > &

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-23 Thread Sean Owen
7178] > > > https://github.com/apache/spark/commit/342e91fdfa4e6ce5cc3a0da085d1fe723184021b > > > > > > Is problematic too and it’s not in the rc8 cut > > > > > > https://github.com/apache/spark/commits/branch-2.4 > > > > > > (Personally I don’t want t

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Sean Owen
+1 for this RC. The tag is correct, licenses and sigs check out, tests of the source with most profiles enabled works for me. On Tue, Mar 19, 2019 at 5:28 PM DB Tsai wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.1. > > The vote is open until March 23

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Sean Owen
ink while checking: > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1 > I don't know what the right url looks like, but we should fix it. > > On Wed, Mar 20, 2019 at 9:14 AM Stavros Kontopoulos < > stavros.kontopou...@lightbend.com> wrote: > >> +1 (non-bindin
