Re: [DISCUSSION] Avoiding duplicate work

2020-02-21 Thread Sean Owen
We've avoided using Assignee because it implies that someone 'owns' resolving the issue, when we want to keep it collaborative, and many times in the past someone would ask to be assigned and then not follow through. You can comment on the JIRA to say "I'm working on this" but that has the

Re: SparkGraph review process

2020-02-14 Thread Sean Owen
This will not be Spark 3.0, no. On Fri, Feb 14, 2020 at 1:12 AM kant kodali wrote: > > any update on this? Is spark graph going to make it into Spark or no? > > On Mon, Oct 14, 2019 at 12:26 PM Holden Karau wrote: >> >> Maybe let’s ask the folks from Lightbend who helped with the previous scala

Re: Apache Spark Docker image repository

2020-02-11 Thread Sean Owen
To be clear this is a convenience 'binary' for end users, not just an internal packaging to aid the testing framework? There's nothing wrong with providing an additional official packaging if we vote on it and it follows all the rules. There is an open question about how much value it adds vs

Re: Incorrect param in Doc ref url:https://spark.apache.org/docs/latest/ml-datasource

2020-02-08 Thread Sean Owen
To be clear, you're referring to the Python version of the example. Yes, it should be True. Can you open a pull request to fix it? The docs are under docs/ in the apache/spark GitHub repo. That's how we normally take fixes. On Sat, Feb 8, 2020 at 9:14 AM Tanay Banerjee wrote: > > >> Hi Team, >>

Re: Apache Spark Docker image repository

2020-02-05 Thread Sean Owen
What would the images have - just the image for a worker? We wouldn't want to publish N permutations of Python, R, OS, Java, etc. But if we don't then we make one or a few choices of that combo, and then I wonder how many people find the image useful. If the goal is just to support Spark testing,

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-04 Thread Sean Owen
+1 from me too. Same outcome as in RC1 for me. On Sun, Feb 2, 2020 at 9:31 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.5. > > The vote is open until February 5th 11PM PST and passes if a majority +1 PMC > votes are cast, with a

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Sean Owen
alein" test > failures again > > Then, after the regular RC preparation testing including the manual > integration tests, > I can roll 2.4.5 RC2 next Monday (Feb. 3rd, PST) and all late blocker patches > will block 2.4.6 instead of causing RC failure. > > Bests, > Dong

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Sean Owen
> > Bests, > Dongjoon. > > > On Wed, Jan 29, 2020 at 9:56 AM Sean Owen wrote: > >> OK what if anything is in question for 2.4.5? I don't see anything open >> and targeted for it. >> Are we talking about https://issues.apache.org/jira/browse/SPARK-28344 - >>

Re: Spark 2.4.5 RC2 Preparation Status

2020-01-29 Thread Sean Owen
OK, what, if anything, is in question for 2.4.5? I don't see anything open and targeted for it. Are we talking about https://issues.apache.org/jira/browse/SPARK-28344 - targeted for 2.4.5 but not backported, and a 'correctness' issue? Simply: who argues this must hold up 2.4.5, and if so what's the

Re: [DISCUSS][SPARK-30275] Discussion about whether to add a gitlab-ci.yml file

2020-01-23 Thread Sean Owen
Yeah the color on this is that 'snapshot' or 'nightly' builds are not quite _discouraged_ by the ASF, but need to be something only devs are likely to find and clearly signposted, because they aren't officially blessed releases. It gets into a gray area if the project is 'officially' hosting a way

Adding Maven Central mirror from Google to the build?

2020-01-21 Thread Sean Owen
See https://github.com/apache/spark/pull/27307 for some context. We've had to add, in at least one place, some settings to resolve artifacts from a mirror besides Maven Central to work around some build problems. Now, we find it might be simpler to just use this mirror as the primary repo in the

Re: Spark master build hangs using parallel build option in maven

2020-01-18 Thread Sean Owen
…-Phive-thriftserver >> -Phive-provided -Pyarn -Phadoop-provided -Dhadoop.version=2.8.5 >> -DskipTests=true -T 4 clean package >> >> Also I have seen the maven version is changed from 3.5.4 to 3.6.3 in master >> branch compared to spark 2.4.3. >> Not sure if it's

Re: Spark master build hangs using parallel build option in maven

2020-01-17 Thread Sean Owen
Indeed, I don't believe you can use a parallel build; some things collide with each other. Some of the suites are already run in parallel inside the build, though. On Fri, Jan 17, 2020 at 1:23 PM Saurabh Chawla wrote: > > Hi All, > > Spark master build hangs using parallel build option in maven.

Re: [FYI] SBT Build Failure

2020-01-16 Thread Sean Owen
Ah. The Maven build already long since points at https:// for resolution for security. I tried just overriding the resolver for the SBT build, but it doesn't seem to work. I don't understand the SBT build well enough to debug right now. I think it's possible to override resolvers with local config

Re: More publicly documenting the options under spark.sql.*

2020-01-14 Thread Sean Owen
Some of it is intentionally undocumented, as far as I know, as an experimental option that may change, or legacy, or safety valve flag. Certainly anything that's marked an internal conf. (That does raise the question of who it's for, if you have to read source to find it.) I don't know if we need

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-14 Thread Sean Owen
…maven` clean build? > > Bests, > Dongjoon. > > > On Tue, Jan 14, 2020 at 6:40 AM Sean Owen wrote: >> >> +1 from me. I checked sigs/licenses, and built/tested from source on >> Java 8 + Ubuntu 18.04 with " -Pyarn -Phive -Phive-thriftserver >> -Phadoop-2.7 -Pmesos

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-14 Thread Sean Owen
+1 from me. I checked sigs/licenses, and built/tested from source on Java 8 + Ubuntu 18.04 with " -Pyarn -Phive -Phive-thriftserver -Phadoop-2.7 -Pmesos -Pkubernetes -Psparkr -Pkinesis-asl". I do get test failures, but, these are some I have always seen on Ubuntu, and I do not know why they

Re: Build error: python/lib/pyspark.zip is not a ZIP archive

2020-01-10 Thread Sean Owen
Sounds like you might have some corrupted file locally. I don't see any of the automated test builders failing. Nuke your local assembly build and try again? On Fri, Jan 10, 2020 at 3:49 PM Jeff Evans wrote: > > Greetings, > > I'm getting an error when building, on latest master (2bd873181 as of

Re: Issues with Delta Lake on 3.0.0 preview + preview 2

2019-12-30 Thread Sean Owen
It looks like Delta calls org.apache.spark.util.Utils, which is technically a private class in Spark. The signature of Utils.classForName changed (in the bytecode) to take two more params. Either Delta would have to cross-compile for Spark 2 vs 3, or avoid calling Utils, or we can add a small
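
A minimal sketch of the "avoid calling Utils" option, assuming a hypothetical Delta-side class name: plain JVM reflection does the same job as the private helper and is stable across Spark 2 and 3.

```scala
// Hedged sketch: replaces the private org.apache.spark.util.Utils.classForName
// call with standard JVM reflection. The class name is hypothetical.
val cls: Class[_] = Class.forName(
  "io.delta.SomeInternalClass",                  // hypothetical example class
  true,                                          // run static initializers
  Thread.currentThread().getContextClassLoader)  // loader Spark apps typically use
```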

Re: Issue with map Java lambda function with 3.0.0 preview and preview 2

2019-12-28 Thread Sean Owen
Yes, it's necessary to cast the lambda in Java as (MapFunction) in many cases. This is because the Scala-specific and Java-specific versions of .map() both end up accepting a function object that the lambda can match, and an Encoder. What I'd have to go back and look up is why that would be
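
For reference, the two overloads in question (paraphrased from Dataset.scala), plus a sketch on toy data of selecting the Java-specific one explicitly:

```scala
import org.apache.spark.api.java.function.MapFunction
import org.apache.spark.sql.{Encoders, SparkSession}

// The two overloads a bare lambda could plausibly match:
//   def map[U : Encoder](func: T => U): Dataset[U]                       // Scala-specific
//   def map[U](func: MapFunction[T, U], encoder: Encoder[U]): Dataset[U] // Java-specific
val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val lengths = Seq("a", "bb").toDS().map(
  new MapFunction[String, Int] { def call(s: String): Int = s.length },
  Encoders.scalaInt)  // picking the Java-specific overload explicitly
```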

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2019-12-24 Thread Sean Owen
Yep, always happens. Is earlier realistic, like Jan 15? it's all arbitrary but indeed this has been in progress for a while, and there's a downside to not releasing it, to making the gap to 3.0 larger. On my end I don't know of anything that's holding up a release; is it basically DSv2? BTW these

Re: [VOTE] SPARK 3.0.0-preview2 (RC2)

2019-12-17 Thread Sean Owen
Same result as last time. It all looks good and tests pass for me on Ubuntu with all profiles enabled (Hadoop 3.2 + Hive 2.3), building from source. 'pyspark-3.0.0.dev2.tar.gz' appears to be the desired Python artifact name, yes. +1 On Tue, Dec 17, 2019 at 12:36 AM Yuming Wang wrote: > > Please

Re: Running Spark through a debugger

2019-12-16 Thread Sean Owen
I just make a new test suite or something, set breakpoints, and execute it in IJ. That generally works fine. You may need to set the run configuration to have the right working dir (Spark project root), and set the right system property to say 'this is running in a test' in some cases. What are
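
A minimal sketch of that workflow: a hypothetical throwaway ScalaTest suite, run from IntelliJ with breakpoints set.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

// Hypothetical scratch suite, not part of the Spark tree; run it from the
// IDE with a breakpoint on the explain() line to step into the planner.
class ScratchDebugSuite extends FunSuite {
  test("step through a simple query") {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    try {
      val df = spark.range(10).selectExpr("id * 2 AS doubled")
      df.explain(true)            // breakpoint here; inspect the plans
      assert(df.count() == 10)
    } finally spark.stop()
  }
}
```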

Re: Do we need to finally update Guava?

2019-12-16 Thread Sean Owen
…shaded.) > > On Sun, Dec 15, 2019 at 8:08 AM Sean Owen wrote: > > > > See for example: > > > > https://github.com/apache/spark/pull/25932#issuecomment-565822573 > > https://issues.apache.org/jira/browse/SPARK-23897 > > > > This is a dicey dependency that

Re: Do we need to finally update Guava?

2019-12-16 Thread Sean Owen
…Spark. But since Spark uses > some test artifacts from Hadoop, that may be a bit tricky, since I > don't believe those are shaded.) > > On Sun, Dec 15, 2019 at 8:08 AM Sean Owen wrote: > > > > See for example: > > > > https://github.com/apache/spark/pull/25932#issuecomment-565822573

Do we need to finally update Guava?

2019-12-15 Thread Sean Owen
See for example: https://github.com/apache/spark/pull/25932#issuecomment-565822573 https://issues.apache.org/jira/browse/SPARK-23897 This is a dicey dependency that we have been reluctant to update as a) Hadoop used an old version and b) Guava versions are incompatible after a few releases. But

Re: Spark 2.4.4 with which version of Hadoop?

2019-12-11 Thread Sean Owen
My moderately informed take is that the "Hadoop 2.7" build is really a "Hadoop 2.x" build and AFAIK should work with 2.8, 2.9, but, I certainly haven't tested it nor have the PR builders. Just use the "Hadoop provided" build on your env. Of course, you might well want to use Hadoop 3.x (3.2.x

Re: I would like to add JDBCDialect to support Vertica database

2019-12-11 Thread Sean Owen
It's probably OK, IMHO. The overhead of another dialect is small. Are there differences that require a new dialect? I assume so and might just be useful to summarize them if you open a PR. On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger wrote: > > Hi, I am a Vertica support engineer, and we have
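
To make the size of the change concrete, a rough sketch of what a new dialect involves; the Vertica-specific details shown here are illustrative, not the actual proposal.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Illustrative sketch of a minimal dialect; a real PR would override
// whatever actually differs (type mappings, quoting, etc.).
object VerticaDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:vertica")

  // One example of a per-database difference a dialect captures:
  override def quoteIdentifier(colName: String): String =
    s""""${colName.replace("\"", "\"\"")}""""
}

// A dialect can also be registered at runtime from user code:
JdbcDialects.registerDialect(VerticaDialect)
```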

Re: Release Apache Spark 2.4.5 and 2.4.6

2019-12-09 Thread Sean Owen
Sure, seems fine. The release cadence slows down in a branch over time as there is probably less to fix, so Jan-Feb 2020 for 2.4.5 and something like middle or Q3 2020 for 2.4.6 is a reasonable expectation. It might plausibly be the last 2.4.x release but who knows. On Mon, Dec 9, 2019 at 12:29

Re: Spark 3.0 preview release 2?

2019-12-09 Thread Sean Owen
Seems fine to me of course. Honestly that wouldn't be a bad result for a release candidate, though we would probably roll another one now. How about simply moving to a release candidate? If not now then at least move to code freeze from the start of 2020. There is also some downside in pushing out

Re: SQL test failures in PR builder?

2019-12-08 Thread Sean Owen
….sh mv run-tests-jenkins dev/ mv run-tests-codes.sh dev/ chmod 755 dev/run-tests-jenkins chmod 755 dev/run-tests-codes.sh fi ./dev/run-tests-jenkins On Wed, Dec 4, 2019 at 5:53 PM Shane Knapp wrote: > > ++yin huai for more insight in to the NewSparkPullRequestBuilder job... >

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Sean Owen
… just updating commit statuses. I only ask because > I remember permissions were an issue in the past when discussing tooling > like this. > > In any case, I'd be happy to submit a PR adding this in if there are no > concerns. We can hash out the details on the PR. > > On Fri, Dec 6,

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Sean Owen
I think we can add Actions, right? They're used for the newer tests on GitHub. I'm OK closing PRs inactive for a 'long time', where that's maybe 6-12 months or something. It's standard practice and doesn't mean it can't be reopened. Often the related JIRA should be closed as well but we have done

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Sean Owen
…(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark…

Re: Is it feasible to build and run Spark on Windows?

2019-12-05 Thread Sean Owen
What was the build error? You didn't say. Are you sure it succeeded? Try running from the Spark home dir, not bin. I know we do run Windows tests and it appears to pass tests, etc. On Thu, Dec 5, 2019 at 3:28 PM Ping Liu wrote: > > Hello, > > I understand Spark is preferably built on Linux. But

Re: SQL test failures in PR builder?

2019-12-04 Thread Sean Owen
… but the root cause might be different from this > one. > > BTW, to reduce the scope of investigation, could you try with `[hive-1.2]` > tag in your PR? > > Bests, > Dongjoon. > > > On Wed, Dec 4, 2019 at 6:29 AM Sean Owen wrote: >> >> I'm seeing consistent

SQL test failures in PR builder?

2019-12-04 Thread Sean Owen
I'm seeing consistent failures in the PR builder when touching SQL code: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4960/testReport/ org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite: "Spark's own GetSchemasOperation (SparkGetSchemasOperation)" (14 ms)

Status of Scala 2.13 support

2019-12-01 Thread Sean Owen
As you can see, I've been working on Scala 2.13 support. The umbrella is https://issues.apache.org/jira/browse/SPARK-25075 I wanted to lay out status and strategy. This will not be done for 3.0. At the least, there are a few key dependencies (Chill, Kafka) that aren't published for 2.13, and at

Re: Can't build unidoc

2019-11-29 Thread Sean Owen
I'm not seeing that error for either command. Try blowing away your local .ivy / .m2 dir? On Fri, Nov 29, 2019 at 11:48 AM Nicholas Chammas wrote: > > Howdy folks. Running `./build/sbt unidoc` on the latest master is giving me > this trace: > > ``` > [warn]

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Sean Owen
How big is the overhead, at scale? If it has a non-trivial effect for most jobs, I could imagine reusing the existing approximate quantile support to more efficiently find a pretty-close median. On Wed, Nov 27, 2019 at 3:55 AM Jungtaek Lim wrote: > > Hi Spark devs, > > The change might be
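
A sketch of reusing that support: approxQuantile at probability 0.5 with a small relative error yields a close-to-exact median far more cheaply than an exact computation. Data and column name are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

// Illustrative stand-in for per-task metric values
val df = Seq(12L, 40L, 15L, 22L, 9L).toDF("taskDurationMs")

// probabilities = Array(0.5), relativeError = 0.01: a pretty-close median
val Array(approxMedian) =
  df.stat.approxQuantile("taskDurationMs", Array(0.5), 0.01)
```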

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Sean Owen
Without knowing much about it, I have had the same question. How much is how important about this to justify the effort? One particular negative effect has been that new postgresql tests add well over an hour to tests, IIRC. So, tend to agree about drawing any reasonable line on compatibility and

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Sean Owen
I haven't been following this closely, but I'm aware that there are some tricky compatibility problems between Avro and Parquet, both of which are used in Spark. That's made it pretty hard to update in 2.x. master/3.0 is on Parquet 1.10.1 and Avro 1.8.2. Just a general question: is that the best

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-20 Thread Sean Owen
Yes, good point. A user would get whatever the POM says without profiles enabled so it matters. Playing it out, an app _should_ compile with the Spark dependency marked 'provided'. In that case the app that is spark-submit-ted is agnostic to the Hive dependency as the only one that matters is
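
A build.sbt sketch of that arrangement (artifact and version illustrative): the app compiles against Spark but does not bundle it, so at runtime it is agnostic to whatever Hive wiring the cluster's Spark distribution was built with.

```scala
// build.sbt: Spark marked 'provided' so spark-submit supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
```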

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Sean Owen
Should Hadoop 2 + Hive 2 be considered to work on JDK 11? I wasn't sure if 2.7 did, but honestly I've lost track. Anyway, it doesn't matter much as the JDK doesn't cause another build permutation. All are built targeting Java 8. I also don't know if we have to declare a binary release a default.

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Sean Owen
Same idea? support this combo in 3.0 and then remove Hadoop 2 support in 3.1 or something? or at least make them non-default, not necessarily publish special builds? On Tue, Nov 19, 2019 at 4:53 PM Dongjoon Hyun wrote: > For additional `hadoop-2.7 with Hive 2.3` pre-built distribution, how do

Re: Migration `Spark QA Compile` Jenkins jobs to GitHub Action

2019-11-19 Thread Sean Owen
I would favor moving whatever we can to GitHub. It's difficult to modify the Jenkins instances without Shane's valiant help, and over time it makes more sense to modernize and integrate it into the project. On Tue, Nov 19, 2019 at 3:35 PM Dongjoon Hyun wrote: > > Hi, All. > > Apache Spark community

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Sean Owen
Just to clarify, as even I have lost the details over time: hadoop-2.7 works with hive-2.3? it isn't tied to hadoop-3.2? Roughly how much risk is there in using the Hive 1.x fork over Hive 2.x, for end users using Hive via Spark? I don't have a strong opinion, other than sharing the view that we

Re: Ask for ARM CI for spark

2019-11-17 Thread Sean Owen
…as suggested. Sean On Sun, Nov 17, 2019 at 8:01 PM Tianhua huang wrote: > > @Sean Owen, > I'm afraid I don't agree with you this time, I still remember no one can tell > me whether Spark supports ARM or how much Spark can support ARM when I asked > this first time on Dev@, you're very kind

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-16 Thread Sean Owen
I'd prefer simply not making Hadoop 3 the default until 3.1+, rather than introduce yet another build combination. Does Hadoop 2 + Hive 2 work and is there demand for it? On Sat, Nov 16, 2019 at 3:52 AM Wenchen Fan wrote: > > Do we have a limitation on the number of pre-built distributions?

Re: Ask for ARM CI for spark

2019-11-15 Thread Sean Owen
… 2019 at 9:01 PM bo zhaobo wrote: > > Hi @Sean Owen , > > Thanks for reply. We know that Spark community has own release date and plan. > We are happy to follow Spark community. But we think it's great if community > could add a sentence into the next releasenotes and claim "S

Re: Ask for ARM CI for spark

2019-11-14 Thread Sean Owen
I don't quite understand. You are saying tests don't pass yet, so why would anyone yet run these tests regularly? If it's because the instances aren't fast enough, use bigger instances? I don't think anyone would create a separate release of Spark for ARM, no. But why would that be necessary? On

Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Sean Owen
Let's suggest "SPARK-12345:" but not go back and change a bunch of test cases. I'd add this only when a test specifically targets a certain issue. It's a nice-to-have, not super essential, just because in the rare case you need to understand why a test asserts something, you can go back and find
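
The suggested convention in ScalaTest form; the suite name, JIRA ID, and description below are hypothetical.

```scala
import org.scalatest.FunSuite

class ExampleSuite extends FunSuite {
  // Prefix only tests that target a specific issue, so a future reader can
  // find the JIRA explaining why this assertion exists.
  test("SPARK-12345: sort-merge join handles empty partitions") {
    assert(Seq.empty[Int].sorted.isEmpty)  // placeholder regression check
  }
}
```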

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Sean Owen
… Your index contains uncommitted changes. > error: please commit or stash them. > > On Fri, Nov 8, 2019 at 10:17 AM Sean Owen wrote: > > > > Hm, the last change was on Oct 1, and should have actually helped it > > still work with Python 2: > > https://github.com

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Sean Owen
Hm, the last change was on Oct 1, and should have actually helped it still work with Python 2: https://github.com/apache/spark/commit/2ec3265ae76fc1e136e44c240c476ce572b679df#diff-c321b6c82ebb21d8fd225abea9b7b74c Hasn't otherwise changed in a while. What's the error? On Fri, Nov 8, 2019 at 11:37

Re: [DISCUSS] Expensive deterministic UDFs

2019-11-07 Thread Sean Owen
Interesting, what does non-deterministic do except have this effect? aside from the naming, it could be a fine use of this flag if that's all it effectively does. I'm not sure I'd introduce another flag with the same semantics just over naming. If anything 'expensive' also isn't the right word,
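
For reference, the existing flag under discussion, applied to a hypothetical expensive UDF:

```scala
import org.apache.spark.sql.functions.udf

// asNondeterministic() is the current way to mark a UDF; the optimizer then
// won't freely duplicate or reorder its invocations. The UDF body is a toy
// stand-in for something expensive.
val expensiveLookup = udf((key: String) => key.reverse).asNondeterministic()
```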

Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

2019-11-06 Thread Sean Owen
You asked for an inner join but it turned into a cross-join. This might be surprising, hence the error you can disable. The query is not invalid in any case. It's just stopping you from doing something you may not have meant to, and which may be expensive. However I think we've already changed the
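
A sketch of the two escape hatches, on toy data: state the cross join explicitly, or lift the safety check.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

val df1 = Seq(1, 2).toDF("a")
val df2 = Seq(3, 4).toDF("b")

val joined = df1.crossJoin(df2)  // explicit intent, no check triggered
// or disable the check wholesale:
// spark.conf.set("spark.sql.crossJoin.enabled", "true")
```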

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-10-31 Thread Sean Owen
This isn't a big thing, but I see that the pyspark build includes Hadoop 2.7 rather than 3.2. Maybe later we change the build to put in 3.2 by default. Otherwise, the tests all seem to pass with JDK 8 / 11 with all profiles enabled, so I'm +1 on it. On Thu, Oct 31, 2019 at 1:00 AM Xingbo Jiang

Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Sean Owen
… you stand a chance of Java 6-era code still running on Java 14. On Thu, Oct 31, 2019 at 4:14 PM Cody Koeninger wrote: > > On Wed, Oct 30, 2019 at 5:57 PM Sean Owen wrote: > > > Or, frankly, maybe Scala should reconsider the mutual incompatibility > > between minor releases

Fwd: [apache/spark] [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ (#26332)

2019-10-30 Thread Sean Owen
To: apache/spark Cc: Sean Owen. *Test build #112974 has started <https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112974/testReport>* for PR 26332 at commit aefde48 <https://github.com/apache/spark/commit/aefde48a30942f94670f634cbeab98e23749c283>.

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-30 Thread Sean Owen
I agree that we need a Pyspark release for this preview release. If it's a matter of producing it from the same tag, we can evaluate it within this same release candidate. Otherwise, just roll another release candidate. I was able to build it and pass all tests with JDK 8 and JDK 11 (hadoop-3.2

Re: Packages to release in 3.0.0-preview

2019-10-30 Thread Sean Owen
I don't agree with this take. The bottleneck is pretty much not Spark -- it is all of its dependencies, and there are unfortunately a lot. For example, Chill (among other things) doesn't support 2.13 yet. I don't think 2.13 is that 'mainstream' yet. We are not close to Scala 2.13 support, so it

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Sean Owen
I'm OK with that, but don't have a strong opinion nor info about the implications. That said my guess is we're close to the point where we don't need to support Hadoop 2.x anyway, so, yeah. On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun wrote: > > Hi, All. > > There was a discussion on publishing

Re: Spark 3.0 and S3A

2019-10-28 Thread Sean Owen
There will be a "Hadoop 3.x" version of 3.0, as it's essential to get a JDK 11-compatible build. You can see the hadoop-3.2 profile. hadoop-aws is pulled in by the hadoop-cloud module, I believe, so it bears checking whether the profile updates the versions there too. On Mon, Oct 28, 2019 at 10:34 AM

Re: Packages to release in 3.0.0-preview

2019-10-27 Thread Sean Owen
> I don't think JDK 11 is a separate release (by design). We build >> > everything targeting JDK 8 and it should work on JDK 11 too. >> +1. a single package working on both jvms looks nice. >> >> >> On Sat, Oct 26, 2019 at 4:18 AM Sean Owen wrote: >>> >>> I don't

Re: Packages to release in 3.0.0-preview

2019-10-25 Thread Sean Owen
I don't think JDK 11 is a separate release (by design). We build everything targeting JDK 8 and it should work on JDK 11 too. So, just two releases, but, frankly I think we soon need to stop multiple releases for multiple Hadoop versions, and stick to Hadoop 3. I think it's fine to try to release

Re: Minimum JDK8 version

2019-10-24 Thread Sean Owen
…latest/#downloading >> >> btw, any other project announcing the minimum support jdk version? >> It seems that hadoop does not. >> >> On Fri, Oct 25, 2019 at 6:51 AM Sean Owen wrote: >>> >>> Probably, but what is the difference that makes it different

Re: Minimum JDK8 version

2019-10-24 Thread Sean Owen
Probably, but what is the difference that makes it different to support u81 vs later? On Thu, Oct 24, 2019 at 4:39 PM Dongjoon Hyun wrote: > > Hi, All. > > Apache Spark 3.x will support both JDK8 and JDK11. > > I'm wondering if we can have a minimum JDK8 version in Apache Spark 3.0. > >

Re: Unable to resolve dependency of sbt-mima-plugin since yesterday

2019-10-22 Thread Sean Owen
Weird. Let's discuss at https://issues.apache.org/jira/browse/SPARK-29560 On Tue, Oct 22, 2019 at 2:06 PM Xingbo Jiang wrote: > > Hi, > > Do you have any idea why the `./dev/lint-scala` check are failure with the > following message since yesterday ? > >> WARNING: An illegal reflective access

Fwd: [PMCs] Any project news or announcements this week?

2019-10-20 Thread Sean Owen
I wonder if we are likely to have a Spark 3.0 preview release this week? no rush, but if we do, let's CC Sally to maybe mention at ApacheCon. -- Forwarded message - From: Sally Khudairi Date: Sun, Oct 20, 2019 at 4:00 PM Subject: [PMCs] Any project news or announcements this

Re: Meetup Link for Bangalore is broken

2019-10-20 Thread Sean Owen
Sure, can you open a pull request to change https://github.com/apache/spark-website/blob/asf-site/community.md ? See https://github.com/apache/spark-website/blob/asf-site/README.md for how to generate the HTML after the edit. On Sun, Oct 20, 2019 at 1:03 AM Sanjay wrote: > > Greetings from

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-17 Thread Sean Owen
…out the risk that we > keep on importing new commits and need to resolve more critical bugs, thus the > release would never converge. > > Cheers, > > Xingbo > > Sean Owen wrote on Wed, Oct 16, 2019 at 6:34 PM: >> >> We do not have to do anything to branch-3.0-preview; it's just

Re: Apache Spark 3.0 timeline

2019-10-16 Thread Sean Owen
I think the branch question is orthogonal but yeah we can probably make an updated statement about 3.0 release. Clearly a preview is imminent. I figure we are probably moving to code freeze late in the year, release early next year? Any better ideas about estimates to publish? They aren't binding.

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

2019-10-16 Thread Sean Owen
…of maintainers of that module, I personally > would like to volunteer to maintain Spark Graph. I'm also a contributor to > the Okapi stack and am able to work on whatever issue might come up on that > end including updating dependencies etc. FWIW, Okapi is actively maintained > by a

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-16 Thread Sean Owen
…Reynold's > suggestion) > > We can do vote and stabilize `3.0-alpha` in master branch. > > Bests, > Dongjoon. > > > On Wed, Oct 16, 2019 at 3:04 AM Sean Owen wrote: >> >> I don't think we would want to cut 'branch-3.0' right now, which would >> imply that master

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-16 Thread Sean Owen
I don't think we would want to cut 'branch-3.0' right now, which would imply that master is 3.1. We don't want to merge every new change into two branches. It may still be useful to have `branch-3.0-preview` as a short-lived branch just used to manage the preview release, as we will need to let

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

2019-10-15 Thread Sean Owen
I do not have a very informed opinion here, so take this with a grain of salt. I'd say that we need to either commit a coherent version of this for Spark 3, or not at all. If it doesn't have support, I'd back out the existing changes. I was initially skeptical about how much this needs to be in

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Sean Owen
See the JIRA - this is too open-ended and not obviously just due to choices in data representation, what you're trying to do, etc. It's correctly closed IMHO. However, identifying the issue more narrowly, and something that looks ripe for optimization, would be useful. On Thu, Oct 10, 2019 at

Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Sean Owen
I'd have the same question on the PR - why does this need to be in the Apache Spark project vs where it is now? Yes, it's not a Spark package per se, but it seems like this is a tool for K8S to use Spark rather than a core Spark tool. Yes of course all the packages, licenses, etc have to be

Re: Auto-closing PRs when there are no feedback or response from its author

2019-10-08 Thread Sean Owen
I'm generally all for closing pretty old PRs. They can be reopened easily. Closing a PR (a particular proposal for how to resolve an issue) is less drastic than closing a JIRA (a description of an issue). Closing them just delivers the reality, that nobody is going to otherwise revisit it, and can

Re: [build system] maven master branch builds timing out en masse...

2019-10-07 Thread Sean Owen
Moving the conversation here -- yes, why on earth are they taking this long all of a sudden? We'll have to look again when they come back online. The last successful build took 6 hours, of which 4:45 were the unit tests themselves. It's mostly SQL tests; SQLQuerySuite is approaching an hour.

Committers: if you can't log into JIRA...

2019-09-26 Thread Sean Owen
I hit a few snags with the JIRA LDAP update. In case this saves anyone time: - You have to use your ASF LDAP password now - If your JIRA and ASF IDs aren't the same, file an INFRA JIRA - If it still won't let you log in after answering captchas, try logging in once in Chrome incognito mode -

Re: Urgent : Changes required in the archive

2019-09-26 Thread Sean Owen
The message in question has already been public, and copied to mirrors the ASF does not control, for a year and a half. There is a process for requesting modification to ASF archives, but this case does not qualify: https://www.apache.org/foundation/public-archives.html On Thu, Sep 26, 2019 at

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Sean Owen
I don't know enough about DSv2 to comment on this part, but, any theoretical 2.5 is still a ways off. Does waiting for 3.0 to 'stabilize' it as much as is possible help? I say that because re: Java 11, the main breaking change is probably the Hive 2 / Hadoop 3 dependency, JPMML (minor), as well

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Sean Owen
Narrowly on Java 11: the problem is that it'll take some breaking changes, more than would be usually appropriate in a minor release, I think. I'm still not convinced there is a burning need to use Java 11 but stay on 2.4, after 3.0 is out, and at least the wheels are in motion there. Java 8 is

Re: Spark 3.0 preview release on-going features discussion

2019-09-20 Thread Sean Owen
Is this a list of items that might be focused on for the final 3.0 release? At least, Scala 2.13 support shouldn't be on that list. The others look plausible, or are already done, but there are probably more. As for the 3.0 preview, I wouldn't necessarily block on any particular feature, though,

Re: [build system] weird mvn errors post-cache cleaning

2019-09-17 Thread Sean Owen
That's super weird; can you just delete ~/.m2 and let it download the internet again? or at least blow away the downloaded Kafka dir? Turning it on and off, so to speak, often works. On Tue, Sep 17, 2019 at 2:41 PM Shane Knapp wrote: > > a bunch of the PRB builds are now failing w/various

Re: Thoughts on Spark 3 release, or a preview release

2019-09-14 Thread Sean Owen
I don't think this suggests anything is finalized, including APIs. I would not guess there will be major changes from here though. On Fri, Sep 13, 2019 at 4:27 PM Andrew Melo wrote: > > Hi Spark Aficionados- > > On Fri, Sep 13, 2019 at 15:08 Ryan Blue wrote: >> >> +1 for a preview release. >>

Re: Thoughts on Spark 3 release, or a preview release

2019-09-13 Thread Sean Owen
…>> For JDK11 clean-up, it will meet the timeline and `3.0.0-preview` helps >>>>>> it a lot. >>>>>> >>>>>> After this discussion, can we have some timeline for `Spark 3.0 Release >>>>>> Window` in our versioning-policy

Re: Ask for ARM CI for spark

2019-09-12 Thread Sean Owen
…2.6-ubuntu-testing/lastBuild/consoleFull >>> >>> Best regards >>> >>> ZhaoBo

Thoughts on Spark 3 release, or a preview release

2019-09-11 Thread Sean Owen
I'm curious what current feelings are about ramping down towards a Spark 3 release. It feels close to ready. There is no fixed date, though in the past we had informally tossed around "back end of 2019". For reference, Spark 1 was May 2014, Spark 2 was July 2016. I'd expect Spark 2 to last longer,

Re: Resolving all JIRAs affecting EOL releases

2019-09-08 Thread Sean Owen
I think simply closing old issues with no activity in a long time is OK. The "Affected Version" is somewhat noisy, so not even particularly important to also query, but yeah I see some value in trying to limit the scope this way. On Sat, Sep 7, 2019 at 10:15 PM Hyukjin Kwon wrote: > > HI all, >

Re: DataFrameReader bottleneck in DataSource#checkAndGlobPathIfNecessary when reading S3 files

2019-09-06 Thread Sean Owen
I think the problem is calling globStatus to expand all 300K files. This is a general problem for object stores and huge numbers of files. Steve L. may have better thoughts on real solutions. But you might consider, if possible, running a lot of .csv jobs in parallel to query subsets of all the
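
A sketch of that idea, assuming a hypothetical prefix layout in the bucket: split the object keys by prefix and launch a job per subset from concurrent driver threads, so no single driver-side call has to glob all 300K files.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical layout: objects partitioned under part=00 .. part=15
val prefixes = (0 until 16).map(i => f"s3a://my-bucket/data/part=$i%02d/*.csv")

// Spark supports concurrent jobs from one driver; .par fans the reads out
prefixes.zipWithIndex.par.foreach { case (prefix, i) =>
  spark.read.csv(prefix)
    .write.mode("overwrite").parquet(s"s3a://my-bucket/consolidated/chunk=$i")
}
```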

Re: Why two netty libs?

2019-09-04 Thread Sean Owen
…on the CP > > On Tue, Sep 3, 2019 at 5:18 PM Shixiong(Ryan) Zhu > wrote: >> >> Yep, historical reasons. And Netty 4 is under another namespace, so we can >> use Netty 3 and Netty 4 in the same JVM. >> >> On Tue, Sep 3, 2019 at 6:15 AM Sean Owen wrote:

Re: Schema inference for nested case class issue

2019-09-04 Thread Sean Owen
user@ is the right place for these types of questions. As the error says, you have a case class that defines a schema including columns like 'fix' but these don't appear to be in your DataFrame. It needs to match. On Wed, Sep 4, 2019 at 6:44 AM El Houssain ALLAMI wrote: > > Hi , > > i have
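
Illustrating the mismatch the error describes, with hypothetical names: every field of the case class must exist as a column before .as[...] can bind it.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

case class Fix(version: String)           // hypothetical nested type
case class Report(id: Long, fix: Fix)     // hypothetical schema

val df = spark.read.json("reports.json")  // hypothetical input
// Succeeds only if the JSON actually has 'id' and a struct column 'fix'
// with a 'version' field; otherwise Spark raises the analysis error shown.
val reports = df.as[Report]
```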

Re: maven 3.6.1 removed from apache maven repo

2019-09-03 Thread Sean Owen
It's because build/mvn only queries ASF mirrors, and they remove non-current releases from mirrors regularly (we do the same). This may help avoid this in the future: https://github.com/apache/spark/pull/25667 On Tue, Sep 3, 2019 at 1:41 PM Xiao Li wrote: > Hi, Tom, > > To unblock the build, I

Re: Why two netty libs?

2019-09-03 Thread Sean Owen
It was for historical reasons; some other transitive dependencies needed it. I actually was just able to exclude Netty 3 last week from master. Spark uses Netty 4. On Tue, Sep 3, 2019 at 6:59 AM Jacek Laskowski wrote: > > Hi, > > Just noticed that Spark 2.4.x uses two netty deps of different

Re: [DISCUSSION]JDK11 for Apache 2.x?

2019-09-01 Thread Sean Owen
…1-still-in-safe-hands/ > ). > > On Tue, Aug 27, 2019 at 2:22 PM Sean Owen wrote: >> >> Spark 3 will not require Java 11; it will work with Java 8 too. I >> think the question is whether someone who _wants_ Java 11 should have >> a 2.x release instead of 3.0. >> >> In

Re: Providing a namespace for third-party configurations

2019-08-30 Thread Sean Owen
It's possible, but pretty unlikely, to have an exact namespace collision. It's probably a best practice to clearly separate settings, etc. that are downstream add-ons into a separate namespace, and I don't mind a sentence in a doc somewhere suggesting a convention, but I really think it's up to
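
A sketch of such a convention, with hypothetical keys: add-ons claim their own prefix instead of squatting in spark.sql.* or other core namespaces.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.myaddon.planner.enabled", "true")  // hypothetical add-on keys
  .set("spark.myaddon.cache.sizeMb", "256")      // under a vendor prefix
```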

Standardizing test build config

2019-08-28 Thread Sean Owen
I'm surfacing this to dev@ as the right answers may depend on a lot of historical decisions that I don't know about. See https://issues.apache.org/jira/browse/SPARK-28900 for a summary of how the different build configs are set up, and why we might need to standardize them to fully test with JDK

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread Sean Owen
+1 from me again. On Tue, Aug 27, 2019 at 6:06 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.4.4. > > The vote is open until August 30th 5PM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1
