Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-04-27 Thread Sean Owen
These are still blockers for 2.2: SPARK-20501 ML, Graph 2.2 QA: API: New Scala APIs, docs SPARK-20504 ML 2.2 QA: API: Java compatibility, docs SPARK-20503 ML 2.2 QA: API: Python API coverage SPARK-20502 ML, Graph 2.2 QA: API: Experimental, DeveloperApi, final, sealed audit SPARK-20500 ML, Graph

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Sean Owen
+1 , same result as with the last RC. All checks out for me. On Thu, Apr 27, 2017 at 1:29 AM Michael Armbrust wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.1.1. The vote is open until Sat, April 29th, 2018 at 18:00 PST and > passes

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Sean Owen
+1 from me -- this worked unusually smoothly on the first try. Sigs and license and so forth look OK. Tests pass with Java 8, Ubuntu 17, -Phive -Phadoop-2.7 -Pyarn. I had to run the build with -Xss2m to get this test to pass, but it might be somewhat specific to my env somehow: - SPARK-16845:

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-15 Thread Sean Owen
I don't think this is an example of Java 8 javadoc being more strict; it is not finding classes, not complaining about syntax. (Hyukjin cleaned up all of the javadoc 8 errors in master, and they're different and much more extensive!) It wouldn't necessarily break anything to build with Java 8

Re: Updating

2017-04-13 Thread Sean Owen
The site source is at https://github.com/apache/spark-website/ On Thu, Apr 13, 2017 at 4:20 PM Sam Elamin wrote: > Hey all > > Who do I need to talk to in order to update the Useful developer tools > page ? > > I want to

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-04 Thread Sean Owen
This is maybe a blocker. See my suggested action about voting on the current artifact to, I believe, eliminate the possible blocking part of the issue in the short term. On Tue, Apr 4, 2017, 22:02 Mridul Muralidharan wrote: > Hi, > > >

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-02 Thread Sean Owen
That backport is fine, for another RC even in my opinion, but it's not a regression. It's a JDK bug really. 2.1.0 would have failed too. On Sun, Apr 2, 2017 at 8:20 AM Kazuaki Ishizaki wrote: > -1 (non-binding) > > I tested it on Ubuntu 16.04 and openjdk8 on ppc64le. I got

Re: Should we consider a Spark 2.1.1 release?

2017-03-13 Thread Sean Owen
It seems reasonable to me, in that other x.y.1 releases have followed ~2 months after the x.y.0 release and it's been about 3 months since 2.1.0. Related: creating releases is tough work, so I feel kind of bad voting for someone else to do that much work. Would it make sense to deputize another

Re: Spark Improvement Proposals

2017-03-13 Thread Sean Owen
re about it, if I > really wanted a vote I guess I would -1 the change. I'm not going to do > that. > I would at least like to see an announcement go out about it. The last > thing I saw you say was you were going to call a vote. A few people chimed > in with their thoughts on

Re: Spark Improvement Proposals

2017-03-13 Thread Sean Owen
It's not a new process, in that it doesn't entail anything not already in http://apache.org/foundation/voting.html . We're just deciding to call a VOTE for this type of code modification. To your point -- yes, it's been around a long time with no further comment, and I called several times for

Re: Spark Improvement Proposals

2017-03-13 Thread Sean Owen
; I think a vote here would be good. I think most of the discussion was done > by 4 or 5 people and it's a long thread. If nothing else it summarizes > everything and gets people attention to the change. > > Tom > > > On Thursday, March 9, 2017 10:55 AM, Sean Owen <so...@c

Re: Spark Local Pipelines

2017-03-13 Thread Sean Owen
I'm skeptical. Serving synchronous queries from a model at scale is a fundamentally different activity. As you note, it doesn't logically involve Spark. If it has to happen in milliseconds it's going to be in-core. Scoring even 10qps with a Spark job per request is probably a non-starter; think

SPIP docs are live

2017-03-12 Thread Sean Owen
http://spark.apache.org/improvement-proposals.html (Thanks Cody!) We should use this process where appropriate now, and we can refine it further if needed.

Re: Spark Improvement Proposals

2017-03-10 Thread Sean Owen
, Mar 9, 2017 at 4:55 PM Sean Owen <so...@cloudera.com> wrote: > I think a VOTE is over-thinking it, and is rarely used, but, can't hurt. > Nah, anyone can call a vote. This really isn't that formal. We just want to > declare and document consensus. > > I think SPIP is jus

Re: Spark Improvement Proposals

2017-03-09 Thread Sean Owen
I think a VOTE is over-thinking it, and is rarely used, but, can't hurt. Nah, anyone can call a vote. This really isn't that formal. We just want to declare and document consensus. I think SPIP is just a remix of existing process anyway, and don't think it will actually do much anyway, which is

Re: Spark Improvement Proposals

2017-03-07 Thread Sean Owen
Do we need a VOTE? Heck, I think anyone can call one, anyway. Pre-flight vote check: does anyone have objections to the text as-is? See https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit# If so, let's hash out specific suggested changes. If not, then I think the next

RFC: removing Scala 2.10

2017-03-06 Thread Sean Owen
Another call for comments on removal of Scala 2.10 support, if you haven't already. See https://github.com/apache/spark/pull/17150 http://issues.apache.org/jira/browse/SPARK-19810 I've heard several votes in support and no specific objections at this point, but wanted to make another call to

Re: Straw poll: dropping support for things like Scala 2.10

2017-03-03 Thread Sean Owen
Spitzer <russell.spit...@gmail.com> wrote: > +1 on removing 2.10 > > > On Thu, Mar 2, 2017 at 8:51 AM Koert Kuipers <ko...@tresata.com> wrote: > > given the issues with scala 2.10 and java 8 i am in favor of dropping > scala 2.10 in next release > > On Sat,

Re: Spark Improvement Proposals

2017-02-27 Thread Sean Owen
To me, no new process is being invented here, on purpose, and so we should just rely on whatever governs any large JIRA or vote, because SPIPs are really just guidance for making a big JIRA. http://apache.org/foundation/voting.html suggests that PMC members have the binding votes in general, and

Re: Straw poll: dropping support for things like Scala 2.10

2017-02-25 Thread Sean Owen
only supports Scala 2.11 with Spark 2.x Before I open a JIRA, just soliciting opinions. On Tue, Oct 25, 2016 at 4:36 PM Sean Owen <so...@cloudera.com> wrote: > I'd like to gauge where people stand on the issue of dropping support for > a few things that were considered for 2.0. >

Re: In Intellij, maven failed to build Catalyst project

2017-02-20 Thread Sean Owen
I saw this too yesterday but not today. It may have been fixed by some recent commits. On Mon, Feb 20, 2017 at 6:52 PM ron8hu wrote: I am using Intellij IDEA 15.0.6. I used to use Maven to compile Spark project Catalyst inside Intellij without problem. A couple of days

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Sean Owen
I'm not sure what you're specifically suggesting. Of course flaky tests are bad and they should be fixed, and people do. Yes, some are pretty hard to fix because they are rarely reproducible if at all. If you want to fix, fix; there's nothing more to it. I don't perceive flaky tests to be a

Re: Spark Improvement Proposals

2017-02-16 Thread Sean Owen
The text seems fine to me. Really, this is not describing a fundamentally new process, which is good. We've always had JIRAs, we've always been able to call a VOTE for a big question. This just writes down a sensible set of guidelines for putting those two together when a major change is proposed.

Does anyone here run the TheApacheSpark Youtube channel?

2017-02-15 Thread Sean Owen
I just saw https://www.youtube.com/user/TheApacheSpark and wondered who 'owns' it. If it's a quasi-official channel, can we list it on http://spark.apache.org/community.html? But then, how does one add videos? If it's the Spark Summit video account, as it seems to be at the moment, it shouldn't be

Re: Request for comments: Java 7 removal

2017-02-14 Thread Sean Owen
milar discussion that went forward in the Druid community. > > > On Fri, Feb 10, 2017 at 8:47 AM Sean Owen <so...@cloudera.com> wrote: > > As you have seen, there's a WIP PR to implement removal of Java 7 support: > https://github.com/apache/spark/pull/16871 > > I have

Re: Request for comments: Java 7 removal

2017-02-10 Thread Sean Owen
are your vision > about bug fix releases for 2.0 and 2.1? > > About python 2.6 > https://www.python.org/download/releases/2.6/ > Python 2.6 (final) was released on October 1st, 2008. > > If supporting python 2.6 has any costs I would definitely remove that. > > Kind regard

Request for comments: Java 7 removal

2017-02-10 Thread Sean Owen
As you have seen, there's a WIP PR to implement removal of Java 7 support: https://github.com/apache/spark/pull/16871 I have heard several +1s at https://issues.apache.org/jira/browse/SPARK-19493 but am asking for concerns too, now that there's a concrete change to review. If this goes in for

Re: PSA: Java 8 unidoc build

2017-02-07 Thread Sean Owen
I believe that if we ran the Jenkins builds with Java 8, we would catch these? This doesn't require dropping Java 7 support or anything. @joshrosen I know we are just now talking about modifying the Jenkins jobs to remove old Hadoop configs. Is it possible to change the master jobs to use Java 8?

Re: Java 9

2017-02-07 Thread Sean Owen
I don't think anyone's tried it. I think we'd first have to agree to drop Java 7 support before that could be seriously considered. The 8-9 difference is a bit more of a breaking change. On Tue, Feb 7, 2017 at 11:44 AM Pete Robbins wrote: > Is anyone working on support for

Re: Google Summer of Code 2017 is coming

2017-02-03 Thread Sean Owen
I have a contrarian opinion on GSoC from experience many years ago in Mahout. Of 3 students I interacted with, 2 didn't come close to completing the work they signed up for. I think it's mostly that students are hungry for the resumé line item, and don't understand the amount of work they're

Remove support for Hadoop 2.5 and earlier?

2017-02-03 Thread Sean Owen
Last year we discussed removing support for things like Hadoop 2.5 and earlier. It was deprecated in Spark 2.1.0. I'd like to go ahead with this, so am checking whether anyone has strong feelings about it. The original rationale for separate Hadoop profile was bridging the significant difference

Re: Typo on spark.apache.org? "cyclic data flow"

2017-01-28 Thread Sean Owen
Certainly a typo -- feel free to make a PR for the spark-website repo. (Might search for other instances of 'cyclic' too) On Sat, Jan 28, 2017 at 7:18 PM Nicholas Chammas wrote: > The tagline on http://spark.apache.org/ says: "Apache Spark has an > advanced DAG

Re: Feedback on MLlib roadmap process proposal

2017-01-25 Thread Sean Owen
On Wed, Jan 25, 2017 at 6:01 AM Ilya Matiach wrote: > My confusion was that the ML 2.2 roadmap critical features ( > https://issues.apache.org/jira/browse/SPARK-18813) did not line up with > the top ML/MLLIB JIRAs by Votes >

Re: Feedback on MLlib roadmap process proposal

2017-01-24 Thread Sean Owen
ache.org/committers.html >but I’m not sure if it is up to date and it doesn’t mention what code the >committers own. It would be useful to know who owns ML or MLLIB. From my >limited personal experience this seems to be Joseph K. Bradley, Yanbo Liang >and Sean Owen. >

Re: MLlib mission and goals

2017-01-24 Thread Sean Owen
My $0.02, which shouldn't be weighted too much. I believe the mission as of Spark ML has been to provide the framework, and then implementation of 'the basics' only. It should have the tools that cover ~80% of use cases, out of the box, in a pretty well-supported and tested way. It's not a goal

Re: Possible bug - Java iterator/iterable inconsistency

2017-01-19 Thread Sean Owen
Yes, confirmed that fixing it unfortunately causes trouble in Java 8. See https://issues.apache.org/jira/browse/SPARK-19287 for further discussion. On Wed, Jan 18, 2017 at 9:00 PM Sean Owen <so...@cloudera.com> wrote: > Hm. Unless I am also totally missing or forgetting something

Re: Possible bug - Java iterator/iterable inconsistency

2017-01-18 Thread Sean Owen
Hm. Unless I am also totally missing or forgetting something, I think you're right. The equivalent in PairRDDFunctions.scala operates on a function from T to TraversableOnce[U], and a TraversableOnce is most like a java.util.Iterator. You can work around it by wrapping it in a faked
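The "faked" Iterable workaround can be sketched in plain Java, without Spark. Because java.lang.Iterable has a single abstract method, a lambda can wrap an existing Iterator. The result supports only one pass, which is all a flatMap-style consumer that iterates its result once needs. The class and method names (IteratorAsIterable, once, drain) are hypothetical illustrations, not Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorAsIterable {

    // Wraps a one-shot Iterator in an Iterable via a lambda. The wrapper is NOT
    // re-iterable: every call to iterator() returns the same underlying Iterator.
    static <T> Iterable<T> once(Iterator<T> it) {
        return () -> it;
    }

    // Hypothetical stand-in for a consumer that accepts an Iterable and walks it once.
    static <T> List<T> drain(Iterable<T> values) {
        List<T> out = new ArrayList<>();
        for (T v : values) {
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> words = Arrays.asList("a", "b", "c").iterator();
        Iterable<String> wrapped = once(words);
        System.out.println(drain(wrapped)); // [a, b, c]
        System.out.println(drain(wrapped)); // [] because the iterator is already exhausted
    }
}
```

The second call printing an empty list is the catch: the wrapper is only safe when the receiver iterates exactly once.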

Re: can someone review my PR?

2017-01-18 Thread Sean Owen
It still doesn't pass tests -- I'd usually not look until that point. On Wed, Jan 18, 2017 at 11:10 AM Steve Loughran wrote: > I've had a PR outstanding on spark/object store integration, works for > both maven and sbt builds > >

Re: GraphX-related "open" issues

2017-01-17 Thread Sean Owen
WontFix or Later is fine. There's not really any practical distinction. I figure that if something times out and is closed, it's very unlikely to be looked at again. Therefore marking it as something to do 'later' seemed less accurate. On Tue, Jan 17, 2017 at 5:30 PM Takeshi Yamamuro

Re: What about removing TaskContext#getPartitionId?

2017-01-15 Thread Sean Owen
the method's a public API, I think it > should be fixed, shouldn't it? > > Pozdrawiam, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklask

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Sean Owen
It doesn't strike me as something that's problematic to use. There are a thousand things in the API that, maybe in hindsight, could have been done differently, but unless something is bad practice or superseded by another superior mechanism, it's probably not worth the bother for maintainers or

Re: Why are ml models repartition(1)'d in save methods?

2017-01-13 Thread Sean Owen
you look at ALS you'll see there is no repartitioning since the factor > dataframes can be large > On Fri, 13 Jan 2017 at 19:42, Sean Owen <so...@cloudera.com> wrote: > > You're referring to code that serializes models, which are quite small. > For example a PCA model consist

Re: Why are ml models repartition(1)'d in save methods?

2017-01-13 Thread Sean Owen
as Parquet. On Fri, Jan 13, 2017 at 5:29 PM Asher Krim <ak...@hubspot.com> wrote: > But why is that beneficial? The data is supposedly quite large, > distributing it across many partitions/files would seem to make sense. > > On Fri, Jan 13, 2017 at 12:25 PM, Sean Owen <so..

Re: Why are ml models repartition(1)'d in save methods?

2017-01-13 Thread Sean Owen
That is usually so the result comes out in one file, not partitioned over n files. On Fri, Jan 13, 2017 at 5:23 PM Asher Krim wrote: > Hi, > > I'm curious why it's common for data to be repartitioned to 1 partition > when saving ml models: > >

Re: Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Sean Owen
A-ha. I'll try to clean them up and ask for new JIRAs to be created in some cases. On Fri, Jan 13, 2017 at 4:15 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > FWIW there is an option to Delete the issue (in More -> Delete). > > Shivaram > >

Re: Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Sean Owen
On Fri, Jan 13, 2017 at 4:29 PM Sean Owen <so...@cloudera.com> wrote: > > Do you see a button to resolve other issues? you may not be able to > resolve any of them. I am a JIRA admin though, like most other devs, so > should be able to resolve anything. > > Yes, I ce

Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Sean Owen
Looks like the JIRA maintenance left a bunch of duplicate JIRAs, from SPARK-19191 to SPARK-19202. For some reason, I can't resolve these issues, but I can resolve others. Does anyone else see the same? I know SPARK-19190 was similarly borked but closed by its owner.

Re: A note about MLlib's StandardScaler

2017-01-09 Thread Sean Owen
This could be true if you knew you were just going to scale the input to StandardScaler and nothing else. It's probably more typical you'd scale some other data. The current behavior is therefore the sensible default, because the input is a sample of some unknown larger population. I think it
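The disagreement in this thread comes down to dividing by n (population standard deviation, correct only if the data is the whole population) versus n - 1 (corrected sample standard deviation, an estimate for the larger population the data was drawn from). A minimal plain-Java sketch of both, assuming a non-empty array of at least two values; it is illustrative and not MLlib's implementation.

```java
public class StdDevDemo {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sum of squared deviations from the mean.
    static double sumSq(double[] xs) {
        double m = mean(xs);
        double s = 0.0;
        for (double x : xs) s += (x - m) * (x - m);
        return s;
    }

    // Divide by n: appropriate only if xs is the entire population.
    static double populationStd(double[] xs) {
        return Math.sqrt(sumSq(xs) / xs.length);
    }

    // Divide by n - 1 (Bessel's correction): treats xs as a sample.
    static double sampleStd(double[] xs) {
        return Math.sqrt(sumSq(xs) / (xs.length - 1));
    }

    public static void main(String[] args) {
        double[] xs = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(populationStd(xs)); // 2.0
        System.out.println(sampleStd(xs));     // sqrt(32 / 7), about 2.138
    }
}
```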

Quick request: prolific PR openers, review your open PRs

2017-01-04 Thread Sean Owen
Just saw that there are many people with >= 8 open PRs. Some are legitimately in flight but many are probably stale. To set a good example, would (everyone) mind flicking through what they've got open and see if some PRs are stale and should be closed? https://spark-prs.appspot.com/users

Re: repeated unioning of dataframes take worse than O(N^2) time

2016-12-29 Thread Sean Owen
Don't do that. Union them all at once with SparkContext.union On Thu, Dec 29, 2016, 17:21 assaf.mendelson wrote: > Hi, > > > > I have been playing around with doing union between a large number of > dataframes and saw that the performance of the actual union (not the >
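Why repeated pairwise unions get expensive can be shown with a toy cost model. The assumption, labeled here because it is not Spark's actual analyzer, is that every union step walks the entire plan built so far. Chaining N inputs one at a time then costs on the order of N^2 node visits, while one union over all N inputs costs on the order of N. All classes and methods below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionCostModel {

    // Toy plan node: a leaf (one input) or a union over its children.
    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Counts nodes; stands in for one full pass over the plan tree.
    static long size(Node n) {
        long total = 1;
        for (Node c : n.children) total += size(c);
        return total;
    }

    // union(union(union(a, b), c), d)...: ASSUMES each step walks the whole plan so far.
    static long pairwiseCost(int n) {
        Node plan = new Node();
        long work = 0;
        for (int i = 1; i < n; i++) {
            Node u = new Node();
            u.children.add(plan);
            u.children.add(new Node());
            plan = u;
            work += size(plan);
        }
        return work;
    }

    // One union node over all n leaves, walked once.
    static long flatCost(int n) {
        Node u = new Node();
        for (int i = 0; i < n; i++) u.children.add(new Node());
        return size(u);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " inputs: pairwise=" + pairwiseCost(n) + " flat=" + flatCost(n));
        }
    }
}
```

In this model the pairwise total is n^2 - 1 visits against n + 1 for the flat union, for example 99 versus 11 at n = 10.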

Re: shapeless in spark 2.1.0

2016-12-29 Thread Sean Owen
It is breeze, but, what's the option? It can't be excluded. I think this falls in the category of things an app would need to shade in this situation. On Thu, Dec 29, 2016, 16:49 Koert Kuipers wrote: > i just noticed that spark 2.1.0 bring in a new transitive dependency on >

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Sean Owen
PS, here are the open issues for 2.1.0. Forgot this one. No Blockers, but one "Critical": SPARK-16845 org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB SPARK-18669 Update Apache docs regard watermarking in Structured Streaming SPARK-18894 Event time

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Sean Owen
(If you have a template for these emails, maybe update it to use https links. They work for apache.org domains. After all we are asking people to verify the integrity of release artifacts, so it might as well be secure.) (Also the new archives use .tar.gz instead of .tgz like the others. No big
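Checking a release artifact's hash is the kind of integrity check these vote emails ask for. Below is a minimal Java sketch that compares a file against a published hex digest. It assumes the published checksum is a SHA-512 hex string (the exact checksum-file format has varied between releases), and it does not cover signature (.asc) verification, which is a separate GPG step.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheck {

    // Lowercase hex SHA-512 of the given bytes. Every Java platform must support
    // SHA-512, so the checked exception is rethrown as unchecked.
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compares against an expected digest, ignoring case and whitespace
    // (some published checksum files are upper-case and space-grouped).
    static boolean matches(byte[] data, String expectedHex) {
        String normalized = expectedHex.replaceAll("\\s+", "").toLowerCase();
        return sha512Hex(data).equals(normalized);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ChecksumCheck <artifact-file> <expected-sha512-hex>
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1]) ? "OK" : "MISMATCH");
    }
}
```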

Re: SPARK-17455 Isotonic Regression fix languishing

2016-12-13 Thread Sean Owen
One of many things that gets lost in the shuffle -- it looks pretty straightforward so I will review today. On Tue, Dec 13, 2016 at 4:32 PM nseggert wrote: > I have PR that has been sitting untouched for months. Could someone please > take a look at it? > >

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-09 Thread Sean Owen
Sure, it's only an issue insofar as it may be a flaky test. If it's fixable or disable-able for a possible next RC that could be helpful. On Sat, Dec 10, 2016 at 2:09 AM Shixiong(Ryan) Zhu wrote: > Sean, "stress test for failOnDataLoss=false" is because Kafka consumer >

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-09 Thread Sean Owen
As usual, the sigs / hashes are fine and licenses look fine. I am still seeing some test failures. A few I've seen over time and aren't repeatable, but a few seem persistent. Has anyone else observed these? I'm on Ubuntu 16 / Java 8 building for -Pyarn -Phadoop-2.7 -Phive If anyone can confirm I'll

Re: [MLLIB] RankingMetrics.precisionAt

2016-12-06 Thread Sean Owen
read that it returns "the > average precision at the first k ranking positions" I somehow expect there > will ap@k there and a the final output would be MAP@k not average > precision at the k-th position. > > I guess it is not enough sleep. > On 12/06/2016 02:45 AM, Sean Ow

Re: [MLLIB] RankingMetrics.precisionAt

2016-12-05 Thread Sean Owen
I read it again and that looks like it implements mean precision@k as I would expect. What is the issue? On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz wrote: > Hi, > > Could I ask for a fresh pair of eyes on this piece of code: > > >
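For reference, mean precision@k as usually defined can be sketched in plain Java: for each query, count how many of the top-k predictions are relevant, divide by k, then average over queries. This mirrors the definition under discussion rather than reproducing Spark's RankingMetrics source. Dividing by k even when a query has fewer than k predictions is a stated choice in this sketch, and it differs from average precision and MAP@k.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionAtK {

    // Precision@k for one query: relevant items among the first k predictions, divided by k.
    static double precisionAt(List<String> predicted, Set<String> relevant, int k) {
        int hits = 0;
        int n = Math.min(k, predicted.size());
        for (int i = 0; i < n; i++) {
            if (relevant.contains(predicted.get(i))) hits++;
        }
        return (double) hits / k;
    }

    // Mean of per-query precision@k.
    static double meanPrecisionAt(List<List<String>> predictions, List<Set<String>> labels, int k) {
        double sum = 0.0;
        for (int q = 0; q < predictions.size(); q++) {
            sum += precisionAt(predictions.get(q), labels.get(q), k);
        }
        return sum / predictions.size();
    }

    public static void main(String[] args) {
        List<List<String>> preds = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("x", "y", "z"));
        List<Set<String>> labels = Arrays.asList(
            new HashSet<>(Arrays.asList("a", "c")),
            new HashSet<>(Arrays.asList("z")));
        // Query 1: top-2 = {a, b}, 1 hit -> 0.5. Query 2: top-2 = {x, y}, 0 hits -> 0.0.
        System.out.println(meanPrecisionAt(preds, labels, 2)); // 0.25
    }
}
```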

Re: Difference between netty and netty-all

2016-12-05 Thread Sean Owen
netty should be Netty 3.x. It is all but unused but I couldn't manage to get rid of it: https://issues.apache.org/jira/browse/SPARK-17875 netty-all should be 4.x, actually used. On Tue, Dec 6, 2016 at 12:54 AM Nicholas Chammas wrote: > I’m looking at the list of

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Sean Owen
FWIW I am seeing several test failures, each more than once, but, none are necessarily repeatable. These are likely just flaky tests but I thought I'd flag these unless anyone else sees similar failures: - SELECT a.i, b.i FROM oneToTen a JOIN oneToTen b ON a.i = b.i + 1 *** FAILED ***

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-29 Thread Sean Owen
We still have several blockers for 2.1, so I imagine at least one will mean this won't be the final RC: SPARK-18318 ML, Graph 2.1 QA: API: New Scala APIs, docs SPARK-18319 ML, Graph 2.1 QA: API: Experimental, DeveloperApi, final, sealed audit SPARK-18326 SparkR 2.1 QA: New R APIs and API docs

Re: Two major versions?

2016-11-28 Thread Sean Owen
Yeah, there's no official position on this. BTW see the new home of what info is published on this topic: http://spark.apache.org/versioning-policy.html The answer is indeed that minor releases have a target cadence, but maintenance releases are as-needed, as defined by the release manager's

Re: [SQL][JDBC] Possible regression in JDBC reader

2016-11-25 Thread Sean Owen
See https://github.com/apache/spark/pull/15499#discussion_r89008564 in particular. Hyukjin / Xiao do we need to undo part of this change? On Fri, Nov 25, 2016 at 1:02 PM Takeshi Yamamuro wrote: > Hi, > > Seems we forget to pass `parts:Array[Partition]` into

Re: Scaling issues due to contention in Random

2016-11-25 Thread Sean Owen
SparkPi is just an example, so its performance doesn't really matter. Simpler is better. Kryo could be an issue but that would be a change in Kryo. On Fri, Nov 25, 2016 at 7:30 AM Prasun Ratn wrote: > Hi, > > I am seeing perf degradation in the Spark/Pi example on a
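For context on the reported contention: a shared java.util.Random is thread-safe but updates one shared seed, so many threads drawing from one instance contend on it. A common JVM alternative is java.util.concurrent.ThreadLocalRandom, which gives each thread its own generator. The following is a standalone SparkPi-style Monte Carlo sketch, not Spark's SparkPi code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

public class PiEstimate {

    // The fraction of uniform points in the unit square that land inside the
    // quarter circle approaches pi / 4. ThreadLocalRandom.current() is per-thread,
    // so the parallel workers do not share (or contend on) a single seed.
    static double estimatePi(long samples) {
        long inside = LongStream.range(0, samples).parallel().filter(i -> {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            return x * x + y * y <= 1.0;
        }).count();
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(10_000_000L)); // close to 3.1416; exact value varies per run
    }
}
```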

Re: Handling questions in the mailing lists

2016-11-24 Thread Sean Owen
gt; the last 24 hours alone ( > http://stackoverflow.com/questions/tagged/apache-spark?sort=newest=50). > > > > > I believe this should be enough traffic (and the traffic would rise once > quality answers begin to appear). > > > > > > *From:* Sean Owen [via Ap

Re: Handling questions in the mailing lists

2016-11-24 Thread Sean Owen
I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it's still not technically critical mass after 2 years. It would just fracture the discussion to yet another place. On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson

Spark Wiki now migrated to spark.apache.org

2016-11-23 Thread Sean Owen
I completed the migration. You can see the results live right now at http://spark.apache.org, and https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage A summary of the changes: https://issues.apache.org/jira/browse/SPARK-18073 The substance of the changes:

Re: Please limit commits for branch-2.1

2016-11-22 Thread Sean Owen
did send an email out with those information on Nov 1st. It is not meant > to be in new feature development mode anymore. > > FWIW, I will cut an RC today to remind people of that. The RC will fail, > but it can serve as a good reminder. > > On Tue, Nov 22, 2016 at 1:53 AM Se

Re: Please limit commits for branch-2.1

2016-11-22 Thread Sean Owen
Maybe I missed it, but did anyone declare a QA period? In the past I've not seen this, and just seen people start talking retrospectively about how "we're in QA now" until it stops. We have https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage saying it is already over, but clearly we're

Re: MinMaxScaler behaviour

2016-11-21 Thread Sean Owen
It's a degenerate case of course. 0, 0.5 and 1 all make about as much sense. Is there a strong convention elsewhere to use 0? Min/max scaling is the wrong thing to do for a data set like this anyway. What you probably intend to do is scale each image so that its max intensity is 1 and min
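The per-image alternative suggested here can be sketched in plain Java: rescale each image by its own min and max, so its darkest pixel maps to 0 and its brightest to 1. When every pixel is equal, the scale is undefined. This sketch fills with a caller-chosen constant, one of the 0 / 0.5 / 1 options discussed, rather than dividing by zero. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class PerImageMinMax {

    // Rescales one image's intensities to [0, 1] using that image's own min and max.
    // If max == min, every output is degenerateValue (an arbitrary choice).
    static double[] rescale(double[] pixels, double degenerateValue) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double p : pixels) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        double range = max - min;
        double[] out = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = range == 0.0 ? degenerateValue : (pixels[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rescale(new double[] {10, 20, 30}, 0.5))); // [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(rescale(new double[] {7, 7, 7}, 0.5)));    // [0.5, 0.5, 0.5]
    }
}
```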

Re: issues with github pull request notification emails missing

2016-11-18 Thread Sean Owen
I have seen the same issue from time to time where I couldn't see a person's alias in the popup. Happened yesterday for me with @joshrosen. No idea why. I was also missing emails for a while to spam, but it seemed like a Gmail problem. It said I had marked messages from about 6 different people

Re: Handling questions in the mailing lists

2016-11-16 Thread Sean Owen
I updated the wiki to point to the /community.html page. (We're going to migrate the wiki real soon now anyway) I updated the /community.html page per this thread too. PR: https://github.com/apache/spark-website/pull/16 On Tue, Nov 15, 2016 at 2:49 PM assaf.mendelson

Re: Spark-SQL parameters like shuffle.partitions should be stored in the lineage

2016-11-15 Thread Sean Owen
Once you get to needing this level of fine-grained control, should you not consider using the programmatic API in part, to let you control individual jobs? On Tue, Nov 15, 2016 at 1:19 AM leo9r wrote: > Hi Daniel, > > I completely agree with your request. As the amount of

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-14 Thread Sean Owen
> Reynold Xin* > Herman van Hövell tot Westerflier > Ricardo Almeida > Shixiong (Ryan) Zhu > Sean Owen* > Michael Armbrust* > Dongjoon Hyun > Jagadeesan As > Liwei Lin > Weiqing Yang > Vaquar Khan > Denny Lee > Yin Huai* > Ryan Blue > Pratik Sharma > Kou

Re: Component naming in the PR title

2016-11-13 Thread Sean Owen
Yes they really correspond to, if anything, the categories at spark-prs.appspot.com . They aren't that consistently used however and there isn't really a definite list. It is really mostly of use for the fact that it tags emails in a way people can filter semi-effectively. So I think we have left

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-08 Thread Sean Owen
+1 binding (See comments on last vote; same results, except, the regression we identified is fixed now.) On Tue, Nov 8, 2016 at 6:10 AM Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.2. The vote is open until Thu, Nov

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-04 Thread Sean Owen
would unblock this. There's also some discussion about an alternative resolution for the test problem. On Wed, Nov 2, 2016 at 5:44 PM Sean Owen <so...@cloudera.com> wrote: > Sigs, license, etc are OK. There are no Blockers for 2.0.2, though here > are the 4 issues still open: > >

Re: [VOTE] Release Apache Spark 1.6.3 (RC2)

2016-11-04 Thread Sean Owen
Likewise, ran my usual tests on Ubuntu with yarn/hive/hive-thriftserver/hadoop-2.6 on JDK 8 and all passed. Sigs and licenses are OK. +1 On Thu, Nov 3, 2016 at 7:57 PM Herman van Hövell tot Westerflier < hvanhov...@databricks.com> wrote: > +1 > > On Thu, Nov 3, 2016 at 6:58 PM, Michael Armbrust

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-02 Thread Sean Owen
Sigs, license, etc are OK. There are no Blockers for 2.0.2, though here are the 4 issues still open: SPARK-14387 Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc SPARK-17957 Calling outer join and na.fill(0) and then inner join will miss rows SPARK-17981 Incorrectly Set

Re: Handling questions in the mailing lists

2016-11-02 Thread Sean Owen
y as I said, the main issue is not user questions (except maybe > advanced ones) but more for dev questions. It is so easy to get lost in the > chatter that it makes it very hard for people to learn spark internals… > > Assaf. > > > > *From:* Sean Owen [mailto:so...@cloudera.

Re: Handling questions in the mailing lists

2016-11-02 Thread Sean Owen
I think that unfortunately mailing lists don't scale well. This one has thousands of subscribers with different interests and levels of experience. For any given person, most messages will be irrelevant. I also find that a lot of questions on user@ are not well-asked, aren't an SSCCE (

Anyone seeing a lot of Spark emails go to Gmail spam?

2016-11-02 Thread Sean Owen
I couldn't figure out why I was missing a lot of dev@ announcements, and have just realized hundreds of messages to dev@ over the past month or so have been marked as spam for me by Gmail. I have no idea why but it's usually messages from Michael and Reynold, but not all of them. I'll see replies

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Sean Owen
Yes this came up from a different direction: https://issues.apache.org/jira/browse/SPARK-18140 I think it's fine to pursue an upgrade to fix these several issues. The question is just how well it will play with other components, so bears some testing and evaluation of the changes from 1.8, but

Re: Spark has a compile dependency on scalatest

2016-10-31 Thread Sean Owen
SBT (which I imagine a great many people do > when building Spark applications). > > Jeremy > > On Sat, Oct 29, 2016 at 2:50 AM, Sean Owen <so...@cloudera.com> wrote: > > Declare your scalatest dependency as test scope (which is correct anyway). > That would override i

Re: Spark has a compile dependency on scalatest

2016-10-29 Thread Sean Owen
lang:scala-reflect:jar:2.10.6:compile > [INFO]| \- > com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.6.5:compile > [INFO]+- org.apache.ivy:ivy:jar:2.4.0:compile > [INFO]+- oro:oro:jar:2.0.8:compile > [INFO]+- net.razorvine:pyrolite:jar:4.9:compile >

Re: Spark has a compile dependency on scalatest

2016-10-28 Thread Sean Owen
g it? > > On Fri, Oct 28, 2016 at 12:27 PM, Sean Owen <so...@cloudera.com> wrote: > > It's required because the tags module uses it to define annotations for > tests. I don't see it in compile scope for anything but the tags module, > which is then in test scope for other m

Re: Spark has a compile dependency on scalatest

2016-10-28 Thread Sean Owen
It's required because the tags module uses it to define annotations for tests. I don't see it in compile scope for anything but the tags module, which is then in test scope for other modules. What are you seeing that makes you say it's in compile scope? On Fri, Oct 28, 2016 at 8:19 PM Jeremy

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-28 Thread Sean Owen
If the subtext is vendors, then I'd have a look at what recent distros look like. I'll write about CDH as a representative example, but I think other distros are naturally similar. CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH 5.3 / Dec 2014). Granted, this depends on

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Sean Owen
> https://issues.apache.org/jira/browse/SPARK-18138 > > > > On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <ste...@hortonworks.com> > wrote: > > > On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote: > > Seems OK by me. > How about Hadoo

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-27 Thread Sean Owen
+1 from me. All the sigs and licenses and hashes check out. It builds and passes tests with -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver on Ubuntu 16 + Java 8. Here are the open issues for 2.0.2 BTW. No blockers, but some marked Critical FYI. Just making sure nobody really meant for one of

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Sean Owen
Seems OK by me. How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now. On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers wrote: > that sounds good to me > > On Wed, Oct 26, 2016

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-25 Thread Sean Owen
t we should use to decide when to end Scala 2.10 > and/or Java 7 support? > > On Tue, Oct 25, 2016 at 8:36 AM, Sean Owen <so...@cloudera.com> wrote: > > I'd like to gauge where people stand on the issue of dropping support for > a few things that were considered for 2.0.

Straw poll: dropping support for things like Scala 2.10

2016-10-25 Thread Sean Owen
I'd like to gauge where people stand on the issue of dropping support for a few things that were considered for 2.0. First: Scala 2.10. We've seen a number of build breakages this week because the PR builder only tests 2.11. No big deal at this stage, but, it did cause me to wonder whether it's

PSA: watch the apache/spark-website repo if interested

2016-10-25 Thread Sean Owen
I don't believe emails about the spark-website repo are forwarded to the project mailing lists. If you want to watch for them, go star/watch the repo to be sure. I just opened a PR, for example. https://github.com/apache/spark-website

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-24 Thread Sean Owen
than its > established meaning: The logical fallacy of declaring victory by knocking > down an easily defeated argument or position that the opposition has never > actually made. > > On Mon, Oct 24, 2016 at 5:51 AM, Sean Owen <so...@cloudera.com> wrote: > > BTW I wrote up a st

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-24 Thread Sean Owen
BTW I wrote up a straw-man proposal for migrating the wiki content: https://issues.apache.org/jira/browse/SPARK-18073 On Tue, Oct 18, 2016 at 12:25 PM Holden Karau wrote: > Right now the wiki isn't particularly accessible to updates by external > contributors. We've

Re: [VOTE] Release Apache Spark 1.6.3 (RC1)

2016-10-19 Thread Sean Owen
Yeah I see that too. I'll work on back-porting it. The release otherwise looks good to me, but let's keep testing please to identify anything else in the meantime. On Wed, Oct 19, 2016 at 8:58 AM Pete Robbins wrote: > We see a regression since 1.6.2. I think this PR needs

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Sean Owen
I'm OK with that. The upside to the wiki is that it can be edited directly outside of a release cycle. However, in practice I find that the wiki is rarely changed. To me it also serves as a place for information that isn't exactly project documentation like "powered by" listings. In a way I'd

Re: cutting 2.0.2?

2016-10-17 Thread Sean Owen
(I don't think 2.0.2 will be released for a while if at all but that's not what you're asking I think) It's a fairly safe change, but also isn't exactly a fix in my opinion. Because there are some other changes to make it all work for SPARC, I think it's more realistic to look to the 2.1.0

Re: source for org.spark-project.hive:1.2.1.spark2

2016-10-17 Thread Sean Owen
IIRC this was all about shading of dependencies, not changes to the source. On Mon, Oct 17, 2016 at 6:26 PM Ryan Blue wrote: > Are these changes that the Hive community has rejected? I don't see a > compelling reason to have a long-term Spark fork of Hive. > > rb > >
