Re: Question on Spark's graph libraries roadmap

2017-03-13 Thread Tim Hunter
Hello Enzo, since this question is also relevant to Spark, I will answer it here. The goal of GraphFrames is to provide graph capabilities along with excellent integration with the rest of the Spark ecosystem (using modern APIs such as DataFrames). As you seem to be well aware, a large number of
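
For readers not following the GraphFrames thread, a minimal sketch of the DataFrame-based API being described (assuming a spark-shell style SparkSession named spark, and that the graphframes package is on the classpath, e.g. added via --packages; the vertex and edge data are illustrative):

    import org.graphframes.GraphFrame

    // Vertices and edges are ordinary DataFrames -- this is the integration
    // point with the rest of the Spark ecosystem that Tim describes.
    val vertices = spark.createDataFrame(Seq(
      ("a", "Alice"), ("b", "Bob"), ("c", "Carol")
    )).toDF("id", "name")
    val edges = spark.createDataFrame(Seq(
      ("a", "b", "follows"), ("b", "c", "follows")
    )).toDF("src", "dst", "relationship")

    val g = GraphFrame(vertices, edges)
    g.pageRank.resetProbability(0.15).maxIter(10).run()
      .vertices.select("id", "pagerank").show()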

Adding the executor ID to Spark logs when launching an executor in a YARN container

2017-03-13 Thread Rodriguez Hortala, Juan
Hi Spark developers. For Spark running on YARN, I would like to be able to find out the container where an executor is running by looking at the logs. I haven't been able to find a way to do this, not even with the Spark UI, as neither the Executors tab nor the stage information page shows the
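
If no such log line exists out of the box, one hedged workaround: YARN sets the CONTAINER_ID environment variable inside every container it launches, so a throwaway job can recover the executor-to-container mapping. A sketch for the spark-shell, assuming sc is the SparkContext; it is best-effort, since only executors that happen to run a task get reported:

    import org.apache.spark.SparkEnv

    // Spread many small tasks across the cluster and have each one report its
    // executor ID together with YARN's CONTAINER_ID environment variable.
    val mapping = sc.parallelize(1 to 10000, 100)
      .mapPartitions { _ =>
        Iterator((SparkEnv.get.executorId, sys.env.getOrElse("CONTAINER_ID", "?")))
      }
      .distinct()
      .collect()

    mapping.foreach { case (executorId, containerId) =>
      println(s"executor $executorId -> container $containerId")
    }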

Re: Question on Spark's graph libraries roadmap

2017-03-13 Thread Nicholas Chammas
Since GraphFrames is not part of the Spark project, your GraphFrames-specific questions are probably better directed at the GraphFrames issue tracker: https://github.com/graphframes/graphframes/issues As far as I know, GraphFrames is an active project, though not as active as Spark of course.

Re: Question on Spark's graph libraries roadmap

2017-03-13 Thread enzo
Nick, thanks for the quick answer :) Sadly, the comment on the page doesn’t answer my questions. More specifically: 1. GraphFrames' last activity on GitHub was 2 months ago. The last release was on 12 Nov 2016. Until recently, 2 months was close to a Spark release cycle. Why has there been no major

Re: Question on Spark's graph libraries roadmap

2017-03-13 Thread Nicholas Chammas
Your question is answered here under "Will GraphFrames be part of Apache Spark?", no? http://graphframes.github.io/#what-are-graphframes Nick On Mon, Mar 13, 2017 at 4:56 PM enzo wrote: > Please see this email trail: no answer so far on the user@spark board.

Fwd: Question on Spark's graph libraries roadmap

2017-03-13 Thread enzo
Please see this email trail: no answer so far on the user@spark board. Trying the developer board for better luck. The question: I am a bit confused by the current roadmap for graph and graph analytics in Apache Spark. I understand that we have had for some time two libraries (the following

Re: Should we consider a Spark 2.1.1 release?

2017-03-13 Thread Holden Karau
I'd be happy to do the work of coordinating a 2.1.1 release if that's a thing a committer can do (I think the release coordinator for the most recent Arrow release was a committer, and the final publish step took a PMC member to upload, but other than that I don't remember any issues). On Mon, Mar

Re: Should we consider a Spark 2.1.1 release?

2017-03-13 Thread Sean Owen
It seems reasonable to me, in that other x.y.1 releases have followed ~2 months after the x.y.0 release and it's been about 3 months since 2.1.0. Related: creating releases is tough work, so I feel kind of bad voting for someone else to do that much work. Would it make sense to deputize another

Re: Spark Improvement Proposals

2017-03-13 Thread Sean Owen
Responding to your request for a vote, I meant that this isn't required per se and the consensus here was not to vote on it. Hence the jokes about meta-voting protocol. In that sense nothing new happened process-wise, nothing against ASF norms, if that's your concern. I think it's just an agreed

Re: Should we consider a Spark 2.1.1 release?

2017-03-13 Thread Felix Cheung
+1, there are a lot of good fixes overall, and we need a release for the Python and R packages. From: Holden Karau Sent: Monday, March 13, 2017 12:06:47 PM To: Felix Cheung; Shivaram Venkataraman; dev@spark.apache.org Subject: Should we

Should we consider a Spark 2.1.1 release?

2017-03-13 Thread Holden Karau
Hi Spark Devs, Spark 2.1 has been out since the end of December, and we've got quite a few fixes merged for 2.1.1

Re: Spark Improvement Proposals

2017-03-13 Thread Tom Graves
Another thing I think you should send out is when exactly this takes effect. Is it any major new feature without a pull request? Is it anything major starting with the 2.3 release? Tom On Monday, March 13, 2017 1:08 PM, Tom Graves wrote: I'm not

Re: Spark Improvement Proposals

2017-03-13 Thread Tom Graves
I'm not sure how you can say it's not a new process. If that is the case, why do we need a page documenting it? As a developer, if I want to put up a major improvement I now have to follow the SPIP, whereas before I didn't; that certainly seems like a new process. As a PMC member I now have the

Re: Spark Improvement Proposals

2017-03-13 Thread Sean Owen
It's not a new process, in that it doesn't entail anything not already in http://apache.org/foundation/voting.html . We're just deciding to call a VOTE for this type of code modification. To your point -- yes, it's been around a long time with no further comment, and I called several times for

Re: Spark Improvement Proposals

2017-03-13 Thread Tom Graves
It seems like if you are adding responsibilities you should do a vote. SPIPs require votes from PMC members, so you are now putting more responsibility on them. It feels like we should have an official vote to make sure they (PMC members) agree with that and to make sure everyone pays

Re: Spark Improvement Proposals

2017-03-13 Thread Sean Owen
This ended up proceeding as a normal doc change, instead of precipitating a meta-vote. However, the text that's on the web site now can certainly be further amended if anyone wants to propose a change from here. On Mon, Mar 13, 2017 at 1:50 PM Tom Graves wrote: > I think a

Re: Spark Local Pipelines

2017-03-13 Thread Asher Krim
Thanks for the feedback. If we strip away all of the fancy stuff, my proposal boils down to exposing the logic used in Spark's ML library. In an ideal world, Spark would possibly have relied on an existing ML implementation rather than reimplementing one, since there's very little that's Spark-specific
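
To make the stripped-down version of the proposal concrete, a hedged sketch of the kind of local scoring under discussion, using a binary LogisticRegressionModel as the example (the score helper is illustrative, not an existing Spark API): the learned parameters are extracted once on the Spark side, after which requests can be served in plain JVM code with no SparkContext.

    import org.apache.spark.ml.classification.LogisticRegressionModel

    // Extract the learned parameters once, on the Spark side.
    val model: LogisticRegressionModel = ??? // a previously fitted model
    val weights: Array[Double] = model.coefficients.toArray
    val intercept: Double = model.intercept

    // Serve each request in-process, with no Spark involved.
    def score(features: Array[Double]): Double = {
      val margin = intercept + weights.zip(features).map { case (w, x) => w * x }.sum
      1.0 / (1.0 + math.exp(-margin)) // probability of the positive class
    }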

Re: Spark Local Pipelines

2017-03-13 Thread Dongjin Lee
Although I love Asher's cool idea, I'd rather +1 Sean's view; I think it would be much better for this to live outside of the project. Best, Dongjin On Mon, Mar 13, 2017 at 5:39 PM, Sean Owen wrote: > I'm skeptical. Serving synchronous queries from a model at scale is a >

Re: Spark Improvement Proposals

2017-03-13 Thread Tom Graves
I think a vote here would be good. I think most of the discussion was done by 4 or 5 people and it's a long thread. If nothing else, it summarizes everything and draws people's attention to the change. Tom On Thursday, March 9, 2017 10:55 AM, Sean Owen wrote: I think a

Re: how to construct parameter for model.transform() from datafile

2017-03-13 Thread jinhong lu
Can anyone help? > On 13 Mar 2017, at 19:38, jinhong lu wrote: > > After training the model, I got a result that looks like this: > > > scala> predictionResult.show() > > +-----+--------+-------------+-----------+----------+ > |label|

Re: how to construct parameter for model.transform() from datafile

2017-03-13 Thread jinhong lu
After training the model, I got a result that looks like this:

    scala> predictionResult.show()

    +-----+--------+-------------+-----------+----------+
    |label|features|rawPrediction|probability|prediction|
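
Since the question appears to be how to get new data into the shape transform() expects, a minimal sketch (assuming a spark-shell session named spark, the fitted model from the training step above, and a hypothetical LIBSVM-format file at data/test.libsvm):

    // model.transform() takes a DataFrame with a features column, so read the
    // new data into the same layout that was used for training.
    val testData = spark.read.format("libsvm").load("data/test.libsvm")
    val predictionResult = model.transform(testData)
    predictionResult.select("label", "rawPrediction", "probability", "prediction").show()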

Re: Spark Local Pipelines

2017-03-13 Thread Sean Owen
I'm skeptical. Serving synchronous queries from a model at scale is a fundamentally different activity. As you note, it doesn't logically involve Spark. If it has to happen in milliseconds, it's going to be in-core. Scoring even 10 qps with a Spark job per request is probably a non-starter; think

Re: Spark Local Pipelines

2017-03-13 Thread Georg Heiler
Great idea. I see the same problem. I would suggest checking the following projects as a kick-start as well (not only MLeap): https://github.com/ucbrise/clipper and https://github.com/Hydrospheredata/mist Regards Georg Asher Krim wrote on Sun, 12 Mar 2017 at 23:21: > Hi