Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
The use case of Docker images in general is that you can deploy and develop with exactly the same binary environment - same Java 8, same Scala, same Spark. This makes things repeatable. On Wed, May 25, 2016 at 8:38 PM, Matei Zaharia wrote: > Just wondering, what is the

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Matei Zaharia
Just wondering, what is the main use case for the Docker images -- to develop apps locally or to deploy a cluster? If the image is really just a script to download a certain package name from a mirror, it may be okay to create an official one, though it does seem tricky to make it properly use

Re: Labeling Jiras

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 3:45 PM, Reynold Xin wrote: > I think the risk is that if everybody starts following this, it will become > unmanageable, given the number of organizations involved. > > The two main labels that we actually use are starter + releasenotes. > >

Re: Labeling Jiras

2016-05-25 Thread Reynold Xin
I think the risk is that if everybody starts following this, it will become unmanageable, given the number of organizations involved. The two main labels that we actually use are starter + releasenotes. On Wed, May 25, 2016 at 2:58 PM, Luciano Resende wrote: > > >

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 2:34 PM, Sean Owen wrote: > I don't think the project would bless anything but the standard > release artifacts since only those are voted on. People are free to > maintain whatever they like and even share it, as long as it's clear > it's not from the

Re: Labeling Jiras

2016-05-25 Thread Sean Owen
Yeah I think using labels is fine -- just not if they're for someone's internal purpose. I don't have a problem with using meaningful labels if they're meaningful to everyone. In fact, I'd rather use labels than "umbrella" JIRAs. Labels I have removed as not useful are ones like "patch"

Re: Labeling Jiras

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 2:33 PM, Sean Owen wrote: > I don't think we generally use labels at all except "starter". I > sometimes remove labels when I'm editing a JIRA otherwise, perhaps to > make that point. I don't recall doing this recently. > We have used for other things

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Sean Owen
I don't think the project would bless anything but the standard release artifacts since only those are voted on. People are free to maintain whatever they like and even share it, as long as it's clear it's not from the Apache project. On Wed, May 25, 2016 at 3:41 PM, Marcin Tustin

Re: Labeling Jiras

2016-05-25 Thread Sean Owen
I don't think we generally use labels at all except "starter". I sometimes remove labels when I'm editing a JIRA otherwise, perhaps to make that point. I don't recall doing this recently. However I'd say they should not be used to tag JIRAs for your internal purposes. Have you looked at things

Labeling Jiras

2016-05-25 Thread Luciano Resende
I recently used labels to mark a couple of JIRAs that my team and I are interested in, so it's easier to share a query and check their status. But I noticed that these labels were removed. Are there any issues with labeling JIRAs? Any other suggestions? -- Luciano Resende

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
Ah very nice. Would it be possible to have this blessed into an official image? On Wed, May 25, 2016 at 4:12 PM, Luciano Resende wrote: > > > On Wed, May 25, 2016 at 6:53 AM, Marcin Tustin > wrote: > >> Would it be useful to start baking docker

Re: feedback on dataset api explode

2016-05-25 Thread Koert Kuipers
oh yes, this was by accident, it should have gone to dev On Wed, May 25, 2016 at 4:20 PM, Reynold Xin wrote: > Created JIRA ticket: https://issues.apache.org/jira/browse/SPARK-15533 > > @Koert - Please keep API feedback coming. One thing - in the future, can > you send api

Re: feedback on dataset api explode

2016-05-25 Thread Reynold Xin
Created JIRA ticket: https://issues.apache.org/jira/browse/SPARK-15533 @Koert - Please keep API feedback coming. One thing - in the future, can you send API feedback to the dev@ list instead of user@? On Wed, May 25, 2016 at 1:05 PM, Cheng Lian wrote: > Agree, since

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 6:53 AM, Marcin Tustin wrote: > Would it be useful to start baking docker images? Would anyone find that a > boon to their testing? > > +1, I had done one (still based on 1.6) for some SystemML experiments, I could easily get it based on a nightly

LiveListenerBus with started and stopped flags? Why both?

2016-05-25 Thread Jacek Laskowski
Hi, I'm wondering why LiveListenerBus has two AtomicBoolean flags [1]? Could it not have just one, say started? Why does Spark have to check the stopped state? [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L49-L51

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Max Sperlich
Cartesian joins tend to give a huge result size, and are inherently slow. If RDD B has N records then your result size will be at least N * 30 MB, since you have to replicate all the rows of A for a single record in B. Assuming RDD B has 10,000 records then you can see that your cartesian join
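
The lower bound quoted above (N * 30 MB) can be worked through for the 10,000-record case. This is only a back-of-the-envelope sketch: the function name is hypothetical, the 30 MB figure comes from the original question, and it counts only the replicated rows of A, ignoring the B-side data and any serialization overhead.

```python
def cartesian_result_mb(size_a_mb, records_b):
    """Rough lower bound: every record of B is paired with all rows of A."""
    return size_a_mb * records_b


size_a_mb = 30       # size of RDD A, as stated in the question
records_b = 10_000   # record count of RDD B assumed in the reply

result_mb = cartesian_result_mb(size_a_mb, records_b)
# 30 MB * 10,000 records = 300,000 MB, i.e. roughly 300 GB
print(f"{result_mb:,.0f} MB (~{result_mb / 1000:,.0f} GB)")
```

Even with inputs this small, the estimate is far larger than either RDD, which fits the point that the result size, not the input size, is what drives the cost.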

Re: Cannot build master with sbt

2016-05-25 Thread Nick Pentreath
I've filed https://issues.apache.org/jira/browse/SPARK-15525 For now, you would have to check out sbt-antlr4 at https://github.com/ihji/sbt-antlr4/commit/23eab68b392681a7a09f6766850785afe8dfa53d (since I don't see any branches or tags in the github repo for different versions), and sbt

Cannot build master with sbt

2016-05-25 Thread Yiannis Gkoufas
Hi there, I have cloned the latest version from GitHub. I am using Scala 2.10.x. When I invoke build/sbt clean package, I get exceptions for the sbt-antlr library: [warn] module not found: com.simplytyped#sbt-antlr4;0.7.10 [warn] typesafe-ivy-releases: tried [warn]

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Daniel Darabos
Awesome, thanks! It's very helpful for preparing for the migration. Do you plan to push 2.0.0-preview to Maven too? (I for one would appreciate the convenience.) On Wed, May 25, 2016 at 8:44 AM, Reynold Xin wrote: > In the past the Spark community have created preview

Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Hi All, I have two RDDs A and B, where A is 30 MB and B is 7 MB. A.cartesian(B) is taking too much time. Is there any bottleneck in the cartesian operation? I am using Spark 1.6.0. Regards, Padma Ch

[ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Reynold Xin
In the past the Spark community have created preview packages (not official releases) and used those as opportunities to ask community members to test the upcoming versions of Apache Spark. Several people in the Apache community have suggested we conduct votes for these preview packages and turn