SparkPlan/Shuffle stage reuse with Dataset/DataFrame

2016-10-18 Thread Zhan Zhang
Hi Folks, We have some Dataset/DataFrame use cases that would benefit from reusing the SparkPlan and shuffle stage, for example, the following cases. Because the query optimization and SparkPlan are generated by Catalyst when the query is executed, the underlying RDD lineage is regenerated
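The recomputation described above can be sketched without Spark at all. The snippet below is a toy analogy (the name `expensive_stage` is invented for illustration, not a Spark API): re-executing a query re-runs upstream work unless the intermediate result is explicitly materialized and shared, which is roughly what `Dataset.persist()` provides.

```python
# Illustrative sketch (not Spark code): why re-executing a DataFrame query
# recomputes upstream stages, and how materializing a shared intermediate
# result avoids it.
shuffle_runs = 0

def expensive_stage(data):
    """Stands in for a shuffle stage; each materialization re-runs it."""
    global shuffle_runs
    shuffle_runs += 1
    return sorted(data)

data = [3, 1, 2]

# Without reuse: each query regenerates the lineage and re-runs the stage.
expensive_stage(data)
expensive_stage(data)
print(shuffle_runs)  # 2

# With reuse: materialize once (akin to df.persist()) and share the result.
shuffle_runs = 0
cached = expensive_stage(data)
q1 = cached[:2]
q2 = list(reversed(cached))
print(shuffle_runs)  # 1
```

The analogy is loose: in Spark the plan and lineage are rebuilt by Catalyst on each execution, so without persistence even identical queries cannot share the physical shuffle output.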

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Hyukjin Kwon
+1 if the docs can be exposed more. On 19 Oct 2016 2:04 a.m., "Shivaram Venkataraman" <shiva...@eecs.berkeley.edu> wrote: > +1 - Given that our website is now on github > (https://github.com/apache/spark-website), I think we can move most of > our wiki into the main website. That way we'll only

Re: source for org.spark-project.hive:1.2.1.spark2

2016-10-18 Thread Steve Loughran
On 17 Oct 2016, at 18:26, Ryan Blue wrote: Are these changes that the Hive community has rejected? I don't see a compelling reason to have a long-term Spark fork of Hive. More changes in Hive that haven't been picked up: HIVE-11720 is needed to

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Shivaram Venkataraman
+1 - Given that our website is now on github (https://github.com/apache/spark-website), I think we can move most of our wiki into the main website. That way we'll only have two sources of documentation to maintain: A release specific one in the main repo and the website which is more long lived.

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Matei Zaharia
Is there any way to tie wiki accounts with JIRA accounts? I found it weird that they're not tied at the ASF. Otherwise, moving this into the docs might make sense. Matei > On Oct 18, 2016, at 6:19 AM, Cody Koeninger wrote: > > +1 to putting docs in one clear place. > >

Re: On convenience methods

2016-10-18 Thread Holden Karau
I think what Reynold means is that if it's easy for a developer to build this convenience function using the current Spark API, it probably doesn't need to go into Spark unless it's being done to provide a similar API to a system we are attempting to be semi-compatible with (e.g. if a corresponding
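To make the point concrete, here is a sketch of the kind of convenience function a user can layer on top of existing primitives themselves (the helper `value_counts` is hypothetical, not from Spark or any library), which is exactly the sort of addition that arguably doesn't need to live in a core API:

```python
# Hypothetical "convenience method": it composes trivially from existing
# building blocks (a dict and a loop), so an upstream library wouldn't
# need to ship it.
def value_counts(rows, key):
    """Count occurrences of rows[i][key]."""
    counts = {}
    for row in rows:
        k = row[key]
        counts[k] = counts.get(k, 0) + 1
    return counts

rows = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
result = value_counts(rows, "color")
print(result)  # {'red': 2, 'blue': 1}
```

The design trade-off being debated above is exactly this: every such helper added to the core API carries a maintenance cost, while users who need it can write it in a few lines.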

Re: On convenience methods

2016-10-18 Thread roehst
Sorry, by API do you mean the use of 3rd-party libraries, user code, or something else? Thanks -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/On-convenience-methods-tp19460p19496.html Sent from the Apache Spark Developers List mailing list archive at

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Cody Koeninger
+1 to putting docs in one clear place. On Oct 18, 2016 6:40 AM, "Sean Owen" wrote: > I'm OK with that. The upside to the wiki is that it can be edited directly > outside of a release cycle. However, in practice I find that the wiki is > rarely changed. To me it also serves

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Sean Owen
I'm OK with that. The upside to the wiki is that it can be edited directly outside of a release cycle. However, in practice I find that the wiki is rarely changed. To me it also serves as a place for information that isn't exactly project documentation like "powered by" listings. In a way I'd

Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Holden Karau
Right now the wiki isn't particularly accessible to updates by external contributors. We've already got a Contributing to Spark page which just links to the wiki - how about if we just move the wiki contents over? This way contributors can contribute to our documentation about how to contribute

Re: Contributing to PySpark

2016-10-18 Thread Holden Karau
Hi Krishna, Thanks for your interest in contributing to PySpark! I don't personally use either of those IDEs so I'll leave that part for someone else to answer - but in general you can find the Building Spark documentation at http://spark.apache.org/docs/latest/building-spark.html which includes

Contributing to PySpark

2016-10-18 Thread Krishna Kalyan
Hello, I am a master's student. Could someone please let me know how to set up my dev environment to contribute to PySpark. Questions I had were: a) Should I use IntelliJ IDEA or PyCharm? b) How do I test my changes? Regards, Krishna
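For question (b), PySpark's own test suites are plain `unittest` test cases under `python/pyspark/`, and the repo ships a `python/run-tests` script for running the full suite. A minimal sketch of that test style (the toy `word_count` function is invented here and has no Spark dependency, so it runs anywhere):

```python
# Minimal sketch of the unittest style PySpark's test suites use.
# word_count is a stand-in for the code under test.
import unittest

def word_count(text):
    """Count whitespace-separated words in a string."""
    counts = {}
    for w in text.split():
        counts[w] = counts.get(w, 0) + 1
    return counts

class WordCountTest(unittest.TestCase):
    def test_counts(self):
        self.assertEqual(word_count("a b a"), {"a": 2, "b": 1})

    def test_empty(self):
        self.assertEqual(word_count(""), {})
```

Save it to a file and run it with `python -m unittest <file>`; tests against PySpark proper additionally need a local Spark build, per the Building Spark docs linked above.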

StructuredStreaming status

2016-10-18 Thread Ofir Manor
Hi, I hope it is the right forum. I am looking for some information of what to expect from StructuredStreaming in its next releases to help me choose when / where to start using it more seriously (or where to invest in workarounds and where to wait). I couldn't find a good place where such