Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Hey Sean, Right now we don't publish every 2.11 binary, to avoid a combinatorial explosion in the number of build artifacts we publish (there are other parameters, such as whether Hive is included, etc.). We can revisit this in future feature releases, but .1 releases like this are reserved for bug

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Sean McNamara
Sounds good, that makes sense. Cheers, Sean On Jan 27, 2015, at 11:35 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, Right now we don't publish every 2.11 binary to avoid combinatorial explosion of the number of build artifacts we publish (there are other parameters such as

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Okay - we've resolved all issues with the signatures and keys. However, I'll leave the current vote open for a bit to solicit additional feedback. On Tue, Jan 27, 2015 at 10:43 AM, Sean McNamara sean.mcnam...@webtrends.com wrote: Sounds good, that makes sense. Cheers, Sean On Jan 27, 2015,

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Reynold Xin
Koert, As Mark said, I have already refactored the API so that nothing in catalyst is exposed (and users won't need it anyway). Data types and the Row interface are both outside the catalyst package, in org.apache.spark.sql. On Tue, Jan 27, 2015 at 9:08 AM, Koert Kuipers ko...@tresata.com wrote:

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Dirceu Semighini Filho
Reynold, But with a type alias we will have the same problem, right? If the methods don't receive SchemaRDD anymore, we will have to change our code to migrate from SchemaRDD to DataFrame. Unless we have an implicit conversion between DataFrame and SchemaRDD 2015-01-27 17:18 GMT-02:00 Reynold Xin

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Reynold Xin
Dirceu, That is not possible because one cannot overload return types. SQLContext.parquetFile (and many other methods) needs to return some type, and that type cannot be both SchemaRDD and DataFrame. In 1.3, we will create a type alias for DataFrame called SchemaRDD to not break source
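
A minimal sketch of what such a compatibility alias could look like, assuming a package object in org.apache.spark.sql (the actual 1.3 code may differ):

    // Hedged sketch: old code that names SchemaRDD keeps compiling,
    // because SchemaRDD becomes just another name for DataFrame.
    package org.apache.spark

    package object sql {
      @deprecated("use DataFrame", "1.3.0")
      type SchemaRDD = DataFrame
    }

This sidesteps the return-type problem: parquetFile and friends can declare DataFrame, and callers that annotated SchemaRDD still typecheck.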

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Yes - the key issue is just due to me creating new keys this time around. Anyway, let's take another stab at this. In the meantime, please don't hesitate to test the release itself. - Patrick On Tue, Jan 27, 2015 at 10:00 AM, Sean Owen so...@cloudera.com wrote: Got it. Ignore the SHA512 issue

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Koert Kuipers
that's great. guess i was looking at a somewhat stale master branch... On Tue, Jan 27, 2015 at 2:19 PM, Reynold Xin r...@databricks.com wrote: Koert, As Mark said, I have already refactored the API so that nothing in catalyst is exposed (and users won't need it anyway). Data types, Row

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Dmitriy Lyubimov
It has been pretty evident for some time that's what it is, hasn't it? Yes, that's a better name IMO. On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin r...@databricks.com wrote: Hi, We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to get the community's opinion. The context

Re: talk on interface design

2015-01-27 Thread Reynold Xin
Thanks, Andrew. That's great material. On Mon, Jan 26, 2015 at 10:23 PM, Andrew Ash and...@andrewash.com wrote: In addition to the references you have at the end of the presentation, there's a great set of practical examples based on the learnings from Qt posted here:

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Matei Zaharia
The type alias means your methods can specify either type and they will work. It's just another name for the same type. But Scaladocs and such will show DataFrame as the type. Matei On Jan 27, 2015, at 12:10 PM, Dirceu Semighini Filho dirceu.semigh...@gmail.com wrote: Reynold, But with
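
To illustrate Matei's point, assuming the alias sketched above is in scope:

    object AliasDemo {
      import org.apache.spark.sql.{DataFrame, SchemaRDD}

      // Declared against the old name...
      def rowCount(rdd: SchemaRDD): Long = rdd.count()

      // ...callable with the new one: they are the same type to the
      // compiler, so existing user code keeps compiling unchanged.
      def use(df: DataFrame): Long = rowCount(df)
    }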

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Krishna Sankar
+1 1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests 2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x / 1.2.0 2.1. statistics OK 2.2. Linear/Ridge/Lasso Regression

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Reynold Xin
+1 Tested on Mac OS X On Tue, Jan 27, 2015 at 12:35 PM, Krishna Sankar ksanka...@gmail.com wrote: +1 1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests 2. Tested pyspark, MLlib -

Re: Friendly reminder/request to help with reviews!

2015-01-27 Thread Gurumurthy Yeleswarapu
Hi Patrick: I would love to help with reviews in any way I can. I'm fairly new here - can you help with a pointer to get me started? Thanks From: Patrick Wendell pwend...@gmail.com To: dev@spark.apache.org dev@spark.apache.org Sent: Tuesday, January 27, 2015 3:56 PM Subject: Friendly

Friendly reminder/request to help with reviews!

2015-01-27 Thread Patrick Wendell
Hey All, Just a reminder: as always around release time, we have a very large volume of patches showing up near the deadline. One thing that can help us maximize the number of patches we get in is to have community involvement in performing code reviews. And in particular, doing a thorough review

Re: Use mvn to build Spark 1.2.0 failed

2015-01-27 Thread Sean Owen
You certainly do not need to build Spark as root. It might clumsily overcome a permissions problem in your local env, but probably causes other problems. On Jan 27, 2015 11:18 AM, angel__ angel.alvarez.pas...@gmail.com wrote: I had that problem when I tried to build Spark 1.2. I don't exactly

Re: Use mvn to build Spark 1.2.0 failed

2015-01-27 Thread angel__
I had that problem when I tried to build Spark 1.2. I don't exactly know what is causing it, but I guess it might have something to do with user permissions. I could finally fix this by building Spark as root user (now I'm dealing with another problem, but ...that's another story...) -- View

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Evan R. Sparks
I'm +1 on this, although a little worried about unknowingly introducing SparkSQL dependencies every time someone wants to use this. It would be great if the interface can be abstract and the implementation (in this case, SparkSQL backend) could be swapped out. One alternative suggestion on the
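
One possible reading of that suggestion, as a rough sketch (these names are hypothetical, not actual Spark API):

    // Hypothetical: an abstract data-frame interface that user code
    // depends on, with the SparkSQL-backed version as one implementation.
    trait DataFrameLike {
      def select(cols: String*): DataFrameLike
      def filter(condition: String): DataFrameLike
      def count(): Long
    }

    // Only this implementation would depend on SparkSQL; callers
    // program against the trait and never see catalyst types.
    class SqlBackedDataFrame extends DataFrameLike {
      def select(cols: String*): DataFrameLike = ???  // delegate to SparkSQL
      def filter(condition: String): DataFrameLike = ???
      def count(): Long = ???
    }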

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Sean Owen
I think there are several signing / hash issues that should be fixed before this release. Hashes: http://issues.apache.org/jira/browse/SPARK-5308 https://github.com/apache/spark/pull/4161 The hashes here are correct, but have two issues: As noted in the JIRA, the format of the hash file is

Re: Maximum size of vector that reduce can handle

2015-01-27 Thread Boromir Widas
I am running into this issue as well, when storing large Arrays as the value in a kv pair and then doing a reduceByKey. Can one of the experts please comment on whether it would make sense to add an operation that adds values in place, like accumulators do - this would essentially merge the vectors for a
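
In the meantime, the merge function passed to reduceByKey may mutate and return its first argument, which avoids allocating a fresh array per merge - a sketch under that assumption (Spark documents this idiom for aggregateByKey):

    object VectorSum {
      import org.apache.spark.SparkContext._  // pair-RDD implicits (Spark 1.2)
      import org.apache.spark.rdd.RDD

      // Element-wise in-place addition: mutate and return the left
      // argument instead of building a new array on every merge.
      // Assumes both arrays have the same length.
      def mergeInPlace(a: Array[Double], b: Array[Double]): Array[Double] = {
        var i = 0
        while (i < a.length) { a(i) += b(i); i += 1 }
        a
      }

      def sumByKey(rdd: RDD[(String, Array[Double])]): RDD[(String, Array[Double])] =
        rdd.reduceByKey(mergeInPlace)
    }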

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Mark Hamstra
In master, Reynold has already taken care of moving Row into org.apache.spark.sql; so, even though the implementation of Row (and GenericRow et al.) is in Catalyst (which is more optimizer than parser), that needn't be of concern to users of the API in its most recent state. On Tue, Jan 27, 2015

Re: renaming SchemaRDD -> DataFrame

2015-01-27 Thread Michael Malak
I personally have no preference DataFrame vs. DataTable, but only wish to lay out the history and etymology, simply because I'm into that sort of thing. Frame comes from Marvin Minsky's 1970s AI construct: slots and the data that go in them. The S programming language (precursor to R) adopted

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Sean Owen
Got it. Ignore the SHA512 issue, since these aren't somehow expected by a policy or Maven to be in a certain format; just wondered if the difference was intended. The Maven way of generating the SHA1 hashes is to set this on the install plugin, AFAIK, although I'm not sure if the intent was to hash
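
For reference, the install-plugin setting Sean is most likely referring to is createChecksum - a guess at the intended configuration, not a confirmed fix for the release tooling:

    <!-- In the project's pom.xml: have `mvn install` emit
         .md5/.sha1 files alongside each installed artifact. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-install-plugin</artifactId>
      <configuration>
        <createChecksum>true</createChecksum>
      </configuration>
    </plugin>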