Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11

2015-06-29 Thread Iulian Dragoș
On Mon, Jun 29, 2015 at 3:02 AM, Alessandro Baretta alexbare...@gmail.com wrote: I am building the current master branch with Scala 2.11 following these instructions: "Building for Scala 2.11: To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11 property:"
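The build invocation referenced in the quoted docs can be sketched as follows (a sketch based on the Spark 1.x build documentation; the exact helper-script name and profile flags may differ by branch, and these commands only run from a Spark source checkout):

```shell
# Switch the POMs to Scala 2.11 (helper script shipped in the Spark 1.x source tree)
./dev/change-version-to-2.11.sh

# Build a Spark package compiled against Scala 2.11;
# -Dscala-2.11 activates the scala-2.11 build profile
mvn -Dscala-2.11 -DskipTests clean package
```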

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Sean Owen
+1 sigs, license, etc check out. All tests pass for me in the Hadoop 2.6 + Hive configuration on Ubuntu. (I still get those pesky cosmetic UDF test failures in Java 8, but they are clearly just test issues.) I'll follow up on retargeting 1.4.1 issues afterwards as needed, but again feel free to

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Tom Graves
+1. Tested on yarn on hadoop 2.6 cluster Tom On Monday, June 29, 2015 2:04 AM, Tathagata Das tathagata.das1...@gmail.com wrote: @Ted, could you elaborate more on what was the test command that you ran? What profiles, using SBT or Maven?  TD On Sun, Jun 28, 2015 at 12:21 PM, Patrick

Re: how can I write a language wrapper?

2015-06-29 Thread Daniel Darabos
Hi Vasili, It so happens that the entire SparkR code was merged to Apache Spark in a single pull request. So you can see at once all the required changes in https://github.com/apache/spark/pull/5096. It's 12,043 lines and took more than 20 people about a year to write as I understand it. On Mon,

Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11

2015-06-29 Thread Steve Loughran
On 29 Jun 2015, at 11:27, Iulian Dragoș iulian.dra...@typesafe.com wrote: On Mon, Jun 29, 2015 at 3:02 AM, Alessandro Baretta alexbare...@gmail.com wrote: I am building the current master branch with Scala 2.11 following these

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Yin Huai
+1. I tested those SQL blocker bugs in my laptop and they have been fixed. On Mon, Jun 29, 2015 at 6:51 AM, Sean Owen so...@cloudera.com wrote: +1 sigs, license, etc check out. All tests pass for me in the Hadoop 2.6 + Hive configuration on Ubuntu. (I still get those pesky cosmetic UDF test

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Ted Yu
Here is the command I used: mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package Java: 1.8.0_45 OS: Linux x.com 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Cheers On Mon, Jun 29, 2015 at 12:04 AM, Tathagata Das tathagata.das1...@gmail.com

Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11

2015-06-29 Thread Alessandro Baretta
Steve, It was indeed a protocol buffers issue. I am able to build spark now. Thanks. On Mon, Jun 29, 2015 at 7:37 AM, Steve Loughran ste...@hortonworks.com wrote: On 29 Jun 2015, at 11:27, Iulian Dragoș iulian.dra...@typesafe.com wrote: On Mon, Jun 29, 2015 at 3:02 AM, Alessandro

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Ted Yu
The test passes when run alone on my machine as well. Please run the test suite. Thanks On Mon, Jun 29, 2015 at 2:01 PM, Tathagata Das tathagata.das1...@gmail.com wrote: @Ted, I ran the following two commands. mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DskipTests clean package mvn

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Andrew Or
Hi Ted, We haven't observed a StreamingContextSuite failure on our test infrastructure recently. Given that we cannot reproduce it even locally, it is unlikely that this uncovers a real bug. Even if it does, I would not block the release on it, because many in the community are waiting for a few

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:26 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4.

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Ted Yu
Andrew: I agree with your assessment. Cheers On Mon, Jun 29, 2015 at 3:33 PM, Andrew Or and...@databricks.com wrote: Hi Ted, We haven't observed a StreamingContextSuite failure on our test infrastructure recently. Given that we cannot reproduce it even locally it is unlikely that this

Re: Question about Spark process and thread

2015-06-29 Thread Reynold Xin
Most of those threads are not for task execution. They are for RPC, scheduling, ... On Sun, Jun 28, 2015 at 8:32 AM, Dogtail Ray spark.ru...@gmail.com wrote: Hi, I was looking at Spark source code, and I found that when launching a Executor, actually Spark is launching a threadpool; each
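The threading model described here (a single executor process hosting a pool of task threads, with further threads reserved for RPC and scheduling) can be illustrated with a minimal stand-in, written in plain Python rather than Spark's Scala internals; the names `run_tasks` and `cores` are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustration only: an "executor" modeled as a fixed-size thread pool.
# Each submitted "task" runs on one pool thread, analogous to how a
# Spark executor runs each task in a thread from its task pool rather
# than in a separate process.
def run_tasks(tasks, cores=4):
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # map preserves submission order, even though the tasks
        # themselves execute concurrently on the pool threads
        return list(pool.map(lambda t: t(), tasks))

results = run_tasks([lambda i=i: i * i for i in range(8)])
```

The other threads Reynold mentions (RPC, scheduling) would live alongside this pool in the same JVM, which is why a thread dump of an executor shows far more threads than configured cores.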

Dataframes filter by count fails with python API

2015-06-29 Thread Andrew Vykhodtsev
Dear developers, I found the following behaviour that I think is a minor bug. If I apply groupBy and count in the Python API, the resulting data frame has the grouped columns and a field named count. Filtering by that field does not work because the parser thinks it is a keyword: x =

Re: how can I write a language wrapper?

2015-06-29 Thread Vasili I. Galchin
Shivaram, Vis-a-vis Haskell support, I am reading DataFrame.R, SparkRBackend*, context.R, et al. Am I headed in the correct direction? Yes or no, please give more guidance. Thank you. Kind regards, Vasili On Tue, Jun 23, 2015 at 1:46 PM, Shivaram Venkataraman

Re: Dataframes filter by count fails with python API

2015-06-29 Thread Reynold Xin
Hi Andrew, Thanks for the email. This is a known bug with the expression parser. We will hopefully fix this in 1.5. There are more keywords with the expression parser, and we have already got rid of most of them. Count is still there due to the handling of count distinct, but we plan to get rid
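Until the parser fix lands, the usual workaround can be sketched as follows (a hypothetical illustration, not a tested recipe; it assumes the 1.4-era PySpark DataFrame API and an existing `DataFrame`, and it cannot run without a Spark installation):

```python
from pyspark.sql import functions as F

# df: any existing DataFrame with a "category" column (illustrative name)
grouped = df.groupBy("category").count()

# A string expression can trip the expression parser, because `count`
# is still handled as a keyword (kept for count-distinct support):
#   grouped.filter("count > 10")
# Referring to the column object bypasses the expression parser:
filtered = grouped.filter(F.col("count") > 10)
```

Indexing the DataFrame directly, as in `grouped["count"] > 10`, is an equivalent column-object form that likewise avoids the string parser.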

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-29 Thread Tathagata Das
@Ted, could you elaborate more on what was the test command that you ran? What profiles, using SBT or Maven? TD On Sun, Jun 28, 2015 at 12:21 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Krishna - this is still the current release candidate. - Patrick On Sun, Jun 28, 2015 at 12:14 PM,

Re: how can I write a language wrapper?

2015-06-29 Thread Justin Uang
My guess is that if you are just wrapping the spark sql APIs, you can get away with not having to reimplement a lot of the complexities in Pyspark like storing everything in RDDs as pickled byte arrays, pipelining RDDs, doing aggregations and joins in the python interpreters, etc. Since the