Re: Tungsten's Vectorized Execution

2015-05-22 Thread Reynold Xin
Yijie, As Davies said, it will take us a while to get to vectorized execution. However, before that, we are going to refactor code generation to push it into each expression: https://issues.apache.org/jira/browse/SPARK-7813 Once this one is in (probably in the next 2 or 3 weeks), there will be

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Justin Uang
I'm working on one of the Palantir teams using Spark, and here is our feedback: We have encountered three issues when upgrading to Spark 1.4.0. I'm not sure they qualify as a -1, as they come from using non-public APIs and multiple Spark contexts for the purposes of testing, but I do want to

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Shivaram Venkataraman
Thanks for catching this. I'll check with Patrick to see why the R API docs are not getting included. On Fri, May 22, 2015 at 2:44 PM, Andrew Psaltis psaltis.and...@gmail.com wrote: All, Should all the docs work from http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R API

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Andrew Psaltis
All, Should all the docs work from http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R API docs 404. On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Michael Armbrust
Thanks for the feedback. As you stated, UDTs are explicitly not a public API, as we knew we were going to be making breaking changes to them. We hope to stabilize / open them up in future releases. Regarding the Hive issue, have you tried using TestHive instead? This is what we use for testing

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Patrick Wendell
Thanks Andrew, the doc issue should be fixed in RC2 (if not, please chime in!). R was missing in the build environment. - Patrick On Fri, May 22, 2015 at 3:33 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Thanks for catching this. I'll check with Patrick to see why the R API docs

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread jameszhouyi
We came across a Spark SQL issue (https://issues.apache.org/jira/browse/SPARK-7119) that causes queries to fail. I'm not sure whether this warrants a -1 vote on RC1.

Re: Testing spark applications

2015-05-22 Thread Josh Rosen
I think that @holdenk's *spark-testing-base* project publishes some of these test classes as well as some helper classes for testing streaming jobs: https://github.com/holdenk/spark-testing-base On Thu, May 21, 2015 at 10:39 PM, Reynold Xin r...@databricks.com wrote: It is just 15 lines of code

UDTs and StringType upgrade issue for Spark 1.4.0

2015-05-22 Thread Justin Uang
We ran into an issue regarding Strings in UDTs when upgrading to Spark 1.4.0-rc. I understand that it's a non-public API, so it's expected, but I just wanted to bring it up for awareness and so we can maybe change the release notes to mention them =) Our UDT was serializing to a StringType, but

Available Functions in SparkR

2015-05-22 Thread Eskilson,Aleksander
I’ve built Spark 1.4.0 for Hadoop 2.6 on CDH 5.4 and am testing SparkR. I’ve loaded up SparkR using the executable in /bin. The library import library(SparkR) seems to no longer import some of the same functions as it did for SparkR before the merge, e.g. textFile, lapply, etc. but it does

Unable to build from assembly

2015-05-22 Thread Manoj Kumar
Hello, I updated my master from upstream recently, and on running build/sbt assembly it gives me this error [error] /home/manoj/spark/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java:106: error: MyJavaLogisticRegression is not abstract and does not override

Spark Bug: Counting twice with different results

2015-05-22 Thread Niklas Wilcke
Hi, I have noticed some strange behavior of Spark core in combination with MLlib. Running my pipeline results in an RDD. Calling count() on this RDD results in 160055. Calling count() directly afterwards results in 160044 and so on. The RDD seems to be unstable. How can that be? Do you maybe have

Re: Unable to build from assembly

2015-05-22 Thread Ted Yu
What version of Java do you use? Can you run this command first: build/sbt clean. BTW, please see [SPARK-7498] [MLLIB] add varargs back to setDefault. Cheers On Fri, May 22, 2015 at 7:34 AM, Manoj Kumar manojkumarsivaraj...@gmail.com wrote: Hello, I updated my master from upstream

Re: Unable to build from assembly

2015-05-22 Thread Edoardo Vacchi
Confirming: master was broken this morning, though it should be OK now. On Fri, May 22, 2015 at 4:34 PM, Manoj Kumar manojkumarsivaraj...@gmail.com wrote: Hello, I updated my master from upstream recently, and on running build/sbt assembly it gives me this error [error]

Re: Change for submitting to yarn in 1.3.1

2015-05-22 Thread Marcelo Vanzin
Hi Kevin, One thing that might help you in the meantime, while we work on a better interface for all this... On Thu, May 21, 2015 at 5:21 PM, Kevin Markey kevin.mar...@oracle.com wrote: Making *yarn.Client* private has prevented us from moving from Spark 1.0.x to Spark 1.2 or 1.3 despite many

Re: Unable to build from assembly

2015-05-22 Thread Manoj Kumar
A clean build worked. Thanks everyone for the help! On Fri, May 22, 2015 at 8:42 PM, Edoardo Vacchi uncommonnonse...@gmail.com wrote: confirming. master has been broken in the morning; currently it should be ok, though On Fri, May 22, 2015 at 4:34 PM, Manoj Kumar

Re: Spark Bug: Counting twice with different results

2015-05-22 Thread Sean Owen
This is expected, for example, if your RDD is the result of random sampling, or if the underlying source is not consistent. You haven't shown any code. On Fri, May 22, 2015 at 3:34 PM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote: Hi, I have recognized a strange behavior of spark core
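
The explanation in the reply above (an uncached result that depends on random sampling is recomputed on every action, so two counts can disagree) can be illustrated with a minimal sketch. This is plain Python, not Spark code: the `LazySample` class, its `cache()` method, and the `seed` parameter are hypothetical stand-ins invented for illustration, and nothing here reflects the original poster's pipeline, which was not shown.

```python
import random


class LazySample:
    """Hypothetical stand-in for a lazily evaluated, uncached dataset.

    Every call to count() re-runs the sampling step unless the result
    has been cached, loosely mirroring how an uncached lineage is
    recomputed for each action.
    """

    def __init__(self, source, fraction, seed=None):
        self.source = source
        self.fraction = fraction
        self.seed = seed          # None means "reseed from the OS each time"
        self._cached = None

    def _compute(self):
        # A fresh generator per computation: with seed=None the sample
        # differs between runs, with a fixed seed it is reproducible.
        rng = random.Random(self.seed)
        return [x for x in self.source if rng.random() < self.fraction]

    def cache(self):
        # Materialize once; later actions reuse the stored result.
        self._cached = self._compute()
        return self

    def count(self):
        data = self._cached if self._cached is not None else self._compute()
        return len(data)


unseeded = LazySample(range(200_000), 0.8)
print(unseeded.count(), unseeded.count())   # may print two different numbers

seeded = LazySample(range(200_000), 0.8, seed=42)
print(seeded.count(), seeded.count())       # same number twice

cached = LazySample(range(200_000), 0.8).cache()
print(cached.count(), cached.count())       # same number twice
```

Under these assumptions, either fixing the seed or materializing the result once makes repeated counts agree. Whether that applies to the reported case depends on the pipeline code, which the thread does not include.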