Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Chang Ya-Hsuan
Thanks for your quick reply. I will test several pypy versions and report the results later. On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen wrote: > I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's > docs say that we only support PyPy 2.3+. Could you

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Josh Rosen
I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's docs say that we only support PyPy 2.3+. Could you try using a newer PyPy version to see if that works? I just checked and it looks like our Jenkins tests are running against PyPy 2.5.1, so that version is known to work. I'm

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Chang Ya-Hsuan
I've tested the following pypy versions against spark-1.5.1: pypy-2.2.1, pypy-2.3, pypy-2.3.1, pypy-2.4.0, pypy-2.5.0, pypy-2.5.1, pypy-2.6.0, pypy-2.6.1. I ran $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy /path/to/spark-1.5.1/bin/pyspark and only pypy-2.2.1 failed. Any suggestion
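
A quick way to repeat that check across installs is a small shell loop; the paths below are placeholders taken from the command above, so adjust them to your layout:

```shell
# Smoke-test several PyPy installs against one Spark build (sketch).
# SPARK_HOME and PYPY_ROOT are placeholder paths; edit them for your machine.
SPARK_HOME="/path/to/spark-1.5.1"
PYPY_ROOT="/opt"
for v in 2.3 2.3.1 2.4.0 2.5.0 2.5.1 2.6.0 2.6.1; do
  cmd="PYSPARK_PYTHON=$PYPY_ROOT/pypy-$v/bin/pypy $SPARK_HOME/bin/pyspark"
  # Dry run: print each command; drop the echo to actually launch the shells.
  echo "$cmd"
done
```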

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-11-05 Thread Sjoerd Mulder
Hi Reynold, I had version 2.6.1 in my project which was provided by the fine folks from spring-boot-dependencies. Now have overridden it to 2.7.8 :) Sjoerd 2015-11-01 8:22 GMT+01:00 Reynold Xin : > Thanks for reporting it, Sjoerd. You might have a different version of >

Fwd: dataframe slow down with tungsten turn on

2015-11-05 Thread gen tang
-- Forwarded message -- From: gen tang Date: Fri, Nov 6, 2015 at 12:14 AM Subject: Re: dataframe slow down with tungsten turn on To: "Cheng, Hao" Hi, My application is as follows: 1. create dataframe from hive table 2. transform

Recommended change to core-site.xml template

2015-11-05 Thread Christian
We ended up reading and writing to S3 a ton in our Spark jobs. For this to work, we had to add s3a and s3 key/secret pairs. We also had to add fs.hdfs.impl to get these things to work. I thought maybe I'd share what we did and it might be worth adding these to the spark conf for out
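
In core-site.xml terms, the additions described look roughly like this (a sketch using Hadoop's standard property names; the key values are placeholders):

```xml
<!-- Sketch of the entries described above; values are placeholders. -->
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```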

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
Thanks for sharing this, Christian. What build of Spark are you using? If I understand correctly, if you are using Spark built against Hadoop 2.6+ then additional configs alone won't help because additional libraries also need to be installed.

Re: Recommended change to core-site.xml template

2015-11-05 Thread Shivaram Venkataraman
Thanks for investigating this. The right place to add these is the core-site.xml template we have at https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/spark/conf/core-site.xml and/or

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
> I am using both 1.4.1 and 1.5.1. That's the Spark version. I'm wondering what version of Hadoop your Spark is built against. For example, when you download Spark you have to select from a number of packages (under "Choose a package type"), and each is

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of the new instance-profile feature, which greatly helps with this as well. Without the instance-profile, we got it working by copying a .aws/credentials file up to each node. We could easily automate that through the templates.

Re: Master build fails ?

2015-11-05 Thread Steve Loughran
SBT/Ivy pulls in the most recent version of a JAR, whereas Maven pulls in the "closest", where closest is the lowest distance/depth from the root. > On 5 Nov 2015, at 18:53, Marcelo Vanzin wrote: > > Seems like it's an sbt issue, not a maven one, so "dependency:tree" >

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Does anyone know how to get something similar to "mvn dependency:tree" from sbt? mvn dependency:tree with hadoop 2.6.0 does not show any instances of guava 16... On Thu, Nov 5, 2015 at 11:37 AM, Ted Yu wrote: > build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver >

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-05 Thread Nicholas Chammas
-0 The spark-ec2 version is still set to 1.5.1. Nick On Wed, Nov 4, 2015 at 8:20 PM Egor Pahomov wrote: > +1 > > Things, which our infrastructure use and I checked: > > Dynamic allocation > Spark

Re: Master build fails ?

2015-11-05 Thread Ted Yu
build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0 -DskipTests assembly The above command fails on Mac. build/sbt -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver -Pkinesis-asl -DskipTests assembly The above command, used by Jenkins, passes. That's why the build error

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Man that command is slow. Anyway, it seems guava 16 is being brought in transitively by curator 2.6.0, which should have been overridden by the explicit dependency on curator 2.4.0, but apparently, as Steve mentioned, sbt/ivy decided to break things, so I'll be adding some exclusions. On Thu, Nov 5,
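
An exclusion of the kind described would look roughly like this in sbt syntax (a sketch; the Curator and Guava coordinates are real, but where this lands in Spark's build files is not shown in the thread):

```scala
// build.sbt sketch: stop Curator from dragging a newer Guava onto the classpath.
libraryDependencies += "org.apache.curator" % "curator-recipes" % "2.4.0" excludeAll (
  ExclusionRule(organization = "com.google.guava")
)
```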

Re: [BUILD SYSTEM] quick jenkins downtime, november 5th 7am

2015-11-05 Thread shane knapp
well, i forgot to put this on my calendar and didn't get around to getting it done this morning. :) anyways, i'll be shooting for tomorrow (friday) morning instead. shane On Mon, Nov 2, 2015 at 9:55 AM, shane knapp wrote: > i'd like to take jenkins down briefly thursday

Re: pyspark with pypy not work for spark-1.5.1

2015-11-05 Thread Josh Rosen
You could try running PySpark's own unit tests. Try ./python/run-tests --help for instructions. On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan wrote: > I've test on following pypy version against to spark-1.5.1 > > pypy-2.2.1 > pypy-2.3 > pypy-2.3.1 > pypy-2.4.0 >

Re: Master build fails ?

2015-11-05 Thread Dilip Biswal
Hello, I am getting the same build error about not being able to find com.google.common.hash.HashCodes. Is there a solution to this? Regards, Dilip Biswal Tel: 408-463-4980 dbis...@us.ibm.com From: Jean-Baptiste Onofré To: Ted Yu Cc:

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Seems like it's an sbt issue, not a maven one, so "dependency:tree" might not help. Still, the command line would be helpful. I use sbt and don't see this. On Thu, Nov 5, 2015 at 10:44 AM, Marcelo Vanzin wrote: > Hi Jeff, > > On Tue, Nov 3, 2015 at 2:50 AM, Jeff Zhang

Re: Spark 1.6 Release Schedule

2015-11-05 Thread Michael Armbrust
Sorry for the delay due to traveling... The branch has been cut. At this point anything that we want to go into Spark 1.6 will need to be cherry-picked. Please be cautious when doing so, and contact me if you are uncertain. Michael On Sun, Nov 1, 2015 at 4:16 AM, Sean Owen

Re: How to force statistics calculation of Dataframe?

2015-11-05 Thread Reynold Xin
If your data came from RDDs (i.e. not a file system based data source), and you don't want to cache, then no. On Wed, Nov 4, 2015 at 3:51 PM, Charmee Patel wrote: > Due to other reasons we are using spark sql, not dataframe api. I saw that > broadcast hint is only

Re: Master build fails ?

2015-11-05 Thread Ted Yu
Dilip: Can you give the command you used? Which release were you building? What OS did you build on? Cheers On Thu, Nov 5, 2015 at 10:21 AM, Dilip Biswal wrote: > Hello, > > I am getting the same build error about not being able to find >

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Hi Jeff, On Tue, Nov 3, 2015 at 2:50 AM, Jeff Zhang wrote: > Looks like it's due to guava version conflicts, I see both guava 14.0.1 and > 16.0.1 under lib_managed/bundles. Anyone meet this issue too ? What command line are you using to build? Can you run "mvn dependency:tree"

Re: Master build fails ?

2015-11-05 Thread Dilip Biswal
Hello Ted, Thanks for your response. Here is the command I used: build/sbt clean build/sbt -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0 -DskipTests assembly I am building on CentOS and on the master branch. One other thing: I was able to build fine with the above

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
Answering my own question: "dependency-graph" On Thu, Nov 5, 2015 at 11:44 AM, Marcelo Vanzin wrote: > Does anyone know how to get something similar to "mvn dependency:tree" from > sbt? > > mvn dependency:tree with hadoop 2.6.0 does not show any instances of guava > 16...
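
In sbt of this era, the `dependency-graph` task comes from the sbt-dependency-graph plugin; enabling it is a one-liner (the plugin version shown is an assumption, check for the latest):

```scala
// project/plugins.sbt sketch: provides `sbt dependency-graph` and `dependency-tree`.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")
```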

Re: Need advice on hooking into Sql query plan

2015-11-05 Thread Jörn Franke
Would it be possible to use views to address some of your requirements? Alternatively it might be better to parse it yourself. There are open source libraries for that, if you really need a complete SQL parser. Do you want to do it on subqueries? > On 05 Nov 2015, at 23:34, Yana Kadiyska

Re: Need advice on hooking into Sql query plan

2015-11-05 Thread Reynold Xin
You can hack around this by constructing logical plans yourself and then creating a DataFrame in order to execute them. Note that this is all depending on internals of the framework and can break when Spark upgrades. On Thu, Nov 5, 2015 at 4:18 PM, Yana Kadiyska wrote:

Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion: http://search-hadoop.com/m/q3RTtPnPnzwOhBr FYI On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch wrote: > Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build > environment by updating the pom.xml in each of the subprojects. If you

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
I created the cluster with the following: --hadoop-major-version=2 --spark-version=1.4.1 from: spark-1.5.1-bin-hadoop1 Are you saying there might be different behavior if I download spark-1.5.1-hadoop-2.6 and create my cluster? On Thu, Nov 5, 2015 at 1:28 PM, Christian

State of the Build

2015-11-05 Thread Jakob Odersky
Hi everyone, in the process of learning Spark, I wanted to get an overview of the interaction between all of its sub-projects. I therefore decided to have a look at the build setup and its dependency management. Since I am a lot more comfortable using sbt than maven, I decided to try to port the

Need advice on hooking into Sql query plan

2015-11-05 Thread Yana Kadiyska
Hi folks, not sure if this belongs to the dev or user list... sending to dev as it seems a bit convoluted. I have a UI in which we allow users to write ad-hoc queries against a (very large, partitioned) table. I would like to analyze the queries prior to execution for two purposes: 1. Reject

Re: Need advice on hooking into Sql query plan

2015-11-05 Thread Yana Kadiyska
I don't think a view would help -- in the case of under-constraining, I want to make sure that the user is constraining a column (e.g. I want to restrict them to querying a single partition at a time but I don't care which one)...a view per partition value is not practical due to the fairly high

Re: State of the Build

2015-11-05 Thread Stephen Boesch
Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build environment by updating the pom.xml in each of the subprojects. If you were able to come up with a structure that avoids that approach it would be an improvement. 2015-11-05 15:38 GMT-08:00 Jakob Odersky

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
Spark 1.5.1-hadoop1 On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > > I am using both 1.4.1 and 1.5.1. > > That's the Spark version. I'm wondering what version of Hadoop your Spark > is built against. > > For example, when you download Spark >

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
Even with the changes I mentioned above? On Thu, Nov 5, 2015 at 8:10 PM Nicholas Chammas wrote: > Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you > cannot access S3, unfortunately. > > On Thu, Nov 5, 2015 at 3:53 PM Christian

Re: Recommended change to core-site.xml template

2015-11-05 Thread Christian
Oh right. I forgot about the libraries being removed. On Thu, Nov 5, 2015 at 10:35 PM Nicholas Chammas wrote: > I might be mistaken, but yes, even with the changes you mentioned you will > not be able to access S3 if Spark is built against Hadoop 2.6+ unless you >

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
I might be mistaken, but yes, even with the changes you mentioned you will not be able to access S3 if Spark is built against Hadoop 2.6+ unless you install additional libraries. The issue is explained in SPARK-7481 and SPARK-7442
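
A common workaround (a sketch, not from this thread) is to fetch the missing S3 connector at launch time with `--packages`; the coordinates below assume a Hadoop 2.6.0 build:

```shell
# Sketch: pull the S3A connector back onto the classpath when starting Spark.
# The artifact coordinates are an assumption for a Hadoop 2.6.0 build.
PKGS="org.apache.hadoop:hadoop-aws:2.6.0"
cmd="spark-shell --packages $PKGS"
# Dry run: print the command; run it on a machine with Spark on the PATH.
echo "$cmd"
```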

Re: State of the Build

2015-11-05 Thread Sean Owen
Maven isn't 'legacy', or supported for the benefit of third parties. SBT had some behaviors / problems that Maven didn't relative to what Spark needs. SBT is a development-time alternative only, and partly generated from the Maven build. On Fri, Nov 6, 2015 at 1:48 AM, Koert Kuipers

Re: State of the Build

2015-11-05 Thread Koert Kuipers
People who do upstream builds of spark (think bigtop and hadoop distros) are used to legacy systems like maven, so maven is the default build. I don't think it will change. Any improvements for the sbt build are of course welcome (it is still used by many developers), but I would not do anything

Re: Master build fails ?

2015-11-05 Thread Marcelo Vanzin
FYI I pushed a fix for this to github; so if you pull everything should work now. On Thu, Nov 5, 2015 at 12:07 PM, Marcelo Vanzin wrote: > Man that command is slow. Anyway, it seems guava 16 is being brought > transitively by curator 2.6.0 which should have been overridden

RE: dataframe slow down with tungsten turn on

2015-11-05 Thread Cheng, Hao
What’s the size of the raw data and the result data? Are there any other changes (HDFS or Spark configuration, your own code, etc.) besides the Spark binary? Can you monitor the IO/CPU state while executing the final stage? It would be great if you could paste the call stack if you observe

Re: State of the Build

2015-11-05 Thread Mark Hamstra
There was a lot of discussion that preceded our arriving at this statement in the Spark documentation: "Maven is the official build tool recommended for packaging Spark, and is the build of reference." https://spark.apache.org/docs/latest/building-spark.html#building-with-sbt I'm not aware of

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you cannot access S3, unfortunately. On Thu, Nov 5, 2015 at 3:53 PM Christian wrote: > I created the cluster with the following: > > --hadoop-major-version=2 > --spark-version=1.4.1 > > from:

Re: State of the Build

2015-11-05 Thread Patrick Wendell
Hey Jakob, The builds in Spark are largely maintained by me, Sean, and Michael Armbrust (for SBT). For historical reasons, Spark supports both a Maven and SBT build. Maven is the build of reference for packaging Spark and is used by many downstream packagers and to build all Spark releases. SBT