How to run specific sparkSQL test with maven

2014-08-01 Thread 田毅
Hi everyone! Could anyone tell me how to run a specific Spark SQL test with Maven? For example, I want to test HiveCompatibilitySuite. I ran "mvn test -Dtest=HiveCompatibilitySuite", but it did not work. BTW, is there any information about how to build a test environment for Spark SQL? I got this

Re:How to run specific sparkSQL test with maven

2014-08-01 Thread witgo
You can try these commands: ./sbt/sbt assembly, then ./sbt/sbt -Phive "test-only *.HiveCompatibilitySuite".

Re: Re: How to run specific sparkSQL test with maven

2014-08-01 Thread Jeremy Freeman
With Maven you can run a particular test suite like this: mvn -DwildcardSuites=org.apache.spark.sql.SQLQuerySuite test. See the note (under "Spark Tests in Maven") here: http://spark.apache.org/docs/latest/building-with-maven.html
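Putting the two replies together, a minimal shell session for running one suite might look like the following. The fully-qualified package of HiveCompatibilitySuite is an assumption here; check the actual location in your checkout, and note these commands need a full Spark source tree.

```shell
# sbt: build the assembly once, then run a single suite
# (the -Phive profile enables the Hive subproject).
./sbt/sbt assembly
./sbt/sbt -Phive "test-only *.HiveCompatibilitySuite"

# Maven equivalent, via the ScalaTest plugin's wildcardSuites property
# (package name below is an assumption -- verify it in your tree):
mvn -Phive -DwildcardSuites=org.apache.spark.sql.hive.execution.HiveCompatibilitySuite test
```

Plain -Dtest= only selects JUnit tests through Surefire, which is why the original command did nothing for a ScalaTest suite.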

Re: [brainstorming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
Heya! Dunno if these ideas are still in the air or fell into the warp ^^. However, there is a paper on avocado (http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project8_report.pdf) that mentions a way of working with their data (sequence reads) in a windowed manner without

Re: How to run specific sparkSQL test with maven

2014-08-01 Thread Michael Armbrust
"It seems that the HiveCompatibilitySuite needs a Hadoop and Hive environment, am I right? Relative path in absolute URI: file:$%7Bsystem:test.tmp.dir%7D/tmp_showcrt1" You should only need Hadoop and Hive if you are creating new tests that we need to compute the answers for. Existing tests

Interested in contributing to GraphX in Python

2014-08-01 Thread Rajiv Abraham
Hi, I just saw Ankur's GraphX presentation and it looks very exciting! I would like to contribute to a Python version of GraphX. I checked JIRA and GitHub but did not find much info. - Are there current limitations to porting GraphX to Python? (e.g. maybe the Python Spark RDD API is

My Spark application had a huge performance regression after Spark git commit 0441515f221146756800dc583b225bdec8a6c075

2014-08-01 Thread Jin, Zhonghui
I found a huge performance regression (1/20 of the original) in my application after Spark git commit 0441515f221146756800dc583b225bdec8a6c075. Applying the following patch fixes my issue: diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Andrew Ash
After several days of debugging, we think the issue is that we have conflicting versions of Guava. Our application was running with Guava 14 and the Spark services (Master, Workers, Executors) had Guava 16. We had custom Kryo serializers for Guava's ImmutableLists, and commenting out those

Re: Compiling Spark master (284771ef) with sbt/sbt assembly fails on EC2

2014-08-01 Thread Shivaram Venkataraman
Thanks Patrick -- it does look like some Maven misconfiguration, as wget http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.10.2/scala-library-2.10.2.pom works for me. Shivaram. On Fri, Aug 1, 2014 at 3:27 PM, Patrick Wendell pwend...@gmail.com wrote: This is a Scala bug - I filed

Re: [brainstorming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread Mayur Rustagi
Interesting. Clickstream data would have its own window concept based on a user's session; I can imagine windows would change across streams, but wouldn't they largely be domain-specific in nature?

Re: [brainstorming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
Actually, for clickstreams, the user space wouldn't be a continuum, unless the order of users is important or the fact that they arrive in a kind of order can be used by the algorithm. The purpose of the break, or binning, function is to package things into a cluster whose properties we know,
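The session-based break function discussed in this thread can be sketched in a few lines. This is a hypothetical illustration (function and parameter names are mine, not any Spark API): a single user's events are binned into sessions, with a new session starting whenever the silence since the previous event exceeds a gap timeout.

```python
def sessionize(timestamps, gap=30.0):
    """Bin a single user's event timestamps into sessions: a new
    session starts whenever the silence since the previous event
    exceeds `gap` (seconds)."""
    sessions, current = [], []
    for t in sorted(timestamps):
        # Gap exceeded: close the current session, start a new one.
        if current and t - current[-1] > gap:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# sessionize([0, 10, 20, 100, 110]) -> [[0, 10, 20], [100, 110]]
```

This is exactly the "domain-specific" part: the gap threshold (and even the notion of ordering events by time) belongs to the clickstream domain, not to the generic windowing machinery.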

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Colin McCabe
On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash and...@andrewash.com wrote: After several days of debugging, we think the issue is that we have conflicting versions of Guava. Our application was running with Guava 14 and the Spark services (Master, Workers, Executors) had Guava 16. We had custom

SparkContext.hadoopConfiguration vs. SparkHadoopUtil.newConfiguration()

2014-08-01 Thread Marcelo Vanzin
Hi all, while working on some seemingly unrelated code, I ran into an issue where spark.hadoop.* configs were not making it into the Configuration objects in some parts of the code. I was trying to use them to avoid having to do dirty tricks with the classpath while running tests, but that's a
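The spark.hadoop.* convention can be illustrated with a small self-contained sketch (this mimics, rather than reproduces, Spark's actual code, and the function name is hypothetical): keys carrying the spark.hadoop. prefix are stripped of it and forwarded into the Hadoop Configuration.

```python
def hadoop_overrides(spark_conf):
    """Collect entries whose key starts with 'spark.hadoop.' and strip
    the prefix -- a sketch of how Spark forwards such settings into a
    Hadoop Configuration object."""
    prefix = "spark.hadoop."
    return {k[len(prefix):]: v
            for k, v in spark_conf.items()
            if k.startswith(prefix)}

# hadoop_overrides({"spark.hadoop.fs.defaultFS": "hdfs://nn:8020",
#                   "spark.master": "local"})
# -> {"fs.defaultFS": "hdfs://nn:8020"}
```

The bug being described is essentially that some code paths built their Configuration without going through this forwarding step, so the overrides were silently dropped.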

Re: How to run specific sparkSQL test with maven

2014-08-01 Thread Cheng Lian
It's also useful to set hive.exec.mode.local.auto to true to accelerate the test. On Sat, Aug 2, 2014 at 1:36 AM, Michael Armbrust mich...@databricks.com wrote: It seems that the HiveCompatibilitySuite needs a Hadoop and Hive environment, am I right? Relative path in absolute URI:
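For reference, hive.exec.mode.local.auto is an ordinary Hive setting; assuming a standard hive-site.xml on the test classpath, it would be set like this (it lets Hive run small queries in local mode instead of launching cluster jobs):

```xml
<!-- hive-site.xml: allow Hive to auto-select local execution for
     small queries, which speeds up the compatibility tests. -->
<property>
  <name>hive.exec.mode.local.auto</name>
  <value>true</value>
</property>
```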