Evaluating spark + Cassandra for our use cases

2015-08-18 Thread Benjamin Ross
My company is interested in building a real-time time-series querying solution using Spark and Cassandra. Specifically, we're interested in setting up a Spark system against Cassandra running a Hive Thrift server. We need to be able to perform real-time queries on time-series data - things

RE: Evaluating spark + Cassandra for our use cases

2015-08-18 Thread Benjamin Ross
From: Jörn Franke [jornfra...@gmail.com] Sent: Tuesday, August 18, 2015 4:14 PM To: Benjamin Ross; user@spark.apache.org Cc: Ron Gonzalez Subject: Re: Evaluating spark + Cassandra for our use cases Hi, First you need to make your SLA clear. It does not sound for me

RE: Is there any external dependencies for lag() and lead() when using data frames?

2015-08-11 Thread Benjamin Ross
Jerry, I was able to use window functions without the Hive Thrift server. HiveContext does not imply that you need the Hive Thrift server running. Here's what I used to test this out: var conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1") val sc = new
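
The snippet above is cut off, so what follows is not the author's test code. It is a minimal sketch of the same idea: calling lag() over a window through a HiveContext with no Thrift server running. It assumes Spark 1.4.x with the spark-hive module on the classpath. The object name, the local[2] master, and the in-memory sample rows (used here in place of a Cassandra table) are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag
import org.apache.spark.sql.hive.HiveContext

// Hypothetical example object; assumes Spark 1.4.x with spark-hive available.
object LagSketch {
  // Builds a tiny in-memory DataFrame and adds the previous value per sensor.
  def previousValues(hiveContext: HiveContext): DataFrame = {
    import hiveContext.implicits._

    // Sample rows standing in for time-series data (assumption, not from the thread).
    val readings = Seq(("a", 1, 10.0), ("a", 2, 12.5), ("b", 1, 7.0))
      .toDF("sensor", "ts", "value")

    // One window per sensor, ordered by timestamp.
    val bySensor = Window.partitionBy("sensor").orderBy("ts")

    // lag("value", 1) yields the previous row's value, or null for the first row.
    readings.withColumn("prev_value", lag("value", 1).over(bySensor))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true).setAppName("lag-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // HiveContext runs in-process; no Thrift server is started or contacted.
      previousValues(new HiveContext(sc)).show()
    } finally {
      sc.stop()
    }
  }
}
```

In Spark 1.4.x, window functions are only available through HiveContext, which is why the sketch does not use a plain SQLContext.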

RE: Is there any external dependencies for lag() and lead() when using data frames?

2015-08-11 Thread Benjamin Ross
I forgot to mention, my setup was: - Spark 1.4.1 running in standalone mode - DataStax Spark Cassandra connector 1.4.0-M1 - Cassandra DB - Scala version 2.10.4 From: Benjamin Ross Sent: Tuesday, August 11, 2015 10:16 AM To: Jerry; Michael Armbrust Cc: user

How to run start-thrift-server in debug mode?

2015-08-07 Thread Benjamin Ross
Hi, I'm trying to run the Hive Thrift server in debug mode. I've tried to simply pass -Xdebug -Xrunjdwp:transport=dt_socket,address=127.0.0.1:,server=y,suspend=n to start-thriftserver.sh as a driver option, but it doesn't seem to host a server. I've then tried to edit the various shell

Failed to load class for data source: org.apache.spark.sql.cassandra

2015-07-30 Thread Benjamin Ross
Hey all, I'm running what should be a very straightforward application of the Cassandra SQL connector, and I'm getting an error: Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: org.apache.spark.sql.cassandra at

RE: Failed to load class for data source: org.apache.spark.sql.cassandra

2015-07-30 Thread Benjamin Ross
, Ben From: Benjamin Ross Sent: Thursday, July 30, 2015 3:45 PM To: user@spark.apache.org Subject: Failed to load class for data source: org.apache.spark.sql.cassandra Hey all, I'm running what should be a very straightforward application of the Cassandra SQL connector, and I'm getting an error

RE: Failed to load class for data source: org.apache.spark.sql.cassandra

2015-07-30 Thread Benjamin Ross
If anyone's curious, the issue here is that I was using version 1.2.4 of the DataStax Spark Cassandra connector rather than the 1.4.0-M1 pre-release. 1.2.4 doesn't fully support DataFrames, and that support is presumably still only experimental in 1.4.0-M1. Ben From: Benjamin Ross Sent

NoClassDefFoundError: scala/collection/GenTraversableOnce$class

2015-07-29 Thread Benjamin Ross
Hello all, I'm new to both Spark and Scala, and am running into an annoying error attempting to prototype some Spark functionality. From forums I've read online, this error should only present itself if there's a version mismatch between the version of Scala used to compile Spark and the Scala
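
One way to check for the kind of mismatch described above is to compare the Scala version on the runtime classpath with the binary-version suffix of each dependency, such as the _2.10 in spark-cassandra-connector-java_2.10. The sketch below is illustrative only: the object and helper names are hypothetical, and it assumes artifact ids follow the usual name_<scala binary version> convention.

```scala
// Hypothetical helper for spotting Scala binary-version mismatches.
object ScalaVersionCheck {
  // Reduce a full Scala version such as "2.10.4" to its binary version "2.10".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // Extract the suffix after the last underscore of an artifact id,
  // e.g. "spark-cassandra-connector-java_2.10" gives Some("2.10").
  def artifactSuffix(artifactId: String): Option[String] = {
    val idx = artifactId.lastIndexOf('_')
    if (idx >= 0 && idx < artifactId.length - 1) Some(artifactId.substring(idx + 1))
    else None
  }

  // True when the artifact's suffix equals the runtime's binary version.
  def matchesRuntime(artifactId: String, runtimeVersion: String): Boolean =
    artifactSuffix(artifactId).exists(_ == binaryVersion(runtimeVersion))

  def main(args: Array[String]): Unit = {
    // Version of the scala-library actually loaded by this JVM.
    val runtime = scala.util.Properties.versionNumberString
    println(s"Scala on the classpath: $runtime")

    // Artifact ids to check are passed as command-line arguments.
    args.foreach { id =>
      val verdict = if (matchesRuntime(id, runtime)) "ok" else "possible mismatch"
      println(s"$id: $verdict")
    }
  }
}
```

Artifacts compiled for a different Scala binary version (for example 2.11 jars run against a 2.10 scala-library) are a common source of NoClassDefFoundError on Scala collection classes.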

RE: NoClassDefFoundError: scala/collection/GenTraversableOnce$class

2015-07-29 Thread Benjamin Ross
] | \- org.scala-lang:scala-reflect:jar:2.10.5:compile [INFO] +- com.datastax.spark:spark-cassandra-connector-java_2.10:jar:1.2.4:compile [INFO] +- commons-codec:commons-codec:jar:1.4:compile Ben From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, July 29, 2015 8:30 PM To: Benjamin Ross Cc: user