Sure, will do. I may not be able to get to it until next week, but will let you 
know if I am able to crack the code.

Mohammed

From: Todd Nist [mailto:tsind...@gmail.com]
Sent: Friday, April 3, 2015 5:52 PM
To: Mohammed Guller
Cc: pawan kumar; user@spark.apache.org
Subject: Re: Tableau + Spark SQL Thrift Server + Cassandra

Thanks Mohammed,

I was aware of Calliope, but haven't used it since the 
spark-cassandra-connector project got released.  I was not aware of 
CalliopeServer2; cool, thanks for sharing that one.

I would appreciate it if you could let me know how you decide to proceed with 
this; I can see this coming up on my radar in the next few months. Thanks.

-Todd

On Fri, Apr 3, 2015 at 5:53 PM, Mohammed Guller 
<moham...@glassbeam.com> wrote:
Thanks, Todd.

It is an interesting idea; worth trying.

I think the cash project is old. The tuplejump guy has created another project 
called CalliopeServer2, which works like a charm with BI tools that use JDBC, 
but unfortunately Tableau throws an error when it connects to it.

Mohammed

From: Todd Nist [mailto:tsind...@gmail.com]
Sent: Friday, April 3, 2015 11:39 AM
To: pawan kumar
Cc: Mohammed Guller; user@spark.apache.org

Subject: Re: Tableau + Spark SQL Thrift Server + Cassandra

Hi Mohammed,

Not sure if you have tried this or not.  You could try using the API below to 
start the Thrift Server with an existing context.

https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42

The one thing that Michael Armbrust @ Databricks recommended was this:
You can start a JDBC server with an existing context.  See my answer here: 
http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-td20197.html

So something like this, based on an example from Cheng Lian:

Server

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.catalyst.types._

val sparkContext = sc
import sparkContext._
val sqlContext = new HiveContext(sparkContext)
import sqlContext._

makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")

// replace the above with C* + the spark-cassandra-connector to generate the
// SchemaRDD and registerTempTable, e.g.:
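// Untested sketch of that replacement: assumes the spark-cassandra-connector
// is on the classpath. The keyspace "test_ks", table "words", and the Word
// case class are hypothetical placeholders.
import com.datastax.spark.connector._

case class Word(word: String, count: Int)
val words = sc.cassandraTable[Word]("test_ks", "words")
words.toSchemaRDD.cache().registerTempTable("words")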



import org.apache.spark.sql.hive.thriftserver._
HiveThriftServer2.startWithContext(sqlContext)
Then Startup

./bin/beeline -u jdbc:hive2://localhost:10000/default

0: jdbc:hive2://localhost:10000/default> select * from t;


I have not tried this from Tableau yet.  My understanding is that the 
tempTable is only valid as long as the sqlContext is, so if one terminates the 
code representing the server and then restarts the standard thrift server, 
sbin/start-thriftserver ..., the table won't be available.

Another possibility is to perhaps use the tuplejump cash project, 
https://github.com/tuplejump/cash.

HTH.

-Todd

On Fri, Apr 3, 2015 at 11:11 AM, pawan kumar 
<pkv...@gmail.com> wrote:

Thanks, Mohammed. Will give it a try today. We would also need the Spark SQL 
piece, as we are migrating our data store from Oracle to C*, and it would be 
easier to maintain all the reports rather than recreating each one from scratch.

Thanks,
Pawan Venugopal.
On Apr 3, 2015 7:59 AM, "Mohammed Guller" 
<moham...@glassbeam.com> wrote:
Hi Todd,

We are using Apache C* 2.1.3, not DSE. We got Tableau to work directly with C* 
using the ODBC driver, but now would like to add Spark SQL to the mix. I 
haven’t been able to find any documentation for how to make this combination 
work.

We are using the Spark-Cassandra-Connector in our applications, but haven’t 
been able to figure out how to get the Spark SQL Thrift Server to use it and 
connect to C*. That is the missing piece. Once we solve that piece of the 
puzzle then Tableau should be able to see the tables in C*.
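My guess is that we would have to start the Thrift Server with the connector 
jar and the C* host set, something along these lines (untested; the jar path 
is just a placeholder):

sbin/start-thriftserver.sh \
  --jars /path/to/spark-cassandra-connector-assembly.jar \
  --conf spark.cassandra.connection.host=127.0.0.1

but even then the server would need the C* tables registered with it somehow, 
which is the part we haven't figured out.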

Hi Pawan,
Tableau + C* is pretty straightforward, especially if you are using DSE. 
Create a new DSN in Tableau using the ODBC driver that comes with DSE. Once you 
connect, Tableau lets you use a C* keyspace as a schema and column families as 
tables.

Mohammed

From: pawan kumar [mailto:pkv...@gmail.com]
Sent: Friday, April 3, 2015 7:41 AM
To: Todd Nist
Cc: user@spark.apache.org; Mohammed Guller
Subject: Re: Tableau + Spark SQL Thrift Server + Cassandra


Hi Todd,

Thanks for the link. I would be interested in this solution. I am using DSE for 
Cassandra. Would you provide me with info on connecting with DSE, either through 
Tableau or Zeppelin? The goal here is to query Cassandra through Spark SQL so 
that I can perform joins and group-bys in my queries. Are you able to perform 
Spark SQL queries with Tableau?
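
For example, the kind of query I have in mind (table and column names made up):

0: jdbc:hive2://localhost:10000/default> select c.region, count(*) from customers c join orders o on c.id = o.customer_id group by c.region;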

Thanks,
Pawan Venugopal
On Apr 3, 2015 5:03 AM, "Todd Nist" 
<tsind...@gmail.com> wrote:
What version of Cassandra are you using?  Are you using DSE or the stock Apache 
Cassandra version?  I have connected it with DSE, but have not attempted it 
with the standard Apache Cassandra version.

FWIW, 
http://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise 
provides an ODBC driver for accessing C* from Tableau.  Granted, it does not 
provide all the goodness of Spark.  Are you attempting to leverage the 
spark-cassandra-connector for this?



On Thu, Apr 2, 2015 at 10:20 PM, Mohammed Guller 
<moham...@glassbeam.com> wrote:
Hi –

Is anybody using Tableau to analyze data in Cassandra through the Spark SQL 
Thrift Server?

Thanks!

Mohammed



