RE: Tableau + Spark SQL Thrift Server + Cassandra
Sure, will do. I may not be able to get to it until next week, but will let you know if I am able to crack the code.

Mohammed

From: Todd Nist [mailto:tsind...@gmail.com]
Sent: Friday, April 3, 2015 5:52 PM
To: Mohammed Guller
Cc: pawan kumar; user@spark.apache.org
Subject: Re: Tableau + Spark SQL Thrift Server + Cassandra

Thanks Mohammed. I was aware of Calliope, but haven't used it since the spark-cassandra-connector project got released. I was not aware of CalliopeServer2; cool, thanks for sharing that one. I would appreciate it if you could let me know how you decide to proceed with this; I can see this coming up on my radar in the next few months. Thanks.

-Todd
Re: Tableau + Spark SQL Thrift Server + Cassandra
What version of Cassandra are you using? Are you using DSE or the stock Apache Cassandra version? I have connected it with DSE, but have not attempted it with the standard Apache Cassandra version.

FWIW, http://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise provides an ODBC driver for accessing C* from Tableau. Granted, it does not provide all the goodness of Spark. Are you attempting to leverage the spark-cassandra-connector for this?

On Thu, Apr 2, 2015 at 10:20 PM, Mohammed Guller moham...@glassbeam.com wrote:
Hi – Is anybody using Tableau to analyze data in Cassandra through the Spark SQL Thrift Server?
Thanks!
Mohammed
RE: Tableau + Spark SQL Thrift Server + Cassandra
Thanks Mohammed. Will give it a try today. We would also need the Spark SQL piece, as we are migrating our data store from Oracle to C* and it would be easier to maintain all the reports rather than recreating each one from scratch.

Thanks,
Pawan Venugopal.
Re: Tableau + Spark SQL Thrift Server + Cassandra
Hi Todd,

Thanks for the link. I would be interested in this solution. I am using DSE for Cassandra. Would you provide me with info on connecting with DSE, either through Tableau or Zeppelin? The goal here is to query Cassandra through Spark SQL so that I could perform joins and group-bys in my queries. Are you able to perform Spark SQL queries with Tableau?

Thanks,
Pawan Venugopal
RE: Tableau + Spark SQL Thrift Server + Cassandra
Hi Todd,

We are using Apache C* 2.1.3, not DSE. We got Tableau to work directly with C* using the ODBC driver, but now would like to add Spark SQL to the mix. I haven’t been able to find any documentation for how to make this combination work.

We are using the Spark-Cassandra-Connector in our applications, but haven’t been able to figure out how to get the Spark SQL Thrift Server to use it and connect to C*. That is the missing piece. Once we solve that piece of the puzzle, Tableau should be able to see the tables in C*.

Hi Pawan,

Tableau + C* is pretty straightforward, especially if you are using DSE. Create a new DSN in Tableau using the ODBC driver that comes with DSE. Once you connect, Tableau allows you to use a C* keyspace as a schema and column families as tables.

Mohammed
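One way to attack the missing piece described above, getting the stock Thrift server to pick up the connector and a C* contact point, is to pass both at launch time. A rough sketch only, not from this thread: the connector coordinates, version, and host are assumptions, and --packages requires Spark 1.3+.

```shell
# Sketch: start the stock Spark SQL Thrift Server with the
# spark-cassandra-connector on its classpath (coordinates/version assumed)
# and point it at a Cassandra contact host.
./sbin/start-thriftserver.sh \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.3.0 \
  --conf spark.cassandra.connection.host=127.0.0.1
```

Whether the tables then become visible to Tableau still depends on registering the C* tables with the server's SQL context, which is the part discussed later in the thread.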
Re: Tableau + Spark SQL Thrift Server + Cassandra
Hi Mohammed,

Not sure if you have tried this or not. You could try using the API below to start the Thrift server with an existing context:

https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42

The one thing that Michael Armbrust @ Databricks recommended was this: you can start a JDBC server with an existing context. See my answer here: http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-td20197.html

So something like this, based on an example from Cheng Lian:

*Server*

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.catalyst.types._

val sparkContext = sc
import sparkContext._
val sqlContext = new HiveContext(sparkContext)
import sqlContext._
makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")
// replace the above with C* + spark-cassandra-connector to generate the SchemaRDD and registerTempTable
import org.apache.spark.sql.hive.thriftserver._
HiveThriftServer2.startWithContext(sqlContext)

Then startup:

./bin/beeline -u jdbc:hive2://localhost:10000/default
0: jdbc:hive2://localhost:10000/default> select * from t;

I have not tried this yet from Tableau. My understanding is that the tempTable is only valid as long as the sqlContext is, so if one terminates the code representing the *Server* and then restarts the standard Thrift server, sbin/start-thriftserver ..., the table won't be available.

Another possibility is to perhaps use the tuplejump cash project, https://github.com/tuplejump/cash.

HTH.

-Todd
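The comment in the snippet above asks for the C* + spark-cassandra-connector piece in place of the makeRDD toy data. A minimal sketch of what that could look like via the connector's Spark SQL data source, assuming a Spark 1.3-era API, a live Cassandra node, and the connector jar on the classpath; the keyspace and table names here are made up for illustration:

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// `sc` is the SparkContext already available in spark-shell; it must have
// spark.cassandra.connection.host set to a reachable C* contact point.
val sqlContext = new HiveContext(sc)

// Load a Cassandra table as a DataFrame through the connector's
// data source (keyspace/table names are illustrative).
val df = sqlContext.load(
  "org.apache.spark.sql.cassandra",
  Map("keyspace" -> "my_keyspace", "table" -> "my_table"))

// Register it so JDBC clients see it as a table...
df.registerTempTable("my_table")

// ...and expose the same context over the Thrift/JDBC interface,
// so beeline or Tableau can run SQL against it.
HiveThriftServer2.startWithContext(sqlContext)
```

As noted above, the temp table only lives as long as this context does; killing the driver and restarting the stock Thrift server would lose it.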
Re: Tableau + Spark SQL Thrift Server + Cassandra
@Pawan

Not sure if you have seen this or not, but here is a good example by Jonathan Lacefield of DataStax on hooking up Spark SQL with DSE; adding Tableau is as simple as Mohammed stated with DSE: https://github.com/jlacefie/sparksqltest.

HTH,
Todd
Re: Tableau + Spark SQL Thrift Server + Cassandra
@Pawan,

So it's been a couple of months since I have had a chance to do anything with Zeppelin, but here is a link to a post on what I did to get it working: https://groups.google.com/forum/#!topic/zeppelin-developers/mCNdyOXNikI. This may or may not work with the newer releases of Zeppelin.

-Todd
Re: Tableau + Spark SQL Thrift Server + Cassandra
Hi Todd,

Thanks for the help. So I was able to get DSE working with Tableau as per the link provided by Mohammed. Now I am trying to figure out if I could write Spark SQL queries from Tableau and get data from DSE. My end goal is to get a web-based tool where I could write SQL queries that will pull data from Cassandra.

With Zeppelin, I was able to build and run it in EC2, but I am not sure if the configurations are right. I am pointing to a Spark master which is a remote DSE node, and all Spark and Spark SQL dependencies are on the remote node. I am not sure if I need to install Spark and its dependencies on the web UI (Zeppelin) node. I am not sure if talking about Zeppelin in this thread is right.

Thanks once again for all the help.

Thanks,
Pawan Venugopal
RE: Tableau + Spark SQL Thrift Server + Cassandra
Thanks, Todd. It is an interesting idea; worth trying. I think the cash project is old. The tuplejump guy has created another project called CalliopeServer2, which works like a charm with BI tools that use JDBC, but unfortunately Tableau throws an error when it connects to it. Mohammed From: Todd Nist [mailto:tsind...@gmail.com] Sent: Friday, April 3, 2015 11:39 AM To: pawan kumar Cc: Mohammed Guller; user@spark.apache.org Subject: Re: Tableau + Spark SQL Thrift Server + Cassandra Hi Mohammed, Not sure if you have tried this or not. You could try using the below api to start the thriftserver with an existing context. https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42 The one thing that Michael Ambrust @ databrick recommended was this: You can start a JDBC server with an existing context. See my answer here: http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-td20197.html So something like this based on example from Cheng Lian: Server import org.apache.spark.sql.hive.HiveContext import org.apache.spark.sql.catalyst.types._ val sparkContext = sc import sparkContext._ val sqlContext = new HiveContext(sparkContext) import sqlContext._ makeRDD((1,hello) :: (2,world) ::Nil).toSchemaRDD.cache().registerTempTable(t) // replace the above with the C* + spark-casandra-connectore to generate SchemaRDD and registerTempTable import org.apache.spark.sql.hive.thriftserver._ HiveThriftServer2.startWithContext(sqlContext) Then Startup ./bin/beeline -u jdbc:hive2://localhost:1/default 0: jdbc:hive2://localhost:1/default select * from t; I have not tried this yet from Tableau. My understanding is that the tempTable is only valid as long as the sqlContext is, so if one terminates the code representing the Server, and then restarts the standard thrift server, sbin/start-thriftserver ..., the table won't be available. 
Another possibility is to perhaps use the tuplejump cash project, https://github.com/tuplejump/cash. HTH. -Todd On Fri, Apr 3, 2015 at 11:11 AM, pawan kumar pkv...@gmail.commailto:pkv...@gmail.com wrote: Thanks mohammed. Will give it a try today. We would also need the sparksSQL piece as we are migrating our data store from oracle to C* and it would be easier to maintain all the reports rather recreating each one from scratch. Thanks, Pawan Venugopal. On Apr 3, 2015 7:59 AM, Mohammed Guller moham...@glassbeam.commailto:moham...@glassbeam.com wrote: Hi Todd, We are using Apache C* 2.1.3, not DSE. We got Tableau to work directly with C* using the ODBC driver, but now would like to add Spark SQL to the mix. I haven’t been able to find any documentation for how to make this combination work. We are using the Spark-Cassandra-Connector in our applications, but haven’t been able to figure out how to get the Spark SQL Thrift Server to use it and connect to C*. That is the missing piece. Once we solve that piece of the puzzle then Tableau should be able to see the tables in C*. Hi Pawan, Tableau + C* is pretty straight forward, especially if you are using DSE. Create a new DSN in Tableau using the ODBC driver that comes with DSE. Once you connect, Tableau allows to use C* keyspace as schema and column families as tables. Mohammed From: pawan kumar [mailto:pkv...@gmail.commailto:pkv...@gmail.com] Sent: Friday, April 3, 2015 7:41 AM To: Todd Nist Cc: user@spark.apache.orgmailto:user@spark.apache.org; Mohammed Guller Subject: Re: Tableau + Spark SQL Thrift Server + Cassandra Hi Todd, Thanks for the link. I would be interested in this solution. I am using DSE for cassandra. Would you provide me with info on connecting with DSE either through Tableau or zeppelin. The goal here is query cassandra through spark sql so that I could perform joins and groupby on my queries. Are you able to perform spark sql queries with tableau? 
Thanks,
Pawan Venugopal

On Apr 3, 2015 5:03 AM, Todd Nist tsind...@gmail.com wrote:
What version of Cassandra are you using? Are you using DSE or the stock Apache Cassandra version? I have connected it with DSE, but have not attempted it with the standard Apache Cassandra version.
FWIW, http://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise provides an ODBC driver for accessing C* from Tableau. Granted, it does not provide all the goodness of Spark. Are you attempting to leverage the spark-cassandra-connector for this?

On Thu, Apr 2, 2015 at 10:20 PM, Mohammed Guller moham...@glassbeam.com wrote:
Hi – Is anybody using Tableau to analyze data in Cassandra through the Spark SQL Thrift Server?
Thanks!
Mohammed