Chris, I have a question about your setup. Does it allow the same kind of usage for Cassandra/HBase data sources? Can I create a table that links to Cassandra and can be queried by Spark SQL? I ask because I see the Cassandra connector package included in your script.
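For example, something along these lines - a sketch assuming the spark-cassandra-connector package is on the classpath (the keyspace and table names here are made up):

CREATE TEMPORARY TABLE cassandra_events
USING org.apache.spark.sql.cassandra
OPTIONS (
  keyspace "my_keyspace",
  table "events",
  pushdown "true"
);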
Thanks,
Ben

> On Dec 25, 2015, at 6:41 AM, Chris Fregly <ch...@fregly.com> wrote:
>
> Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs
> to be on the Java system classpath, per the troubleshooting section of the
> Spark SQL programming guide:
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting>
>
> Here is an example hive-thriftserver start script from my Spark-based
> reference pipeline project:
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/start-hive-thriftserver.sh>
> And here is an example script that decorates the out-of-the-box spark-sql
> command to use the MySQL JDBC driver:
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh>
>
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined
> here
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L144>
> and here
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L87>
> and includes the path to the local MySQL JDBC driver. This approach is
> described in the advanced dependency management section of the Spark docs on
> submitting applications:
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
>
> Any jar specified with --jars will be passed to each worker node in the
> cluster - specifically, into the work directory of each SparkContext, for
> isolation purposes.
>
> Cleanup of these jars on the worker nodes is handled by YARN automatically,
> and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
>
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose,
> but I couldn't get that to work for whatever reason, so I'm sticking with the
> --jars approach used in my examples.
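In concrete terms, the --jars approach described above amounts to something
like the following sketch (the master URL is a placeholder, and the jar path
matches the PostgreSQL driver used later in this thread):

# Ship the JDBC driver jar to the driver and to each worker's
# per-application work directory. The same flag works for spark-sql,
# spark-shell, and the Thrift server.
export SPARK_SUBMIT_JARS=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://your-master:7077 \
  --jars $SPARK_SUBMIT_JARS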
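Note that on Spark Standalone the periodic cleanup mentioned above is off by
default; per the standalone docs it is enabled on each worker through
SPARK_WORKER_OPTS. A sketch (the TTL is in seconds and the value is only an
example):

# spark-env.sh on each standalone worker
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.appDataTtl=86400"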
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> Stephen,
>
> Let me confirm: I just need to propagate the settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying it out to the cluster manually?
>
> Thanks for your help,
> Ben
>
> On Tuesday, December 22, 2015, Stephen Boesch <java...@gmail.com> wrote:
> Hi Benjamin, yes, adding the driver to the Thrift server makes the CREATE
> TABLE work. But querying is performed by the workers, so you need to add it
> to the classpath of all nodes for reads to work.
>
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com>:
> Hi Stephen,
>
> I forgot to mention that I added the lines below to spark-defaults.conf on
> the node running the Spark SQL Thrift JDBC/ODBC Server, then restarted it.
>
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>
> I read in another thread that this would work. I was able to create the
> table and could see it in my SHOW TABLES list. But when I try to query the
> table, I get the same error. It looks like I'm getting close.
>
> Are there any other things that I have to do that you can think of?
>
> Thanks,
> Ben
>
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <java...@gmail.com> wrote:
>>
>> The PostgreSQL JDBC driver needs to be added to the classpath of your
>> Spark workers. You can do a search for how to do that (multiple ways).
>>
>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuil...@gmail.com>:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly on a remote PostgreSQL table:
>>
>> CREATE TEMPORARY TABLE <table_name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>>   url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
>>   dbtable "impressions"
>> );
>>
>> When I run this against our PostgreSQL server, I get the following error:
>>
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)
>>
>> Can someone help me understand why this is?
>>
>> Thanks,
>> Ben
>
> --
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
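For completeness: once the PostgreSQL jar is on the classpath of the Thrift
server and all workers, the JDBC data source can also be told explicitly
which driver class to load via the documented driver option, which sidesteps
the "No suitable driver" lookup. A sketch, keeping the placeholders from the
original post (the jar must still be on the classpath of every node):

-- driver names the JDBC driver class explicitly
CREATE TEMPORARY TABLE impressions
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
  dbtable "impressions",
  driver "org.postgresql.Driver"
);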