Chris, I have a question about your setup. Does it allow the same kind of usage for Cassandra/HBase data sources? Can I create a table that links to Cassandra and can be queried by Spark SQL? I ask because I see the Cassandra connector package included in your script.
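For example, something along these lines - a sketch assuming the spark-cassandra-connector package is on the classpath (the keyspace and table names here are made up):

CREATE TEMPORARY TABLE cassandra_events
USING org.apache.spark.sql.cassandra
OPTIONS (
  keyspace "my_keyspace",
  table "events",
  pushdown "true"
);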
Thanks,
Ben

> On Dec 25, 2015, at 6:41 AM, Chris Fregly <ch...@fregly.com> wrote:
>
> Configuring JDBC drivers with Spark is a bit tricky, as the JDBC driver needs
> to be on the Java system classpath, per the troubleshooting section of the
> Spark SQL programming guide:
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting>
>
> Here is an example hive-thriftserver start script from my Spark-based
> reference pipeline project:
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/start-hive-thriftserver.sh>
> And here is an example script that decorates the out-of-the-box spark-sql
> command to use the MySQL JDBC driver:
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh>
>
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS, which is defined
> here
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L144>
> and here
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L87>
> and includes the path to the local MySQL JDBC driver. This approach is
> described in the advanced dependency management section of the Spark docs on
> submitting applications:
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
>
> Any jar specified with --jars will be passed to each worker node in the
> cluster - specifically, into the work directory of each SparkContext, for
> isolation purposes.
>
> Cleanup of these jars on the worker nodes is handled by YARN automatically,
> and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
>
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose,
> but I couldn't get that to work for whatever reason, so I'm sticking with the
> --jars approach used in my examples.
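In concrete terms, the --jars approach described above amounts to something
like the following sketch (the master URL is a placeholder, and the jar path
matches the PostgreSQL driver used later in this thread):

# Ship the JDBC driver jar to the driver and to each worker's
# per-application work directory. The same flag works for spark-sql,
# spark-shell, and the Thrift server.
export SPARK_SUBMIT_JARS=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://your-master:7077 \
  --jars $SPARK_SUBMIT_JARS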
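Note that on Spark Standalone the periodic cleanup mentioned above is off by
default; per the standalone docs it is enabled on each worker through
SPARK_WORKER_OPTS. A sketch (the TTL is in seconds and the value is only an
example):

# spark-env.sh on each standalone worker
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.appDataTtl=86400"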
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
> Stephen,
>
> Let me confirm: I just need to propagate the settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying it out to the cluster manually?
>
> Thanks for your help,
> Ben
>
> On Tuesday, December 22, 2015, Stephen Boesch <java...@gmail.com> wrote:
> Hi Benjamin, yes, adding the driver to the Thrift server makes the CREATE
> TABLE work. But querying is performed by the workers, so you need to add it
> to the classpath of all nodes for reads to work.
>
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com>:
> Hi Stephen,
>
> I forgot to mention that I added the lines below to spark-defaults.conf on
> the node running the Spark SQL Thrift JDBC/ODBC Server, then restarted it.
>
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>
> I read in another thread that this would work. I was able to create the
> table and could see it in my SHOW TABLES list. But when I try to query the
> table, I get the same error. It looks like I'm getting close.
>
> Are there any other things that I have to do that you can think of?
>
> Thanks,
> Ben
>
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <java...@gmail.com> wrote:
>>
>> The PostgreSQL JDBC driver needs to be added to the classpath of your
>> Spark workers. You can do a search for how to do that (multiple ways).
>>
>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuil...@gmail.com>:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly on a remote PostgreSQL table:
>>
>> CREATE TEMPORARY TABLE <table_name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>>   url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
>>   dbtable "impressions"
>> );
>>
>> When I run this against our PostgreSQL server, I get the following error:
>>
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name> (state=,code=0)
>>
>> Can someone help me understand why this is?
>>
>> Thanks,
>> Ben
>
> --
> Chris Fregly
> Principal Data Solutions Engineer
> IBM Spark Technology Center, San Francisco, CA
> http://spark.tc | http://advancedspark.com
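For completeness: once the PostgreSQL jar is on the classpath of the Thrift
server and all workers, the JDBC data source can also be told explicitly
which driver class to load via the documented driver option, which sidesteps
the "No suitable driver" lookup. A sketch, keeping the placeholders from the
original post (the jar must still be on the classpath of every node):

-- driver names the JDBC driver class explicitly
CREATE TEMPORARY TABLE impressions
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
  dbtable "impressions",
  driver "org.postgresql.Driver"
);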