In-Ho Yi created SPARK-40108:
--------------------------------

             Summary: JDBC connection to Hive Metastore fails without first calling any .jdbc call
                 Key: SPARK-40108
                 URL: https://issues.apache.org/jira/browse/SPARK-40108
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.0
         Environment: PySpark==3.3.0
                      Java 11
            Reporter: In-Ho Yi

Tested on pyspark==3.3.0.

To talk to a Hive Metastore with a MySQL backend, I installed the MySQL driver via spark.jars.packages, along with the other necessary settings:

    from pyspark.sql import SparkSession

    # `conf` is a SparkConf defined elsewhere (not shown in this report).
    ss = SparkSession.builder.master('local[*]')\
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.3," +
                "org.apache.hadoop:hadoop-common:3.3.3,mysql:mysql-connector-java:8.0.30") \
        .config("spark.executor.memory", "10g") \
        .config("spark.driver.memory", "10g") \
        .config("spark.memory.offHeap.enabled","true") \
        .config("spark.memory.offHeap.size","32g") \
        .config("spark.hadoop.javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive") \
        .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "yyyy") \
        .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "xxxx") \
        .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "com.mysql.cj.jdbc.Driver") \
        .config("spark.sql.hive.metastore.sharedPrefixes", "com.mysql") \
        .config("spark.sql.warehouse.dir", "s3://xxxx-yyyy/") \
        .enableHiveSupport() \
        .appName("hms_test").config(conf=conf).getOrCreate()

Now, if I just run:

    ss.sql("SHOW DATABASES;").show()

I get a lot of errors, saying:

    Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost:3306/hive, username = yyyy. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
    java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/hive
        at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
        at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
        at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
        at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
        ...

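Before suspecting class loading, it may be worth ruling out a typo in the package list. The following is only a minimal sketch: `has_maven_artifact` is a hypothetical helper (not part of PySpark) that checks whether a comma-separated spark.jars.packages value lists a given group:artifact coordinate.

```python
def has_maven_artifact(packages: str, group: str, artifact: str) -> bool:
    """Return True if a comma-separated spark.jars.packages value lists group:artifact."""
    for coord in packages.split(","):
        # Each coordinate has the form group:artifact:version.
        parts = coord.strip().split(":")
        if len(parts) >= 2 and parts[0] == group and parts[1] == artifact:
            return True
    return False


# The package string used in the session setup of this report.
packages = ("org.apache.hadoop:hadoop-aws:3.3.3,"
            "org.apache.hadoop:hadoop-common:3.3.3,mysql:mysql-connector-java:8.0.30")
print(has_maven_artifact(packages, "mysql", "mysql-connector-java"))  # prints True
```

On a live session, the value could be read with ss.sparkContext.getConf().get("spark.jars.packages"). In this report the coordinate is present, so the "No suitable driver" error is presumably not caused by a missing package entry.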
However, if I first do any "jdbc" read, even one that ends in an error, the subsequent call to the Hive Metastore seems to succeed without any issue:

    try:
        _ = ss.read.format("jdbc") \
            .option("url", "jdbc:mysql://localhost:3306/hive") \
            .option("query", "SHOW TABLES;") \
            .option("driver", "com.mysql.cj.jdbc.Driver").load()
    except Exception:
        pass

    ss.sql("SHOW DATABASES;").show()  # this now works fine.
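The workaround described in this report could be wrapped in a small helper so it can be called once right after the session is created. This is a sketch under assumptions, not a fix: `preload_jdbc_driver` is a hypothetical name, the default query "SELECT 1" is an assumption, and the only behavior relied on is what the report observes, namely that attempting a JDBC read (successful or not) makes the later metastore call work.

```python
def preload_jdbc_driver(spark, url, driver, query="SELECT 1"):
    """Attempt a throwaway JDBC read; return True if it loaded, False if it raised.

    `spark` is expected to be a SparkSession. Any failure is swallowed on
    purpose, since per the report the attempt alone appears to be enough.
    """
    try:
        (spark.read.format("jdbc")
            .option("url", url)
            .option("query", query)
            .option("driver", driver)
            .load())
        return True
    except Exception:
        # The read itself may fail (bad query, no permissions, and so on);
        # that is tolerated here because only the side effect matters.
        return False
```

With the session from this report, usage would be preload_jdbc_driver(ss, "jdbc:mysql://localhost:3306/hive", "com.mysql.cj.jdbc.Driver") followed by ss.sql("SHOW DATABASES;").show(). The boolean return value is only informational and should not be treated as proof that the metastore connection will succeed.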