In-Ho Yi created SPARK-40108:
--------------------------------

             Summary: JDBC connection to Hive Metastore fails without first calling any .jdbc call
                 Key: SPARK-40108
                 URL: https://issues.apache.org/jira/browse/SPARK-40108
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.3.0
         Environment: PySpark==3.3.0
Java 11
            Reporter: In-Ho Yi


Tested on pyspark==3.3.0. To talk to a Hive Metastore with a MySQL backend, I 
installed the MySQL driver via spark.jars.packages, along with the other 
necessary settings:

from pyspark.sql import SparkSession

# conf is a SparkConf defined elsewhere (not shown in this report)
ss = SparkSession.builder.master('local[*]') \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.3," +
        "org.apache.hadoop:hadoop-common:3.3.3,mysql:mysql-connector-java:8.0.30") \
    .config("spark.executor.memory", "10g") \
    .config("spark.driver.memory", "10g") \
    .config("spark.memory.offHeap.enabled", "true") \
    .config("spark.memory.offHeap.size", "32g") \
    .config("spark.hadoop.javax.jdo.option.ConnectionURL",
        "jdbc:mysql://localhost:3306/hive") \
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "yyyy") \
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "xxxx") \
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName",
        "com.mysql.cj.jdbc.Driver") \
    .config("spark.sql.hive.metastore.sharedPrefixes", "com.mysql") \
    .config("spark.sql.warehouse.dir", "s3://xxxx-yyyy/") \
    .enableHiveSupport() \
    .appName("hms_test").config(conf=conf).getOrCreate()

Now, if I just run ss.sql("SHOW DATABASES;").show(), I get a lot of errors 
saying:

Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost:3306/hive, username = yyyy. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/hive
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
    at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
    at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
...

However, if I do any "jdbc" read first, even one that ends in an error, the 
subsequent call to the Hive Metastore seems to succeed without any issue:

try:
    # Throwaway JDBC read: the result is discarded and any error is ignored.
    _ = ss.read.format("jdbc") \
        .option("url", "jdbc:mysql://localhost:3306/hive") \
        .option("query", "SHOW TABLES;") \
        .option("driver", "com.mysql.cj.jdbc.Driver").load()
except Exception:
    pass

ss.sql("SHOW DATABASES;").show() # this now works fine.
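
For anyone hitting the same error, the throwaway read above could be wrapped 
in a small helper and called once right after the session is created. This is 
only a sketch of the workaround described in this report, not a fix for the 
underlying bug. The function name warm_up_jdbc_driver and the probe_query 
parameter are hypothetical; the only Spark calls used are the same 
read.format("jdbc").option(...).load() chain shown above.

```python
def warm_up_jdbc_driver(spark, url, driver, probe_query="SELECT 1"):
    """Issue a throwaway JDBC read before touching the Hive Metastore.

    `spark` is expected to be a SparkSession. Returns True if the probe
    read succeeded and False if it raised; per the report, the later
    metastore call worked in either case.
    """
    try:
        (spark.read.format("jdbc")
            .option("url", url)
            .option("query", probe_query)
            .option("driver", driver)
            .load())
        return True
    except Exception:
        # The read itself is allowed to fail; only the attempt matters here.
        return False
```

Usage would be something like 
warm_up_jdbc_driver(ss, "jdbc:mysql://localhost:3306/hive", "com.mysql.cj.jdbc.Driver") 
followed by ss.sql("SHOW DATABASES;").show().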



