Marvin Rösch created SPARK-38327:
------------------------------------

             Summary: JDBC Source with MariaDB connection returns column names as values
                 Key: SPARK-38327
                 URL: https://issues.apache.org/jira/browse/SPARK-38327
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
         Environment: MariaDB version 10.3.10

Running with spark-k8s-operator
            Reporter: Marvin Rösch


Using a JDBC source with the official MariaDB JDBC driver and a JDBC connection 
URL like the following does not work as expected:
{noformat}
jdbc:mariadb://db.example.com:3306/schema
{noformat}
Assume we have a table "values" like the following in MariaDB:
||id (binary)||name (varchar)||
|0xAB|Name 1|
|0xBC|Name 2|

We intend to create and display a data frame from it like this:
{code:scala}
spark.read
  .format("jdbc")
  .option("url", "jdbc:mariadb://db.example.com:3306/schema")
  .option("dbtable", "values")
  .load()
  .show()
{code}
*Expected Behavior*

Using such a connection URL with an arbitrary MariaDB table or query should 
produce a data frame that correctly reflects the table structure and content 
in MariaDB, with each column having the correct type and values.

The output of the above should be:
{noformat}
+----+------+
|  id|  name|
+----+------+
|[AB]|Name 1|
|[BC]|Name 2|
+----+------+{noformat}
*Observed Behavior*

Result rows contain column names as values, making them effectively useless to 
work with.

The actual output is
{noformat}
+-------+----+
|     id|name|
+-------+----+
|[69 64]|name|
|[69 64]|name|
+-------+----+{noformat}
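The bytes in the observed id column are not arbitrary: 0x69 and 0x64 are the ASCII codes for "i" and "d", so every row carries the literal column name. A quick check in plain Scala (no Spark required):

{code:scala}
import java.nio.charset.StandardCharsets

// Bytes shown in the observed "id" column: [69 64]
val observed: Array[Byte] = Array(0x69, 0x64).map(_.toByte)

// Decode the bytes as ASCII text
val decoded = new String(observed, StandardCharsets.US_ASCII)

println(decoded) // prints "id"
{code}

This would be consistent with the generated query selecting the quoted names as string literals rather than as column identifiers, but that is an assumption and has not been verified here.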
*Further information*

An easy workaround appears to be specifying "mysql" instead of "mariadb" in the 
connection URL while explicitly specifying the MariaDB driver. I'd expect the 
mariadb URL to work out of the box, however.
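
A minimal sketch of that workaround follows. The driver class name org.mariadb.jdbc.Driver is an assumption about the setup (MariaDB Connector/J on the classpath); host, schema, and table are the placeholder values used above.

{code:scala}
// Workaround sketch: "mysql" scheme in the URL, MariaDB driver named explicitly.
// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://db.example.com:3306/schema")
  .option("driver", "org.mariadb.jdbc.Driver") // assumed driver class
  .option("dbtable", "values")
  .load()

df.show()
{code}

Depending on the connector version, the driver may need additional URL parameters before it accepts the mysql scheme.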

It looks like this has been an issue since at least 2016 according to a 
[Stack Overflow post|https://stackoverflow.com/questions/38808463/incorrect-data-while-loading-jdbc-table-in-spark-sql].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
