[jira] [Commented] (SPARK-38327) JDBC Source with MariaDB connection returns column names as values

Hyukjin Kwon (Jira) Sun, 27 Feb 2022 17:31:04 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-38327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498677#comment-17498677
 ]


Hyukjin Kwon commented on SPARK-38327:
--------------------------------------

I think this needs a dedicated MariaDB dialect that implements the JdbcDialect API defined in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
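
A minimal sketch of what such a dialect might look like, assuming the root cause is identifier quoting: Spark's default dialect wraps column names in double quotes, which MariaDB in its default SQL mode reads as string literals, and that would match the observed output ([69 64] is the byte sequence for "id"). The object name MariaDBDialectSketch is hypothetical and not part of Spark; JdbcDialect, canHandle, quoteIdentifier and JdbcDialects.registerDialect are existing Spark APIs. A complete dialect would probably also need type mappings, which are left out here.

```scala
import java.util.Locale

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect (name is an assumption, not part of Spark).
object MariaDBDialectSketch extends JdbcDialect {

  // Claim every JDBC URL that uses the mariadb scheme.
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:mariadb")

  // Quote identifiers with backticks instead of the default double quotes,
  // so MariaDB treats them as column names rather than string literals.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Register once before the first JDBC read, e.g. in spark-shell
// or in the application's setup code.
JdbcDialects.registerDialect(MariaDBDialectSketch)
```

With the dialect registered, the jdbc:mariadb URL from the report would be expected to produce backtick-quoted column names in the generated query, though this has not been verified against the reporter's environment.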

> JDBC Source with MariaDB connection returns column names as values
> ------------------------------------------------------------------
>
>                 Key: SPARK-38327
>                 URL: https://issues.apache.org/jira/browse/SPARK-38327
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>         Environment: MariaDB version 10.3.10
> Running with spark-k8s-operator
>            Reporter: Marvin Rösch
>            Priority: Minor
>
> Using a JDBC source with the official MariaDB JDBC driver and a JDBC 
> connection URL like the following does not work as expected:
> {noformat}
> jdbc:mariadb://db.example.com:3306/schema {noformat}
> Assume we have a table "values" like the following in MariaDB:
> ||id (binary)||name (varchar)||
> |0xAB|Name 1|
> |0xBC|Name 2|
> We intend to create and display a data frame from it like this:
> {code:scala}
> spark.read
>   .format("jdbc")
>   .option("url", "jdbc:mariadb://db.example.com:3306/schema")
>   .option("dbtable", "values")
>   .load()
>   .show{code}
> *Expected Behavior*
> Using such a connection URL on an arbitrary MariaDB table or query should 
> result in a data frame that correctly reflects the table structure and 
> content from MariaDB, with columns having the correct types and values.
> The output of the above should be
> {noformat}
> +----+------+
> |  id|  name|
> +----+------+
> |[AB]|Name 1|
> |[BC]|Name 2|
> +----+------+{noformat}
> *Observed Behavior*
> Result rows contain column names as values, making them effectively useless 
> to work with.
> The actual output is
> {noformat}
> +-------+----+
> |     id|name|
> +-------+----+
> |[69 64]|name|
> |[69 64]|name|
> +-------+----+{noformat}
> *Further information*
> An easy workaround appears to be specifying "mysql" instead of "mariadb" in 
> the connection URL while explicitly specifying the MariaDB driver. I'd expect 
> the mariadb URL to work out of the box, however.
> It looks like this has been an issue since at least 2016 according to a 
> [StackOverflow post|https://stackoverflow.com/questions/38808463/incorrect-data-while-loading-jdbc-table-in-spark-sql].
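
The workaround mentioned in the report can be sketched roughly as follows. The host, schema and table name are taken from the report; the driver class name org.mariadb.jdbc.Driver is the usual MariaDB Connector/J class and is an assumption here, since the report does not name it. This also assumes a driver version that accepts the jdbc:mysql scheme; newer Connector/J releases may require an extra URL parameter for that.

```scala
import org.apache.spark.sql.SparkSession

// Reuse the active session or create one (assumed to be configured elsewhere).
val spark = SparkSession.builder().getOrCreate()

val df = spark.read
  .format("jdbc")
  // "mysql" scheme so that Spark selects its built-in MySQL dialect
  .option("url", "jdbc:mysql://db.example.com:3306/schema")
  // Explicitly select the MariaDB driver (class name is an assumption)
  .option("driver", "org.mariadb.jdbc.Driver")
  .option("dbtable", "values")
  .load()

df.show()
```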



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

