[ https://issues.apache.org/jira/browse/SPARK-38327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498677#comment-17498677 ]
Hyukjin Kwon commented on SPARK-38327:
--------------------------------------

I think it needs a MariaDB dialect that implements https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala

> JDBC Source with MariaDB connection returns column names as values
> ------------------------------------------------------------------
>
> Key: SPARK-38327
> URL: https://issues.apache.org/jira/browse/SPARK-38327
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Environment: MariaDB version 10.3.10
> Running with spark-k8s-operator
> Reporter: Marvin Rösch
> Priority: Minor
>
> Using a JDBC source with the official MariaDB JDBC driver and a JDBC
> connection URL like the following does not work as expected:
> {noformat}
> jdbc:mariadb://db.example.com:3306/schema {noformat}
> Assume we have a table "values" like the following in MariaDB:
> ||id (binary)||name (varchar)||
> |0xAB|Name 1|
> |0xBC|Name 2|
> We intend to create and display a data frame from it like this:
> {code:scala}
> spark.read
>   .format("jdbc")
>   .option("url", "jdbc:mariadb://db.example.com:3306/schema")
>   .option("dbtable", "values")
>   .load()
>   .show{code}
> *Expected Behavior*
> Using such a connection URL on an arbitrary MariaDB table or query results in
> a data frame that reflects the table structure and content from MariaDB
> correctly, with columns having the correct type and values.
> The output of the above should be
> {noformat}
> +----+------+
> |  id|  name|
> +----+------+
> |[AB]|Name 1|
> |[BC]|Name 2|
> +----+------+{noformat}
> *Observed Behavior*
> Result rows contain column names as values, making them effectively useless
> to work with.
> The actual output is
> {noformat}
> +-------+----+
> |     id|name|
> +-------+----+
> |[69 64]|name|
> |[69 64]|name|
> +-------+----+{noformat}
> *Further information*
> An easy workaround appears to be specifying "mysql" instead of "mariadb" in
> the connection URL while explicitly specifying the MariaDB driver. I'd expect
> the mariadb URL to work out of the box, however.
> It looks like this has been an issue since at least 2016 according to a
> [StackOverflow
> post|https://stackoverflow.com/questions/38808463/incorrect-data-while-loading-jdbc-table-in-spark-sql].
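
The two approaches mentioned in this thread, the dialect suggested in the comment and the URL workaround from the report, could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the fix adopted by Spark. The names MariaDbSketch, MariaDBDialect, and toMysqlScheme are hypothetical. The backtick quoting assumes that the problem comes from Spark's fallback dialect wrapping column names in double quotes, which MariaDB would read as string literals unless ANSI_QUOTES is enabled; that would be consistent with [69 64], the bytes of "id", showing up as values. org.mariadb.jdbc.Driver is the MariaDB Connector/J driver class. Depending on the connector version, the driver may need additional URL options before it accepts a jdbc:mysql: URL, so check the driver documentation.

```scala
import java.util.Locale

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object MariaDbSketch {

  // Hypothetical dialect (not part of Spark): claims jdbc:mariadb URLs and
  // quotes identifiers with backticks instead of double quotes.
  object MariaDBDialect extends JdbcDialect {
    override def canHandle(url: String): Boolean =
      url.toLowerCase(Locale.ROOT).startsWith("jdbc:mariadb:")

    override def quoteIdentifier(colName: String): String = s"`$colName`"
  }

  // Workaround from the report: swap the URL scheme so that Spark selects
  // its built-in MySQL dialect. Other URLs are returned unchanged.
  def toMysqlScheme(url: String): String =
    if (url.startsWith("jdbc:mariadb:")) "jdbc:mysql:" + url.stripPrefix("jdbc:mariadb:")
    else url

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SPARK-38327-sketch").getOrCreate()
    val url = "jdbc:mariadb://db.example.com:3306/schema"

    // Option 1: register the custom dialect, then read with the mariadb URL.
    JdbcDialects.registerDialect(MariaDBDialect)
    spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "values")
      .load()
      .show()

    // Option 2: mysql scheme in the URL plus an explicit MariaDB driver class.
    spark.read
      .format("jdbc")
      .option("url", toMysqlScheme(url))
      .option("driver", "org.mariadb.jdbc.Driver")
      .option("dbtable", "values")
      .load()
      .show()

    spark.stop()
  }
}
```

Either option alone should be enough; they are shown together only for comparison. Registering a dialect keeps the mariadb URL intact, while the scheme swap needs no extra code on the Spark side.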