RE: I would like to add JDBCDialect to support Vertica database

Bryan Herger Wed, 11 Dec 2019 07:58:45 -0800

It kind of already is.  I was able to build the VerticaDialect as a sort of 
plugin as follows:

Check out apache/spark tree
Copy in VerticaDialect.scala
Build with "mvn -DskipTests compile"
Package the compiled class plus companion object into a JAR
Copy the JAR to the jars folder in the Spark binary installation (optional; 
the path can probably be passed in an extra --jars argument instead)

Then run the following test in spark-shell after creating Vertica table and 
sample data:

org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(org.apache.spark.sql.jdbc.VerticaDialect)
val jdbcDF = spark.read.format("jdbc").
  option("url", "jdbc:vertica://hpbox:5433/docker").
  option("dbtable", "test_alltypes").
  option("user", "dbadmin").
  option("password", "Vertica1!").
  load()
jdbcDF.show()
jdbcDF.write.mode("append").format("jdbc").
  option("url", "jdbc:vertica://hpbox:5433/docker").
  option("dbtable", "test_alltypes").
  option("user", "dbadmin").
  option("password", "Vertica1!").
  save()
org.apache.spark.sql.jdbc.JdbcDialects.unregisterDialect(org.apache.spark.sql.jdbc.VerticaDialect)

If it would be preferable to write documentation describing the above, I can do 
that instead.  The hard part is checking out the matching apache/spark tree 
then copying to the Spark cluster – I can install master branch and latest 
binary and apply patches since I have root on all my test boxes, but customers 
may not be able to.  Still, this provides another route to support new JDBC 
dialects.

BryanH

From: Wenchen Fan [mailto:[email protected]]
Sent: Wednesday, December 11, 2019 10:48 AM
To: Xiao Li <[email protected]>
Cc: Bryan Herger <[email protected]>; Sean Owen <[email protected]>; 
[email protected]
Subject: Re: I would like to add JDBCDialect to support Vertica database

Can we make the JDBCDialect a public API that users can plug in? It looks like 
an endless job to make sure the Spark JDBC source supports all databases.

On Wed, Dec 11, 2019 at 11:41 PM Xiao Li <[email protected]> wrote:
You can follow how we test the other JDBC dialects. All JDBC dialects require 
the docker integration tests. 
https://github.com/apache/spark/tree/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc

On Wed, Dec 11, 2019 at 7:33 AM Bryan Herger <[email protected]> wrote:
Hi, to answer both questions raised:

Though Vertica is derived from Postgres, Vertica does not recognize type names 
TEXT, NVARCHAR, BYTEA, ARRAY, and also handles DATETIME differently enough to 
cause issues.  The major changes are to use type names and date format 
supported by Vertica.
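
For illustration, the type-name side of such a dialect might look roughly like the sketch below. This is not the code from the linked commit: the object name VerticaDialectSketch and the specific mappings (VARCHAR(65000), VARBINARY(65000), BOOLEAN) are assumptions chosen to avoid the TEXT and BYTEA names mentioned above, and the date handling is left out entirely.

```scala
import java.sql.Types
import java.util.Locale

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types._

// Hypothetical sketch only; the name and the mappings are assumptions,
// not the dialect from the changeset discussed in this thread.
object VerticaDialectSketch extends JdbcDialect {

  // Claim only JDBC URLs that point at a Vertica database.
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:vertica")

  // Map Catalyst types to type names Vertica accepts, together with the
  // java.sql.Types code Spark uses when it binds NULL for that column.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("VARCHAR(65000)", Types.VARCHAR))
    case BinaryType  => Some(JdbcType("VARBINARY(65000)", Types.VARBINARY))
    case BooleanType => Some(JdbcType("BOOLEAN", Types.BOOLEAN))
    case _           => None // fall back to Spark's generic mapping
  }
}
```

Returning None for every other type leaves Spark's built-in mapping in place, so the sketch overrides only the cases where the generic names are a problem.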

For testing, I have a SQL script plus Scala and PySpark scripts, but these 
require a Vertica database to connect, so automated testing on a build server 
wouldn’t work.  It’s possible to include my test scripts and directions to run 
manually, but not sure where in the repo that would go.  If automated testing 
is required, I can ask our engineers whether there exists something like a 
mockito that could be included.

Thanks, Bryan H

From: Xiao Li [mailto:[email protected]]
Sent: Wednesday, December 11, 2019 10:13 AM
To: Sean Owen <[email protected]>
Cc: Bryan Herger <[email protected]>; [email protected]
Subject: Re: I would like to add JDBCDialect to support Vertica database

How can the dev community test it?

Xiao

On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <[email protected]> wrote:
It's probably OK, IMHO. The overhead of another dialect is small. Are
there differences that require a new dialect? I assume so, and it might
be useful to summarize them if you open a PR.

On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
<[email protected]<mailto:[email protected]>> wrote:
>
> Hi, I am a Vertica support engineer, and we have open support requests around 
> NULL values and SQL type conversion with DataFrame read/write over JDBC when 
> connecting to a Vertica database.  The stack traces point to issues with the 
> generic JDBCDialect in Spark-SQL.
>
> I saw that other vendors (Teradata, DB2...) have contributed a JDBCDialect 
> class to address JDBC compatibility, so I wrote up a dialect for Vertica.
>
> The changeset is on my fork of apache/spark at 
> https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
>
> I have tested this against Vertica 9.3 and found that this changeset 
> addresses both issues reported to us (issue with NULL values - setNull() - 
> for valid java.sql.Types, and String to VARCHAR conversion)
>
> Is this an acceptable change?  If so, how should I go about submitting a pull 
> request?
>
> Thanks, Bryan Herger
> Vertica Solution Engineer
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: 
> [email protected]<mailto:[email protected]>
>

--
[Databricks Summit - Watch the 
talks]<https://databricks.com/sparkaisummit/north-america>

