Can we make JDBCDialect a public API that users can plug in? It looks like an endless job to make sure the Spark JDBC source supports all databases.
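To make the "plug in" idea concrete, here is a rough sketch of a dialect registry in plain Python. It is only a model of the pattern, not Spark's API: every name in it (`Dialect`, `VerticaDialect`, `register_dialect`, `resolve_type`, `GENERIC_TYPES`) is hypothetical, and the Vertica type names are assumptions that would need checking against the Vertica documentation.

```python
from typing import Dict, List, Optional

# Generic fallback type names, used when no dialect overrides a type.
# These values are illustrative, not what Spark actually emits.
GENERIC_TYPES: Dict[str, str] = {
    "string": "TEXT",
    "binary": "BLOB",
    "int": "INTEGER",
}


class Dialect:
    """A dialect claims a family of JDBC URLs and may override type names."""

    def can_handle(self, url: str) -> bool:
        raise NotImplementedError

    def column_type(self, logical_type: str) -> Optional[str]:
        # None means "no override; fall back to the generic mapping".
        return None


class VerticaDialect(Dialect):
    # Assumed replacements for type names that Vertica does not recognize.
    _overrides = {
        "string": "VARCHAR(65000)",
        "binary": "VARBINARY(65000)",
    }

    def can_handle(self, url: str) -> bool:
        return url.lower().startswith("jdbc:vertica:")

    def column_type(self, logical_type: str) -> Optional[str]:
        return self._overrides.get(logical_type)


_registry: List[Dialect] = []


def register_dialect(dialect: Dialect) -> None:
    # The most recently registered dialect is consulted first.
    _registry.insert(0, dialect)


def resolve_type(url: str, logical_type: str) -> str:
    """Return the column type name to use for `logical_type` at `url`."""
    for dialect in _registry:
        if dialect.can_handle(url):
            override = dialect.column_type(logical_type)
            if override is not None:
                return override
            break  # the matching dialect has no opinion on this type
    return GENERIC_TYPES[logical_type]


# A user (or vendor) plugs in a dialect without touching the core code.
register_dialect(VerticaDialect())
```

With this shape, supporting a new database means shipping one small class and calling `register_dialect`, and URLs that no dialect claims keep the generic behaviour.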

On Wed, Dec 11, 2019 at 11:41 PM Xiao Li <[email protected]> wrote:

> You can follow how we test the other JDBC dialects. All JDBC dialects
> require the docker integration tests.
> https://github.com/apache/spark/tree/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc
>
> On Wed, Dec 11, 2019 at 7:33 AM Bryan Herger <[email protected]>
> wrote:
>
>> Hi, to answer both questions raised:
>>
>> Though Vertica is derived from Postgres, Vertica does not recognize type
>> names TEXT, NVARCHAR, BYTEA, ARRAY, and also handles DATETIME differently
>> enough to cause issues. The major changes are to use type names and date
>> format supported by Vertica.
>>
>> For testing, I have a SQL script plus Scala and PySpark scripts, but
>> these require a Vertica database to connect, so automated testing on a
>> build server wouldn’t work. It’s possible to include my test scripts and
>> directions to run manually, but not sure where in the repo that would go.
>> If automated testing is required, I can ask our engineers whether there
>> exists something like a mockito that could be included.
>>
>> Thanks, Bryan H
>>
>> *From:* Xiao Li [mailto:[email protected]]
>> *Sent:* Wednesday, December 11, 2019 10:13 AM
>> *To:* Sean Owen <[email protected]>
>> *Cc:* Bryan Herger <[email protected]>; [email protected]
>> *Subject:* Re: I would like to add JDBCDialect to support Vertica
>> database
>>
>> How can the dev community test it?
>>
>> Xiao
>>
>> On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <[email protected]> wrote:
>>
>> It's probably OK, IMHO. The overhead of another dialect is small. Are
>> there differences that require a new dialect? I assume so and might
>> just be useful to summarize them if you open a PR.
>>
>> On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
>> <[email protected]> wrote:
>> >
>> > Hi, I am a Vertica support engineer, and we have open support requests
>> > around NULL values and SQL type conversion with DataFrame read/write over
>> > JDBC when connecting to a Vertica database. The stack traces point to
>> > issues with the generic JDBCDialect in Spark-SQL.
>> >
>> > I saw that other vendors (Teradata, DB2...) have contributed a
>> > JDBCDialect class to address JDBC compatibility, so I wrote up a dialect
>> > for Vertica.
>> >
>> > The changeset is on my fork of apache/spark at
>> > https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
>> >
>> > I have tested this against Vertica 9.3 and found that this changeset
>> > addresses both issues reported to us (issue with NULL values - setNull() -
>> > for valid java.sql.Types, and String to VARCHAR conversion)
>> >
>> > Is this an acceptable change? If so, how should I go about submitting a
>> > pull request?
>> >
>> > Thanks, Bryan Herger
>> > Vertica Solution Engineer
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: [email protected]
