I'm not sure either. Can't you use Spark Packages for your scenario? https://spark-packages.org/
On Thu, Dec 12, 2019 at 9:46 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

I am not so sure about it either. I think it is enough to expose JDBCDialect as an API (which it already seems to be). It brings some overhead to dev (e.g., testing and reviewing PRs related to another third party). Such a third-party integration might better exist as a third-party library, absent a strong reason.

On Thu, Dec 12, 2019 at 12:58 AM Bryan Herger <bryan.her...@microfocus.com> wrote:

It kind of already is. I was able to build the VerticaDialect as a sort of plugin as follows:

1. Check out the apache/spark tree
2. Copy in VerticaDialect.scala
3. Build with "mvn -DskipTests compile"
4. Package the compiled class plus its companion object into a JAR
5. Copy the JAR into the jars folder of the Spark binary installation (optional; you can probably pass it with an extra --jars argument instead)

Then run the following test in spark-shell after creating a Vertica table and sample data:

org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(org.apache.spark.sql.jdbc.VerticaDialect)

val jdbcDF = spark.read.format("jdbc")
  .option("url", "jdbc:vertica://hpbox:5433/docker")
  .option("dbtable", "test_alltypes")
  .option("user", "dbadmin")
  .option("password", "Vertica1!")
  .load()

jdbcDF.show()

jdbcDF.write.mode("append").format("jdbc")
  .option("url", "jdbc:vertica://hpbox:5433/docker")
  .option("dbtable", "test_alltypes")
  .option("user", "dbadmin")
  .option("password", "Vertica1!")
  .save()

JdbcDialects.unregisterDialect(org.apache.spark.sql.jdbc.VerticaDialect)

If it would be preferable to write documentation describing the above, I can do that instead. The hard part is checking out the matching apache/spark tree and then copying to the Spark cluster. I can install the master branch and the latest binary and apply patches since I have root on all my test boxes, but customers may not be able to.
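For reference, the shape of such a pluggable dialect can be sketched in plain Scala. This is a stand-in, not the real thing: the trait and registry below only mirror the contract of Spark's org.apache.spark.sql.jdbc.JdbcDialect and JdbcDialects (canHandle, registerDialect, unregisterDialect), so the snippet compiles without Spark on the classpath; a real dialect would extend Spark's abstract class instead.

```scala
// Stand-in for Spark's JdbcDialect abstract class, so this sketch is
// self-contained. Only the URL-dispatch part of the contract is modeled.
trait DialectSketch {
  // Return true when this dialect should handle the given JDBC URL.
  def canHandle(url: String): Boolean
}

// Hypothetical Vertica dialect: claims any jdbc:vertica:// URL.
object VerticaDialectSketch extends DialectSketch {
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:vertica")
}

// Minimal registry mirroring JdbcDialects.registerDialect/unregisterDialect:
// dialects are consulted in registration order, first match wins.
object DialectRegistry {
  private var dialects: List[DialectSketch] = Nil
  def registerDialect(d: DialectSketch): Unit = { dialects = d :: dialects }
  def unregisterDialect(d: DialectSketch): Unit = { dialects = dialects.filterNot(_ == d) }
  def get(url: String): Option[DialectSketch] = dialects.find(_.canHandle(url))
}
```

With this shape, the register/use/unregister sequence in the spark-shell test above is just registry manipulation around ordinary reads and writes.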
Still, this provides another route to support new JDBC dialects.

BryanH

From: Wenchen Fan [mailto:cloud0...@gmail.com]
Sent: Wednesday, December 11, 2019 10:48 AM
To: Xiao Li <lix...@databricks.com>
Cc: Bryan Herger <bryan.her...@microfocus.com>; Sean Owen <sro...@gmail.com>; dev@spark.apache.org
Subject: Re: I would like to add JDBCDialect to support Vertica database

Can we make the JDBCDialect a public API that users can plug in? It looks like an endless job to make sure the Spark JDBC source supports all databases.

On Wed, Dec 11, 2019 at 11:41 PM Xiao Li <lix...@databricks.com> wrote:

You can follow how we test the other JDBC dialects. All JDBC dialects require the docker integration tests: https://github.com/apache/spark/tree/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc

On Wed, Dec 11, 2019 at 7:33 AM Bryan Herger <bryan.her...@microfocus.com> wrote:

Hi, to answer both questions raised:

Though Vertica is derived from Postgres, Vertica does not recognize the type names TEXT, NVARCHAR, BYTEA, and ARRAY, and also handles DATETIME differently enough to cause issues. The major changes are to use the type names and date format supported by Vertica.

For testing, I have a SQL script plus Scala and PySpark scripts, but these require a Vertica database to connect to, so automated testing on a build server wouldn't work. It's possible to include my test scripts and directions to run them manually, but I'm not sure where in the repo that would go. If automated testing is required, I can ask our engineers whether there exists something like a mockito that could be included.
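The kind of type-name translation those changes involve can be sketched as a plain function, in the spirit of a dialect's getJDBCType override: Catalyst type names in, database-specific SQL type names out. The Vertica-side target names below (VARCHAR(65000), VARBINARY(65000)) are illustrative assumptions, not the actual VerticaDialect mappings.

```scala
// Sketch of the mapping a dialect's getJDBCType override performs.
// Returning None means "fall back to the generic JDBC mapping".
// Target names are assumptions for illustration, not Vertica-verified.
def verticaTypeFor(catalystType: String): Option[String] = catalystType match {
  case "StringType"    => Some("VARCHAR(65000)")   // no TEXT/NVARCHAR in Vertica
  case "BinaryType"    => Some("VARBINARY(65000)") // instead of Postgres BYTEA
  case "BooleanType"   => Some("BOOLEAN")
  case "TimestampType" => Some("TIMESTAMP")        // DATETIME differs from Postgres
  case _               => None
}
```

A real dialect does the same dispatch on Catalyst DataType objects rather than strings, but the shape of the decision is identical.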
Thanks, Bryan H

From: Xiao Li [mailto:lix...@databricks.com]
Sent: Wednesday, December 11, 2019 10:13 AM
To: Sean Owen <sro...@gmail.com>
Cc: Bryan Herger <bryan.her...@microfocus.com>; dev@spark.apache.org
Subject: Re: I would like to add JDBCDialect to support Vertica database

How can the dev community test it?

Xiao

On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <sro...@gmail.com> wrote:

It's probably OK, IMHO. The overhead of another dialect is small. Are there differences that require a new dialect? I assume so, and it might just be useful to summarize them if you open a PR.

On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger <bryan.her...@microfocus.com> wrote:

Hi, I am a Vertica support engineer, and we have open support requests around NULL values and SQL type conversion with DataFrame read/write over JDBC when connecting to a Vertica database. The stack traces point to issues with the generic JDBCDialect in Spark-SQL.

I saw that other vendors (Teradata, DB2, ...) have contributed a JDBCDialect class to address JDBC compatibility, so I wrote up a dialect for Vertica.

The changeset is on my fork of apache/spark at https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d

I have tested this against Vertica 9.3 and found that this changeset addresses both issues reported to us (the issue with NULL values, i.e. setNull(), for valid java.sql.Types, and the String-to-VARCHAR conversion).

Is this an acceptable change? If so, how should I go about submitting a pull request?
Thanks, Bryan Herger
Vertica Solution Engineer

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
---
Takeshi Yamamuro