Hi Team,

I have been using Spark 3.2.0 and working on DataSourceV2 adoption with a
custom JDBC datasource connector.
The approach I took was to implement the TableProvider interface in my
mycustomeDatasource class and register the datasource.
I also implemented the relevant handlers with the SupportsRead, Scan,
ScanBuilder, SupportsPushDownAggregates, SupportsPushDownRequiredColumns,
and SupportsPushDownFilters interfaces.
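
For context, a minimal sketch of that wiring is below (class, table, and
method bodies are illustrative placeholders, not my actual connector code):

```scala
import java.util
import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Entry point Spark instantiates for the "mycustomeDatasource" format.
class MyCustomeDatasource extends TableProvider {
  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    ??? // e.g. derive the schema from JDBC table metadata

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table =
    new MyJdbcTable(schema)
}

// The Table exposes read support; the ScanBuilder is where the pushdown
// mix-ins (SupportsPushDownFilters, SupportsPushDownAggregates, ...) go.
class MyJdbcTable(tableSchema: StructType) extends Table with SupportsRead {
  override def name(): String = "my_jdbc_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    new MyScanBuilder(tableSchema)
}

class MyScanBuilder(schema: StructType) extends ScanBuilder {
  override def build(): Scan = ??? // returns a Scan producing batch partitions
}
```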

Approach and Observations

Using createOrReplaceTempView: I tried testing the new datasource with
spark-shell, passing my datasource to spark.read.format:

val jdbcDF = spark.read.format("mycustomeDatasource").load()
jdbcDF.createOrReplaceTempView("test_table")
spark.sql("select * from test_table").show(false)

This approach works fine and I am able to test the new datasource.

Using Hive CREATE TABLE: The setup has an external Hive metastore to store
the Hive table metadata. I created a new Hive table using my datasource,
passing all the options:

CREATE TABLE test_table USING mycustomeDatasource OPTIONS (url, driver,
dbtable, ………)
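
As an aside, for the short name in the USING clause to resolve to the
connector class, the usual wiring is DataSourceRegister plus a
ServiceLoader entry; a sketch of what I have in place (names illustrative):

```scala
import java.util
import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class MyCustomeDatasource extends TableProvider with DataSourceRegister {
  // Lets "mycustomeDatasource" be used as the USING / format() name,
  // provided the class is also listed in
  // META-INF/services/org.apache.spark.sql.sources.DataSourceRegister.
  override def shortName(): String = "mycustomeDatasource"

  override def inferSchema(options: CaseInsensitiveStringMap): StructType = ???
  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = ???
}
```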

It creates the table fine, but when I try to query it, the error below is
thrown:

org.apache.spark.sql.AnalysisException: mycustomeDatasource is not a valid
Spark SQL Data Source.

Caused by: org.apache.spark.sql.AnalysisException: mycustomeDatasource is
not a valid Spark SQL Data Source.
  at org.apache.spark.sql.errors.QueryCompilationErrors$.invalidDataSourceError(QueryCompilationErrors.scala:1016)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:424)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:261)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:248)
  at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
  at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
  at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

While debugging this against the source on GitHub, I found that
DataSource.resolveRelation() does not handle the TableProvider interface
yet, although the same has been added to the lookupDataSourceV2 method.
Links below for reference:

https://github.com/apache/spark/blob/v3.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L717

https://github.com/apache/spark/blob/v3.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L344

Can you help me understand the points below?

1. Is DataSourceV2 for JDBC datasources still under development, so that it
cannot be adopted yet to use the pushdown enhancements? If so, is there a
JIRA to track this?
2. The approach I took to adopt the pushdown enhancements might not be
ideal. Do you have any working examples of the same that I can refer to?

Thanks
Arsh
