Hi Team,

I have been using Spark 3.2.0 and working on DataSourceV2 adoption with a custom JDBC data source connector. The approach I took was to implement the TableProvider interface in my mycustomeDatasource class and register the data source. I also implemented the relevant handlers with the SupportsRead, Scan, ScanBuilder, SupportsPushDownAggregates, SupportsPushDownRequiredColumns, and SupportsPushDownFilters interfaces.
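To make the setup concrete, here is a minimal sketch of the shape of my connector. All class names below (MyCustomDataSource, MyCustomTable, MyCustomScanBuilder) are illustrative placeholders, not the real code, and the bodies are stubbed:

```scala
import java.util

import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.expressions.aggregate.Aggregation
import org.apache.spark.sql.connector.read.{Batch, Scan, ScanBuilder, SupportsPushDownAggregates, SupportsPushDownFilters, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Entry point: TableProvider plus DataSourceRegister for the short name.
class MyCustomDataSource extends TableProvider with DataSourceRegister {
  override def shortName(): String = "mycustomeDatasource"

  override def inferSchema(options: CaseInsensitiveStringMap): StructType = {
    // A real connector would derive this from JDBC metadata of options.get("dbtable").
    new StructType()
  }

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table =
    new MyCustomTable(schema)
}

class MyCustomTable(tableSchema: StructType) extends Table with SupportsRead {
  override def name(): String = "my_custom_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
    new MyCustomScanBuilder(tableSchema)
}

class MyCustomScanBuilder(private var prunedSchema: StructType)
    extends ScanBuilder
    with SupportsPushDownFilters
    with SupportsPushDownRequiredColumns
    with SupportsPushDownAggregates {

  private var pushed: Array[Filter] = Array.empty

  // Record the filters Spark offers; return the ones we cannot handle
  // (here, for simplicity, we claim to handle all of them).
  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    pushed = filters
    Array.empty
  }
  override def pushedFilters(): Array[Filter] = pushed

  override def pruneColumns(requiredSchema: StructType): Unit =
    prunedSchema = requiredSchema

  // Returning false tells Spark to compute the aggregate itself.
  override def pushAggregation(aggregation: Aggregation): Boolean = false

  override def build(): Scan = new Scan {
    override def readSchema(): StructType = prunedSchema
    override def toBatch: Batch = ??? // planInputPartitions / readerFactory omitted
  }
}
```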
Approach and observations:

1. Using createOrReplaceTempView:

I tried testing the new data source from the spark-shell by passing my data source to spark.read.format:

    val jdbcDF = spark.read.format("mycustomeDatasource").load()
    jdbcDF.createOrReplaceTempView("test_table")
    spark.sql("select * from test_table").show(false)

This approach works fine and I am able to test the new data source.

2. Using Hive CREATE TABLE:

The setup has an external Hive metastore to store the Hive table metadata. I created a new table using my data source, passing all the options:

    CREATE TABLE test_table USING mycustomeDatasource OPTIONS (url, driver, dbtable, ………)

The table is created fine, but when I try to query it, it throws back the error below:

    org.apache.spark.sql.AnalysisException: mycustomeDatasource is not a valid Spark SQL Data Source.
    Caused by: org.apache.spark.sql.AnalysisException: mycustomeDatasource is not a valid Spark SQL Data Source.
      at org.apache.spark.sql.errors.QueryCompilationErrors$.invalidDataSourceError(QueryCompilationErrors.scala:1016)
      at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:424)
      at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:261)
      at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:248)
      at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
      at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
      at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)

While debugging this against the GitHub sources, I found that DataSource.resolveRelation() does not handle the TableProvider interface yet, although the same has been added in the lookupDataSourceV2 method.
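For what it's worth, the divergence between the two lookup paths can be probed directly from the spark-shell using Spark's internal helpers (internal APIs, used here only for debugging; this assumes my connector jar is on the classpath):

```scala
// Probe how each resolution path sees the provider name.
import org.apache.spark.sql.execution.datasources.DataSource

val conf = spark.sessionState.conf

// V2 lookup (the spark.read.format path): returns Some(TableProvider instance).
println(DataSource.lookupDataSourceV2("mycustomeDatasource", conf))

// V1 class lookup (the FindDataSourceTable / resolveRelation path): the class
// resolves, but resolveRelation() later rejects it because it implements
// neither RelationProvider nor FileFormat.
println(DataSource.lookupDataSource("mycustomeDatasource", conf))
```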
Links for reference:

https://github.com/apache/spark/blob/v3.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L717
https://github.com/apache/spark/blob/v3.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L344

Can you help me understand the points below?

1. Is DataSourceV2 for the JDBC data source still under development, and can it not yet be adopted to use the pushdown enhancements? If so, is there a JIRA to track this?
2. The approach I took to adopt the pushdown enhancements might not be ideal. Do you have some working examples of the same that I can refer to?

Thanks,
Arsh