Hi Jacky,

We’ve internally released support for Hive tables (and Spark FileFormat tables) using DataSourceV2 so that we can switch between catalogs; it sounds like that’s what you are planning to build as well. It would be great to work with the broader community on a Hive connector.
I will get a branch of our connectors published so that you can take a look. I think it should be fairly close to what you’re talking about building, with a few exceptions:
- Our implementation always uses our S3 committers, but it should be easy to change this
- It supports per-partition formats, like Hive

Do you have an idea about where the connector should be developed? I don’t think it makes sense for it to be part of Spark: that would keep the complexity in the main project and force Hive versions to be updated slowly. Using a separate project would mean less code in Spark specific to one source, and it could more easily support multiple Hive versions. Maybe we should create a project for catalog plug-ins?

rb

On Mon, Mar 23, 2020 at 4:20 AM JackyLee <qcsd2...@163.com> wrote:
> Hi devs,
> I’d like to start a discussion about supporting Hive on DataSourceV2. We’re
> now working on a project that uses DataSourceV2 to provide multiple-source
> support, and it works very well with our data lake solution, but it does
> not yet support HiveTable.
>
> There are three reasons why we need to support Hive on DataSourceV2:
> 1. Hive itself is one of Spark’s data sources.
> 2. A HiveTable is essentially a FileTable with its own input and output
> formats, so it works fine with FileTable.
> 3. HiveTable should be stateless, so users can freely read or write Hive
> using batch or microbatch.
>
> We implemented stateless Hive on DataSourceV1; it lets users write to Hive
> in streaming or batch mode, and it is widely used in our company. Recently,
> we have been working to support Hive on DataSourceV2, and multiple Hive
> catalogs and DDL commands are already supported.
>
> Looking forward to more discussions on this.

--
Ryan Blue
Software Engineer
Netflix
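[Editorial note for readers following the thread: the catalog switching discussed above relies on Spark 3's DataSourceV2 catalog plug-in mechanism, where a catalog is registered under a `spark.sql.catalog.<name>` key pointing at a class implementing `org.apache.spark.sql.connector.catalog.CatalogPlugin`, and any further keys under that prefix are passed to the plug-in's `initialize()` method. A minimal configuration sketch follows; the `com.example.hive.HiveTableCatalog` class name and the `metastore.uris` option key are hypothetical placeholders, not the connector Ryan describes.]

```
# spark-defaults.conf -- registering a DataSourceV2 catalog plug-in.
# The class name and option key below are hypothetical; any
# implementation of CatalogPlugin can be registered this way.
spark.sql.catalog.my_hive=com.example.hive.HiveTableCatalog
spark.sql.catalog.my_hive.metastore.uris=thrift://metastore:9083
```

Tables in the registered catalog are then addressed with three-part names in SQL, e.g. `SELECT * FROM my_hive.db.tbl`, which is how a session can switch between a Hive catalog and Spark's built-in one.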