Hi Spark developers,
My team has an internal storage format, and it already has a Data Source V2
implementation.
Now we want to add catalog support for it: each partition should be stored in
this format, and the Spark catalog should manage the partition columns, just
as it does for ORC and Parquet.
After checking the logic of DataSource.resolveRelation, I wonder whether
introducing another FileFormat for my storage spec is the only way to support
catalog-managed partitions. Could any expert confirm this? (A sketch of what
I have in mind follows.)
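To make the question concrete, here is a rough sketch of the FileFormat I
would write if that is indeed the way to go. MyInternalFileFormat and
"myformat" are placeholder names, and the method bodies are stubs:

import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriterFactory}
import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.types.StructType

// Hypothetical sketch of a FileFormat for our internal storage spec.
class MyInternalFileFormat extends FileFormat with DataSourceRegister {

  override def shortName(): String = "myformat"

  // The catalog resolves the data schema from here; partition columns stay
  // out of the data files and are managed by the catalog, as with ORC/Parquet.
  override def inferSchema(
      sparkSession: SparkSession,
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = ???

  // Return a factory that writes our format into each partition directory.
  override def prepareWrite(
      sparkSession: SparkSession,
      job: Job,
      options: Map[String, String],
      dataSchema: StructType): OutputWriterFactory = ???
}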
My other question is about the comment below, "now catalog for data source V2
is under development". Does anyone know the progress or design of this
feature?
lazy val providingClass: Class[_] = {
  val cls = DataSource.lookupDataSource(className, sparkSession.sessionState.conf)
  // `providingClass` is used for resolving data source relation for catalog tables.
  // As now catalog for data source V2 is under development, here we fall back all the
  // [[FileDataSourceV2]] to [[FileFormat]] to guarantee the current catalog works.
  // [[FileDataSourceV2]] will still be used if we call the load()/save() method in
  // [[DataFrameReader]]/[[DataFrameWriter]], since they use method `lookupDataSource`
  // instead of `providingClass`.
  cls.newInstance() match {
    case f: FileDataSourceV2 => f.fallbackFileFormat
    case _ => cls
  }
}
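If I read this right, a V2 source only needs to point back at a V1 FileFormat
for catalog tables to keep working. Here is a minimal sketch of what our
source would do, assuming the FileFormat sketched above. Names are
placeholders, and I left the class abstract since the remaining TableProvider
methods are omitted:

import org.apache.spark.sql.execution.datasources.FileFormat
import org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2

// Hypothetical sketch; MyDataSourceV2 and MyInternalFileFormat are
// placeholder names, not real Spark classes.
abstract class MyDataSourceV2 extends FileDataSourceV2 {

  override def shortName(): String = "myformat"

  // providingClass (above) instantiates this class and, because it is a
  // FileDataSourceV2, resolves catalog tables through this FileFormat instead.
  override def fallbackFileFormat: Class[_ <: FileFormat] =
    classOf[MyInternalFileFormat]
}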
Thanks,
Kun