yjshen commented on pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#issuecomment-896706081
> Overall I would prefer (but this is just my opinion) a higher level abstraction in which we can also plug catalogs such as Delta or Iceberg

Hi @rdettai, we already have `CatalogProvider`, plus a `CatalogList` in the `ExecutionContext`, and a table is resolved through `CatalogProvider` -> `SchemaProvider` -> `TableProvider`. I suppose the `Catalog` you want is orthogonal to `ObjectStore` here?

> But here you cannot use async because the file list and statistics are materialized at the ParquetTable creation level which is too early. This early materialization will also be problematic with buckets that have thousands of files:

The file listing happens when we register a new table. Since we currently enforce that all files have the same schema, I thought this could only be achieved by reading them all first. I think this could be relaxed once we can provide the schema in advance and can handle parquet files with different schemas inside one table.
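To make the trade-off around schema checking concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: `Schema`, `FileMeta`, `register_eager`, and `register_with_schema` are hypothetical stand-ins, and a "read" here only counts how many file footers would have to be opened at registration time.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's actual types.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<(String, String)>, // (column name, type name)
}

// Stand-in for a listed file; in practice the schema would come from
// reading the parquet footer, which is the expensive part.
struct FileMeta {
    path: String,
    schema: Schema,
}

// Eager registration: every listed file is read so that a single common
// schema can be enforced. Returns the schema and the number of files read.
fn register_eager(files: &[FileMeta]) -> Result<(Schema, usize), String> {
    let first = files.first().ok_or_else(|| "no files listed".to_string())?;
    let mut reads = 1; // the first file's footer
    for f in &files[1..] {
        reads += 1;
        if f.schema != first.schema {
            return Err(format!("schema mismatch in {}", f.path));
        }
    }
    Ok((first.schema.clone(), reads))
}

// Relaxed registration: the caller supplies the schema in advance, so no
// file has to be read when the table is registered.
fn register_with_schema(schema: Schema, _files: &[FileMeta]) -> (Schema, usize) {
    (schema, 0)
}
```

With two files sharing one schema, `register_eager` reads both files before returning, while `register_with_schema` reads none. Under these assumptions the eager cost grows linearly with the number of files in the bucket, which is the concern raised for buckets with thousands of files.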
