`SchemaProvider`

GitBox Thu, 10 Nov 2022 07:27:08 -0800


crepererum commented on issue #3777:
URL: 
https://github.com/apache/arrow-datafusion/issues/3777#issuecomment-1310460602


   The issue with `refresh()`, though, is that it only enables a small subset of 
features, namely async full discovery. It doesn't allow for partial schema 
discovery. For example, say you have a Hive dataset and can list the tables by 
scanning one directory, but there might be thousands of them (I'm exaggerating 
here). You'd want to spare yourself reading all the `_common_metadata` files, 
because the user may only query a single table. We in IOx have a similar 
situation where it would be nice if we could get away with only constructing 
the schema for tables we actually need (it's not a major blocker, but a nice to 
have).
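
To make the difference concrete, here is a minimal sketch of full versus partial discovery. It is not DataFusion's actual API: `LazySchemaProvider`, `TableSchema`, `load_schema`, and the `metadata_reads` counter are all hypothetical names made up for illustration, and the code is synchronous for brevity (in the async design discussed here, `table` would presumably be an `async fn` doing real I/O).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table schema: just a list of column names.
#[derive(Debug, Clone, PartialEq)]
struct TableSchema {
    columns: Vec<String>,
}

/// Hypothetical provider: table names are cheap to get (one directory
/// listing), but each schema costs one metadata read.
struct LazySchemaProvider {
    table_names: Vec<String>,
    cache: HashMap<String, TableSchema>,
    /// Counts simulated `_common_metadata` reads, for illustration only.
    metadata_reads: usize,
}

impl LazySchemaProvider {
    fn new(table_names: Vec<String>) -> Self {
        Self {
            table_names,
            cache: HashMap::new(),
            metadata_reads: 0,
        }
    }

    /// Stand-in for reading one table's `_common_metadata` file.
    /// The returned columns are fabricated placeholders.
    fn load_schema(&mut self, name: &str) -> TableSchema {
        self.metadata_reads += 1;
        TableSchema {
            columns: vec![format!("{name}_id"), "time".to_string()],
        }
    }

    /// Full discovery: loads the schema of every known table up front.
    fn refresh(&mut self) {
        let names = self.table_names.clone();
        for name in names {
            if !self.cache.contains_key(&name) {
                let schema = self.load_schema(&name);
                self.cache.insert(name, schema);
            }
        }
    }

    /// Partial discovery: loads (and caches) only the requested table.
    fn table(&mut self, name: &str) -> Option<TableSchema> {
        if !self.table_names.iter().any(|n| n == name) {
            return None; // unknown table: no metadata read at all
        }
        if !self.cache.contains_key(name) {
            let schema = self.load_schema(name);
            self.cache.insert(name.to_string(), schema);
        }
        self.cache.get(name).cloned()
    }
}
```

With a thousand table names, calling `table("t42")` twice performs a single simulated metadata read, whereas `refresh()` performs one per table. That gap is what a `refresh()`-only API cannot avoid.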



[GitHub] [arrow-datafusion] crepererum commented on issue #3777: An asynchronous version of `CatalogList`/`CatalogProvider`/`SchemaProvider`
