crepererum commented on issue #3777: URL: https://github.com/apache/arrow-datafusion/issues/3777#issuecomment-1310460602
The issue with `refresh()`, though, is that it only enables a small subset of features, namely async full discovery. It doesn't allow for partial schema discovery. For example, say you have a Hive dataset where you can learn the table names by scanning one directory, but there might be thousands of tables (I'm exaggerating here), and you want to avoid reading all the `_common_metadata` files because the user may only query a single table. We in IOx have a similar situation, where it would be nice to construct the schema only for the tables we actually need (it's not a major blocker, but a nice to have).
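To make the distinction concrete, here is a minimal sketch of partial (lazy) schema discovery. It is written in Python rather than Rust and is not DataFusion's or IOx's API. `LazyCatalog`, `schema()` and `fake_loader` are hypothetical names used only for illustration. The assumption is that listing table names is cheap (one directory scan), while loading a schema is expensive (e.g. reading a table's `_common_metadata`). The catalog therefore fetches a schema only the first time that table is requested and caches it afterwards.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical schema representation: column name -> type name.
Schema = Dict[str, str]


class LazyCatalog:
    """Hypothetical catalog: table names are known up front, but each
    table's schema is loaded only on first access (partial discovery)."""

    def __init__(self, table_names: List[str],
                 load_schema: Callable[[str], Awaitable[Schema]]) -> None:
        # Cheap part: the names, e.g. from scanning a single directory.
        self._names = list(table_names)
        # Expensive part: per-table async loader, e.g. reading `_common_metadata`.
        self._load_schema = load_schema
        self._cache: Dict[str, Schema] = {}

    def table_names(self) -> List[str]:
        return list(self._names)

    async def schema(self, name: str) -> Schema:
        if name not in self._names:
            raise KeyError(name)
        # Load once, then serve from the cache. Note: two concurrent first
        # requests for the same table could both trigger a load here.
        if name not in self._cache:
            self._cache[name] = await self._load_schema(name)
        return self._cache[name]


# Usage: 1000 tables are listed, but only the queried one is ever loaded.
reads: List[str] = []


async def fake_loader(name: str) -> Schema:
    reads.append(name)      # record which tables were actually read
    await asyncio.sleep(0)  # stand-in for real async I/O
    return {"id": "Int64", "value": "Utf8"}


async def main() -> LazyCatalog:
    catalog = LazyCatalog([f"table_{i}" for i in range(1000)], fake_loader)
    await catalog.schema("table_42")
    await catalog.schema("table_42")  # second call hits the cache
    return catalog


catalog = asyncio.run(main())
print(reads)  # ['table_42']
```

By contrast, a `refresh()`-style full discovery would call the loader for all 1000 tables before any query runs, even if only one of them is ever queried.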
