Dandandan commented on issue #19970: URL: https://github.com/apache/datafusion/issues/19970#issuecomment-4068033654
> Hi [@Dandandan](https://github.com/Dandandan) [@alamb](https://github.com/alamb), is this still being actively pursued? We have a use case that would benefit from this: inferring schemas from complex nested JSON files can be quite slow today, especially when there are many files or deeply nested structures. Faster schema inference would meaningfully improve our workflow. Happy to contribute or help test if this is still moving forward!

Go ahead, I think it's a nice issue.

Did you also see https://github.com/apache/arrow-rs/pull/9494? Perhaps you can help review it.

One other thing I saw that might also be worth looking at: based on the configuration, we always create > 32 threads when reading metadata. Not a huge problem per se, but it adds to memory usage and probably reduces locality a bit while running queries.
