BlakeOrth commented on issue #19055: URL: https://github.com/apache/datafusion/issues/19055#issuecomment-3676013903
> I wonder if I should first add test files in `parquet-testing` repo, or if you have other suggestions. Thank you!

@jizezhang I don't think adding files to `parquet-testing` is necessary here. The caching functionality works equally well with temporary in-memory object stores, so there's no technical reason to prefer files on disk or some other remote storage for testing.

There are quite a few ways to get valid test files. I would probably recommend you take inspiration from the `object_store_access.rs` tests, which set up various in-memory tables programmatically:
https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion/core/tests/datasource/object_store_access.rs#L505

If you're looking to do some integration-level testing similar to the tests here:
https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion-cli/tests/cli_integration.rs#L402
you can use DataFusion to create parquet files with the `COPY` command. You can see various uses of `COPY` with parquet files in the `sqllogictest` tests:
https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion/sqllogictest/test_files/parquet.slt#L18

For testing I think `COPY` is preferable to `INSERT` because it lets you give your files well-defined names. The subtle caveat here is that when the cache is enabled, tables that are already defined won't pick up changes from the `COPY` commands, so you'd have to make sure you write all the files before creating a table in DataFusion.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
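As a rough illustration of the ordering described in the comment above (write every file with `COPY` first, create the table afterwards), a minimal sketch in DataFusion SQL might look like the following. The scratch directory, file names, table name, and column names are all hypothetical and are not taken from the linked tests.

```sql
-- Hypothetical paths and names, for illustration only.

-- 1. Write all files up front, each with an explicit, predictable name.
COPY (SELECT 1 AS id, 'a' AS name)
  TO 'scratch/cache_test/0.parquet' STORED AS PARQUET;
COPY (SELECT 2 AS id, 'b' AS name)
  TO 'scratch/cache_test/1.parquet' STORED AS PARQUET;

-- 2. Only then register the directory as a table, so the table
--    sees both files when it is first listed.
CREATE EXTERNAL TABLE cache_test
  STORED AS PARQUET
  LOCATION 'scratch/cache_test/';

-- 3. Query the table; with the cache enabled, a later COPY into the
--    same directory may not be reflected here.
SELECT * FROM cache_test ORDER BY id;
```

Keeping the `COPY` statements ahead of `CREATE EXTERNAL TABLE` sidesteps the caveat about already-defined tables not picking up new files while the cache is enabled.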
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
