paleolimbot commented on PR #646:
URL: https://github.com/apache/sedona-db/pull/646#issuecomment-3938030949
If the benchmark is running a fresh process each time, then
`sd.sql("SET datafusion.runtime.metadata_cache_limit = '900M'").execute()` won't help
(the cache is in-memory only). I'm not sure exactly how DuckDB does it, but having a
persistent cache would be great.
We can do that if we want; it roughly involves reimplementing the default
cache:
https://github.com/apache/datafusion/blob/1736fd2a40b64c6e39fb12090a2dbe8be07ac5ac/datafusion/execution/src/cache/file_metadata_cache.rs#L143-L205
...backing it with a SQLite database or files in a temporary directory. The
default cache can be overridden when we set up the runtime environment here:
https://github.com/apache/datafusion/blob/1736fd2a40b64c6e39fb12090a2dbe8be07ac5ac/datafusion/execution/src/runtime_env.rs#L379-L383
https://github.com/apache/sedona-db/blob/bed91516313b3099815de57d94b184f3173a7f45/rust/sedona/src/context_builder.rs#L194-L223
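A rough sketch of the "files in a temporary directory" option, using only the Rust standard library. Everything here is hypothetical: `PersistentMetadataCache`, `FileKey`, and the on-disk layout are illustrative names, not DataFusion or SedonaDB APIs. The sketch stores opaque metadata bytes (for example a serialized Parquet footer) and does not implement the cache trait in the linked `file_metadata_cache.rs`; an adapter for that trait would still need to serialize and deserialize the decoded metadata. Each entry is keyed on location, size, and last-modified time so that a file rewritten between runs misses the cache instead of returning stale metadata.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical key for one cached file: enough to detect that a file was
/// rewritten between runs (not DataFusion's `ObjectMeta`).
#[derive(Debug, Clone)]
pub struct FileKey {
    pub location: String,
    pub size: u64,
    pub last_modified_unix: i64,
}

/// 64-bit FNV-1a. Unlike `DefaultHasher`, its output is fixed, so entry file
/// names stay valid across processes and Rust versions.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical persistent metadata cache: one file per entry under `dir`.
pub struct PersistentMetadataCache {
    dir: PathBuf,
}

impl PersistentMetadataCache {
    /// Opens (creating if needed) a cache directory that outlives the process.
    pub fn open(dir: impl AsRef<Path>) -> io::Result<Self> {
        let dir = dir.as_ref().to_path_buf();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    /// Size and mtime are part of the hashed key, so a rewritten file maps to
    /// a different entry and simply misses the cache.
    fn entry_path(&self, key: &FileKey) -> PathBuf {
        let id = format!("{}\0{}\0{}", key.location, key.size, key.last_modified_unix);
        self.dir.join(format!("{:016x}.bin", fnv1a64(id.as_bytes())))
    }

    /// Returns the cached metadata bytes, or `None` on a miss or read error.
    pub fn get(&self, key: &FileKey) -> Option<Vec<u8>> {
        fs::read(self.entry_path(key)).ok()
    }

    /// Writes to a process-specific temp file, then renames it into place, so
    /// readers see either no entry or a complete one (rename is atomic on
    /// POSIX filesystems).
    pub fn put(&self, key: &FileKey, metadata: &[u8]) -> io::Result<()> {
        let final_path = self.entry_path(key);
        let tmp_path = final_path.with_extension(format!("{}.tmp", std::process::id()));
        fs::write(&tmp_path, metadata)?;
        fs::rename(&tmp_path, &final_path)
    }
}
```

In a benchmark, each fresh process would `open` the same directory, call `get` before reading a file's footer, and `put` the bytes after a miss. The hand-rolled FNV-1a hash is a deliberate choice: `std::collections::hash_map::DefaultHasher` does not guarantee the same output across Rust releases, which would silently orphan every cached entry after a toolchain upgrade. A SQLite-backed variant would replace the per-entry files with a table keyed on the same fields.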