alamb commented on issue #7556: URL: https://github.com/apache/arrow-datafusion/issues/7556#issuecomment-1721070992
@liukun4515

> Does influx io has the file statis cache or the list files cache when?

IOx caches (effectively) the list of files and a (very small) subset of the statistics (in our case, just the min/max timestamp values). Our metadata catalog (see below) did not have space to store the entire parquet file metadata (with per-row-group statistics).

Also, at the moment IOx has an in-memory cache of the actual parquet data, which means it effectively always reads entire objects from storage (though we may change this at some point).

> How does influx io resolve the issue that node need to visit the remote storage when generating the execution plan?

IOx has its own, separate metadata catalog that stores information about the schema, which files store data for each table, and which partition each file belongs to (IOx typically keeps data segregated into daily partitions). Thus we never use `LIST` operations on the object store.

The high-level architecture is described here: https://www.influxdata.com/blog/influxdb-3-0-system-architecture/
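To make the "catalog instead of `LIST`" idea concrete, here is a minimal Rust sketch, assuming a purely in-memory catalog keyed by table and daily partition that keeps only each file's path and cached min/max timestamp. The names `Catalog`, `ParquetFileInfo`, `add_file`, and `files_for_query` are hypothetical and are not IOx's or DataFusion's actual API. Planning a time-range query then only consults this metadata, pruning files whose `[min_time, max_time]` range cannot overlap the query, and never touches the object store.

```rust
use std::collections::HashMap;

/// Hypothetical per-file catalog entry: the object store path plus the
/// cached min/max timestamp for the rows stored in that file.
#[derive(Debug, Clone)]
struct ParquetFileInfo {
    path: String,
    min_time: i64,
    max_time: i64,
}

/// Hypothetical catalog: table name -> partition key (e.g. a day) -> files.
#[derive(Default)]
struct Catalog {
    tables: HashMap<String, HashMap<String, Vec<ParquetFileInfo>>>,
}

impl Catalog {
    /// Records a file when it is written, so it never has to be discovered
    /// later by listing the object store.
    fn add_file(&mut self, table: &str, partition: &str, file: ParquetFileInfo) {
        self.tables
            .entry(table.to_string())
            .or_default()
            .entry(partition.to_string())
            .or_default()
            .push(file);
    }

    /// Returns the paths of files whose [min_time, max_time] overlaps the
    /// inclusive query range [start, end], using catalog metadata only.
    fn files_for_query(&self, table: &str, start: i64, end: i64) -> Vec<String> {
        let mut paths: Vec<String> = self
            .tables
            .get(table)
            .into_iter()
            .flat_map(|partitions| partitions.values())
            .flatten()
            .filter(|f| f.max_time >= start && f.min_time <= end)
            .map(|f| f.path.clone())
            .collect();
        // HashMap iteration order is unspecified, so sort for stable output.
        paths.sort();
        paths
    }
}

fn main() {
    let mut catalog = Catalog::default();
    catalog.add_file(
        "cpu",
        "2023-09-14",
        ParquetFileInfo { path: "cpu/2023-09-14/a.parquet".into(), min_time: 0, max_time: 99 },
    );
    catalog.add_file(
        "cpu",
        "2023-09-15",
        ParquetFileInfo { path: "cpu/2023-09-15/b.parquet".into(), min_time: 100, max_time: 199 },
    );

    // Only the second file can contain rows in [150, 160].
    let files = catalog.files_for_query("cpu", 150, 160);
    println!("{:?}", files);
}
```

Keeping only min/max timestamps (rather than full per-row-group parquet statistics) keeps each catalog entry small, at the cost of coarser, file-level pruning.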
