[GitHub] [arrow-datafusion] alamb commented on issue #7556: Make StatisticsCache share in session level

via GitHub Fri, 15 Sep 2023 03:51:57 -0700


alamb commented on issue #7556:
URL: 
https://github.com/apache/arrow-datafusion/issues/7556#issuecomment-1721070992


   @liukun4515 
   
   > Does influx io have a file statistics cache or a list files cache?
   
   IOx caches (effectively) the list of files and a (very small) subset of the 
statistics (in our case just the min/max timestamp values). Our metadata 
catalog (see below) did not have space to store the entire parquet file 
metadata (with per-row-group statistics).
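   
   As a rough illustration of how a file-level min/max timestamp range can 
rule out files before any parquet metadata is read, here is a minimal Rust 
sketch. `FileStats` and `prune_by_time` are hypothetical names made up for this 
example; they are not IOx or DataFusion APIs.

```rust
/// Hypothetical per-file summary: only the timestamp range is cached.
#[derive(Debug, Clone, PartialEq)]
pub struct FileStats {
    pub path: String,
    pub min_time: i64, // inclusive, e.g. nanoseconds since the epoch
    pub max_time: i64, // inclusive
}

/// Keep only the files whose [min_time, max_time] range overlaps the
/// query range [start, end] (both bounds inclusive).
pub fn prune_by_time(files: &[FileStats], start: i64, end: i64) -> Vec<&FileStats> {
    files
        .iter()
        .filter(|f| f.min_time <= end && f.max_time >= start)
        .collect()
}
```

   Files that survive this check would still need their full parquet metadata 
read if row-group-level pruning is wanted.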
   
   Also, at the moment IOx has an in-memory cache of the actual parquet data, 
which means it effectively always reads entire objects from storage (though 
we may change this at some point).
   
   > How does influx io resolve the issue that a node needs to visit the remote 
storage when generating the execution plan?
   
   IOx has its own, separate, metadata catalog that stores information about 
the schema, which files store data for each table, and which partition each 
file belongs to (IOx typically keeps the data segregated in daily partitions).
   
   Thus we never use `LIST` operations on the object store.
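   
   To make the idea concrete, here is a minimal Rust sketch of a catalog 
lookup that replaces listing. The `Catalog` type, its methods, and the 
date-string partition keys are all assumptions for illustration; the real IOx 
catalog is a separate service and looks nothing like this in-memory map.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a metadata catalog.
/// Key: (table name, partition key such as "2023-09-15").
#[derive(Default)]
pub struct Catalog {
    files: BTreeMap<(String, String), Vec<String>>,
}

impl Catalog {
    /// Record that `path` holds data for `table` in `partition`.
    pub fn add_file(&mut self, table: &str, partition: &str, path: &str) {
        self.files
            .entry((table.to_string(), partition.to_string()))
            .or_default()
            .push(path.to_string());
    }

    /// Return the object store paths for `table` across the given
    /// partitions, using only the catalog (no listing of the store).
    pub fn files_for(&self, table: &str, partitions: &[&str]) -> Vec<String> {
        partitions
            .iter()
            .filter_map(|p| self.files.get(&(table.to_string(), (*p).to_string())))
            .flatten()
            .cloned()
            .collect()
    }
}
```

   The planner then asks the catalog for the files of the partitions a query 
touches and issues `GET` requests for exactly those paths.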
   
   The high-level architecture is described here: 
https://www.influxdata.com/blog/influxdb-3-0-system-architecture/


