[GitHub] [arrow-datafusion] liukun4515 commented on issue #7556: Make StatisticsCache share in session level

via GitHub Sun, 17 Sep 2023 07:20:58 -0700


liukun4515 commented on issue #7556:
URL: 
https://github.com/apache/arrow-datafusion/issues/7556#issuecomment-1722488428


   > @liukun4515
   > 
   > > Does IOx have a file statistics cache or a list-files cache?
   > 
   > IOx caches (effectively) the list of files and a (very small) subset of 
the statistics (in our case just the min/max timestamp values). Our metadata 
catalog (see below) did not have space to store the entire parquet file 
metadata (with per-row group statistics)
   > 
   > Also, at the moment IOx has an in-memory cache of the actual parquet data, 
which means it effectively always reads entire objects from storage (though 
we may change this at some point).
   > 
   > > How does IOx resolve the issue that a node needs to visit remote 
storage when generating the execution plan?
   > 
   > IOx has its own, separate, metadata catalog that stores information about 
the schema and what files store data for each table as well as what partition 
(IOx keeps the data segregated in daily partitions typically).
   > 
   > Thus we never use `LIST` operations on the object store.
   > 
   > The high-level architecture is described here: 
https://www.influxdata.com/blog/influxdb-3-0-system-architecture/
   
   Thanks for your detailed comments. I will take a look at this blog post.
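
   To check my understanding of the catalog approach described in the quoted comment, here is a minimal sketch. It is not IOx or DataFusion code: `FileEntry`, `Catalog`, `register`, and `files_for_range` are hypothetical names, and timestamps are plain integers. The idea is that the planner looks files up in an in-memory catalog keyed by table and partition, prunes them by min/max timestamp, and never issues a `LIST` call against the object store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical catalog record: a file path plus its min/max timestamp."""
    path: str
    min_ts: int
    max_ts: int


class Catalog:
    """Hypothetical in-memory catalog mapping (table, partition) to files."""

    def __init__(self):
        self._files = {}

    def register(self, table, partition, entry):
        # Record a file once, when it is written; no later LIST is needed.
        self._files.setdefault((table, partition), []).append(entry)

    def files_for_range(self, table, start_ts, end_ts):
        # Keep files whose [min_ts, max_ts] overlaps [start_ts, end_ts].
        hits = []
        for (tbl, _partition), entries in self._files.items():
            if tbl != table:
                continue
            for e in entries:
                if e.max_ts >= start_ts and e.min_ts <= end_ts:
                    hits.append(e.path)
        return sorted(hits)
```

   With only min/max timestamps stored, pruning happens per file; anything finer (for example per-row-group statistics) would still require reading the parquet metadata itself.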


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
