Hi,
I'd like to share some experiment results on how chunk size impacts query performance.

Hardware:
  MacBook Pro (Retina, 15-inch, Mid 2015)
  CPU: 2.2 GHz Intel Core i7
  Memory: 16 GB 1600 MHz DDR3
  Storage: a mobile HDD (SEAGATE 1 TB, Model SRD00F1)

Workload:
  1 storage group, 1 device, 100 measurements of long type,
  with 1 million randomly generated data points per time series.

Some background: the size of an originally flushed chunk is

  chunk size = memtable_size_threshold / series number / bytes per data point (16 for long)

so I adjusted memtable_size_threshold to control the chunk size. For example, memtable_size_threshold = 160000 with 100 series and 16 bytes per point yields chunks of 160000 / 100 / 16 = 100 points.

IoTDB configuration:
  enable_parameter_adapter=false
  avg_series_point_number_threshold=10000000 (so that memtable_size_threshold is the effective flush trigger)
  page_size_in_byte=1000000000 (so that each chunk contains exactly one page)
  tsfile_size_threshold = memtable_size_threshold = 160000, 1600000, 16000000, 160000000, and 1600000000 in turn

I used SessionExample.insertTablet to insert the data under each configuration, which produced chunk sizes from 100 to 1000000 points. I then used SessionExample.queryByIterator to iterate over the result set of "select s1 from root.sg1.d1" without constructing any other data structures. (Rough sketches of both steps are in the P.S. below.)

The results:

| chunk size (points) | query time (ms) |
| 100                 | 47620           |
| 1000                | 13984           |
| 10000               | 2416            |
| 100000              | 1322            |

As we can see, chunk size has a dominant impact on raw data query performance. In the current query engine, the Chunk is the basic unit read from disk, and reading each Chunk costs one seek plus one I/O operation, so a larger chunk size means fewer Chunks to read. It is therefore better to enlarge memtable_size_threshold to accelerate queries.

However, a larger memtable_size_threshold requires more memory, which is not always available. Therefore we need compaction, either hot compaction triggered during flush or a timed compaction strategy, to merge small chunks into a large one.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University
乔嘉林 清华大学 软件学院
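
P.S. For anyone who wants to reproduce this, below is a rough sketch of the insert step against the Java Session API. The host, port, and credentials are placeholders, and the exact Tablet methods (rowSize, addTimestamp, addValue, reset) may differ slightly between IoTDB versions; SessionExample in the IoTDB repository is the authoritative reference.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.iotdb.session.Session;
import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
import org.apache.iotdb.tsfile.write.record.Tablet;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;

public class InsertSketch {
  public static void main(String[] args) throws Exception {
    Session session = new Session("127.0.0.1", 6667, "root", "root");
    session.open();

    // 100 measurements of long type under one device, as in the workload.
    List<MeasurementSchema> schemas = new ArrayList<>();
    for (int i = 1; i <= 100; i++) {
      schemas.add(new MeasurementSchema("s" + i, TSDataType.INT64));
    }

    Tablet tablet = new Tablet("root.sg1.d1", schemas, 10000);
    Random random = new Random();

    // 1 million randomly generated points per time series.
    for (long time = 0; time < 1000000; time++) {
      int row = tablet.rowSize++;
      tablet.addTimestamp(row, time);
      for (int i = 1; i <= 100; i++) {
        tablet.addValue("s" + i, row, random.nextLong());
      }
      // Send a full batch and reuse the tablet.
      if (tablet.rowSize == tablet.getMaxRowNumber()) {
        session.insertTablet(tablet);
        tablet.reset();
      }
    }
    if (tablet.rowSize != 0) {
      session.insertTablet(tablet);
      tablet.reset();
    }
    session.close();
  }
}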
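
The query step drains the result set through the iterator-style API, which is what SessionExample.queryByIterator does to avoid materializing RowRecord objects. A minimal sketch under the same assumptions (iterator() and closeOperationHandle() may vary by version):

import org.apache.iotdb.session.Session;
import org.apache.iotdb.session.SessionDataSet;

public class QuerySketch {
  public static void main(String[] args) throws Exception {
    Session session = new Session("127.0.0.1", 6667, "root", "root");
    session.open();

    SessionDataSet dataSet =
        session.executeQueryStatement("select s1 from root.sg1.d1");
    SessionDataSet.DataIterator iterator = dataSet.iterator();

    long start = System.currentTimeMillis();
    long rows = 0;
    // Drain the result set without building any other data structure.
    while (iterator.next()) {
      rows++;
    }
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(rows + " rows iterated in " + elapsed + " ms");

    dataSet.closeOperationHandle();
    session.close();
  }
}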