Hi Jialin,

Great experiment! Thanks for sharing.
Looking forward to the hot compaction feature.

Best,
-----------------------------------
Zesong Sun
School of Software, Tsinghua University

孙泽嵩

> On Jul 8, 2020, at 16:39, Jialin Qiao <qj...@mails.tsinghua.edu.cn> wrote:
>
> Hi,
>
> I'd like to share some experiment results on how chunk size
> impacts query performance.
>
> Hardware:
> MacBook Pro (Retina, 15-inch, Mid 2015)
> CPU: 2.2 GHz Intel Core i7
> Memory: 16 GB 1600 MHz DDR3
> Storage: a mobile HDD (SEAGATE, 1 TB, Model SRD00F1)
>
> Workload: 1 storage group, 1 device, 100 measurements of type long, with
> 1 million randomly generated data points for each time series.
>
> Background: the size of a freshly flushed chunk, in data points, is
> chunk size = memtable_size_threshold / number of series / bytes per data point
> (16 bytes for a long data point).
>
> I adjusted memtable_size_threshold to control the chunk size.
>
> IoTDB configuration:
>
> enable_parameter_adapter=false
> avg_series_point_number_threshold=10000000 (so that memtable_size_threshold takes effect)
> page_size_in_byte=1000000000 (so that each chunk has exactly one page)
> tsfile_size_threshold = memtable_size_threshold =
>   160000 / 1600000 / 16000000 / 160000000 / 1600000000
>
> I used SessionExample.insertTablet to insert data under each
> configuration, which produced chunk sizes from 100 to 1000000 points.
>
> Then I used SessionExample.queryByIterator to iterate over the result set of
> "select s1 from root.sg1.d1" without constructing any other data structures.
>
> The results are:
>
> | chunk size (points) | query time (ms) |
> | 100                 | 47620           |
> | 1000                | 13984           |
> | 10000               | 2416            |
> | 100000              | 1322            |
>
> As we can see, chunk size has a dominant impact on raw data query
> performance. In the current query engine, the chunk is the basic unit of
> data read from disk, and reading each chunk costs one seek plus one IO
> operation. A larger chunk size therefore means fewer chunks to read.
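The chunk-size formula and the results table above can be cross-checked with some quick arithmetic. The Java sketch below assumes 100 series and 16 bytes per long data point, as stated in the message. It derives the flushed chunk size for each tested memtable_size_threshold, counts how many chunks a 1,000,000-point series is split into, and divides the measured query time by that count. The class ChunkSizeEstimate and its methods are hypothetical illustrations, not IoTDB code.

```java
import java.util.Locale;

public class ChunkSizeEstimate {
    // Values taken from the experiment description.
    static final long SERIES_COUNT = 100;
    // 16 bytes per long data point, presumably an 8-byte timestamp plus an 8-byte value.
    static final long BYTES_PER_LONG_POINT = 16;
    static final long POINTS_PER_SERIES = 1_000_000;

    /** Flushed chunk size in data points, following the formula in the message. */
    public static long chunkSize(long memtableSizeThreshold) {
        return memtableSizeThreshold / SERIES_COUNT / BYTES_PER_LONG_POINT;
    }

    /** Number of chunks a single series is split into at the given chunk size. */
    public static long chunkCount(long chunkSize) {
        return POINTS_PER_SERIES / chunkSize;
    }

    public static void main(String[] args) {
        // Thresholds for which query times were reported in the table.
        long[] thresholds = {160_000L, 1_600_000L, 16_000_000L, 160_000_000L};
        long[] queryMs = {47_620L, 13_984L, 2_416L, 1_322L};
        for (int i = 0; i < thresholds.length; i++) {
            long size = chunkSize(thresholds[i]);
            long chunks = chunkCount(size);
            // Average wall-clock time attributed to each chunk read.
            System.out.printf(Locale.ROOT, "threshold=%d chunkSize=%d chunks=%d ms/chunk=%.2f%n",
                    thresholds[i], size, chunks, (double) queryMs[i] / chunks);
        }
    }
}
```

With these numbers, the time per chunk grows with chunk size (each read transfers more data). The total time still drops sharply, because the number of chunk reads, and thus seeks, falls from 10,000 to 10. This is consistent with the one-seek-plus-one-IO-per-chunk argument.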
>
> Therefore, it's better to enlarge memtable_size_threshold to accelerate
> queries. However, a larger memtable_size_threshold requires more memory,
> which is not available in some scenarios. We therefore need compaction,
> either hot compaction triggered during flushing or a timed compaction
> strategy, to merge small chunks into a large one.
>
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
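The compaction proposed in the quoted message, which merges many small chunks into fewer large ones, can be pictured with a toy in-memory model. The sketch makes these assumptions:

- A chunk is just a pair of time-ordered arrays of timestamps and long values for one series.
- The input chunks are already sorted and non-overlapping in time.

The class ChunkCompactionSketch and its methods are hypothetical. They do not reflect IoTDB's actual hot or timed compaction, and they ignore overlapping data and the on-disk file layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkCompactionSketch {
    /** Hypothetical toy chunk: parallel arrays of timestamps and long values for one series. */
    public static final class Chunk {
        private final long[] times;
        private final long[] values;

        public Chunk(long[] times, long[] values) {
            if (times.length != values.length) {
                throw new IllegalArgumentException("times and values must have equal length");
            }
            this.times = times;
            this.values = values;
        }

        public long[] times() { return times; }
        public long[] values() { return values; }
        public int size() { return times.length; }
    }

    /**
     * Concatenates time-ordered, non-overlapping small chunks and re-splits the
     * points into chunks of at most targetSize points each.
     */
    public static List<Chunk> compact(List<Chunk> smallChunks, int targetSize) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        int total = 0;
        for (Chunk c : smallChunks) {
            total += c.size();
        }
        long[] times = new long[total];
        long[] values = new long[total];
        int pos = 0;
        for (Chunk c : smallChunks) {
            for (int i = 0; i < c.size(); i++) {
                // Assumption of this sketch: timestamps strictly increase across all inputs.
                if (pos > 0 && c.times()[i] <= times[pos - 1]) {
                    throw new IllegalArgumentException("chunks must be time-ordered and non-overlapping");
                }
                times[pos] = c.times()[i];
                values[pos] = c.values()[i];
                pos++;
            }
        }
        // Re-split the merged points into large chunks.
        List<Chunk> result = new ArrayList<>();
        for (int start = 0; start < total; start += targetSize) {
            int end = Math.min(start + targetSize, total);
            result.add(new Chunk(Arrays.copyOfRange(times, start, end),
                    Arrays.copyOfRange(values, start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // 1,000 small chunks of 100 points each, with consecutive timestamps.
        List<Chunk> small = new ArrayList<>();
        for (int c = 0; c < 1_000; c++) {
            long[] t = new long[100];
            long[] v = new long[100];
            for (int i = 0; i < 100; i++) {
                t[i] = c * 100L + i;
                v[i] = t[i] * 2;
            }
            small.add(new Chunk(t, v));
        }
        List<Chunk> large = compact(small, 100_000);
        System.out.println(small.size() + " chunks -> " + large.size() + " chunk(s)"); // prints "1000 chunks -> 1 chunk(s)"
    }
}
```

In this model, a reader that pays one seek per chunk would go from 1,000 seeks to 1 for the same 100,000 points, which is the effect the experiment measured.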