> "flush can reduce memory and speed up the restart process", this assumes that all copies have been flushed synchronously, so we can ensure that the data files are logically consistent at this point.
Sorry, maybe I am lagging behind the current cluster design.

Do we need "all copies have been flushed synchronously, so we can ensure that the data files are logically consistent at this point"? Why? Because of the Raft protocol?

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


李思佳 <lisi...@360.cn> wrote on Mon, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process", this assumes
> that all copies have been flushed synchronously, so we can ensure that the
> data files are logically consistent at this point.
>
> The operation of datanode flushing should be the process of resource
> release before the node is shut down (but this does not guarantee that all
> copies are logically consistent at this point). For example, shutdownHook
> requires the default disk flushing and resource release. We need to provide
> a flush command scenario, perhaps because our node shutdown operation is
> not complete?
>
> BR,
> -----------------------------------
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang <saint...@gmail.com>
> Sent: May 23, 2022 11:37
> To: dev <dev@iotdb.apache.org>
> Subject: Re: Flush function in cluster
>
> I think distinguishing between flushing on one node and flushing on the
> cluster is meaningful.
>
> As you said, flush can reduce memory and speed up the restart process. So,
> what if the DBA just wants to restart one node?
>
> However, the default behavior can be discussed: flush on one node by
> default or on the whole cluster by default.
>
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> 李思佳 <lisi...@360.cn> wrote on Mon, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand what the purpose and use of flushing the
> > current datanode is.
> >
> > IMO, flush all should mean that all storage groups can be flushed; in
> > other words, flush sg is a subset of flush all.
> >
> > For users, distributed is a black box, while SG is an exposed structure.
> > Therefore, for CLI commands, there is no need to be aware of the
> > relationship between the datanode and the self-created SG.
> >
> > In addition, the flush operation may speed up our restart recovery
> > process. For example, when we flush an SG successfully, we can label
> > the associated data files to indicate that all copies are consistent
> > at that point in time (here are flush and write priorities). During the
> > next restart, we can use this flag to quickly skip the verification step.
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a datanode? What are the benefits of this?
> > 2. Can the flush operation affect the consensus group or WAL for a
> > quick restart?
> >
> > BR,
> > -----------------------------------
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao <qiaojia...@apache.org>
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtable
> > into disk and closes all tsfiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flushing current datanode
> >
> > * flush all/cluster: flushing all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —————————————————
> > Jialin Qiao
> > Apache IoTDB PMC
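
For reference, the three scopes proposed in the original message (flush, flush all/cluster, flush sg) can be sketched as a toy model. This is only an illustration under assumptions, not IoTDB code: the DataRegion class, the flush_targets function, the scope names "local"/"cluster"/"sg" and the sample layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRegion:
    """One replica of a data region (hypothetical model, not an IoTDB class)."""
    region_id: int
    storage_group: str  # e.g. "root.sg1"
    datanode: str       # node hosting this replica


def flush_targets(regions, scope, current_node=None, storage_group=None):
    """Return the region replicas that a flush of the given scope would touch."""
    if scope == "local":    # `flush`: only replicas on the current datanode
        return [r for r in regions if r.datanode == current_node]
    if scope == "cluster":  # `flush all/cluster`: replicas on every datanode
        return list(regions)
    if scope == "sg":       # `flush sg`: all DataRegions of one storage group
        return [r for r in regions if r.storage_group == storage_group]
    raise ValueError(f"unknown flush scope: {scope}")


# Assumed layout: two regions, each replicated on two of three datanodes.
REGIONS = [
    DataRegion(1, "root.sg1", "dn1"),
    DataRegion(1, "root.sg1", "dn2"),
    DataRegion(2, "root.sg2", "dn2"),
    DataRegion(2, "root.sg2", "dn3"),
]

local = flush_targets(REGIONS, "local", current_node="dn2")
sg = flush_targets(REGIONS, "sg", storage_group="root.sg1")
everything = flush_targets(REGIONS, "cluster")
```

In this model, a local flush on dn2 touches one replica of each region but leaves the replicas on dn1 and dn3 untouched, which is the situation Sijia describes where a flush does not imply that all copies are consistent. The sg result is always a subset of the cluster result.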