"flush can reduce memory and speed up the restart process", this
assumes that all copies have been flushed synchronously, so we can ensure
that the data files are logically consistent at this point.

Sorry, maybe I am behind on the current cluster design.
Why do we need "all copies have been flushed synchronously, so we can ensure
that the data files are logically consistent at this point"? Is it because
of the Raft protocol?


-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University



李思佳 <lisi...@360.cn> wrote on Mon, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process", this assumes
> that all copies have been flushed synchronously, so we can ensure that the
> data files are logically consistent at this point.
>
> Flushing a datanode should be part of the resource release process before
> the node is shut down (but this does not guarantee that all copies are
> logically consistent at that point). For example, the shutdownHook already
> flushes to disk and releases resources by default. If we need to provide a
> flush command for this scenario, is it perhaps because our node shutdown
> operation is incomplete?
>
> BR,
> -----------------------------------
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang <saint...@gmail.com>
> Sent: May 23, 2022 11:37
> To: dev <dev@iotdb.apache.org>
> Subject: Re: Flush function in cluster
>
> I think distinguishing between flushing one node and flushing the whole
> cluster is meaningful.
>
> As you said, flush can reduce memory and speed up the restart process. So
> what if the DBA just wants to restart one node?
>
> However, the default behavior is open for discussion: should flush apply to
> one node or to the whole cluster by default?
>
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
>
>
> 李思佳 <lisi...@360.cn> wrote on Mon, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand the purpose and use of flushing only the current
> > datanode.
> >
> > IMO, flush all should mean that all storage groups are flushed; in
> > other words, flush sg is a subset of flush all.
> >
> > For users, the distributed system is a black box, while the SG is an
> > exposed structure. Therefore, CLI commands do not need to be aware of
> > the relationship between datanodes and the SGs that users create.
> >
> > In addition, the flush operation may speed up the restart recovery
> > process. For example, when an SG is flushed successfully, we can label
> > the associated data files to indicate that all copies are consistent
> > at that point in time (this depends on how flush and write operations
> > are prioritized). During the next restart, we can use this label to
> > quickly skip the verification step.
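The labeling idea described above could look roughly like the following minimal Python sketch. `DataFile`, `label_after_flush`, and `files_to_verify_on_restart` are hypothetical names, not IoTDB APIs, and the sketch assumes that a label is only valid once every replica has reported a successful flush of the SG; unlabeled files still go through verification on restart.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    name: str
    consistent: bool = False  # hypothetical "all copies consistent" label


def label_after_flush(files, replica_flush_ok):
    """Label `files` only if every replica reported a successful SG flush.

    replica_flush_ok maps a (hypothetical) replica/datanode id to whether its
    flush succeeded. Returns True if the labels were written.
    """
    if not replica_flush_ok or not all(replica_flush_ok.values()):
        return False  # some copy may still differ: leave files unlabeled
    for f in files:
        f.consistent = True
    return True


def files_to_verify_on_restart(files):
    """Only unlabeled files need the (slow) consistency check on restart."""
    return [f.name for f in files if not f.consistent]


files = [DataFile("a.tsfile"), DataFile("b.tsfile")]
print(label_after_flush(files, {1: True, 2: False}))  # False
print(files_to_verify_on_restart(files))              # ['a.tsfile', 'b.tsfile']
print(label_after_flush(files, {1: True, 2: True}))   # True
print(files_to_verify_on_restart(files))              # []
```

Requiring an acknowledgment from every replica is what makes the label trustworthy; after a flush on only one node, that node's files could still diverge from the other copies.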
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a single DataNode? What are the benefits?
> > 2. Can the flush operation interact with the consensus group or WAL to
> > enable a quick restart?
> >
> > BR,
> > -----------------------------------
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao <qiaojia...@apache.org>
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB; it flushes memtables to
> > disk and closes all TsFiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flush the current datanode
> >
> > * flush all/cluster: flush all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
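To make the three proposed scopes concrete, here is a minimal Python sketch. The names `DataRegion`, `regions_to_flush`, and the scope strings `"local"`, `"cluster"`, and `"sg"` are hypothetical and illustrative, not IoTDB APIs. The sketch assumes each DataRegion belongs to exactly one storage group and has replicas on one or more datanodes, and it computes which region replicas each command would flush.

```python
from dataclasses import dataclass


# Hypothetical model, not IoTDB code: a DataRegion belongs to exactly one
# storage group (SG) and has replicas on one or more datanodes.
@dataclass(frozen=True)
class DataRegion:
    region_id: int
    storage_group: str
    replica_nodes: frozenset


def regions_to_flush(regions, scope, node_id=None, storage_group=None):
    """Return the (region_id, node_id) replicas a flush in `scope` would touch.

    scope: "local"   -> every replica hosted on `node_id` (flush)
           "cluster" -> every replica on every datanode (flush all/cluster)
           "sg"      -> every replica of regions in `storage_group` (flush sg)
    """
    if scope not in ("local", "cluster", "sg"):
        raise ValueError(f"unknown flush scope: {scope}")
    targets = set()
    for region in regions:
        if scope == "sg" and region.storage_group != storage_group:
            continue  # region belongs to a different SG
        for node in region.replica_nodes:
            if scope == "local" and node != node_id:
                continue  # replica lives on another datanode
            targets.add((region.region_id, node))
    return targets


# Three regions spread over datanodes 1, 2 and 3.
regions = [
    DataRegion(1, "root.sg1", frozenset({1, 2})),
    DataRegion(2, "root.sg1", frozenset({2, 3})),
    DataRegion(3, "root.sg2", frozenset({1, 3})),
]
flush_all = regions_to_flush(regions, "cluster")
flush_node1 = regions_to_flush(regions, "local", node_id=1)
flush_sg1 = regions_to_flush(regions, "sg", storage_group="root.sg1")
print(sorted(flush_node1))  # [(1, 1), (3, 1)]
print(sorted(flush_sg1))    # [(1, 1), (1, 2), (2, 2), (2, 3)]
```

In this model, both flush sg and flush on one datanode are subsets of flush all. The node-local scope is the one a DBA would use before restarting a single node, while flush sg touches every replica of the SG wherever it lives.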
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —————————————————
> > Jialin Qiao
> > Apache IoTDB PMC
> >
>
