Hi, everyone

Currently, the filename format of each tsfile is
In one time partition, the order of tsfiles is guaranteed by the
version_id, for example, 1651825804093-2-0-0.tsfile is after

The problem is that filename conflicts may occur in the cross-space
compaction and load scenarios. In cross-space compaction, assume that
3-2-0-0.tsfile, 4-3-0-0.tsfile and 5-5-0-0.tsfile exist in the sequence
folder. If 4-3-0-0.tsfile is selected, compaction cannot generate 3 or
more target files because only 2 version_ids are available between 2 and
5, so some large target files may be generated. In load, assume that
3-2-0-0.tsfile, 3-3-0-0.tsfile and 3-3-0-0.tsfile exist in the sequence
folder. No more sequence files can be loaded between 3-2-0-0.tsfile and
3-3-0-0.tsfile; they can only be loaded into the unsequence folder.
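
To make the arithmetic concrete, here is a minimal Python sketch (not
IoTDB code) that counts how many version_ids are available between two
neighbouring files. It assumes the second dash-separated field of the
filename is the version_id, as in the examples above; the helper names
parse_version_id and free_version_ids are hypothetical.

```python
def parse_version_id(filename: str) -> int:
    # Assumption: the second dash-separated field is version_id,
    # e.g. "4-3-0-0.tsfile" -> 3. The other fields are ignored here.
    return int(filename.split("-")[1])


def free_version_ids(prev_file: str, next_file: str) -> int:
    """Count the version_ids strictly between two neighbouring files."""
    gap = parse_version_id(next_file) - parse_version_id(prev_file) - 1
    return max(0, gap)


# Compaction case: 4-3-0-0.tsfile is selected, so its neighbours bound
# the version_ids that target files may use (3 and 4).
print(free_version_ids("3-2-0-0.tsfile", "5-5-0-0.tsfile"))

# Load case: adjacent version_ids leave no room for a new sequence file.
print(free_version_ids("3-2-0-0.tsfile", "3-3-0-0.tsfile"))
```

The first call reports 2 available ids and the second reports 0, which
matches the two cases described above.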

To address these problems, the format won't be changed, but the meaning
of file_created_time and version_id will be different. Instead of
version_id, we use file_created_time to guarantee the order of tsfiles,
and if two tsfiles have the same file_created_time, then we use
version_id to guarantee the order. This semantic change may affect the
query, compaction and load modules.
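
As an illustration only, the proposed ordering could be sketched in
Python as a sort key on (file_created_time, version_id). This assumes
the first dash-separated field of the filename is file_created_time and
the second is version_id; tsfile_sort_key is a hypothetical helper, not
an existing IoTDB function.

```python
def tsfile_sort_key(filename: str) -> tuple[int, int]:
    # Assumption: fields are "<file_created_time>-<version_id>-...".
    fields = filename.rsplit(".", 1)[0].split("-")
    file_created_time = int(fields[0])
    version_id = int(fields[1])
    # file_created_time decides the order; version_id only breaks ties.
    return (file_created_time, version_id)


files = ["3-3-0-0.tsfile", "2-9-0-0.tsfile", "3-2-0-0.tsfile"]
ordered = sorted(files, key=tsfile_sort_key)
print(ordered)
```

Here 2-9-0-0.tsfile sorts first even though its version_id is the
largest, because its file_created_time is the smallest; the two files
with file_created_time 3 are then ordered by version_id.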

Hope for some suggestions.

Haiming Zhu
School of Software, Tsinghua University
