On Tue, Apr 23, 2024 at 11:37 AM Venkata Hari Krishna Nukala <
n.v.harikrishna.apa...@gmail.com> wrote:

> The reason I called out binary-level verification as out of the initial
> scope is twofold: 1) calculating a digest for each file may increase CPU
> utilisation, and 2) the disk would also be under pressure, as the complete
> disk content would have to be read to calculate the digests. As called out
> in the discussion, I think we can't
>

We should have a digest / checksum for each of the file components computed
and stored on disk so it doesn't need to be recomputed each time. Most
files / components are immutable, so their checksums won't change. A few
components can be mutated and their checksums may need to be recomputed, but
data integrity is not something we can compromise on. On the receiving node,
CPU utilization is not a big concern as that node isn't servicing traffic.
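
To make the "compute once, cache on disk" idea concrete, here is a minimal
sketch in Java. It is not Cassandra's actual implementation: the class name,
the ".sha256" sibling-file convention, and the choice of SHA-256 are all my
own for illustration, and I haven't checked what digest files the SSTable
components already carry. The point is only that an immutable component gets
hashed once and the result is reused, while a mutable component can be
re-hashed with the same helper when it changes.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.HexFormat; // Java 17+

    public final class ComponentDigests
    {
        // Hypothetical naming: cached digest lives next to the component file.
        private static final String DIGEST_SUFFIX = ".sha256";

        // Returns the cached digest for an immutable component, computing and
        // storing it on first use so later verifications don't re-read the file.
        public static String digestFor(Path component) throws IOException, NoSuchAlgorithmException
        {
            Path digestFile = component.resolveSibling(component.getFileName() + DIGEST_SUFFIX);
            if (Files.exists(digestFile))
                return Files.readString(digestFile, StandardCharsets.UTF_8).trim();

            String digest = computeSha256(component);
            Files.writeString(digestFile, digest, StandardCharsets.UTF_8);
            return digest;
        }

        // Streams the file through SHA-256; also what a mutable component would
        // call again after it has been modified.
        public static String computeSha256(Path file) throws IOException, NoSuchAlgorithmException
        {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[64 * 1024];
            try (InputStream in = Files.newInputStream(file))
            {
                int read;
                while ((read = in.read(buffer)) != -1)
                    md.update(buffer, 0, read);
            }
            return HexFormat.of().formatHex(md.digest());
        }
    }

For the mutable components you'd invalidate or rewrite the cached digest
whenever the file changes; the expensive re-hash then happens once per
change rather than once per verification.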

I was too lazy to dig into the code; hopefully someone who is more familiar
with the SSTable components / file format can shed light on the existing
checksums.
