Hi Andres,

> It's implied, but to make it more explicit: One big efficiency advantage
> of writes by checkpointer is that they are sorted and can often be
> combined into larger writes. That's often a lot more efficient: For
> network attached storage it saves you iops, for local SSDs it's much
> friendlier to wear leveling.

Thank you for the explanation. I think bgwriter can also merge I/O to some
extent, since it writes asynchronously into the file system cache and the
OS schedules the actual writeback.
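To illustrate the write-combining point Andres made, here is a hypothetical
sketch (not PostgreSQL code; the function name and representation are my own):
once dirty buffers are sorted by block number, adjacent blocks collapse into
runs that can be issued as single larger writes, which is the advantage the
checkpointer's sorted writes have over bgwriter's LRU-driven order.

```python
def combine_sorted_writes(dirty_blocks):
    """Given dirty block numbers, return (start, length) runs that
    could each be issued as one larger write. Illustration only."""
    runs = []
    for blk in sorted(set(dirty_blocks)):
        if runs and blk == runs[-1][0] + runs[-1][1]:
            # contiguous with the current run: extend it
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # gap: start a new run
            runs.append((blk, 1))
    return runs

# Issued in arrival order, these five blocks would need five writes;
# sorted, they collapse into two larger writes.
print(combine_sorted_writes([7, 3, 5, 4, 8]))  # → [(3, 3), (7, 2)]
```

Relying on the OS to do this merging in the page cache works only if the
dirty pages are still cached and writeback happens to batch them, whereas
the checkpointer gets the benefit deterministically.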



> Another aspect is that checkpointer's writes are much easier to pace
> over time than e.g. bgwriters, because bgwriter is triggered by a fairly
> short term signal.  Eventually we'll want to combine writes by bgwriter
> too, but that's always going to be more expensive than doing it in a
> large batched fashion like checkpointer does.

> I think we could improve checkpointer's pacing further, fwiw, by taking
> into account that the WAL volume at the start of a spread-out checkpoint
> typically is bigger than at the end.

I'm also very keen to improve checkpoints. Whenever I run a stress test,
bgwriter does not write out dirty pages as long as the data set is smaller
than shared_buffers. Before the checkpoint, TPS is stable and at its
highest level of the entire run. Other databases flush dirty pages at a
certain frequency, at fixed intervals, and based on dirty-page watermarks,
so checkpoints have a much smaller impact on their performance.
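To make the watermark idea concrete, here is a hypothetical sketch of such a
policy (the thresholds, batch size, and function name are illustrative
assumptions, not how any particular database implements it): flushing is
driven by the dirty-page fraction crossing low/high watermarks.

```python
def pages_to_flush(dirty, total, low=0.10, high=0.25, batch=64):
    """Decide how many pages to flush this cycle from the dirty
    fraction: nothing below the low watermark, a fixed batch between
    the watermarks, and enough to get back under the high watermark
    once it is exceeded. All thresholds are illustrative."""
    frac = dirty / total
    if frac < low:
        return 0                      # below low watermark: stay idle
    if frac < high:
        return min(dirty, batch)      # steady background trickle
    return dirty - int(high * total)  # aggressive: drop below high mark

print(pages_to_flush(400, 1000))  # 40% dirty → flush 150 pages
```

A policy like this smooths writes out over time, so when the checkpoint
arrives there is far less accumulated dirty data to flush at once.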


Thanks


Andres Freund <and...@anarazel.de> 于2024年10月4日周五 03:40写道:

> Hi,
>
> On 2024-10-02 18:36:44 +0200, Tomas Vondra wrote:
> > On 10/2/24 17:02, Tony Wayne wrote:
> > >
> > >
> > > On Wed, Oct 2, 2024 at 8:14 PM Laurenz Albe <laurenz.a...@cybertec.at
> > > <mailto:laurenz.a...@cybertec.at>> wrote:
> > >
> > >     On Wed, 2024-10-02 at 16:48 +0800, wenhui qiu wrote:
> > >     > Whenever I check the checkpoint information in a log, most dirty
> > >     pages are written by the checkpoint process
> > >
> > >     That's exactly how it should be!
> > >
> > > is it because if bgwriter frequently flushes, the disk io will be
> more?🤔
> >
> > Yes, pretty much. But it's also about where the writes happen.
> >
> > Checkpoint flushes dirty buffers only once per checkpoint interval,
> > which is the lowest amount of write I/O that needs to happen.
> >
> > Every other way of flushing buffers is less efficient, and is mostly a
> > sign of memory pressure (shared buffers not large enough for active part
> > of the data).
>
> It's implied, but to make it more explicit: One big efficiency advantage of
> writes by checkpointer is that they are sorted and can often be combined
> into
> larger writes. That's often a lot more efficient: For network attached
> storage
> it saves you iops, for local SSDs it's much friendlier to wear leveling.
>
>
> > But it's also happens about where the writes happen. Checkpoint does
> > that in the background, not as part of regular query execution. What we
> > don't want is for the user backends to flush buffers, because it's
> > expensive and can cause result in much higher latency.
> >
> > The bgwriter is somewhere in between - it's happens in the background,
> > but may not be as efficient as doing it in the checkpointer. Still much
> > better than having to do this in regular backends.
>
> Another aspect is that checkpointer's writes are much easier to pace over
> time
> than e.g. bgwriters, because bgwriter is triggered by a fairly short term
> signal.  Eventually we'll want to combine writes by bgwriter too, but
> that's
> always going to be more expensive than doing it in a large batched fashion
> like checkpointer does.
>
> I think we could improve checkpointer's pacing further, fwiw, by taking
> into
> account that the WAL volume at the start of a spread-out checkpoint
> typically
> is bigger than at the end.
>
> Greetings,
>
> Andres Freund
>
>
>
