Hello Andres,

You can't allocate 4GB with palloc(); it has a built-in limit that
prevents allocating more than 1GB.

Argh, too bad, I assumed very naively that palloc was malloc in disguise.
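
For the archives, there is a way past that cap. A minimal sketch, assuming
a server version that has palloc_extended() (the helper name here is made
up for illustration):

    #include "postgres.h"
    #include "storage/shmem.h"      /* mul_size() */

    /* Hypothetical helper: allocate one int per buffer, possibly > 1GB. */
    static int *
    alloc_buffer_ids(Size nbuffers)
    {
        /*
         * Plain palloc() errors out above MaxAllocSize (just under 1GB);
         * the MCXT_ALLOC_HUGE flag raises the cap to MaxAllocHugeSize.
         */
        return (int *) palloc_extended(mul_size(nbuffers, sizeof(int)),
                                       MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
    }

MemoryContextAllocHuge() is another spelling of the same thing.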

[...]
Well, then every time the checkpointer is restarted.

Hm...

The point is that it's done at postmaster startup, and we're pretty much
guaranteed that the memory will be available.

Ok ok, I stop resisting... I'll have a look.

Would it also fix the 1 GB palloc limit on the same go? I guess so...
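
If I understand the suggestion, it would look roughly like this (names
hypothetical; sizing and initialization split per the usual shmem
conventions):

    #include "postgres.h"
    #include "storage/shmem.h"

    static int *CkptBufferIds = NULL;   /* hypothetical shared array */

    /* Report how much shared memory we need, at sizing time. */
    Size
    CkptSortShmemSize(int nbuffers)
    {
        return mul_size(nbuffers, sizeof(int));
    }

    /* Allocate it once at postmaster startup. */
    void
    CkptSortShmemInit(int nbuffers)
    {
        bool found;

        /* Shared memory is reserved up front, so no 1GB palloc cap. */
        CkptBufferIds = (int *)
            ShmemInitStruct("Checkpoint Buffer Ids",
                            CkptSortShmemSize(nbuffers), &found);
    }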


That reasoning makes it impossible to move the fsyncing of files into the
loop (whenever we move to a new file). That's not nice.

I do not see why.

Because it means that the sorting isn't necessarily correct. I.e. we
can't rely on it to determine whether a file has already been fsynced.

Ok, I understand your point.

Then the file would be fsynced twice. If the fsync is done properly (the data has already been flushed to disk), it would not cost much, and occasionally fsyncing some file twice would not be a big issue. The code could also detect such an event and log a warning, which would give a hint about how often it occurs in practice.
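
A sketch of what I mean (the helper and variable names here are
hypothetical, not from the patch):

    #include "postgres.h"
    #include "storage/relfilenode.h"

    static void sync_prev_file(void);   /* hypothetical: fsync the file just left */

    static RelFileNode prev_rnode;
    static bool have_prev = false;

    /* Called by the write loop each time it reaches a buffer's file. */
    static void
    on_file_boundary(RelFileNode rnode)
    {
        if (have_prev && RelFileNodeEquals(prev_rnode, rnode))
            return;             /* still in the same file */

        if (have_prev)
        {
            sync_prev_file();

            /*
             * A new file that sorts before the previous one means the
             * advisory order was violated; we may fsync a file twice.
             */
            if (memcmp(&rnode, &prev_rnode, sizeof(RelFileNode)) < 0)
                elog(WARNING, "checkpoint buffer sort order violated");
        }

        prev_rnode = rnode;
        have_prev = true;
    }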

Hm. Is that actually the case for our qsort implementation?

I think that it would be hard to write a qsort which fails that: it would
mean comparing the same pair of items twice, which would be inefficient.

What? The same two elements aren't frequently compared pairwise with each other, but of course an individual element is frequently compared with other elements.

Sure.

Consider what happens when the chosen pivot element changes its identity after half the partitioning is already done. The two partitions will not be divided in any meaningful way anymore. I don't see how this would result in a meaningful sort.

It would be partly meaningful, which is enough for performance, and it does not matter for correctness: currently buffers are not sorted at all and it works, even if it does not work well.
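
As a toy illustration (standalone C, not backend code; note that strictly
speaking an inconsistent comparator is undefined behaviour per the C
standard, so this only shows the typical outcome, not a guaranteed one):

    #include <stdio.h>
    #include <stdlib.h>

    #define N 16

    static int keys[N];
    static int ncalls = 0;

    static int
    cmp(const void *a, const void *b)
    {
        /* Simulate a concurrent change: one sort key mutates mid-sort. */
        if (++ncalls == 20)
            keys[3] = 999;

        return keys[*(const int *) a] - keys[*(const int *) b];
    }

    int
    main(void)
    {
        int idx[N];
        int i;

        for (i = 0; i < N; i++)
        {
            idx[i] = i;
            keys[i] = N - i;    /* reverse order to force many comparisons */
        }

        qsort(idx, N, sizeof(int), cmp);

        /* Typically prints a permutation that is only partly ordered. */
        for (i = 0; i < N; i++)
            printf("%d ", keys[idx[i]]);
        printf("\n");
        return 0;
    }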

If the pivot element changes its identity won't the result be pretty much
random?

That would be a very unlikely event, given the short time spent in
qsort.

Meh, we don't want to rely on "likeliness" on such things.

My main argument is that even if it occurs and the qsort result is partly wrong, it does not change correctness; it just means that the actual writes will be less ordered than intended. If it occurs, one pivot separation would be quite strange, but the others would be right, so the buffers would still be "partly sorted".

Another issue I see is that even if buffers are locked within cmp, the status may change between two cmp calls... I do not think that locking all buffers for the duration of the sort is an option. So on the whole, I think that locking buffers for sorting is probably not possible with the simple (and efficient) lightweight approach used in the patch.

The good news, as I argued before, is that the order is only advisory, to help with performance; correctness only requires that all checkpoint buffers are written and that fsync is called at the end, and that does not depend on the buffer order. That is how it currently works anyway.

If you block on this then I'll switch to a heavyweight approach, but that would be a waste of memory in my opinion, hence my argumentation for the lightweight approach.
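
For concreteness, the heavyweight variant I have in mind would be along
these lines (struct and field names illustrative only): snapshot each
buffer's tag into a private array once, then sort the copies, so the
comparator keys cannot change mid-sort:

    #include "postgres.h"
    #include "common/relpath.h"         /* ForkNumber */
    #include "storage/block.h"          /* BlockNumber */
    #include "storage/relfilenode.h"    /* RelFileNode */

    typedef struct CkptItem
    {
        RelFileNode rnode;      /* tablespace / database / relation */
        ForkNumber  forkNum;
        BlockNumber blockNum;
        int         buf_id;     /* back-pointer into the buffer pool */
    } CkptItem;

    static int
    ckpt_item_cmp(const void *a, const void *b)
    {
        const CkptItem *ia = (const CkptItem *) a;
        const CkptItem *ib = (const CkptItem *) b;
        int         r;

        /* Only private copies are compared, so keys cannot move under us. */
        r = memcmp(&ia->rnode, &ib->rnode, sizeof(RelFileNode));
        if (r != 0)
            return r;
        if (ia->forkNum != ib->forkNum)
            return (ia->forkNum < ib->forkNum) ? -1 : 1;
        if (ia->blockNum != ib->blockNum)
            return (ia->blockNum < ib->blockNum) ? -1 : 1;
        return 0;
    }

Filling the array under each buffer's header lock and then calling
qsort(items, n, sizeof(CkptItem), ckpt_item_cmp) would be stable no matter
what other backends do to the buffers, at the cost of roughly two dozen
bytes per buffer to checkpoint, which is the memory overhead I was
complaining about.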

--
Fabien.

