Hello,

[ I have not yet had the time to look at the code again in response to
some of the points raised by several people; but I wanted to follow up
on a few other bits in the meantime. ]

> > You would have to test for whether it's time to sleep much more often.
> > Possibly before every ExecProcNode call would be enough.
> 
> That would have overhead comparable to EXPLAIN ANALYZE, which is a lot.
> 
> I'm fairly dubious about this whole proposal: it's not clear to me that
> the vacuum delay stuff works very well at all, and to the extent that it
> does work it's because vacuum has such stylized, predictable behavior.

Well, it definitely works well enough to make a large difference in my
use cases, in particular with respect to the amount of write activity
generated, which easily causes latency problems. That said, it remains
to be seen how much of an issue heavy write activity will be once I
have upgraded to 8.3 and tweaked the Linux buffer cache.

Right now, the database would not even be usable in one of my use
cases if it were not for the delay points during vacuuming.

So although I make no argument as to whether it works better due to
the limited and well-understood nature of vacuuming, it is definitely
an appreciated feature for my use cases.
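(For reference, the delay behavior I rely on is the standard
cost-based vacuum settings; the values below are only illustrative of
tuning for latency rather than throughput, not a recommendation:)

```
# postgresql.conf -- illustrative values only
vacuum_cost_delay = 20ms     # sleep once the cost budget is exhausted
vacuum_cost_limit = 200      # cost accumulated before each sleep
vacuum_cost_page_dirty = 20  # dirtied pages are the expensive part here
```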

> The same can't be said of general SQL queries.  For one thing, it's
> not apparent that rate-limiting I/O would be sufficient, because
> although vacuum is nearly always I/O bound, general queries often are
> CPU bound; or their system impact might be due to other factors like
> contention for shared-memory data structures.

In my case I mostly care about I/O. I believe that will be a fairly
common situation for anyone whose primary concern is latency.

The good part about CPU contention is that it is handled quite well by
modern operating systems and hardware. Even on a single-core machine,
a single CPU-bound query should normally have only a percentage-wise
throughput impact on other traffic (barring particularly bad
contention on some shared resource). If your database is very
sensitive to latency, you are likely running it at far below full
throughput, meaning there should be quite a bit of margin in terms of
CPU.

This is especially true on multi-core machines, where the impact of a
single backend is even smaller.

The problem I have with I/O is that saturating the I/O subsystem, in
particular with writes, has all sorts of indirect effects that are
difficult to predict and are not at all guaranteed to translate into a
simple percentage-wise slow-down. For example, I have seen stalls
lasting several *minutes* caused by a bulk DELETE of a million rows or
so. With a mix of random-access and streaming writes passing through
the PG buffer cache, the operating system buffer cache, and the RAID
controller's cache, significant latency problems are quite likely when
saturating the system with writes.

So, recognizing that I am unlikely ever to get very good behavior
while saturating the storage system with writes, I instead want to
limit the write activity generated to a sensible amount (preferably
such that individual bursts are small enough to, e.g., fit in a RAID
controller cache). This reduces the problem to ensuring good behavior
with respect to short bursts of writes and their interaction with
checkpoints, which is a much easier problem than somehow ensuring
"fairness" under write-saturated load.
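The kind of limiting I have in mind is essentially what the vacuum
cost machinery already does: accumulate a cost per operation and sleep
once a budget is hit, turning a sustained burst into a paced trickle.
A toy sketch (names and numbers are mine, not PostgreSQL's actual
implementation):

```python
import time

class WriteCostLimiter:
    """Toy sketch of cost-based throttling in the spirit of
    vacuum_cost_delay; illustrative only, not PostgreSQL code."""

    def __init__(self, cost_limit=200, delay_s=0.02):
        self.cost_limit = cost_limit  # cost budget between sleeps
        self.delay_s = delay_s        # how long to yield the I/O system
        self.balance = 0

    def charge(self, cost):
        """Account for one operation; sleep once the budget is spent."""
        self.balance += cost
        if self.balance >= self.cost_limit:
            time.sleep(self.delay_s)
            self.balance = 0

# Simulate 100 "page writes" costing 10 each: with a budget of 40 we
# sleep every 4th write, spreading the burst out over time.
limiter = WriteCostLimiter(cost_limit=40, delay_s=0.005)
start = time.monotonic()
for _ in range(100):
    limiter.charge(10)
elapsed = time.monotonic() - start
```

The point is that the writer pays for its own activity in small
installments, rather than the rest of the system paying for it all at
once when the caches fill up.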

So that is where my motivation comes from; in more or less all my use
cases, limiting disk I/O is massively more important than limiting CPU
usage.

On this topic, I have started thinking again about direct I/O. I
asked about this on -performance a while back in a different context,
and there was definitely no clear consensus that one "should" use
direct I/O; that seemed to be mostly due to a perceived lack of
increase in throughput. However, my gut feeling is that by bypassing
the OS buffer cache, you could significantly improve the
real-time/latency-sensitive aspects of PostgreSQL in cases where
throughput is not your primary concern.

Perhaps something like that would be a more effective approach.
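As a rough sketch of what this entails at the system-call level:
O_DIRECT (Linux semantics) requires block-aligned buffers and
transfer sizes, and not every platform or filesystem supports it at
all. The 4096-byte alignment below is an assumption about the device,
and this is of course not PostgreSQL code:

```python
import mmap
import os

ALIGN = 4096  # assumed block/page size; real code would query the device

# O_DIRECT requires the buffer, file offset and transfer size to all be
# block-aligned; an anonymous mmap gives us a page-aligned buffer.
buf = mmap.mmap(-1, ALIGN)
buf.write(b"x" * ALIGN)

path = "direct_io_demo.dat"
result = "unsupported"
try:
    # Bypass the OS buffer cache for this file descriptor entirely.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT)
except (AttributeError, OSError) as exc:
    # Not all platforms/filesystems support O_DIRECT (e.g. macOS, tmpfs).
    print("O_DIRECT not available here:", exc)
else:
    written = os.write(fd, buf)  # goes straight past the page cache
    os.close(fd)
    result = "wrote %d bytes" % written
finally:
    if os.path.exists(path):
        os.remove(path)
print(result)
```

The appeal for the latency case is exactly that the write cost is paid
up front by the issuing process instead of being deferred into an
unpredictable cache flush later.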

> Priority inversion is
> a pretty serious concern as well (ie, a sleeping "low priority" query
> might be blocking other queries).

I presume this is in reference to bulk modifications (rather than
SELECTs) blocking other transactions with conflicting updates?

If so, yes, I see that. The feature would be difficult to use reliably
for writes except in very controlled situations (in the particular use
case I am tunnel-visioning on, it is guaranteed that there are no
conflicts, due to the nature of the application).

But is my understanding correct that there is no reason to believe
there are such issues for read-only queries, or queries that do not
actually conflict (at the SQL level) with concurrent transactions?

(Ignoring the impact it might have on old transactions hanging around
for a longer time.)

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org
