On 6/7/13 10:14 AM, Robert Haas wrote:
>> If the page hit limit goes away, the user with a single core server who is
>> used to having autovacuum only pillage shared_buffers at 78MB/s might
>> complain if it became unbounded.

> Except that it shouldn't become unbounded, because of the ring-buffer
> stuff.  Vacuum can pillage the OS cache, but the degree to which a
> scan of a single relation can pillage shared_buffers should be sharply
> limited.

I wasn't talking about disruption of the data that's in the buffer cache. The only time the scenario I was describing plays out is when the data is already in shared_buffers. The concern is damage done to the CPU's data cache by this activity. Right now you can't even reach 100MB/s of damage to your CPU caches in an autovacuum process. Ripping out the page hit cost will eliminate that cap. Autovacuum could introduce gigabytes per second of memory -> L1 cache transfers. That's what all my details about memory bandwidth were trying to put into context. I don't think it really matters much, because the new bottleneck will be the processing speed of a single core, and that's still a decent cap for most people now.
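
To make the current cap concrete, here's a back-of-the-envelope sketch (plain Python, not anything from the tree) of where that 78MB/s figure comes from, assuming the stock vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20ms, vacuum_cost_page_hit = 1, and 8KB pages:

# Current ceiling on scanning pages that are already in shared_buffers,
# under the stock cost settings listed above.
BLOCK_SIZE = 8192
cost_units_per_second = 200 / 0.020            # budget refills at 10,000/s
pages_per_second = cost_units_per_second / 1   # each buffer hit costs 1
print(pages_per_second * BLOCK_SIZE / (1024 * 1024))   # ~78 MB/s

Drop the page hit charge and that ceiling disappears; the scan rate is then bounded only by how fast a single core can chew through buffers.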

> I think you're missing my point here, which is that we shouldn't
> have any such thing as a "cost limit".  We should limit reads and
> writes *completely separately*.  IMHO, there should be a limit on
> reading, and a limit on dirtying data, and those two limits should not
> be tied to any common underlying "cost limit".  If they are, they will
> not actually enforce precisely the set limit, but some other composite
> limit which will just be weird.

I see the distinction you're making now; I don't need a mock-up to follow you. The main challenge with moving that way is that read and write rates never end up being completely disconnected from one another. A read will only cost some fraction of what a write does, but the two shouldn't be completely independent.

Just because I'm comfortable doing 10MB/s of reads and 5MB/s of writes individually, that doesn't mean I'm happy with the server doing 9MB/s of reads + 5MB/s of writes = 14MB/s of I/O in an implementation where the two float independently. It's certainly possible to disconnect the two like that, and people will be able to work something out anyway. But I personally would prefer not to lose the ability to specify how expensive read and write operations should be considered relative to one another.
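
To spell that out with a toy sketch (made-up numbers matching the example above, nothing tied to real GUCs):

# Independent caps: each stream is limited on its own, so combined I/O
# can legitimately reach the sum of the two limits.
read_limit, write_limit = 10.0, 5.0       # MB/s, enforced separately
observed_read, observed_write = 9.0, 5.0
print(min(observed_read, read_limit) + min(observed_write, write_limit))  # 14.0

# Blended budget: charge a write some multiple of a read (2x here) and cap
# the combined spend so that all-reads tops out at 10 MB/s and all-writes
# at 5 MB/s. The same 9 + 5 mix now overspends the budget and gets
# throttled, keeping total I/O bounded.
write_cost, budget = 2.0, 10.0
spend = observed_read + write_cost * observed_write   # 19 cost units
print(spend > budget)                                 # True

The blended form is what lets you say "a write is N times as expensive as a read" and have the total stay inside one envelope.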

Related aside: shared_buffers is becoming a smaller fraction of total RAM with each release, because it's stuck at this rough 8GB limit right now. As the OS cache becomes a larger multiple of the shared_buffers size, the expense of the average read is dropping: reads are increasingly likely to be in the OS cache but not shared_buffers. Writes, meanwhile, are as expensive as ever.

The real-world tunings I'm doing now reflect that. On servers with >128GB of RAM, they've typically gone this far in that direction:

vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 2
vacuum_cost_page_dirty = 20

That's 4MB/s of writes, 40MB/s of reads, or some blended mix that considers writes 10X as expensive as reads. The blend is a feature.
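
For reference, the arithmetic behind those numbers (a sketch assuming the stock vacuum_cost_limit = 200 and autovacuum_vacuum_cost_delay = 20ms, 8KB pages):

BLOCK_SIZE = 8192
cost_units_per_second = 200 / 0.020     # 10,000 with the stock limit/delay
page_miss, page_dirty = 2, 20           # the settings above

to_mb = lambda pages_per_sec: pages_per_sec * BLOCK_SIZE / (1024 * 1024)
print(to_mb(cost_units_per_second / page_miss))    # ~40 MB/s if it's all reads
print(to_mb(cost_units_per_second / page_dirty))   # ~4 MB/s if it's all writes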

The logic here is starting to remind me of how the random_page_cost default has been justified. Real-world random reads are actually closer to 50X as expensive as sequential ones. But the average read from the executor's perspective is effectively discounted by OS cache hits, so 4.0 still works OK. On large memory servers, random reads keep getting cheaper via better OS cache hit odds, and that's increasingly becoming something important to tune for.
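
Rough numbers behind that, treating a cached random read as nearly free and an uncached one as ~50X a sequential read (both assumptions, just to show the shape of it):

raw_random_cost = 50.0   # uncached random read vs. sequential, assumed

def effective_cost(cache_hit_rate):
    # Cache hits are treated as free here; only misses pay the 50X penalty.
    return (1 - cache_hit_rate) * raw_random_cost

for hit_rate in (0.90, 0.92, 0.95, 0.99):
    print(f"{hit_rate:.0%} cached -> effective cost {effective_cost(hit_rate):.1f}")
    # roughly 5.0, 4.0, 2.5, 0.5

A default of 4.0 lines up with roughly 92% of random reads being served from cache; as memory grows and the hit rate climbs, the effective cost keeps falling, which is why big-memory servers increasingly want a lower setting.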

Some of this mess would go away if we could crack the shared_buffers scaling issues for 9.4. There's finally enough dedicated hardware around to see the issue and work on it, but I haven't gotten a clear picture of any reproducible test workload that gets slower with large buffer cache sizes. If anyone has a public test case that gets slower when shared_buffers goes from 8GB to 16GB, please let me know; I've got two systems set up that I could chase that down on now.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


