Re: [HACKERS] Postgresql Caching

Harvell F Sun, 15 Oct 2006 19:59:28 -0700


On 15 Oct 2006, at 19:55, [EMAIL PROTECTED] wrote:

On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
[EMAIL PROTECTED] wrote:
As a thought experiment, I'm not seeing the benefit. I think if you
could prove a benefit, then any proof you provided could be used to
improve the already existing caching layers, and would apply equally
to read-only or read-write pages. For example, why not be able to
hint to PostgreSQL that a disk-based table should be considered a
priority to keep in RAM. That way, PostgreSQL would avoid pushing
pages from this table out.
If memcached (or pgmemcached implemented in triggers) can show a speed
improvement using ram based caching (even with network overhead) of
specific data then it stands to reason that this ram based cache can be
integrated into postgres with better integration that will overcome the
issues that pgmemcached has.
...
I think the memcache people are thinking that the cost of PostgreSQL is
about the disk. Although the disk plays a part, I'm pretty sure it's
only a fraction. Not providing transaction guarantees, not providing an
SQL level abstraction, and not having multiple processes or threads
plays a much bigger part.

Forgive my intrusion and perhaps simplistic viewpoint; however, improved caching would be of great benefit for me as a web developer.

I wholeheartedly agree that the disk IO is often a small part of the expense of obtaining data from the database, especially for the nominal web based application. Query parsing, joining, sorting, etc. are all likely to be real culprits. The existing caching mechanisms (as I understand them) and especially the kernel disk caches do nothing to eliminate these overhead costs.

I would venture that the 80/20 rule applies here as in many, many other instances. A full 80+% of the queries performed against the database are performed over and over and over again with the same criteria for a period of time and then the criteria change for the next period of time. This would be particularly true for seldom changed tables that, for example, contain a list of the day's advertisements. The data is changed slowly, once a day or once a week, but a query is made for every page hit. Usually the exact same query.

I know, for you purists out there, that this is an obvious call for an application level cache. Perhaps so, however, it complicates the end-programmer environment _and_ it has consistency disadvantages. Many of the programming languages being used provide direct interfaces to PostgreSQL (not surprising given that the programmers are using PostgreSQL) and some may even provide a caching mechanism. Best case, integrating the two remains a task for the end-programmer; worst case, the end-programmer has to implement the cache as well. Rolling a cache into the database removes that complexity by incorporating it into the existing PostgreSQL API. (BTW, I'm aware that the consistency disadvantages of the application level cache could probably be overcome by implementing notify in the cache but, again, at added end-programmer expense.)

Getting back to the original posting, as I remember it, the question was about seldom changed information. In that case, and assuming a repetitive query as above, a simple query results cache that is keyed on the passed SQL statement string and that simply returns the previously cooked result set would be a really big performance win.

Registering each cache entry by the tables included in the query and invalidating the cache entries on a committed update or insert transaction to any of the tables would, transparently, solve the consistency problem.
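To make the idea concrete, here is a minimal sketch of such a results cache, written as application-side Python only for illustration. It is not how PostgreSQL works internally, and every name in it (QueryResultCache, cached_query, run_query) is hypothetical. It assumes the caller supplies the list of tables a query reads and calls invalidate() after a committed write to a table.

```python
from collections import defaultdict


class QueryResultCache:
    """Toy results cache keyed on the exact SQL string (all names hypothetical)."""

    def __init__(self):
        self._results = {}                  # SQL text -> cached result rows
        self._by_table = defaultdict(set)   # table name -> SQL texts that read it

    def get(self, sql):
        # Returns the cached rows, or None if this SQL string is not cached.
        return self._results.get(sql)

    def put(self, sql, tables, rows):
        # Store the result and register it under every table the query touches.
        self._results[sql] = rows
        for table in tables:
            self._by_table[table].add(sql)

    def invalidate(self, table):
        # Assumed to be called after a committed write to `table`:
        # drop every cached result that was registered against it.
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


def cached_query(cache, sql, tables, run_query):
    """Return cached rows for `sql`, running the query only on a miss."""
    rows = cache.get(sql)
    if rows is None:
        rows = run_query(sql)   # run_query stands in for the real database call
        cache.put(sql, tables, rows)
    return rows
```

With the day's-advertisements example, the first page hit would run the query and every later hit would be served from the cache until something commits a change to the advertisements table.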

Does this improve the "more interesting" case of heavily updated tables? Not likely; however, for many web applications, it will likely improve 80% of the queries, leaving more cycles (and bandwidth) available for the non-cacheable queries.

There would be other issues as well, for example, non-invalidated cache entries will accumulate rapidly if the criteria changes often, large result sets will cause cache contention, cursors will (likely) be unable to use the cache, syntax/commands for manipulating cacheability, etc. THIS DOES NOT ELIMINATE THE BASIC VALUE of a results cache for the conditions specified above. Conditions that I would venture to say make up a large part of the queries that are (or could be) made by a web application.
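Two of the issues above, accumulating entries and oversized result sets, are commonly handled with an entry limit plus least-recently-used eviction and a cap on result size. The following is only a sketch under those assumptions; BoundedResultCache and its max_entries and max_rows limits are hypothetical names and values, not an existing PostgreSQL feature.

```python
from collections import OrderedDict


class BoundedResultCache:
    """Toy LRU-bounded results cache (hypothetical names and limits)."""

    def __init__(self, max_entries=1000, max_rows=500):
        self._entries = OrderedDict()   # SQL text -> rows, oldest use first
        self.max_entries = max_entries
        self.max_rows = max_rows

    def get(self, sql):
        if sql not in self._entries:
            return None
        self._entries.move_to_end(sql)  # mark as most recently used
        return self._entries[sql]

    def put(self, sql, rows):
        # Refuse result sets that are too large to be worth caching.
        if len(rows) > self.max_rows:
            return False
        self._entries[sql] = rows
        self._entries.move_to_end(sql)
        # Evict least recently used entries once over the limit.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return True
```

Queries whose criteria change often would then simply age out of the cache instead of piling up.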


Thanks,
  F

--
F Harvell
407 467-1919




