On 15 Oct 2006, at 19:55, [EMAIL PROTECTED] wrote:

On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
[EMAIL PROTECTED] wrote:
As a thought experiment, I'm not seeing the benefit. I think if you
could prove a benefit, then any proof you provided could be used to
improve the already existing caching layers, and would apply equally
to read-only or read-write pages. For example, why not be able to
hint to PostgreSQL that a disk-based table should be considered a
priority to keep in RAM. That way, PostgreSQL would avoid pushing
pages from this table out.
If memcached (or pgmemcached implemented in triggers) can show a speed
improvement using ram based caching (even with network overhead) of
specific data then it stands to reason that this ram based cache can be integrated into postgres with better integration that will overcome the
issues that pgmemcached has.

...
I think the memcache people are thinking that the cost of PostgreSQL is
about the disk. Although the disk plays a part, I'm pretty sure it's
only a fraction. Not providing transaction guarantees, not providing an
SQL level abstraction, and not having multiple processes or threads
plays a much bigger part.


Forgive my intrusion and perhaps simplistic viewpoint, however, improved caching would be of great benefit for me as a web developer.

I wholeheartedly agree that the disk IO is often a small part of the expense of obtaining data from the database, especially for the nominal web based application. Query parsing, joining, sorting, etc. are all likely to be real culprits. The existing caching mechanism (as I understand them) and especially the kernel disk caches do nothing to eliminate these overhead costs.

I would venture that the 80/20 rule applies here as in many, many other instances. A full 80+% of the queries performed against the database are performed over and over and over again with the same criteria for a period of time and then the criteria changes for the next period of time. This would be particularly true for seldom changed tables that, for example, contain a list of the day's advertisements. The data is changed slowly, once a day or once a week, but, a query is made for every page hit. Usually the exact same query.

I know, for you purists out there, that this is an obvious call for an application level cache. Perhaps so, however, it complicates the end-programmer environment _and_ it has consistency disadvantages. Many of the programming languages being used provide direct interfaces to PostgreSQL (not surprising given that the programmers are using PostgreSQL) and some may even provide a caching mechanism. Best case, integrating the two remains a task for the end- programmer, worse case, the end-programmer has to implement the cache as well. Rolling a cache into the database removes that complexity by incorporating it into the existing PostgreSQL API. (BTW, I'm aware that the consistency disadvantages of the application level cache could probably be overcome by implementing notify in the cache but, again, at added end-programmer expense.)

Getting back to the original posting, as I remember it, the question was about seldom changed information. In that case, and assuming a repetitive query as above, a simple query results cache that is keyed on the passed SQL statement string and that simply returns the previously cooked result set would be a really big performance win.

Registering each cache entry by the tables included in the query and invalidating the cache during on a committed update or insert transaction to any of the tables would, transparently, solve the consistency problem.

Does this improve the "more interesting" case of heavily updated tables? Not likely, however, for many web applications, it will likely improve 80% of the queries leaving more cycles (and bandwidth) available for the non-cacheable queries.

There would be other issues as well, for example, non-invalidated cache entries will accumulate rapidly if the criteria changes often, large result sets will cause cache contention, cursors will (likely) be unable to use the cache, syntax/commands for manipulating cacheability, etc. THIS DOES NOT ELIMINATE THE BASIC VALUE of a results cache for the conditions specified above. Conditions that I would venture to say make up a large part of the queries that are (or could be) made by a web application.

Thanks,
  F

--
F Harvell
407 467-1919




---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to