On 16 Oct 2006, at 4:29, Shane Ambler wrote:

Harvell F wrote:

Getting back to the original posting, as I remember it, the question was about seldom changed information. In that case, and assuming a repetitive query as above, a simple query results cache that is keyed on the passed SQL statement string and that simply returns the previously cooked result set would be a really big performance win.

I believe the main point that Mark made was that the extra overhead is in the SQL parsing and query planning - this is the part that postgres won't get around. Even if you set up simple tables for caching, it still goes through the parser and planner and loses the benefits that memcached has. Or you fork those requests before the planner and lose the benefits of postgres. The main benefit of using memcached is to bypass the parsing and query planning.
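For contrast, the usual memcached pattern looks something like the following sketch (assuming the pymemcache client, psycopg2, and a local memcached daemon; memcached keys can't contain spaces, so the SQL text is hashed, and the table name is hypothetical). On a hit, PostgreSQL never sees the query at all, so no parse or plan step happens:

    import hashlib
    import json
    import psycopg2
    from pymemcache.client.base import Client

    mc = Client(("localhost", 11211))
    conn = psycopg2.connect("dbname=app user=web")

    def query_via_memcached(sql, ttl=60):
        # Hash the raw SQL text to get a legal memcached key.
        key = hashlib.sha1(sql.encode("utf-8")).hexdigest()
        hit = mc.get(key)
        if hit is not None:
            return json.loads(hit)            # database bypassed entirely
        with conn.cursor() as cur:
            cur.execute(sql)
            rows = cur.fetchall()
        mc.set(key, json.dumps(rows, default=str), expire=ttl)
        return rows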

That was the basis of my suggestion to just use the passed query string as the key. No parsing or processing of the query, just a simple string match.
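Client-side, that whole idea is only a few lines - a minimal sketch, assuming psycopg2 and a hypothetical articles table; the suggestion, of course, is for the server to do the equivalent internally:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=web")
    _result_cache = {}          # raw SQL string -> previously fetched rows

    def cached_query(sql):
        """Return cached rows on an exact SQL string match, else hit the database."""
        if sql in _result_cache:          # simple string comparison, no parsing
            return _result_cache[sql]
        with conn.cursor() as cur:
            cur.execute(sql)
            rows = cur.fetchall()
        _result_cache[sql] = rows         # keep the cooked result set
        return rows

    # Repeated identical queries are answered from the dict after the first round trip.
    rows = cached_query("SELECT headline, body FROM articles WHERE pub_date = CURRENT_DATE")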

You will find there is more to SQL parsing than you first think. It needs to find the components that make up the SQL statement (tables, column names, functions) and check that they exist and can be used in the context of the given SQL, and that the given data matches the context in which it is used. It needs to check that the current user has enough privileges to perform the requested task. Then it locates the data, whether that be in the memory cache, on disk or in an integrated version of memcached; this also includes checks to make sure another user hasn't locked the data to change it, and whether there exists more than one version of the data, committed and uncommitted. Only then does it send the results back to the client requesting it.
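As an aside, part of that cost can already be paid once per session with a server-side prepared statement. A small sketch, assuming psycopg2 and hypothetical statement/table names; the permission and visibility checks described above still happen when the statement is executed:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=web")
    with conn.cursor() as cur:
        # Parsed (and typically planned) once for this session.
        cur.execute("PREPARE todays_news (date) AS "
                    "SELECT headline, body FROM articles WHERE pub_date = $1")
        # Each EXECUTE reuses the prepared statement instead of re-parsing,
        # but the executor still runs, so ACL and MVCC checks are not skipped.
        cur.execute("EXECUTE todays_news (CURRENT_DATE)")
        rows = cur.fetchall()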

The user permissions checking is a potential issue but again, for the special case of repeated queries by the same user (the webserver process) for the same data, a simple match of the original query string _and_ the original query user would still be very simple. The big saving from having the simple results cache would be the elimination of the parsing, planning, locating, combining, and sorting of the result set.
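In sketch form (extending the dict example above, with hypothetical role names), the key simply becomes the pair of user and query text:

    _result_cache = {}   # (user, raw SQL string) -> previously fetched rows

    def cache_key(user, sql):
        # Same text issued by the same user -> same cached result set;
        # a different user misses and goes through the normal permission checks.
        return (user, sql)

    def get_cached(user, sql):
        return _result_cache.get(cache_key(user, sql))

    def put_cached(user, sql, rows):
        _result_cache[cache_key(user, sql)] = rows

    put_cached("webapp", "SELECT headline FROM articles", [("hello",)])
    assert get_cached("webapp", "SELECT headline FROM articles") == [("hello",)]
    assert get_cached("other_role", "SELECT headline FROM articles") is None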

I don't believe normal locking plays a part in the cache (there are basic cache-integrity locking issues, though), nor do versioning or commit states, beyond the invalidation of the cache upon a commit to a referenced table. It may be that the invalidation needs to happen whenever a table is locked as well. (The hooks for the invalidation would be set up during the original caching of the result set.)
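One way those hooks can be approximated from outside the server today is with a statement-level trigger that NOTIFYs for each referenced table, plus a caching layer that LISTENs and drops matching entries; notifications are only delivered on commit, which matches the "upon a commit" behaviour above. A rough sketch, assuming psycopg2, a reasonably recent PostgreSQL (for pg_notify), and hypothetical table and channel names:

    import select
    import psycopg2

    conn = psycopg2.connect("dbname=app user=web")
    conn.autocommit = True

    with conn.cursor() as cur:
        cur.execute("""
            CREATE OR REPLACE FUNCTION notify_cache() RETURNS trigger AS $$
            BEGIN
                PERFORM pg_notify('cache_invalidate', TG_TABLE_NAME);
                RETURN NULL;
            END $$ LANGUAGE plpgsql;

            CREATE TRIGGER articles_invalidate
                AFTER INSERT OR UPDATE OR DELETE ON articles
                FOR EACH STATEMENT EXECUTE PROCEDURE notify_cache();
        """)
        cur.execute("LISTEN cache_invalidate")

    _result_cache = {}   # (user, raw SQL string) -> rows, as sketched earlier

    def drain_invalidations(timeout=5.0):
        """Drop cached entries whose query text mentions a table that just changed (crude match)."""
        if select.select([conn], [], [], timeout) == ([], [], []):
            return
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            stale = [k for k in _result_cache if note.payload in k[1]]
            for k in stale:
                del _result_cache[k]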

I know that the suggestion is a very simple-minded one and is limited to a very small subset of the potential query types and interactions; however, at least for web applications, it would be a very big win. Many websites want to display today's data on their webpage and have it change as dates change (or as users change). The data in the source table doesn't change very often (especially compared to how often a popular website is hit), and the number of times the exact same query could be issued between changes can run into the hundreds of thousands or more. Putting even this simple results cache into the database would really simplify the programmer's life and improve reliability (and the use of PostgreSQL).


