Re: [HACKERS] Database Caching

2002-08-26 Thread Karel Zak

On Sun, Aug 25, 2002 at 09:35:24PM -0400, J. R. Nield wrote:
> I'm not sure about query result caching or 'relation caching', since the
> first would seem to run into problems with concurrent updates, and the
> second is sort-of what the buffer cache does.
> 
> Query plan caching sounds like a really good idea though. Neil Conway's
> PREPARE patch already does this for an individual backend. Do you think
> it would be hard to make it use shared memory, and check if a query has
> already been prepared by another backend? Maybe it could use something
> like a whitespace insensitive checksum for a shared hash key.

 The original version of query plan cache allows exactly this. But
 after some discussion the shared memory usage in qcache was remove.

 I think better and more robus solution is store cached planns in
 backend memory and allows to run backend as persistent (means not
 startup/stop for each client connection).

Karel

-- 
 Karel Zak  <[EMAIL PROTECTED]>
 http://home.zf.jcu.cz/~zakkr/
 
 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Database Caching

2002-08-25 Thread Curt Sampson

On Sun, 25 Aug 2002, Bruce Momjian wrote:

> Do we want to add "query caching" to the TODO list, perhaps with a
> question mark?

I'd love to have query plans cached, preferably across backends.

cjs
-- 
Curt Sampson  <[EMAIL PROTECTED]>   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Database Caching

2002-08-25 Thread J. R. Nield

I'm not sure about query result caching or 'relation caching', since the
first would seem to run into problems with concurrent updates, and the
second is sort-of what the buffer cache does.

Query plan caching sounds like a really good idea though. Neil Conway's
PREPARE patch already does this for an individual backend. Do you think
it would be hard to make it use shared memory, and check if a query has
already been prepared by another backend? Maybe it could use something
like a whitespace insensitive checksum for a shared hash key.

Regards,

John Nield

On Sun, 2002-08-25 at 20:15, Bruce Momjian wrote:
> 
> Do we want to add "query caching" to the TODO list, perhaps with a
> question mark?
> 
> ---
> 
> Greg Sabino Mullane wrote:
[snip]
> 
-- 
J. R. Nield
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Database Caching

2002-08-25 Thread Bruce Momjian


Do we want to add "query caching" to the TODO list, perhaps with a
question mark?

---

Greg Sabino Mullane wrote:
[ There is text before PGP section. ]
> 
[ PGP not available, raw data follows ]
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> 
> I had to make a relatively long drive yesterday, so I had lots of free time 
> to do some thinking...and my thoughts were turning to caching and databases. 
> The following is what I came up with: forgive me if it seems to be just an 
> obvious ramble...
> 
> Why does a database need caching?
> 
> Normally, when one thinks of a database (or to be precise, a RDBMS) the 
> ACID acronym comes up. This is concerned with having a stable database that 
> can reliably be used by many users at the same time. Caching a query is 
> unintuitive because it involves sharing information from transactions that 
> may be separated by a great amount of time and/or by different users. 
> However, from the standpoint of the database server, caching increases 
> efficiency enormously. If 800 users all make the same query, then caching 
> can help the database server backend (hereafter simply "database") to 
> save part or all of the work it performs so it doesn't have to repeat the 
> entire sequence of steps 800 times.
> 
> What is caching?
> 
> Caching basically means that we want to save frequently-used information 
> into an easy to get to area. Usually, this means storing it into memory. 
> Caching has three main goals: reducing disk access, reducing computation 
> (i.e. CPU utilization), and speeding up the time as measured by how long a 
> it takes a user to seea  result.  It does all this at the expense of RAM, 
> and the tradeoff is almost always worth it.
> 
> In a database, there are three basic types of caching: query results, 
> query plans, and relations.
> 
> The first, query result caching, simply means that we store into memory 
> the exact output of a SELECT query for the next time that somebody performs 
> that exact same SELECT query. Thus, if 800 people do a "SELECT * FROM foo", 
> the database runs it for the first person, saves the results, and simply 
> reads the cache for the next 799 requests. This saves the database from doing 
> any disk access, practically removes CPU usage, and speeds up the query.
> 
> The second, query plan caching, involves saving the results of the optimizer, 
> which is responsible for figuring out exactly "how" the databse is going to 
> fetch the requested data. This type of caching usually involves a "prepared" 
> query, which has almost all of the information needed to run the query with 
> the exception of one or more "placeholders" (spots that are populated with 
> variables at a later time). The query could also involve non-prepared 
> statments as well. Thus, if someone prepares the query "SELECT flavor FROM 
> foo WHERE size=?", and then executes it by sending in 300 different values 
> for "size", the prepared statement is run through the optimizer, the r
> esulting path is stored into the query plan cache, and the stored path is 
> used for the 300 execute requests. Because the path is already known, the 
> optimizer does not need to be called, which saves the database CPU and time.
> 
> The third, relation caching, simply involves putting the entire relation 
> (usually a table or index) into memory so that it can be read quickly. 
> This saves disk access, which basically means that it saves time. (This type 
> of caching also can occur at the OS level, which caches files, but that will 
> not be discussed here).
> 
> Those are the three basic types of caching, ways of implementing each are 
> discussed below. Each one should complement the other, and a query may be 
> able to use one, two, or all three of the caches.
> 
> I. Query result caching:
> 
> A query result cache is only used for SELECT queries that involve a 
> relation (i.e. not for "SELECT version") Each cache entry has the following 
> fields: the query itself, the actual results, a status, an access time, an 
> access number, and a list of all included columns. (The column list actually 
> tells as much information as needed to uniquely identify it, i.e. schema, 
> database, table, and column). The status is merely an indicator of whether or 
> not this cached query is valid. It may not be, because it may be invalidated 
> for a user within a transaction but still be of use to others. 
> 
> When a select query is processed, it is first parsed apart into a basic common 
> form, stripping whitespace, standardizing case, etc., in order to facilitate 
> an accurate match. Note that no other pre-processing is really required, 
> since we are only interested in exact matches that produce the exact same 
> output. An advanced version of this would ideally be able to use the cached 
> output of "SELECT bar,baz FROM foo" when it receives the query "SELECT 
> baz,bar FROM foo", but tha