Do we want to add "query caching" to the TODO list, perhaps with a
question mark?
---
Greg Sabino Mullane wrote:
[ There is text before PGP section. ]
>
[ PGP not available, raw data follows ]
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
>
> I had to make a relatively long drive yesterday, so I had lots of free time
> to do some thinking...and my thoughts were turning to caching and databases.
> The following is what I came up with: forgive me if it seems to be just an
> obvious ramble...
>
> Why does a database need caching?
>
> Normally, when one thinks of a database (or to be precise, a RDBMS) the
> ACID acronym comes up. This is concerned with having a stable database that
> can reliably be used by many users at the same time. Caching a query is
> unintuitive because it involves sharing information from transactions that
> may be separated by a great amount of time and/or by different users.
> However, from the standpoint of the database server, caching increases
> efficiency enormously. If 800 users all make the same query, then caching
> can help the database server backend (hereafter simply "database") to
> save part or all of the work it performs so it doesn't have to repeat the
> entire sequence of steps 800 times.
>
> What is caching?
>
> Caching basically means that we want to save frequently-used information
> into an easy to get to area. Usually, this means storing it into memory.
> Caching has three main goals: reducing disk access, reducing computation
> (i.e. CPU utilization), and speeding up the time as measured by how long a
> it takes a user to seea result. It does all this at the expense of RAM,
> and the tradeoff is almost always worth it.
>
> In a database, there are three basic types of caching: query results,
> query plans, and relations.
>
> The first, query result caching, simply means that we store into memory
> the exact output of a SELECT query for the next time that somebody performs
> that exact same SELECT query. Thus, if 800 people do a "SELECT * FROM foo",
> the database runs it for the first person, saves the results, and simply
> reads the cache for the next 799 requests. This saves the database from doing
> any disk access, practically removes CPU usage, and speeds up the query.
>
> The second, query plan caching, involves saving the results of the optimizer,
> which is responsible for figuring out exactly "how" the databse is going to
> fetch the requested data. This type of caching usually involves a "prepared"
> query, which has almost all of the information needed to run the query with
> the exception of one or more "placeholders" (spots that are populated with
> variables at a later time). The query could also involve non-prepared
> statments as well. Thus, if someone prepares the query "SELECT flavor FROM
> foo WHERE size=?", and then executes it by sending in 300 different values
> for "size", the prepared statement is run through the optimizer, the r
> esulting path is stored into the query plan cache, and the stored path is
> used for the 300 execute requests. Because the path is already known, the
> optimizer does not need to be called, which saves the database CPU and time.
>
> The third, relation caching, simply involves putting the entire relation
> (usually a table or index) into memory so that it can be read quickly.
> This saves disk access, which basically means that it saves time. (This type
> of caching also can occur at the OS level, which caches files, but that will
> not be discussed here).
>
> Those are the three basic types of caching, ways of implementing each are
> discussed below. Each one should complement the other, and a query may be
> able to use one, two, or all three of the caches.
>
> I. Query result caching:
>
> A query result cache is only used for SELECT queries that involve a
> relation (i.e. not for "SELECT version") Each cache entry has the following
> fields: the query itself, the actual results, a status, an access time, an
> access number, and a list of all included columns. (The column list actually
> tells as much information as needed to uniquely identify it, i.e. schema,
> database, table, and column). The status is merely an indicator of whether or
> not this cached query is valid. It may not be, because it may be invalidated
> for a user within a transaction but still be of use to others.
>
> When a select query is processed, it is first parsed apart into a basic common
> form, stripping whitespace, standardizing case, etc., in order to facilitate
> an accurate match. Note that no other pre-processing is really required,
> since we are only interested in exact matches that produce the exact same
> output. An advanced version of this would ideally be able to use the cached
> output of "SELECT bar,baz FROM foo" when it receives the query "SELECT
> baz,bar FROM foo", but tha