On Sun, Apr 08, 2001 at 12:46:43AM +0300, Michael Widenius wrote:
> 
> Hi!
> 
> >>>>> "Tim" == Tim Bunce <[EMAIL PROTECTED]> writes:
> 
> Tim> On Sat, Apr 07, 2001 at 02:14:42AM +0300, Michael Widenius wrote:
> >> 
> >> The reason singly queries are slower are of course that the
> >> initialization phase (reading the data from a socket, parsing,
> >> locking, checking which keys to use) is about half of the query time.
> >> 
> Peter> Heikki made tests which  also shows some strange things - for
> Peter> example why INSERT is FASTER then SELECT.
> >> 
> >> I haven't seen the test but I can imagine this is true in some
> >> context.  The reason for this is that a SELECT has to go through many
> >> optimization stages to find out what indexes to use and what queries
> >> to do.  This is one basic fault with SQL;  The optimizer has to do a
> >> lot of work...
> 
> Tim> Most high-end relational databases address this by...
> 
> Tim> a) storing the query execution plan etc in a cache keyed by the sql
> Tim>    statement text. That way, if another statement with the same text is
> Tim>    executed a ready-made execution plan is available.
> 
> Which databases are you referring to?  

Oracle is the one I know best.

> According to the benchmarks I have run no one of the big database
> vendors does this very good by default. The problem with SQL is that
> the above approach doesn't work very well when you have more than one
> key in use by a query or the keys are unevenly distributed.

You're right in as much as Oracle will sometimes execute a query faster
when constants are used instead of placeholders because it can generate
a better execution plan.

But I've personally seen many cases where overall performance has
jumped _dramatically_ by changing the majority of queries to use
placeholders because the global gains from reuse outweigh the
occasional minor loss from a more generic/conservative execution plan.

> Even with a cached query plan, you will still loose the time for
> sending the query, the parsing and initialization of the query.

But you wouldn't need to do a full parse on the query - just the simple
'abstraction'. I'd also expect at least some of the 'initialization' to
be cacheable along with the query execution plan.

> As the optimizer in MySQL is only a minor part of the total time for
> most queries, this will not give us any major speed increase.

But you wouldn't need to do a full parse on the query either. The parsed
query data structure plus info about which keys to use etc etc would all
be 'instantly' available after one byte-level copy of the statement plus
an in-memory hash lookup. A dramatic reduction in code path.

> Prepared statements would help a bit, at least if the developer could
> give hints to the optimizer that it only need to use a given set of indexes
> (This would of course fail for some queries, but generally this would
> be good).

Umm, that's not making much sense to me. I take "Prepared statement" to
mean that the set of indices has been worked out once and cached so that
future executions of the same statement don't have to work it out again.

>From what you're saying below you're taking it to mean, effectively, a
'template statement' into which the placeholder values are logically
placed before then acting as if the whole statement had arrived over
the socket. I.e., little caching of associated info.

> Tim> b) to make that effective they support placeholders that abstract out
> Tim>    literal values from the statement text, so the cached plan can be
> Tim>    reused regardless of the literal values boind to the placeholders
> Tim>    for a particular execution.
> 
> For any placeholder that is a key, we would still need to do a lot of
> the work that we do today for any query.

Optionally.

> Tim> I appreciate that doing (b) would require major changes to the protocol
> Tim> etc, but it's just occured to me that there's a very simple way to
> Tim> avoid that but still get the benefits of (a)...
> 
> We will add prepared statements in MySQL 4; At this point we don't intend
> to cache the query plan but only avoid all parsing overhead (all
> parameters will be sent in binary format, which will be MUCH faster)

So you're adding placeholders in v4? Great!

> Tim> Imagine if, when a statement arrived, mysqld made a char-by-char copy,
> Tim> but in that copy skipped out the literal values and kept a seperate
> Tim> list of those. That would be a very fast and simple piece of code.
> 
> Tim> That 'abstracted' statement could then be used as the key to the
> Tim> statement cache. If it matched an entry in the cache then mysql
> Tim> could skip the generation of the query execution plan!
> 
> Tim> (To simplify access rights issues you could also add the username to
> Tim> the abstracted statement.)
> 
> Tim> What do you think Monty?
> 
> My estimate (which could be wrong) is that we would get a 10 % speedup
> from the current setup. The major slow point here (I think) is sending the
> queries between the client and server.

Umm, don't most simple queries fit in one TCP/IP ethernet MTU packet?

(And I trust that your new placeholder/parameters API will not talk to
the server when setting each but batch them up and combind it with the
excute request.)

> What we also need to do is to benchmark the different sections in the
> MySQL code to be able to know how much a gain we can do by adding some
> kind of cached plans.  I just hope that having to store and retrieve
> query plans will not cause almost as much overhead than the gain we
> would get from this.

You'd only need a simple main-memory hash with a limited size. I'd
avoid being tempted to use a more heavyweight cache such as your hash
table handlers.

> Another option would be to allow the user to specify the query plan
> MySQL should use. (In other words, more options like STRAIGHT_SELECT
> and use_index(...)).  Theoretically this could give almost as good
> performance than cached query plans.

But only when humans get it right. Using cached query plans lets the
server generate a good, if slightly conservative, plan each time the
plan is cached (after restarts, table alters etc)

Oracle lets you use extensive hints, but it's easy to 'over hint' and
embed too much 'physical' info into an otherwise purely logical select
statement (creating problems with ongoing db maintenance etc).

The best thing to do is support both, like Oracle does, and suggest the
use of hints primarily as a way of fine tuning the slightly
conservative execution plan that results from using placeholders
(eg by specifying that "key2=?" will actually match only 1 row).

Tim.

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to