Re: why does "select * from table order by indexed_field" not use key?

2012-07-11 Thread Stephen Tu
On Wed, Jul 11, 2012 at 5:46 AM, Reindl Harald wrote:

> > mysql> show profiles;
> > +----------+------------+-------------------------------------------------------------------------------------------+
> > | Query_ID | Duration   | Query                                                                                     |
> > +----------+------------+-------------------------------------------------------------------------------------------+
> > |        1 | 0.32261700 | SELECT SQL_NO_CACHE * FROM cms1_quickbar_groups force key (qbq_key) ORDER BY qg_sort ASC |
> > |        2 | 0.24592100 | SELECT SQL_NO_CACHE * FROM cms1_quickbar_groups ORDER BY qg_sort ASC                     |
> > +----------+------------+-------------------------------------------------------------------------------------------+
> > 2 rows in set (0.00 sec)
>
> this may be true for small data, where it does not matter at all,
> but if this were a large table it would cause a lot of I/O
>
>
While I agree with you that Ewen's microbenchmark is not the most
convincing, I do agree with his reasoning, and I encourage you to try
benchmarking both options on your dataset. Remember to flush both the OS
disk cache and the mysql buffer pool between runs, so that your benchmarks
actually reflect cold runs instead of partially warmed-up ones.
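Something like the following is a rough recipe on a dedicated Linux test
box (step 2 is an assumption about your setup, and step 3 is a root shell
command, not SQL):

FLUSH TABLES;
-- 1. the FLUSH TABLES above closes table handles and table-level caches
-- 2. restart mysqld to empty the InnoDB buffer pool (there is no
--    "flush the buffer pool" statement, so a bounce is the blunt tool)
-- 3. from a root shell, drop the OS page cache:
--      sync && echo 3 > /proc/sys/vm/drop_caches
-- 4. re-run the query under SHOW PROFILES, as above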

So why do I believe no index is faster for your particular query? A
secondary index (qbq_key in your case) is usually a set of key/value pairs
of the form [index key, page ID pointing to the tuple]. So if we answer
your query with qbq_key, we don't need to do a sort, *but* we need to do
roughly one disk seek per entry in the index (I'm assuming here that
qg_sort values don't have any strong correlation with qg_id, and that the
pages for the table aren't already in the buffer pool/OS disk cache).
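To put rough, illustrative numbers on it (assumptions, not measurements):
at ~10 ms per random seek, 100,000 index entries cost on the order of
100,000 * 10 ms = 1,000 s of random I/O, while sequentially scanning a
100 MB table at ~100 MB/s takes about 1 s per pass. Random access has to
be very selective before it wins.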

Compare that with the plan that doesn't use the index, which must do a
filesort instead. mysql has an optimization (see the "modified filesort
algorithm" section of
https://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html) that
stores the entire tuple in the sort buffer, rather than just [sort key,
page ID], whenever the row fits under a threshold controlled by
max_length_for_sort_data (1024 bytes by default, which your schema seems
to fall under). With whole tuples in the sort, the filesort avoids most of
the random disk seeks, since mergesort itself is a very sequential-I/O
friendly algorithm. So this plan ends up doing a few sequential passes
over the entire table, whereas the index plan reads the table only once,
albeit randomly. Most likely the mysql optimizer has calculated that a few
sequential scans over the table are much faster than a bunch of random
disk seeks that read the table one page at a time. In fact, I believe the
general rule of thumb for DB optimizers is that if you need to read more
than about 10% of a table through an index, you are better off using a
sequential scan.
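You can also ask the optimizer directly. A sketch using the names from
your thread (exact output depends on your version and data distribution):

EXPLAIN SELECT * FROM cms1_quickbar_groups
    ORDER BY qg_sort ASC;
-- expect type=ALL and Extra=Using filesort (sequential scan + sort)
EXPLAIN SELECT * FROM cms1_quickbar_groups
    FORCE KEY (qbq_key) ORDER BY qg_sort ASC;
-- expect key=qbq_key and no filesort, at the cost of random page reads
SHOW VARIABLES LIKE 'max_length_for_sort_data';
-- 1024 by default; rows under this size are sorted whole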

Like I said, I encourage you to measure the performance to convince
yourself that mysql is actually doing the right thing.


Re: Commit commands with SELECT

2012-04-13 Thread Stephen Tu
Wrapping even just SELECTs in a transaction absolutely matters if you care
about isolation.

Consider the following two clients running on the same mysql instance,
with --transaction-isolation=serializable. Suppose we have the following
innodb table:
CREATE TABLE foo (i INTEGER, j INTEGER) ENGINE=InnoDB;

Client 1:
SELECT * FROM foo WHERE i = 0;
SELECT * FROM foo WHERE i = 0;

Client 2:
UPDATE foo SET j = 1 WHERE i = 0;

Suppose the table starts out with a single tuple (0, 0). Now, if client 1
and client 2 are running at the same time, wrapping client 1's select
statements with a BEGIN/COMMIT removes the possibility of the following
interleaving:

C1: SELECT * FROM foo WHERE i = 0;
C2: UPDATE foo SET j = 1 WHERE i = 0;
C1: SELECT * FROM foo WHERE i = 0;
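
For concreteness, here is roughly what the wrapped version of client 1
looks like (a minimal sketch, reusing the table from above):

BEGIN;
SELECT * FROM foo WHERE i = 0;
-- under serializable isolation, client 2's UPDATE now blocks until we
-- COMMIT, so it cannot slip in between these two reads
SELECT * FROM foo WHERE i = 0;
COMMIT;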

Without the BEGIN/COMMIT, the interleaved schedule above is completely
valid, since each statement runs as its own single-statement transaction.
Now, to answer your questions.

> On 09.04.2012, at 11:38, Rozeboom, Kay [DAS] wrote:
>
>> We have an application with blocks of code that begin with setting
>> autocommit off, and end with a commit.  The code in between does only
>> selects, no updating.
>>
>> 1)  Am I correct in thinking that the autocommit and commit
>> statements don't really accomplish anything useful?

No, you need to reason about whether or not the select statements need to
run in isolation for correctness (as in the example above).


>> 2)  If the autocommit and commit statements are unneeded, do they
>> add enough additional overhead that I should be concerned about them?

I don't think you incur any extra overhead from the explicit transaction
statements themselves. For instance, a single select statement is (I
believe) really equivalent to

BEGIN; SELECT ...; COMMIT;

However, you do incur overhead in the sense that the longer your
transaction block is, the more time you spend holding (read) locks, and
thus excluding writers.
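
To make the locking cost concrete, here is a sketch of the blocking under
serializable isolation (C1/C2 mark which client issues each statement;
the schedule is illustrative):

C1> BEGIN;
C1> SELECT * FROM foo WHERE i = 0;    -- acquires shared (read) locks
C2> UPDATE foo SET j = 1 WHERE i = 0; -- blocks on C1's read locks
C1> ... the longer C1's transaction runs, the longer C2 waits ...
C1> COMMIT;                           -- C2's UPDATE can now proceed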

Hope that helps,
--
Stephen Tu