Re: Explanation of multiple-column indexes

2003-08-27 Thread Jesse Sheidlower
On Tue, Aug 26, 2003 at 09:26:55PM -0500, Matt W wrote:
> Hi Jesse,
> 
> - Original Message -
> From: "Jesse Sheidlower"
>
> > What I'm trying to understand is how you would set up these
> > indexes when you'd always be doing joins with another table.
> > Suppose you have The Canonical CD Database, and you have a
> > table "songs" with fields "song_id", "album_id", "song_title",
> > and "song_length". Suppose you're often doing searches of
> > song_title or (for some reason) song_length, and that any time
> > you'd do such a search, you'd _always_ be joining it to the
> > "album" table.
> >
> > It would seem that you'd want at least two multiple-indexes in
> > the "song" table, one of them including "song_title" and
> > "album_id", the other including "song_length" and "album_id".
> > Is this correct? Do you need "song_id" (which would be a
> > primary key on that table) in there too? What order should
> > the indexes be in?
> 
> You wouldn't necessarily want indexes on (song_title, album_id) -- in
> that order -- and/or (song_length, album_id). This reason for this is
> because if any other columns from the song table are involved in the
> query (in the select list or in the WHERE), MySQL will have to hit the
> data file for those columns anyway, and album_id as the second column in
> the index won't be used -- just the first column -- song_title or
> song_length -- if you're searching on them. However, if only the 2
> columns in the index are involved in the query (searching on title or
> length and join with album_id), then having album_id in the index would
> be benficial because no seek to the data file is needed. This can be
> verified by seeing if EXPLAIN says "Using index" for the song table.

[...]

> > If every search for song_title or song_length must be joined
> > against the album table, it's not clear which should be the
> > first named column in this index. The experiments I've done
> > so far have been inconclusive, and I don't think I'm understanding
> > the process in the first place.
> 
> The indexes would be:
> 
> (song_title, album_id)
> (song_length, album_id)
> 
> If the order was reversed (e.g. album_id was first in the index), the
> index couldn't be used for searching.
> 
> And like I said above, if other columns besides the 2 in the index are
> involved in the query, album_id isn't used anyway. In that case, just
> index title and length separately for searching:
> 
> (song_title)
> (song_length)
> 
> Of course, if you included ALL columns that will be used in queries in
> each index (with title or length as the first column in each), then it
> wouldn't have to go to the data file and EXPLAIN would say "Using
> index." But this doesn't usually give much speed improvement and is a
> waste of space. Just letting you know. :-)

Well, if speed rather than space is my main concern, _and_ I can't
predict what the searches will be--i.e. it's quite possible that 
some searches will be only song_title, some will be only song_length,
and some will be both, and some will involve other combinations of
fields not shown in this made-up example--then what? Do I have
several multiple-column indexes, each with (song_title, album_id)
or whatever for each field, along with single-column indexes for
everything (even though every query would be joined on album_id)?
Or do I need to have three- or more-column indexes for all the
potential groups that might be searched? (song_title, song_length,
album_id), (song_title, song_something_else, album_id), etc.?

Thanks very much.

Best,

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Explanation of multiple-column indexes

2003-08-27 Thread Matt W
Hi Jesse,

- Original Message -
From: "Jesse Sheidlower"
Sent: Monday, August 25, 2003 8:26 AM
Subject: Explanation of multiple-column indexes

> After some discussion in a separate thread, I've been trying
> to get a better understanding of the workings of multiple-column
> indexes, and think I'm still missing the point. I understand
> indexing (last_name, first_name) in that order if you'd always
> be searching last names and only be searching first names in
> combination.

Correct.


> What I'm trying to understand is how you would set up these
> indexes when you'd always be doing joins with another table.
> Suppose you have The Canonical CD Database, and you have a
> table "songs" with fields "song_id", "album_id", "song_title",
> and "song_length". Suppose you're often doing searches of
> song_title or (for some reason) song_length, and that any time
> you'd do such a search, you'd _always_ be joining it to the
> "album" table.
>
> It would seem that you'd want at least two multiple-indexes in
> the "song" table, one of them including "song_title" and
> "album_id", the other including "song_length" and "album_id".
> Is this correct? Do you need "song_id" (which would be a
> primary key on that table) in there too? What order should
> the indexes be in?

You wouldn't necessarily want indexes on (song_title, album_id) -- in
that order -- and/or (song_length, album_id). This reason for this is
because if any other columns from the song table are involved in the
query (in the select list or in the WHERE), MySQL will have to hit the
data file for those columns anyway, and album_id as the second column in
the index won't be used -- just the first column -- song_title or
song_length -- if you're searching on them. However, if only the 2
columns in the index are involved in the query (searching on title or
length and join with album_id), then having album_id in the index would
be benficial because no seek to the data file is needed. This can be
verified by seeing if EXPLAIN says "Using index" for the song table.

Note: One of the exceptions where it wouldn't say "Using index" and
would have to consult the data file is if song_title, for example, has
only a prefix index on the first n characters (e.g. created with KEY
(song_title(10), album_id), instead of KEY (song_title, album_id), which
indexes the FULL column).

And no, including song_id in one of these indexes wouldn't be of any
use.


> If every search for song_title or song_length must be joined
> against the album table, it's not clear which should be the
> first named column in this index. The experiments I've done
> so far have been inconclusive, and I don't think I'm understanding
> the process in the first place.

The indexes would be:

(song_title, album_id)
(song_length, album_id)

If the order was reversed (e.g. album_id was first in the index), the
index couldn't be used for searching.

And like I said above, if other columns besides the 2 in the index are
involved in the query, album_id isn't used anyway. In that case, just
index title and length separately for searching:

(song_title)
(song_length)

Of course, if you included ALL columns that will be used in queries in
each index (with title or length as the first column in each), then it
wouldn't have to go to the data file and EXPLAIN would say "Using
index." But this doesn't usually give much speed improvement and is a
waste of space. Just letting you know. :-)



> Thanks very much.
>
> Jesse Sheidlower
>

You're welcome. Hope that helped. If you have any more questions, ask
away. :-)


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Explanation of multiple-column indexes

2003-08-25 Thread Jesse Sheidlower

After some discussion in a separate thread, I've been trying
to get a better understanding of the workings of multiple-column
indexes, and think I'm still missing the point. I understand
indexing (last_name, first_name) in that order if you'd always
be searching last names and only be searching first names in
combination.

What I'm trying to understand is how you would set up these
indexes when you'd always be doing joins with another table.
Suppose you have The Canonical CD Database, and you have a
table "songs" with fields "song_id", "album_id", "song_title",
and "song_length". Suppose you're often doing searches of
song_title or (for some reason) song_length, and that any time
you'd do such a search, you'd _always_ be joining it to the 
"album" table.

It would seem that you'd want at least two multiple-indexes in
the "song" table, one of them including "song_title" and
"album_id", the other including "song_length" and "album_id".
Is this correct? Do you need "song_id" (which would be a
primary key on that table) in there too? What order should
the indexes be in?

If every search for song_title or song_length must be joined
against the album table, it's not clear which should be the
first named column in this index. The experiments I've done
so far have been inconclusive, and I don't think I'm understanding
the process in the first place.

Thanks very much.

Jesse Sheidlower


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]