Re: Slow results with simple, well-indexed query

2003-08-22 Thread Keith C. Ivey
On 21 Aug 2003 at 21:38, Jesse Sheidlower wrote:

 Huh, I was told the exact opposite, that if most of the
 entries are smaller than the maximum length of the field, you
 should use an index about the size you expect most entries to
 be. Why would you ever use a shorter index than the full column
 length if it led to such performance degradation?

It depends on the query (as well as your data).  In your case, the 
only column you're selecting is cw, so if all of cw is in the index 
MySQL can use the index alone and never has to look at the data file. 
That speeds things up quite a bit.  If you were selecting multiple 
columns the difference might not be so great.

Another point is that you're sorting by cw, and a prefix-based index 
won't allow you to sort completely.  If your queries were mainly 
selecting by cw rather than sorting by it, a prefix-based index 
should be fine.

-- 
Keith C. Ivey [EMAIL PROTECTED]
Tobacco Documents Online
http://tobaccodocuments.org


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-22 Thread Steven Roussey
 Here's the CREATEs, somewhat edited to remove parts not relevant
 to this discussion, to save space:

I never actually looked at your JOIN statement more than a quick
glimpse, but I will (though not just right now). Before I do, can you
try this (I still don't have data or I'd play with it myself:

 mysql EXPLAIN SELECT cg.cw FROM cg USE INDEX(q_id_2),q,cit,sref
 - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
sref.id
 - AND cg.cw LIKE 't%'
 - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 - ORDER BY cg.cw
 - LIMIT 1000,10;

I wanted it to use that new index, and it chose not to, so we can show
it the way. (You can also try FORCE INDEX instead of USE INDEX). 

In the meantime, I'll actually read over your table descriptions and the
JOIN.

--steve-


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-22 Thread Jesse Sheidlower
On Fri, Aug 22, 2003 at 10:23:37AM -0400, Keith C. Ivey wrote:
 On 21 Aug 2003 at 21:38, Jesse Sheidlower wrote:
 
  Huh, I was told the exact opposite, that if most of the
  entries are smaller than the maximum length of the field, you
  should use an index about the size you expect most entries to
  be. Why would you ever use a shorter index than the full column
  length if it led to such performance degradation?
 
 It depends on the query (as well as your data).  In your case, the 
 only column you're selecting is cw, so if all of cw is in the index 
 MySQL can use the index alone and never has to look at the data file. 
 That speeds things up quite a bit.  If you were selecting multiple 
 columns the difference might not be so great.
 
 Another point is that you're sorting by cw, and a prefix-based index 
 won't allow you to sort completely.  If your queries were mainly 
 selecting by cw rather than sorting by it, a prefix-based index 
 should be fine.

Hmm. This is a single case; as mentioned in other queries one
might be selecting by other values, and I'd usually be
selecting multiple column. The sort, however, is always done
either on cg.cw, or on cit.d (which I may not have shown), a
date field.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-22 Thread Jesse Sheidlower
On Fri, Aug 22, 2003 at 07:33:56AM -0700, Steven Roussey wrote:
  Here's the CREATEs, somewhat edited to remove parts not relevant
  to this discussion, to save space:
 
 I never actually looked at your JOIN statement more than a quick
 glimpse, but I will (though not just right now). Before I do, can you
 try this (I still don't have data or I'd play with it myself:
 
  mysql EXPLAIN SELECT cg.cw FROM cg USE INDEX(q_id_2),q,cit,sref
  - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
 sref.id
  - AND cg.cw LIKE 't%'
  - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
  - ORDER BY cg.cw
  - LIMIT 1000,10;

The actual SELECT wasn't really any faster; the first run was
4.05 sec (OK, a lot faster than the 1 m 15 sec it had been
taking) but the second run was 3.66 sec, compared to the 3.5 sec
or so it had been taking.

Here's the EXPLAIN:


mysql EXPLAIN SELECT cg.cw FROM cg USE INDEX(q_id_2),q,cit,sref
-  WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw LIKE 't%'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- ORDER BY cg.cw
- LIMIT 1000,10\G
*** 1. row ***
table: sref
 type: range
possible_keys: PRIMARY,cd
  key: cd
  key_len: 4
  ref: NULL
 rows: 3102
Extra: Using where; Using temporary; Using filesort
*** 2. row ***
table: cit
 type: ref
possible_keys: PRIMARY,sref_id
  key: sref_id
  key_len: 4
  ref: sref.id
 rows: 3
Extra:
** 3. row ***
table: q
 type: ref
possible_keys: PRIMARY,cit_id
  key: cit_id
  key_len: 4
  ref: cit.id
 rows: 31
Extra:
*** 4. row ***
table: cg
 type: ref
possible_keys: q_id_2
  key: q_id_2
  key_len: 4
  ref: q.id
 rows: 1
Extra: Using where; Using index
4 rows in set (0.00 sec)


Best,

Jesse

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-22 Thread Steven Roussey
 All the indexes were single indexes, partly because I haven't
 yet made the effort to understand composite index. I guess it's
 time ;-).

Oh.

There are better places to start than this list. ;) The manual can be a
great starting place, and several people on this list have written books
about MySQL which are great for getting started. You can look at
Amazon.com, etc.

 mysql EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
 - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
sref.id
 - AND cg.cw LIKE 't%'
 - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 - ORDER BY cg.cw
 - LIMIT 1000,10;

You do a range on the fist and last table in the chain. :( 

It is best to normalize your table structure. I should have noticed that
upfront. The 'simple, well-indexed query' through me off. Normalizing is
a great thing to learn, and probably the first thing to understand after
how to do a SELECT and composite indexes.

 What does this mean for regular searching? In most cases, there will
be
 some criteria entered that need to be searched on, and the id fields
 will also be needed for the joins. For example, in the database, one
 might want to search based on cg.exp (fulltext), sref.rdr, sref.cd
 (the date field), sref.kbd, cit.w, and various other ones I've edited
 out of this display to save space, and often a combination of several
 of these at once. How should I set up indexes for the potential
 searches that might be executed?
 (I should mention that this is a read-only database; it's built from
 a parsed SGML file and is never added to directly, if that's an
 issue.)

Well, I would rewrite the table design. :) It looks like it is taking
its structure from the SGML format. Personally, I'd use that as an
intermediate format in order to populate another set of tables that
would be in normal form. Again, a book will help here.

Sorry I can't be of more help. 

At least we went from 1 m 15 sec to 4.05 sec.

--steve-



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-22 Thread Steven Roussey
Hmmm, just in case you can't change the table layout...

Run this through MySQL. First I get rid of the other index I made, then
add chained indexes so there is no need for data file lookup. Also, one
direction of the query table join chain was not always using the indexes
for the where. One direction is preferable (not knowing what the data
is) since then we can use the index for the sort, but the other
direction may have benefits that outweigh that, so that is why we used
the composite index I last suggested. 

Anyhow, just to be clear, lets force all composite indexes for this
query in both direction and force the optimizer to use both directions
and see what we get (and sorry if I misspell or something, since I don't
have your DB to check against). Run this and send back the results:


# Get rid of the index I added before
ALTER TABLE cg DROP INDEX q_id_2;


# Get a baseline for direction 1
SELECT STRAIGHT_JOIN cg.cw FROM cg,q,cit,sref
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;

EXPLAIN SELECT STRAIGHT_JOIN cg.cw FROM cg,q,cit,sref
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;


# Get a baseline for direction 2
SELECT STRAIGHT_JOIN cg.cw FROM sref,cit,q,cg
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;

EXPLAIN SELECT STRAIGHT_JOIN cg.cw FROM sref,cit,q,cg
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;


# add indexes
ALTER TABLE cg 
ADD INDEX dir1(cw,q_id), 
ADD INDEX dir2(q_id,cw);

ALTER TABLE q 
ADD INDEX dir1(id,cit_id), 
ADD INDEX dir2(cit_id,id);

ALTER TABLE cit 
ADD INDEX dir1(id,sref_id), 
ADD INDEX dir2(sref_id,id);

ALTER TABLE sref 
ADD INDEX dir1(id,cd), 
ADD INDEX dir2(cd,id);


# Get a new result for direction 1
SELECT STRAIGHT_JOIN cg.cw FROM 
cg USE INDEX(dir1),
q USE INDEX(dir1),
cit USE INDEX(dir1),
sref USE INDEX(dir1)
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;

EXPLAIN SELECT STRAIGHT_JOIN cg.cw FROM 
cg USE INDEX(dir1),
q USE INDEX(dir1),
cit USE INDEX(dir1),
sref USE INDEX(dir1)
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;


# Get a new result for direction 2
SELECT STRAIGHT_JOIN cg.cw FROM 
sref USE INDEX(dir2),
cit USE INDEX(dir2),
q USE INDEX(dir2),
cg USE INDEX(dir2)
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;

EXPLAIN SELECT STRAIGHT_JOIN cg.cw FROM 
sref USE INDEX(dir2),
cit USE INDEX(dir2),
q USE INDEX(dir2),
cg USE INDEX(dir2)
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 't%'
AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw
LIMIT 1000,10;

# Send results back via email!!



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-22 Thread Jesse Sheidlower
On Fri, Aug 22, 2003 at 09:03:55AM -0700, Steven Roussey wrote:
  All the indexes were single indexes, partly because I haven't
  yet made the effort to understand composite index. I guess it's
  time ;-).
 
 Oh.
 
 There are better places to start than this list. ;) The manual can be a
 great starting place, and several people on this list have written books
 about MySQL which are great for getting started. You can look at
 Amazon.com, etc.

No, I have them all, I just...I guess haven't gotten around to
them yet. I mistakenly figured that individually indexing
everything in sight would do the trick.

  mysql EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
  - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
 sref.id
  - AND cg.cw LIKE 't%'
  - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
  - ORDER BY cg.cw
  - LIMIT 1000,10;
 
 You do a range on the fist and last table in the chain. :( 
 
 It is best to normalize your table structure. I should have noticed that
 upfront. The 'simple, well-indexed query' through me off. Normalizing is
 a great thing to learn, and probably the first thing to understand after
 how to do a SELECT and composite indexes.

I wonder if anyone else is still reading :-/.

While I don't claim to be an expert by any means, and while it may not
look like it, this database _is_ highly normalized, or at least I think
it is; I put a lot of effort into making it so. Let me just explain what
it all does.

The database holds lexical data for use in linguistic analysis. For the
tables we're looking at now, it works like this:

srefcontains the bibliographic details for a single physical book
or other text, along with some metadata about when the text
was keyed, who read it, etc.

cit contains the bibliographic details for a quotation taken from
a text; there might be only one cit per sref if these details
don't change (e.g. a novel), or there might be many cits per
sref if they do (e.g. a collection of essays, each one written
by a different author at a different date).

q   contains a single quotation.

cg  contains information about specific words.

I believe this is normalized because each element appears only once and
is linked to others as necessary, so that, for example, a single quotation
reading This is a foobarred, conformiferous structiform is stored once
in the database, and foobarred, conformiferous, and structiform
are each stored independently, joined to the q table on the relevant
id fields. So if you had the word structiform and wanted to see the
quotation, you could retrive it by the q_id that's stored in cg. It's
the same for the other elements; there's only one sref section per 
text, so if you're at any other point and want to get the date the
text was read, you have to join your way up to sref to retrieve 
sref.cd.

Though I haven't shown them yet, there are also some other tables,
holding authors at the sref level, subjects at the sref level, and
authors at the cit level; for any of these, there can be any number
of elements (i.e. no subjects, one subject, ten subjects etc.), so
I normalized them by putting them in to separate tables linked
to sref.id or cit.id etc. as appropriate.

So for the queries we've been discussing, I'm looking for all the
words beginning with t that have been keyed in the last six months;
the reason I'm getting from the first and last table in the chain 
should now be obvious. In reality, I wouldn't just be retrieving
the word (the cg.cw), I'd be retrieving both the quote and bibliographic
info in cit, and the (unshown) author related to cit. Other queries
can involve any combination of these factors--searching based on the
fulltext content of a q, on an author, on a work title (perhaps not
shown), on subjects, and so on and so on. I have created (individual)
indexes on all the id fields, and on any value used in a search. In
almost, if not every case, a search will involve a particular value
used for the search as well as the id fields of relevant tables; what
that means for multiple queries, I'm now not sure.

I'll stop there and try to answer the suggestions in your other
message. Thanks again for all the time you've been spending.

Best,

Jesse

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-22 Thread Steven Roussey
After looking over your results, I would keep the dir1 index at least on
the first and last table.

But since this data is read only, why not reformulate the data for the
queries you are going to make? This is the opposite of normalizing, and
will require more disk space, and is not flexible, but it will be fast.
Of course, it depends on what you are doing and how many types of
queries you have. This 'normalize by queries' or what I refer to as
'selective denormalization' likely won't appear in any books. 

It is a technique that I was taught from DBA's that have been doing this
sort of stuff for Fortune100 companies for decades. I use it only
rarely, and only where the data is needed in realtime where the
structure of the data and its quantity would not normally return results
so quickly. It would likely work in your case also.

All the best,

--steve-

PS: Get a big fast SCSI RAID 10 array of 15K drives (stripe the
mirrors). Actually, since this data is readonly and copies are stored
elsewhere, you could get by with RAID 1. I'm guessing you are being held
back by the disk, or your memory buffers.

Oh, and just for fun:

# order the files
ALTER TABLE cg ORDER BY dir1;

ALTER TABLE q ORDER BY dir1;

ALTER TABLE cit ORDER BY dir1;

ALTER TABLE sref ORDER BY dir1;

# Get a new result for direction 1
SELECT STRAIGHT_JOIN cg.cw FROM 
cg USE INDEX(dir1),
q USE INDEX(dir1),
cit USE INDEX(dir1),
sref USE INDEX(dir1)
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND
cg.cw LIKE 't%' AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw LIMIT 1000,10;

# order the files
ALTER TABLE cg ORDER BY dir2;

ALTER TABLE q ORDER BY dir2;

ALTER TABLE cit ORDER BY dir2;

ALTER TABLE sref ORDER BY dir2;


# Get a new result for direction 2
SELECT STRAIGHT_JOIN cg.cw FROM 
sref USE INDEX(dir2),
cit USE INDEX(dir2),
q USE INDEX(dir2),
cg USE INDEX(dir2)
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND
cg.cw LIKE 't%' AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
ORDER BY cg.cw LIMIT 1000,10;



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-22 Thread Jesse Sheidlower
On Fri, Aug 22, 2003 at 12:42:27PM -0700, Steven Roussey wrote:
 
 But since this data is read only, why not reformulate the data for the
 queries you are going to make? This is the opposite of normalizing, and
 will require more disk space, and is not flexible, but it will be fast.
 Of course, it depends on what you are doing and how many types of
 queries you have. This 'normalize by queries' or what I refer to as
 'selective denormalization' likely won't appear in any books. 

Hmm, I guess I expected it to be faster normalized in any case,
and that I shouldn't waste all that effort I put in to 
normalizing it properly. I thought I was learning something :-|.
I'll see if I can experiment and get a sense of the difference
it will make.

 PS: Get a big fast SCSI RAID 10 array of 15K drives (stripe the
 mirrors). Actually, since this data is readonly and copies are stored
 elsewhere, you could get by with RAID 1. I'm guessing you are being held
 back by the disk, or your memory buffers.

In fact, it's already running on a RAID 1 array of 15K SCSI drives.
I wanted to set it up to work well with this data set.

If you want fun, I could show you the numbers I get running the
queries on my development laptop.

 Oh, and just for fun:
 
 # order the files
 ALTER TABLE cg ORDER BY dir1;

[etc.]

I had to substitute the values of dir1 in these queries, but then:

 # Get a new result for direction 1
 SELECT STRAIGHT_JOIN cg.cw FROM 
 cg USE INDEX(dir1),
 q USE INDEX(dir1),
 cit USE INDEX(dir1),
 sref USE INDEX(dir1)
 WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND
 cg.cw LIKE 't%' AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 ORDER BY cg.cw LIMIT 1000,10;

This took 8.18 sec first, 2.29 sec immediately thereafter.

 # order the files
 ALTER TABLE cg ORDER BY dir2;

[etc.]

 # Get a new result for direction 2
 SELECT STRAIGHT_JOIN cg.cw FROM 
 sref USE INDEX(dir2),
 cit USE INDEX(dir2),
 q USE INDEX(dir2),
 cg USE INDEX(dir2)
 WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND
 cg.cw LIKE 't%' AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 ORDER BY cg.cw LIMIT 1000,10;

This took 3.05 sec first, and 2.96 sec immediately thereafter.

Best,

Jesse

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Jesse Sheidlower
On Thu, Aug 21, 2003 at 11:34:00AM -0400, Jesse Sheidlower wrote:
 
 I'm struggling with speed issues on some queries that
 I would have expected to be relatively fast. Perhaps
 even more frustratingly, when I've tried to break 
 these down into their components, they still execute
 very slowly. I've looked over all the relevant suggestions
 for optimization and so forth, and there's nothing I can
 tell that I'm missing.
 
 An example of a query is to get all the words (the cg.cw
 field) in a particular alphabetical range that have been
 added in some timespan (the sref.cd field). The cg table
 has about 3M rows, and the sref table about 70,000; the
 intervening tables are all indexed on the relevant id
 fields:

Sorry, I should have added that these queries are being
run on MySQL 4.0.10 on a lightly loaded PIII 1.4Ghz with
1GB RAM and fast SCSI drives.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-21 Thread Mechain Marc
What is the value of sort_buffer_size, may be you could increase the value for 
having faster ORDER BY (all in memory intead of using temporary file on disk).

Marc.

-Message d'origine-
De : Jesse Sheidlower [mailto:[EMAIL PROTECTED]
Envoyé : jeudi 21 août 2003 17:34
À : [EMAIL PROTECTED]
Objet : Slow results with simple, well-indexed query



I'm struggling with speed issues on some queries that
I would have expected to be relatively fast. Perhaps
even more frustratingly, when I've tried to break 
these down into their components, they still execute
very slowly. I've looked over all the relevant suggestions
for optimization and so forth, and there's nothing I can
tell that I'm missing.

An example of a query is to get all the words (the cg.cw
field) in a particular alphabetical range that have been
added in some timespan (the sref.cd field). The cg table
has about 3M rows, and the sref table about 70,000; the
intervening tables are all indexed on the relevant id
fields:

-
mysql SELECT cg.cw FROM cg,q,cit,sref
- WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw BETWEEN 't' AND 'tzzz'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- ORDER BY cg.cw
- LIMIT 1000,10;
+---+
| cw|
+---+
| teeny-pop |
| teeter|
| teetery   |
| teeth-grating |
| Teflon|
| teflon|
| teflon|
| teflon|
| teflubenzuron |
| Tejano|
+---+
10 rows in set (7.30 sec)
-

That's just too slow; yet an EXPLAIN doesn't make things easy for 
me to see what's wrong:

-
mysql EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
- WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw BETWEEN 't' AND 'tzzz'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- ORDER BY cg.cw
- LIMIT 1000,10\G
*** 1. row ***
table: cg
 type: range
possible_keys: q_id,cw
  key: cw
  key_len: 26
  ref: NULL
 rows: 170982
Extra: Using where; Using filesort
*** 2. row ***
table: q
 type: eq_ref
possible_keys: PRIMARY,cit_id
  key: PRIMARY
  key_len: 4
  ref: cg.q_id
 rows: 1
Extra: 
*** 3. row ***
table: cit
 type: eq_ref
possible_keys: PRIMARY,sref_id
  key: PRIMARY
  key_len: 4
  ref: q.cit_id
 rows: 1
Extra: 
*** 4. row ***
table: sref
 type: eq_ref
possible_keys: PRIMARY,cd
  key: PRIMARY
  key_len: 4
  ref: cit.sref_id
 rows: 1
Extra: Using where
4 rows in set (0.00 sec)
-

Executing just the search on the word table, with no joins to the
table with the dates, is still slow:

-
mysql SELECT cw
- FROM cg
- WHERE cw BETWEEN 's' AND 'szzz'
- ORDER BY cw
- LIMIT 3000,5; 
+-+
| cw  |
+-+
| sacrifice hit   |
| sacrifice play  |
| sacrifice the earth |
| sacrifice throw |
| sacrifice to|
+-+
5 rows in set (5.80 sec)
-

and has a similar EXPLAIN:

-
mysql EXPLAIN SELECT cw FROM cg WHERE cw BETWEEN 's' AND 'szzz' ORDER BY cw LIMIT 
3000,5\G
*** 1. row ***
table: cg
 type: range
possible_keys: cw
  key: cw
  key_len: 26
  ref: NULL
 rows: 318244
Extra: Using where; Using filesort
1 row in set (0.00 sec)
-

Of course cw is indexed. Is there anything I can to do improve queries of this
nature? There are more complicated queries from this database, but the big
slowdown always seems to be when one of the possibilities (e.g. all words 
in 'S') is large; the other limitations don't improve things.

Thanks.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Cybot
Jesse Sheidlower wrote:

I'm struggling with speed issues on some queries that
I would have expected to be relatively fast. Perhaps
even more frustratingly, when I've tried to break 
these down into their components, they still execute
very slowly. I've looked over all the relevant suggestions
for optimization and so forth, and there's nothing I can
tell that I'm missing.

An example of a query is to get all the words (the cg.cw
field) in a particular alphabetical range that have been
added in some timespan (the sref.cd field). The cg table
has about 3M rows, and the sref table about 70,000; the
intervening tables are all indexed on the relevant id
fields:
-
mysql SELECT cg.cw FROM cg,q,cit,sref
- WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw BETWEEN 't' AND 'tzzz'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
move your DATE before cw

AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
AND cg.cw BETWEEN 't' AND 'tzzz'
cause sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH) should be faster 
than cg.cw BETWEEN 't' AND 'tzzz' and so the total rows are already 
limited when cg.cw BETWEEN 't' AND 'tzzz' will be executed

also you can try an index with a length of 2 or 3 over cg.cw, this will 
result in smaller index and possible speed up things

or you add another field (cg.cw_short) with first 2 (or 3 or even 1) 
letter of cg.cw and add an index on this and add to your query:

AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
AND cg.cw_short BETWEEN 't' AND 'tz'
AND cg.cw BETWEEN 't' AND 'tzzz'
pls post here if this hepls out or not

- ORDER BY cg.cw
- LIMIT 1000,10;
+---+
| cw|
+---+
| teeny-pop |
| teeter|
| teetery   |
| teeth-grating |
| Teflon|
| teflon|
| teflon|
| teflon|
| teflubenzuron |
| Tejano|
+---+
10 rows in set (7.30 sec)
-
That's just too slow; yet an EXPLAIN doesn't make things easy for 
me to see what's wrong:

-
mysql EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
- WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw BETWEEN 't' AND 'tzzz'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- ORDER BY cg.cw
- LIMIT 1000,10\G
*** 1. row ***
table: cg
 type: range
possible_keys: q_id,cw
  key: cw
  key_len: 26
  ref: NULL
 rows: 170982
Extra: Using where; Using filesort
*** 2. row ***
table: q
 type: eq_ref
possible_keys: PRIMARY,cit_id
  key: PRIMARY
  key_len: 4
  ref: cg.q_id
 rows: 1
Extra: 
*** 3. row ***
table: cit
 type: eq_ref
possible_keys: PRIMARY,sref_id
  key: PRIMARY
  key_len: 4
  ref: q.cit_id
 rows: 1
Extra: 
*** 4. row ***
table: sref
 type: eq_ref
possible_keys: PRIMARY,cd
  key: PRIMARY
  key_len: 4
  ref: cit.sref_id
 rows: 1
Extra: Using where
4 rows in set (0.00 sec)
-

Executing just the search on the word table, with no joins to the
table with the dates, is still slow:
-
mysql SELECT cw
- FROM cg
- WHERE cw BETWEEN 's' AND 'szzz'
- ORDER BY cw
- LIMIT 3000,5; 
+-+
| cw  |
+-+
| sacrifice hit   |
| sacrifice play  |
| sacrifice the earth |
| sacrifice throw |
| sacrifice to|
+-+
5 rows in set (5.80 sec)
-

and has a similar EXPLAIN:

-
mysql EXPLAIN SELECT cw FROM cg WHERE cw BETWEEN 's' AND 'szzz' ORDER BY cw LIMIT 
3000,5\G
*** 1. row ***
table: cg
 type: range
possible_keys: cw
  key: cw
  key_len: 26
  ref: NULL
 rows: 318244
Extra: Using where; Using filesort
1 row in set (0.00 sec)
-
Of course cw is indexed. Is there anything I can to do improve queries of this
nature? There are more complicated queries from this database, but the big
slowdown always seems to be when one of the possibilities (e.g. all words 
in 'S') is large; the other limitations don't improve things.

Thanks.

Jesse Sheidlower



--
Sebastian Mendel
www.sebastianmendel.de
www.tekkno4u.de
www.nofetish.com
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Slow results with simple, well-indexed query

2003-08-21 Thread Jesse Sheidlower
On Thu, Aug 21, 2003 at 05:59:54PM +0200, Mechain Marc wrote:

 What is the value of sort_buffer_size, may be you could
 increase the value for having faster ORDER BY (all in memory
 intead of using temporary file on disk).

I had previously tried that--I sometimes have big GROUP BY
queries as well--so the sort_buffer_size is now 8M; though
I don't usually have many simultaneous users, I'm still 
nervous about making it much larger than that.

Best,

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Jesse Sheidlower
On Thu, Aug 21, 2003 at 06:01:31PM +0200, Cybot wrote:
 Jesse Sheidlower wrote:
 
 I'm struggling with speed issues on some queries that
 I would have expected to be relatively fast. Perhaps
 even more frustratingly, when I've tried to break 
 these down into their components, they still execute
 very slowly. I've looked over all the relevant suggestions
 for optimization and so forth, and there's nothing I can
 tell that I'm missing.
 
 An example of a query is to get all the words (the cg.cw
 field) in a particular alphabetical range that have been
 added in some timespan (the sref.cd field). The cg table
 has about 3M rows, and the sref table about 70,000; the
 intervening tables are all indexed on the relevant id
 fields:
 
 -
 mysql SELECT cg.cw FROM cg,q,cit,sref
 - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
 - AND cg.cw BETWEEN 't' AND 'tzzz'
 - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 
 move your DATE before cw
 
 AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 AND cg.cw BETWEEN 't' AND 'tzzz'
 
 cause sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH) should be faster 
 than cg.cw BETWEEN 't' AND 'tzzz' and so the total rows are already 
 limited when cg.cw BETWEEN 't' AND 'tzzz' will be executed

I assume that the optimizer would take care of this, but in any
case I gave it a try and it made no difference.

 also you can try an index with a length of 2 or 3 over cg.cw, this will 
 result in smaller index and possible speed up things

I also tried this (the current index is 25 characters on a 100-character
field), and if anything it made things slower.

Anyone have any other ideas or analysis?

Thanks very much.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Hans van Harten
Cybot wrote:
 Jesse Sheidlower wrote:
 An example of a query is to get all the words (the cg.cw
 field) in a particular alphabetical range that have been
 added in some timespan (the sref.cd field). The cg table
 has about 3M rows, and the sref table about 70,000; the
 intervening tables are all indexed on the relevant id
 fields:
 mysql SELECT cg.cw FROM cg,q,cit,sref
 - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
 sref.id
 - AND cg.cw BETWEEN 't' AND 'tzzz'
 - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 move your DATE before cw
 AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 AND cg.cw BETWEEN 't' AND 'tzzz'
I'ld try a changed table list after FROM:  sref, cit, q, cg.
Your query started by selecting a few out of 3M, while some out of 70k still
might be less. Just my rule of thumb: Far faster response keep intermediate
results small by joining in the largest table last.

I prefer to write out all joins ...
SELECT cg.cw FROM sref JOIN cit ON sref.id = cit.sref_id
 JOIN q ON cit.id = q.cit_id
 JOIN cg ON q.id = cg.q_id
WHERE sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
AND cg.cw BETWEEN 't' AND 'tzzz'
ORDER BY cg.cw
LIMIT 1000,10;
... don't think there is any gain in it on MySQL.

HansH


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-21 Thread Allen Weeks
The only thing I can add is check you hardware and OS platform.

Cheers

-Original Message-
From: Jesse Sheidlower [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 21, 2003 11:44 AM
To: Cybot
Cc: [EMAIL PROTECTED]
Subject: Re: Slow results with simple, well-indexed query



On Thu, Aug 21, 2003 at 06:01:31PM +0200, Cybot wrote:
 Jesse Sheidlower wrote:

 I'm struggling with speed issues on some queries that
 I would have expected to be relatively fast. Perhaps
 even more frustratingly, when I've tried to break
 these down into their components, they still execute
 very slowly. I've looked over all the relevant suggestions
 for optimization and so forth, and there's nothing I can
 tell that I'm missing.
 
 An example of a query is to get all the words (the cg.cw
 field) in a particular alphabetical range that have been
 added in some timespan (the sref.cd field). The cg table
 has about 3M rows, and the sref table about 70,000; the
 intervening tables are all indexed on the relevant id
 fields:
 
 -
 mysql SELECT cg.cw FROM cg,q,cit,sref
 - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
sref.id
 - AND cg.cw BETWEEN 't' AND 'tzzz'
 - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)

 move your DATE before cw

 AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 AND cg.cw BETWEEN 't' AND 'tzzz'

 cause sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH) should be faster
 than cg.cw BETWEEN 't' AND 'tzzz' and so the total rows are already
 limited when cg.cw BETWEEN 't' AND 'tzzz' will be executed

I assume that the optimizer would take care of this, but in any
case I gave it a try and it made no difference.

 also you can try an index with a length of 2 or 3 over cg.cw, this will
 result in smaller index and possible speed up things

I also tried this (the current index is 25 characters on a 100-character
field), and if anything it made things slower.

Anyone have any other ideas or analysis?

Thanks very much.

Jesse Sheidlower

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-21 Thread John Griffin
Can you post your DDL to go along with your DML?

-Original Message-
From: Allen Weeks [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 21, 2003 4:51 PM
To: Jesse Sheidlower; Cybot
Cc: [EMAIL PROTECTED]
Subject: RE: Slow results with simple, well-indexed query


The only thing I can add is check you hardware and OS platform.

Cheers

-Original Message-
From: Jesse Sheidlower [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 21, 2003 11:44 AM
To: Cybot
Cc: [EMAIL PROTECTED]
Subject: Re: Slow results with simple, well-indexed query



On Thu, Aug 21, 2003 at 06:01:31PM +0200, Cybot wrote:
 Jesse Sheidlower wrote:

 I'm struggling with speed issues on some queries that
 I would have expected to be relatively fast. Perhaps
 even more frustratingly, when I've tried to break
 these down into their components, they still execute
 very slowly. I've looked over all the relevant suggestions
 for optimization and so forth, and there's nothing I can
 tell that I'm missing.
 
 An example of a query is to get all the words (the cg.cw
 field) in a particular alphabetical range that have been
 added in some timespan (the sref.cd field). The cg table
 has about 3M rows, and the sref table about 70,000; the
 intervening tables are all indexed on the relevant id
 fields:
 
 -
 mysql SELECT cg.cw FROM cg,q,cit,sref
 - WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id =
sref.id
 - AND cg.cw BETWEEN 't' AND 'tzzz'
 - AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)

 move your DATE before cw

 AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
 AND cg.cw BETWEEN 't' AND 'tzzz'

 cause sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH) should be faster
 than cg.cw BETWEEN 't' AND 'tzzz' and so the total rows are already
 limited when cg.cw BETWEEN 't' AND 'tzzz' will be executed

I assume that the optimizer would take care of this, but in any
case I gave it a try and it made no difference.

 also you can try an index with a length of 2 or 3 over cg.cw, this will
 result in smaller index and possible speed up things

I also tried this (the current index is 25 characters on a 100-character
field), and if anything it made things slower.

Anyone have any other ideas or analysis?

Thanks very much.

Jesse Sheidlower

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Jesse Sheidlower
On Thu, Aug 21, 2003 at 03:36:54PM -0700, Steven Roussey wrote:
  Executing just the search on the word table, with no joins to the
  table with the dates, is still slow:
 
 Then it is not worth while to focus on anything else until you fix that.
 Are the contents of this field always in lower case?
 
 Is so, then change the column to a binary type. The explain says:
 
   rows: 318244
  Extra: Using where; Using filesort
 
 That means that is sorting all 318,244 (est) records first, then going
 down to the 3000th and giving you five records. Just a guess. See if
 that helps then we can move on to the join.

No, the contents can be of mixed case. Where does that leave things?

In a working environment I'd never be querying on this table alone,
it would always be joined in to other tables that would limit things
in some way, but these don't seem to be affecting things. The suggestions
other people have made to try to get it to do the smaller queries first
don't seem to be having much effect, unfortunately.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-21 Thread Steven Roussey
 No, the contents can be of mixed case. Where does that leave things?

**Index the length of the entire column.** It then should not need to
have to do the filesort. Actually the binary option would not have
really helped. The explain should say 'Using Index'. Get back to me on
this and tell me the results.

 In a working environment I'd never be querying on this table alone,
 it would always be joined in to other tables that would limit things
 in some way, but these don't seem to be affecting things. The
suggestions
 other people have made to try to get it to do the smaller queries
first
 don't seem to be having much effect, unfortunately.

Optimize the join once you know how to optimize its parts. One thing at
a time.

--steve-


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Jesse Sheidlower
On Thu, Aug 21, 2003 at 04:58:47PM -0700, Steven Roussey wrote:
  No, the contents can be of mixed case. Where does that leave things?
 
 **Index the length of the entire column.** It then should not need to
 have to do the filesort. Actually the binary option would not have
 really helped. The explain should say 'Using Index'. Get back to me on
 this and tell me the results.

Huh, I was told the exact opposite, that if most of the
entries are smaller than the maximum length of the field, you
should use an index about the size you expect most entries to
be. Why would you ever use a shorter index than the full column
length if it led to such performance degradation?

In any case, I reindexed cg.cw to the length of the entire column,
and the result is...

GOD! OK, sorry, I wasn't quite expecting this:

mysql SELECT cw FROM cg WHERE cw BETWEEN 's' AND 'szzz' ORDER BY cw LIMIT 3000,5;
+-+
| cw  |
+-+
| sacrifice hit   |
| sacrifice play  |
| sacrifice the earth |
| sacrifice throw |
| sacrifice to|
+-+
5 rows in set (0.02 sec)

Wow!

But what's the explanation for this huge improvement? Again, I
was always told the opposite, and the Manual itself says: If
it's very likely that a column has a unique prefix on the
first number of characters, it's better to only index this
prefix. MySQL supports an index on a part of a character
column. Shorter indexes are faster not only because they take
less disk space but also because they will give you more hits
in the index cache and thus fewer disk seeks. (At sec. 5.4.2.)

  In a working environment I'd never be querying on this table alone,
  it would always be joined in to other tables that would limit things
  in some way, but these don't seem to be affecting things. The
 suggestions
  other people have made to try to get it to do the smaller queries
 first
  don't seem to be having much effect, unfortunately.
 
 Optimize the join once you know how to optimize its parts. One thing at
 a time.

Hmm. When I returned to the multiple-table query that started
this thread, but with the full-column index, it took a 
staggering 1m 15s; rerunning it speeded it up to 3.51 sec
(the original was 7.30 sec), but still nothing like the
improvement that the single table change made just above. The
explain looks like this:

mysql EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
- WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw BETWEEN 't' AND 'tzzz'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- ORDER BY cg.cw
- LIMIT 1000,10;
+---++-+-+-+-++-+
| table | type   | possible_keys   | key | key_len | ref | rows   | Extra  
 |
+---++-+-+-+-++-+
| cg| range  | q_id,cw | cw  | 101 | NULL| 190550 | Using 
where |
| q | eq_ref | PRIMARY,cit_id  | PRIMARY |   4 | cg.q_id |  1 |
 |
| cit   | eq_ref | PRIMARY,sref_id | PRIMARY |   4 | q.cit_id|  1 |
 |
| sref  | eq_ref | PRIMARY,cd  | PRIMARY |   4 | cit.sref_id |  1 | Using 
where |
+---++-+-+-+-++-+
4 rows in set (0.00 sec)

Where do I go from here?

And thanks for all the thought people have been putting into this.

Jesse Sheidlower


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Slow results with simple, well-indexed query

2003-08-21 Thread Steven Roussey
 GOD! OK, sorry, I wasn't quite expecting this:
 Wow!

:)

 
 But what's the explanation for this huge improvement? Again, I
 was always told the opposite, and the Manual itself says: ...

Yes, and it is true (usually). But your EXPLAIN showed a filesort and
that is bad. What happens is that if the resultset is sorted only on the
first few characters (based on the index) of that column. Since you
requested an ORDER BY, it had to go back and fully sort the resultset.
If you have the index do the whole column, then this step is not needed.

Even better is that due to the limit, it can safely go right to the part
of the table it needs to, and once it gets the 5 rows, it is done
(rather then getting all of them for the sort step). Even better in this
particular case is that all the information needed is in the index (the
MYI file) so it did not even need to do a read on the data file (MYD).
Less disk access is a good thing...

 Hmm. When I returned to the multiple-table query that started
 this thread, 

And it was slow. Yeah, one thing at a time. It makes it easier for
people reading this list now or in the future (if it comes up in a
search result) if we go over things one item at a time.

Since I never saw the whole table definitions (the indexes in
particular), I'll have to try and guess through it. So try this:

ALTER TABLE cg add index(q_id,cw);

Tell me how that works and send the EXPLAIN.

The point here is that now you are doing a join and you are using both
columns to qualify the resultset. So we should use a composite index
rather than have individual ones (of which MySQL will choose only one).

Also, you can change line 
AND cg.cw BETWEEN 't' AND 'tzzz'
To 
AND cg.cw like 't%'
For better readability (how many zzz's are enough, eh?). Personal
preference.

--steve-


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Slow results with simple, well-indexed query

2003-08-21 Thread Jesse Sheidlower
On Thu, Aug 21, 2003 at 06:58:29PM -0700, Steven Roussey wrote:
Jesse Sheidlower wrote:
 
  Hmm. When I returned to the multiple-table query that started
  this thread, 
 
 And it was slow. Yeah, one thing at a time. It makes it easier for
 people reading this list now or in the future (if it comes up in a
 search result) if we go over things one item at a time.
 
 Since I never saw the whole table definitions (the indexes in
 particular), I'll have to try and guess through it. So try this:

All the indexes were single indexes, partly because I haven't
yet made the effort to understand composite index. I guess it's
time ;-).

Here's the CREATEs, somewhat edited to remove parts not relevant
to this discussion, to save space:

CREATE TABLE `cg` (
  `q_id` int(10) unsigned NOT NULL default '0',
  `cw` varchar(100) default NULL,
  `exp` text,
  KEY `q_id` (`q_id`),
  KEY `cw` (`cw`),
  KEY `q_id_2` (`q_id`,`cw`),
  FULLTEXT KEY `exp` (`exp`)
) TYPE=MyISAM

CREATE TABLE `q` (
  `id` int(10) unsigned NOT NULL default '0',
  `cit_id` int(10) unsigned NOT NULL default '0',
  `q_tag` enum('q','qau','qca','qna','qsa') default NULL,
  `qt` text,
  PRIMARY KEY  (`id`),
  KEY `cit_id` (`cit_id`),
  FULLTEXT KEY `qt` (`qt`)
) TYPE=MyISAM 

CREATE TABLE `cit` (
  `id` int(10) unsigned NOT NULL default '0',
  `sref_id` int(10) unsigned NOT NULL default '0',
  `w` varchar(200) default NULL,
  PRIMARY KEY  (`id`),
  KEY `sref_id` (`sref_id`),
  FULLTEXT KEY `w` (`w`),
  ) TYPE=MyISAM 

CREATE TABLE `sref` (
  `id` int(10) unsigned NOT NULL default '0',
  `rdr` varchar(30) default NULL,
  `kbd` varchar(20) default NULL,
  `cd` date default NULL,
  PRIMARY KEY  (`id`),
  KEY `rdr` (`rdr`),
  KEY `kbd` (`kbd`),
  KEY `cd` (`cd`)
) TYPE=MyISAM

 ALTER TABLE cg add index(q_id,cw);

I did this, as is reflected in the CREATE above.

 Tell me how that works and send the EXPLAIN.

Unfortunately, it made no difference--the first execution was
about 1 m 15 sec, and one immediately thereafter was about 3.5 sec,
as before. The EXPLAIN shows:

mysql EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
- WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
- AND cg.cw LIKE 't%'
- AND sref.cd  DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- ORDER BY cg.cw
- LIMIT 1000,10;
+---++-+-+-+-++-+
| table | type   | possible_keys   | key | key_len | ref | rows   | Extra  
 |
+---++-+-+-+-++-+
| cg| range  | q_id,cw,q_id_2  | cw  | 101 | NULL| 190550 | Using 
where |
| q | eq_ref | PRIMARY,cit_id  | PRIMARY |   4 | cg.q_id |  1 |
 |
| cit   | eq_ref | PRIMARY,sref_id | PRIMARY |   4 | q.cit_id|  1 |
 |
| sref  | eq_ref | PRIMARY,cd  | PRIMARY |   4 | cit.sref_id |  1 | Using 
where |
+---++-+-+-+-++-+

 The point here is that now you are doing a join and you are using both
 columns to qualify the resultset. So we should use a composite index
 rather than have individual ones (of which MySQL will choose only one).

What does this mean for regular searching? In most cases, there will be
some criteria entered that need to be searched on, and the id fields 
will also be needed for the joins. For example, in the database, one
might want to search based on cg.exp (fulltext), sref.rdr, sref.cd
(the date field), sref.kbd, cit.w, and various other ones I've edited
out of this display to save space, and often a combination of several
of these at once. How should I set up indexes for the potential 
searches that might be executed? 

(I should mention that this is a read-only database; it's built from
a parsed SGML file and is never added to directly, if that's an
issue.)

 Also, you can change line 
   AND cg.cw BETWEEN 't' AND 'tzzz'
 To 
   AND cg.cw like 't%'
 For better readability (how many zzz's are enough, eh?). Personal
 preference.

No, I agree, and it was originally LIKE 't%' and is still like
that in the actual code being generated by the query form. I changed
it to the BETWEEN because in some playing around it seemed to be 
faster this way, and in fact I was worried about having to
generate the 

BETWEEN \'$val\' AND \' . $val . \'

thing in my program. If this was a glitch of my badly-indexed
original, I'm glad to fix it.

Thanks again for taking the time to look this over.

Best,

Jesse

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]