Re: Stop query on first match

Michael Stassen Sat, 12 Jun 2004 10:29:55 -0700

Emmanuel van der Meulen wrote:

On Monday, June 07, 2004 19:09, Michael Stassen wrote;

This doesn't quite make sense. You seem to say that several rows will match but then you say only one will. It must be one or the other. Perhaps I've misunderstood you.


Firstly, Michael thank you for your assistance; I'll elaborate.

Michael I'll attempt first without supplying the table definitions and
actual sql and etc..

Table 'a' contains a property which does not match any property on table 'b'
directly, but matches within a range.

Example;

table 'a' property 'num'

num
---
1000
2000
3000

table 'b' properties 'fromNum' and 'toNum'
       fromNum toNum
row 1.       1  1500
row 2.    1501  2000
row 3.    2001  4000

select... where 'num' between 'fromNum' and 'toNum'.

As seen in this example, using between only one row actually matches,
however the query engine would at first see more than one matching row i.e.,


What do you mean by "at first"?

with 'fromNum' (between is same as num >= fromNum), value 2000 from table a
would match row 1 & 2 on table b; and with 'toNum' (between is same as num
<= toNum), value 2000 from table a would match row 2 & 3 on table b; only
once the range is taken together the result matches one row, viz., row 2 on
table b.

This analysis is not quite right. You seem to be saying that rows which match half of the condition will be retrieved, then filtered according to the other half of the condition, but that's not how it works. If it did, BETWEEN would always be slow, but BETWEEN can be very fast, if the range is a constant and the column is indexed. Assuming indexes on num, fromNum, and toNum, you should be able to verify that

  SELECT * FROM table_a WHERE num BETWEEN 1501 AND 2000;
  SELECT * FROM table_b WHERE fromNum BETWEEN 1 AND 3000;
  SELECT * FROM table_b WHERE toNum BETWEEN 1500 AND 4000;

are all very quick. In your case, "num BETWEEN fromNum AND toNum" is the same as "num >= fromNum AND num <= toNum". There is only one matching row. The problem is finding it.

To find which rows in table b match, we have to look at each row and compare the values of num, fromNum, and toNum. Indexes won't help much here, because each column's index is unlikely to narrow the list of possible matches much. If the optimizer could put 2 indexes together, it might see that fromNum matches from row 23 up, while toNum matches from rows 23 down, so row 23 is it, but the optimizer doesn't combine indexes -- it chooses the best (most restrictive) one. Even then, it only uses the index if it restricts us to no more than 30% of the rows. Furthermore, the value of num changes with each row of table a. The result is an unavoidable full table scan on table b.

In other words, the problem is not, strictly speaking, the BETWEEN.

Now please remember table b has 1.4 million rows as in this example, with
fromNum and toNum running consecutively, so the query takes 4 seconds to
find a row in table b.  With say 200 rows in table a, that means the query
runs for a long time.

That's 4 seconds to scan all of table b, rather than 4 seconds to find one row. A subtle but important distinction.

Is that right? You have 1.4 million ranges? Wow! Up till now, I've been assuming you had a lot of data in table a to be assigned one of a relatively few ranges from table b. Apparently, I've been picturing it backwards.

I'm not sure there's a way to improve this specific query, but there may be another way to accomplish the same thing which works better. It's hard to say what that might be without a better picture of what you're doing than I currently have. Perhaps if you described your data someone could make a suggestion. I'm still assuming table b must be static, right? Otherwise you couldn't guarantee uniqueness of ranges, I think. On the other hand, different rows in table a could be in the same range (match the same row of table b), right?

What I did was to use limit 1, and ran query with 1 row, this took .01
second.

I assume by "ran query with 1 row" you mean one row from table a. Now num is effectively a constant, and LIMIT 1 short circuits the full table scan as soon as one match is found.

You also seem to imply that with BETWEEN you get a full
table scan even though there is only one match for each row.  That sounds
like an indexing problem, but it is hard to say without more information.


Indexed individually on;
fromNum
toNum

Also experimented by adding combining index on;
fromNum/toNum

Either way no difference, query runs 4 seconds.

It would help us help you if you at least posted the query and the results
of EXPLAIN.  It would probably also help if you told us more about the
tables, perhaps with SHOW CREATE TABLE.

If above does not help you, I'll bring the whole lot to the post.

Well, the more info, the better. At least describe the data a little more. Depending on exactly what you want, there may be a way to change the process to make this work at an acceptable speed.

Michael, again thank you for you assistance.

Kind regards
Emmanuel


Michael


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Stop query on first match

Reply via email to