Emmanuel van der Meulen wrote:
On Monday, June 07, 2004 19:09, Michael Stassen wrote;
This doesn't quite make sense. You seem to say that several rows
will match but then you say only one will. It must be one or the other. Perhaps I've misunderstood you.
Firstly, Michael thank you for your assistance; I'll elaborate.
Michael I'll attempt first without supplying the table definitions and actual sql and etc..
Table 'a' contains a property which does not match any property on table 'b' directly, but matches within a range.
Example;
table 'a' property 'num'
num --- 1000 2000 3000
table 'b' properties 'fromNum' and 'toNum' fromNum toNum row 1. 1 1500 row 2. 1501 2000 row 3. 2001 4000
select... where 'num' between 'fromNum' and 'toNum'.
As seen in this example, using between only one row actually matches, however the query engine would at first see more than one matching row i.e.,
What do you mean by "at first"?
with 'fromNum' (between is same as num >= fromNum), value 2000 from table a would match row 1 & 2 on table b; and with 'toNum' (between is same as num <= toNum), value 2000 from table a would match row 2 & 3 on table b; only once the range is taken together the result matches one row, viz., row 2 on table b.
This analysis is not quite right. You seem to be saying that rows which match half of the condition will be retrieved, then filtered according to the other half of the condition, but that's not how it works. If it did, BETWEEN would always be slow, but BETWEEN can be very fast, if the range is a constant and the column is indexed. Assuming indexes on num, fromNum, and toNum, you should be able to verify that
SELECT * FROM table_a WHERE num BETWEEN 1501 AND 2000; SELECT * FROM table_b WHERE fromNum BETWEEN 1 AND 3000; SELECT * FROM table_b WHERE toNum BETWEEN 1500 AND 4000;
are all very quick. In your case, "num BETWEEN fromNum AND toNum" is the same as "num >= fromNum AND num <= toNum". There is only one matching row. The problem is finding it.
To find which rows in table b match, we have to look at each row and compare the values of num, fromNum, and toNum. Indexes won't help much here, because each column's index is unlikely to narrow the list of possible matches much. If the optimizer could put 2 indexes together, it might see that fromNum matches from row 23 up, while toNum matches from rows 23 down, so row 23 is it, but the optimizer doesn't combine indexes -- it chooses the best (most restrictive) one. Even then, it only uses the index if it restricts us to no more than 30% of the rows. Furthermore, the value of num changes with each row of table a. The result is an unavoidable full table scan on table b.
In other words, the problem is not, strictly speaking, the BETWEEN.
Now please remember table b has 1.4 million rows as in this example, with fromNum and toNum running consecutively, so the query takes 4 seconds to find a row in table b. With say 200 rows in table a, that means the query runs for a long time.
That's 4 seconds to scan all of table b, rather than 4 seconds to find one row. A subtle but important distinction.
Is that right? You have 1.4 million ranges? Wow! Up till now, I've been assuming you had a lot of data in table a to be assigned one of a relatively few ranges from table b. Apparently, I've been picturing it backwards.
I'm not sure there's a way to improve this specific query, but there may be another way to accomplish the same thing which works better. It's hard to say what that might be without a better picture of what you're doing than I currently have. Perhaps if you described your data someone could make a suggestion. I'm still assuming table b must be static, right? Otherwise you couldn't guarantee uniqueness of ranges, I think. On the other hand, different rows in table a could be in the same range (match the same row of table b), right?
What I did was to use limit 1, and ran query with 1 row, this took .01 second.
I assume by "ran query with 1 row" you mean one row from table a. Now num is effectively a constant, and LIMIT 1 short circuits the full table scan as soon as one match is found.
You also seem to imply that with BETWEEN you get a full table scan even though there is only one match for each row. That sounds like an indexing problem, but it is hard to say without more information.
Indexed individually on; fromNum toNum
Also experimented by adding combining index on; fromNum/toNum
Either way no difference, query runs 4 seconds.
It would help us help you if you at least posted the query and the results of EXPLAIN. It would probably also help if you told us more about the tables, perhaps with SHOW CREATE TABLE.
If above does not help you, I'll bring the whole lot to the post.
Well, the more info, the better. At least describe the data a little more. Depending on exactly what you want, there may be a way to change the process to make this work at an acceptable speed.
Michael, again thank you for you assistance.
Kind regards Emmanuel
Michael
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]