Re: Stop query on first match
Emmanuel van der Meulen wrote: On Monday, June 07, 2004 19:09, Michael Stassen wrote; This doesn't quite make sense. You seem to say that several rows will match but then you say only one will. It must be one or the other. Perhaps I've misunderstood you. Firstly, Michael thank you for your assistance; I'll elaborate. Michael I'll attempt first without supplying the table definitions and actual sql and etc.. Table 'a' contains a property which does not match any property on table 'b' directly, but matches within a range. Example; table 'a' property 'num' num --- 1000 2000 3000 table 'b' properties 'fromNum' and 'toNum' fromNum toNum row 1. 1 1500 row 2.1501 2000 row 3.2001 4000 select... where 'num' between 'fromNum' and 'toNum'. As seen in this example, using between only one row actually matches, however the query engine would at first see more than one matching row i.e., What do you mean by "at first"? with 'fromNum' (between is same as num >= fromNum), value 2000 from table a would match row 1 & 2 on table b; and with 'toNum' (between is same as num <= toNum), value 2000 from table a would match row 2 & 3 on table b; only once the range is taken together the result matches one row, viz., row 2 on table b. This analysis is not quite right. You seem to be saying that rows which match half of the condition will be retrieved, then filtered according to the other half of the condition, but that's not how it works. If it did, BETWEEN would always be slow, but BETWEEN can be very fast, if the range is a constant and the column is indexed. Assuming indexes on num, fromNum, and toNum, you should be able to verify that SELECT * FROM table_a WHERE num BETWEEN 1501 AND 2000; SELECT * FROM table_b WHERE fromNum BETWEEN 1 AND 3000; SELECT * FROM table_b WHERE toNum BETWEEN 1500 AND 4000; are all very quick. In your case, "num BETWEEN fromNum AND toNum" is the same as "num >= fromNum AND num <= toNum". There is only one matching row. The problem is finding it. To find which rows in table b match, we have to look at each row and compare the values of num, fromNum, and toNum. Indexes won't help much here, because each column's index is unlikely to narrow the list of possible matches much. If the optimizer could put 2 indexes together, it might see that fromNum matches from row 23 up, while toNum matches from rows 23 down, so row 23 is it, but the optimizer doesn't combine indexes -- it chooses the best (most restrictive) one. Even then, it only uses the index if it restricts us to no more than 30% of the rows. Furthermore, the value of num changes with each row of table a. The result is an unavoidable full table scan on table b. In other words, the problem is not, strictly speaking, the BETWEEN. Now please remember table b has 1.4 million rows as in this example, with fromNum and toNum running consecutively, so the query takes 4 seconds to find a row in table b. With say 200 rows in table a, that means the query runs for a long time. That's 4 seconds to scan all of table b, rather than 4 seconds to find one row. A subtle but important distinction. Is that right? You have 1.4 million ranges? Wow! Up till now, I've been assuming you had a lot of data in table a to be assigned one of a relatively few ranges from table b. Apparently, I've been picturing it backwards. I'm not sure there's a way to improve this specific query, but there may be another way to accomplish the same thing which works better. It's hard to say what that might be without a better picture of what you're doing than I currently have. Perhaps if you described your data someone could make a suggestion. I'm still assuming table b must be static, right? Otherwise you couldn't guarantee uniqueness of ranges, I think. On the other hand, different rows in table a could be in the same range (match the same row of table b), right? What I did was to use limit 1, and ran query with 1 row, this took .01 second. I assume by "ran query with 1 row" you mean one row from table a. Now num is effectively a constant, and LIMIT 1 short circuits the full table scan as soon as one match is found. You also seem to imply that with BETWEEN you get a full table scan even though there is only one match for each row. That sounds like an indexing problem, but it is hard to say without more information. Indexed individually on; fromNum toNum Also experimented by adding combining index on; fromNum/toNum Either way no difference, query runs 4 seconds. It would help us help you if you at least posted the query and the results of EXPLAIN. It would probably also help if you told us more about the tables, perhaps with SHOW CREATE TABLE. If above does not help you, I'll bring the whole lot to the post. Well, the more info, the better. At least describe the data a little more. Depending on exactly what you want, there may be a way to change the process to make this work at an acceptable speed.
RE: Stop query on first match
On Monday, June 07, 2004 20:23, Justin Swanhart wrote >> It can only ever return one row from table b, for >> each row on table a, due >> to the contents that is stored in table b. The >> table contains in excess of >> a million records. What happens as a result of the >I assume that you have multiple rows in table b that >match the primary key on table a, but only one of >those rows in b will be satisfied by your between >condition of the query. Hello Justin, fisrtly, thank you for your assistance. Then, as I posted previously, keys do not match, however, a property on table a matches only in respect of being between two properties on table b. >First off, make sure that you have an index on table b >that matches the primary key in table a. Make sure >the data types and lengths are the same for both keys. Please see previous post. > Secondly, make sure that your key cache is large > enough so that the keys to the tables are buffered. I'm not sure it is this. > If these assumptions are not correct, then you will > need to post the query, the explain of the query, and > the "show create table" for each table in the query so > that we have a better idea what you are dealing with. Please see previous post. >> I've looked in several books and searched Google but >> cannot get a way of >> doing this. It seems Oracle has a 'FIRST' in their >> select which they use >> for such a use case. But I do not see anything for >> MySql anywhere. > Oracle's FIRST_ROWS simply tells the optimizer to > prefer index scans over a FTS where lots of rows may > be returned by the query. It won't modify the > behavior of a query in the way you want it to. Thank you for info. > The only way I can think to implement the behavior > that you are looking for is to: > 1) get all the records from "a" that you need > 2) execute a second select for each row in "a" on "b" > with your between condition and a LIMIT clause so that > only one row is returned. Until 4.1 when subselects become avaialable, I suspect this is a workaround. > Still, if multiple rows > exist in "b" that match the PK on "a", those rows will > probably be scanned in filesort order, which means > that multiple rows will be looked at unless your > between matches the first row inserted, except if you > have an appropriate index. Please see previous post. Justin, again, thank you for your asistance. Kind regrds Emmanuel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Stop query on first match
Hello all, Thank you to everyone who assisted me. I'm wondering about my theory around the between as posted previously. Can anyone provide further insights regarding that theory? Kind regards Emmanuel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Stop query on first match
On Monday, June 07, 2004 19:09, Michael Stassen wrote; > This doesn't quite make sense. You seem to say that several rows > will match > but then you say only one will. It must be one or the other. > Perhaps I've > misunderstood you. Firstly, Micheal thank you for your assistance; I'll elaborate. Michael I'll attempt first without supplying the table definitions and actual sql and etc.. Table 'a' contains a property which does not match any property on table 'b' directly, but matches within a range. Example; table 'a' property 'num' num --- 1000 2000 3000 table 'b' properties 'fromNum' and 'toNum' fromNum toNum row 1. 1 1500 row 2.1501 2000 row 3.2001 4000 select... where 'num' between 'fromNum' and 'toNum'. As seen in this example, using between only one row actually matches, however the query engine would at first see more than one matching row i.e., with 'fromNum' (between is same as num >= fromNum), value 2000 from table a would match row 1 & 2 on table b; and with 'toNum' (between is same as num <= toNum), value 2000 from table a would match row 2 & 3 on table b; only once the range is taken together the result matches one row, viz., row 2 on table b. Now please remember table b has 1.4 million rows as in this example, with fromNum and toNum running consecutively, so the query takes 4 seconds to find a row in table b. With say 200 rows in table a, that means the query runs for a long time. What I did was to use limit 1, and ran query with 1 row, this took .01 second. > You also seem to imply that with BETWEEN you > get a full > table scan even though there is only one match for each row. That sounds > like an indexing problem, but it is hard to say without more information. Indexed individually on; fromNum toNum Also experimented by adding combining index on; fromNum/toNum Either way no difference, query runs 4 seconds. > It would help us help you if you at least posted the query and > the results > of EXPLAIN. It would probably also help if you told us more about the > tables, perhaps with SHOW CREATE TABLE. If above does not help you, I'll bring the whole lot to the post. Michael, again thank you for you assistance. Kind regards Emmanuel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Stop query on first match
On Monday, June 07, 2004 07:54, Jeff Smelser wrote; > On Monday 07 June 2004 12:49 am, Emmanuel van der Meulen wrote: > > I've looked in several books and searched Google but cannot get a way of > > doing this. It seems Oracle has a 'FIRST' in their select > which they use > > for such a use case. But I do not see anything for MySql anywhere. > > If your using a later mysql, you could use sub query with limit. > > Other then that, splitting the query is the only think off hand. Thank you for you note and valuable feedback. Unfortunately I'm planning to go live before MySql 4.1 is production release so I need to find a work around. Do I understand your suggestion about splitting the query; I would first select all required rows from table a, then take each retrieved row from table and (in a loop in my java program) select from table b, using limit? Kind regards Emmanuel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Stop query on first match
This doesn't quite make sense. You seem to say that several rows will match but then you say only one will. It must be one or the other. Perhaps I've misunderstood you. You also seem to imply that with BETWEEN you get a full table scan even though there is only one match for each row. That sounds like an indexing problem, but it is hard to say without more information. It would help us help you if you at least posted the query and the results of EXPLAIN. It would probably also help if you told us more about the tables, perhaps with SHOW CREATE TABLE. Michael Emmanuel van der Meulen wrote: Hello all, I see a similar question was asked before, but it was not answered. I hope someone can assist me. My query uses two tables. The query selects one row on table b for each row on table a, but uses between in the select. It can only ever return one row from table b, for each row on table a, due to the contents that is stored in table b. The table contains in excess of a million records. What happens as a result of the between is that for the query, several rows seem to be candidates on table b, but once the query evaluates and sifts through the candidate rows on table b, only one row will ever match. So if I could inform MySql to stop the query for the particular row, once one row on table b matches the row on table a, the query would return hundreds of times faster. As an experiment I took one example and used limit and the query reduced from 4 secs to .01 sec. However, when doing the 'live' query, I cannot use limit because, I do not want overall only 1 row returned, I want one row returned for each of the rows from table a which has 1 match each on table b. I've looked in several books and searched Google but cannot get a way of doing this. It seems Oracle has a 'FIRST' in their select which they use for such a use case. But I do not see anything for MySql anywhere. Could someone please assist me. Kind regards Emmanuel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Stop query on first match
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Monday 07 June 2004 12:49 am, Emmanuel van der Meulen wrote: > I've looked in several books and searched Google but cannot get a way of > doing this. It seems Oracle has a 'FIRST' in their select which they use > for such a use case. But I do not see anything for MySql anywhere. If your using a later mysql, you could use sub query with limit. Other then that, splitting the query is the only think off hand. - -- I am under the influence of sugar, caffeine, and lack of sleep, and should not be held responsible for my behavior. -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFAxAMEld4MRA3gEwYRAgc5AJ0behkfDwijlpSANX1oXsRfNtwmKgCeOF4b j9366DtZYNuo2j0aTvQsudY= =ecrd -END PGP SIGNATURE- -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Stop query on first match
Hello all, I see a similar question was asked before, but it was not answered. I hope someone can assist me. My query uses two tables. The query selects one row on table b for each row on table a, but uses between in the select. It can only ever return one row from table b, for each row on table a, due to the contents that is stored in table b. The table contains in excess of a million records. What happens as a result of the between is that for the query, several rows seem to be candidates on table b, but once the query evaluates and sifts through the candidate rows on table b, only one row will ever match. So if I could inform MySql to stop the query for the particular row, once one row on table b matches the row on table a, the query would return hundreds of times faster. As an experiment I took one example and used limit and the query reduced from 4 secs to .01 sec. However, when doing the 'live' query, I cannot use limit because, I do not want overall only 1 row returned, I want one row returned for each of the rows from table a which has 1 match each on table b. I've looked in several books and searched Google but cannot get a way of doing this. It seems Oracle has a 'FIRST' in their select which they use for such a use case. But I do not see anything for MySql anywhere. Could someone please assist me. Kind regards Emmanuel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]