Re: query optimization question (my struggle against 'using temporary; using filesort')
Hi Gavin,

Thanks very much, I'll implement as many of your suggestions as possible. The varchar(255)s are inexcusable and I feel suitably ashamed :) The queries were generated by ActiveRecord (an ORM library for Ruby), although even if I had written them myself they would probably not have been much better.

Regards,
Ciaran Lee

2009/9/24 Gavin Towey:

> Hi Ciaran,
>
> So I think there are a couple of things going on:
>
> 1. The EXPLAIN plan for your "slow" query looks wrong, as if MySQL is
> confused. It's possible your index statistics are incorrect. Try ANALYZE
> TABLE on listings and addresses.
>
> I think a sure way to fix it is to add STRAIGHT_JOIN to force the join
> order. That should get rid of the temp table and filesort operations and
> give faster results:
>
> SELECT STRAIGHT_JOIN
>   listings.*, addresses.*
> FROM `listings`
> JOIN `addresses`
>   ON `addresses`.addressable_id = `listings`.id
>  AND `addresses`.addressable_type = 'Listing'
> WHERE (addresses.parent_city_id = 3)
> ORDER BY listings.id DESC LIMIT 1
>
> 2. I need to make some comments about your table design.
>
> This column is AWFUL:
>
>   `addressable_type` varchar(255) DEFAULT NULL,
>
> Why have a field that holds up to 255 characters and put a little string in
> it like "Listing"? Why does it matter? Because it makes your indexes
> disastrously bloated:
>
>   KEY `index_addresses_on_parent_city_id_and_addressable_type`
>     (`parent_city_id`,`addressable_type`),
>
> As you can see in the EXPLAIN plan, that index is 773 bytes *per row*,
> especially since utf8 means each character takes 3 bytes in the index.
> That's terrible. That type field should be a tinyint foreign key, or at the
> very least a much, much shorter varchar field (such as 8 or 10).
>
> You have lots of those varchar(255) fields, which looks like lazy design --
> by not gathering correct requirements and designing accordingly, you will
> hurt your database performance, waste disk space, and cause yourself all
> kinds of future problems.
> 3. Why are you using OUTER JOIN?
>
> It looks to me like you're using it because you don't know the difference,
> since you're not looking for NULL rows or anything. In fact, it looks like
> MySQL is smart enough to know that you've negated the OUTER JOIN by putting
> conditions on the joined tables in the WHERE clause, and to convert them to
> INNER JOINs. Don't rely on that! Use the correct join type.
>
> Those queries
>
> Regards,
> Gavin Towey
>
> -----Original Message-----
> From: Ciaran Lee [mailto:ciaran@gmail.com]
> Sent: Tuesday, September 22, 2009 1:32 PM
> To: mysql@lists.mysql.com
> Subject: query optimization question (my struggle against 'using temporary; using filesort')
>
> Hi,
>
> I hope this is the right place to ask a question about query optimization.
>
> Background:
> I have a database which has events, which occur in places (listings).
> Places have addresses, and addresses belong to a city. I can select the
> latest event within a particular city very efficiently (less than 2ms), but
> selecting the latest listing within a city is REALLY slow (10-20 seconds)
> despite being almost a subset of the event query.
>
> I have been working on this for about a day, and have tried all sorts of
> tweaks to the indexes, but to no avail. I always seem to end up with 'using
> temporary; using filesort' as the 'extra' content in the EXPLAIN result. If
> anyone has a suggestion for what I might do to fix this, I'd really
> appreciate it. If not, I could further de-normalize the database for
> performance reasons, but I would feel dirty for doing so.
>
> Here is the fast query (select the latest event within a particular city),
> and its EXPLAIN.
> SELECT
>   events.*, listings.*, addresses.*
> FROM `events`
> LEFT OUTER JOIN `listings`
>   ON `listings`.id = `events`.listing_id
> LEFT OUTER JOIN `addresses`
>   ON `addresses`.addressable_id = `listings`.id
>  AND `addresses`.addressable_type = 'Listing'
> WHERE (addresses.parent_city_id = 3)
> ORDER BY events.id DESC LIMIT 1
>
> (EXPLAIN output truncated here; the full plan appears in the original message below)
RE: query optimization question (my struggle against 'using temporary; using filesort')
Hi Ciaran,

So I think there are a couple of things going on:

1. The EXPLAIN plan for your "slow" query looks wrong, as if MySQL is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses.

I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results:

SELECT STRAIGHT_JOIN
  listings.*, addresses.*
FROM `listings`
JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
 AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY listings.id DESC LIMIT 1

2. I need to make some comments about your table design.

This column is AWFUL:

  `addressable_type` varchar(255) DEFAULT NULL,

Why have a field that holds up to 255 characters and put a little string in it like "Listing"? Why does it matter? Because it makes your indexes disastrously bloated:

  KEY `index_addresses_on_parent_city_id_and_addressable_type`
    (`parent_city_id`,`addressable_type`),

As you can see in the EXPLAIN plan, that index is 773 bytes *per row*, especially since utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a tinyint foreign key, or at the very least a much, much shorter varchar field (such as 8 or 10).

You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly, you will hurt your database performance, waste disk space, and cause yourself all kinds of future problems.
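The per-row figure Gavin quotes can be reconstructed from MySQL's key_len arithmetic. A quick sketch (plain arithmetic, assuming utf8's 3-bytes-per-character index encoding, a 2-byte length prefix for VARCHAR, and a 1-byte NULL flag for each nullable column):

```python
# Sketch of MySQL's key_len for
#   KEY (`parent_city_id`, `addressable_type`)
# where parent_city_id is a nullable INT and addressable_type
# is a nullable utf8 VARCHAR(255).

def int_key_len(nullable=True):
    # INT is 4 bytes; a nullable column adds 1 byte for the NULL flag.
    return 4 + (1 if nullable else 0)

def utf8_varchar_key_len(chars, nullable=True):
    # 3 bytes per utf8 character, plus a 2-byte length prefix,
    # plus the NULL flag.
    return chars * 3 + 2 + (1 if nullable else 0)

full = int_key_len() + utf8_varchar_key_len(255)
short = int_key_len() + utf8_varchar_key_len(10)
print(full)   # 773, matching key_len in the EXPLAIN output in this thread
print(short)  # 38: what a VARCHAR(10) would cost instead
```

The same arithmetic shows why a tinyint type column (1 byte plus the NULL flag) would shrink the index even further.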
Those queries

Regards,
Gavin Towey

-----Original Message-----
From: Ciaran Lee [mailto:ciaran@gmail.com]
Sent: Tuesday, September 22, 2009 1:32 PM
To: mysql@lists.mysql.com
Subject: query optimization question (my struggle against 'using temporary; using filesort')

Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of tweaks to the indexes, but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the EXPLAIN result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city), and its EXPLAIN:

SELECT
  events.*, listings.*, addresses.*
FROM `events`
LEFT OUTER JOIN `listings`
  ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
 AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY events.id DESC LIMIT 1

| id | select_type | table     | type   | possible_keys              | key     | key_len | ref                          | rows | Extra       |
| 1  | SIMPLE      | events    | index  | index_events_on_listing_id | PRIMARY | 4       | NULL                         | 1    |             |
| 1  | SIMPLE      | listings  | eq_ref | PRIMARY                    | PRIMARY | 4       | ratemyarea.events.listing_id | 1    | Using where |
| 1  | SIMPLE      | addresses | ref    | index_addresses_on_parent_city_id_and_addressable_type, index_addresses_on_addressable_type_and_addressable_id, addressable_id_type_city | index_addresses_on_addressable_type_and_addressable_id | 773 | const, ratemyarea.listings.id | 1 | Using where |
query optimization question (my struggle against 'using temporary; using filesort')
Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of tweaks to the indexes, but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the EXPLAIN result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city), and its EXPLAIN:

SELECT
  events.*, listings.*, addresses.*
FROM `events`
LEFT OUTER JOIN `listings`
  ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
 AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY events.id DESC LIMIT 1

| id | select_type | table     | type   | possible_keys              | key     | key_len | ref                          | rows | Extra       |
| 1  | SIMPLE      | events    | index  | index_events_on_listing_id | PRIMARY | 4       | NULL                         | 1    |             |
| 1  | SIMPLE      | listings  | eq_ref | PRIMARY                    | PRIMARY | 4       | ratemyarea.events.listing_id | 1    | Using where |
| 1  | SIMPLE      | addresses | ref    | index_addresses_on_parent_city_id_and_addressable_type, index_addresses_on_addressable_type_and_addressable_id, addressable_id_type_city | index_addresses_on_addressable_type_and_addressable_id | 773 | const, ratemyarea.listings.id | 1 | Using where |

Here is the slow query (select the latest listing within a particular city), and its EXPLAIN:

SELECT
  listings.*, addresses.*
FROM `listings`
LEFT OUTER JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
 AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY listings.id DESC LIMIT 1

| id | select_type | table     | type   | possible_keys | key | key_len | ref | rows | Extra |
| 1  | SIMPLE      | addresses | ref    | index_addresses_on_parent_city_id_and_addressable_type, index_addresses_on_addressable_type_and_addressable_id, addressable_id_type_city | addressable_id_type_city | 773 | const,const | 25680 | Using where; Using temporary; Using filesort |
| 1  | SIMPLE      | listings  | eq_ref | PRIMARY | PRIMARY | 4 | ratemyarea.addresses.addressable_id | 1 | |

Here is the
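The join-order fix suggested elsewhere in this thread (STRAIGHT_JOIN driving the join from listings) can be sketched outside MySQL too. The following Python/sqlite3 illustration uses a toy reconstruction of the schema; SQLite's CROSS JOIN plays the role of MySQL's STRAIGHT_JOIN, so listings is scanned in reverse id order and the ORDER BY ... LIMIT 1 needs no sort:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE listings (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY,
    addressable_id INTEGER,
    addressable_type TEXT,
    parent_city_id INTEGER
);
CREATE INDEX idx_city_type ON addresses (parent_city_id, addressable_type);
CREATE INDEX idx_addressable ON addresses (addressable_id, addressable_type);
""")
con.executemany("INSERT INTO listings VALUES (?, ?)",
                [(i, "listing %d" % i) for i in range(1, 1001)])
con.executemany("INSERT INTO addresses VALUES (?, ?, ?, ?)",
                [(i, i, "Listing", i % 5) for i in range(1, 1001)])

# CROSS JOIN fixes the join order in SQLite, much as STRAIGHT_JOIN does
# in MySQL: listings is the outer table, probed against addresses per row.
forced = """
SELECT listings.id
FROM listings CROSS JOIN addresses
  ON addresses.addressable_id = listings.id
 AND addresses.addressable_type = 'Listing'
WHERE addresses.parent_city_id = 3
ORDER BY listings.id DESC LIMIT 1
"""
plan = con.execute("EXPLAIN QUERY PLAN " + forced).fetchall()
latest = con.execute(forced).fetchone()[0]
print(latest)  # 998: the highest listing id whose city is 3 in this toy data
for step in plan:
    print(step[3])
```

The plan's outermost step is a scan of listings, and no "USE TEMP B-TREE FOR ORDER BY" step appears, which is the SQLite analogue of losing "Using temporary; Using filesort".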
Re: Query Optimization Question
[EMAIL PROTECTED] wrote:
> Yes, a ranged query should respond faster than a negation. In some cases
> you can seriously improve query performance for a negation query if you
> split it into two range queries unioned together. Here is a pseudo
> example. This query should be slow due to the table scan it takes to test
> the condition:
>
> SELECT ... FROM ... WHERE indexfield <> 16

This query will use the index if the number of rows with indexfield != 16 is small enough (< about 30%).

> This query should be noticeably faster due to using ranged queries
> (partial index ranged matches):
>
> ( SELECT ... FROM ... WHERE indexfield < 16 )
> UNION
> ( SELECT ... FROM ... WHERE indexfield > 16 )

At the very least, you'll want to make that UNION ALL rather than just UNION, so mysql doesn't waste time looking for duplicates to remove. But even then, this may be slower.

First, these two unioned queries may still not use the index. If the number of rows with indexfield < 16 is too large (more than about 30%), the first query will be a table scan. If the number of rows with indexfield > 16 is too large (more than about 30%), the second query will be a table scan. In fact, if the number of rows with indexfield = 16 is less than about 40% of the table, then at least one of the two unioned queries is guaranteed to be a table scan. Worse yet, this query stands a good chance of taking twice as long as the single != query, because it may require 2 table scans!

Second, even if both unioned queries use the index, the result still may not be faster. If the combined number of matching rows is too high, the full table scan should be faster than the indexed lookups. For example:

SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |     5743 |
|    1 |     3792 |
|    2 |    30727 |
|    3 |     1926 |
|    4 |     7812 |
+------+----------+

19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side.

First, the != case:

EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

As expected, a full table scan.

SELECT * FROM inits WHERE cat != 2;
...
19273 rows in set (0.37 sec)

Now, the unioned range queries:

EXPLAIN SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2 \G
*** 1. row ***
           id: 1
  select_type: PRIMARY
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 5680
        Extra: Using where
*** 2. row ***
           id: 2
  select_type: UNION
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 6543
        Extra: Using where
*** 3. row ***
           id: NULL
  select_type: UNION RESULT
        table: <union1,2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:

As hoped, mysql plans to use the index for each query.

SELECT * FROM inits WHERE cat < 2 UNION ALL SELECT * FROM inits WHERE cat > 2;
...
19273 rows in set (0.78 sec)

Despite (because of) using the index, this takes more than twice as long!

> Of course, if you have to do a table scan ANYWAY (because a value you have
> in a constraint is not in an index) then this won't help. This
> optimization is all about getting the engine to use an index whenever it
> can instead of performing a table scan. Of course, that is not always
> possible.

Even when using an index is possible, it is not always desirable. I'd suggest not trying to outfox the optimizer until you've first determined it is making bad choices, and then testing to make sure the solution is actually an improvement.

Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
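Michael's experiment is easy to reproduce outside MySQL. A sketch in Python with sqlite3 (a different engine with different cost thresholds, so only the shape of the comparison carries over; the table and its skewed data are invented): the unioned range queries get the index while the negation gets a full scan, and both return identical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE inits (id INTEGER PRIMARY KEY, cat INTEGER, val TEXT)")
con.execute("CREATE INDEX cat_idx ON inits (cat)")
# Skewed toy data: ~60% of rows have cat = 2, loosely mimicking
# the distribution in the post.
con.executemany("INSERT INTO inits VALUES (?, ?, ?)",
                [(i, (2, 2, 2, 2, 2, 2, 0, 1, 3, 4)[i % 10], "x")
                 for i in range(50000)])

neq = "SELECT * FROM inits WHERE cat <> 2"
unioned = ("SELECT * FROM inits WHERE cat < 2 "
           "UNION ALL "
           "SELECT * FROM inits WHERE cat > 2")

# Both forms return exactly the same rows.
assert sorted(con.execute(neq).fetchall()) == \
       sorted(con.execute(unioned).fetchall())

def plan(q):
    return [step[3] for step in con.execute("EXPLAIN QUERY PLAN " + q)]

print(plan(neq))      # full table scan: <> is not an indexable constraint
print(plan(unioned))  # each branch searches cat_idx with a range
```

Whether the indexed version is actually *faster* is exactly the question Michael's timings answer: at ~40% selectivity the scan won, so measure before committing to the rewrite.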
Re: Query Optimization Question
Robert DiFalco wrote:
> In a previous database engine I was using, an IN was more optimal than a
> <>. So, for example:
>
> SELECT * FROM table WHERE table.type IN (1,2,3);
>
> Where the possible values of type are 0-3, was appreciably faster than:
>
> SELECT * FROM table WHERE table.type <> 0;
>
> I've been playing with the Query Browser and checking out the
> optimization documents and haven't been able to make a clear call on
> whether or not this is also the case with MySQL/InnoDB.

Mladen Adamovic wrote:
> IN should be faster implemented with both hash tables and BTREEs, so
> nowadays it should also be faster than <>, as all MySQL implementations
> AFAIK use those well-known data structures for indexes.

[EMAIL PROTECTED] wrote:
> YES, YES, YES! This is definitely an optimization.
>
> When you say IN or =, you are asking for "matching values". Matches can
> come from indexes. When you say <> or NOT IN, you are asking for
> everything BUT matches. In order to evaluate a negative, the database
> engine (and this is usually true regardless of database server) almost
> always performs a full table scan to test every row to make sure it is
> either <> or NOT IN. At the very best, they have to perform a full index
> scan, which is still less efficient than ranged or values-based lookups.
>
> It's when you get into the situation where you are matching against dozens
> of IN-clause items that you may run into slowdowns again. Until you reach
> 2 or 3 dozen terms (depending on your hardware) you should be faster with
> an IN comparison than a <> or a NOT IN comparison. An optimization to
> search for all BUT a term or two is to create a temporary table of all of
> your terms and delete the exact ones you want to exclude. Put an index on
> your temp table, then JOIN that back into your query again (replacing the
> huge IN clause). The database will match index to index and things will
> get fast again. This technique can scale up to some really big queries.
>
> Always try to code for the affirmative tests. Your users will thank you.

Implicit in Mladen's and Shawn's answers, but never actually mentioned in the original post, is the presence of an index on the type column. This is probably obvious to all concerned, but I mention it for completeness: without an index on type, there is no difference between "type IN (1,2,3)" and "type != 0". That is, the question is not whether IN is better than !=, but rather which will allow the optimizer to make good use of the index on type.

I find mysql's optimizer is pretty good with well-written queries, as long as subqueries aren't involved, so my initial reaction was to expect no difference. After all, as the optimizer considers the WHERE conditions and the available indexes, it is certainly possible, at least theoretically, for it to notice that "type IN (1,2,3)" and "type != 0" are identical conditions. That is, a clever optimizer could treat them identically. Shawn's and Mladen's answers gave me pause, however, and aroused my curiosity, so I decided to test:

SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 4.1.15    |
+-----------+

SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat  | COUNT(*) |
+------+----------+
|    0 |    44224 |
|    1 |     1919 |
|    2 |     1931 |
|    3 |     1926 |
+------+----------+

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8117
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: range
possible_keys: cat_idx
          key: cat_idx
      key_len: 5
          ref: NULL
         rows: 8120
        Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining 8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rows out of 50,000 (about 11.55%).

On the other hand, it makes a difference how many rows will match. What is the distribution of values of type? If the number of matching rows is more than about 30% of the table, the optimizer won't use an available index in any case. For example,

mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 3 \G
*** 1. row ***
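The IN-versus-!= half of the question can be probed the same way in any engine that exposes its plans. A sqlite3 sketch (illustrative only; unlike the 4.1 optimizer tested above, SQLite does not treat `cat != 0` as a range at all, so there the rewrite matters even more):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE inits (id INTEGER PRIMARY KEY, cat INTEGER, val TEXT)")
con.execute("CREATE INDEX cat_idx ON inits (cat)")
# Mostly cat = 0, with a small minority spread over 1-3, loosely
# matching the distribution in the post.
con.executemany("INSERT INTO inits VALUES (?, ?, ?)",
                [(i, 0 if i % 10 else (i // 10) % 3 + 1, "x")
                 for i in range(50000)])

in_q = "SELECT * FROM inits WHERE cat IN (1, 2, 3)"
neq_q = "SELECT * FROM inits WHERE cat != 0"

# Identical result sets...
assert sorted(con.execute(in_q).fetchall()) == \
       sorted(con.execute(neq_q).fetchall())

# ...but only the IN form is an indexable constraint here.
def plan(q):
    return [step[3] for step in con.execute("EXPLAIN QUERY PLAN " + q)]

print(plan(in_q))   # searches cat_idx, once per IN term
print(plan(neq_q))  # full table scan
```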
RE: Query Optimization Question
Interesting; that seems like an optimization the query optimizer could do itself when it sees a <> operator on an indexed numeric.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 13, 2006 8:01 AM
To: Robert DiFalco
Cc: mysql@lists.mysql.com
Subject: RE: Query Optimization Question

Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example.

This query should be slow due to the table scan it takes to test the condition:

SELECT ... FROM ... WHERE indexfield <> 16

This query should be noticeably faster due to using ranged queries (partial index ranged matches):

( SELECT ... FROM ... WHERE indexfield < 16 )
UNION
( SELECT ... FROM ... WHERE indexfield > 16 )

Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Robert DiFalco" <[EMAIL PROTECTED]> wrote on 03/13/2006 10:48:29 AM:

> Shawn,
>
> Any performance gains for specifying "type > 0" rather than "type <> 0"?
>
> R.
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Monday, March 13, 2006 6:37 AM
> To: Robert DiFalco
> Cc: mysql@lists.mysql.com
> Subject: Re: Query Optimization Question
>
> "Robert DiFalco" <[EMAIL PROTECTED]> wrote on 03/11/2006 12:43:43 PM:
>
> > In a previous database engine I was using, an IN was more optimal than
> > a <>. So, for example:
> >
> > SELECT * FROM table WHERE table.type IN (1,2,3);
> >
> > Where the possible values of type are 0-3, was appreciably faster than:
> >
> > SELECT * FROM table WHERE table.type <> 0;
> >
> > I've been playing with the Query Browser and checking out the
> > optimization documents and haven't been able to make a clear call on
> > whether or not this is also the case with MySQL/InnoDB.
> >
> > TIA,
> >
> > R.
>
> YES, YES, YES! This is definitely an optimization.
>
> When you say IN or =, you are asking for "matching values". Matches can
> come from indexes. When you say <> or NOT IN, you are asking for
> everything BUT matches. In order to evaluate a negative, the database
> engine (and this is usually true regardless of database server) almost
> always performs a full table scan to test every row to make sure it is
> either <> or NOT IN. At the very best, they have to perform a full index
> scan, which is still less efficient than ranged or values-based lookups.
>
> It's when you get into the situation where you are matching against
> dozens of IN-clause items that you may run into slowdowns again. Until
> you reach 2 or 3 dozen terms (depending on your hardware) you should be
> faster with an IN comparison than a <> or a NOT IN comparison. An
> optimization to search for all BUT a term or two is to create a temporary
> table of all of your terms and delete the exact ones you want to
> exclude. Put an index on your temp table, then JOIN that back into your
> query again (replacing the huge IN clause). The database will match
> index to index and things will get fast again. This technique can scale
> up to some really big queries.
>
> Always try to code for the affirmative tests. Your users will thank you.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
RE: Query Optimization Question
Yes, a ranged query should respond faster than a negation. In some cases you can seriously improve query performance for a negation query if you split it into two range queries unioned together. Here is a pseudo example.

This query should be slow due to the table scan it takes to test the condition:

SELECT ... FROM ... WHERE indexfield <> 16

This query should be noticeably faster due to using ranged queries (partial index ranged matches):

( SELECT ... FROM ... WHERE indexfield < 16 )
UNION
( SELECT ... FROM ... WHERE indexfield > 16 )

Of course, if you have to do a table scan ANYWAY (because a value you have in a constraint is not in an index) then this won't help. This optimization is all about getting the engine to use an index whenever it can instead of performing a table scan. Of course, that is not always possible.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Robert DiFalco" <[EMAIL PROTECTED]> wrote on 03/13/2006 10:48:29 AM:

> Shawn,
>
> Any performance gains for specifying "type > 0" rather than "type <> 0"?
>
> R.
>
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Monday, March 13, 2006 6:37 AM
> To: Robert DiFalco
> Cc: mysql@lists.mysql.com
> Subject: Re: Query Optimization Question
>
> "Robert DiFalco" <[EMAIL PROTECTED]> wrote on 03/11/2006 12:43:43 PM:
>
> > In a previous database engine I was using, an IN was more optimal than
> > a <>. So, for example:
> >
> > SELECT * FROM table WHERE table.type IN (1,2,3);
> >
> > Where the possible values of type are 0-3, was appreciably faster than:
> >
> > SELECT * FROM table WHERE table.type <> 0;
> >
> > I've been playing with the Query Browser and checking out the
> > optimization documents and haven't been able to make a clear call on
> > whether or not this is also the case with MySQL/InnoDB.
> >
> > TIA,
> >
> > R.
>
> YES, YES, YES! This is definitely an optimization.
>
> When you say IN or =, you are asking for "matching values". Matches can
> come from indexes. When you say <> or NOT IN, you are asking for
> everything BUT matches. In order to evaluate a negative, the database
> engine (and this is usually true regardless of database server) almost
> always performs a full table scan to test every row to make sure it is
> either <> or NOT IN. At the very best, they have to perform a full index
> scan, which is still less efficient than ranged or values-based lookups.
>
> It's when you get into the situation where you are matching against
> dozens of IN-clause items that you may run into slowdowns again. Until
> you reach 2 or 3 dozen terms (depending on your hardware) you should be
> faster with an IN comparison than a <> or a NOT IN comparison. An
> optimization to search for all BUT a term or two is to create a temporary
> table of all of your terms and delete the exact ones you want to
> exclude. Put an index on your temp table, then JOIN that back into your
> query again (replacing the huge IN clause). The database will match
> index to index and things will get fast again. This technique can scale
> up to some really big queries.
>
> Always try to code for the affirmative tests. Your users will thank you.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
RE: Query Optimization Question
Shawn,

Any performance gains for specifying "type > 0" rather than "type <> 0"?

R.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 13, 2006 6:37 AM
To: Robert DiFalco
Cc: mysql@lists.mysql.com
Subject: Re: Query Optimization Question

"Robert DiFalco" <[EMAIL PROTECTED]> wrote on 03/11/2006 12:43:43 PM:

> In a previous database engine I was using, an IN was more optimal than a
> <>. So, for example:
>
> SELECT * FROM table WHERE table.type IN (1,2,3);
>
> Where the possible values of type are 0-3, was appreciably faster than:
>
> SELECT * FROM table WHERE table.type <> 0;
>
> I've been playing with the Query Browser and checking out the
> optimization documents and haven't been able to make a clear call on
> whether or not this is also the case with MySQL/InnoDB.
>
> TIA,
>
> R.

YES, YES, YES! This is definitely an optimization.

When you say IN or =, you are asking for "matching values". Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan, which is still less efficient than ranged or values-based lookups.

It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for all BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on your temp table, then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries.

Always try to code for the affirmative tests. Your users will thank you.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
Re: Query Optimization Question
"Robert DiFalco" <[EMAIL PROTECTED]> wrote on 03/11/2006 12:43:43 PM: > In a previous database engine I was using an IN was more optimal than a > <>. So, for example: > > SELECT * FROM table WHERE table.type IN (1,2,3); > > Where the possible values of type are 0-3, was appreciably faster than: > > SELECT * FROM table WHERE table.type <> 0; > > I've been playing with the Query Browser and checking out the > optimization documents and haven't been able to make a clear call on > whether or not this is also the case with MySQL/InnoDB. > > TIA, > > R. > > YES, YES, YES! This is definitely an optimization. When you say IN or =, you are asking for "matching values". Matches can come from indexes. When you say <> or NOT IN, you are asking for everything BUT matches. In order to evaluate a negative, the database engine (and this is usually true regardless of database server) almost always performs a full table scan to test every row to make sure it is either <> or NOT IN. At the very best, they have to perform a full index scan which is still less efficient than ranged or values-based lookups. It's when you get into the situation where you are matching against dozens of IN-clause items that you may run into slowdowns again. Until you reach 2 or 3 dozen terms (depending on your hardware) you should be faster with an IN comparison than a <> or a NOT IN comparison. An optimization to search for BUT a term or two is to create a temporary table of all of your terms and delete the exact ones you want to exclude. Put an index on yoru temp table then JOIN that back into your query again (replacing the huge IN clause). The database will match index to index and things will get fast again. This technique can scale up to some really big queries. Always try to code for the affirmative tests. Your users will thank you. Shawn Green Database Administrator Unimin Corporation - Spruce Pine
Re: Query Optimization Question
Robert DiFalco wrote:
> In a previous database engine I was using, an IN was more optimal than a
> <>. So, for example:
>
> SELECT * FROM table WHERE table.type IN (1,2,3);
>
> Where the possible values of type are 0-3, was appreciably faster than:
>
> SELECT * FROM table WHERE table.type <> 0;

IN should be faster implemented with both hash tables and BTREEs, so nowadays it should also be faster than <>, as all MySQL implementations AFAIK use those well-known data structures for indexes.

> I've been playing with the Query Browser and checking out the
> optimization documents and haven't been able to make a clear call on
> whether or not this is also the case with MySQL/InnoDB.
>
> TIA,
>
> R.

--
Mladen Adamovic
http://home.blic.net/adamm
http://www.shortopedia.com
http://www.froola.com
Query Optimization Question
In a previous database engine I was using, an IN was more optimal than a <>. So, for example:

SELECT * FROM table WHERE table.type IN (1,2,3);

Where the possible values of type are 0-3, was appreciably faster than:

SELECT * FROM table WHERE table.type <> 0;

I've been playing with the Query Browser and checking out the optimization documents and haven't been able to make a clear call on whether or not this is also the case with MySQL/InnoDB.

TIA,

R.
Re: Query optimization question
When you don't have subselects, you have two options: temporary tables or JOINed queries.In your case, I think the temporary table is the better way to go. I would also eliminate the ABS() check so that I can compare values directly against the index. I know the math is correct your way but this way you are comparing values directly against the column which means that indexes can come into play. SET @targetLat = 44.6, @targetLon = -123.8, @Delta = 3 CREATE TEMPORARY TABLE tmpDeltaData SELECT city, state, country, latitude, longitude FROM Londata WHERE latitude BETWEEN (@[EMAIL PROTECTED]) AND (@targetLat + @Delta) AND longitude BETWEEN (@targetLon - @Delta) AND (@targetLon + @Delta) SELECT city, state, country, latitude, longitude, IF(latitude REGEXP '[0-9\\.]+$' AND longitude REGEXP'[0-9\\.]+$', ROUND(DEGREES(ACOS((SIN(RADIANS(@targetLat))) * (SIN(RADIANS(latitude))) + (COS(RADIANS(@targetLat))) * (COS(RADIANS(latitude))) * (COS(RADIANS(@targetLon -longitude) * 111),) as distance FROM tmpDeltaData ORDER BY distance DROP TEMPORARY TABLE tmpDeltaData I would also test the WHERE clause from tmpDeltaData with your original query to compare speeds of the two methods. Shawn Green Database Administrator Unimin Corporation - Spruce Pine Gerald Taylor <[EMAIL PROTECTED]> wrote on 10/04/2004 10:33:22 AM: > Query optimization question > > I am selecting from a single table but it has a lot of rows and it has > a very involved calculation. What I really want to do is > is FIRST restrict the number of rows so that the big calculation is only > performed on the ones that are within 3 degrees. > > Using 4.0.20 > > A sample query is given here: > The application interpolates variable values such as 44.6 into > the query string, so from mysql's > point of view they are constants, right? And the explain doc > says it optimizes constants, but it is looking at all the rows > and I see why. 
> SELECT city, state, country, latitude, longitude,
>   IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
>     ROUND(DEGREES(ACOS(
>       SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
>       + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
>       * COS(RADIANS(-123.28 - longitude)))) * 111),
>     NULL) AS distance
> FROM londata
> WHERE ABS(44.6 - latitude) <= 3.0 AND ABS(-123.28 - longitude) <= 3.0
> ORDER BY distance;
>
> I guess I can't do a subselect with my version... If I could, what would
> it look like? Something like below? (I might be able to talk the powers
> that be into an upgrade.) And if I can't, is it more horrible to
> manually create a temporary table and perform the calculations on it
> than to just do what I am doing?
>
> SELECT city, state, country, latitude, longitude,
>   IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
>     ROUND(DEGREES(ACOS(
>       SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
>       + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
>       * COS(RADIANS(-123.28 - longitude)))) * 111),
>     NULL) AS distance
> FROM (SELECT * FROM londata
>       WHERE ABS(44.6 - latitude) <= 3.0
>         AND ABS(-123.28 - longitude) <= 3.0) AS sublon
> ORDER BY distance;
>
> Thanks.
>
> GT
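As a sanity check on the arithmetic in the thread: the ROUND(DEGREES(ACOS(...)) * 111) expression is the spherical law of cosines, scaled by roughly 111 km per degree of arc. A sketch of the same prefilter-then-compute pipeline in Python (the city rows and the `nearby` helper are illustrative, not from the original post):

```python
import math

# Spherical-law-of-cosines distance, scaled by ~111 km per degree of arc,
# mirroring the SQL expression ROUND(DEGREES(ACOS(...)) * 111).
def distance_km(lat1, lon1, lat2, lon2):
    cos_arc = (math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
               + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
               * math.cos(math.radians(lon1 - lon2)))
    # Clamp to [-1, 1] to guard against floating-point drift before acos().
    cos_arc = max(-1.0, min(1.0, cos_arc))
    return math.degrees(math.acos(cos_arc)) * 111

# Cheap bounding-box prefilter (the temp-table step), then the expensive
# trig only on the survivors -- the same shape as the SQL version.
def nearby(rows, target_lat, target_lon, delta=3.0):
    box = [r for r in rows
           if target_lat - delta <= r[1] <= target_lat + delta
           and target_lon - delta <= r[2] <= target_lon + delta]
    return sorted(
        ((city, distance_km(lat, lon, target_lat, target_lon))
         for city, lat, lon in box),
        key=lambda pair: pair[1])

rows = [("Corvallis", 44.6, -123.28),
        ("Salem",     44.94, -123.03),
        ("Denver",    39.74, -104.99)]  # outside the 3-degree box
print(nearby(rows, 44.6, -123.28))
```

One degree of latitude comes out as ~111 km, which is why the box prefilter with delta = 3 is a safe superset of the 3-degree circle the final sort then refines.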
Re: Query optimization question
Gerald Taylor wrote:
> I am selecting from a single table, but it has a lot of rows and a very
> involved calculation. What I really want to do is FIRST restrict the
> number of rows so that the big calculation is only performed on the ones
> that are within 3 degrees.
>
> Using 4.0.20
>
> [sample query and proposed subselect snipped]

A subselect won't help. You still have to scan the whole table to do the calculation for the WHERE clause. Pre-calculate your min and max lat. and lon.:
WHERE latitude <= @maxlat AND latitude >= @minlat
  AND longitude <= @maxlon AND longitude >= @minlon
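The point of pre-calculating the bounds is that a plain range predicate on the bare column can use an index, while `ABS(44.6 - latitude) <= 3.0` wraps the column in a function and forces a full scan. A sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in (the schema is illustrative; on MySQL, EXPLAIN's type/key columns show the same distinction):

```python
# Illustrative sketch: SQLite stands in for MySQL here; the londata schema
# is reduced to the columns the thread mentions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE londata (city TEXT, latitude REAL, longitude REAL)")
conn.execute("CREATE INDEX idx_lat ON londata (latitude)")

def plan(sql):
    # Concatenate the plan's detail column for easy inspection.
    return " ".join(row[3] for row in
                    conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function wrapped around the column: the index on latitude cannot be used.
abs_plan = plan("SELECT * FROM londata WHERE ABS(44.6 - latitude) <= 3.0")

# Pre-computed constant bounds: a plain range the index can serve.
range_plan = plan("SELECT * FROM londata WHERE latitude BETWEEN 41.6 AND 47.6")

print(abs_plan)    # a full-table SCAN
print(range_plan)  # a SEARCH using idx_lat
```

The same reasoning is why the earlier reply rewrote the ABS() checks as BETWEEN clauses when building the temporary table.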
Query optimization question
I am selecting from a single table, but it has a lot of rows and a very involved calculation. What I really want to do is FIRST restrict the number of rows so that the big calculation is only performed on the ones that are within 3 degrees.

Using 4.0.20

A sample query is given here. The application interpolates variable values such as 44.6 into the query string, so from mysql's point of view they are constants, right? And the explain doc says it optimizes constants, but it is looking at all the rows, and I see why.

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
    ROUND(DEGREES(ACOS(
      SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
      + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
      * COS(RADIANS(-123.28 - longitude)))) * 111),
    NULL) AS distance
FROM londata
WHERE ABS(44.6 - latitude) <= 3.0 AND ABS(-123.28 - longitude) <= 3.0
ORDER BY distance;

I guess I can't do a subselect with my version... If I could, what would it look like? Something like below? (I might be able to talk the powers that be into an upgrade.) And if I can't, is it more horrible to manually create a temporary table and perform the calculations on it than to just do what I am doing?

SELECT city, state, country, latitude, longitude,
  IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP '[0-9]+$',
    ROUND(DEGREES(ACOS(
      SIN(RADIANS(44.6)) * SIN(RADIANS(latitude))
      + COS(RADIANS(44.6)) * COS(RADIANS(latitude))
      * COS(RADIANS(-123.28 - longitude)))) * 111),
    NULL) AS distance
FROM (SELECT * FROM londata
      WHERE ABS(44.6 - latitude) <= 3.0
        AND ABS(-123.28 - longitude) <= 3.0) AS sublon
ORDER BY distance;

Thanks.

GT