Re: Slow Query Question - Need help of Gurus.
Ben Dinnerville wrote: Note that sorting by the count can't use an index, so it will be slower than if you had ordered by `Call Svc Tag ID`. This is something that will need testing. Ordering on a varchar(255) column (call svc tag ID) is going to be a lot more inefficient than ordering on a numeric column, especially when the ordering is happening on the reduced result set of the group by and predicate clauses, however the indexing available on the "real" column may negate any deficit. The ordering happens on the reduced result set either way, so that isn't relevant here. The key point is the index. Without any index, it is certainly true that a varchar(255) should take longer to sort than an int, but in this case the varchar column is already sorted in the index, while the int is the result of a calculation. No additional work is needed to sort the varchar, while the int must be sorted. On the other hand, sorting ints is usually fast, so I don't think this should be a big factor. MySQL handles BETWEEN just fine. If you think about it, you explicitly set the range with BETWEEN, but the optimizer has to put the two inequalities joined with AND together to get the same range. In other words, BETWEEN is easier. I am sure that MySQL does handle the "between" just fine, but this is not to say that it will handle the between better that a gt / lt clause in this particular scenario. I know the between sounds like it might be easier, but internally the between is likely to be translated into a gt / lt clause by the optimiser anyhow as the only way to check if something is between 2 values is the check if it is greater than the min and less than the max. Just give the 2 different queries a go and see if there is any time difference. I won't pretend to be an expert on the internals of the optimizer, but I think you are missing the value of the index. Another way to tell if a value is between two others is to check its relative position in the index. Both the between and the 2 inequalities define a range. With the index, the optimizer need merely find the start and end of the range and then grab everything from the one to the other. I am certainly in favor of testing, but I'd be surprised if you saw a measurable difference between the two. Also noticed that the table seems to be a fairly flat structure (hard to tell definitely from looking at one sample). By this I mean that there are a lot of varchar(255)'s in the table, and looking at the naming of the columns, these seem to be "id" type fields. You might be able to gain some performance by normalising the database a little by moving some of the large varchar fields out to a related table and referencing by a numeric type id, this could greatly reduce the amount of data in your base table, especially considering the number of rows you are talking about storing and could give you an IO based performance increase. Once again, might need to test and play around a little with different models here. Excellent point. Normalizing would help a lot. Unless there are 500,000 Call Reps, there's a LOT of wasted space in this table. To make matters worse, the rows are variable length, so there's a penalty for wasted space. At the least, the Call Rep info should be in its own table, and the 6 Call Rep columns should be replaced with an int column containing the Call Rep key. Similarly, the Case Status and Substatus should be moved to their own table(s?). If at all possible, all tables should be fixed length (CHARs) of a reasonable size (will you ever really have a 255 char Call Rep Employee Name?) rather than variable length (VARCHARs). Usually, I'd say that disk is cheap, so go ahead and waste some space to save on time, but in this case, I expect we'll save so much space from normalizing that even changing VARCHAR to CHAR we'll still end up ahead spacewise. Dropping any unused indexes could also help, in general. For example, unless we need to select based on time of day without regard to date, the index on `Journal Create Time` is unlikely to be used. In that case drop both date and time indexes and replace them with one multicolumn index. Also, are you looking for null values, or the word "null" in the column? I noticed that too and forgot to mention it. Michael -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Slow Query Question - Need help of Gurus.
Ben, - Original Message - From: "Ben Dinnerville" Sent: Monday, April 19, 2004 1:49 AM Subject: RE: Slow Query Question - Need help of Gurus. >>Then try again: >> >> SELECT `Call Svc Tag ID`, >> Count(*) as counter, >> `Journal Create Date` >> FROM 31909_859552 >> WHERE `Journal Create Date` between '2004-03-13' AND '2004-03-16' >> AND `Call Svc Tag ID`<>'null' >> GROUP BY `Call Svc Tag ID` >> HAVING counter > 3 >> ORDER BY counter; > The "count(*)" will be causing some havoc here, as all columns in the > underlying table(s) will have to be read in, negating any direct hits to the > index's and causing a lot more IO than is needed. Change it to a count on > one of the columns in the result set or simply a "count(1) as counter" - > will give you the same result without any IO. COUNT(*) is not a problem. It won't cause the data file to be read if just the index can be used. EXPLAIN will show the same plan for COUNT(*) and COUNT(1). :-) Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Slow Query Question - Need help of Gurus.
>>The indexes were listed at the bottom of the original post. Woops, didn’t see that far down, should have scrolled a little further :) >>What is needed, I expect, is a >>multi-column index on those 2 columns: >> >> ALTER TABLE 31909_859552 >> ADD INDEX Tag_Created (`Call Svc Tag ID`, `Journal Create Date`); Definatly get an index across all queried fields, especially in this case where there are only 2 columns in the result set, you may be able to avoid hitting the data leaf's of the table all together and retrieve all info from the index alone, saving you the added IO on the data leaf's >>Then try again: >> >> SELECT `Call Svc Tag ID`, >> Count(*) as counter, >> `Journal Create Date` >> FROM 31909_859552 >> WHERE `Journal Create Date` between '2004-03-13' AND '2004-03-16' >> AND `Call Svc Tag ID`<>'null' >> GROUP BY `Call Svc Tag ID` >> HAVING counter > 3 >> ORDER BY counter; The "count(*)" will be causing some havoc here, as all columns in the underlying table(s) will have to be read in, negating any direct hits to the index's and causing a lot more IO than is needed. Change it to a count on one of the columns in the result set or simply a "count(1) as counter" - will give you the same result without any IO. >>Note that sorting by the count can't use an index, so it will be slower >>than >>if you had ordered by `Call Svc Tag ID`. This is something that will need testing. Ordering on a varchar(255) column (call svc tag ID) is going to be a lot more inefficient that ordering on a numeric column, especially when the ordering is happening on the reduced result set of the group by and predicate clauses, however the indexing available on the "real" column may negate any deficit. >>MySQL handles BETWEEN just fine. If you think about it, you explicitly >>set >>the range with BETWEEN, but the optimizer has to put the two inequalities >>joined with AND together to get the same range. In other words, BETWEEN >>is >>easier. I am sure that MySQL does handle the "between" just fine, but this is not to say that it will handle the between better that a gt / lt clause in this particular scenario. I know the between sounds like it might be easier, but internally the between is likely to be translated into a gt / lt clause by the optimiser anyhow as the only way to check if something is between 2 values is the check if it is greater than the min and less than the max. Just give the 2 different queries a go and see if there is any time difference. Also noticed that the table seems to be a fairly flat structure (hard to tell defiantly from looking at one sample). By this I mean that there are a lot of varchar(255)'s in the table, and looking at the naming of the columns, these seem to be "id" type fields. You might be able to gain some performance by normalising the database a little by moving some of the large varchar fields out to a related table and referencing by a numeric type id, this could greatly reduce the amount of data in your base table, especially considering the number of rows you are talking about storing and could give you an IO based performance increase. Once again, might need to test and play around a little with different models here. What kind of proportion of null values are you expecting for the Call Svc Tag ID column? Depending on this, you may or may not be better off restructuring the query to filter out the null values post grouping - eg as an extra for the having clause "having counter >3 and call svc tag id <> null) - My mind is not super clear on this one at the moment, but am not sure if / how null values get indexed in MySQL (any feedback on this one anyone?) another one that maybe only testing will show if it helps or not. Also, are you looking for null values, or the word "null" in the column? --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.659 / Virus Database: 423 - Release Date: 15/04/2004 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Slow Query Question - Need help of Gurus.
Ben Dinnerville wrote: You have a redundant clause in your query - the distinct is not needed when you are doing a group by on the same field ('Call Svr Tag ID') - not sure how the optimiser in MySQL will handle this. In some RDBMS's the duplicate clause will be optimised down to 1 clause, so you wont cop a performance hit, but just because you think an optimiser will do something is no reason not to write the query right the first time - always makes it easier on the optimiser and write the query with as few possible predicates and clauses as possible. DISTINCT is not a function you feed a column into. It is a SELECT keyword which prevents duplicate rows. For example, CREATE TABLE t (v1 int, v2 int); INSERT INTO t VALUES (1, 1), (1, 2); SELECT DISTINCT(v1), v2 FROM t; +--+--+ | v1 | v2 | +--+--+ |1 |1 | |1 |2 | +--+--+ 2 rows in set (0.00 sec) Once you add GROUP BY, you are guaranteed unique rows, one for each group, so DISTINCT adds nothing. What indexes do you have on the table? You need an index on the 'Journal Create Date' (PS identifiers with spaces in the name are bad practice!). An index on the Call_Svc_Tag_ID column may also assist with the group by function - you need to run an explain to see what indexes are needed. The indexes were listed at the bottom of the original post. He already has indexes on `Journal Create Date` and `Call Svc Tag ID`, but MySQL will only use one index per table, so separate indexes won't help much. He should definitely run an EXPLAIN. I expect EXPLAIN will list both as possible keys, and which, if any, it picked. What is needed, I expect, is a multi-column index on those 2 columns: ALTER TABLE 31909_859552 ADD INDEX Tag_Created (`Call Svc Tag ID`, `Journal Create Date`); Then try again: SELECT `Call Svc Tag ID`, Count(*) as counter, `Journal Create Date` FROM 31909_859552 WHERE `Journal Create Date` between '2004-03-13' AND '2004-03-16' AND `Call Svc Tag ID`<>'null' GROUP BY `Call Svc Tag ID` HAVING counter > 3 ORDER BY counter; Note that sorting by the count can't use an index, so it will be slower than if you had ordered by `Call Svc Tag ID`. Also, not sure how the optimiser and indexing in MySQL handles the between predicate, but you might want to try doing a greater than equal to test and a less than test in the place of the between test - eg WHERE Journal_Create_Date >= '2004-03-13' AND Journal_Create_Date < '2004-03-16' MySQL handles BETWEEN just fine. If you think about it, you explicitly set the range with BETWEEN, but the optimizer has to put the two inequalities joined with AND together to get the same range. In other words, BETWEEN is easier. In answer to one of the other follow ups, a count() should not result in a full table scan when there are other predicates in the query - they will be resolved first and then the count() will be done over the temporary result set. > Cheers, Ben -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Hi All, I have a huge Database which contains about 500,000 records, (will be adding about 20-25k records every week) I need to run group queries and output the same to a web interface. - At present a select takes anywhere from 20 seconds to 10 minutes on such queries. MySQL/MyISAM is designed to give fast SELECTS, so 10 minutes seems excessive. What can I do? Till now we were using MS Access, and no web based interface. Running a normal MySQL 4.0 install on windows 2000, 256 MB RAM, 2.6 GHz P4 NON-HT About 10 users at a time would be connecting initially. I'm no Win2000 expert, but that doesn't sound like enough RAM, especially if the web server is on the same machine. Ideally, you'd like your data cached in RAM, but this one table appears to be 100 Mb or so, not even counting the indexes. With a query taking 10 minutes, I wonder if you're running out of memory and thrashing the disk. I know theres a lot of tweaking possible with MySQL. Please advise, because I would need a lot of such queries, fetching through JSP, and displaying them in a presentable form. Table Structure is as Follows: Journal Created By varchar(255) No Case Substatus ID varchar(255) Yes NULL Case Status ID varchar(255) Yes NULL Call Rep Employee Name varchar(255) Yes NULL Call Rep Employee ID varchar(10) Yes NULL Call Rep DPS Login ID varchar(10) Yes NULL Call Rep Profile ID varchar(15) Yes NULL Call Rep Section Code varchar(15) Yes NULL Call Rep Country Code char(3) Yes NULL Journal Create Date date No -00-00 Journal Create Time time Yes NULL Call Svc Tag ID varchar(255) Yes NULL Order Num of Svc Tag varchar(255) Yes NULL > Indexes
RE: Slow Query Question - Need help of Gurus.
You have a redundant clause in your query - the distinct is not needed when you are doing a group by on the same field ('Call Svr Tag ID') - not sure how the optimiser in MySQL will handle this. In some RDBMS's the duplicate clause will be optimised down to 1 clause, so you wont cop a performance hit, but just because you think an optimiser will do something is no reason not to write the query right the first time - always makes it easier on the optimiser and write the query with as few possible predicates and clauses as possible. What indexes do you have on the table? You need an index on the 'Journal Create Date' (PS identifiers with spaces in the name are bad practice!). An index on the Call_Cvc_Tag_ID column may also assist with the group by function - you need to run an explain to see what indexes are needed. Also, not sure how the optimiser and indexing in MySQL handles the between predicate, but you might want to try doing a greater than equal to test and a less than test in the place of the between test - eg WHERE Journal_Create_Date >= '2004-03-13' AND Journal_Create_Date < '2004-03-16' In answer to one of the other follow ups, a count() should not result in a full table scan when there are other predicates in the query - they will be resolved first and then the count() will be done over the temporary result set. Cheers, Ben -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Sunday, 18 April 2004 06:50 To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Slow Query Question - Need help of Gurus. Importance: High Hi All, I have a huge Database which contains about 500,000 records, (will be adding about 20-25k records every week) I need to run group queries and output the same to a web interface. An example is: SELECT DISTINCT(`Call Svc Tag ID`),Count(`Call Svc Tag ID`) as counter, `Journal Create Date` FROM 31909_859552 WHERE `Journal Create Date` between '2004-03-13' AND '2004-03-16' and `Call Svc Tag ID`<>'null' GROUP BY `Call Svc Tag ID` having count(`Call Svc Tag ID`)>3 order by counter; --basically finding out repeat journal entries over a period of 3 days for the same tag.. (the no. of days may vary.. 2/3/4/5... , and same goes with the having count clause..) The output results to about 150-1 rows or so.. Based on selections -I've designed the table to be a MyISAM (I don't care how long inserts take, that will be a once-a-week-one-time process on the server, I want the selects to be fast) - At present a select takes anywhere from 20 seconds to 10 minutes on such queries. What can I do? Till now we were using MS Access, and no web based interface. Running a normal MySQL 4.0 install on windows 2000, 256 MB RAM, 2.6 GHz P4 NON-HT About 10 users at a time would be connecting initially. I know theres a lot of tweaking possible with MySQL. Please advise, because I would need a lot of such queries, fetching through JSP, and displaying them in a presentable form. Table Structure is as Follows: Journal Created By varchar(255) No Case Substatus ID varchar(255) Yes NULL Case Status ID varchar(255) Yes NULL Call Rep Employee Name varchar(255) Yes NULL Call Rep Employee ID varchar(10) Yes NULL Call Rep DPS Login ID varchar(10) Yes NULL Call Rep Profile ID varchar(15) Yes NULL Call Rep Section Code varchar(15) Yes NULL Call Rep Country Code char(3) Yes NULL Journal Create Date date No -00-00 Journal Create Time time Yes NULL Call Svc Tag ID varchar(255) Yes NULL Order Num of Svc Tag varchar(255) Yes NULL Indexes: Keyname Type Cardinality Action Field Journal Created By INDEX None Journal Created By Call Rep Employee ID INDEX None Call Rep Employee ID Call Rep DPS Login ID INDEX None Call Rep DPS Login ID Call Rep Profile ID INDEX None Call Rep Profile ID Call Rep Country Code INDEX None Call Rep Country Code Journal Create Time INDEX None Journal Create Time Journal Create Date INDEX None Journal Create Date Call Svc Tag ID INDEX None Call Svc Tag ID Regards, Amit Wadhwa, Data Analyst -- MySQL Java Mailing List For list archives: http://lists.mysql.com/java To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] --- Incoming mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.659 / Virus Database: 423 - Release Date: 15/04/2004 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.659 / Virus Database: 423 - Release Date: 15/04/2004 -- MySQL General Mailing L
Slow Query Question - Need help of Gurus.
Hi All, I have a huge Database which contains about 500,000 records, (will be adding about 20-25k records every week) I need to run group queries and output the same to a web interface. An example is: SELECT DISTINCT(`Call Svc Tag ID`),Count(`Call Svc Tag ID`) as counter, `Journal Create Date` FROM 31909_859552 WHERE `Journal Create Date` between '2004-03-13' AND '2004-03-16' and `Call Svc Tag ID`<>'null' GROUP BY `Call Svc Tag ID` having count(`Call Svc Tag ID`)>3 order by counter; --basically finding out repeat journal entries over a period of 3 days for the same tag.. (the no. of days may vary.. 2/3/4/5... , and same goes with the having count clause..) The output results to about 150-1 rows or so.. Based on selections -I've designed the table to be a MyISAM (I don't care how long inserts take, that will be a once-a-week-one-time process on the server, I want the selects to be fast) - At present a select takes anywhere from 20 seconds to 10 minutes on such queries. What can I do? Till now we were using MS Access, and no web based interface. Running a normal MySQL 4.0 install on windows 2000, 256 MB RAM, 2.6 GHz P4 NON-HT About 10 users at a time would be connecting initially. I know theres a lot of tweaking possible with MySQL. Please advise, because I would need a lot of such queries, fetching through JSP, and displaying them in a presentable form. Table Structure is as Follows: Journal Created By varchar(255) No Case Substatus ID varchar(255) Yes NULL Case Status ID varchar(255) Yes NULL Call Rep Employee Name varchar(255) Yes NULL Call Rep Employee ID varchar(10) Yes NULL Call Rep DPS Login ID varchar(10) Yes NULL Call Rep Profile ID varchar(15) Yes NULL Call Rep Section Code varchar(15) Yes NULL Call Rep Country Code char(3) Yes NULL Journal Create Date date No -00-00 Journal Create Time time Yes NULL Call Svc Tag ID varchar(255) Yes NULL Order Num of Svc Tag varchar(255) Yes NULL Indexes: Keyname Type Cardinality Action Field Journal Created By INDEX None Journal Created By Call Rep Employee ID INDEX None Call Rep Employee ID Call Rep DPS Login ID INDEX None Call Rep DPS Login ID Call Rep Profile ID INDEX None Call Rep Profile ID Call Rep Country Code INDEX None Call Rep Country Code Journal Create Time INDEX None Journal Create Time Journal Create Date INDEX None Journal Create Date Call Svc Tag ID INDEX None Call Svc Tag ID Regards, Amit Wadhwa, Data Analyst -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]