Re: Slow Query Question - Need help of Gurus.

2004-04-19 Thread Michael Stassen


Ben Dinnerville wrote:


Note that sorting by the count can't use an index, so it will be slower
than if you had ordered by `Call Svc Tag ID`.
This is something that will need testing. Ordering on a varchar(255) column
(call svc tag ID) is going to be a lot more inefficient than ordering on a
numeric column, especially when the ordering is happening on the reduced
result set of the group by and predicate clauses, however the indexing
available on the "real" column may negate any deficit.
The ordering happens on the reduced result set either way, so that isn't 
relevant here.  The key point is the index.  Without any index, it is 
certainly true that a varchar(255) should take longer to sort than an int, 
but in this case the varchar column is already sorted in the index, while 
the int is the result of a calculation.  No additional work is needed to 
sort the varchar, while the int must be sorted.  On the other hand, sorting 
ints is usually fast, so I don't think this should be a big factor.

MySQL handles BETWEEN just fine. If you think about it, you
explicitly set the range with BETWEEN, but the optimizer has to put
the two inequalities joined with AND together to get the same range.
In other words, BETWEEN is  easier.
I am sure that MySQL does handle the "between" just fine, but this is not to
say that it will handle the between better than a gt / lt clause in this
particular scenario. I know the between sounds like it might be easier, but
internally the between is likely to be translated into a gt / lt clause by
the optimiser anyhow, as the only way to check if something is between 2
values is to check whether it is greater than the min and less than the max.
Just give the 2 different queries a go and see if there is any time
difference.
I won't pretend to be an expert on the internals of the optimizer, but I 
think you are missing the value of the index.  Another way to tell if a 
value is between two others is to check its relative position in the index. 
 Both the between and the 2 inequalities define a range.  With the index, 
the optimizer need merely find the start and end of the range and then grab 
everything from the one to the other.  I am certainly in favor of testing, 
but I'd be surprised if you saw a measurable difference between the two.
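If you do want to test it, EXPLAIN makes the comparison easy.  Something
like the following (same table and columns as in the original post), run
once with BETWEEN and once with the two inequalities -- whichever access
plan the optimizer picks, it should be the same plan for both forms:

   EXPLAIN SELECT `Call Svc Tag ID`, COUNT(*) AS counter
   FROM 31909_859552
   WHERE `Journal Create Date` BETWEEN '2004-03-13' AND '2004-03-16'
     AND `Call Svc Tag ID` <> 'null'
   GROUP BY `Call Svc Tag ID`;

   EXPLAIN SELECT `Call Svc Tag ID`, COUNT(*) AS counter
   FROM 31909_859552
   WHERE `Journal Create Date` >= '2004-03-13'
     AND `Journal Create Date` <= '2004-03-16'
     AND `Call Svc Tag ID` <> 'null'
   GROUP BY `Call Svc Tag ID`;

If the plans match, the two queries will perform the same.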

Also noticed that the table seems to be a fairly flat structure (hard to
tell definitively from looking at one sample). By this I mean that there are a
lot of varchar(255)'s in the table, and looking at the naming of the
columns, these seem to be "id" type fields. You might be able to gain some
performance by normalising the database a little by moving some of the large
varchar fields out to a related table and referencing by a numeric type id.
This could greatly reduce the amount of data in your base table, especially
considering the number of rows you are talking about storing, and could give
you an IO based performance increase. Once again, might need to test and
play around a little with different models here.
Excellent point.  Normalizing would help a lot.  Unless there are 500,000 
Call Reps, there's a LOT of wasted space in this table.  To make matters 
worse, the rows are variable length, so there's a penalty for wasted space. 
 At the least, the Call Rep info should be in its own table, and the 6 Call 
Rep columns should be replaced with an int column containing the Call Rep 
key.  Similarly, the Case Status and Substatus should be moved to their own 
table(s?).  If at all possible, all tables should be fixed length (CHARs) of 
a reasonable size (will you ever really have a 255 char Call Rep Employee 
Name?) rather than variable length (VARCHARs).  Usually, I'd say that disk 
is cheap, so go ahead and waste some space to save on time, but in this 
case, I expect we'll save so much space from normalizing that even changing 
VARCHAR to CHAR we'll still end up ahead spacewise.
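To sketch the idea (the table and column names here are invented for
illustration, not taken from the original schema), the Call Rep columns
could move out to something like:

   CREATE TABLE call_rep (
     call_rep_id   INT NOT NULL AUTO_INCREMENT,
     employee_name CHAR(50) NOT NULL,
     employee_id   CHAR(10),
     dps_login_id  CHAR(10),
     profile_id    CHAR(15),
     section_code  CHAR(15),
     country_code  CHAR(3),
     PRIMARY KEY (call_rep_id)
   ) TYPE=MyISAM;

Then the 6 Call Rep columns in 31909_859552 are replaced by a single INT
column holding call_rep_id, and the rep details are pulled in with a join
only when they are actually needed.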

Dropping any unused indexes could also help, in general.  For example, 
unless we need to select based on time of day without regard to date, the 
index on `Journal Create Time` is unlikely to be used.  In that case drop 
both date and time indexes and replace them with one multicolumn index.
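Using the key names shown at the bottom of the original post (which match
the column names), that would be something like:

   ALTER TABLE 31909_859552
   DROP INDEX `Journal Create Date`,
   DROP INDEX `Journal Create Time`,
   ADD INDEX Date_Time (`Journal Create Date`, `Journal Create Time`);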


Also, are you looking for null values, or the word "null" in the column?  
I noticed that too and forgot to mention it.
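If it's real NULLs that should be excluded, the clearer test is IS NOT NULL
rather than a comparison with the string 'null', e.g.:

   SELECT `Call Svc Tag ID`, COUNT(*) AS counter
   FROM 31909_859552
   WHERE `Journal Create Date` BETWEEN '2004-03-13' AND '2004-03-16'
     AND `Call Svc Tag ID` IS NOT NULL
   GROUP BY `Call Svc Tag ID`
   HAVING counter > 3
   ORDER BY counter;

(A real NULL happens to fail the <> 'null' comparison as well, since
comparing NULL with anything yields unknown, but saying what you mean is
safer.)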

Michael



Re: Slow Query Question - Need help of Gurus.

2004-04-19 Thread Matt W
Ben,


- Original Message -
From: "Ben Dinnerville"
Sent: Monday, April 19, 2004 1:49 AM
Subject: RE: Slow Query Question - Need help of Gurus.





>>Then try again:
>>
>>   SELECT `Call Svc Tag ID`,
>>   Count(*) as counter,
>>  `Journal Create Date`
>>   FROM 31909_859552
>>   WHERE `Journal Create Date` between '2004-03-13' AND  '2004-03-16'
>>   AND `Call Svc Tag ID`<>'null'
>>   GROUP BY `Call Svc Tag ID`
>>   HAVING counter > 3
>>   ORDER BY counter;

> The "count(*)" will be causing some havoc here, as all columns in the
> underlying table(s) will have to be read in, negating any direct hits
to the
> index's and causing a lot more IO than is needed. Change it to a count
on
> one of the columns in the result set or simply a "count(1) as
counter" -
> will give you the same result without any IO.

COUNT(*) is not a problem.  It won't cause the data file to be read if
just the index can be used.  EXPLAIN will show the same plan for
COUNT(*) and COUNT(1).  :-)
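It's easy to verify.  Running something like

   EXPLAIN SELECT `Call Svc Tag ID`, COUNT(*) AS counter
   FROM 31909_859552
   WHERE `Journal Create Date` BETWEEN '2004-03-13' AND '2004-03-16'
   GROUP BY `Call Svc Tag ID`;

and then the identical statement with COUNT(1) should give the same plan,
and with the multi-column index Michael suggested the Extra column should
show "Using index", meaning the data file isn't touched either way.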


Matt





RE: Slow Query Question - Need help of Gurus.

2004-04-18 Thread Ben Dinnerville

>>The indexes were listed at the bottom of the original post.  
Woops, didn’t see that far down, should have scrolled a little further :)

>>What is needed, I expect, is a 
>>multi-column index on those 2 columns:
>>
>>   ALTER TABLE 31909_859552
>>   ADD INDEX Tag_Created (`Call Svc Tag ID`, `Journal Create Date`);

Definitely get an index across all queried fields. Especially in this case,
where there are only 2 columns in the result set, you may be able to avoid
hitting the data leaves of the table altogether and retrieve all info from
the index alone, saving you the added IO on the data leaves.


>>Then try again:
>>
>>   SELECT `Call Svc Tag ID`,
>>   Count(*) as counter,
>>  `Journal Create Date`
>>   FROM 31909_859552
>>   WHERE `Journal Create Date` between '2004-03-13' AND  '2004-03-16'
>>   AND `Call Svc Tag ID`<>'null'
>>   GROUP BY `Call Svc Tag ID`
>>   HAVING counter > 3
>>   ORDER BY counter;

The "count(*)" will be causing some havoc here, as all columns in the
underlying table(s) will have to be read in, negating any direct hits to the
index's and causing a lot more IO than is needed. Change it to a count on
one of the columns in the result set or simply a "count(1) as counter" -
will give you the same result without any IO.

>>Note that sorting by the count can't use an index, so it will be slower
>>than 
>>if you had ordered by `Call Svc Tag ID`.
This is something that will need testing. Ordering on a varchar(255) column
(call svc tag ID) is going to be a lot more inefficient than ordering on a
numeric column, especially when the ordering is happening on the reduced
result set of the group by and predicate clauses, however the indexing
available on the "real" column may negate any deficit.

>>MySQL handles BETWEEN just fine.  If you think about it, you explicitly
>>set 
>>the range with BETWEEN, but the optimizer has to put the two inequalities 
>>joined with AND together to get the same range.  In other words, BETWEEN
>>is 
>>easier.
I am sure that MySQL does handle the "between" just fine, but this is not to
say that it will handle the between better than a gt / lt clause in this
particular scenario. I know the between sounds like it might be easier, but
internally the between is likely to be translated into a gt / lt clause by
the optimiser anyhow, as the only way to check if something is between 2
values is to check whether it is greater than the min and less than the max.
Just give the 2 different queries a go and see if there is any time
difference.

Also noticed that the table seems to be a fairly flat structure (hard to
tell definitively from looking at one sample). By this I mean that there are a
lot of varchar(255)'s in the table, and looking at the naming of the
columns, these seem to be "id" type fields. You might be able to gain some
performance by normalising the database a little by moving some of the large
varchar fields out to a related table and referencing by a numeric type id.
This could greatly reduce the amount of data in your base table, especially
considering the number of rows you are talking about storing, and could give
you an IO based performance increase. Once again, might need to test and
play around a little with different models here.

What kind of proportion of null values are you expecting for the Call Svc
Tag ID column? Depending on this, you may or may not be better off
restructuring the query to filter out the null values post grouping - e.g. as
an extra condition in the having clause: "having counter > 3 and call svc tag
id <> null". My mind is not super clear on this one at the moment, and I am
not sure if / how null values get indexed in MySQL (any feedback on this one,
anyone?) - another one where maybe only testing will show if it helps or not.
Also, are you looking for null values, or the word "null" in the column?
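For reference, the two forms of the null filter I'm comparing above would
look something like this (keeping the <> 'null' literal from the original
query):

   -- filter before grouping (as in the original query)
   SELECT `Call Svc Tag ID`, COUNT(*) AS counter
   FROM 31909_859552
   WHERE `Journal Create Date` BETWEEN '2004-03-13' AND '2004-03-16'
     AND `Call Svc Tag ID` <> 'null'
   GROUP BY `Call Svc Tag ID`
   HAVING counter > 3;

   -- filter after grouping
   SELECT `Call Svc Tag ID`, COUNT(*) AS counter
   FROM 31909_859552
   WHERE `Journal Create Date` BETWEEN '2004-03-13' AND '2004-03-16'
   GROUP BY `Call Svc Tag ID`
   HAVING counter > 3 AND `Call Svc Tag ID` <> 'null';

In general the WHERE version has fewer rows to group, so moving the filter
into HAVING would need a good reason (e.g. index behaviour with NULLs).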




Re: Slow Query Question - Need help of Gurus.

2004-04-18 Thread Michael Stassen
Ben Dinnerville wrote:

You have a redundant clause in your query - the distinct is not needed when
you are doing a group by on the same field ('Call Svc Tag ID') - not sure
how the optimiser in MySQL will handle this. In some RDBMS's the duplicate
clause will be optimised down to 1 clause, so you won't cop a performance
hit, but just because you think an optimiser will do something is no reason
not to write the query right the first time - it is always easier on the
optimiser if you write the query with as few predicates and clauses as
possible.
DISTINCT is not a function you feed a column into.  It is a SELECT keyword 
which prevents duplicate rows.  For example,

  CREATE TABLE t (v1 int, v2 int);
  INSERT INTO t VALUES (1, 1), (1, 2);
  SELECT DISTINCT(v1), v2 FROM t;
  +--+--+
  | v1   | v2   |
  +--+--+
  |1 |1 |
  |1 |2 |
  +--+--+
  2 rows in set (0.00 sec)
Once you add GROUP BY, you are guaranteed unique rows, one for each group, 
so DISTINCT adds nothing.

What indexes do you have on the table?
You need an index on the 'Journal Create Date' (PS identifiers with spaces
in the name are bad practice!).
An index on the Call_Svc_Tag_ID column may also assist with the group by
function - you need to run an explain to see what indexes are needed.
The indexes were listed at the bottom of the original post.  He already has 
indexes on `Journal Create Date` and `Call Svc Tag ID`, but MySQL will only 
use one index per table, so separate indexes won't help much.  He should 
definitely run an EXPLAIN.  I expect EXPLAIN will list both as possible 
keys, and show which, if any, it picked.  What is needed, I expect, is a 
multi-column index on those 2 columns:

  ALTER TABLE 31909_859552
  ADD INDEX Tag_Created (`Call Svc Tag ID`, `Journal Create Date`);
Then try again:

  SELECT `Call Svc Tag ID`,
  Count(*) as counter,
 `Journal Create Date`
  FROM 31909_859552
  WHERE `Journal Create Date` between '2004-03-13' AND  '2004-03-16'
  AND `Call Svc Tag ID`<>'null'
  GROUP BY `Call Svc Tag ID`
  HAVING counter > 3
  ORDER BY counter;
Note that sorting by the count can't use an index, so it will be slower than 
if you had ordered by `Call Svc Tag ID`.

Also, not sure how the optimiser and indexing in MySQL handles the between
predicate, but you might want to try doing a greater than equal to test and
a less than test in the place of the between test - eg
WHERE Journal_Create_Date >= '2004-03-13' AND Journal_Create_Date <
'2004-03-16'
MySQL handles BETWEEN just fine.  If you think about it, you explicitly set 
the range with BETWEEN, but the optimizer has to put the two inequalities 
joined with AND together to get the same range.  In other words, BETWEEN is 
easier.

In answer to one of the other follow ups, a count() should not result in a
full table scan when there are other predicates in the query - they will be
resolved first and then the count() will be done over the temporary result
set.
>
Cheers,

Ben

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 

Hi All,
I have a huge Database which contains about 500,000 records, (will be
adding about 20-25k records every week)
I need to run group queries and output the same to a web interface. 

- At present a select takes anywhere from 20 seconds to 10 minutes on
such queries.
MySQL/MyISAM is designed to give fast SELECTS, so 10 minutes seems excessive.

What can I do?

Till now we were using MS Access, and no web based interface.
Running a normal MySQL 4.0 install on windows 2000, 256 MB RAM, 2.6 GHz
P4 NON-HT
About 10 users at a time would be connecting initially.
I'm no Win2000 expert, but that doesn't sound like enough RAM, especially if 
the web server is on the same machine.  Ideally, you'd like your data cached 
in RAM, but this one table appears to be 100 Mb or so, not even counting the 
indexes.  With a query taking 10 minutes, I wonder if you're running out of 
memory and thrashing the disk.
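If memory does turn out to be the issue, the MyISAM-related settings in
my.ini are the place to start.  Purely as an illustration (the numbers are
placeholders, not tuned recommendations for this workload):

   [mysqld]
   key_buffer_size  = 64M    # cache for MyISAM index blocks
   table_cache      = 64     # open table handles
   sort_buffer_size = 2M     # per-connection sort/group buffer

Measure before and after, though; with only 256 MB shared with the web
server there isn't much room to play with.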

I know there's a lot of tweaking possible with MySQL.
Please advise, because I would need a lot of such queries, fetching
through JSP, and displaying them in a presentable form.
Table Structure is as Follows:

Journal Created By  varchar(255)   No  
Case Substatus ID  varchar(255)   Yes  NULL
Case Status ID  varchar(255)   Yes  NULL
Call Rep Employee Name  varchar(255)   Yes  NULL
Call Rep Employee ID  varchar(10)   Yes  NULL
Call Rep DPS Login ID  varchar(10)   Yes  NULL
Call Rep Profile ID  varchar(15)   Yes  NULL
Call Rep Section Code  varchar(15)   Yes  NULL
Call Rep Country Code  char(3)   Yes  NULL
Journal Create Date  date   No  0000-00-00
Journal Create Time  time   Yes  NULL
Call Svc Tag ID  varchar(255)   Yes  NULL
Order Num of Svc Tag  varchar(255)   Yes  NULL   
>
Indexes

RE: Slow Query Question - Need help of Gurus.

2004-04-18 Thread Ben Dinnerville
You have a redundant clause in your query - the distinct is not needed when
you are doing a group by on the same field ('Call Svc Tag ID') - not sure
how the optimiser in MySQL will handle this. In some RDBMS's the duplicate
clause will be optimised down to 1 clause, so you won't cop a performance
hit, but just because you think an optimiser will do something is no reason
not to write the query right the first time - it is always easier on the
optimiser if you write the query with as few predicates and clauses as
possible.

What indexes do you have on the table?
You need an index on the 'Journal Create Date' (PS identifiers with spaces
in the name are bad practice!).
An index on the Call_Svc_Tag_ID column may also assist with the group by
function - you need to run an explain to see what indexes are needed.

Also, not sure how the optimiser and indexing in MySQL handles the between
predicate, but you might want to try doing a greater than equal to test and
a less than test in the place of the between test - eg
WHERE Journal_Create_Date >= '2004-03-13' AND Journal_Create_Date <
'2004-03-16'

In answer to one of the other follow ups, a count() should not result in a
full table scan when there are other predicates in the query - they will be
resolved first and then the count() will be done over the temporary result
set.

Cheers,

Ben

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 18 April 2004 06:50
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Slow Query Question - Need help of Gurus.
Importance: High

Hi All,
I have a huge Database which contains about 500,000 records, (will be
adding about 20-25k records every week)
I need to run group queries and output the same to a web interface. 

An example is: 

SELECT DISTINCT(`Call Svc Tag ID`),Count(`Call Svc Tag ID`) as counter,
`Journal Create Date`  FROM 31909_859552 WHERE `Journal Create Date`
between '2004-03-13' AND  '2004-03-16'  and `Call Svc Tag ID`<>'null'
GROUP BY `Call Svc Tag ID` having count(`Call Svc Tag ID`)>3 order by
counter;

--basically finding out repeat journal entries over a period of 3 days
for the same tag..
(the no. of days may vary.. 2/3/4/5... , and same goes with the having
count clause..)
The output comes to about 150-1 rows or so, based on selections

-I've designed the table to be a MyISAM (I don't care how long inserts
take, that will be a once-a-week-one-time process on the server, I want
the selects to be fast)
- At present a select takes anywhere from 20 seconds to 10 minutes on
such queries.

What can I do?

Till now we were using MS Access, and no web based interface.
Running a normal MySQL 4.0 install on windows 2000, 256 MB RAM, 2.6 GHz
P4 NON-HT
About 10 users at a time would be connecting initially.

I know there's a lot of tweaking possible with MySQL.
Please advise, because I would need a lot of such queries, fetching
through JSP, and displaying them in a presentable form.

Table Structure is as Follows:

Journal Created By  varchar(255)   No  
Case Substatus ID  varchar(255)   Yes  NULL
Case Status ID  varchar(255)   Yes  NULL
Call Rep Employee Name  varchar(255)   Yes  NULL
Call Rep Employee ID  varchar(10)   Yes  NULL
Call Rep DPS Login ID  varchar(10)   Yes  NULL
Call Rep Profile ID  varchar(15)   Yes  NULL
Call Rep Section Code  varchar(15)   Yes  NULL
Call Rep Country Code  char(3)   Yes  NULL
Journal Create Date  date   No  0000-00-00
Journal Create Time  time   Yes  NULL
Call Svc Tag ID  varchar(255)   Yes  NULL
Order Num of Svc Tag  varchar(255)   Yes  NULL   

Indexes: 

Keyname Type Cardinality Action Field 
Journal Created By  INDEX  None   Journal Created By  
Call Rep Employee ID  INDEX  None   Call Rep Employee ID  
Call Rep DPS Login ID  INDEX  None   Call Rep DPS Login ID  
Call Rep Profile ID  INDEX  None   Call Rep Profile ID  
Call Rep Country Code  INDEX  None   Call Rep Country Code  
Journal Create Time  INDEX  None   Journal Create Time  
Journal Create Date  INDEX  None   Journal Create Date  
Call Svc Tag ID  INDEX  None   Call Svc Tag ID  

Regards,
Amit Wadhwa,
Data Analyst




Slow Query Question - Need help of Gurus.

2004-04-17 Thread Amit_Wadhwa
Hi All,
I have a huge Database which contains about 500,000 records, (will be
adding about 20-25k records every week)
I need to run group queries and output the same to a web interface. 

An example is: 

SELECT DISTINCT(`Call Svc Tag ID`),Count(`Call Svc Tag ID`) as counter,
`Journal Create Date`  FROM 31909_859552 WHERE `Journal Create Date`
between '2004-03-13' AND  '2004-03-16'  and `Call Svc Tag ID`<>'null'
GROUP BY `Call Svc Tag ID` having count(`Call Svc Tag ID`)>3 order by
counter;

--basically finding out repeat journal entries over a period of 3 days
for the same tag..
(the no. of days may vary.. 2/3/4/5... , and same goes with the having
count clause..)
The output comes to about 150-1 rows or so, based on selections

-I've designed the table to be a MyISAM (I don't care how long inserts
take, that will be a once-a-week-one-time process on the server, I want
the selects to be fast)
- At present a select takes anywhere from 20 seconds to 10 minutes on
such queries.

What can I do?

Till now we were using MS Access, and no web based interface.
Running a normal MySQL 4.0 install on windows 2000, 256 MB RAM, 2.6 GHz
P4 NON-HT
About 10 users at a time would be connecting initially.

I know there's a lot of tweaking possible with MySQL.
Please advise, because I would need a lot of such queries, fetching
through JSP, and displaying them in a presentable form.

Table Structure is as Follows:

Journal Created By  varchar(255)   No  
Case Substatus ID  varchar(255)   Yes  NULL
Case Status ID  varchar(255)   Yes  NULL
Call Rep Employee Name  varchar(255)   Yes  NULL
Call Rep Employee ID  varchar(10)   Yes  NULL
Call Rep DPS Login ID  varchar(10)   Yes  NULL
Call Rep Profile ID  varchar(15)   Yes  NULL
Call Rep Section Code  varchar(15)   Yes  NULL
Call Rep Country Code  char(3)   Yes  NULL
Journal Create Date  date   No  0000-00-00
Journal Create Time  time   Yes  NULL
Call Svc Tag ID  varchar(255)   Yes  NULL
Order Num of Svc Tag  varchar(255)   Yes  NULL   

Indexes: 

Keyname Type Cardinality Action Field 
Journal Created By  INDEX  None   Journal Created By  
Call Rep Employee ID  INDEX  None   Call Rep Employee ID  
Call Rep DPS Login ID  INDEX  None   Call Rep DPS Login ID  
Call Rep Profile ID  INDEX  None   Call Rep Profile ID  
Call Rep Country Code  INDEX  None   Call Rep Country Code  
Journal Create Time  INDEX  None   Journal Create Time  
Journal Create Date  INDEX  None   Journal Create Date  
Call Svc Tag ID  INDEX  None   Call Svc Tag ID  

Regards,
Amit Wadhwa,
Data Analyst


