Ben Dinnerville wrote:
<snip>
Note that sorting by the count can't use an index, so it will be slower than if you had ordered by `Call Svc Tag ID`.
This is something that will need testing. Ordering on a varchar(255) column (call svc tag ID) is going to be much less efficient than ordering on a numeric column, even when the ordering happens on the reduced result set left after the group by and predicate clauses. However, the indexing available on the "real" column may negate any deficit.
The ordering happens on the reduced result set either way, so that isn't relevant here. The key point is the index. Without any index, it is certainly true that a varchar(255) should take longer to sort than an int, but in this case the varchar column is already sorted in the index, while the int is the result of a calculation. No additional work is needed to sort the varchar, while the int must be sorted. On the other hand, sorting ints is usually fast, so I don't think this should be a big factor.
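The point about the index pre-sorting the varchar can be sketched with a small experiment. This uses Python's built-in sqlite3 as a stand-in for MySQL (the plans differ in wording, but the concept carries over), and the table and column names here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_svc_tag_id VARCHAR(255))")
conn.execute("CREATE INDEX idx_tag ON calls (call_svc_tag_id)")
conn.executemany("INSERT INTO calls VALUES (?)",
                 [("TAG-%03d" % (i % 10),) for i in range(100)])

def plan(sql):
    # Collapse the EXPLAIN QUERY PLAN details into one string.
    return " | ".join(row[3] for row in
                      conn.execute("EXPLAIN QUERY PLAN " + sql))

# Ordering by the grouped, indexed column: rows come out of the index
# already in order, so no extra sort step appears in the plan.
by_tag = plan("SELECT call_svc_tag_id, COUNT(*) FROM calls "
              "GROUP BY call_svc_tag_id ORDER BY call_svc_tag_id")

# Ordering by the computed count: the engine must sort the grouped
# result itself (SQLite reports this as a temp B-tree).
by_count = plan("SELECT call_svc_tag_id, COUNT(*) FROM calls "
                "GROUP BY call_svc_tag_id ORDER BY COUNT(*)")

print(by_tag)
print(by_count)
```

Only the ORDER BY COUNT(*) plan shows an explicit sort step; the indexed varchar ordering is free.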
MySQL handles BETWEEN just fine. If you think about it, you explicitly set the range with BETWEEN, but the optimizer has to put the two inequalities joined with AND together to get the same range. In other words, BETWEEN is easier.
I am sure that MySQL does handle the "between" just fine, but this is not to say that it will handle the between better than a gt / lt clause in this particular scenario. I know the between sounds like it might be easier, but internally the between is likely to be translated into a gt / lt clause by the optimiser anyhow, as the only way to check whether something is between two values is to check whether it is greater than the min and less than the max. Just give the two different queries a go and see if there is any time difference.
I won't pretend to be an expert on the internals of the optimizer, but I think you are missing the value of the index. Another way to tell if a value is between two others is to check its relative position in the index. Both the between and the 2 inequalities define a range. With the index, the optimizer need merely find the start and end of the range and then grab everything from the one to the other. I am certainly in favor of testing, but I'd be surprised if you saw a measurable difference between the two.
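A quick way to see that the two forms collapse to the same range scan is to compare their query plans. Again this uses SQLite (via Python's sqlite3) as an illustrative stand-in for MySQL, with hypothetical table and column names; the same comparison can be done in MySQL with EXPLAIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (created TEXT)")
conn.execute("CREATE INDEX idx_created ON t (created)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("2006-01-%02d" % d,) for d in range(1, 29)])

def plan(sql):
    # Each EXPLAIN QUERY PLAN row carries its detail in column 3.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

between = plan("SELECT * FROM t "
               "WHERE created BETWEEN '2006-01-10' AND '2006-01-20'")
ineq = plan("SELECT * FROM t "
            "WHERE created >= '2006-01-10' AND created <= '2006-01-20'")

print(between)
print(ineq)
```

Both queries produce the identical index-range plan, which is why no measurable difference would be expected.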
Also noticed that the table seems to be a fairly flat structure (hard to tell definitively from looking at one sample). By this I mean that there are a lot of varchar(255)'s in the table, and looking at the naming of the columns, these seem to be "id" type fields. You might be able to gain some performance by normalising the database a little: move some of the large varchar fields out to a related table and reference them by a numeric type id. This could greatly reduce the amount of data in your base table, especially considering the number of rows you are talking about storing, and could give you an IO based performance increase. Once again, you might need to test and play around a little with different models here.
Excellent point. Normalizing would help a lot. Unless there are 500,000 Call Reps, there's a LOT of wasted space in this table. To make matters worse, the rows are variable length, so there's a penalty for wasted space. At the least, the Call Rep info should be in its own table, and the 6 Call Rep columns should be replaced with an int column containing the Call Rep key. Similarly, the Case Status and Substatus should be moved to their own table(s?). If at all possible, all tables should be fixed length (CHARs) of a reasonable size (will you ever really have a 255 char Call Rep Employee Name?) rather than variable length (VARCHARs). Usually, I'd say that disk is cheap, so go ahead and waste some space to save on time, but in this case, I expect we'll save so much space from normalizing that even changing VARCHAR to CHAR we'll still end up ahead spacewise.
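As a sketch of the normalized layout being proposed (column names like emp_name are invented for illustration, and SQLite — used here via Python for runnability — doesn't enforce CHAR widths the way MySQL does, so treat the declared lengths as documentation of the MySQL intent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized layout: the Call Rep details live once in
# call_rep, and the big case table carries only a small integer key.
conn.executescript("""
CREATE TABLE call_rep (
    rep_id   INTEGER PRIMARY KEY,
    emp_name CHAR(40),
    emp_dept CHAR(20)
);
CREATE TABLE cases (
    case_id INTEGER PRIMARY KEY,
    rep_id  INTEGER REFERENCES call_rep (rep_id),
    status  CHAR(10)
);
""")
conn.execute("INSERT INTO call_rep VALUES (1, 'Jane Smith', 'Support')")
conn.executemany("INSERT INTO cases VALUES (?, 1, 'OPEN')",
                 [(i,) for i in range(5)])

# The rep name is stored once; a join recovers it per case.
rows = conn.execute("""
    SELECT c.case_id, r.emp_name
    FROM cases c JOIN call_rep r ON r.rep_id = c.rep_id
""").fetchall()
print(rows)
```

Each case row now costs one int for the rep instead of six variable-length strings, which is where the space (and IO) saving comes from.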
Dropping any unused indexes could also help, in general. For example, unless we need to select based on time of day without regard to date, the index on `Journal Create Time` is unlikely to be used. In that case drop both date and time indexes and replace them with one multicolumn index.
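The multicolumn-index replacement can be sketched the same way (SQLite via Python again as an illustrative stand-in, with hypothetical names; MySQL's leftmost-prefix rule behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal (create_date TEXT, create_time TEXT)")
# One composite index replacing separate date and time indexes.
conn.execute("CREATE INDEX idx_date_time "
             "ON journal (create_date, create_time)")
conn.executemany("INSERT INTO journal VALUES (?, ?)",
                 [("2006-01-%02d" % d, "%02d:00:00" % h)
                  for d in range(1, 11) for h in range(0, 24, 6)])

def plan(sql):
    return " | ".join(row[3] for row in
                      conn.execute("EXPLAIN QUERY PLAN " + sql))

# A date-only predicate still uses the leftmost column of the
# composite index...
date_plan = plan("SELECT * FROM journal "
                 "WHERE create_date = '2006-01-05'")
# ...and a date+time predicate can use both columns.
both_plan = plan("SELECT * FROM journal "
                 "WHERE create_date = '2006-01-05' "
                 "AND create_time >= '12:00:00'")
print(date_plan)
print(both_plan)
```

What the composite index cannot do is serve a time-of-day-only query, which is exactly the case described as unlikely above.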
<snip>
Also, are you looking for null values, or the word "null" in the column?
I noticed that too and forgot to mention it.
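For anyone following along, the distinction matters because the two predicates match disjoint sets of rows. A minimal sketch (SQLite via Python, hypothetical column name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [(None,), ("null",), ("x",)])

# A real NULL and the four-letter string 'null' are different things;
# each predicate finds only its own kind.
is_null = conn.execute(
    "SELECT COUNT(*) FROM t WHERE val IS NULL").fetchone()[0]
eq_null = conn.execute(
    "SELECT COUNT(*) FROM t WHERE val = 'null'").fetchone()[0]
print(is_null, eq_null)
```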
Michael
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]