On Fri, Aug 22, 2003 at 12:42:27PM -0700, Steven Roussey wrote:
> 
> But since this data is read only, why not reformulate the data for the
> queries you are going to make? This is the opposite of normalizing, and
> will require more disk space, and is not flexible, but it will be fast.
> Of course, it depends on what you are doing and how many types of
> queries you have. This 'normalize by queries' or what I refer to as
> 'selective denormalization' likely won't appear in any books. 

Hmm, I guess I expected it to be faster normalized in any case,
and that I shouldn't waste all that effort I put in to 
normalizing it properly. I thought I was learning something :-|.
I'll see if I can experiment and get a sense of the difference
it will make.

> PS: Get a big fast SCSI RAID 10 array of 15K drives (stripe the
> mirrors). Actually, since this data is readonly and copies are stored
> elsewhere, you could get by with RAID 1. I'm guessing you are being held
> back by the disk, or your memory buffers.

In fact, it's already running on a RAID 1 array of 15K SCSI drives.
I wanted to set it up to work well with this data set.

If you want fun, I could show you the numbers I get running the
queries on my development laptop.

> Oh, and just for fun:
> 
> # order the files
> ALTER TABLE cg ORDER BY dir1;

[etc.]

I had to substitute the values of dir1 in these queries, but then:

> # Get a new result for direction 1
> SELECT STRAIGHT_JOIN cg.cw FROM 
> cg USE INDEX(dir1),
> q USE INDEX(dir1),
> cit USE INDEX(dir1),
> sref USE INDEX(dir1)
> WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND
> cg.cw LIKE 't%' AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
> ORDER BY cg.cw LIMIT 1000,10;

This took 8.18 sec first, 2.29 sec immediately thereafter.

> # order the files
> ALTER TABLE cg ORDER BY dir2;

[etc.]

> # Get a new result for direction 2
> SELECT STRAIGHT_JOIN cg.cw FROM 
> sref USE INDEX(dir2),
> cit USE INDEX(dir2),
> q USE INDEX(dir2),
> cg USE INDEX(dir2)
> WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND
> cg.cw LIKE 't%' AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
> ORDER BY cg.cw LIMIT 1000,10;

This took 3.05 sec first, and 2.96 sec immediately thereafter.

Best,

Jesse

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to