On Fri, Aug 22, 2003 at 12:42:27PM -0700, Steven Roussey wrote: > > But since this data is read only, why not reformulate the data for the > queries you are going to make? This is the opposite of normalizing, and > will require more disk space, and is not flexible, but it will be fast. > Of course, it depends on what you are doing and how many types of > queries you have. This 'normalize by queries' or what I refer to as > 'selective denormalization' likely won't appear in any books.
Hmm, I guess I expected it to be faster normalized in any case, and that I shouldn't waste all that effort I put in to normalizing it properly. I thought I was learning something :-|. I'll see if I can experiment and get a sense of the difference it will make. > PS: Get a big fast SCSI RAID 10 array of 15K drives (stripe the > mirrors). Actually, since this data is readonly and copies are stored > elsewhere, you could get by with RAID 1. I'm guessing you are being held > back by the disk, or your memory buffers. In fact, it's already running on a RAID 1 array of 15K SCSI drives. I wanted to set it up to work well with this data set. If you want fun, I could show you the numbers I get running the queries on my development laptop. > Oh, and just for fun: > > # order the files > ALTER TABLE cg ORDER BY dir1; [etc.] I had to substitute the values of dir1 in these queries, but then: > # Get a new result for direction 1 > SELECT STRAIGHT_JOIN cg.cw FROM > cg USE INDEX(dir1), > q USE INDEX(dir1), > cit USE INDEX(dir1), > sref USE INDEX(dir1) > WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND > cg.cw LIKE 't%' AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH) > ORDER BY cg.cw LIMIT 1000,10; This took 8.18 sec first, 2.29 sec immediately thereafter. > # order the files > ALTER TABLE cg ORDER BY dir2; [etc.] > # Get a new result for direction 2 > SELECT STRAIGHT_JOIN cg.cw FROM > sref USE INDEX(dir2), > cit USE INDEX(dir2), > q USE INDEX(dir2), > cg USE INDEX(dir2) > WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id AND > cg.cw LIKE 't%' AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH) > ORDER BY cg.cw LIMIT 1000,10; This took 3.05 sec first, and 2.96 sec immediately thereafter. Best, Jesse -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]