From: "Margaret MacDonald" <[EMAIL PROTECTED]>

> Is there a generally-accepted rule of thumb for estimating the
> performance cost of joins?  I can't find one even in Date, but
> intuitively it seems as though there must be one by now.
It's hard to estimate the cost of a join as such. Performance is determined by the combination of the database layout and the query. If you know which queries you will run, you can optimize both the database and the queries themselves. The MySQL documentation contains many articles with directions for optimization.

> I'm working on a large information system --probably tens of millions
> of records-- and am guessing that reads should outnumber writes by a
> factor of 10 at least.  High performance from the human user's point
> of view is an absolute requirement for success.
>
> Rather than store category and other 'flavoring' strings repetitively
> in the information records themselves, I've stored indexes into other
> tables.  But that means every lookup joins 3-7 tables.  How much am I
> paying for that?

If the number of 'flavouring' strings is relatively low, you can also decide to do the translation from category ID to category name in the program you write around your queries.

> Adding the flavoring strings themselves to each record would increase
> the overall storage requirement by some large fraction less than 50%.
> Since disk space is already relatively cheap, that shouldn't be a
> problem.  But some of the flavoring is guaranteed to change at an
> unpredictable rate, which would mean repeatedly traversing every
> information record to update it.

Disk space is not your only concern here. Indices are smaller for integer data (ID numbers), and for integer columns MySQL often does not have to access the table data at all if it can answer the query from the index on that column (which saves you disk reads). Also, smaller data files are easier for the OS to cache. The best thing is to keep your data as small as possible.

Analyse your queries using EXPLAIN. A smaller record set in each step will produce a faster query (see the estimated number of records in the EXPLAIN output). See whether an index can be used for each column in your query; you can create indexes on two or more columns!
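As a sketch of the points above (the table and column names here are made up for illustration, not taken from your schema):

```sql
-- A composite index covering both columns used in the WHERE clause,
-- so MySQL can narrow the record set using the index alone.
CREATE INDEX idx_cat_status ON items (category_id, status);

-- Ask MySQL how it plans to execute the query: check the 'key' and
-- 'rows' columns, and watch the Extra column for "Using filesort"
-- or "Using temporary".
EXPLAIN
SELECT i.name, c.name AS category
FROM items i
JOIN categories c ON c.id = i.category_id
WHERE i.category_id = 42
  AND i.status = 'active';
```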
If you don't see any "Using filesort" or "Using temporary" in the EXPLAIN output, chances are the query is blazingly fast.

Try to avoid columns with very few distinct values (e.g. sex: male/female): if you use such a column in your query and MySQL estimates that it will match more than approx. 30% of the records, it will perform a full table scan instead of using the index. If you use such a low-cardinality column in a combined index, MySQL can use the index again. If necessary, use USE INDEX or FORCE INDEX to make sure the right index is used.

Optimize WHERE clauses yourself by writing them as "column operator constant" (e.g. `birthdate` < NOW() - INTERVAL 16 YEAR). The constant part may be an expression, as long as the result is a constant value. This is a lot faster than calculating an expression for each record in the table (e.g. `birthdate` + INTERVAL 16 YEAR < NOW()). Use precalculated values in a separate column if necessary, to prevent MySQL from having to perform calculations on every record for a WHERE condition.

The speed of your query also depends on the size of the resulting record set and the operations you perform on it. Sorting 10 relatively small records is done in memory and will be very fast. Sorting 100,000 records without an index will most likely require a temporary table and will take 'forever'.

Read all you can find in the MySQL documentation, and perhaps take a look at Jeremy Zawodny's "High Performance MySQL".

Regards, Jigal.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]