On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote: > Which tables can have duplicate records in them should be something > decided BEFORE you begin to populate the tables. 99.999% of the time, each > row of any one table should be different from every other row on the same > table. That difference may exist as a combination of values or as a single > value but it is NEARLY ALWAYS in your best interest to have no two rows > identical. It's not too late to make this decision. However, by putting > it off for so long, implementing your uniqueness rules may be much more > difficult.
That's a problem with being self-taught -- there are many things like that one can miss. I was not aware any indexing sped things up. As for the design -- I always knew which tables required unique values, but it was a matter of what I had time to do and when I could do it. The priority was to get the system working, and make sure all the smaller programs did their job and played nicely together. Now that everything works, I can spare time to write code that will go through and easily distinguish which tables, when they are created, will need indexing. > >... So now that I can start using keys, I have a few > > questions: > > Keys make finding records much faster. You should probably have created a > few long before now. > > > 1) I've been using "SELECT FirstName, LastName, Birthdate FROM > > division.People > > WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = > > '$bday'" > > > to check for preexisting records. This means before I inserted a > > record, I > > > would select on specific fields and see if they matched the fields of > > the > > > current record. If they did, I threw out the record I was going to > > enter, if > > > there was not a match, I'd INSERT the new record. If I use "INSERT > > IGNORE...", MySQL will still have to verify that the new record does not > > > > match any old records. How much faster is it to do it that way than > > the way > > > I was? I'd think the same routines to find matching data would be used. > > No, the same routine will not be used. > > A hash of the values of the columns that participate in each PRIMARY KEY > or UNIQUE KEY will be computed for each row (and stored as part of the > key's index structure) and also for each new row as it is being INSERTed. > If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an > error will be thrown by the server. An INSERT with the IGNORE modifier > will ignore that particular error (basically skipping that row) and > continue inserting records. Okay -- great. That also answers my last question -- I needed to be sure IGNORE did what I thought it did. > > 2) Right now I'm creating an archival database to store older records > > in. In > > > one of these archives, there are over 250,000 records. At this point, > > by > > > doing things the "old way" (checking for a match, then inserting), it is > > now > > > able to insert about 750 records in 10 minutes. Earlier, when it was > > only > > > about 180,000 records, it was inserting at about 1,000 records in > > 10minutes. > > > So, first, how much of a speed up can I count on if I use "INSERT > > IGNORE" > > > instead,... > > none. The speed-up will come from the combination of INSERT IGNORE and the > UNIQUE or PRIMARY key(s) defined on the target table. It will not come > from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys > on that table will cut your processing time considerably. You should be > able to insert several hundred records every second (1000s per minute). > Your total lack of all indexes has absolutely killed your performance. So, unless I misunderstand, adding both keys and IGNORE will speed things up by a factor of 100 to 1000 or more. That is a huge relief. (Again, the first step was developing the system and making sure it works, so now I'm speeding it up and adding other useful features, like keys. > >... and second, if I use "INSERT IGNORE DELAYED", will the program > > finish faster, leaving MySQL to catch up with the queued INSERT > > statements as > > > it can? > > The DELAYED modifier asks the server to buffer your INSERTs so that they > can be interleaved with any active read requests allowing your client to > believe it has finished inserting records much sooner than if it had > waited on all of those inserts to actually happen. It should not be > necessary to use DELAYED under most circumstances. You will need to > benchmark both techniques to determine which one works best for your > situation. Which leads to another question: If my program things the data is inserted, and it is delayed, is the queue of DELAYed transactions kept anywhere -- so if MySQL or the system crashes, none of the DELAYed transactions are lost? > > 3) While this has been stated on the mysql.com, it is not what I was > > originally taught by a friend and some books, so I want to be clear I > > understand this correctly before I depend on it. As I understand it, > > "INSERT > > > IGNORE..." compares the data being inserted with the keys of all records > > in > > > the table and will not insert it if it duplicates an existing > > multi-column > > > key. ... > > Then you understand incorrectly. Only two types of keys have any affect on > INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on > uniqueness. > > If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with > INDEX) defined on a table, then all rows within that table must be > different from all other rows in the same table for those columns or > combination of columns you used to define the key. UNIQUE and PRIMARY KEYs > are just like regular indexes except they have a little extra muscle: they > define for the database what to look for in order to reject duplicate > records (that is known as a "uniqueness constraint"). Normal (regular) > indexes (keys) speed up queries. UNIQUE and PRIMARY KEYs do that, too, but > also protect your data from duplication. > > >... The IGNORE basically tells MySQL to not generate an error message if > > the > > > data I'm inserting is a duplicate, so I can use "INSERT IGNORE..." in a > > Perl > > > program to be sure I'm not duplicating records and not getting > > errormessages > > > on it if the data is a duplicate. Is this correct? > > No, the INSERT IGNORE statement does not enforce the uniqueness of a new > record, your UNIQUE and PRIMARY KEYs do that. Using the IGNORE modifier > tells the server to simply ignore the error raised by violating a > uniqueness constraint. I worded it wrong, so I was not clear about what I was thinking (coding 10-18 (and then debugging) hours a day since last February is catching up with me. You did, though, answer the question and give me what I needed -- in spite of my poorly worded question So it sounds like adding the keys and using IGNORE on my INSERTs will make a huge difference -- as well as simplifying my code. Thank you! This helps a LOT! Hal -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]