Questions on INSERT IGNORE
I want to be sure I understand INSERT IGNORE... correctly before I start depending on it. Up until now, I have not been using any kind of key or unique index, since many of my tables are created automatically and, until now, it has been difficult for me to create a way to distinguish between the tables that would have multiple matching records and the tables that need to have only unique values. So now that I can start using keys, I have a few questions: 1) I've been using SELECT FirstName, LastName, Birthdate FROM division.People WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday' to check for preexisting records. This means before I inserted a record, I would select on specific fields and see if they matched the fields of the current record. If they did, I threw out the record I was going to enter, if there was not a match, I'd INSERT the new record. If I use INSERT IGNORE..., MySQL will still have to verify that the new record does not match any old records. How much faster is it to do it that way than the way I was? I'd think the same routines to find matching data would be used. 2) Right now I'm creating an archival database to store older records in. In one of these archives, there are over 250,000 records. At this point, by doing things the old way (checking for a match, then inserting), it is now able to insert about 750 records in 10 minutes. Earlier, when it was only about 180,000 records, it was inserting at about 1,000 records in 10 minutes. So, first, how much of a speed up can I count on if I use INSERT IGNORE instead, and second, if I use INSERT IGNORE DELAYED, will the program finish faster, leaving MySQL to catch up with the queued INSERT statements as it can? 3) While this has been stated on the mysql.com, it is not what I was originally taught by a friend and some books, so I want to be clear I understand this correctly before I depend on it. As I understand it, INSERT IGNORE... compares the data being inserted with the keys of all records in the table and will not insert it if it duplicates an existing multi-column key. The IGNORE basically tells MyQL to not generate an error message if the data I'm inserting is a duplicate, so I can use INSERT IGNORE... in a Perl program to be sure I'm not duplicating records and not getting error messages on it if the data is a duplicate. Is this correct? Thanks! Hal -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Questions on INSERT IGNORE
Comments embedded. See below Hal Vaughan [EMAIL PROTECTED] wrote on 11/01/2005 02:50:13 PM: I want to be sure I understand INSERT IGNORE... correctly before I start depending on it. Up until now, I have not been using any kind of key or unique index, since many of my tables are created automatically and, until now, it has been difficult for me to create a way to distinguish between the tables that would have multiple matching records and the tables that need to have only unique values. ... Which tables can have duplicate records in them should be something decided BEFORE you begin to populate the tables. 99.999% of the time, each row of any one table should be different from every other row on the same table. That difference may exist as a combination of values or as a single value but it is NEARLY ALWAYS in your best interest to have no two rows identical. It's not too late to make this decision. However, by putting it off for so long, implementing your uniqueness rules may be much more difficult. ... So now that I can start using keys, I have a few questions: Keys make finding records much faster. You should probably have created a few long before now. 1) I've been using SELECT FirstName, LastName, Birthdate FROM division.People WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday' to check for preexisting records. This means before I inserted a record, I would select on specific fields and see if they matched the fields of the current record. If they did, I threw out the record I was going to enter, if there was not a match, I'd INSERT the new record. If I use INSERT IGNORE..., MySQL will still have to verify that the new record does not match any old records. How much faster is it to do it that way than the way I was? I'd think the same routines to find matching data would be used. No, the same routine will not be used. A hash of the values of the columns that participate in each PRIMARY KEY or UNIQUE KEY will be computed for each row (and stored as part of the key's index structure) and also for each new row as it is being INSERTed. If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an error will be thrown by the server. An INSERT with the IGNORE modifier will ignore that particular error (basically skipping that row) and continue inserting records. 2) Right now I'm creating an archival database to store older records in. In one of these archives, there are over 250,000 records. At this point, by doing things the old way (checking for a match, then inserting), it is now able to insert about 750 records in 10 minutes. Earlier, when it was only about 180,000 records, it was inserting at about 1,000 records in 10minutes. So, first, how much of a speed up can I count on if I use INSERT IGNORE instead,... none. The speed-up will come from the combination of INSERT IGNORE and the UNIQUE or PRIMARY key(s) defined on the target table. It will not come from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys on that table will cut your processing time considerably. You should be able to insert several hundred records every second (1000s per minute). Your total lack of all indexes has absolutely killed your performance. ... and second, if I use INSERT IGNORE DELAYED, will the program finish faster, leaving MySQL to catch up with the queued INSERT statements as it can? The DELAYED modifier asks the server to buffer your INSERTs so that they can be interleaved with any active read requests allowing your client to believe it has finished inserting records much sooner than if it had waited on all of those inserts to actually happen. It should not be necessary to use DELAYED under most circumstances. You will need to benchmark both techniques to determine which one works best for your situation. 3) While this has been stated on the mysql.com, it is not what I was originally taught by a friend and some books, so I want to be clear I understand this correctly before I depend on it. As I understand it, INSERT IGNORE... compares the data being inserted with the keys of all records in the table and will not insert it if it duplicates an existing multi-column key. ... Then you understand incorrectly. Only two types of keys have any affect on INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on uniqueness. If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with INDEX) defined on a table, then all rows within that table must be different from all other rows in the same table for those columns or combination of columns you used to define the key. UNIQUE and PRIMARY KEYs are just like regular indexes except they have a little extra muscle: they define for the database what to look for in order to reject duplicate records (that is known as a uniqueness constraint). Normal (regular)
Re: Questions on INSERT IGNORE
On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote: Which tables can have duplicate records in them should be something decided BEFORE you begin to populate the tables. 99.999% of the time, each row of any one table should be different from every other row on the same table. That difference may exist as a combination of values or as a single value but it is NEARLY ALWAYS in your best interest to have no two rows identical. It's not too late to make this decision. However, by putting it off for so long, implementing your uniqueness rules may be much more difficult. That's a problem with being self-taught -- there are many things like that one can miss. I was not aware any indexing sped things up. As for the design -- I always knew which tables required unique values, but it was a matter of what I had time to do and when I could do it. The priority was to get the system working, and make sure all the smaller programs did their job and played nicely together. Now that everything works, I can spare time to write code that will go through and easily distinguish which tables, when they are created, will need indexing. ... So now that I can start using keys, I have a few questions: Keys make finding records much faster. You should probably have created a few long before now. 1) I've been using SELECT FirstName, LastName, Birthdate FROM division.People WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday' to check for preexisting records. This means before I inserted a record, I would select on specific fields and see if they matched the fields of the current record. If they did, I threw out the record I was going to enter, if there was not a match, I'd INSERT the new record. If I use INSERT IGNORE..., MySQL will still have to verify that the new record does not match any old records. How much faster is it to do it that way than the way I was? I'd think the same routines to find matching data would be used. No, the same routine will not be used. A hash of the values of the columns that participate in each PRIMARY KEY or UNIQUE KEY will be computed for each row (and stored as part of the key's index structure) and also for each new row as it is being INSERTed. If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an error will be thrown by the server. An INSERT with the IGNORE modifier will ignore that particular error (basically skipping that row) and continue inserting records. Okay -- great. That also answers my last question -- I needed to be sure IGNORE did what I thought it did. 2) Right now I'm creating an archival database to store older records in. In one of these archives, there are over 250,000 records. At this point, by doing things the old way (checking for a match, then inserting), it is now able to insert about 750 records in 10 minutes. Earlier, when it was only about 180,000 records, it was inserting at about 1,000 records in 10minutes. So, first, how much of a speed up can I count on if I use INSERT IGNORE instead,... none. The speed-up will come from the combination of INSERT IGNORE and the UNIQUE or PRIMARY key(s) defined on the target table. It will not come from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys on that table will cut your processing time considerably. You should be able to insert several hundred records every second (1000s per minute). Your total lack of all indexes has absolutely killed your performance. So, unless I misunderstand, adding both keys and IGNORE will speed things up by a factor of 100 to 1000 or more. That is a huge relief. (Again, the first step was developing the system and making sure it works, so now I'm speeding it up and adding other useful features, like keys. ... and second, if I use INSERT IGNORE DELAYED, will the program finish faster, leaving MySQL to catch up with the queued INSERT statements as it can? The DELAYED modifier asks the server to buffer your INSERTs so that they can be interleaved with any active read requests allowing your client to believe it has finished inserting records much sooner than if it had waited on all of those inserts to actually happen. It should not be necessary to use DELAYED under most circumstances. You will need to benchmark both techniques to determine which one works best for your situation. Which leads to another question: If my program things the data is inserted, and it is delayed, is the queue of DELAYed transactions kept anywhere -- so if MySQL or the system crashes, none of the DELAYed transactions are lost? 3) While this has been stated on the mysql.com, it is not what I was originally taught by a friend and some books, so I want to be clear I understand this correctly before I depend on it. As I understand it, INSERT IGNORE... compares the data being inserted with the keys of
Re: Questions on INSERT IGNORE
Follow up at bottom: On Tuesday 01 November 2005 04:15 pm, Hal Vaughan wrote: On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote: Which tables can have duplicate records in them should be something decided BEFORE you begin to populate the tables. 99.999% of the time, each row of any one table should be different from every other row on the same table. That difference may exist as a combination of values or as a single value but it is NEARLY ALWAYS in your best interest to have no two rows identical. It's not too late to make this decision. However, by putting it off for so long, implementing your uniqueness rules may be much more difficult. That's a problem with being self-taught -- there are many things like that one can miss. I was not aware any indexing sped things up. As for the design -- I always knew which tables required unique values, but it was a matter of what I had time to do and when I could do it. The priority was to get the system working, and make sure all the smaller programs did their job and played nicely together. Now that everything works, I can spare time to write code that will go through and easily distinguish which tables, when they are created, will need indexing. ... So now that I can start using keys, I have a few questions: Keys make finding records much faster. You should probably have created a few long before now. 1) I've been using SELECT FirstName, LastName, Birthdate FROM division.People WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday' to check for preexisting records. This means before I inserted a record, I would select on specific fields and see if they matched the fields of the current record. If they did, I threw out the record I was going to enter, if there was not a match, I'd INSERT the new record. If I use INSERT IGNORE..., MySQL will still have to verify that the new record does not match any old records. How much faster is it to do it that way than the way I was? I'd think the same routines to find matching data would be used. No, the same routine will not be used. A hash of the values of the columns that participate in each PRIMARY KEY or UNIQUE KEY will be computed for each row (and stored as part of the key's index structure) and also for each new row as it is being INSERTed. If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an error will be thrown by the server. An INSERT with the IGNORE modifier will ignore that particular error (basically skipping that row) and continue inserting records. Okay -- great. That also answers my last question -- I needed to be sure IGNORE did what I thought it did. 2) Right now I'm creating an archival database to store older records in. In one of these archives, there are over 250,000 records. At this point, by doing things the old way (checking for a match, then inserting), it is now able to insert about 750 records in 10 minutes. Earlier, when it was only about 180,000 records, it was inserting at about 1,000 records in 10minutes. So, first, how much of a speed up can I count on if I use INSERT IGNORE instead,... none. The speed-up will come from the combination of INSERT IGNORE and the UNIQUE or PRIMARY key(s) defined on the target table. It will not come from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys on that table will cut your processing time considerably. You should be able to insert several hundred records every second (1000s per minute). Your total lack of all indexes has absolutely killed your performance. So, unless I misunderstand, adding both keys and IGNORE will speed things up by a factor of 100 to 1000 or more. That is a huge relief. (Again, the first step was developing the system and making sure it works, so now I'm speeding it up and adding other useful features, like keys. ... and second, if I use INSERT IGNORE DELAYED, will the program finish faster, leaving MySQL to catch up with the queued INSERT statements as it can? The DELAYED modifier asks the server to buffer your INSERTs so that they can be interleaved with any active read requests allowing your client to believe it has finished inserting records much sooner than if it had waited on all of those inserts to actually happen. It should not be necessary to use DELAYED under most circumstances. You will need to benchmark both techniques to determine which one works best for your situation. Which leads to another question: If my program things the data is inserted, and it is delayed, is the queue of DELAYed transactions kept anywhere -- so if MySQL or the system crashes, none of the DELAYed transactions are lost? 3) While this has been stated on the mysql.com, it is not what I was originally taught by a