Re: Questions on INSERT IGNORE

Hal Vaughan Tue, 01 Nov 2005 13:15:44 -0800

On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote:

> Which tables can have duplicate records in them should be something
> decided BEFORE you begin to populate the tables. 99.999% of the time, each
> row of any one table should be different from every other row on the same
> table. That difference may exist as a combination of values or as a single
> value but it is NEARLY ALWAYS in your best interest to have no two rows
> identical.  It's not too late to make this decision. However, by putting
> it off for so long, implementing your uniqueness rules may be much more
> difficult.


That's a problem with being self-taught -- there are many things like that one 
can miss.  I was not aware any indexing sped things up.  As for the design -- 
I always knew which tables required unique values, but it was a matter of 
what I had time to do and when I could do it.  The priority was to get the 
system working, and make sure all the smaller programs did their job and 
played nicely together.  Now that everything works, I can spare time to write 
code that will go through and easily distinguish which tables, when they are 
created, will need indexing.

> >... So now that I can start using keys, I have a few
> > questions:
>
> Keys make finding records much faster. You should probably have created a
> few long before now.
>
> > 1) I've been using "SELECT FirstName, LastName, Birthdate FROM
> > division.People
> > WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate =
>
> '$bday'"
>
> > to check for preexisting records.  This means before I inserted a
>
> record, I
>
> > would select on specific fields and see if they matched the fields of
>
> the
>
> > current record.  If they did, I threw out the record I was going to
>
> enter, if
>
> > there was not a match, I'd INSERT the new record.  If I use "INSERT
> > IGNORE...", MySQL will still have to verify that the new record does not
> >
> > match any old records.  How much faster is it  to do it that way than
>
> the way
>
> > I was?  I'd think the same routines to find matching data would be used.
>
> No, the same routine will not be used.
>
> A hash of the values of the columns that participate in each PRIMARY KEY
> or UNIQUE KEY will be computed for each row (and stored as part of the
> key's index structure) and also for each new row as it is being INSERTed.
> If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an
> error will be thrown by the server. An INSERT with the IGNORE modifier
> will ignore that particular error (basically skipping that row) and
> continue inserting records.

Okay -- great.  That also answers my last question -- I needed to be sure 
IGNORE did what I thought it did.

> > 2) Right now I'm creating an archival database to store older records
>
> in.  In
>
> > one of these archives, there are over 250,000 records.   At this point,
>
> by
>
> > doing things the "old way" (checking for a match, then inserting), it is
>
> now
>
> > able to insert about 750 records in 10 minutes.  Earlier, when it was
>
> only
>
> > about 180,000 records, it was inserting at about 1,000 records in
>
> 10minutes.
>
> > So, first, how much of a speed up can I count on if I use "INSERT
>
> IGNORE"
>
> > instead,...
>
> none. The speed-up will come from the combination of INSERT IGNORE and the
> UNIQUE or PRIMARY key(s) defined on the target table. It will not come
> from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys
> on that table will cut your processing time considerably. You should be
> able to insert several hundred records every second (1000s per minute).
> Your total lack of all indexes has absolutely killed your performance.

So, unless I misunderstand, adding both keys and IGNORE will speed things up 
by a factor of 100 to 1000 or more.  That is a huge relief.  (Again, the 
first step was developing the system and making sure it works, so now I'm 
speeding it up and adding other useful features, like keys.

> >... and second, if I use "INSERT IGNORE DELAYED", will the program
> > finish faster, leaving MySQL to catch up with the queued INSERT
>
> statements as
>
> > it can?
>
> The DELAYED modifier asks the server to buffer your INSERTs so that they
> can be interleaved with any active read requests allowing your client to
> believe it has finished inserting records much sooner than if it had
> waited on all of those inserts to actually happen. It should not be
> necessary to use DELAYED under most circumstances. You will need to
> benchmark both techniques to determine which one works best for your
> situation.

Which leads to another question: If my program things the data is inserted, 
and it is delayed, is the queue of DELAYed transactions kept anywhere -- so 
if MySQL or the system crashes, none of the DELAYed transactions are lost?

> > 3) While this has been stated on the mysql.com, it is not what I was
> > originally taught by a friend and some books, so I want to be clear I
> > understand this correctly before I depend on it.  As I understand it,
>
> "INSERT
>
> > IGNORE..." compares the data being inserted with the keys of all records
>
> in
>
> > the table and will not insert it if it duplicates an existing
>
> multi-column
>
> > key. ...
>
> Then you understand incorrectly. Only two types of keys have any affect on
> INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on
> uniqueness.
>
> If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with
> INDEX) defined on a table, then all rows within that table must be
> different from all other rows in the same table for those columns or
> combination of columns you used to define the key. UNIQUE and PRIMARY KEYs
> are just like regular indexes except they have a little extra muscle: they
> define for the database what to look for in order to reject duplicate
> records (that is known as a "uniqueness constraint"). Normal (regular)
> indexes (keys) speed up queries. UNIQUE and PRIMARY KEYs do that, too, but
> also protect your data from duplication.
>
> >...  The IGNORE basically tells MySQL to not generate an error message if
>
> the
>
> > data I'm inserting is a duplicate, so I can use "INSERT IGNORE..." in a
>
> Perl
>
> > program to be sure I'm not duplicating records and not getting
>
> errormessages
>
> > on it if the data is a duplicate.  Is this correct?
>
> No, the INSERT IGNORE statement does not enforce the uniqueness of a new
> record, your UNIQUE and PRIMARY KEYs do that. Using the IGNORE modifier
> tells the server to simply ignore the error raised by violating a
> uniqueness constraint.

I worded it wrong, so I was not clear about what I was thinking (coding 10-18 
(and then debugging) hours a day since last February is catching up with me.  
You did, though, answer the question and give me what I needed -- in spite of 
my poorly worded question

So it sounds like adding the keys and using IGNORE on my INSERTs will make a 
huge difference -- as well as simplifying my code.

Thank you!  This helps a LOT!

Hal

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Questions on INSERT IGNORE

Reply via email to