Questions on INSERT IGNORE

2005-11-01 Thread Hal Vaughan
I want to be sure I understand INSERT IGNORE... correctly before I start 
depending on it.  Up until now, I have not been using any kind of key or 
unique index, since many of my tables are created automatically and, until 
now, it has been difficult for me to create a way to distinguish between the 
tables that would have multiple matching records and the tables that need to 
have only unique values.  So now that I can start using keys, I have a few 
questions:

1) I've been using SELECT FirstName, LastName, Birthdate FROM division.People 
WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = '$bday' 
to check for preexisting records.  This means before I inserted a record, I 
would select on specific fields and see if they matched the fields of the 
current record.  If they did, I threw out the record I was going to enter, if 
there was not a match, I'd INSERT the new record.  If I use INSERT 
IGNORE..., MySQL will still have to verify that the new record does not 
match any old records.  How much faster is it  to do it that way than the way 
I was?  I'd think the same routines to find matching data would be used.

2) Right now I'm creating an archival database to store older records in.  In 
one of these archives, there are over 250,000 records.   At this point, by 
doing things the old way (checking for a match, then inserting), it is now 
able to insert about 750 records in 10 minutes.  Earlier, when it was only 
about 180,000 records, it was inserting at about 1,000 records in 10 minutes.  
So, first, how much of a speed up can I count on if I use INSERT IGNORE 
instead, and second, if I use INSERT IGNORE DELAYED, will the program 
finish faster, leaving MySQL to catch up with the queued INSERT statements as 
it can?

3) While this has been stated on the mysql.com, it is not what I was 
originally taught by a friend and some books, so I want to be clear I 
understand this correctly before I depend on it.  As I understand it, INSERT 
IGNORE... compares the data being inserted with the keys of all records in 
the table and will not insert it if it duplicates an existing multi-column 
key.  The IGNORE basically tells MyQL to not generate an error message if the 
data I'm inserting is a duplicate, so I can use INSERT IGNORE... in a Perl 
program to be sure I'm not duplicating records and not getting error messages 
on it if the data is a duplicate.  Is this correct?

Thanks!

Hal

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Questions on INSERT IGNORE

2005-11-01 Thread SGreen
Comments embedded. See below

Hal Vaughan [EMAIL PROTECTED] wrote on 11/01/2005 02:50:13 PM:

 I want to be sure I understand INSERT IGNORE... correctly before I 
start 
 depending on it.  Up until now, I have not been using any kind of key or 

 unique index, since many of my tables are created automatically and, 
until 
 now, it has been difficult for me to create a way to distinguish between 
the 
 tables that would have multiple matching records and the tables that 
need to 
 have only unique values. ... 

Which tables can have duplicate records in them should be something 
decided BEFORE you begin to populate the tables. 99.999% of the time, each 
row of any one table should be different from every other row on the same 
table. That difference may exist as a combination of values or as a single 
value but it is NEARLY ALWAYS in your best interest to have no two rows 
identical.  It's not too late to make this decision. However, by putting 
it off for so long, implementing your uniqueness rules may be much more 
difficult.


... So now that I can start using keys, I have a few 
 questions:
 

Keys make finding records much faster. You should probably have created a 
few long before now.


 1) I've been using SELECT FirstName, LastName, Birthdate FROM 
 division.People 
 WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate = 
'$bday' 
 to check for preexisting records.  This means before I inserted a 
record, I 
 would select on specific fields and see if they matched the fields of 
the 
 current record.  If they did, I threw out the record I was going to 
enter, if 
 there was not a match, I'd INSERT the new record.  If I use INSERT 
 IGNORE..., MySQL will still have to verify that the new record does not 

 match any old records.  How much faster is it  to do it that way than 
the way 
 I was?  I'd think the same routines to find matching data would be used.

No, the same routine will not be used. 

A hash of the values of the columns that participate in each PRIMARY KEY 
or UNIQUE KEY will be computed for each row (and stored as part of the 
key's index structure) and also for each new row as it is being INSERTed. 
If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an 
error will be thrown by the server. An INSERT with the IGNORE modifier 
will ignore that particular error (basically skipping that row) and 
continue inserting records.

 
 2) Right now I'm creating an archival database to store older records 
in.  In 
 one of these archives, there are over 250,000 records.   At this point, 
by 
 doing things the old way (checking for a match, then inserting), it is 
now 
 able to insert about 750 records in 10 minutes.  Earlier, when it was 
only 
 about 180,000 records, it was inserting at about 1,000 records in 
10minutes. 
 So, first, how much of a speed up can I count on if I use INSERT 
IGNORE 
 instead,... 

none. The speed-up will come from the combination of INSERT IGNORE and the 
UNIQUE or PRIMARY key(s) defined on the target table. It will not come 
from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys 
on that table will cut your processing time considerably. You should be 
able to insert several hundred records every second (1000s per minute). 
Your total lack of all indexes has absolutely killed your performance.

... and second, if I use INSERT IGNORE DELAYED, will the program 
 finish faster, leaving MySQL to catch up with the queued INSERT 
statements as 
 it can?

The DELAYED modifier asks the server to buffer your INSERTs so that they 
can be interleaved with any active read requests allowing your client to 
believe it has finished inserting records much sooner than if it had 
waited on all of those inserts to actually happen. It should not be 
necessary to use DELAYED under most circumstances. You will need to 
benchmark both techniques to determine which one works best for your 
situation.

 
 3) While this has been stated on the mysql.com, it is not what I was 
 originally taught by a friend and some books, so I want to be clear I 
 understand this correctly before I depend on it.  As I understand it, 
INSERT 
 IGNORE... compares the data being inserted with the keys of all records 
in 
 the table and will not insert it if it duplicates an existing 
multi-column 
 key. ... 

Then you understand incorrectly. Only two types of keys have any affect on 
INSERT IGNORE: UNIQUE and PRIMARY. Regular indexes have no effect on 
uniqueness.

If you have a PRIMARY KEY or a UNIQUE INDEX (KEY is interchangeable with 
INDEX) defined on a table, then all rows within that table must be 
different from all other rows in the same table for those columns or 
combination of columns you used to define the key. UNIQUE and PRIMARY KEYs 
are just like regular indexes except they have a little extra muscle: they 
define for the database what to look for in order to reject duplicate 
records (that is known as a uniqueness constraint). Normal (regular) 

Re: Questions on INSERT IGNORE

2005-11-01 Thread Hal Vaughan
On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote:

 Which tables can have duplicate records in them should be something
 decided BEFORE you begin to populate the tables. 99.999% of the time, each
 row of any one table should be different from every other row on the same
 table. That difference may exist as a combination of values or as a single
 value but it is NEARLY ALWAYS in your best interest to have no two rows
 identical.  It's not too late to make this decision. However, by putting
 it off for so long, implementing your uniqueness rules may be much more
 difficult.

That's a problem with being self-taught -- there are many things like that one 
can miss.  I was not aware any indexing sped things up.  As for the design -- 
I always knew which tables required unique values, but it was a matter of 
what I had time to do and when I could do it.  The priority was to get the 
system working, and make sure all the smaller programs did their job and 
played nicely together.  Now that everything works, I can spare time to write 
code that will go through and easily distinguish which tables, when they are 
created, will need indexing.

 ... So now that I can start using keys, I have a few
  questions:

 Keys make finding records much faster. You should probably have created a
 few long before now.

  1) I've been using SELECT FirstName, LastName, Birthdate FROM
  division.People
  WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate =

 '$bday'

  to check for preexisting records.  This means before I inserted a

 record, I

  would select on specific fields and see if they matched the fields of

 the

  current record.  If they did, I threw out the record I was going to

 enter, if

  there was not a match, I'd INSERT the new record.  If I use INSERT
  IGNORE..., MySQL will still have to verify that the new record does not
 
  match any old records.  How much faster is it  to do it that way than

 the way

  I was?  I'd think the same routines to find matching data would be used.

 No, the same routine will not be used.

 A hash of the values of the columns that participate in each PRIMARY KEY
 or UNIQUE KEY will be computed for each row (and stored as part of the
 key's index structure) and also for each new row as it is being INSERTed.
 If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an
 error will be thrown by the server. An INSERT with the IGNORE modifier
 will ignore that particular error (basically skipping that row) and
 continue inserting records.

Okay -- great.  That also answers my last question -- I needed to be sure 
IGNORE did what I thought it did.

  2) Right now I'm creating an archival database to store older records

 in.  In

  one of these archives, there are over 250,000 records.   At this point,

 by

  doing things the old way (checking for a match, then inserting), it is

 now

  able to insert about 750 records in 10 minutes.  Earlier, when it was

 only

  about 180,000 records, it was inserting at about 1,000 records in

 10minutes.

  So, first, how much of a speed up can I count on if I use INSERT

 IGNORE

  instead,...

 none. The speed-up will come from the combination of INSERT IGNORE and the
 UNIQUE or PRIMARY key(s) defined on the target table. It will not come
 from simply changing INSERT to INSERT IGNORE. In fact, defining ANY keys
 on that table will cut your processing time considerably. You should be
 able to insert several hundred records every second (1000s per minute).
 Your total lack of all indexes has absolutely killed your performance.

So, unless I misunderstand, adding both keys and IGNORE will speed things up 
by a factor of 100 to 1000 or more.  That is a huge relief.  (Again, the 
first step was developing the system and making sure it works, so now I'm 
speeding it up and adding other useful features, like keys.

 ... and second, if I use INSERT IGNORE DELAYED, will the program
  finish faster, leaving MySQL to catch up with the queued INSERT

 statements as

  it can?

 The DELAYED modifier asks the server to buffer your INSERTs so that they
 can be interleaved with any active read requests allowing your client to
 believe it has finished inserting records much sooner than if it had
 waited on all of those inserts to actually happen. It should not be
 necessary to use DELAYED under most circumstances. You will need to
 benchmark both techniques to determine which one works best for your
 situation.

Which leads to another question: If my program things the data is inserted, 
and it is delayed, is the queue of DELAYed transactions kept anywhere -- so 
if MySQL or the system crashes, none of the DELAYed transactions are lost?

  3) While this has been stated on the mysql.com, it is not what I was
  originally taught by a friend and some books, so I want to be clear I
  understand this correctly before I depend on it.  As I understand it,

 INSERT

  IGNORE... compares the data being inserted with the keys of 

Re: Questions on INSERT IGNORE

2005-11-01 Thread Hal Vaughan
Follow up at bottom:

On Tuesday 01 November 2005 04:15 pm, Hal Vaughan wrote:
 On Tuesday 01 November 2005 03:35 pm, [EMAIL PROTECTED] wrote:
  Which tables can have duplicate records in them should be something
  decided BEFORE you begin to populate the tables. 99.999% of the time,
  each row of any one table should be different from every other row on the
  same table. That difference may exist as a combination of values or as a
  single value but it is NEARLY ALWAYS in your best interest to have no two
  rows identical.  It's not too late to make this decision. However, by
  putting it off for so long, implementing your uniqueness rules may be
  much more difficult.

 That's a problem with being self-taught -- there are many things like that
 one can miss.  I was not aware any indexing sped things up.  As for the
 design -- I always knew which tables required unique values, but it was a
 matter of what I had time to do and when I could do it.  The priority was
 to get the system working, and make sure all the smaller programs did their
 job and played nicely together.  Now that everything works, I can spare
 time to write code that will go through and easily distinguish which
 tables, when they are created, will need indexing.

  ... So now that I can start using keys, I have a few
   questions:
 
  Keys make finding records much faster. You should probably have created a
  few long before now.
 
   1) I've been using SELECT FirstName, LastName, Birthdate FROM
   division.People
   WHERE FirstName = '$fname' AND LastName = '$lname' AND Birthdate =
 
  '$bday'
 
   to check for preexisting records.  This means before I inserted a
 
  record, I
 
   would select on specific fields and see if they matched the fields of
 
  the
 
   current record.  If they did, I threw out the record I was going to
 
  enter, if
 
   there was not a match, I'd INSERT the new record.  If I use INSERT
   IGNORE..., MySQL will still have to verify that the new record does
   not
  
   match any old records.  How much faster is it  to do it that way than
 
  the way
 
   I was?  I'd think the same routines to find matching data would be
   used.
 
  No, the same routine will not be used.
 
  A hash of the values of the columns that participate in each PRIMARY KEY
  or UNIQUE KEY will be computed for each row (and stored as part of the
  key's index structure) and also for each new row as it is being INSERTed.
  If the new row's hash matches the hash of any UNIQUE or PRIMARY KEY an
  error will be thrown by the server. An INSERT with the IGNORE modifier
  will ignore that particular error (basically skipping that row) and
  continue inserting records.

 Okay -- great.  That also answers my last question -- I needed to be sure
 IGNORE did what I thought it did.

   2) Right now I'm creating an archival database to store older records
 
  in.  In
 
   one of these archives, there are over 250,000 records.   At this point,
 
  by
 
   doing things the old way (checking for a match, then inserting), it
   is
 
  now
 
   able to insert about 750 records in 10 minutes.  Earlier, when it was
 
  only
 
   about 180,000 records, it was inserting at about 1,000 records in
 
  10minutes.
 
   So, first, how much of a speed up can I count on if I use INSERT
 
  IGNORE
 
   instead,...
 
  none. The speed-up will come from the combination of INSERT IGNORE and
  the UNIQUE or PRIMARY key(s) defined on the target table. It will not
  come from simply changing INSERT to INSERT IGNORE. In fact, defining ANY
  keys on that table will cut your processing time considerably. You should
  be able to insert several hundred records every second (1000s per
  minute). Your total lack of all indexes has absolutely killed your
  performance.

 So, unless I misunderstand, adding both keys and IGNORE will speed things
 up by a factor of 100 to 1000 or more.  That is a huge relief.  (Again, the
 first step was developing the system and making sure it works, so now I'm
 speeding it up and adding other useful features, like keys.

  ... and second, if I use INSERT IGNORE DELAYED, will the program
   finish faster, leaving MySQL to catch up with the queued INSERT
 
  statements as
 
   it can?
 
  The DELAYED modifier asks the server to buffer your INSERTs so that they
  can be interleaved with any active read requests allowing your client to
  believe it has finished inserting records much sooner than if it had
  waited on all of those inserts to actually happen. It should not be
  necessary to use DELAYED under most circumstances. You will need to
  benchmark both techniques to determine which one works best for your
  situation.

 Which leads to another question: If my program things the data is inserted,
 and it is delayed, is the queue of DELAYed transactions kept anywhere -- so
 if MySQL or the system crashes, none of the DELAYed transactions are lost?

   3) While this has been stated on the mysql.com, it is not what I was
   originally taught by a