Re: Question about INSERT vs UPDATE

2003-06-15 Thread Becoming Digital
Consider going over Section 5.2.9 of the manual.
http://www.mysql.com/doc/en/Insert_speed.html

Edward Dudlik
Becoming Digital
www.becomingdigital.com


- Original Message -
From: Shane Bryldt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, 15 June, 2003 01:12
Subject: Question about INSERT vs UPDATE



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Question about INSERT vs UPDATE

2003-06-15 Thread Shane Bryldt
As my post suggested, I have already applied the tweaks that section of
the manual covers, and was hoping for some insight on my original question,
the trade-off between INSERT and UPDATE.  That chapter was helpful
initially, but I have already implemented most of what it has to offer,
and it still does not answer my question.  I have read the chapter
thoroughly.
I was hoping someone with experience might offer something more than
pointing a finger at some documentation.
The documentation addresses neither UPDATE efficiency nor which
operation takes longer at a larger scale.

-Shane

- Original Message - 
From: Becoming Digital [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, June 15, 2003 3:21 AM
Subject: Re: Question about INSERT vs UPDATE





Re: Question about INSERT vs UPDATE

2003-06-15 Thread Becoming Digital
 As my post suggested, I have already applied the tweaks that section
 of the manual covers, and was hoping there might be some insight
 on my original question, the trade-off between INSERT and UPDATE.

Gotcha.  I wasn't sure if you'd checked the manual or just run a huge number of
EXPLAINs on your queries.

 I was hoping someone with some experience might offer something more
 than pointing a finger to some documentation.

Well, I can't offer an exact answer, but perhaps a tip.  Have you considered
adding a pair of DATE fields to your tables to determine when the records were
added and subsequently modified?  You could use those fields in conjunction with
INSERT, UPDATE and IGNORE to get the desired effect.

While this doesn't directly address your question, it may be a better long-term
solution to your problem.  If you continue going about things with your current
technique, you will likely run into the same problem once your table grows, even
with the best choice of UPDATE or INSERT.
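A rough sketch of how the timestamp idea could work, in Python with SQLite standing in for MySQL 4. The table, the column names, and the monotone counter used in place of a real DATE value are all invented for illustration; the point is only that the dump pass touches the ~10k stamped rows instead of all 48k.

```python
import sqlite3
from itertools import count

# SQLite and these names stand in for the real MySQL table purely to
# illustrate the idea; nothing here is the poster's actual schema.
clock = count(1)      # monotone stand-in for a "last modified" stamp
last_dump = 0         # stamp as of the previous dump pass

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records "
           "(id INTEGER PRIMARY KEY, payload TEXT, modified INTEGER)")

def save(rec_id, payload):
    # Stamp the record every time the application changes it.
    db.execute("INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
               (rec_id, payload, next(clock)))

def dump_changed():
    # The dump pass selects only rows stamped since the previous pass.
    global last_dump
    rows = db.execute(
        "SELECT id, payload, modified FROM records "
        "WHERE modified > ? ORDER BY id", (last_dump,)).fetchall()
    if rows:
        last_dump = max(m for _, _, m in rows)
    return [(i, p) for i, p, _ in rows]

save(1, "a"); save(2, "b")
first = dump_changed()     # both records are new since the last dump
save(1, "a2")              # only record 1 changes afterwards
second = dump_changed()    # so only record 1 comes back
```

A real MySQL version would use a DATE or TIMESTAMP column instead of the counter, but the select-only-what-changed shape is the same.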

Edward Dudlik
Becoming Digital
www.becomingdigital.com


- Original Message -
From: Shane Bryldt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, 15 June, 2003 06:03
Subject: Re: Question about INSERT vs UPDATE



Re: Question about INSERT vs UPDATE

2003-06-15 Thread Shane Bryldt
I'm not sure I entirely understand your solution.  A datestamp would only
work if I stamped each record when it is updated, and that means adding
snippets of code throughout the program, everywhere a record is modified
in memory, to set the last-modified timestamp so the dump knows which
records to update.  That still poses an efficiency issue, because the
code has to be significantly modified in every location where a record
changes, just to maintain the "should dump" condition.
A similar solution I considered was three flags -- inserted, modified,
or deleted -- acting on each record according to its current flag.
However, this poses the same problem of having to edit every location in
the code where a record is modified in memory to indicate it should be
dumped.  Retaining a "last dumped" timestamp would not really achieve
anything; it would indicate when the last dump occurred, but not whether
the record should be dumped again.  Unless I have misunderstood your
idea.
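For what it's worth, the three-flag idea reads something like this as a Python sketch. All names are invented, and a real version would send the statements to MySQL rather than return them; the sketch only shows the per-flag dispatch and the "every change must route through a setter" cost described above.

```python
# Each in-memory record carries a state flag; the dump loop issues only
# the statement that flag calls for, then marks the record clean.
CLEAN, INSERTED, MODIFIED, DELETED = range(4)

class Record:
    def __init__(self, rec_id, payload):
        self.id, self.payload = rec_id, payload
        self.flag = INSERTED          # new records start as INSERTED

    def set_payload(self, payload):
        # The cost of the scheme: every in-memory change must go
        # through a choke point like this to keep the flag honest.
        self.payload = payload
        if self.flag == CLEAN:
            self.flag = MODIFIED

def dump(records):
    """Return the SQL each record needs, then mark everything clean."""
    stmts = []
    for r in list(records):           # copy: we may remove while looping
        if r.flag == INSERTED:
            stmts.append(f"INSERT INTO t VALUES ({r.id}, '{r.payload}')")
        elif r.flag == MODIFIED:
            stmts.append(f"UPDATE t SET payload='{r.payload}' "
                         f"WHERE id={r.id}")
        elif r.flag == DELETED:
            stmts.append(f"DELETE FROM t WHERE id={r.id}")
            records.remove(r)
            continue
        r.flag = CLEAN
    return stmts

recs = [Record(1, "a"), Record(2, "b")]
first = dump(recs)            # two INSERTs on the first pass
recs[0].set_payload("a2")
second = dump(recs)           # one UPDATE for the modified record only
```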

As another possible out-of-the-box idea, I was considering a delayed
import: dump the entire set of records the client would normally save to
a dump file, and then, in another thread, use LOAD DATA INFILE to import
it quickly.  This would take the load off the client for the duration of
MySQL's end of things, allowing it to continue processing without
disturbing the data being dumped (since it is already in the dump file).
Fewer checks would be needed to test whether saving again is safe while
the dump file still exists; the thread could remove the file when done,
in turn indicating it is safe to save again.  This keeps the environment
thread-safe and offloads all the processing except for writing a quick
dump file.
Would this method achieve the results I am theorizing?  Hypothetically
speaking, it should take only a few seconds to write the 48k records to
a text file.  It does mean a delayed import, though: it is not safe to
assume the database is up to date as soon as the service has done its
part.  Locking the tables and doing it all in one transaction in the
separate thread is about as safe and fast as I can think of.
Any thoughts?
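One way the delayed-import handshake might be wired up, as a Python sketch. The path, table name, and stub cursor are invented, and a real loader would run against a live MySQL connection instead of the stub; note also that LOAD DATA INFILE expects a delimited data file rather than SQL statements, hence the tab-separated file here.

```python
import csv
import os

DUMP = "dump.txt"   # invented path; the file's existence means "pending"

def write_dump(rows):
    # Client side: serialize the in-memory records as fast as possible,
    # then carry on with normal processing.
    with open(DUMP, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(rows)

def load_dump(cursor):
    # Loader thread: one locked bulk import, then remove the file to
    # signal the client that it is safe to save again.
    cursor.execute("LOCK TABLES t WRITE")
    cursor.execute("DELETE FROM t")
    cursor.execute(f"LOAD DATA INFILE '{os.path.abspath(DUMP)}' "
                   f"INTO TABLE t")
    cursor.execute("UNLOCK TABLES")
    os.remove(DUMP)

def dump_pending():
    # The client may start the next dump only once the file is gone.
    return os.path.exists(DUMP)

# A stub cursor that just records the SQL, since no server runs here.
class StubCursor:
    def __init__(self):
        self.sql = []
    def execute(self, stmt):
        self.sql.append(stmt)

write_dump([(1, "a"), (2, "b")])
cur = StubCursor()
load_dump(cur)
```

The remove-the-file-when-done step is what gives the thread-safety described above: the client never writes a new dump while the previous one is still being imported.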

Also, another optimization I made seemed to slow things down (though it
could just be coincidence with other processes running): changing my
VARCHAR fields to fixed-size CHAR fields took the run from 21 seconds
back up to 28 seconds.  I thought that strange, since fixed-length rows
should actually be faster, or at least not so significantly slower.
Again, it could have been coincidence, so I would appreciate a second
opinion: if fixed CHAR fields are faster than any TEXT, VARCHAR, or BLOB
fields, I'll keep the code as it is with CHAR fields.

Thanks,
-Shane

- Original Message - 
From: Becoming Digital [EMAIL PROTECTED]
To: MySQL List [EMAIL PROTECTED]
Sent: Sunday, June 15, 2003 3:24 PM
Subject: Re: Question about INSERT vs UPDATE



Question about INSERT vs UPDATE

2003-06-14 Thread Shane Bryldt
Hello,

I am new to the list, so pardon me if this is the wrong place to post my
question.  If so, please direct me to the right one.

My question concerns the efficiency of INSERT statements.  I have gone
over some of the optimizations, and on a machine running both the client
and a MySQL 4 server, DELETEing all rows from a table and repopulating it
with ~48k records via INSERT, I have come to the conclusion that,
regardless of hardware, the delay will be too significant (on a test
machine it ran about 25 seconds).
What I am curious to know is whether there is a significant performance
gain if I switch my method of saving from a complete memory dump to a
partial memory dump.  The overhaul involved would require a lot of code
restructuring.  The situation is this: approximately 10k of those 48k
records are actually modified regularly.  If I overhaul the code to track
when a record needs to be updated or inserted, and only issue the
appropriate statement, will the performance be significantly better?
Keep in mind it would have to search the 48k records to UPDATE the 10k
modified records, as well as potentially INSERT new records (very few, if
any).
With 48k records, is updating 10k records faster than simply deleting
and reinserting every record?

Alternatively, is there any way to speed up the INSERT time?  First, I am
using the default table type, which I believe is MyISAM.  Second, I lock
the table for writing before it is emptied and repopulated, and unlock it
after all records are inserted.  Third, I am using multi-row INSERT
statements (1000 records inserted at a time).  Whether the value list
holds 100 or 1000 rows, I get about the same result of 25 seconds.  I
realize the hardware I am using is not especially powerful, but I think
there is some optimization I could make.  Can you preallocate a definite
number of rows for a large insert operation?
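The multi-row batching described here can be sketched as follows, with Python building only the statement text. The table and column names are placeholders, not the poster's schema.

```python
# Build extended INSERT statements in batches, as described above:
# one statement per batch_size rows instead of one per row.
def insert_batches(rows, batch_size=1000):
    """Yield one multi-row INSERT statement per batch_size rows."""
    for i in range(0, len(rows), batch_size):
        values = ", ".join(f"({rid}, '{val}')"
                           for rid, val in rows[i:i + batch_size])
        yield f"INSERT INTO t (id, payload) VALUES {values}"

rows = [(i, "x") for i in range(48000)]
stmts = list(insert_batches(rows))   # ~48k rows -> 48 statements
```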

Any assistance would be appreciated; this project is flexible toward new
ideas to make the dumping more efficient.  One alternative I have
considered is UPDATEing records immediately when changes are made in
memory.  Transaction overhead could become an issue, however, with 10k+
records actively changing at any given time.  With an UPDATE method,
dumping frequently may actually reduce the time required for the process,
since each pass would update fewer records.

I would much appreciate it if anyone with some insight could offer ways
to speed up the process.

Thanks,
-Shane