Re: Question about INSERT vs UPDATE
Consider going over Section 5.2.9 of the manual.
http://www.mysql.com/doc/en/Insert_speed.html

Edward Dudlik
Becoming Digital
www.becomingdigital.com

----- Original Message -----
From: Shane Bryldt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, 15 June, 2003 01:12
Subject: Question about INSERT vs UPDATE

Hello, I am new to the list, so pardon me if I have posted my question to the wrong one. If so, please direct me to the right place.

My question concerns the efficiency of INSERT statements. I have gone over some of the documented optimizations, and on a machine running both the client and a MySQL 4 server, DELETEing all rows from a table and repopulating it with ~48k records via INSERT, I have concluded that, regardless of hardware, the delay will be too significant (on a test machine it took about 25 seconds).

What I am curious to know is whether there is a significant performance increase if I switch my method of saving from a complete memory dump to a partial one. The overhaul involved would require a lot of code restructuring. The situation is this: approximately 10k of those 48k records are actually modified regularly. If I overhaul the code to track whether a record needs to be updated or inserted, and only issue the appropriate statement, will performance be significantly better? Keep in mind that it would have to search the 48k records to UPDATE the 10k modified ones, as well as potentially INSERT new records (very few, if any). With 48k records, is updating 10k records faster than simply deleting and reinserting every record?

Alternatively, is there any way to speed up the INSERT time itself? First, I am using the default table format, which I believe is MyISAM. Second, I lock the table for writing before it is emptied and repopulated, and unlock it after all records are inserted. Third, I am using multi-row INSERT statements (1000 records inserted at a time). Whether each list holds 100 or 1000 rows, I get about the same result of 25 seconds.

I realize the hardware I am using is not particularly powerful, but I think there is some optimization I could still make. Can you preallocate a definite number of rows for a large insert operation?

Any assistance would be appreciated; this project is flexible enough to adopt new ideas that make the dumping more efficient. One alternative under consideration is UPDATEing records immediately when changes are made in memory. Transaction overhead could become an issue, however, with 10k+ records actively changing at any given time. With an UPDATE method, dumping more frequently might actually reduce the time required for the process, since each dump would touch fewer records.

Anyone who could offer some insight into ways to speed up this process would be much appreciated.

Thanks,
-Shane

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
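The bulk-reload pattern described above can be sketched as follows. This is a minimal illustration only; the table `records` and its columns are hypothetical, since the thread never shows the actual schema, and `TYPE=` is the MySQL 4-era spelling of what later became `ENGINE=`.

```sql
-- Hypothetical table; the real schema is not shown in the thread.
CREATE TABLE records (
    id   INT NOT NULL PRIMARY KEY,
    data VARCHAR(255)
) TYPE=MyISAM;

-- Full-dump approach: lock for writing, empty the table,
-- then repopulate with multi-row INSERT statements.
LOCK TABLES records WRITE;
DELETE FROM records;
INSERT INTO records (id, data) VALUES
    (1, 'alpha'),
    (2, 'beta'),
    (3, 'gamma');   -- ...repeated in batches of ~1000 rows, ~48 batches total
UNLOCK TABLES;
```

Note that a `DELETE FROM records` with no WHERE clause is recognized by MySQL as a full-table clear and handled much faster than row-by-row deletion, so the emptying step is unlikely to be where the 25 seconds go.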
Re: Question about INSERT vs UPDATE
As my post indicated, I have already made the tweaks that this section of the manual covers, and was hoping there might be some insight on my original question: INSERT vs UPDATE. That chapter was helpful initially, but I have already applied most of what it has to offer, and it still does not answer my question. I have read that chapter thoroughly, and was hoping someone with some experience might offer something more than pointing a finger at some documentation. The documentation addresses neither UPDATE efficiency nor which operation takes longer at a larger scale.

-Shane

----- Original Message -----
From: Becoming Digital [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, June 15, 2003 3:21 AM
Subject: Re: Question about INSERT vs UPDATE

[quoted text snipped; the full message appears above]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: Question about INSERT vs UPDATE
As my post suggested, I have already addressed the tweaks this section of the manual covers, and was hoping there might be some insight on my original question, the choice of INSERT vs UPDATE.

Gotcha. I wasn't sure if you'd checked the manual or just run a huge number of EXPLAINs on your queries.

I was hoping someone with some experience might offer something more than pointing a finger at some documentation.

Well, I can't offer an exact answer, but perhaps a tip. Have you considered adding a pair of DATE fields to your tables to record when each record was added and last modified? You could use those fields in conjunction with INSERT, UPDATE and IGNORE to get the desired effect. While this doesn't directly address your question, it may be a better long-term solution to your problem. If you continue with your current technique, you will likely run into the same problem again once your table grows, even with the best choice of UPDATE or INSERT.

Edward Dudlik
Becoming Digital
www.becomingdigital.com

----- Original Message -----
From: Shane Bryldt [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, 15 June, 2003 06:03
Subject: Re: Question about INSERT vs UPDATE

[quoted text snipped; the full message appears above]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
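Edward's timestamp suggestion might look something like the sketch below. This is only an illustration of the idea, not code from the thread: the `records` table, its columns, and the cutoff value are all hypothetical, and TIMESTAMP columns are used rather than DATE so that several dumps per day can be distinguished.

```sql
-- Hypothetical columns recording when each row was added and last changed.
ALTER TABLE records
    ADD COLUMN added    TIMESTAMP,
    ADD COLUMN modified TIMESTAMP;

-- When the application changes a row, it stamps it:
UPDATE records
SET    data = 'new value', modified = NOW()
WHERE  id = 42;

-- A dump then only needs to touch rows stamped since the previous dump
-- ('2003-06-15 01:00:00' stands in for the time of the last dump):
SELECT id, data FROM records
WHERE  modified > '2003-06-15 01:00:00';

-- INSERT IGNORE skips rows whose primary key already exists,
-- so genuinely new rows can be added without disturbing existing ones.
INSERT IGNORE INTO records (id, data, added, modified)
VALUES (48001, 'fresh row', NOW(), NOW());
```

The point of the design is that the database itself remembers which rows are dirty, so the dump can shrink from 48k rows to roughly the 10k that actually changed.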
Re: Question about INSERT vs UPDATE
I'm not sure I entirely understand your solution. A datestamp would only work if you stamped the record when it is updated; then, when the next dump is called, you'd have to have snippets of code throughout the program to edit the last-modified timestamp so it knows to update those records. This still poses an efficiency issue, because the code has to be significantly modified in every location where a record is modified in memory, to update the should-dump condition.

A similar solution I considered was three flags, indicating whether a record was inserted, modified or deleted, and acting on each record according to its current flag. However, this still poses the problem of having to edit every location in the code where a record is modified in memory to indicate that it should be dumped. Retaining a last-dumped timestamp would not really achieve anything; it would simply indicate when the last dump occurred, not whether the record should be dumped again. Unless I have misunderstood your idea.

As another possible out-of-the-box idea, I was considering a delayed import. That is, dump the data normally sent as SQL statements by the client to a dump.sql file, and then, in another thread, use LOAD DATA INFILE to quickly import it. This would take the load off the client for the duration of MySQL's end of things, allowing it to continue processing without disturbing the data being dumped (since it is already in dump.sql). Fewer checks would be needed to test whether saving again is safe while dump.sql still exists; the thread could remove the file when finished, indicating in turn that it is safe to save again. This keeps the environment thread-safe and offloads all the processing, short of quickly creating the dump file. Would this method achieve the results I am theorizing? Hypothetically speaking, it should only take a few seconds to write 48 INSERT statements, each with a 1000-value list, to a text file.
This does mean, however, a delayed import: it is not safe to assume the database is up to date as soon as the client has finished its part. Thus, locking the tables and doing it all in one batch in the separate thread is about as safe and fast as I can think of. Any thoughts?

Also, another optimization I made seemed to slow things down (though it could just be a coincidence of other processes running): changing my VARCHAR fields to fixed-size CHAR fields. I went from 21 seconds back to 28 seconds. I thought that strange, since fixed-length rows should actually be faster, or at least not so significantly slower. Again, it could have been coincidence, so with a second opinion that fixed CHAR fields are faster than any TEXT, VARCHAR or BLOB fields, I'll keep the code as it is with CHAR fields.

Thanks,
-Shane

----- Original Message -----
From: Becoming Digital [EMAIL PROTECTED]
To: MySQL List [EMAIL PROTECTED]
Sent: Sunday, June 15, 2003 3:24 PM
Subject: Re: Question about INSERT vs UPDATE

[quoted text snipped; the full message appears above]
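One detail worth noting about the delayed-import idea above: LOAD DATA INFILE reads a delimited data file, not a file of SQL statements, so the dump file would need to be written as rows of column values (a file of INSERT statements would instead be replayed through the mysql client). A minimal sketch of the importer thread's job, assuming a hypothetical tab-delimited dump file and the same hypothetical `records` table:

```sql
-- Importer thread: reload the table from a delimited dump file.
-- The file path and table name are hypothetical. Tab-separated
-- fields and newline-terminated lines are LOAD DATA's defaults,
-- spelled out here for clarity.
LOCK TABLES records WRITE;
DELETE FROM records;
LOAD DATA INFILE '/tmp/dump.txt'
INTO TABLE records
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
UNLOCK TABLES;
```

Since MyISAM tables do not support transactions, it is the LOCK TABLES pair, not a transaction, that keeps readers from seeing the half-reloaded table.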