On Thu, 14 Oct 2010 16:36:34 +1300, Lawrence D'Oliveiro wrote:

> In message <4cb5e659$0$1650$742ec...@news.sonic.net>, John Nagle wrote:
>
>> Also note that there are some issues with doing a huge volume of
>> updates in one MySQL InnoDB transaction. The system has to keep the
>> data needed to undo the updates, and there's a limit on the amount
>> of pending transaction history that can be stored.
>
> How does “load data” avoid this? Is that not a transaction too?
>
Not usually. It's faster because there's no journalling overhead. The
loader takes out an exclusive table lock, dumps the data into the
table, rebuilds the indexes and releases the lock. I can't comment on
MySQL (I don't use it), but this has been the case on the RDBMS
products I have used.

> Seems to me this isn’t going to help, since both old and new tables
> are on the same disk, after all. And it’s the disk access that’s the
> bottleneck.
>
There's a lot of overhead in journalling - much more than in applying
changes to a table. The before and after images *must* be flushed to
disk on commit. In UNIX terms, fsync() must be called on the journal
file(s), and this is an expensive operation on all OSes because
committing a series of small transactions can cause the same disk
block to be written several times. However, the table pages can safely
be left in the DBMS cache and flushed as part of normal cache
operation, since after a crash the table changes can always be
recovered from a journal roll-forward. A good DBMS will do that
automatically when it's restarted.
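For what it's worth, the lock / drop-index / bulk-insert / rebuild
pattern described above can be sketched like this (my own illustration,
not anything from the thread - sqlite3 stands in for a real RDBMS
loader, and the table and index names are invented):

```python
# Sketch of a bulk-loader pattern: take an exclusive lock, drop the
# index so rows append without per-row index maintenance, insert in
# bulk, then rebuild the index in one pass and release the lock.
import sqlite3

rows = [(i, "name%d" % i) for i in range(10000)]

# isolation_level=None puts the connection in autocommit mode so the
# BEGIN/COMMIT below are under our explicit control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX t_id ON t(id)")

conn.execute("BEGIN EXCLUSIVE")      # 1. exclusive lock
conn.execute("DROP INDEX t_id")      # 2. no index maintenance per row
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)   # 3. dump data
conn.execute("CREATE INDEX t_id ON t(id)")  # 4. rebuild in one pass
conn.execute("COMMIT")               # 5. release the lock

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 10000
```

A real loader does this server-side, but the shape of the work is the
same: one lock, one append phase, one index build.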
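And the fsync() cost is easy to demonstrate from Python (again my own
sketch, not from the thread): write the same records with one flush per
record - analogous to one journal flush per commit - versus buffered
writes with the OS flushing when it likes. The record count and format
are arbitrary.

```python
# Compare fsync()-per-record against plain buffered writes.
import os
import tempfile
import time

def write_records(n, fsync_each):
    """Write n small records to a temp file; optionally fsync() each."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for i in range(n):
                f.write(b"record %d\n" % i)
                if fsync_each:
                    f.flush()
                    os.fsync(f.fileno())  # force the block to disk now
        return time.perf_counter() - start
    finally:
        os.unlink(path)

batched = write_records(200, fsync_each=False)
per_record = write_records(200, fsync_each=True)
print("buffered:   %.4fs" % batched)
print("per-record: %.4fs" % per_record)
```

On rotating disks the per-record run is typically orders of magnitude
slower, because the same block is forced out over and over - which is
exactly the point being made about small transactions.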
-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
-- 
http://mail.python.org/mailman/listinfo/python-list