Re: SQLite - Time to build a database
On Thu, 3 Jun 2010 11:55:47 +0300, Henri Asseily wrote: If you're ok with one big transaction where if it fails you have to redo it all, then just do the above. Well in the root post he did say: 1500 of these rows have faulty data. so I don't think it's OK to fail completely on an error... Unless the data gets prescreened. In Perl. -- Bart Lateur bart.lat...@telenet.be
Re: SQLite - Time to build a database
On Thu, 03 Jun 2010 20:12:51 +0200 Bart Lateur bart.lat...@telenet.be wrote: On Thu, 3 Jun 2010 11:55:47 +0300, Henri Asseily wrote: If you're ok with one big transaction where if it fails you have to redo it all, then just do the above. Well in the root post he did say: 1500 of these rows have faulty data. so I don't think it's OK to fail completely on an error... Unless the data gets prescreened. In Perl. Thanks for all the responses. It was a one off exercise that roused my curiosity which has now been satisfied. SQLite is a great facility for my purposes, I don't know how it would get on with gigabyte sized files, but certainly for smaller stuff and general hack work, it and its perl interface are ideal Owen
Re: SQLite - Time to build a database
On Jun 3, 2010, at 1:12 AM, Bart Lateur wrote: On Wed, 26 May 2010 10:46:43 +1000, Owen wrote: So I read the main database row by row, tested the validity of one column, and inserted the good rows into a new database. This process took 27 minutes on a fairly recent desktop computer. ( about 1 rows a minute or 170 a second ) I thought the whole process would take a few seconds, so I wonder if 27 minutes is a reasonable time for this database creation. It's because SQLite is committing every row insert as its own transaction. Do a begin transaction at the start, then at the end a commit transaction and it'll fly with 50k rows/second. If you're ok with one big transaction where if it fails you have to redo it all, then just do the above. With 2.7 million rows that should take you barely a minute, so I'd say a single transaction for the whole procedure is the way to go. --- Henri Asseily henri.tel
Re: SQLite - Time to build a database
On Wed, 26 May 2010 10:46:43 +1000, Owen wrote: So I read the main database row by row, tested the validity of one column, and inserted the good rows into a new database. This process took 27 minutes on a fairly recent desktop computer. ( about 1 rows a minute or 170 a second ) I thought the whole process would take a few seconds, so I wonder if 27 minutes is a reasonable time for this database creation. Well I timed actions using SQLite some time ago and I am lead to believe that every commit produces an fsync() in the database, which takes roughly 100ms on my old 1MHz PC running Linux, and which you may expect to take 50ms on a more modern computer. 2 possible solutions are: 1) don't commit on every single row, keep an eye on the clock and commit every N seconds (I'm thinking of 1/2 minute) plus once at the very end, of course 2) edit the C source of SQLite and replace the fsync() with fflush(), which ought to be around 100 times faster, but which isn't garanteed to withstand computer crashes, if it goes down before the data is finally stored on disk - not ideal in the eye of ACID perfectionists. Anyway, with current cache sizes inside the disk, nothing is garanteed after an fsync(), anyway. (Data might be in the disk's own cache but not yet committed to the physical disk) As the database was being created, I noticed a journal file was in use though I haven't worked out what that did, yet. It SQLite's scratchpad, where it stores its intentions for each transaction. If the computer goes down before a transaction is finalized, this could likely be used to redo the action and this time, complete it. Or, roll it back properly. -- Bart Lateur bart.lat...@telenet.be
Re: SQLite - Time to build a database
Bart Lateur wrote: 2) edit the C source of SQLite and replace the fsync() with fflush(), which ought to be around 100 times faster, but which isn't garanteed to withstand computer crashes, if it goes down before the data is finally stored on disk - not ideal in the eye of ACID perfectionists. Anyway, with current cache sizes inside the disk, nothing is garanteed after an fsync(), anyway. (Data might be in the disk's own cache but not yet committed to the physical disk) if you're using proper enterprise storage and non-broken OS's, 'write barriers' take care of this nicely. of course, enterprise storage usually has a battery backup on the controller write cache, anyways, and enterprise drives (scsi, sas, fc) understand to flush their drive cache on a write barrier operation.
Re: SQLite - Time to build a database
Stuart Johnston wrote: Writes are slow if you do one insert per transaction. http://search.cpan.org/perldoc?DBD::SQLite#Performance Just think of the number of round trips you are doing in this case so 27min is not too bad. We have 30+ reads then which each insert there could be I think at least 3 round trips 1) prepare 2) bind 3) execute that is allot of round trips;). cheers John Scoles Owen wrote: I have a sqlite (version 3.6.13) database with 30+ rows 1500 of these rows have faulty data. So I read the main database row by row, tested the validity of one column, and inserted the good rows into a new database. This process took 27 minutes on a fairly recent desktop computer. ( about 1 rows a minute or 170 a second ) I thought the whole process would take a few seconds, so I wonder if 27 minutes is a reasonable time for this database creation. As the database was being created, I noticed a journal file was in use though I haven't worked out what that did, yet. TIA Owen
Re: SQLite - Time to build a database
Writes are slow if you do one insert per transaction. http://search.cpan.org/perldoc?DBD::SQLite#Performance Owen wrote: I have a sqlite (version 3.6.13) database with 30+ rows 1500 of these rows have faulty data. So I read the main database row by row, tested the validity of one column, and inserted the good rows into a new database. This process took 27 minutes on a fairly recent desktop computer. ( about 1 rows a minute or 170 a second ) I thought the whole process would take a few seconds, so I wonder if 27 minutes is a reasonable time for this database creation. As the database was being created, I noticed a journal file was in use though I haven't worked out what that did, yet. TIA Owen