Re: SQLite - Time to build a database

2010-06-04 Thread Bart Lateur
On Thu, 3 Jun 2010 11:55:47 +0300, Henri Asseily wrote:

If you're ok with one big transaction where if it fails you have to redo it 
all, then just do the above.

Well in the root post he did say:

1500 of these rows have faulty data.

so I don't think it's OK to fail completely on an error... Unless the
data gets prescreened. In Perl.

-- 
Bart Lateur
bart.lat...@telenet.be


Re: SQLite - Time to build a database

2010-06-04 Thread Owen
On Thu, 03 Jun 2010 20:12:51 +0200
Bart Lateur bart.lat...@telenet.be wrote:

 On Thu, 3 Jun 2010 11:55:47 +0300, Henri Asseily wrote:
 
 If you're ok with one big transaction where if it fails you have to
 redo it all, then just do the above.
 
 Well in the root post he did say:
 
 1500 of these rows have faulty data.
 
 so I don't think it's OK to fail completely on an error... Unless the
 data gets prescreened. In Perl.
 


Thanks for all the responses. It was a one-off exercise that roused my
curiosity, which has now been satisfied.

SQLite is a great facility for my purposes. I don't know how it would
get on with gigabyte-sized files, but certainly for smaller stuff and
general hack work, it and its Perl interface are ideal.


Owen



Re: SQLite - Time to build a database

2010-06-03 Thread Henri Asseily

On Jun 3, 2010, at 1:12 AM, Bart Lateur wrote:

 On Wed, 26 May 2010 10:46:43 +1000, Owen wrote:
 
 So I read the main database row by row, tested the validity of one
 column, and inserted the good rows into a new database.
 
 This process took 27 minutes on a fairly recent desktop computer.
 (about 10,000 rows a minute, or 170 a second)
 
 I thought the whole process would take a few seconds, so I wonder if 27
 minutes is a reasonable time for this database creation.
 

It's because SQLite is committing every row insert as its own transaction.
Do a BEGIN TRANSACTION at the start and a COMMIT at the end, and it'll fly
at 50k rows/second.

If you're OK with one big transaction, where a failure means redoing it
all, then just do the above. With 2.7 million rows that should take you
barely a minute, so I'd say a single transaction for the whole procedure
is the way to go.
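The thread is about DBD::SQLite from Perl, but the pattern Henri describes
can be sketched with Python's built-in sqlite3 module (an illustration
with made-up table names, not anyone's actual script): open one explicit
transaction, do all the inserts, commit once.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we
# control transaction boundaries ourselves with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

conn.execute("BEGIN")  # one transaction for the whole batch
for i in range(100_000):
    conn.execute("INSERT INTO t (val) VALUES (?)", (f"row {i}",))
conn.execute("COMMIT")  # a single durable commit, instead of one per row
```

On a disk-backed database (rather than :memory:) the difference is
dramatic, because only the final COMMIT has to wait for the disk.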

---
Henri Asseily
henri.tel






Re: SQLite - Time to build a database

2010-06-02 Thread Bart Lateur
On Wed, 26 May 2010 10:46:43 +1000, Owen wrote:

So I read the main database row by row, tested the validity of one
column, and inserted the good rows into a new database.

This process took 27 minutes on a fairly recent desktop computer.
(about 10,000 rows a minute, or 170 a second)

I thought the whole process would take a few seconds, so I wonder if 27
minutes is a reasonable time for this database creation.

Well, I timed actions using SQLite some time ago, and I am led to believe
that every commit produces an fsync() on the database file, which took
roughly 100 ms on my old 1 GHz PC running Linux; you might expect it to
take 50 ms on a more modern computer.

Two possible solutions:

 1) Don't commit on every single row; keep an eye on the clock and
commit every N seconds (I'm thinking of half a minute), plus once at the
very end, of course.

 2) Edit the C source of SQLite and replace the fsync() with fflush(),
which ought to be around 100 times faster, but which isn't guaranteed to
survive a computer crash if the machine goes down before the data is
finally stored on disk - not ideal in the eyes of ACID perfectionists.

Anyway, with current cache sizes inside the disk, nothing is guaranteed
after an fsync() anyway. (Data might be in the disk's own cache but not
yet committed to the physical platters.)
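Bart's first suggestion, committing on a timer rather than per row, might
be sketched like this with Python's sqlite3 module (the 30-second interval
comes from his half-a-minute figure; the table name is made up):

```python
import sqlite3
import time

COMMIT_INTERVAL = 30.0  # seconds, per the half-a-minute suggestion

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE t (val TEXT)")

conn.execute("BEGIN")
last_commit = time.monotonic()
for i in range(50_000):
    conn.execute("INSERT INTO t (val) VALUES (?)", (str(i),))
    # Commit every COMMIT_INTERVAL seconds: a crash loses at most that
    # much work, while most per-row fsync() calls are still avoided.
    if time.monotonic() - last_commit >= COMMIT_INTERVAL:
        conn.execute("COMMIT")
        conn.execute("BEGIN")
        last_commit = time.monotonic()
conn.execute("COMMIT")  # once at the very end, for the remainder
```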

As the database was being created, I noticed a journal file was in use
though I haven't worked out what that did, yet.

It's SQLite's scratchpad, where it records its intentions for each
transaction. If the computer goes down before a transaction is
finalized, the journal can be used either to redo the action and
complete it this time, or to roll it back properly.
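The journal is what makes ROLLBACK (and crash recovery) possible. A
minimal sketch of that behavior with Python's sqlite3 (an in-memory
database, where the journal is not a visible file but rollback works the
same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (val TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES ('kept')")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES ('doomed')")
conn.execute("ROLLBACK")  # the journal restores the pre-transaction state
```

After the rollback, only the committed row survives.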

-- 
Bart Lateur
bart.lat...@telenet.be


Re: SQLite - Time to build a database

2010-06-02 Thread John R Pierce

Bart Lateur wrote:

 2) Edit the C source of SQLite and replace the fsync() with fflush(),
which ought to be around 100 times faster, but which isn't guaranteed to
survive a computer crash if the machine goes down before the data is
finally stored on disk - not ideal in the eyes of ACID perfectionists.

Anyway, with current cache sizes inside the disk, nothing is guaranteed
after an fsync() anyway. (Data might be in the disk's own cache but not
yet committed to the physical platters.)
  



If you're using proper enterprise storage and a non-broken OS, 'write
barriers' take care of this nicely. Of course, enterprise storage
usually has a battery backup on the controller write cache anyway, and
enterprise drives (SCSI, SAS, FC) understand that they must flush their
drive cache on a write-barrier operation.











Re: SQLite - Time to build a database

2010-05-26 Thread John Scoles

Stuart Johnston wrote:

Writes are slow if you do one insert per transaction.

http://search.cpan.org/perldoc?DBD::SQLite#Performance


Just think of the number of round trips you are doing in this case, so
27 minutes is not too bad.

We have 30+ reads, and with each insert there could be, I think, at
least three round trips:

1) prepare
2) bind
3) execute

That is a lot of round trips ;).
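In Perl/DBD::SQLite the cure is to call prepare once and reuse the
statement handle in the loop. The same idea sketched with Python's
sqlite3 (illustrative table and data) is executemany(), which prepares
the INSERT once and then only binds and executes it per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE rows (id INTEGER, val TEXT)")

data = [(i, f"val {i}") for i in range(10_000)]

conn.execute("BEGIN")
# executemany() prepares the statement once, then binds and executes it
# for each row, rather than re-preparing the SQL 10,000 times.
conn.executemany("INSERT INTO rows (id, val) VALUES (?, ?)", data)
conn.execute("COMMIT")
```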

cheers
John Scoles


Owen wrote:

I have a sqlite (version 3.6.13) database with 30+ rows

1500 of these rows have faulty data.

So I read the main database row by row, tested the validity of one
column, and inserted the good rows into a new database.

This process took 27 minutes on a fairly recent desktop computer.
(about 10,000 rows a minute, or 170 a second)
I thought the whole process would take a few seconds, so I wonder if 27
minutes is a reasonable time for this database creation.

As the database was being created, I noticed a journal file was in use
though I haven't worked out what that did, yet.


TIA


Owen






Re: SQLite - Time to build a database

2010-05-25 Thread Stuart Johnston

Writes are slow if you do one insert per transaction.

http://search.cpan.org/perldoc?DBD::SQLite#Performance

Owen wrote:

I have a sqlite (version 3.6.13) database with 30+ rows

1500 of these rows have faulty data.

So I read the main database row by row, tested the validity of one
column, and inserted the good rows into a new database.

This process took 27 minutes on a fairly recent desktop computer.
(about 10,000 rows a minute, or 170 a second)


I thought the whole process would take a few seconds, so I wonder if 27
minutes is a reasonable time for this database creation.

As the database was being created, I noticed a journal file was in use
though I haven't worked out what that did, yet.


TIA


Owen
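Owen's whole task, reading the main database row by row, testing the
validity of one column, and inserting the good rows into a new database,
can be sketched end to end with Python's sqlite3 (made-up schema and
validity test, with the single-transaction fix from the thread applied):

```python
import sqlite3

# A small stand-in for the source database; an empty 'val' column
# plays the role of "faulty data".
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE main_table (id INTEGER, val TEXT)")
src.executemany(
    "INSERT INTO main_table VALUES (?, ?)",
    [(i, "" if i % 200 == 0 else f"v{i}") for i in range(10_000)],
)
src.commit()

dst = sqlite3.connect(":memory:", isolation_level=None)
dst.execute("CREATE TABLE main_table (id INTEGER, val TEXT)")

dst.execute("BEGIN")  # one transaction for the whole copy
for row_id, val in src.execute("SELECT id, val FROM main_table"):
    if val:  # the validity test on one column
        dst.execute("INSERT INTO main_table VALUES (?, ?)", (row_id, val))
dst.execute("COMMIT")
```

Here 50 of the 10,000 rows fail the test (every 200th), so 9,950 good
rows land in the new database in a single commit.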