Re: [sqlite] SQLite performance on select/update with 11GB database file

2009-04-09 Thread Kees Nuyt
On Wed, 08 Apr 2009 23:17:02 +0200, Florian Nigsch wrote:

>Hi all,
>
>I have been trying to implement a couple of things in SQLite because I  
>only need it for myself (so no concurrency issues here).
>
>I am working on Arch Linux (uname: 2.6.28-ARCH #1 SMP PREEMPT Sun Mar  
>8 10:18:28 UTC 2009 i686 Intel(R) Xeon(TM) CPU 3.40GHz). I have a flat  
>text file that I want to put into SQLite to be able to query it and  
>also update fields if required. This raw text file is 7.2GB, roughly
>11 million entries and 60 fields. (Perhaps not the best design, but  
>that shall not be the issue here)

We can't ignore that. If you have 11 million entries and 60
fields, I bet there is a lot of redundant data in the
database, which makes it big.
Normalization can make a lot of difference here.
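
For example (table and column names below are purely made up, since we
haven't seen your schema): if one of the 60 fields holds a longish
string that repeats across many of the 11 million rows, moving the
distinct values into a lookup table and keeping only a small integer
key in the big table can shrink the file a lot. A rough sketch in
Python/sqlite3:

  import sqlite3

  con = sqlite3.connect("big.db")   # example path
  cur = con.cursor()

  # Hypothetical: 'source_name' repeats in nearly every row of 'records'.
  cur.executescript("""
      CREATE TABLE source (
          id   INTEGER PRIMARY KEY,
          name TEXT UNIQUE
      );
      INSERT INTO source (name) SELECT DISTINCT source_name FROM records;

      -- replace the repeated string by a small integer reference
      ALTER TABLE records ADD COLUMN source_id INTEGER;
      UPDATE records SET source_id =
          (SELECT id FROM source WHERE name = records.source_name);
  """)
  con.commit()

After that you would rebuild the table without the old column (SQLite
has no DROP COLUMN) and run VACUUM to reclaim the space.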

>Getting it into SQLite was not as easy as I anticipated. At first, I  
>used the execute() method for every insert, and committed every 1  
>rows. This run didn't finish overnight.
>
>Then I tried a commit every 100,000 rows: at around 2.5 million entries it
>slowed down so much that it was unacceptable. I used PRAGMA  
>journal_mode = OFF which improved a bit, but not enough.
>
>The final rescue was to use executemany() for every 100,000 records  
>combined with a commit(). That put the whole file in the DB in approx  
>17 min. Creating the required indices took another 40 minutes.
>The final database file has roughly 11GB.
>
>Now, I have 3 spare fields in the table that I want to populate with  
>data from another file. I loop over the file in question, from there I  
>get the ID of the record to update in the table and the info to  
>populate the empty field with. Again, I used executemany() every  
>100,000 records, assuming that it would work. But no.
>
>Even when I use executemany() every 10,000 rows, these 10,000 updates  
>take around 5 minutes or so (I haven't properly timed it). That is  
>also with PRAGMA journal_mode = OFF.

I don't recognize executemany() or commit(). They are not
part of the sqlite3_* API. Can I assume they are your own
wrapper functions? 

You did say you commit() your transactions, but do you also
BEGIN them? (I apologize for asking the obvious, but someone
has to ask it anyway).
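
If those calls come from the Python sqlite3 module (which does provide
execute(), executemany(), commit() and fetchall()), note that it opens
a transaction implicitly before the first INSERT/UPDATE and only ends
it when you call commit(). A minimal sketch of taking explicit control
instead (file, table and column names are just placeholders):

  import sqlite3

  # isolation_level=None puts the connection in autocommit mode,
  # so we issue BEGIN/COMMIT ourselves.
  con = sqlite3.connect("big.db", isolation_level=None)
  cur = con.cursor()

  rows = [("abc", 1), ("def", 2)]   # placeholder data

  cur.execute("BEGIN")
  cur.executemany("INSERT INTO records (name, value) VALUES (?, ?)", rows)
  cur.execute("COMMIT")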

>When I just get the IDs of the entries that I need to update, look  
>them up in the table (select a,b,c from table where id in (...)), and  
>retrieve them with fetchall() and do nothing with them (no printing,  
>etc.) then this works fine for the first 10,000 records. After that,  
>again, it gets so slow that it's basically unusable. The ID field
>that I use for the lookups has an index on it. The fields that I am  
>updating do not have indices on them. It's just text fields that are  
>being populated with strings up to approx 150 characters.
>
>Why are the retrievals that slow, and why are they getting slower the  
>further I go? And why are the updates so slow?
>
>Is it because the database file is too big?
> 
>Any help or suggestions would be highly appreciated!
>
>Regards,
>
>Flo

-- 
  (  Kees Nuyt
  )
c[_]


[sqlite] SQLite performance on select/update with 11GB database file

2009-04-08 Thread Florian Nigsch
Hi all,

I have been trying to implement a couple of things in SQLite because I  
only need it for myself (so no concurrency issues here).

I am working on Arch Linux (uname: 2.6.28-ARCH #1 SMP PREEMPT Sun Mar  
8 10:18:28 UTC 2009 i686 Intel(R) Xeon(TM) CPU 3.40GHz). I have a flat  
text file that I want to put into SQLite to be able to query it and  
also update fields if required. This raw text file is 7.2GB, roughly
11 million entries and 60 fields. (Perhaps not the best design, but  
that shall not be the issue here)

Getting it into SQLite was not as easy as I anticipated. At first, I  
used the execute() method for every insert, and committed every 1  
rows. This run didn't finish overnight.

Then I tried a commit every 100,000 rows: at around 2.5 million entries it
slowed down so much that it was unacceptable. I used PRAGMA  
journal_mode = OFF which improved a bit, but not enough.

The final rescue was to use executemany() for every 100,000 records  
combined with a commit(). That put the whole file in the DB in approx  
17 min. Creating the required indices took another 40 minutes.
The final database file has roughly 11GB.
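
In outline, the loading loop is roughly this (file, table and column
names are simplified here; the real table has 60 columns):

  import sqlite3

  con = sqlite3.connect("data.db")
  con.execute("PRAGMA journal_mode = OFF")
  cur = con.cursor()

  insert_sql = "INSERT INTO records VALUES (%s)" % ",".join("?" * 60)
  batch = []
  for line in open("rawdata.txt"):
      batch.append(line.rstrip("\n").split("\t"))
      if len(batch) == 100000:
          cur.executemany(insert_sql, batch)
          con.commit()
          batch = []
  if batch:
      cur.executemany(insert_sql, batch)
      con.commit()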

Now, I have 3 spare fields in the table that I want to populate with  
data from another file. I loop over the file in question, from there I  
get the ID of the record to update in the table and the info to  
populate the empty field with. Again, I used executemany() every  
100,000 records, assuming that it would work. But no.

Even when I use executemany() every 10,000 rows, these 10,000 updates  
take around 5 minutes or so (I haven't properly timed it). That is  
also with PRAGMA journal_mode = OFF.
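
The update loop is essentially this (again with placeholder names; the
spare column and the input format are simplified):

  import sqlite3

  con = sqlite3.connect("data.db")
  con.execute("PRAGMA journal_mode = OFF")
  cur = con.cursor()

  batch = []
  for line in open("extra_info.txt"):
      rec_id, info = line.rstrip("\n").split("\t")
      batch.append((info, rec_id))
      if len(batch) == 10000:
          cur.executemany("UPDATE records SET spare1 = ? WHERE id = ?", batch)
          con.commit()
          batch = []
  if batch:
      cur.executemany("UPDATE records SET spare1 = ? WHERE id = ?", batch)
      con.commit()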

When I just get the IDs of the entries that I need to update, look  
them up in the table (select a,b,c from table where id in (...)), and  
retrieve them with fetchall() and do nothing with them (no printing,  
etc.) then this works fine for the first 10,000 records. After that,  
again, it gets so slow that it's basically unusable. The ID field
that I use for the lookups has an index on it. The fields that I am  
updating do not have indices on them. It's just text fields that are  
being populated with strings up to approx 150 characters.
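
The plain lookups boil down to this (placeholder names again; it
assumes integer IDs, which can be interpolated directly into the SQL,
since with bound parameters older SQLite builds only allow 999
variables per statement):

  import sqlite3

  con = sqlite3.connect("data.db")
  cur = con.cursor()

  ids = [str(i) for i in range(1, 10001)]   # placeholder: next 10,000 IDs from the other file
  cur.execute("SELECT a, b, c FROM records WHERE id IN (%s)" % ",".join(ids))
  rows = cur.fetchall()                     # results discarded, as described above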

Why are the retrievals that slow, and why are they getting slower the  
further I go? And why are the updates so slow?

Is it because the database file is too big?

Any help or suggestions would be highly appreciated!

Regards,

Flo

