[sqlite] BLOBs' affect on DB performance

2010-03-01 Thread Paul Vercellotti

Hi there,

I'm wondering how larger BLOBs in a database affect performance of accessing 
the non-blob data.   We've got a database with potentially a few million 
records in the main tables (of just strings and ints), but joined to that data 
set we want to store up to thousands (maybe 75000 max) of blobs ranging in size 
from 75kB - 4MB. If it comes down to it, we can use the filesystem for these, 
but we'd prefer to store them in a database if it didn't bog things down 
(cleaner to manage implementation-wise and user-experience-wise).Now I'm 
guessing that storing all those blobs will slow down access to the main tables 
(assuming records are added gradually - most without associated blobs, some 
with), because records would be spread out over many more pages (more seeking / 
disk access) - is that correct?Would performance likely be better if the 
blob table were stored in a separate database file?   Also, is it possible to 
adjust page size on a per-database basis
 (I'm thinking larger pages for the blob database would be better, right?)?  
Any other suggestions to do this efficiently?

Sorry if this is answered elsewhere; when searching, most of the BLOB 
performance posts I've seen are about storing / retrieving the blobs.  We're 
not so worried about the time to retrieve / store the blobs as much as time to 
access the main tables.

Thanks!
Paul
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BLOBs' affect on DB performance

2010-03-02 Thread Max Vlasov
On Tue, Mar 2, 2010 at 10:54 AM, Paul Vercellotti wrote:

> Now I'm guessing that storing all those blobs will slow down access to the
> main tables (assuming records are added gradually - most without associated
> blobs, some with), because records would be spread out over many more pages
> (more seeking / disk access) - is that correct?
>

It depends on your operations. For example querying simple SELECT rowid from
table actually means reading all the pages with the table data (but not the
overflow pages when a record does not fit in one page). And for such query
lowering the size of the record will lead to fewer pages, lower
fragmentation so faster access/reading. But everything changes if your query
involves indexes, since starting this your performance will depend not on
the size of the original record, but the size of the "index" record (the
fields that is listed in CREATE INDEX). This one's because the sqlite index
actually contains only the data it needs + rowid and to effectively filter
something related to index data sqlite does not need the main table data.

Ironically that post and tests allowed me to find out that creating separate
index on rowid alias (one that has something like [Id] Primary Key
Autoincrement in CREATE TABLE) makes sense and the SELECT I mentioned in the
beginning being changed to SELECT Id FROM Table Order By Id started to work
faster and the time no longer depended on the size of the record. This was
possible thanks to this new index that was absolute wasting of space
(duplicating to the primary index), but gaining advantages from the facts
that it was effectively packed in much fewer pages.

So, my suggestion for you is to analyze your scheme and future queries and
if expensive parts of them involves just indexes (based on anything but your
blobs) and these blobs are accessed on some final stage of the query, go
with blobs inside sqlite base.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BLOBs' affect on DB performance

2010-03-02 Thread Teg
Hello Paul,

My experiences with blobs suggests it's better to keep them in a
different DB file.  My uses sounded very similar to yours, tables of
normal data interleaved with blob inserts. The physical process of
having to move from page to page seems to be the bottleneck, not
Sqlite itself. I was working with multi-gig DB files so, this might
not be as true for smaller files.

C


Tuesday, March 2, 2010, 2:54:06 AM, you wrote:

PV> Hi there,

PV> I'm wondering how larger BLOBs in a database affect performance
PV> of accessing the non-blob data.   We've got a database with
PV> potentially a few million records in the main tables (of just
PV> strings and ints), but joined to that data set we want to store up
PV> to thousands (maybe 75000 max) of blobs ranging in size from 75kB
PV> - 4MB. If it comes down to it, we can use the filesystem for
PV> these, but we'd prefer to store them in a database if it didn't
PV> bog things down (cleaner to manage implementation-wise and
PV> user-experience-wise).Now I'm guessing that storing all those
PV> blobs will slow down access to the main tables (assuming records
PV> are added gradually - most without associated blobs, some with),
PV> because records would be spread out over many more pages (more
PV> seeking / disk access) - is that correct?Would performance
PV> likely be better if the blob table were stored in a separate
PV> database file?   Also, is it possible to adjust page size on a per-database 
basis
PV>  (I'm thinking larger pages for the blob database would be
PV> better, right?)?  Any other suggestions to do this efficiently?

PV> Sorry if this is answered elsewhere; when searching, most of the
PV> BLOB performance posts I've seen are about storing / retrieving
PV> the blobs.  We're not so worried about the time to retrieve /
PV> store the blobs as much as time to access the main tables.

PV> Thanks!
PV> Paul
PV> ___
PV> sqlite-users mailing list
PV> sqlite-users@sqlite.org
PV> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Best regards,
 Tegmailto:t...@djii.com

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users