[sqlite] Performance of sparse sqlite tables

2012-11-04 Thread Dominguez Bonini, David
Hi,

I have an application where a table is filled with elements whose primary key 
is specified at insertion, and is actually a combination of several independent 
IDs. Example:  ItemID = (id0  32) + (id1  16) + (id2).
The range covered by each ID guarantees that their sum will never exceed the 64 
bit maximum size of an sqlite primary key. The advantage of this approach is 
that a function performing a SELECT statement can pre-compute the id that needs 
to be retrieved from the database. This is what I call a sparse table, meaning 
that the table will never have more than X items, but the primary key range is 
actually much bigger than X. Sorry if my definitions are not standard, SQL is 
not my native language :)

This scheme is used because IDs are usually inserted all at once in a single 
batch, and then they have to be regularly updated over a very long time. So, 
there are relatively few INSERTS and a LOT of UPDATES and SELECTS.

I'm wondering if the advantage in search speed obtained by this ID assignment 
scheme may somehow be offset by other factors like additional memory usage, 
less efficient inserts, etc. Can anyone offer counterarguments, or recommend a 
better scheme?

Regards

David

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance of sparse sqlite tables

2012-11-04 Thread Pavel Ivanov
I'd say generally speaking your way of storing data has no significant
downsides. There's just one but: if each row in your table stores pretty
significant amount of data (blobs, long text fields or just lots of
different fields) you'd better not make your ItemID INTEGER PRIMARY KEY.
Because SQLite stores all rows in the table in the order of rowids. So
every time you update your ItemID SQLite would have to move the whole row
to a new place. So for the case of big rows I'd suggest to make some other
column INTEGER PRIMARY KEY and add unique constraint to your ItemID. It
won't hurt your search speed (could make it faster actually) and will make
updates faster. Although it will come with a larger size of the database
file.

Pavel


On Sun, Nov 4, 2012 at 9:26 AM, Dominguez Bonini, David 
david.doming...@ikusi.com wrote:

 Hi,

 I have an application where a table is filled with elements whose primary
 key is specified at insertion, and is actually a combination of several
 independent IDs. Example:  ItemID = (id0  32) + (id1  16) + (id2).
 The range covered by each ID guarantees that their sum will never exceed
 the 64 bit maximum size of an sqlite primary key. The advantage of this
 approach is that a function performing a SELECT statement can pre-compute
 the id that needs to be retrieved from the database. This is what I call a
 sparse table, meaning that the table will never have more than X items, but
 the primary key range is actually much bigger than X. Sorry if my
 definitions are not standard, SQL is not my native language :)

 This scheme is used because IDs are usually inserted all at once in a
 single batch, and then they have to be regularly updated over a very long
 time. So, there are relatively few INSERTS and a LOT of UPDATES and SELECTS.

 I'm wondering if the advantage in search speed obtained by this ID
 assignment scheme may somehow be offset by other factors like additional
 memory usage, less efficient inserts, etc. Can anyone offer
 counterarguments, or recommend a better scheme?

 Regards

 David

 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance of sparse sqlite tables

2012-11-04 Thread Simon Slavin

On 4 Nov 2012, at 5:26pm, Dominguez Bonini, David david.doming...@ikusi.com 
wrote:

 I have an application where a table is filled with elements whose primary key 
 is specified at insertion, and is actually a combination of several 
 independent IDs. Example:  ItemID = (id0  32) + (id1  16) + (id2).
 The range covered by each ID guarantees that their sum will never exceed the 
 64 bit maximum size of an sqlite primary key. The advantage of this approach 
 is that a function performing a SELECT statement can pre-compute the id that 
 needs to be retrieved from the database. This is what I call a sparse table, 
 meaning that the table will never have more than X items, but the primary key 
 range is actually much bigger than X. Sorry if my definitions are not 
 standard, SQL is not my native language :)

You have invented a 'hashing function' which is the normal term used for that 
kind of calculation.  If this renders unique ItemIDs for you, and you can 
easily calculate the hash for any item, then it's probably not slowing you down 
much.

 This scheme is used because IDs are usually inserted all at once in a single 
 batch, and then they have to be regularly updated over a very long time. So, 
 there are relatively few INSERTS and a LOT of UPDATES and SELECTS.
 
 I'm wondering if the advantage in search speed obtained by this ID assignment 
 scheme may somehow be offset by other factors like additional memory usage, 
 less efficient inserts, etc. Can anyone offer counterarguments, or recommend 
 a better scheme?

There are so many considerations involved in this, including your file system, 
the amount of memory you have spare, your OS, and the total width of your 
table, that we couldn't guess the result.  Your only way to find out is to do 
the actual test.  However, I suspect that any speed gain or loss involved in a 
scheme like this would be trivial when measured against the overall performance 
of your software.  SQLite is itself extremely fast even if you just store your 
data without any special processing.  You can probably get better return on 
your time by spending it thinking about a better user-interface, or some other 
element of your application.

If, on the other hand, you have tested and found that your application is just 
a tiny bit too slow to be usable, and that profiling is showing that most of 
the time is spend inside SQLite functions, that's a different matter.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users