[sqlite] Performance of sparse sqlite tables
Hi, I have an application where a table is filled with elements whose primary key is specified at insertion, and is actually a combination of several independent IDs. Example: ItemID = (id0 32) + (id1 16) + (id2). The range covered by each ID guarantees that their sum will never exceed the 64 bit maximum size of an sqlite primary key. The advantage of this approach is that a function performing a SELECT statement can pre-compute the id that needs to be retrieved from the database. This is what I call a sparse table, meaning that the table will never have more than X items, but the primary key range is actually much bigger than X. Sorry if my definitions are not standard, SQL is not my native language :) This scheme is used because IDs are usually inserted all at once in a single batch, and then they have to be regularly updated over a very long time. So, there are relatively few INSERTS and a LOT of UPDATES and SELECTS. I'm wondering if the advantage in search speed obtained by this ID assignment scheme may somehow be offset by other factors like additional memory usage, less efficient inserts, etc. Can anyone offer counterarguments, or recommend a better scheme? Regards David ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance of sparse sqlite tables
I'd say generally speaking your way of storing data has no significant downsides. There's just one but: if each row in your table stores pretty significant amount of data (blobs, long text fields or just lots of different fields) you'd better not make your ItemID INTEGER PRIMARY KEY. Because SQLite stores all rows in the table in the order of rowids. So every time you update your ItemID SQLite would have to move the whole row to a new place. So for the case of big rows I'd suggest to make some other column INTEGER PRIMARY KEY and add unique constraint to your ItemID. It won't hurt your search speed (could make it faster actually) and will make updates faster. Although it will come with a larger size of the database file. Pavel On Sun, Nov 4, 2012 at 9:26 AM, Dominguez Bonini, David david.doming...@ikusi.com wrote: Hi, I have an application where a table is filled with elements whose primary key is specified at insertion, and is actually a combination of several independent IDs. Example: ItemID = (id0 32) + (id1 16) + (id2). The range covered by each ID guarantees that their sum will never exceed the 64 bit maximum size of an sqlite primary key. The advantage of this approach is that a function performing a SELECT statement can pre-compute the id that needs to be retrieved from the database. This is what I call a sparse table, meaning that the table will never have more than X items, but the primary key range is actually much bigger than X. Sorry if my definitions are not standard, SQL is not my native language :) This scheme is used because IDs are usually inserted all at once in a single batch, and then they have to be regularly updated over a very long time. So, there are relatively few INSERTS and a LOT of UPDATES and SELECTS. I'm wondering if the advantage in search speed obtained by this ID assignment scheme may somehow be offset by other factors like additional memory usage, less efficient inserts, etc. Can anyone offer counterarguments, or recommend a better scheme? Regards David ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance of sparse sqlite tables
On 4 Nov 2012, at 5:26pm, Dominguez Bonini, David david.doming...@ikusi.com wrote: I have an application where a table is filled with elements whose primary key is specified at insertion, and is actually a combination of several independent IDs. Example: ItemID = (id0 32) + (id1 16) + (id2). The range covered by each ID guarantees that their sum will never exceed the 64 bit maximum size of an sqlite primary key. The advantage of this approach is that a function performing a SELECT statement can pre-compute the id that needs to be retrieved from the database. This is what I call a sparse table, meaning that the table will never have more than X items, but the primary key range is actually much bigger than X. Sorry if my definitions are not standard, SQL is not my native language :) You have invented a 'hashing function' which is the normal term used for that kind of calculation. If this renders unique ItemIDs for you, and you can easily calculate the hash for any item, then it's probably not slowing you down much. This scheme is used because IDs are usually inserted all at once in a single batch, and then they have to be regularly updated over a very long time. So, there are relatively few INSERTS and a LOT of UPDATES and SELECTS. I'm wondering if the advantage in search speed obtained by this ID assignment scheme may somehow be offset by other factors like additional memory usage, less efficient inserts, etc. Can anyone offer counterarguments, or recommend a better scheme? There are so many considerations involved in this, including your file system, the amount of memory you have spare, your OS, and the total width of your table, that we couldn't guess the result. Your only way to find out is to do the actual test. However, I suspect that any speed gain or loss involved in a scheme like this would be trivial when measured against the overall performance of your software. SQLite is itself extremely fast even if you just store your data without any special processing. You can probably get better return on your time by spending it thinking about a better user-interface, or some other element of your application. If, on the other hand, you have tested and found that your application is just a tiny bit too slow to be usable, and that profiling is showing that most of the time is spend inside SQLite functions, that's a different matter. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users