Re: [sqlite] Question about database design
On Feb 3, 2011, at 3:38 AM, Simon Slavin wrote: > SQLite creates some indexes the programmer doesn't specifically ask for: on > the rowid, on the primary key, and on any column declared as UNIQUE. Of > course, in a particular table all three of these might actually be the same > column, so it might need just one. But every table has at least one index. The ROWID of a table doesn't have an "index," so it isn't really fair to say that every table has at least one index. At least, not when using the word "index" to mean a secondary data structure in the database that must be updated and maintained in conjunction with the table data structure. Both tables and indexes are stored within the SQLite file as a tree structure. The raw table data is stored in a tree structure that happens to be sorted by ROWID. For every index, SQLite creates an addition tree structure that is sorted by whatever columns make up the index. The structures are extremely similar, since they serve the same purpose. Both explicit indexes (CREATE INDEX...), and implicit indexes (non-integer PRIMARY KEYs, UNIQUE constraints) create additional tree structures in the database. However, if no PK is defined, or if the PK is defined as an INTEGER PRIMARY KEY, then no external index is created. The only data structure is the table itself. This is also why INTEGER PRIMARY KEYs are so desirable over other PK types. Not only do they reduce database size, by using the inherent structure of the main table data store as their "index," they also tend to be about twice as fast at doing explicit row lookups (vs a traditional index), as only one tree structure needs to be searched to retrieve any column. That speed difference means the break-even point for indexed lookup vs table scan for a set of INTEGER PK values is closer to 15% to 20% of the rows, vs the traditional 5% to 10% rule of thumb that applies to standard indexes. -j > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On 3 Feb 2011, at 10:43am, Mihai Militaru wrote: > Nicolas Williams wrote: > >>> Any idea why pg does ok on these queries without the extra index - >>> Maybe they're created by default? SQLIte doesn't create any indexes >>> automatically on primary key fields or anything else, correct? >> >> No, it doesn't. Use EXPLAIN QUERY PLAN to see what SQLite3 is doing. > > Hmm SqLite does create persistent indices on UNIQUE - and consequently > PRIMARY - keys, doesn't it? SQLite creates some indexes the programmer doesn't specifically ask for: on the rowid, on the primary key, and on any column declared as UNIQUE. Of course, in a particular table all three of these might actually be the same column, so it might need just one. But every table has at least one index. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On Wed, 2 Feb 2011 18:59:48 -0600 Nicolas Williams wrote: > > Any idea why pg does ok on these queries without the extra index - > > Maybe they're created by default? SQLIte doesn't create any indexes > > automatically on primary key fields or anything else, correct? > > No, it doesn't. Use EXPLAIN QUERY PLAN to see what SQLite3 is doing. Hmm SqLite does create persistent indices on UNIQUE - and consequently PRIMARY - keys, doesn't it? -- Mihai Militaru ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 02/02/2011 11:48 AM, Bert Nelsen wrote: > Because I felt so stupid about these mostly empty columns taking so much > space, I tended to replace all the "phone" columns by a single column named > "customerPhone". > I stored the values into customerPhone like that: > > You want to know something hilarious? Your space saving mechanism uses *more* space than the empty columns would consume. By my calculations 150% additional space for the example above! Some databases have a storage scheme where if you declare a column "char(256)" then 256 bytes of storage are consumed even if you store nothing in that column. If you are used to that then empty columns would indeed add up to a lot. SQLite does not work that way. Look up its type affinity documentation (link below) to see. Every record has a header followed by the values for that record. The header entries indicates the type (and for blob/string the length) of the corresponding value. A record with 100 columns where every single one contains NULL or a zero length string would occupy 100 bytes of record header and zero bytes of record values. Note that an old school database would be substantially similar even if it recorded the type information for the table rather than the record since it would still need a per record way of indicating whether a value is null versus the declared type. (ie it would need at least one bit per value, so in this case would need around 16 bytes.) Using your example above, you have 6 columns. Storing them as 6 columns in SQLite with the two numbers would occupy 22 bytes for both the record header and 2 non-empty values. Storing it as one column with that hackery consumes 54 bytes. Heck your empty storage is 37 bytes compared to 6 empty SQLite columns being 6 bytes. The only time your scheme would actually save space is if all values were empty and you stored a zero length string in which case you'd use 1 byte for the column instead of 6 bytes for 6 empty ones. The important lesson here is you should always do measurements first to confirm your beliefs as they may not actually be stupid! And secondly as everyone else pointed out, you aren't the first person wanting to store lots of information in a database and there are best practises such as normalization that help address your needs if you seek them out. It is never stupid to ask! (Or read a book, or web sites etc) SQLite type information: http://en.wikipedia.org/wiki/Database_normalization SQLite file format - record: http://www.sqlite.org/fileformat.html#record_format Schema normalization: http://en.wikipedia.org/wiki/Database_normalization SQLite books: http://www.sqlite.org/books.html Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEUEARECAAYFAk1KCYcACgkQmOOfHg372QRC1gCfbaE7tCvNKFKNkq9N14tSTOzh S30Al3Tosxpwa4qPvzqXFxeNptxfwPs= =Jrag -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On Wed, Feb 02, 2011 at 04:45:16PM -0800, Jeff Rogers wrote: > Nicolas Williams wrote: > >On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote: > > > >>SQLite seems to do quite poorly performance-wise with fully-normalized > >>attribute tables like this, when you want to query against multiple > >>attributes. My timing comparisons with postgres show sqlite to be as > >>much as 10x-15x slower than pg. > >> > >>My timing code is at http://paste.tclers.tk/2346 > > > >You need an index on props(id) -- you always need an index on columns > >that form a foreign key. SQLite3 requires that if you want to cascade > >deletions. > > Ok, that caught me by surprise, but it improved things a lot. With > that index sqlite is now significantly faster than pg on 3 of the > tests, but still similar to or slower than pg on the remaining 2 - > in my code, "sx" and "sx3". (I can't be sure of the exact timings > because I can't run sqlite and the pg server on the same machine) > > Any idea why pg does ok on these queries without the extra index - > Maybe they're created by default? SQLIte doesn't create any indexes > automatically on primary key fields or anything else, correct? No, it doesn't. Use EXPLAIN QUERY PLAN to see what SQLite3 is doing. Also, see the followup about covering indexes. Also, run ANALYZE. Finally, you might denormalize somewhat by having a trigger to copy the obj.name column to the props table and add that column to the _end_ of the covering indexes. Then SQLite3 might be able to do your joins using a single index. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On 3 Feb 2011, at 12:45am, Jeff Rogers wrote: > Any idea why pg does ok on these queries without the extra index - > Maybe they're created by default? Both PostgreSQL and SQLite will make indexes on the fly if that's the most efficient way of scanning the table. However, PostgreSQL can keep the index around and reuse it any number of times, at the expense of using up disk space and memory to store the index. SQLite is better designed for small light systems and has to place minimal load on disk space and memory, so it doesn't keep indexes around between uses. What it comes down to is that if you use SQLite you have to work out what indexes are worthwhile for yourself whereas Postgres does some of the work for you but takes up more memory and disk space to do so. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
Nicolas Williams wrote: > On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote: > >> SQLite seems to do quite poorly performance-wise with fully-normalized >> attribute tables like this, when you want to query against multiple >> attributes. My timing comparisons with postgres show sqlite to be as >> much as 10x-15x slower than pg. >> >> My timing code is at http://paste.tclers.tk/2346 > > You need an index on props(id) -- you always need an index on columns > that form a foreign key. SQLite3 requires that if you want to cascade > deletions. Ok, that caught me by surprise, but it improved things a lot. With that index sqlite is now significantly faster than pg on 3 of the tests, but still similar to or slower than pg on the remaining 2 - in my code, "sx" and "sx3". (I can't be sure of the exact timings because I can't run sqlite and the pg server on the same machine) Any idea why pg does ok on these queries without the extra index - Maybe they're created by default? SQLIte doesn't create any indexes automatically on primary key fields or anything else, correct? > Without that index your joins will use full table scans. > > Also, it's better to use FROM table1 t1 JOIN table2 t2 ON t1.col1 = > t2.col2 than to put the join conditions in the WHERE clause, as that's > much easier to read. But I can see that you have JOINs using that > un-indexed column. I wrote this test code some time ago, I think originally against a database that didn't support JOIN syntax. I would use that were I rewriting it now. (I also don't remember exactly what conditions I was trying to exercise, but I think it was the case where two attributes each match a large set but the intersection of those sets is small) Thanks, -J > >> This is a synthetic test, but I ran across the issue in a real >> application. I'm not sure what else I can do do optimize the queries; >> using a denormalized table is the only thing that seems to help. > > Add the missing index. > > Nico ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On Wed, Feb 02, 2011 at 03:38:07PM -0800, Jim Morris wrote: > I'd probably move the analyze out of the loop. > > Since your joining on props.id a better index pind might be > create index pind on props (id, pnam) Yes, you probably want two covering or partially-covering indexes: CREATE INDEX pind ON props (id, pnam, pval); CREATE INDEX pind ON props (pval, pname, id); The first is for when you need to join on 'id' and want to extract attribute/values. The second is for when you want to search by attribute/values and need to get rowids back. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
I'd probably move the analyze out of the loop. Since your joining on props.id a better index pind might be create index pind on props (id, pnam) The name of column id in table props would be clearer as obj_id since it is not the id of the property but the id of the record in the obj table. On 2/2/2011 3:23 PM, Jeff Rogers wrote: > Andreas Kupries wrote: > >> It seems to me that you are looking for >> http://en.wikipedia.org/wiki/Database_normalization >> > SQLite seems to do quite poorly performance-wise with fully-normalized > attribute tables like this, when you want to query against multiple > attributes. My timing comparisons with postgres show sqlite to be as > much as 10x-15x slower than pg. > > My timing code is at http://paste.tclers.tk/2346 > > This is a synthetic test, but I ran across the issue in a real > application. I'm not sure what else I can do do optimize the queries; > using a denormalized table is the only thing that seems to help. > > -J > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote: > Andreas Kupries wrote: > > > It seems to me that you are looking for > > http://en.wikipedia.org/wiki/Database_normalization > > > > SQLite seems to do quite poorly performance-wise with fully-normalized > attribute tables like this, when you want to query against multiple > attributes. My timing comparisons with postgres show sqlite to be as > much as 10x-15x slower than pg. > > My timing code is at http://paste.tclers.tk/2346 You need an index on props(id) -- you always need an index on columns that form a foreign key. SQLite3 requires that if you want to cascade deletions. Without that index your joins will use full table scans. Also, it's better to use FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col2 than to put the join conditions in the WHERE clause, as that's much easier to read. But I can see that you have JOINs using that un-indexed column. > This is a synthetic test, but I ran across the issue in a real > application. I'm not sure what else I can do do optimize the queries; > using a denormalized table is the only thing that seems to help. Add the missing index. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
Andreas Kupries wrote: > It seems to me that you are looking for > http://en.wikipedia.org/wiki/Database_normalization > SQLite seems to do quite poorly performance-wise with fully-normalized attribute tables like this, when you want to query against multiple attributes. My timing comparisons with postgres show sqlite to be as much as 10x-15x slower than pg. My timing code is at http://paste.tclers.tk/2346 This is a synthetic test, but I ran across the issue in a real application. I'm not sure what else I can do do optimize the queries; using a denormalized table is the only thing that seems to help. -J ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On 2 Feb 2011, at 7:48pm, Bert Nelsen wrote: > customer_lastname > customer_firstname > customer_street > customer_PhonePrivate (will almost always stay empty) > customer_PhoneCompany > customer_PhoneMobile > customer_PhoneWAP (will almost always stay empty) > customer_Phone1 (will almost always stay empty) > customer_Phone2 (will almost always stay empty) > > Because I felt so stupid about these mostly empty columns taking so much > space, I tended to replace all the "phone" columns by a single column named > "customerPhone". > I stored the values into customerPhone like that: > > It makes more sense to create a property table for your rarely-used columns. Remove them from the main table and add another TABLE customer_Properties customer_Id property_name property_value So if you know two normally-empty phone numbers for a customer that would be two entries in the table. NO reason why you shouldn't use the property table for any field which would normally stay empty, for example customer_AddressBlling if that is rarely used. By the way, you did not mention a customer_id column for your main customer table. It is extremely important that you have one, and relate other tables to that table using that, rather than the name. Customers change their names (get married, get Knighted, etc.) and you don't want to have to race through the rest of your system changing primary keys. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On 2/2/2011 11:48 AM, Bert Nelsen wrote: > Hello! > > I have over 100 columns in a table. Most of the columns will stay empty. > Just an example: [example elided] > This is kind of an xml design, but it works and it takes up less space. > I "decompile" this weird expression at runtime to get the separate values > again. > > Can I get an opinion on it? > I have not found any good information on the number of columns and > relationship between their count and speed and diskspace used. It seems to me that you are looking for http://en.wikipedia.org/wiki/Database_normalization -- Andreas Kupries Senior Tcl Developer ActiveState, The Dynamic Language Experts P: 778.786.1122 F: 778.786.1133 andre...@activestate.com http://www.activestate.com Get insights on Open Source and Dynamic Languages at www.activestate.com/blog ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Question about database design
On 2/2/11 11:48 AM, "Bert Nelsen" wrote: > Because I felt so stupid about these mostly empty columns taking so much > space, I tended to replace all the "phone" columns by a single column named > "customerPhone". > I stored the values into customerPhone like that: > > > > This is kind of an xml design, but it works and it takes up less space. > I "decompile" this weird expression at runtime to get the separate values > again. > > Can I get an opinion on it? It'll work so long as you don't want to write queries that discriminate based on the kind of phone number, or on the phone number content. You might want to query customers by area code, for example, or get all customers with a cell phone number. A third approach would be to add an additional table with three columns: * customer_id * phone_number_type * phone_number Then JOIN that to your customer table. Saves space, and lets you do the two kinds of query I mention above. Will > I have not found any good information on the number of columns and > relationship between their count and speed and diskspace used. > > Thank you very much! > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Will Duquette -- william.h.duque...@jpl.nasa.gov Athena Development Lead -- Jet Propulsion Laboratory "It's amazing what you can do with the right tools." ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Question about database design
Hello! I have over 100 columns in a table. Most of the columns will stay empty. Just an example: customer_lastname customer_firstname customer_street customer_PhonePrivate (will almost always stay empty) customer_PhoneCompany customer_PhoneMobile customer_PhoneWAP (will almost always stay empty) customer_Phone1 (will almost always stay empty) customer_Phone2 (will almost always stay empty) Because I felt so stupid about these mostly empty columns taking so much space, I tended to replace all the "phone" columns by a single column named "customerPhone". I stored the values into customerPhone like that: This is kind of an xml design, but it works and it takes up less space. I "decompile" this weird expression at runtime to get the separate values again. Can I get an opinion on it? I have not found any good information on the number of columns and relationship between their count and speed and diskspace used. Thank you very much! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users