Re: [sqlite] Question about database design

2011-02-03 Thread Mihai Militaru
On Wed, 2 Feb 2011 18:59:48 -0600
Nicolas Williams nicolas.willi...@oracle.com wrote:

  Any idea why pg does ok on these queries without the extra index -
  Maybe they're created by default?  SQLIte doesn't create any indexes
  automatically on primary key fields or anything else, correct?
 
 No, it doesn't.  Use EXPLAIN QUERY PLAN to see what SQLite3 is doing.

Hmm SqLite does create persistent indices on UNIQUE - and consequently
PRIMARY - keys, doesn't it?

-- 
Mihai Militaru mihai.milit...@gmx.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-03 Thread Simon Slavin

On 3 Feb 2011, at 10:43am, Mihai Militaru wrote:

 Nicolas Williams nicolas.willi...@oracle.com wrote:
 
 Any idea why pg does ok on these queries without the extra index -
 Maybe they're created by default?  SQLIte doesn't create any indexes
 automatically on primary key fields or anything else, correct?
 
 No, it doesn't.  Use EXPLAIN QUERY PLAN to see what SQLite3 is doing.
 
 Hmm SqLite does create persistent indices on UNIQUE - and consequently
 PRIMARY - keys, doesn't it?

SQLite creates some indexes the programmer doesn't specifically ask for: on the 
rowid, on the primary key, and on any column declared as UNIQUE.  Of course, in 
a particular table all three of these might actually be the same column, so it 
might need just one.  But every table has at least one index.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-03 Thread Jay Kreibich
On Feb 3, 2011, at 3:38 AM, Simon Slavin slav...@bigfraud.org wrote:

 SQLite creates some indexes the programmer doesn't specifically ask for: on 
 the rowid, on the primary key, and on any column declared as UNIQUE.  Of 
 course, in a particular table all three of these might actually be the same 
 column, so it might need just one.  But every table has at least one index.

The ROWID of a table doesn't have an index, so it isn't really fair to say 
that every table has at least one index.  At least, not when using the word 
index to mean a secondary data structure in the database that must be updated 
and maintained in conjunction with the table data structure.

Both tables and indexes are stored within the SQLite file as a tree structure.  
The raw table data is stored in a tree structure that happens to be sorted by 
ROWID.  For every index, SQLite creates an addition tree structure that is 
sorted by whatever columns make up the index.  The structures are extremely 
similar, since they serve the same purpose.

Both explicit indexes (CREATE INDEX...), and implicit indexes (non-integer 
PRIMARY KEYs, UNIQUE constraints) create additional tree structures in the 
database.  However, if no PK is defined, or if the PK is defined as an INTEGER 
PRIMARY KEY, then no external index is created.  The only data structure is the 
table itself.

This is also why INTEGER PRIMARY KEYs are so desirable over other PK types.  
Not only do they reduce database size, by using the inherent structure of the 
main table data store as their index, they also tend to be  about twice as 
fast at doing explicit row lookups (vs a traditional index), as only one tree 
structure needs to be searched to retrieve any column.  That speed difference 
means the break-even point for indexed lookup vs table scan for a set of  
INTEGER PK values is closer to 15% to 20% of the rows, vs the traditional 5% to 
10% rule of thumb that applies to standard indexes.

 -j



 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Question about database design

2011-02-02 Thread Bert Nelsen
Hello!

I have over 100 columns in a table. Most of the columns will stay empty.
Just an example:

customer_lastname
customer_firstname
customer_street
customer_PhonePrivate (will almost always stay empty)
customer_PhoneCompany
customer_PhoneMobile
customer_PhoneWAP (will almost always stay empty)
customer_Phone1 (will almost always stay empty)
customer_Phone2 (will almost always stay empty)

Because I felt so stupid about these mostly empty columns taking so much
space, I tended to replace all the phone columns by a single column named
customerPhone.
I stored the values into customerPhone like that:

cpp=cpc=cpm=93829343cpw=cp1=234928734cp2=

This is kind of an xml design, but it works and it takes up less space.
I decompile this weird expression at runtime to get the separate values
again.

Can I get an opinion on it?
I have not found any good information on the number of columns and
relationship between their count and speed and diskspace used.

Thank you very much!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Duquette, William H (318K)
On 2/2/11 11:48 AM, Bert Nelsen bert.nel...@googlemail.com wrote:

 Because I felt so stupid about these mostly empty columns taking so much
 space, I tended to replace all the phone columns by a single column named
 customerPhone.
 I stored the values into customerPhone like that:
 
 cpp=cpc=cpm=93829343cpw=cp1=234928734cp2=
 
 This is kind of an xml design, but it works and it takes up less space.
 I decompile this weird expression at runtime to get the separate values
 again.
 
 Can I get an opinion on it?

It'll work so long as you don't want to write queries that discriminate
based on the kind of phone number, or on the phone number content.  You
might want to query customers by area code, for example, or get all
customers with a cell phone number.

A third approach would be to add an additional table with three columns:

* customer_id
* phone_number_type
* phone_number

Then JOIN that to your customer table.  Saves space, and lets you do the
two kinds of query I mention above.

Will


 I have not found any good information on the number of columns and
 relationship between their count and speed and diskspace used.
 
 Thank you very much!
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--
Will Duquette -- william.h.duque...@jpl.nasa.gov
Athena Development Lead -- Jet Propulsion Laboratory
It's amazing what you can do with the right tools.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Andreas Kupries
On 2/2/2011 11:48 AM, Bert Nelsen wrote:
 Hello!

 I have over 100 columns in a table. Most of the columns will stay empty.
 Just an example:
[example elided]

 This is kind of an xml design, but it works and it takes up less space.
 I decompile this weird expression at runtime to get the separate values
 again.

 Can I get an opinion on it?
 I have not found any good information on the number of columns and
 relationship between their count and speed and diskspace used.

It seems to me that you are looking for
http://en.wikipedia.org/wiki/Database_normalization

-- 
Andreas Kupries
Senior Tcl Developer
ActiveState, The Dynamic Language Experts

P: 778.786.1122
F: 778.786.1133
andre...@activestate.com
http://www.activestate.com
Get insights on Open Source and Dynamic Languages at www.activestate.com/blog
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Simon Slavin

On 2 Feb 2011, at 7:48pm, Bert Nelsen wrote:

 customer_lastname
 customer_firstname
 customer_street
 customer_PhonePrivate (will almost always stay empty)
 customer_PhoneCompany
 customer_PhoneMobile
 customer_PhoneWAP (will almost always stay empty)
 customer_Phone1 (will almost always stay empty)
 customer_Phone2 (will almost always stay empty)
 
 Because I felt so stupid about these mostly empty columns taking so much
 space, I tended to replace all the phone columns by a single column named
 customerPhone.
 I stored the values into customerPhone like that:
 
 cpp=cpc=cpm=93829343cpw=cp1=234928734cp2=

It makes more sense to create a property table for your rarely-used columns.  
Remove them from the main table and add another TABLE customer_Properties

customer_Id
property_name
property_value

So if you know two normally-empty phone numbers for a customer that would be 
two entries in the table.  NO reason why you shouldn't use the property table 
for any field which would normally stay empty, for example 
customer_AddressBlling if that is rarely used.

By the way, you did not mention a customer_id column for your main customer 
table.  It is extremely important that you have one, and relate other tables to 
that table using that, rather than the name.  Customers change their names (get 
married, get Knighted, etc.) and you don't want to have to race through the 
rest of your system changing primary keys.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Jeff Rogers
Andreas Kupries wrote:

 It seems to me that you are looking for
   http://en.wikipedia.org/wiki/Database_normalization


SQLite seems to do quite poorly performance-wise with fully-normalized 
attribute tables like this, when you want to query against multiple 
attributes.  My timing comparisons with postgres show sqlite to be as 
much as 10x-15x slower than pg.

My timing code is at http://paste.tclers.tk/2346

This is a synthetic test, but I ran across the issue in a real 
application.  I'm not sure what else I can do do optimize the queries; 
using a denormalized table is the only thing that seems to help.

-J
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Nicolas Williams
On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote:
 Andreas Kupries wrote:
 
  It seems to me that you are looking for
  http://en.wikipedia.org/wiki/Database_normalization
 
 
 SQLite seems to do quite poorly performance-wise with fully-normalized 
 attribute tables like this, when you want to query against multiple 
 attributes.  My timing comparisons with postgres show sqlite to be as 
 much as 10x-15x slower than pg.
 
 My timing code is at http://paste.tclers.tk/2346

You need an index on props(id) -- you always need an index on columns
that form a foreign key.  SQLite3 requires that if you want to cascade
deletions.

Without that index your joins will use full table scans.

Also, it's better to use FROM table1 t1 JOIN table2 t2 ON t1.col1 =
t2.col2 than to put the join conditions in the WHERE clause, as that's
much easier to read.  But I can see that you have JOINs using that
un-indexed column.

 This is a synthetic test, but I ran across the issue in a real 
 application.  I'm not sure what else I can do do optimize the queries; 
 using a denormalized table is the only thing that seems to help.

Add the missing index.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Jim Morris
I'd probably move the analyze out of the loop.

Since your joining on props.id a better index pind might be
create index pind on props (id, pnam)

The name of column id in table props would be clearer as obj_id since it 
is not the id of the property but the id of the record in the obj table.

On 2/2/2011 3:23 PM, Jeff Rogers wrote:
 Andreas Kupries wrote:

 It seems to me that you are looking for
  http://en.wikipedia.org/wiki/Database_normalization

 SQLite seems to do quite poorly performance-wise with fully-normalized
 attribute tables like this, when you want to query against multiple
 attributes.  My timing comparisons with postgres show sqlite to be as
 much as 10x-15x slower than pg.

 My timing code is at http://paste.tclers.tk/2346

 This is a synthetic test, but I ran across the issue in a real
 application.  I'm not sure what else I can do do optimize the queries;
 using a denormalized table is the only thing that seems to help.

 -J
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Nicolas Williams
On Wed, Feb 02, 2011 at 03:38:07PM -0800, Jim Morris wrote:
 I'd probably move the analyze out of the loop.
 
 Since your joining on props.id a better index pind might be
 create index pind on props (id, pnam)

Yes, you probably want two covering or partially-covering indexes:

CREATE INDEX pind ON props (id, pnam, pval);
CREATE INDEX pind ON props (pval, pname, id);

The first is for when you need to join on 'id' and want to extract
attribute/values.  The second is for when you want to search by
attribute/values and need to get rowids back.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Jeff Rogers
Nicolas Williams wrote:
 On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote:
 
 SQLite seems to do quite poorly performance-wise with fully-normalized
 attribute tables like this, when you want to query against multiple
 attributes.  My timing comparisons with postgres show sqlite to be as
 much as 10x-15x slower than pg.

 My timing code is at http://paste.tclers.tk/2346

 You need an index on props(id) -- you always need an index on columns
 that form a foreign key.  SQLite3 requires that if you want to cascade
 deletions.

Ok, that caught me by surprise, but it improved things a lot.  With that 
index sqlite is now significantly faster than pg on 3 of the tests, but 
still similar to or slower than pg on the remaining 2 - in my code, sx 
and sx3.   (I can't be sure of the exact timings because I can't run 
sqlite and the pg server on the same machine)

Any idea why pg does ok on these queries without the extra index - 
Maybe they're created by default?  SQLIte doesn't create any indexes 
automatically on primary key fields or anything else, correct?

 Without that index your joins will use full table scans.

 Also, it's better to use FROM table1 t1 JOIN table2 t2 ON t1.col1 =
 t2.col2 than to put the join conditions in the WHERE clause, as that's
 much easier to read.  But I can see that you have JOINs using that
 un-indexed column.

I wrote this test code some time ago, I think originally against a 
database that didn't support JOIN syntax.  I would use that were I 
rewriting it now.  (I also don't remember exactly what conditions I was 
trying to exercise, but I think it was the case where two attributes 
each match a large set but the intersection of those sets is small)

Thanks,
-J


 This is a synthetic test, but I ran across the issue in a real
 application.  I'm not sure what else I can do do optimize the queries;
 using a denormalized table is the only thing that seems to help.

 Add the missing index.

 Nico

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Simon Slavin

On 3 Feb 2011, at 12:45am, Jeff Rogers wrote:

 Any idea why pg does ok on these queries without the extra index - 
 Maybe they're created by default?

Both PostgreSQL and SQLite will make indexes on the fly if that's the most 
efficient way of scanning the table.  However, PostgreSQL can keep the index 
around and reuse it any number of times, at the expense of using up disk space 
and memory to store the index.  SQLite is better designed for small light 
systems and has to place minimal load on disk space and memory, so it doesn't 
keep indexes around between uses.

What it comes down to is that if you use SQLite you have to work out what 
indexes are worthwhile for yourself whereas Postgres does some of the work for 
you but takes up more memory and disk space to do so.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Nicolas Williams
On Wed, Feb 02, 2011 at 04:45:16PM -0800, Jeff Rogers wrote:
 Nicolas Williams wrote:
 On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote:
 
 SQLite seems to do quite poorly performance-wise with fully-normalized
 attribute tables like this, when you want to query against multiple
 attributes.  My timing comparisons with postgres show sqlite to be as
 much as 10x-15x slower than pg.
 
 My timing code is at http://paste.tclers.tk/2346
 
 You need an index on props(id) -- you always need an index on columns
 that form a foreign key.  SQLite3 requires that if you want to cascade
 deletions.
 
 Ok, that caught me by surprise, but it improved things a lot.  With
 that index sqlite is now significantly faster than pg on 3 of the
 tests, but still similar to or slower than pg on the remaining 2 -
 in my code, sx and sx3.   (I can't be sure of the exact timings
 because I can't run sqlite and the pg server on the same machine)
 
 Any idea why pg does ok on these queries without the extra index -
 Maybe they're created by default?  SQLIte doesn't create any indexes
 automatically on primary key fields or anything else, correct?

No, it doesn't.  Use EXPLAIN QUERY PLAN to see what SQLite3 is doing.

Also, see the followup about covering indexes.  Also, run ANALYZE.

Finally, you might denormalize somewhat by having a trigger to copy the
obj.name column to the props table and add that column to the _end_ of
the covering indexes.  Then SQLite3 might be able to do your joins using
a single index.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/02/2011 11:48 AM, Bert Nelsen wrote:
 Because I felt so stupid about these mostly empty columns taking so much
 space, I tended to replace all the phone columns by a single column named
 customerPhone.
 I stored the values into customerPhone like that:
 
 cpp=cpc=cpm=93829343cpw=cp1=234928734cp2=

You want to know something hilarious?  Your space saving mechanism uses
*more* space than the empty columns would consume.  By my calculations 150%
additional space for the example above!

Some databases have a storage scheme where if you declare a column
char(256) then 256 bytes of storage are consumed even if you store nothing
in that column.  If you are used to that then empty columns would indeed add
up to a lot.

SQLite does not work that way.  Look up its type affinity documentation
(link below) to see.  Every record has a header followed by the values for
that record.  The header entries indicates the type (and for blob/string the
length) of the corresponding value.

A record with 100 columns where every single one contains NULL or a zero
length string would occupy 100 bytes of record header and zero bytes of
record values.  Note that an old school database would be substantially
similar even if it recorded the type information for the table rather than
the record since it would still need a per record way of indicating whether
a value is null versus the declared type.  (ie it would need at least one
bit per value, so in this case would need around 16 bytes.)

Using your example above, you have 6 columns.  Storing them as 6 columns in
SQLite with the two numbers would occupy 22 bytes for both the record header
and 2 non-empty values.

Storing it as one column with that hackery consumes 54 bytes.  Heck your
empty storage is 37 bytes compared to 6 empty SQLite columns being 6 bytes.
 The only time your scheme would actually save space is if all values were
empty and you stored a zero length string in which case you'd use 1 byte for
the column instead of 6 bytes for 6 empty ones.

The important lesson here is you should always do measurements first to
confirm your beliefs as they may not actually be stupid!  And secondly as
everyone else pointed out, you aren't the first person wanting to store lots
of information in a database and there are best practises such as
normalization that help address your needs if you seek them out.  It is
never stupid to ask!  (Or read a book, or web sites etc)

SQLite type information:

  http://en.wikipedia.org/wiki/Database_normalization

SQLite file format - record:

  http://www.sqlite.org/fileformat.html#record_format

Schema normalization:

  http://en.wikipedia.org/wiki/Database_normalization

SQLite books:

  http://www.sqlite.org/books.html

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEUEARECAAYFAk1KCYcACgkQmOOfHg372QRC1gCfbaE7tCvNKFKNkq9N14tSTOzh
S30Al3Tosxpwa4qPvzqXFxeNptxfwPs=
=Jrag
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users