Re: [sqlite] Speed of sorting - diff between integer and string

2011-10-12 Thread Shorty

Thanks a bunch for all the great info and the benchmark test !!!

Also thanks for explaining the term normalization - as you can probably tell 
I am a newbie, and it is hard to look stuff up without knowing the right 
terms.  I spent hours looking for a technique to split results into pages, 
and then stumbled on the term "pagination", which really cracked open the 
door to finding great information that other programmers had posted.


I should have included the size of the database and other info, so here it 
is, incase that makes a difference.

--
Using the same grocery store example, the database I am going to create is 
probably going to peak out at 5000 grocery items, but may get as large as 
7000, but doubt any larger than that.


Each of those items is going to fall into one category, with only 10 
different categories (like fruits, vegetables, canned goods, boxed 
goods,...)


Then I am creating a second database that is like brands of the items. 
Such as Dole bananas, Delmonte bananas, John Doe's organic bananas etc. 
That database will have about 15,000 items in it, each one relating to just 
one of the grocery store items.


I am running this with php 5 and the sqlite that is included with php 5, on 
a web server that is the shared type.The website is not a storefront, it 
is a freely available database I am building that anyone in the public can 
access without any type of login.  I am expecting about 5,000 views a day on 
the page that returns the search query that I am asking about.


Thanks
Shorty 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] : Speed of sorting - diff between integer and string

2011-10-12 Thread Black, Michael (IS)
To answer your question of which is fastersee the benchmark below...there's 
no difference at 100,000 records.  The index creation is a bit faster for 
integer.
But for your data you won't be doing many inserts or updates on items...more on 
prices so the grocery_type_index won't get touched very much.


#include 
main()
{
int i;
for(i=0;i<10;++i) {
char sql[4096];
char *type;
if ((i%5)==0) type="fruit";
if ((i%5)==1) type="vegetable";
if ((i%5)==2) type="meat";
if ((i%5)==3) type="sundrie";
if ((i%5)==4) type="bakery";
sprintf(sql,"insert into store 
values(%d,'%s',155);\n",i%5,type,"testing"); // for int grocery type
//sprintf(sql,"insert into store 
values('%s','%s',155);\n",type,"testing"); // for string grocery type
printf("%s",sql);
}
}


sqlite> create table store (grocery_type text, description text,price int);
sqlite> begin;
sqlite> .read data1.sql
sqlite> commit;
sqlite> .timer on
sqlite> create index grocery_type_index on store(grocery_type);
CPU Time: user 0.247962 sys 0.005999
sqlite> select count(*) from store where grocery_type='fruit';
2
CPU Time: user 0.002999 sys 0.00
sqlite> select count(*) from store where grocery_type='vegetable';
2
CPU Time: user 0.003000 sys 0.00
sqlite> select count(*) from store where grocery_type='meat';
2
CPU Time: user 0.003000 sys 0.00
sqlite> select count(*) from store where grocery_type='meat';
2
CPU Time: user 0.003000 sys 0.00
sqlite> select count(*) from store where grocery_type='sundrie';
2
CPU Time: user 0.002999 sys 0.00

Now do it with integer grocery_type
sqlite> create table store (grocery_type int, description text,price int);
sqlite> begin;
sqlite> .read data2.sql
sqlite> commit;
sqlite> .timer on
sqlite> create index grocery_type_index on store(grocery_type);
CPU Time: user 0.197970 sys 0.003000
sqlite> select count(*) from store where grocery_type='vegetable';
0
CPU Time: user 0.00 sys 0.00
sqlite> select count(*) from store where grocery_type=0;
2
CPU Time: user 0.002999 sys 0.00
sqlite> select count(*) from store where grocery_type=1;
2
CPU Time: user 0.003000 sys 0.00
sqlite> select count(*) from store where grocery_type=2;
2
CPU Time: user 0.002999 sys 0.00
sqlite> select count(*) from store where grocery_type=3;
2
CPU Time: user 0.003000 sys 0.00
sqlite> select count(*) from store where grocery_type=4;
2
CPU Time: user 0.003000 sys 0.00


Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Shorty [sho...@shortypen.com]
Sent: Tuesday, October 11, 2011 8:31 PM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Speed of sorting - diff between integer and string


I am building a database that is going to be used to produce a list of items
on my website.   The list is created depending on how the user wants the
list sorted. For example, lets say I am going to list groceries that are
available in my store.   Users will typically select the type of grocery
they want to see, like list all the fruits, or a list all vegatables.

Here is an example table:

COLUMNS:
grocery_type, description, price

ROWS:
fruit, apple, 1.23
fruit, banana, 5.35
vegetable, carrot,  1.55
vegetable, spinach, 6.85

-- HERE IS MY QUESTION: --
Is faster for the sqlite database to have the grocery_type as a string or
integer?
Or is the speed difference so small it doesn't matter?

Like instead of having "fruit" in the grocery_type column, I could have it
as a "1" type, vegetables would be "2", and then translate that just before
spitting out the table to my web page.

Thanks -
Shorty

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed of sorting - diff between integer and string

2011-10-12 Thread Petite Abeille

On Oct 12, 2011, at 3:31 AM, Shorty wrote:

> Or is the speed difference so small it doesn't matter?

If you have a trivial amount of data, then it doesn't matter much either way. 
Anything goes when you have no data :)

But in general, do normalize your data and use foreign keys. There are numerous 
benefits in doing so. 

> Like instead of having "fruit" in the grocery_type column, I could have it as 
> a "1" type, vegetables would be "2", and then translate that just before 
> spitting out the table to my web page.

Yes, as pointed out by Simon, this is called normalization [1]. Which is a good 
thing to do, in the same way as designing a proper OO model is important when 
doing, well, OOP.

As far as performance goes, thing about normalization as a form of compression: 
for each discreet piece of information (e..g. grocery type) you assign a unique 
number to it. Then you always refer to that data using that unique number. So, 
say, instead of storing the literal 'vegetable' half-a-million time, you might 
just store its foreign key, say, 1. Same applies for any indices you might have 
on that column. Such "compression" add up quite quickly. You database engine 
will be grateful  :)

Tangentially related:

Double-thinking in SQL
http://explainextended.com/2009/07/12/double-thinking-in-sql/


[1] http://en.wikipedia.org/wiki/Database_normalization

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed of sorting - diff between integer and string

2011-10-11 Thread Frank Missel
> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Shorty
> Sent: 12 October 2011 09:31
> 
> -- HERE IS MY QUESTION: --
> Is faster for the sqlite database to have the grocery_type as a string or
> integer?
> Or is the speed difference so small it doesn't matter?
> 
> Like instead of having "fruit" in the grocery_type column, I could have it
as a
> "1" type, vegetables would be "2", and then translate that just before
> spitting out the table to my web page.
> 

I think that the sorting itself would be faster with an integer rather than
string.
However, with the small number of items that you mention here the difference
would probably be immeasurably small. Also, if you were to use an integer it
would, as you mention, have to later be converted to a string which also
takes time.

Generally, I go with the ease of use (i.e. use a meaningful string rather
than an integer) unless it is a database where performance or storage space
is an issue, which could be an another advantage of using integers with
reference to a lookup table of strings, if e.g. your database contains other
large tables with many records referring to these fruits, vegetables or
other items.

It's somewhat similar to the issue with dates. You can store them as Julian
numbers which is more effective, but then you have the hassle of having to
use date function in the select statement for the result to be meaningful.
Thus, unless there is a storage or performance issue I just store the dates
as strings, e.g. '2011-10-12'. 

/Frank Missel


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed of sorting - diff between integer and string

2011-10-11 Thread Igor Tandetnik

On 10/11/2011 9:31 PM, Shorty wrote:

Here is an example table:

COLUMNS:
grocery_type, description, price

ROWS:
fruit, apple, 1.23
fruit, banana, 5.35
vegetable, carrot, 1.55
vegetable, spinach, 6.85

-- HERE IS MY QUESTION: --
Is faster for the sqlite database to have the grocery_type as a string
or integer?


I doubt there'll be a noticeable difference until your table grows to 
hundreds of thousands of rows. But of course integers are somewhat more 
efficient. It's not so much the cost of comparisons: integers have a 
more compact representation, so the database file is smaller, so there's 
less disk I/O to perform. And disk I/O is were the bottleneck is, most 
of the time.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed of sorting - diff between integer and string

2011-10-11 Thread Simon Slavin

On 12 Oct 2011, at 2:31am, Shorty wrote:

> COLUMNS:
> grocery_type, description, price
> 
> ROWS:
> fruit, apple, 1.23
> fruit, banana, 5.35
> vegetable, carrot,  1.55
> vegetable, spinach, 6.85
> 
> -- HERE IS MY QUESTION: --
> Is faster for the sqlite database to have the grocery_type as a string or 
> integer?

Good question.  Integer.

> Or is the speed difference so small it doesn't matter?

By the time you're stocking a few thousand items, you should start seeing a 
difference.

> Like instead of having "fruit" in the grocery_type column, I could have it as 
> a "1" type, vegetables would be "2", and then translate that just before 
> spitting out the table to my web page.

Right.  You can even have another table called 'typecodes':

COLUMNS:
type_number,type_name

or something like that.  It's called 'normalisation'.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Speed of sorting - diff between integer and string

2011-10-11 Thread Shorty
I am building a database that is going to be used to produce a list of items 
on my website.   The list is created depending on how the user wants the 
list sorted. For example, lets say I am going to list groceries that are 
available in my store.   Users will typically select the type of grocery 
they want to see, like list all the fruits, or a list all vegatables.


Here is an example table:

COLUMNS:
grocery_type, description, price

ROWS:
fruit, apple, 1.23
fruit, banana, 5.35
vegetable, carrot,  1.55
vegetable, spinach, 6.85

-- HERE IS MY QUESTION: --
Is faster for the sqlite database to have the grocery_type as a string or 
integer?

Or is the speed difference so small it doesn't matter?

Like instead of having "fruit" in the grocery_type column, I could have it 
as a "1" type, vegetables would be "2", and then translate that just before 
spitting out the table to my web page.


Thanks -
Shorty

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users