Re: [sqlite] Speed of sorting - diff between integer and string
Thanks a bunch for all the great info and the benchmark test !!! Also thanks for explaining the term normalization - as you can probably tell I am a newbie, and it is hard to look stuff up without knowing the right terms. I spent hours looking for a technique to split results into pages, and then stumbled on the term "pagination", which really cracked open the door to finding great information that other programmers had posted. I should have included the size of the database and other info, so here it is, incase that makes a difference. -- Using the same grocery store example, the database I am going to create is probably going to peak out at 5000 grocery items, but may get as large as 7000, but doubt any larger than that. Each of those items is going to fall into one category, with only 10 different categories (like fruits, vegetables, canned goods, boxed goods,...) Then I am creating a second database that is like brands of the items. Such as Dole bananas, Delmonte bananas, John Doe's organic bananas etc. That database will have about 15,000 items in it, each one relating to just one of the grocery store items. I am running this with php 5 and the sqlite that is included with php 5, on a web server that is the shared type.The website is not a storefront, it is a freely available database I am building that anyone in the public can access without any type of login. I am expecting about 5,000 views a day on the page that returns the search query that I am asking about. Thanks Shorty ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] : Speed of sorting - diff between integer and string
To answer your question of which is fastersee the benchmark below...there's no difference at 100,000 records. The index creation is a bit faster for integer. But for your data you won't be doing many inserts or updates on items...more on prices so the grocery_type_index won't get touched very much. #include main() { int i; for(i=0;i<10;++i) { char sql[4096]; char *type; if ((i%5)==0) type="fruit"; if ((i%5)==1) type="vegetable"; if ((i%5)==2) type="meat"; if ((i%5)==3) type="sundrie"; if ((i%5)==4) type="bakery"; sprintf(sql,"insert into store values(%d,'%s',155);\n",i%5,type,"testing"); // for int grocery type //sprintf(sql,"insert into store values('%s','%s',155);\n",type,"testing"); // for string grocery type printf("%s",sql); } } sqlite> create table store (grocery_type text, description text,price int); sqlite> begin; sqlite> .read data1.sql sqlite> commit; sqlite> .timer on sqlite> create index grocery_type_index on store(grocery_type); CPU Time: user 0.247962 sys 0.005999 sqlite> select count(*) from store where grocery_type='fruit'; 2 CPU Time: user 0.002999 sys 0.00 sqlite> select count(*) from store where grocery_type='vegetable'; 2 CPU Time: user 0.003000 sys 0.00 sqlite> select count(*) from store where grocery_type='meat'; 2 CPU Time: user 0.003000 sys 0.00 sqlite> select count(*) from store where grocery_type='meat'; 2 CPU Time: user 0.003000 sys 0.00 sqlite> select count(*) from store where grocery_type='sundrie'; 2 CPU Time: user 0.002999 sys 0.00 Now do it with integer grocery_type sqlite> create table store (grocery_type int, description text,price int); sqlite> begin; sqlite> .read data2.sql sqlite> commit; sqlite> .timer on sqlite> create index grocery_type_index on store(grocery_type); CPU Time: user 0.197970 sys 0.003000 sqlite> select count(*) from store where grocery_type='vegetable'; 0 CPU Time: user 0.00 sys 0.00 sqlite> select count(*) from store where grocery_type=0; 2 CPU Time: user 0.002999 sys 0.00 sqlite> select count(*) from store where grocery_type=1; 2 CPU Time: user 0.003000 sys 0.00 sqlite> select count(*) from store where grocery_type=2; 2 CPU Time: user 0.002999 sys 0.00 sqlite> select count(*) from store where grocery_type=3; 2 CPU Time: user 0.003000 sys 0.00 sqlite> select count(*) from store where grocery_type=4; 2 CPU Time: user 0.003000 sys 0.00 Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Shorty [sho...@shortypen.com] Sent: Tuesday, October 11, 2011 8:31 PM To: sqlite-users@sqlite.org Subject: EXT :[sqlite] Speed of sorting - diff between integer and string I am building a database that is going to be used to produce a list of items on my website. The list is created depending on how the user wants the list sorted. For example, lets say I am going to list groceries that are available in my store. Users will typically select the type of grocery they want to see, like list all the fruits, or a list all vegatables. Here is an example table: COLUMNS: grocery_type, description, price ROWS: fruit, apple, 1.23 fruit, banana, 5.35 vegetable, carrot, 1.55 vegetable, spinach, 6.85 -- HERE IS MY QUESTION: -- Is faster for the sqlite database to have the grocery_type as a string or integer? Or is the speed difference so small it doesn't matter? Like instead of having "fruit" in the grocery_type column, I could have it as a "1" type, vegetables would be "2", and then translate that just before spitting out the table to my web page. Thanks - Shorty ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speed of sorting - diff between integer and string
On Oct 12, 2011, at 3:31 AM, Shorty wrote: > Or is the speed difference so small it doesn't matter? If you have a trivial amount of data, then it doesn't matter much either way. Anything goes when you have no data :) But in general, do normalize your data and use foreign keys. There are numerous benefits in doing so. > Like instead of having "fruit" in the grocery_type column, I could have it as > a "1" type, vegetables would be "2", and then translate that just before > spitting out the table to my web page. Yes, as pointed out by Simon, this is called normalization [1]. Which is a good thing to do, in the same way as designing a proper OO model is important when doing, well, OOP. As far as performance goes, thing about normalization as a form of compression: for each discreet piece of information (e..g. grocery type) you assign a unique number to it. Then you always refer to that data using that unique number. So, say, instead of storing the literal 'vegetable' half-a-million time, you might just store its foreign key, say, 1. Same applies for any indices you might have on that column. Such "compression" add up quite quickly. You database engine will be grateful :) Tangentially related: Double-thinking in SQL http://explainextended.com/2009/07/12/double-thinking-in-sql/ [1] http://en.wikipedia.org/wiki/Database_normalization ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speed of sorting - diff between integer and string
> -Original Message- > From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- > boun...@sqlite.org] On Behalf Of Shorty > Sent: 12 October 2011 09:31 > > -- HERE IS MY QUESTION: -- > Is faster for the sqlite database to have the grocery_type as a string or > integer? > Or is the speed difference so small it doesn't matter? > > Like instead of having "fruit" in the grocery_type column, I could have it as a > "1" type, vegetables would be "2", and then translate that just before > spitting out the table to my web page. > I think that the sorting itself would be faster with an integer rather than string. However, with the small number of items that you mention here the difference would probably be immeasurably small. Also, if you were to use an integer it would, as you mention, have to later be converted to a string which also takes time. Generally, I go with the ease of use (i.e. use a meaningful string rather than an integer) unless it is a database where performance or storage space is an issue, which could be an another advantage of using integers with reference to a lookup table of strings, if e.g. your database contains other large tables with many records referring to these fruits, vegetables or other items. It's somewhat similar to the issue with dates. You can store them as Julian numbers which is more effective, but then you have the hassle of having to use date function in the select statement for the result to be meaningful. Thus, unless there is a storage or performance issue I just store the dates as strings, e.g. '2011-10-12'. /Frank Missel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speed of sorting - diff between integer and string
On 10/11/2011 9:31 PM, Shorty wrote: Here is an example table: COLUMNS: grocery_type, description, price ROWS: fruit, apple, 1.23 fruit, banana, 5.35 vegetable, carrot, 1.55 vegetable, spinach, 6.85 -- HERE IS MY QUESTION: -- Is faster for the sqlite database to have the grocery_type as a string or integer? I doubt there'll be a noticeable difference until your table grows to hundreds of thousands of rows. But of course integers are somewhat more efficient. It's not so much the cost of comparisons: integers have a more compact representation, so the database file is smaller, so there's less disk I/O to perform. And disk I/O is were the bottleneck is, most of the time. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speed of sorting - diff between integer and string
On 12 Oct 2011, at 2:31am, Shorty wrote: > COLUMNS: > grocery_type, description, price > > ROWS: > fruit, apple, 1.23 > fruit, banana, 5.35 > vegetable, carrot, 1.55 > vegetable, spinach, 6.85 > > -- HERE IS MY QUESTION: -- > Is faster for the sqlite database to have the grocery_type as a string or > integer? Good question. Integer. > Or is the speed difference so small it doesn't matter? By the time you're stocking a few thousand items, you should start seeing a difference. > Like instead of having "fruit" in the grocery_type column, I could have it as > a "1" type, vegetables would be "2", and then translate that just before > spitting out the table to my web page. Right. You can even have another table called 'typecodes': COLUMNS: type_number,type_name or something like that. It's called 'normalisation'. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users