Re: [sqlite] Optimize Table Access Efficiency
When I did the index test I did not have the second table in the DB. Just the complete large table (id, topics, definition). I also didn't have order by random() limit 1 In the select statment. The data is alread in alphabetical order, but I'll try that anyway. I am going to be away from the office over the next week or so but I will look into this some more as I get time. Thank you both again for your time and patience. Tim On 3/21/2012 2:53 AM, Max Vlasov wrote: On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton wrote: So it seems the index is no help; Are you sure you did as Simon explained? Becausem my tests shows the trick works... (sqlite 3.7.10) Create table [TestTable] ([Id] integer primary key, [topics] Text, [Definition] Text); CREATE TABLE [SourceTable] ([Id] integer primary key, [Title] Text); Insert into SourceTable (Title) VALUES ('bar') Insert into SourceTable (Title) VALUES ('bar') Insert into SourceTable (Title) VALUES ('foo bar') Insert into SourceTable (Title) VALUES ('bar foo') insert into TestTable (topics, definition) values ((Select Title from SourceTable order by random() limit 1), 'abcdefghij ...') (about 500 symbols) 100,000 times so we have about 100,000 records select topics, id from TestTable :Sqlite read 103 Mb (I have a special "hack" into vfs to monitor how much sqlite read during the query) :Time: 4,9 sec ...After adding the index CREATE INDEX [idx_Trick] ON [TestTable] ([Topics], [Id]) select topics, id from TestTable order by topics,id :Sqlite read 2 Mb :Time: 1,3 sec The timing was unaffected by sqlite and system cache since the db was reopened with a special trick on Windows forcing it to clear cache for this file Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
At 07:53 21/03/2012, Max Vlasov wrote: On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton wrote: > So it seems the index is no help; > Are you sure you did as Simon explained? Becausem my tests shows the trick works... (sqlite 3.7.10) The trick works, but i think he didn't timed what he should timed. He did: cur.execute('SELECT topics FROM dict') start = time.time() for item in cur: tmp.append(item[0]) print time.time() - start, 'get WL' So he timed getting the data from the python cursor and moving to a python array. I think he should time: start = time.time() cur.execute('SELECT topics FROM dict') for item in cur: tmp.append(item[0]) print time.time() - start, 'get WL' Including the SELECT because that's where the trick work. The timing was unaffected by sqlite and system cache since the db was reopened with a special trick on Windows forcing it to clear cache for this file Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton wrote: > So it seems the index is no help; > Are you sure you did as Simon explained? Becausem my tests shows the trick works... (sqlite 3.7.10) Create table [TestTable] ([Id] integer primary key, [topics] Text, [Definition] Text); CREATE TABLE [SourceTable] ([Id] integer primary key, [Title] Text); Insert into SourceTable (Title) VALUES ('bar') Insert into SourceTable (Title) VALUES ('bar') Insert into SourceTable (Title) VALUES ('foo bar') Insert into SourceTable (Title) VALUES ('bar foo') insert into TestTable (topics, definition) values ((Select Title from SourceTable order by random() limit 1), 'abcdefghij ...') (about 500 symbols) 100,000 times so we have about 100,000 records select topics, id from TestTable :Sqlite read 103 Mb (I have a special "hack" into vfs to monitor how much sqlite read during the query) :Time: 4,9 sec ...After adding the index CREATE INDEX [idx_Trick] ON [TestTable] ([Topics], [Id]) select topics, id from TestTable order by topics,id :Sqlite read 2 Mb :Time: 1,3 sec The timing was unaffected by sqlite and system cache since the db was reopened with a special trick on Windows forcing it to clear cache for this file Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
On 20 Mar 2012, at 9:33pm, Tim Morton wrote: > So it seems the index is no help; a second topics table is a significant > help; and a separate file with topics table a negligible help; and creation > order is no help. Interesting and a little unexpected. Good to see a user doing experimentation. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
I did some quick tests: I took a 100,000 entry dictionary and made a single table "create table dict (id INTEGER PRIMARY KEY, topics, def)" I ran a "for loop" on the cursor object to extract all the topic column data and put in in a Python list . cur.execute('SELECT topics FROM dict') start = time.time() for item in cur: tmp.append(item[0]) print time.time() - start, 'get WL' The time was around .035 sec. With an index created, 'CREATE INDEX dctidx ON dict (id, topics)' There was no difference in time. The file was around 2MB larger. I made a second table with just the topic data, "create table topics (topic)" with this select, 'SELECT topic FROM topics' The time was roughly half at around .017 sec. I made a separate file with just the topic data table as above and the access time was only slightly less .014-.015 sec. Also, I populated the topics table completely and then populated the def table, with no difference in speed if they were populated in the same loop. So it seems the index is no help; a second topics table is a significant help; and a separate file with topics table a negligible help; and creation order is no help. They were all good ideas to try, though. Tim On 3/20/2012 2:36 PM, Simon Slavin wrote: On 20 Mar 2012, at 6:11pm, Tim Morton wrote: I may try this, but the topics are usually just one word each so I doubt an index could reduce it much. Max's trick is something that takes advantage of how SQLite works entirely. If you do a SELECT like SELECT id,topics FROM dictionary ORDER BY topics and there's an index like CREATE INDEX dti ON dictionary (topics,id) then SQLite will use that index to find the right order to retrieve the records in. But having figured out what rows it needs it realises that it has all the information -- both the id and the topic -- right there in the index. So it doesn't bother then looking up the TABLE data to fetch the other columns, it just returns all the information you asked for right from the index it's already looking at. It's fast and, because the index contains only the data you want, it's equivalent to making up a separate table which has just the id and topics columns. I forgot about that trick when I replied but Max didn't. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
Ah, very good. Thanks to you both. I will definitly try this. Sounds like it may help. Tim On 3/20/2012 2:36 PM, Simon Slavin wrote: On 20 Mar 2012, at 6:11pm, Tim Morton wrote: I may try this, but the topics are usually just one word each so I doubt an index could reduce it much. Max's trick is something that takes advantage of how SQLite works entirely. If you do a SELECT like SELECT id,topics FROM dictionary ORDER BY topics and there's an index like CREATE INDEX dti ON dictionary (topics,id) then SQLite will use that index to find the right order to retrieve the records in. But having figured out what rows it needs it realises that it has all the information -- both the id and the topic -- right there in the index. So it doesn't bother then looking up the TABLE data to fetch the other columns, it just returns all the information you asked for right from the index it's already looking at. It's fast and, because the index contains only the data you want, it's equivalent to making up a separate table which has just the id and topics columns. I forgot about that trick when I replied but Max didn't. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
On 20 Mar 2012, at 6:11pm, Tim Morton wrote: > I may try this, but the topics are usually just one word each so I doubt an > index could reduce it much. Max's trick is something that takes advantage of how SQLite works entirely. If you do a SELECT like SELECT id,topics FROM dictionary ORDER BY topics and there's an index like CREATE INDEX dti ON dictionary (topics,id) then SQLite will use that index to find the right order to retrieve the records in. But having figured out what rows it needs it realises that it has all the information -- both the id and the topic -- right there in the index. So it doesn't bother then looking up the TABLE data to fetch the other columns, it just returns all the information you asked for right from the index it's already looking at. It's fast and, because the index contains only the data you want, it's equivalent to making up a separate table which has just the id and topics columns. I forgot about that trick when I replied but Max didn't. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
Thanks, I may try this, but the topics are usually just one word each so I doubt an index could reduce it much. Tim On 3/20/2012 1:52 PM, Max Vlasov wrote: Hi, Tim On Tue, Mar 20, 2012 at 6:21 PM, Tim Morton wrote: Is there a way to read only the part of the file that has the queried table/column thus saving time and hard dive grinding? There is also a trick. You can create an index on 'topics' column and perform your first reading forcing this column order and not touching definition at the same time. Since any index is just another b-tree but most of the time smaller you will end up reading a smaller part of file. Although there's a price to pay, your db will become larger. Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
Hi, Tim On Tue, Mar 20, 2012 at 6:21 PM, Tim Morton wrote: > > Is there a way to read only the part of the file that has the queried > table/column thus saving time and hard dive grinding? > > There is also a trick. You can create an index on 'topics' column and perform your first reading forcing this column order and not touching definition at the same time. Since any index is just another b-tree but most of the time smaller you will end up reading a smaller part of file. Although there's a price to pay, your db will become larger. Max ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
Thanks, Simon, for your detailed answer. I will try the suggestions you and Michael supplied and see if there is any significant inprovement. Tim On 3/20/2012 11:13 AM, Simon Slavin wrote: On 20 Mar 2012, at 2:21pm, Tim Morton wrote: My app reads dozens of SQLite databases ranging in size from 1MB to 100MB with a simple table structure like, "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)" On startup the app reads all the databases to extract the "topics" column data. With the DB configured as above extracting the topic data is a lengthy process. It seems the whole DB file is being read to just access the relatively small "topics" column. If I add another table with just the topics data, "create table dictionary(id INTEGER PRIMARY KEY, topics)" access is quicker. However, if I make another DB file with just the topics column, access is much quicker. I don't really want to use two files to access a single DB. Also, the DB is only read from; not written to by the app. Thank you for your detailed description which helps a lot in considering your situation. My questions, Does table creation order affect the speed of access? These things depend mostly on your hardware. Windows deals much better with defragmented files than fragmented files. A lot of its speed comes from assumptions like if you're looking at sector s, you're shortly going to want sector s+1. To speed up your situation, after your database file has been written, execute a VACUUM command in SQL then (if you're running Windows) defragment your hard disk. If the above does not give you fast enough access, then it may be possible to speed things up still further but writing one TABLE then the other. You can do this when you originally make the file, or you can use the sqlite shell tool to .dump the file to SQL commands and .read the SQL commands back in to form another database file. After doing this, once again, if you're running Windows do another defragment. I don't expect that to make much difference but it might be worth trying. You should also look at the PRAGMA command Michael recommended. Is there a way to read only the part of the file that has the queried table/column thus saving time and hard dive grinding? Is it possible to read a small table/column from a huge file nearly as quick as the small table alone from a small file? And if so, how? Your TABLE definition is as efficient as it can be. Your most efficient reading code in each situation is SELECT id,topics FROM dictionary Any other optimization can't be done inside the SQL command. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
On 20 Mar 2012, at 2:21pm, Tim Morton wrote: > My app reads dozens of SQLite databases ranging in size from 1MB to 100MB > with a simple table structure like, > > "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)" > > On startup the app reads all the databases to extract the "topics" column > data. With the DB configured as above extracting the topic data is a lengthy > process. It seems the whole DB file is being read to just access the > relatively small "topics" column. > > If I add another table with just the topics data, > > "create table dictionary(id INTEGER PRIMARY KEY, topics)" > > access is quicker. However, if I make another DB file with just the topics > column, access is much quicker. I don't really want to use two files to > access a single DB. > > Also, the DB is only read from; not written to by the app. Thank you for your detailed description which helps a lot in considering your situation. > My questions, > > Does table creation order affect the speed of access? These things depend mostly on your hardware. Windows deals much better with defragmented files than fragmented files. A lot of its speed comes from assumptions like if you're looking at sector s, you're shortly going to want sector s+1. To speed up your situation, after your database file has been written, execute a VACUUM command in SQL then (if you're running Windows) defragment your hard disk. If the above does not give you fast enough access, then it may be possible to speed things up still further but writing one TABLE then the other. You can do this when you originally make the file, or you can use the sqlite shell tool to .dump the file to SQL commands and .read the SQL commands back in to form another database file. After doing this, once again, if you're running Windows do another defragment. I don't expect that to make much difference but it might be worth trying. You should also look at the PRAGMA command Michael recommended. > Is there a way to read only the part of the file that has the queried > table/column thus saving time and hard dive grinding? > > Is it possible to read a small table/column from a huge file nearly as quick > as the small table alone from a small file? And if so, how? Your TABLE definition is as efficient as it can be. Your most efficient reading code in each situation is SELECT id,topics FROM dictionary Any other optimization can't be done inside the SQL command. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
You may also want to try pragma cache_size and bump it up a LOT just to see what happens to your timings. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Tim Morton [t...@preservedwords.com] Sent: Tuesday, March 20, 2012 9:44 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Optimize Table Access Efficiency Thanks for the reply, "Lengthy" varies according to the system, but from a fresh, initial start on this one it can be 15 to 20 seconds. The system cache speeds up subsequent starts to around 3-4 seconds. As mentioned in the post, I have two tables, one for topics and one for definitions. I read the topics table and get the index and then use it to get the definition. The one thing I didn't do is populate the two tables on creation separately. They both are populated in the same "for loop." I'll try that, but with my previous texts I could always get better speed with the topics in a separate DB on a fresh start. Tim On 3/20/2012 10:26 AM, Black, Michael (IS) wrote: > Try creating 2 tables, one for topics, one for definitions. > > > > Then insert all the topics at once followed by all the definitions. > > That should give you the same disk layout as two databases. > > > > And you don't say what "lengthy" means. > > > > Michael D. Black > > Senior Scientist > > Advanced Analytics Directorate > > Advanced GEOINT Solutions Operating Unit > > Northrop Grumman Information Systems > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Tim Morton [t...@preservedwords.com] > Sent: Tuesday, March 20, 2012 9:21 AM > To: sqlite-users@sqlite.org > Subject: EXT :[sqlite] Optimize Table Access Efficiency > > Greetings, > > My app reads dozens of SQLite databases ranging in size from 1MB to > 100MB with a simple table structure like, > > "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)" > > On startup the app reads all the databases to extract the "topics" > column data. With the DB configured as above extracting the topic data > is a lengthy process. It seems the whole DB file is being read to just > access the relatively small "topics" column. > > If I add another table with just the topics data, > > "create table dictionary(id INTEGER PRIMARY KEY, topics)" > > access is quicker. However, if I make another DB file with just the > topics column, access is much quicker. I don't really want to use two > files to access a single DB. > > Also, the DB is only read from; not written to by the app. > > My questions, > > Does table creation order affect the speed of access? > > Is there a way to read only the part of the file that has the queried > table/column thus saving time and hard dive grinding? > > Is it possible to read a small table/column from a huge file nearly as > quick as the small table alone from a small file? And if so, how? > > Thanks, > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
Thanks for the reply, "Lengthy" varies according to the system, but from a fresh, initial start on this one it can be 15 to 20 seconds. The system cache speeds up subsequent starts to around 3-4 seconds. As mentioned in the post, I have two tables, one for topics and one for definitions. I read the topics table and get the index and then use it to get the definition. The one thing I didn't do is populate the two tables on creation separately. They both are populated in the same "for loop." I'll try that, but with my previous texts I could always get better speed with the topics in a separate DB on a fresh start. Tim On 3/20/2012 10:26 AM, Black, Michael (IS) wrote: Try creating 2 tables, one for topics, one for definitions. Then insert all the topics at once followed by all the definitions. That should give you the same disk layout as two databases. And you don't say what "lengthy" means. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Tim Morton [t...@preservedwords.com] Sent: Tuesday, March 20, 2012 9:21 AM To: sqlite-users@sqlite.org Subject: EXT :[sqlite] Optimize Table Access Efficiency Greetings, My app reads dozens of SQLite databases ranging in size from 1MB to 100MB with a simple table structure like, "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)" On startup the app reads all the databases to extract the "topics" column data. With the DB configured as above extracting the topic data is a lengthy process. It seems the whole DB file is being read to just access the relatively small "topics" column. If I add another table with just the topics data, "create table dictionary(id INTEGER PRIMARY KEY, topics)" access is quicker. However, if I make another DB file with just the topics column, access is much quicker. I don't really want to use two files to access a single DB. Also, the DB is only read from; not written to by the app. My questions, Does table creation order affect the speed of access? Is there a way to read only the part of the file that has the queried table/column thus saving time and hard dive grinding? Is it possible to read a small table/column from a huge file nearly as quick as the small table alone from a small file? And if so, how? Thanks, ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimize Table Access Efficiency
Try creating 2 tables, one for topics, one for definitions. Then insert all the topics at once followed by all the definitions. That should give you the same disk layout as two databases. And you don't say what "lengthy" means. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Tim Morton [t...@preservedwords.com] Sent: Tuesday, March 20, 2012 9:21 AM To: sqlite-users@sqlite.org Subject: EXT :[sqlite] Optimize Table Access Efficiency Greetings, My app reads dozens of SQLite databases ranging in size from 1MB to 100MB with a simple table structure like, "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)" On startup the app reads all the databases to extract the "topics" column data. With the DB configured as above extracting the topic data is a lengthy process. It seems the whole DB file is being read to just access the relatively small "topics" column. If I add another table with just the topics data, "create table dictionary(id INTEGER PRIMARY KEY, topics)" access is quicker. However, if I make another DB file with just the topics column, access is much quicker. I don't really want to use two files to access a single DB. Also, the DB is only read from; not written to by the app. My questions, Does table creation order affect the speed of access? Is there a way to read only the part of the file that has the queried table/column thus saving time and hard dive grinding? Is it possible to read a small table/column from a huge file nearly as quick as the small table alone from a small file? And if so, how? Thanks, ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: Re: [sqlite] optimize table
Since you produce the statistics once per month on a schedule it would be more efficient to not have an index. The index will slow down daily operations each time a record is inserted. Copy the database file once per month to a 'snapshot' file. Add the index to the snapshot database and then produce your statistics. Make it a scheduled task that runs the night before you need it. On 4/15/05, msaka msaka <[EMAIL PROTECTED]> wrote: > have you some feeling on my problem? > > cut or not cut db files? > > >-Pôvodná správa- > >Od: D. Richard Hipp [mailto:[EMAIL PROTECTED] > >Odoslané: 15. apríla 2005 11:15 > >Komu: sqlite-users@sqlite.org > >Predmet: Re: [sqlite] optimize table > > > > > >On Fri, 2005-04-15 at 06:15 -0400, Ken & Deb Allen wrote: > >> I cannot speak for SQLITE for certain, but I know with a number of > >> commercial RDBMS packages the index is not used unless it contains a > >> sufficient degree of distribution. > >> > > > >SQLite does not do this. It makes no effort to keep track of the > >"degree of distribution" or other statistics associated with indices. > >It just uses them if they are available. > > > >This can be either a feature or a bug, depending on your point > >of view. I have had people tell me that they prefer the SQLite > >way. With other databases, they tell me, they are constantly > >in a struggle trying to outwit and trick the database engine > >into doing what they want. SQLite just does what you tell it > >to do, for better or for worse. > >-- > >D. Richard Hipp <[EMAIL PROTECTED]> > > -- --- You a Gamer? If you're near Kansas City: Conquest 36 https://events.reddawn.net The Castles of Dereth Calendar: a tour of the art and architecture of Asheron's Call http://www.lulu.com/content/77264
Re: Re: [sqlite] optimize table
have you some feeling on my problem? cut or not cut db files? >-PÃvodnà sprÃva- >Od: D. Richard Hipp [mailto:[EMAIL PROTECTED] >OdoslanÃ: 15. aprÃla 2005 11:15 >Komu: sqlite-users@sqlite.org >Predmet: Re: [sqlite] optimize table > > >On Fri, 2005-04-15 at 06:15 -0400, Ken & Deb Allen wrote: >> I cannot speak for SQLITE for certain, but I know with a number of >> commercial RDBMS packages the index is not used unless it contains a >> sufficient degree of distribution. >> > >SQLite does not do this. It makes no effort to keep track of the >"degree of distribution" or other statistics associated with indices. >It just uses them if they are available. > >This can be either a feature or a bug, depending on your point >of view. I have had people tell me that they prefer the SQLite >way. With other databases, they tell me, they are constantly >in a struggle trying to outwit and trick the database engine >into doing what they want. SQLite just does what you tell it >to do, for better or for worse. >-- >D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] optimize table
On Fri, 2005-04-15 at 06:15 -0400, Ken & Deb Allen wrote: > I cannot speak for SQLITE for certain, but I know with a number of > commercial RDBMS packages the index is not used unless it contains a > sufficient degree of distribution. > SQLite does not do this. It makes no effort to keep track of the "degree of distribution" or other statistics associated with indices. It just uses them if they are available. This can be either a feature or a bug, depending on your point of view. I have had people tell me that they prefer the SQLite way. With other databases, they tell me, they are constantly in a struggle trying to outwit and trick the database engine into doing what they want. SQLite just does what you tell it to do, for better or for worse. -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] optimize table
P.S. When experimenting with this, try naming this 'second' table first to reduce the work the database has to do in selecting/rejecting records. Ypu can experiment with this by creating this new table, populating it with a SELECT from the existing table (only the records without statistics), and then timing the SELECT/JOIN. -ken On 15-Apr-05, at 6:15 AM, Ken & Deb Allen wrote: I cannot speak for SQLITE for certain, but I know with a number of commercial RDBMS packages the index is not used unless it contains a sufficient degree of distribution. Microsoft SQL Server, for example, is quite explicit in stating that if the distribution of values across the range of records in an index falls below a certain level, then the index is ignored, which is why you are used to put the most dynamic column first in a multi-column index. I do not know if SQLITE performs any of this analysis and makes decisions on whether to use the index or not, but it is possible, especially if a sort is involved, that the index is not helping much if all records have one of two values, and as the majority of he records have the same value, you are looking only for the minority records. I do not know if using a date/time for this files will speed it up or not, since your statistical inclusion query will be for records "WHERE statistic IS NULL" or "WHERE statistic = 0", and as the database grows in size this will be the minority set of records. One 'trick' I have seen in the past was to create a second table that contained an entry for new records, also keyed by the same "id"; as each sale is created, an entry is also made in that table, possibly by a trigger. You select the records for statistics using a JOIN (inner), which will only return records that are in both tables. As the records are processed, the record is removed from the second table, and therefore the records will not be part of the JOIN in the future. In some databases this may be faster than using an index; I have not experimented with this in SQLITE, however. -ken On 15-Apr-05, at 4:30 AM, msaka msaka wrote: i dont need working with dates and times.. i only set 1 records on which was done.. which index is better to create for this problem? must i cut db files? or speed of statistic will be same now and after 1000 000 records? -Pôvodná správa- Od: Gert Rijs [mailto:[EMAIL PROTECTED] Odoslané: 15. apríla 2005 7:13 Komu: sqlite-users@sqlite.org Predmet: Re: [sqlite] optimize table msaka msaka wrote: CREATE TABLE sale( id INTEGER NOT NULL UNIQUE PRIMARY KEY, bill_no integer, bill_item_no integer, item_name varchar(20), qty integer, amount DOUBLE, statistic integer ) Why don't you replace "statistic" with a timestamp formatted like mmdd, perhaps add the time as well. If you add an index on it you can efficiently retrieve the rows for certain date periods without needing to update the table. Gert
Re: [sqlite] optimize table
I cannot speak for SQLITE for certain, but I know with a number of commercial RDBMS packages the index is not used unless it contains a sufficient degree of distribution. Microsoft SQL Server, for example, is quite explicit in stating that if the distribution of values across the range of records in an index falls below a certain level, then the index is ignored, which is why you are used to put the most dynamic column first in a multi-column index. I do not know if SQLITE performs any of this analysis and makes decisions on whether to use the index or not, but it is possible, especially if a sort is involved, that the index is not helping much if all records have one of two values, and as the majority of he records have the same value, you are looking only for the minority records. I do not know if using a date/time for this files will speed it up or not, since your statistical inclusion query will be for records "WHERE statistic IS NULL" or "WHERE statistic = 0", and as the database grows in size this will be the minority set of records. One 'trick' I have seen in the past was to create a second table that contained an entry for new records, also keyed by the same "id"; as each sale is created, an entry is also made in that table, possibly by a trigger. You select the records for statistics using a JOIN (inner), which will only return records that are in both tables. As the records are processed, the record is removed from the second table, and therefore the records will not be part of the JOIN in the future. In some databases this may be faster than using an index; I have not experimented with this in SQLITE, however. -ken On 15-Apr-05, at 4:30 AM, msaka msaka wrote: i dont need working with dates and times.. i only set 1 records on which was done.. which index is better to create for this problem? must i cut db files? or speed of statistic will be same now and after 1000 000 records? -Pôvodná správa- Od: Gert Rijs [mailto:[EMAIL PROTECTED] Odoslané: 15. apríla 2005 7:13 Komu: sqlite-users@sqlite.org Predmet: Re: [sqlite] optimize table msaka msaka wrote: CREATE TABLE sale( id INTEGER NOT NULL UNIQUE PRIMARY KEY, bill_no integer, bill_item_no integer, item_name varchar(20), qty integer, amount DOUBLE, statistic integer ) Why don't you replace "statistic" with a timestamp formatted like mmdd, perhaps add the time as well. If you add an index on it you can efficiently retrieve the rows for certain date periods without needing to update the table. Gert
Re: Re: [sqlite] optimize table
i dont need working with dates and times.. i only set 1 records on which was done.. which index is better to create for this problem? must i cut db files? or speed of statistic will be same now and after 1000 000 records? >-Pôvodná správa- >Od: Gert Rijs [mailto:[EMAIL PROTECTED] >Odoslané: 15. apríla 2005 7:13 >Komu: sqlite-users@sqlite.org >Predmet: Re: [sqlite] optimize table > > >msaka msaka wrote: >> >> CREATE TABLE sale( >>id INTEGER NOT NULL UNIQUE PRIMARY KEY, >>bill_no integer, >>bill_item_no integer, >>item_name varchar(20), >>qty integer, >>amount DOUBLE, >>statistic integer >> ) >> >> > >Why don't you replace "statistic" with a timestamp formatted like >mmdd, perhaps add the time as well. If you add an index on it you >can efficiently retrieve the rows for certain date periods without >needing to update the table. > >Gert
Re: [sqlite] optimize table
msaka msaka wrote: CREATE TABLE sale( id INTEGER NOT NULL UNIQUE PRIMARY KEY, bill_no integer, bill_item_no integer, item_name varchar(20), qty integer, amount DOUBLE, statistic integer ) Why don't you replace "statistic" with a timestamp formatted like mmdd, perhaps add the time as well. If you add an index on it you can efficiently retrieve the rows for certain date periods without needing to update the table. Gert