Re: [sqlite] Adding index to table makes SELECT much slower. Why?
Ken and others- Thanks for your help. I have put a copy of the data and a python script that exercise the data via sqlite3 at http://oxbow.sr.unh.edu/nogoogle/wod/ for you to play with and see if these results hold on other architectures. The script is straightforward, and you can easily cut and paste the appropriate commands directly to the sqlite command line. It should run un-altered on any python version 2.5 or greater with no additional modules beyond the base install. The script access the data from three databases: 1. a database that is un-ordered, and has no index, 2. a database that is un-ordered, and has an index, 3. a database that is ordered and has an index. The databases are as described above in this thread, but truncated to reduce the time it takes to download them. The output of the test on my iMac is plume:% python test_ocean_data_script.py results on un-ordered database with no index start select select finished in 9.43625998497 seconds and found 95852 records results on un-ordered database with index on depth field start select select finished in 125.182437897 seconds and found 95852 records results on a database ordered by depth with an index on the depth field start select select finished in 0.681365013123 seconds and found 95852 records I get similar results on my linux boxes. I would be interested in hearing what you find. The data is from the World Ocean Atlas 2005 data set described in http://www.nodc.noaa.gov/OC5/WOA05/pr_woa05.html , and should be acknowledged in the event you use it for any oceanographic research. Cheers, Jamie On Thu, Oct 9, 2008 at 12:16 PM, James Pringle <[EMAIL PROTECTED]> wrote: > Hi all- > >Thanks for your many suggestions. I have tried many of your > suggestions, and found the following: > > >1. If I create the database with depth ordered from least to greatest, >and then create an index, the searches run an order of magnitude faster > than >without an index. I.e. the search for matching entries in the data base >concludes in about 9 seconds, instead of the 90 seconds it had before. >Thanks to "developir" for this suggestion! >2. searching via "rowid" makes no difference. >3. doing vacuum makes no difference, since this is a write once, read >many database. >4. ulimit on my machine is unlimited, so this was not important for me. > >5. Cache size seems to make no difference -- and the SQLite >documentation says it should only be important for DELETE's and UPDATE's. >6. I like the idea of using RTREE, however, I don't want each of the >students who use this database to have to recompile the version of SQLite >that comes with their version of python. > > Several of you, in particular Ken, suggested that I run SQL queries that > would provide some output that would be helpful in figuring out what was > going on.The database is now more than fast enough for me, so I would > not do this on my own. However, if others on the mailing list would like to > see this output in order to make SQLite better (or just for their own > curiosity), please feel free to let me know, and I can make the runs. > > I have a small subset of the data (450Mb) which exhibits the same > behaviour. The data is public (it is from the National Ocean Database), and > so if anyone wants to see it I would be happy to put it on my web server. > > Cheers, > and thanks to everyone who helped me! > Jamie Pringle > > > On Wed, Oct 8, 2008 at 6:50 PM, James Pringle <[EMAIL PROTECTED]> wrote: > > Hi- > > > > I am new to sqlite, and am having a puzzling problem. I have read > > that adding an INDEX to a sqlite table can never make a SELECT > > statement slower. However, in my case, it seems to be making the > > select statement an order of magnitude slower, which is not what I > > wanted! What could be going on? > > > > I am calling SQLite from python 2.5 from fink useing the sqlite3 > > module on OS X 10.5 with 2Gb of memory. My data base contains 32 > > million lines of ocean hydrographic data, stored on disk, with the > > table created with the following statement: > > > >CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER, > >month INTEGER, day INTEGER, time REAL, cast_id INTEGER, > >depth REAL, T REAL, S REAL, water_depth REAL) > > > > When I perform a SELECT that returns about 0.6% of the data in the > > database (about 200,000 records) > > > >SELECT lat, lon, depth FROM hydro WHERE depth>= 49.01 AND depth <= > 50.99 > > > > It selects and returns the records in 82 seconds. I wished to improve > > on this time, so I created an index with the foll
Re: [sqlite] Adding index to table makes SELECT much slower. Why?
Hi all- Thanks for your many suggestions. I have tried many of your suggestions, and found the following: 1. If I create the database with depth ordered from least to greatest, and then create an index, the searches run an order of magnitude faster than without an index. I.e. the search for matching entries in the data base concludes in about 9 seconds, instead of the 90 seconds it had before. Thanks to "developir" for this suggestion! 2. searching via "rowid" makes no difference. 3. doing vacuum makes no difference, since this is a write once, read many database. 4. ulimit on my machine is unlimited, so this was not important for me. 5. Cache size seems to make no difference -- and the SQLite documentation says it should only be important for DELETE's and UPDATE's. 6. I like the idea of using RTREE, however, I don't want each of the students who use this database to have to recompile the version of SQLite that comes with their version of python. Several of you, in particular Ken, suggested that I run SQL queries that would provide some output that would be helpful in figuring out what was going on.The database is now more than fast enough for me, so I would not do this on my own. However, if others on the mailing list would like to see this output in order to make SQLite better (or just for their own curiosity), please feel free to let me know, and I can make the runs. I have a small subset of the data (450Mb) which exhibits the same behaviour. The data is public (it is from the National Ocean Database), and so if anyone wants to see it I would be happy to put it on my web server. Cheers, and thanks to everyone who helped me! Jamie Pringle On Wed, Oct 8, 2008 at 6:50 PM, James Pringle <[EMAIL PROTECTED]> wrote: > Hi- > > I am new to sqlite, and am having a puzzling problem. I have read > that adding an INDEX to a sqlite table can never make a SELECT > statement slower. However, in my case, it seems to be making the > select statement an order of magnitude slower, which is not what I > wanted! What could be going on? > > I am calling SQLite from python 2.5 from fink useing the sqlite3 > module on OS X 10.5 with 2Gb of memory. My data base contains 32 > million lines of ocean hydrographic data, stored on disk, with the > table created with the following statement: > >CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER, >month INTEGER, day INTEGER, time REAL, cast_id INTEGER, >depth REAL, T REAL, S REAL, water_depth REAL) > > When I perform a SELECT that returns about 0.6% of the data in the > database (about 200,000 records) > >SELECT lat, lon, depth FROM hydro WHERE depth>= 49.01 AND depth <= 50.99 > > It selects and returns the records in 82 seconds. I wished to improve > on this time, so I created an index with the following command: > >CREATE INDEX hydro_indx ON hydro (depth) > > I then tried the SELECT command again, and read the results into > memory, and it took 717 seconds!?! > > The "depth" field contains many distinct numeric values, with no > specific value matching more then 0.5% of the data base. When I DROP > the index with the command > >DROP INDEX hydro_indx > > The SELECT time returns to about 80 seconds, confirming that it is the > index which is slowing things down. What is going on? I have > repeated and confirmed these timings. > > I have listened for disk chatter and monitored the system, and it does > not seem to be thrashing swap, or otherwise becoming unresponsive. > > I have two questions: > >1) Why is the index making things slower? >2) How can I make my SELECT statement faster? The primary > selection will be done > on the "depth" and "water_depth" keys. > > I thank you for your time. > > Cheers, > Jamie Pringle > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Adding index to table makes SELECT much slower. Why?
Hi- I am new to sqlite, and am having a puzzling problem. I have read that adding an INDEX to a sqlite table can never make a SELECT statement slower. However, in my case, it seems to be making the select statement an order of magnitude slower, which is not what I wanted! What could be going on? I am calling SQLite from python 2.5 from fink useing the sqlite3 module on OS X 10.5 with 2Gb of memory. My data base contains 32 million lines of ocean hydrographic data, stored on disk, with the table created with the following statement: CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER, month INTEGER, day INTEGER, time REAL, cast_id INTEGER, depth REAL, T REAL, S REAL, water_depth REAL) When I perform a SELECT that returns about 0.6% of the data in the database (about 200,000 records) SELECT lat, lon, depth FROM hydro WHERE depth>= 49.01 AND depth <= 50.99 It selects and returns the records in 82 seconds. I wished to improve on this time, so I created an index with the following command: CREATE INDEX hydro_indx ON hydro (depth) I then tried the SELECT command again, and read the results into memory, and it took 717 seconds!?! The "depth" field contains many distinct numeric values, with no specific value matching more then 0.5% of the data base. When I DROP the index with the command DROP INDEX hydro_indx The SELECT time returns to about 80 seconds, confirming that it is the index which is slowing things down. What is going on? I have repeated and confirmed these timings. I have listened for disk chatter and monitored the system, and it does not seem to be thrashing swap, or otherwise becoming unresponsive. I have two questions: 1) Why is the index making things slower? 2) How can I make my SELECT statement faster? The primary selection will be done on the "depth" and "water_depth" keys. I thank you for your time. Cheers, Jamie Pringle ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users