Re: [sqlite] Adding index to table makes SELECT much slower. Why?

2008-10-10 Thread James Pringle
Ken and others-

 Thanks for your help.  I have put a copy of the data and a python
script that exercise the data via sqlite3 at
http://oxbow.sr.unh.edu/nogoogle/wod/ for you to play with and see if these
results hold on other architectures.  The script is straightforward, and you
can easily cut and paste the appropriate commands directly to the sqlite
command line.   It should run un-altered on any python version 2.5 or
greater with no additional modules beyond the base install. The script
access the data from three databases:

   1. a database that is un-ordered, and has no index,
   2. a database that is un-ordered, and has an index,
   3. a database that is ordered and has an index.

The databases are as described above in this thread, but truncated to reduce
the time it takes to download them. The output of the test on my iMac is

plume:% python test_ocean_data_script.py

results on un-ordered database with no index
start select
select finished in  9.43625998497 seconds and found  95852  records

results on un-ordered database with index on depth field
start select
select finished in  125.182437897 seconds and found  95852  records

results on a database ordered by depth with an index on the depth field
start select
select finished in  0.681365013123 seconds and found  95852  records

I get similar results on my linux boxes.

I would be interested in hearing what you find.   The data is from the World
Ocean Atlas 2005 data set described in
http://www.nodc.noaa.gov/OC5/WOA05/pr_woa05.html , and should be
acknowledged in the event you use it for any oceanographic research.

Cheers,
Jamie

On Thu, Oct 9, 2008 at 12:16 PM, James Pringle <[EMAIL PROTECTED]> wrote:

> Hi all-
>
>Thanks for your many suggestions.  I have tried many of your
> suggestions, and found the following:
>
>
>1. If I create the database with depth ordered from least to greatest,
>and then create an index, the searches run an order of magnitude faster 
> than
>without an index.  I.e. the search for matching entries in the data base
>concludes in about 9 seconds, instead of the 90 seconds it had before.
>Thanks to "developir" for this suggestion!
>2. searching via "rowid" makes no difference.
>3. doing vacuum makes no difference, since this is a write once, read
>many database.
>4. ulimit on my machine is unlimited, so this was not important for me.
>
>5. Cache size seems to make no difference -- and the SQLite
>documentation says it should only be important for DELETE's and UPDATE's.
>6. I like the idea of using RTREE, however, I don't want each of the
>students who use this database to have to recompile the version of SQLite
>that comes with their version of python.
>
> Several of you, in particular Ken, suggested that I run SQL queries that
> would provide some output that would be helpful in figuring out what was
> going on.The database is now more than fast enough for me, so I would
> not do this on my own.  However, if others on the mailing list would like to
> see this output in order to make SQLite better (or just for their own
> curiosity), please feel free to let me know, and I can make the runs.
>
> I have a small subset of the data (450Mb) which exhibits the same
> behaviour.  The data is public (it is from the National Ocean Database), and
> so if anyone wants to see it I would be happy to put it on my web server.
>
> Cheers,
> and thanks to everyone who helped me!
> Jamie Pringle
>
>
> On Wed, Oct 8, 2008 at 6:50 PM, James Pringle <[EMAIL PROTECTED]> wrote:
> > Hi-
> >
> >  I am new to sqlite, and am having a puzzling problem.  I have read
> > that adding an INDEX to a sqlite table can never make a SELECT
> > statement slower.  However, in my case, it seems to be making the
> > select statement an order of magnitude slower, which is not what I
> > wanted!  What could be going on?
> >
> >  I am calling SQLite from python 2.5 from fink useing the sqlite3
> > module on OS X 10.5 with 2Gb of memory.  My data base contains 32
> > million lines of ocean hydrographic data, stored on disk, with the
> > table created with the following statement:
> >
> >CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER,
> >month INTEGER, day INTEGER, time REAL, cast_id INTEGER,
> >depth REAL, T REAL, S REAL, water_depth REAL)
> >
> > When I perform a SELECT that returns about 0.6% of the data in the
> > database (about 200,000 records)
> >
> >SELECT lat, lon, depth FROM hydro WHERE depth>= 49.01 AND depth <=
> 50.99
> >
> > It selects and returns the records in 82 seconds.  I wished to improve
> > on this time, so I created an index with the foll

Re: [sqlite] Adding index to table makes SELECT much slower. Why?

2008-10-09 Thread James Pringle
Hi all-

   Thanks for your many suggestions.  I have tried many of your suggestions,
and found the following:


   1. If I create the database with depth ordered from least to greatest,
   and then create an index, the searches run an order of magnitude faster than
   without an index.  I.e. the search for matching entries in the data base
   concludes in about 9 seconds, instead of the 90 seconds it had before.
   Thanks to "developir" for this suggestion!
   2. searching via "rowid" makes no difference.
   3. doing vacuum makes no difference, since this is a write once, read
   many database.
   4. ulimit on my machine is unlimited, so this was not important for me.
   5. Cache size seems to make no difference -- and the SQLite documentation
   says it should only be important for DELETE's and UPDATE's.
   6. I like the idea of using RTREE, however, I don't want each of the
   students who use this database to have to recompile the version of SQLite
   that comes with their version of python.

Several of you, in particular Ken, suggested that I run SQL queries that
would provide some output that would be helpful in figuring out what was
going on.The database is now more than fast enough for me, so I would
not do this on my own.  However, if others on the mailing list would like to
see this output in order to make SQLite better (or just for their own
curiosity), please feel free to let me know, and I can make the runs.

I have a small subset of the data (450Mb) which exhibits the same
behaviour.  The data is public (it is from the National Ocean Database), and
so if anyone wants to see it I would be happy to put it on my web server.

Cheers,
and thanks to everyone who helped me!
Jamie Pringle

On Wed, Oct 8, 2008 at 6:50 PM, James Pringle <[EMAIL PROTECTED]> wrote:
> Hi-
>
>  I am new to sqlite, and am having a puzzling problem.  I have read
> that adding an INDEX to a sqlite table can never make a SELECT
> statement slower.  However, in my case, it seems to be making the
> select statement an order of magnitude slower, which is not what I
> wanted!  What could be going on?
>
>  I am calling SQLite from python 2.5 from fink useing the sqlite3
> module on OS X 10.5 with 2Gb of memory.  My data base contains 32
> million lines of ocean hydrographic data, stored on disk, with the
> table created with the following statement:
>
>CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER,
>month INTEGER, day INTEGER, time REAL, cast_id INTEGER,
>depth REAL, T REAL, S REAL, water_depth REAL)
>
> When I perform a SELECT that returns about 0.6% of the data in the
> database (about 200,000 records)
>
>SELECT lat, lon, depth FROM hydro WHERE depth>= 49.01 AND depth <=
50.99
>
> It selects and returns the records in 82 seconds.  I wished to improve
> on this time, so I created an index with the following command:
>
>CREATE INDEX hydro_indx ON hydro (depth)
>
> I then tried the SELECT command again, and read the results into
> memory, and it took 717 seconds!?!
>
> The "depth" field contains many distinct numeric values, with no
> specific value matching more then 0.5% of the data base.  When I DROP
> the index with the command
>
>DROP INDEX hydro_indx
>
> The SELECT time returns to about 80 seconds, confirming that it is the
> index which is slowing things down.  What is going on?  I have
> repeated and confirmed these timings.
>
> I have listened for disk chatter and monitored the system, and it does
> not seem to be thrashing swap, or otherwise becoming unresponsive.
>
> I have two questions:
>
>1) Why is the index making things slower?
>2) How can I make my SELECT statement faster?  The primary
> selection will be done
>   on the "depth" and "water_depth" keys.
>
> I thank you for your time.
>
> Cheers,
> Jamie Pringle
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Adding index to table makes SELECT much slower. Why?

2008-10-08 Thread James Pringle
Hi-

  I am new to sqlite, and am having a puzzling problem.  I have read
that adding an INDEX to a sqlite table can never make a SELECT
statement slower.  However, in my case, it seems to be making the
select statement an order of magnitude slower, which is not what I
wanted!  What could be going on?

  I am calling SQLite from python 2.5 from fink useing the sqlite3
module on OS X 10.5 with 2Gb of memory.  My data base contains 32
million lines of ocean hydrographic data, stored on disk, with the
table created with the following statement:

CREATE TABLE hydro (lat REAL, lon REAL, year INTEGER,
month INTEGER, day INTEGER, time REAL, cast_id INTEGER,
depth REAL, T REAL, S REAL, water_depth REAL)

When I perform a SELECT that returns about 0.6% of the data in the
database (about 200,000 records)

SELECT lat, lon, depth FROM hydro WHERE depth>= 49.01 AND depth <= 50.99

It selects and returns the records in 82 seconds.  I wished to improve
on this time, so I created an index with the following command:

CREATE INDEX hydro_indx ON hydro (depth)

I then tried the SELECT command again, and read the results into
memory, and it took 717 seconds!?!

The "depth" field contains many distinct numeric values, with no
specific value matching more then 0.5% of the data base.  When I DROP
the index with the command

DROP INDEX hydro_indx

The SELECT time returns to about 80 seconds, confirming that it is the
index which is slowing things down.  What is going on?  I have
repeated and confirmed these timings.

I have listened for disk chatter and monitored the system, and it does
not seem to be thrashing swap, or otherwise becoming unresponsive.

I have two questions:

1) Why is the index making things slower?
2) How can I make my SELECT statement faster?  The primary
selection will be done
   on the "depth" and "water_depth" keys.

I thank you for your time.

Cheers,
Jamie Pringle
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users