Re: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Dani Va

First, thanks, your suggestion worked. 

To my surprise, it was enough to add limit 1 to the original query.

So:

select * from blocks,locations where locations.locid = blocks.locid AND ? =
blocks.startIpNum AND ? = blocks.endIpNum limit 1
takes about 1.398-005 seconds 

and 

select * from blocks,locations where locations.locid = blocks.locid AND ? =
blocks.startIpNum AND ? = blocks.endIpNum 
takes about 3 seconds.





Igor Tandetnik wrote:
 
 Dani Valevski [EMAIL PROTECTED] wrote:
 I think I have a performance problem for a simple select with range.

 My Tables:
 CREATE TABLE locations(
locidINTEGER PRIMARY KEY,
country TEXT,
regionTEXT,
cityTEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

 CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

 My Data:
 http://www.maxmind.com/app/geolitecity
 Blocks table has 2,776,436 rows
 Locations table has 159,488 rows

 After inserting the data I run analyze.

 My Query:
 select * from blocks,locations where locations.locid = blocks.locid
 AND ? = blocks.startIpNum AND ? = blocks.endIpNum
 (replace ? with a number)

 Performance issues:
 I use python's sqlite3 module to run the query.
 With this configuration it takes about 0.6 seconds to complete the
 query. I
 think this is too slow. I could write a binary tree myself and have
 searches
 like this take, O(log(num_rows)) which is
 7*something_which_shouldnt_take_too_much. Am I wrong?
 
 And what would you use as a key for this binary tree? I bet you would 
 utilize additional information that the DB engine doesn't have - that 
 your blocks don't overlap (they don't, right?) Try coming up with a 
 search strategy without making this assumption.
 
 Try this: create an index on startIpNum, and run a query like this:
 
 select * from blocks, locations
 where blocks.startIpNum = ? and blocks.locid = locations.locid
 order by blocks.startIpNum desc limit 1;
 
 This gives you the record with the largest value of startIpNum that is 
 still smaller than the threshold, and should be very fast. It can 
 produce a false positive - make the additional check for (? = 
 startIpEnd) in your application code. Don't put this check into the 
 query though, or you will force it back into O(N) behavior in case your 
 target value doesn't fall within any block after all.
 
 Igor Tandetnik 
 
 
 -
 To unsubscribe, send email to [EMAIL PROTECTED]
 -
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13509241
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Doug
I'm not an SQL guru by any means, so seeing this made a light go on.  Does
that mean it is a good idea in the general case to always add limit 1 to a
select that you know should only return 1 row?  I'm assuming this works
because the engine can short-cut out as soon as it finds that first matching
row.

 -Original Message-
 From: Dani Va [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 31, 2007 8:30 AM
 To: sqlite-users@sqlite.org
 Subject: Re: [sqlite] Performance problem for a simple select with range
 
 
 First, thanks, your suggestion worked.
 
 To my surprise, it was enough to add limit 1 to the original query.
 
 So:
 
 select * from blocks,locations where locations.locid = blocks.locid AND ?
=
 blocks.startIpNum AND ? = blocks.endIpNum limit 1
 takes about 1.398-005 seconds
 
 and
 
 select * from blocks,locations where locations.locid = blocks.locid AND ?
=
 blocks.startIpNum AND ? = blocks.endIpNum
 takes about 3 seconds.
 
 
 
 
 
 Igor Tandetnik wrote:
 
  Dani Valevski [EMAIL PROTECTED] wrote:
  I think I have a performance problem for a simple select with range.
 
  My Tables:
  CREATE TABLE locations(
 locidINTEGER PRIMARY KEY,
 country TEXT,
 regionTEXT,
 cityTEXT,
 postalCode TEXT,
 latitude REAL,
 longitude REAL,
 dmaCode INTEGER,
 areaCode INTEGER)
 
  CREATE TABLE blocks(
 startIpNum INTEGER,
 endIpNum INTEGER,
 locId INTEGER)
 
  My Data:
  http://www.maxmind.com/app/geolitecity
  Blocks table has 2,776,436 rows
  Locations table has 159,488 rows
 
  After inserting the data I run analyze.
 
  My Query:
  select * from blocks,locations where locations.locid = blocks.locid
  AND ? = blocks.startIpNum AND ? = blocks.endIpNum
  (replace ? with a number)
 
  Performance issues:
  I use python's sqlite3 module to run the query.
  With this configuration it takes about 0.6 seconds to complete the
  query. I
  think this is too slow. I could write a binary tree myself and have
  searches
  like this take, O(log(num_rows)) which is
  7*something_which_shouldnt_take_too_much. Am I wrong?
 
  And what would you use as a key for this binary tree? I bet you would
  utilize additional information that the DB engine doesn't have - that
  your blocks don't overlap (they don't, right?) Try coming up with a
  search strategy without making this assumption.
 
  Try this: create an index on startIpNum, and run a query like this:
 
  select * from blocks, locations
  where blocks.startIpNum = ? and blocks.locid = locations.locid
  order by blocks.startIpNum desc limit 1;
 
  This gives you the record with the largest value of startIpNum that is
  still smaller than the threshold, and should be very fast. It can
  produce a false positive - make the additional check for (? =
  startIpEnd) in your application code. Don't put this check into the
  query though, or you will force it back into O(N) behavior in case your
  target value doesn't fall within any block after all.
 
  Igor Tandetnik
 
 
 

-
  To unsubscribe, send email to [EMAIL PROTECTED]
 

-
 
 
 
 
 --
 View this message in context:
http://www.nabble.com/Performance-problem-for-a-
 simple-select-with-range-tf4711654.html#a13509241
 Sent from the SQLite mailing list archive at Nabble.com.
 
 


-
 To unsubscribe, send email to [EMAIL PROTECTED]


-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Dani Va


Igor Tandetnik wrote:
 
 Try searching for a value that doesn't fall into any block - you'll 
 likely find that the query takes a noticeable time to produce zero 
 records. Pick a large value that's greater than all startIpNum's.
 

Yes, you are right. That's why I'm going with the original query you
suggested. 

Thanks again
Dani



-- 
View this message in context: 
http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13517991
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Performance problem for a simple select with range

2007-10-29 Thread Kees Nuyt
[Default] On Mon, 29 Oct 2007 15:25:18 +0200, Dani Valevski
[EMAIL PROTECTED] wrote:

I think I have a performance problem for a simple select with range.

My Tables:
CREATE TABLE locations(
locidINTEGER PRIMARY KEY,
country TEXT,
regionTEXT,
cityTEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

My Data:
http://www.maxmind.com/app/geolitecity
Blocks table has 2,776,436 rows
Locations table has 159,488 rows

After inserting the data I run analyze.

My Query:
select * from blocks,locations where locations.locid = blocks.locid AND ? =
blocks.startIpNum AND ? = blocks.endIpNum
(replace ? with a number)

Disclaimer:
I'm a bit new to databases.

Performance issues:
I use python's sqlite3 module to run the query.
With this configuration it takes about 0.6 seconds to complete the query. I
think this is too slow. I could write a binary tree myself and have searches
like this take, O(log(num_rows)) which is
7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer)

Anyway, I thought the problem was that startIpNum, endIpNum are not indexed.
So I added indices for them (even tried indexing them both). This only makes
the query take about 3 seconds.
Ideas anyone?

Source:
is attached.

Thank you for your help

Just some suggestions:
Index locid in both tables, and rewrite

 select *
   from blocks,locations 
  where locations.locid = blocks.locid
  AND ? = blocks.startIpNum 
  AND ? = blocks.endIpNum

to:

 select *
   from blocks
  INNER JOIN locations USING (locid)
  where ? = blocks.startIpNum 
AND ? = blocks.endIpNum

or:

 select *
   from locations
  INNER JOIN blocks USING (locid)
  where ? = blocks.startIpNum 
AND ? = blocks.endIpNum

HTH
-- 
  (  Kees Nuyt
  )
c[_]

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Performance problem for a simple select with range

2007-10-29 Thread drh
Dani Valevski [EMAIL PROTECTED] wrote:
 I think I have a performance problem for a simple select with range.
 
 My Tables:
 CREATE TABLE locations(locidINTEGER PRIMARY KEY, ...);
 
 CREATE TABLE blocks(
 startIpNum INTEGER,
 endIpNum INTEGER,
 locId INTEGER)
 
 My Data:
 Blocks table has 2,776,436 rows
 Locations table has 159,488 rows
 
 My Query:
 select * from blocks,locations where locations.locid = blocks.locid AND ? =
 blocks.startIpNum AND ? = blocks.endIpNum
 (replace ? with a number)
 

To do searches of this kind with maximum efficiency, you normally
want to use a 1-dimensional R-Tree index.  SQLite does not support
RTree indices natively, though it is conceivable that you could
write a RTree virtual table extension for SQLite.

Without an RTree index, and unless you can exploit the distribution
of data in the blocks table, you really cannot do much better than a
full table scan on blocks with an indexed lookup of locations for
each matching block.  That is probably what is happening on your
original query before you added indices.

--
D. Richard Hipp [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-