Re: [sqlite] Performance problem for a simple select with range
First, thanks, your suggestion worked. To my surprise, it was enough to add limit 1 to the original query. So: select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum limit 1 takes about 1.398-005 seconds and select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum takes about 3 seconds. Igor Tandetnik wrote: Dani Valevski [EMAIL PROTECTED] wrote: I think I have a performance problem for a simple select with range. My Tables: CREATE TABLE locations( locidINTEGER PRIMARY KEY, country TEXT, regionTEXT, cityTEXT, postalCode TEXT, latitude REAL, longitude REAL, dmaCode INTEGER, areaCode INTEGER) CREATE TABLE blocks( startIpNum INTEGER, endIpNum INTEGER, locId INTEGER) My Data: http://www.maxmind.com/app/geolitecity Blocks table has 2,776,436 rows Locations table has 159,488 rows After inserting the data I run analyze. My Query: select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum (replace ? with a number) Performance issues: I use python's sqlite3 module to run the query. With this configuration it takes about 0.6 seconds to complete the query. I think this is too slow. I could write a binary tree myself and have searches like this take, O(log(num_rows)) which is 7*something_which_shouldnt_take_too_much. Am I wrong? And what would you use as a key for this binary tree? I bet you would utilize additional information that the DB engine doesn't have - that your blocks don't overlap (they don't, right?) Try coming up with a search strategy without making this assumption. Try this: create an index on startIpNum, and run a query like this: select * from blocks, locations where blocks.startIpNum = ? and blocks.locid = locations.locid order by blocks.startIpNum desc limit 1; This gives you the record with the largest value of startIpNum that is still smaller than the threshold, and should be very fast. It can produce a false positive - make the additional check for (? = startIpEnd) in your application code. Don't put this check into the query though, or you will force it back into O(N) behavior in case your target value doesn't fall within any block after all. Igor Tandetnik - To unsubscribe, send email to [EMAIL PROTECTED] - -- View this message in context: http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13509241 Sent from the SQLite mailing list archive at Nabble.com. - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] Performance problem for a simple select with range
I'm not an SQL guru by any means, so seeing this made a light go on. Does that mean it is a good idea in the general case to always add limit 1 to a select that you know should only return 1 row? I'm assuming this works because the engine can short-cut out as soon as it finds that first matching row. -Original Message- From: Dani Va [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 31, 2007 8:30 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] Performance problem for a simple select with range First, thanks, your suggestion worked. To my surprise, it was enough to add limit 1 to the original query. So: select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum limit 1 takes about 1.398-005 seconds and select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum takes about 3 seconds. Igor Tandetnik wrote: Dani Valevski [EMAIL PROTECTED] wrote: I think I have a performance problem for a simple select with range. My Tables: CREATE TABLE locations( locidINTEGER PRIMARY KEY, country TEXT, regionTEXT, cityTEXT, postalCode TEXT, latitude REAL, longitude REAL, dmaCode INTEGER, areaCode INTEGER) CREATE TABLE blocks( startIpNum INTEGER, endIpNum INTEGER, locId INTEGER) My Data: http://www.maxmind.com/app/geolitecity Blocks table has 2,776,436 rows Locations table has 159,488 rows After inserting the data I run analyze. My Query: select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum (replace ? with a number) Performance issues: I use python's sqlite3 module to run the query. With this configuration it takes about 0.6 seconds to complete the query. I think this is too slow. I could write a binary tree myself and have searches like this take, O(log(num_rows)) which is 7*something_which_shouldnt_take_too_much. Am I wrong? And what would you use as a key for this binary tree? I bet you would utilize additional information that the DB engine doesn't have - that your blocks don't overlap (they don't, right?) Try coming up with a search strategy without making this assumption. Try this: create an index on startIpNum, and run a query like this: select * from blocks, locations where blocks.startIpNum = ? and blocks.locid = locations.locid order by blocks.startIpNum desc limit 1; This gives you the record with the largest value of startIpNum that is still smaller than the threshold, and should be very fast. It can produce a false positive - make the additional check for (? = startIpEnd) in your application code. Don't put this check into the query though, or you will force it back into O(N) behavior in case your target value doesn't fall within any block after all. Igor Tandetnik - To unsubscribe, send email to [EMAIL PROTECTED] - -- View this message in context: http://www.nabble.com/Performance-problem-for-a- simple-select-with-range-tf4711654.html#a13509241 Sent from the SQLite mailing list archive at Nabble.com. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Performance problem for a simple select with range
Igor Tandetnik wrote: Try searching for a value that doesn't fall into any block - you'll likely find that the query takes a noticeable time to produce zero records. Pick a large value that's greater than all startIpNum's. Yes, you are right. That's why I'm going with the original query you suggested. Thanks again Dani -- View this message in context: http://www.nabble.com/Performance-problem-for-a-simple-select-with-range-tf4711654.html#a13517991 Sent from the SQLite mailing list archive at Nabble.com. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Performance problem for a simple select with range
[Default] On Mon, 29 Oct 2007 15:25:18 +0200, Dani Valevski [EMAIL PROTECTED] wrote: I think I have a performance problem for a simple select with range. My Tables: CREATE TABLE locations( locidINTEGER PRIMARY KEY, country TEXT, regionTEXT, cityTEXT, postalCode TEXT, latitude REAL, longitude REAL, dmaCode INTEGER, areaCode INTEGER) CREATE TABLE blocks( startIpNum INTEGER, endIpNum INTEGER, locId INTEGER) My Data: http://www.maxmind.com/app/geolitecity Blocks table has 2,776,436 rows Locations table has 159,488 rows After inserting the data I run analyze. My Query: select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum (replace ? with a number) Disclaimer: I'm a bit new to databases. Performance issues: I use python's sqlite3 module to run the query. With this configuration it takes about 0.6 seconds to complete the query. I think this is too slow. I could write a binary tree myself and have searches like this take, O(log(num_rows)) which is 7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer) Anyway, I thought the problem was that startIpNum, endIpNum are not indexed. So I added indices for them (even tried indexing them both). This only makes the query take about 3 seconds. Ideas anyone? Source: is attached. Thank you for your help Just some suggestions: Index locid in both tables, and rewrite select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum to: select * from blocks INNER JOIN locations USING (locid) where ? = blocks.startIpNum AND ? = blocks.endIpNum or: select * from locations INNER JOIN blocks USING (locid) where ? = blocks.startIpNum AND ? = blocks.endIpNum HTH -- ( Kees Nuyt ) c[_] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Performance problem for a simple select with range
Dani Valevski [EMAIL PROTECTED] wrote: I think I have a performance problem for a simple select with range. My Tables: CREATE TABLE locations(locidINTEGER PRIMARY KEY, ...); CREATE TABLE blocks( startIpNum INTEGER, endIpNum INTEGER, locId INTEGER) My Data: Blocks table has 2,776,436 rows Locations table has 159,488 rows My Query: select * from blocks,locations where locations.locid = blocks.locid AND ? = blocks.startIpNum AND ? = blocks.endIpNum (replace ? with a number) To do searches of this kind with maximum efficiency, you normally want to use a 1-dimensional R-Tree index. SQLite does not support RTree indices natively, though it is conceivable that you could write a RTree virtual table extension for SQLite. Without an RTree index, and unless you can exploit the distribution of data in the blocks table, you really cannot do much better than a full table scan on blocks with an indexed lookup of locations for each matching block. That is probably what is happening on your original query before you added indices. -- D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] -