Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Dennis Cote

[EMAIL PROTECTED] wrote:


INTERSECT would give you x=5 AND y=7.  For x=5 OR y=7 you want UNION.
  


Oops, yes of course.

I was thinking of the higher level problem with circles where 
intersection could be used to find a small subset of the table that 
would then be scanned to locate the points in the circle using the exact 
distance calculation.


select * from ex4 where rowid in
   (
   select rowid from ex4 where x between :x_center - :radius and 
:x_center + :radius

   intersect
   select rowid from ex4 where y between :y_center - :radius and 
:y_center + :radius

   )

This will find points in a circumscribed square around the center of the 
circle. The distance calculation would eliminate those points outside 
the circle (i.e. the points in the corners of the squares).


Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread bash
On Wed, 9 May 2007 21:00:46 +0400
Tomash Brechko <[EMAIL PROTECTED]> wrote:

> On Wed, May 09, 2007 at 14:45:33 +, [EMAIL PROTECTED] wrote:
> > You need an R-Tree index to do something like this.  The
> > public-domain version of SQLite only supports B-Tree indices.
> > So, no, indices are not going to help you here.
> 
> Alternatively to R-tree index, you may simply partition the space into
> NxM cells, with, say, left and bottom border belonging to the cell
> itself (while right and upper borders belonging to the right and upper
> cells as their left and bottom borders respectively), and enumerate
> these cells row-by-row like
> 
>   10|11|12|13|14
>  ---+--+--+--+---
>5| 6| 7| 8| 9
>  ---+--+--+--+---
>0| 1| 2| 3| 4
> 
> 
> This way every point belongs to exactly one cell.  Then you create
> 
>CREATE TABLE map (
>x INTEGER,
>y INTEGER,
>name TEXT,
>cell_no INTEGER
>);
>CREATE INDEX map_cell_no ON map (cell_no);
> 
> When inserting a point, you compute its cell_no (something like
> 
>   cell_no(x, y) = y / cell_height * cells_in_row + x / cell_width;
> 
> 
> ).  When doing a region query, you compute a set of cell numbers that
> intersect with a query window, accumulate them in a (memory) table
> selected_cells, and then do
> 
>SELECT map.*
>FROM mem.selected_cells sc CROSS JOIN map ON sc.cell_no = map.cell_no;
> 
> Better yet to compute two sets: those cells that reside completely
> within the query window, and those that intersect window border.
> Points from the latter cells should be filtered further.
> 
> Reasonable cell dimensions based on typical query window size and
> points distribution will give quite reasonable performance.

Interesting idea. I'll try to test it.
Dimension for my map is 800x800 and number of points 50,000+.


-- 
Biomechanical Artificial Sabotage Humanoid

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Tomash Brechko
On Wed, May 09, 2007 at 14:45:33 +, [EMAIL PROTECTED] wrote:
> You need an R-Tree index to do something like this.  The
> public-domain version of SQLite only supports B-Tree indices.
> So, no, indices are not going to help you here.

Alternatively to R-tree index, you may simply partition the space into
NxM cells, with, say, left and bottom border belonging to the cell
itself (while right and upper borders belonging to the right and upper
cells as their left and bottom borders respectively), and enumerate
these cells row-by-row like

  10|11|12|13|14
 ---+--+--+--+---
   5| 6| 7| 8| 9
 ---+--+--+--+---
   0| 1| 2| 3| 4


This way every point belongs to exactly one cell.  Then you create

   CREATE TABLE map (
   x INTEGER,
   y INTEGER,
   name TEXT,
   cell_no INTEGER
   );
   CREATE INDEX map_cell_no ON map (cell_no);

When inserting a point, you compute its cell_no (something like

  cell_no(x, y) = y / cell_height * cells_in_row + x / cell_width;


).  When doing a region query, you compute a set of cell numbers that
intersect with a query window, accumulate them in a (memory) table
selected_cells, and then do

   SELECT map.*
   FROM mem.selected_cells sc CROSS JOIN map ON sc.cell_no = map.cell_no;

Better yet to compute two sets: those cells that reside completely
within the query window, and those that intersect window border.
Points from the latter cells should be filtered further.

Reasonable cell dimensions based on typical query window size and
points distribution will give quite reasonable performance.


-- 
   Tomash Brechko

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread drh
Dennis Cote <[EMAIL PROTECTED]> wrote:
> >
> > Why full table scan? :/
> > SQLite can takes set (1) of rowid by ex(x) index for
> > "X=5". Then takes another set (2) of rowid by ex(y) for "Y=7".
> > Then SQLite need only to union this two set (1) and (2).
> > Final SQLite should returns rows where rowid in (set1 union set2).
> >
> >
> >   
> I think you mean intersection where you have used union. SQLite won't 
> optimize the query this way, but you can do it manually.
> 
> Instead of
> 
> select * from ex4 where x = 5 or y = 7;
> 
> You can do this
> 
> select * from ex4 where rowid in
> (
> select rowid from ex4 where x = 5
> intersect
> select rowid from ex4 where y = 7
> );
> 
> The intersect operation allows each of the sub-selects to be executed 
> using an independent index, and the outer select uses the implicit index 
> on the rowid.
> 

INTERSECT would give you x=5 AND y=7.  For x=5 OR y=7 you want UNION.
--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Dennis Cote

bash wrote:


Why full table scan? :/
SQLite can takes set (1) of rowid by ex(x) index for
"X=5". Then takes another set (2) of rowid by ex(y) for "Y=7".
Then SQLite need only to union this two set (1) and (2).
Final SQLite should returns rows where rowid in (set1 union set2).


  
I think you mean intersection where you have used union. SQLite won't 
optimize the query this way, but you can do it manually.


Instead of

   select * from ex4 where x = 5 or y = 7;

You can do this

   select * from ex4 where rowid in
   (
   select rowid from ex4 where x = 5
   intersect
   select rowid from ex4 where y = 7
   );

The intersect operation allows each of the sub-selects to be executed 
using an independent index, and the outer select uses the implicit index 
on the rowid.


HTH
Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread bash
On Wed, 9 May 2007 11:08:26 -0400
"Samuel R. Neff" <[EMAIL PROTECTED]> wrote:

>  
> I wonder if it would be beneficial to add an additional where clause which
> can prefilter the data so you only need to perform the full calculation on a
> subset of records. 
> 
> I haven't done the math, but let's supposed that point_x is 10 and that for
> any result of your long calculation to be true, then x must be between 5 and
> 15, then you can use the where clause
> 
> WHERE
>   X BETWEEN 5 AND 15
> AND   (point_x - x)^2 + (point_y -y)^2 < R^2;
> 

Yes... it should help a little :)

-- 
Biomechanical Artificial Sabotage Humanoid

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread bash
On Wed, 09 May 2007 14:45:33 +
[EMAIL PROTECTED] wrote:

> bash <[EMAIL PROTECTED]> wrote:
> > 
> > Oh... so this is implementation limitation.
> > Im currently thinking about this table:
> > 
> > CREATE TABLE map (
> > x int,
> > y int,
> > name char
> > );
> > CREATE INDEX map_x ON map(x);
> > CREATE INDEX map_y ON map(y);
> > 
> > And query for it will be something like this (circle):
> > SELECT name
> >   FROM map
> >   WHERE (point_x - x)^2 + (point_y -y)^2 < R^2;
> > 
> > How SQLite will works? Is there any benefit in indexes?
> > 
> 
> You need an R-Tree index to do something like this.  The
> public-domain version of SQLite only supports B-Tree indices.
> So, no, indices are not going to help you here.

Thanks for advice.

-- 
Biomechanical Artificial Sabotage Humanoid

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Ken

[EMAIL PROTECTED] wrote:Note that some client/server database engines (ex: 
PostgreSQL
and I think also firebird) will automatically rewrite the
original query into something logically similar to my
second example above.  But the query optimizer in SQLite 
does not attempt to be quite that clever.

--
D. Richard Hipp 


I  think that is a good thing that sqlite isn't that clever! I've seen too many 
cases of advanced optimizers such as Oracle totally mess up the execution plans 
for complex query operations. There are simply too many choices for the 
optimizer to pick the best plan. 
 
 Ken


RE: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Samuel R. Neff
 
I wonder if it would be beneficial to add an additional where clause which
can prefilter the data so you only need to perform the full calculation on a
subset of records. 

I haven't done the math, but let's supposed that point_x is 10 and that for
any result of your long calculation to be true, then x must be between 5 and
15, then you can use the where clause

WHERE
X BETWEEN 5 AND 15
AND (point_x - x)^2 + (point_y -y)^2 < R^2;


If you can make this type of pre-calculation and get a proper range for X
then it can reduce the number of records that need to be checked.  I'm
assuming SQLite will use an index on X for BETWEEN (I don't know for sure).
Also you should do testing to be sure this query really is faster in
practice--I'm only theorizing here.  :-)

HTH,

Sam


---
We're Hiring! Seeking a passionate developer to join our team building
products. Position is in the Washington D.C. metro area. If interested
contact [EMAIL PROTECTED]
 
-Original Message-
From: bash [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 09, 2007 10:33 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

On Wed, 9 May 2007 18:13:07 +0400
Tomash Brechko <[EMAIL PROTECTED]> wrote:

Im currently thinking about this table:

CREATE TABLE map (
x int,
y int,
name char
);
CREATE INDEX map_x ON map(x);
CREATE INDEX map_y ON map(y);

And query for it will be something like this (circle):
SELECT name
  FROM map
  WHERE (point_x - x)^2 + (point_y -y)^2 < R^2;

How SQLite will works? Is there any benefit in indexes?



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread drh
bash <[EMAIL PROTECTED]> wrote:
> 
> Oh... so this is implementation limitation.
> Im currently thinking about this table:
> 
> CREATE TABLE map (
>   x int,
>   y int,
>   name char
> );
> CREATE INDEX map_x ON map(x);
> CREATE INDEX map_y ON map(y);
> 
> And query for it will be something like this (circle):
> SELECT name
>   FROM map
>   WHERE (point_x - x)^2 + (point_y -y)^2 < R^2;
> 
> How SQLite will works? Is there any benefit in indexes?
> 

You need an R-Tree index to do something like this.  The
public-domain version of SQLite only supports B-Tree indices.
So, no, indices are not going to help you here.

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread bash
On Wed, 9 May 2007 18:13:07 +0400
Tomash Brechko <[EMAIL PROTECTED]> wrote:

> On Wed, May 09, 2007 at 17:45:52 +0400, bash wrote:
> > > One index per table rule.  At first glance it seems like SQLite could
> > > use at least one index for "x=5 OR y=7" case too, but there is no
> > > point in that, as the other part of the OR would require full table
> > > scan anyway.
> > 
> > Why full table scan? :/
> > SQLite can takes set (1) of rowid by ex(x) index for
> > "X=5". Then takes another set (2) of rowid by ex(y) for "Y=7".
> > Then SQLite need only to union this two set (1) and (2).
> > Final SQLite should returns rows where rowid in (set1 union set2).
> 
> You should read it the following way: "SQLite can't use two indexes
> per table, and using only one index is pointless, hence no index is
> used at all".
> 
> So your question is actually "why SQLite uses at most one index per
> table?".  My guess is that the benefits are out-weighted by the
> implementation complexity.

Oh... so this is implementation limitation.
Im currently thinking about this table:

CREATE TABLE map (
x int,
y int,
name char
);
CREATE INDEX map_x ON map(x);
CREATE INDEX map_y ON map(y);

And query for it will be something like this (circle):
SELECT name
  FROM map
  WHERE (point_x - x)^2 + (point_y -y)^2 < R^2;

How SQLite will works? Is there any benefit in indexes?

-- 
Biomechanical Artificial Sabotage Humanoid

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Tomash Brechko
On Wed, May 09, 2007 at 17:45:52 +0400, bash wrote:
> > One index per table rule.  At first glance it seems like SQLite could
> > use at least one index for "x=5 OR y=7" case too, but there is no
> > point in that, as the other part of the OR would require full table
> > scan anyway.
> 
> Why full table scan? :/
> SQLite can takes set (1) of rowid by ex(x) index for
> "X=5". Then takes another set (2) of rowid by ex(y) for "Y=7".
> Then SQLite need only to union this two set (1) and (2).
> Final SQLite should returns rows where rowid in (set1 union set2).

You should read it the following way: "SQLite can't use two indexes
per table, and using only one index is pointless, hence no index is
used at all".

So your question is actually "why SQLite uses at most one index per
table?".  My guess is that the benefits are out-weighted by the
implementation complexity.


-- 
   Tomash Brechko

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread drh
Tomash Brechko <[EMAIL PROTECTED]> wrote:
> 
>   sqlite> explain query plan
>  ...>   SELECT id, n1, n2
>  ...>   FROM tbl
>  ...>   WHERE  n1 = $I
>  ...> UNION
>  ...>   SELECT id, n1, n2
>  ...>   FROM tbl
>  ...>   WHERE  n2 = $I
>  ...> ORDER BY id DESC;
>   0|0|TABLE tbl WITH INDEX idx1
>   0|0|TABLE tbl WITH INDEX idx2
> 

Correct.  Notice, however, that the UNION is not strictly
equivalent to the original query.  The UNION query above
gives the same results as:

   SELECT DISTINCT id, n1, n2 FROM tbl WHERE n1=$I OR n2=$I;

Perhaps the added DISTINCT will make no difference in
the output.  If so, then the UNION is the best way to go.
But if DISTINCT will combine records that you do not want
to be combined, then you might consider rewriting the query
as follows:

   SELECT id, n1, n2 FROM tbl
WHERE rowid in (
SELECT rowid FROM tbl WHERE n1=$I
UNION
SELECT rowid FROM tbl WHERE n2=$I
)

Note that some client/server database engines (ex: PostgreSQL
and I think also firebird) will automatically rewrite the
original query into something logically similar to my
second example above.  But the query optimizer in SQLite 
does not attempt to be quite that clever.

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread bash
On Wed, 9 May 2007 17:29:29 +0400
Tomash Brechko <[EMAIL PROTECTED]> wrote:

> 
> On Wed, May 09, 2007 at 16:32:34 +0400, bash wrote:
> > SELECT * FROM ex1 WHERE x>'abc' AND y>'abc';
> > In this form only one indexes will be used, why not both?
> 
> One index per table rule.  At first glance it seems like SQLite could
> use at least one index for "x=5 OR y=7" case too, but there is no
> point in that, as the other part of the OR would require full table
> scan anyway.

Why full table scan? :/
SQLite can takes set (1) of rowid by ex(x) index for
"X=5". Then takes another set (2) of rowid by ex(y) for "Y=7".
Then SQLite need only to union this two set (1) and (2).
Final SQLite should returns rows where rowid in (set1 union set2).


> And for the solution with the UNION,
> 
>   sqlite> explain query plan
>  ...>   SELECT id, n1, n2
>  ...>   FROM tbl
>  ...>   WHERE  n1 = $I
>  ...> UNION
>  ...>   SELECT id, n1, n2
>  ...>   FROM tbl
>  ...>   WHERE  n2 = $I
>  ...> ORDER BY id DESC;
>   0|0|TABLE tbl WITH INDEX idx1
>   0|0|TABLE tbl WITH INDEX idx2

Yep... Im using now this construction.

-- 
Biomechanical Artificial Sabotage Humanoid

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Tomash Brechko
On Wed, May 09, 2007 at 16:34:57 +0400, bash wrote:
> On Wed, 9 May 2007 14:24:27 +0400
> Tomash Brechko <[EMAIL PROTECTED]> wrote:
> > From http://www.sqlite.org/optoverview.html section 6.0:
> > 
> >   Each table in the FROM clause of a query can use at most one index...
> > 
> > So the first query can't benefit from both idx1 and idx2.  You may use
> > 
> >   EXPLAIN QUERY PLAN SELECT ...
> > 
> > to see what indexes will be used.
> 
> If i understand right from previous post by Peter there will be not
> used any indexes because of "OR".

I think those slides are a bit outdated.  On the same
http://www.sqlite.org/optoverview.html page the section "3.0: The OR
optimization" says that the query from the slide 52

  SELECT * FROM ex4 WHERE x=5 OR x=7;

will be rewritten as

  SELECT * FROM ex4 WHERE x IN (5, 7);

and IN can use indexes.  But "x=5 OR y=7" (i.e. conditions on
_different_ columns) can't be rewritten that way---exactly your
situation.


On Wed, May 09, 2007 at 16:32:34 +0400, bash wrote:
> SELECT * FROM ex1 WHERE x>'abc' AND y>'abc';
> In this form only one indexes will be used, why not both?

One index per table rule.  At first glance it seems like SQLite could
use at least one index for "x=5 OR y=7" case too, but there is no
point in that, as the other part of the OR would require full table
scan anyway.

And for the solution with the UNION,

  sqlite> explain query plan
 ...>   SELECT id, n1, n2
 ...>   FROM tbl
 ...>   WHERE  n1 = $I
 ...> UNION
 ...>   SELECT id, n1, n2
 ...>   FROM tbl
 ...>   WHERE  n2 = $I
 ...> ORDER BY id DESC;
  0|0|TABLE tbl WITH INDEX idx1
  0|0|TABLE tbl WITH INDEX idx2

both indexes are used.


-- 
   Tomash Brechko

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread bash
On Wed, 9 May 2007 12:23:14 +0200
Peter van Dijk <[EMAIL PROTECTED]> wrote:

> 
> On 9-mei-2007, at 11:28, bash wrote:
> 
> > SELECT type, stamp_id, old_player_id, new_player_id
> > FROM town_log
> > WHERE old_player_id = $ID OR new_player_id = $ID
> > ORDER BY stamp_id DESC;
> >
> > This query works really slowly and i don't know why :/
> > For example, the same result by another QUERY work much faster!
> >
> > SELECT type, stamp_id, old_player_id, new_player_id
> > FROM town_log
> > WHERE old_player_id = $ID
> > UNION
> > SELECT type, stamp_id, old_player_id, new_player_id
> > FROM town_log
> > WHERE new_player_id = %d
> > ORDER BY stamp_id DESC;
> 
> Hello Humanoid,
> 
> UNION performing much better than an equivalent query with OR is a  
> known 'limitation' in many database systems, including MySQL and  
> SQLite. Page 52 of http://www.sqlite.org/php2004/slides-all.html  
> explicitly recommends to use UNION here.
> 
> Cheers, Peter.


Thanks a lot! I don't know that information :)
Moreover this papers is really interesting.
But i don't understand about indexes this:
http://www.sqlite.org/php2004/slides-all.html
Page 54 of 63

For example:
CREATE TABLE ex1(
   id INTEGER PRIMARY KEY,
   x ,
   y
);
CREATE INDEX idx1 ON ex1(x);
CREATE INDEX idx2 ON ex1(y);

SELECT * FROM ex1 WHERE x>'abc' AND y>'abc';
In this form only one indexes will be used, why not both?

For example by idx1 we can get set1 of ROWIDs which is satisfy "x >
'abc'" term, by idx2 we can get set2 of ROWIDs which is satisfy
"y>'abc'" term and then just union set1 with set2. So next we should
just extracts rows with ROWIDS in this union


-- 
Biomechanical Artificial Sabotage Humanoid

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] perfomance degradation for expr "foo = X or bar =X"

2007-05-09 Thread Peter van Dijk


On 9-mei-2007, at 11:28, bash wrote:


SELECT type, stamp_id, old_player_id, new_player_id
FROM town_log
WHERE old_player_id = $ID OR new_player_id = $ID
ORDER BY stamp_id DESC;

This query works really slowly and i don't know why :/
For example, the same result by another QUERY work much faster!

SELECT type, stamp_id, old_player_id, new_player_id
FROM town_log
WHERE old_player_id = $ID
UNION
SELECT type, stamp_id, old_player_id, new_player_id
FROM town_log
WHERE new_player_id = %d
ORDER BY stamp_id DESC;


Hello Humanoid,

UNION performing much better than an equivalent query with OR is a  
known 'limitation' in many database systems, including MySQL and  
SQLite. Page 52 of http://www.sqlite.org/php2004/slides-all.html  
explicitly recommends to use UNION here.


Cheers, Peter.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-