[sqlite] sqlite 3.8.X query planner bug with r_tree joins(r_tree utilization not working on joins & cross join has no affect on plan with r_tree joins)?

2013-11-11 Thread Martin Riša
Hello,
We are facing performance regression on queries over r_tree tables with any new 
3.8.X sqlite version and we are sure it is connected with new versions of 
sqlite especially with new query planner and r_tree module.
We have distincted by now two bugs:

First one:
We are convienced that new version of sqlite query planner does not utilize 
queries with joins over r_tree tables in the way the r_tree module is intended 
to.
I will try to prove it on this example:

we have table T of nodes with their 2-D coordinates(X,Y) and R_tree virtual 
table R_TREE of 2-D bounding rectangles (MIN_X/Y,MIN_X/Y)
we want to select for every node from T its boung rectangles it lies in from 
R_TREE
we do it by running this statement:
select *  from T
join R_TREE on
  T.X  >= R_TREE.MIN_X  and
  T.X <= R_TREE.MAX_X and
 T.Y  >= R_TREE.MIN_Y  and
  T.Y <= R_TREE.MAX_Y

Explain query plan of such query returns different results in 3.7.X versions 
and 3.8.X  and we think that this difference is responsible for huge 
performance drops on such queries(actually our performance drops are very 
costly from minutes to days of execution times on large tables)

3.7.X explain query plan result:
ORDER DETAIL
1.SCAN TABLE T USING INTEGER PRIMARY KEY (~100 
rows)
2.SCAN TABLE R_TREE VIRTUAL TABLE INDEX 2:BaDbBc 
(~0 rows)

word interpretation : For all nodes find all rectangles where node lies in 
rectangle.

3.8.X explain query plan results:
ORDER DETAIL
1.SCAN TABLE R_TREE VIRTUAL TABLE INDEX 2:
2.SCAN TABLE T

word interpretation: For all rectangles find nodes where node lies in 
rectangle..

Actually our and everybody‘s intention using r_tree in similiar way, is to have 
3.7.X plan.

According to documentation of r_tree module
“R*Tree index is used to narrow a search down to a list of candidate objects 
and then more detailed and expensive computations are done on each candidate to 
find if the candidate truly meets the search criteria.“(source: 
http://www.sqlite.org/rtree.html )

This example is the case where sqlite 3.8.X query plan scans r_tree before any 
criteria could have been chosen.  What is in conflict with documentation 
citation and makes R_tree module unusable efficiently. Can this be fixed to 
make r-tree module usable efficiently like in prior sqlite versions?

Second one:
Cross join has no effect on query plan on join over r_tree.

In previous example when used with cross join instead of join , has no effect 
on plan in contrast to cross join on common(not virtual) table. There must be 
some bug whether in documentation(not mentioning that cross join has no effect 
over r_tree tables) or in implementation of cross join functionality in query 
planning. Can this be fixed too?

Thank you for any reply.
Best regards
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] version 3.8 query planner doesn't find same or better plan for simple query over rtree tables

2013-10-01 Thread Martin Riša
Hello , we are facing performance regression in sqlite version 3.8 and higher 
versions on queries over r-tree tables and are unable to solve them , so we are 
asking professionals to consider severity of described problem: description is 
little Littler but , contains everything relevat.
we have 1 table containing set of nodes with their X,Y coordinates
Base_node_set (ID,X,Y)
Then we have r-tree table with bounding boxes
BBox_set_r_tree (ID,MAX_X,MIN_X,MAX_Y,MIN_Y)
then we want to select for all nodes all its bounding boxes , the node lies in
we do it  by executing this statement

select   *
from base_node_set as base
join BBox_set_r_tree as r_tree on
r_tree.MIN_X <= base.X and
r_tree.MAX_X >= base.X and
r_tree.MIN_Y <= base.Y and
r_tree.MAX_Y >= base.Y
order by base.ID

in prior versions of sqlite query planner query plan was:
ORDER DETAIL
1.SCAN TABLE base_node_set AS base USING INTEGER 
PRIMARY KEY (~100 rows)
2.SCAN TABLE BBox_set_r_tree AS r_tree VIRTUAL 
TABLE INDEX 2:BaDbBc (~0 rows)

On sqlite 3.8.0 and higher query plan of the same statement is little different
order  DETAIL
1. SCAN TABLE osm_road_nodes_r_tree AS r_tree 
VIRTUAL TABLE INDEX 2:
2. SCAN TABLE sp_house_numbers AS base
0. 0  USE TEMP B-TREE FOR ORDER BY

does first plan say?:
Scan nodes and for every node find its bboxes where the node lies (no step for 
order by is required because of use primary integer key in 1. scan)

Does second one say?
scan bboxes and for every bbox find nodes in it, then order result by nodes ID

Do I interpret such query plan correctly?

If so, I think that this is a bug in query planning with r_trees.
I dont know how to force new version of sqlite query planner to use old version 
plan.
Have tried to use CROSS JOIN instead of JOIN and the resulting plan was the 
same.
we now use the newest 3.8.0.2 and compile time options are:
-DSQLITE_THREADSAFE=1
-DSQLITE_ENABLE_MEMORY_MANAGEMENT=1
-DSQLITE_ENABLE_RTREE=1
-DSQLITE_ENABLE_STAT3=1
analyze has been run with no change to query plan,
this both tables contain milions of entries, but is more often that the rtree 
table is much more larger than node set table,
so it‘s our intention to reduce searches in r_tree table to minimum because of 
its size. We also ran such queries on less populated tables, but, the diference 
in query plans makes huge impact on performance on them too.
I have noticed 3.8.1 core function unlikely() ,but in such simple query cant 
find out how the unlikely() function use,
select from empty table and then cross join in right order from that tables did 
not changed query plan as well. What are other options to control query plans 
over rtree queries plans?
I am asking because , we were forced to upgrade  sqlite version in our 
application  from 3.7.6 because of occurences of disk I/O error codes in 
execution of such statements on larger datasets, we run our application on 
windows OS, after upgrade to 3.8.0 disk I/O errors dissapeared, but now with 
latest 3.8.0.2 we have completely different logic of query plans over rtree 
tables and are unable to rewrite them to be executed like in former versions of 
sqlite.  This is a case when NGQP find worst plan and execution time jumps from 
minutes to years.

Can anyone help solve this problem?
thank you for any reply
Best regards

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users