Hello , we are facing performance regression in sqlite version 3.8 and higher
versions on queries over r-tree tables and are unable to solve them , so we are
asking professionals to consider severity of described problem: description is
little Littler but , contains everything relevat.
we have 1 table containing set of nodes with their X,Y coordinates
Base_node_set (ID,X,Y)
Then we have r-tree table with bounding boxes
BBox_set_r_tree (ID,MAX_X,MIN_X,MAX_Y,MIN_Y)
then we want to select for all nodes all its bounding boxes , the node lies in
we do it by executing this statement
select *
from base_node_set as base
join BBox_set_r_tree as r_tree on
r_tree.MIN_X <= base.X and
r_tree.MAX_X >= base.X and
r_tree.MIN_Y <= base.Y and
r_tree.MAX_Y >= base.Y
order by base.ID
in prior versions of sqlite query planner query plan was:
ORDER DETAIL
1.SCAN TABLE base_node_set AS base USING INTEGER
PRIMARY KEY (~100 rows)
2.SCAN TABLE BBox_set_r_tree AS r_tree VIRTUAL
TABLE INDEX 2:BaDbBc (~0 rows)
On sqlite 3.8.0 and higher query plan of the same statement is little different
order DETAIL
1. SCAN TABLE osm_road_nodes_r_tree AS r_tree
VIRTUAL TABLE INDEX 2:
2. SCAN TABLE sp_house_numbers AS base
0. 0 USE TEMP B-TREE FOR ORDER BY
does first plan say?:
Scan nodes and for every node find its bboxes where the node lies (no step for
order by is required because of use primary integer key in 1. scan)
Does second one say?
scan bboxes and for every bbox find nodes in it, then order result by nodes ID
Do I interpret such query plan correctly?
If so, I think that this is a bug in query planning with r_trees.
I dont know how to force new version of sqlite query planner to use old version
plan.
Have tried to use CROSS JOIN instead of JOIN and the resulting plan was the
same.
we now use the newest 3.8.0.2 and compile time options are:
-DSQLITE_THREADSAFE=1
-DSQLITE_ENABLE_MEMORY_MANAGEMENT=1
-DSQLITE_ENABLE_RTREE=1
-DSQLITE_ENABLE_STAT3=1
analyze has been run with no change to query plan,
this both tables contain milions of entries, but is more often that the rtree
table is much more larger than node set table,
so it‘s our intention to reduce searches in r_tree table to minimum because of
its size. We also ran such queries on less populated tables, but, the diference
in query plans makes huge impact on performance on them too.
I have noticed 3.8.1 core function unlikely() ,but in such simple query cant
find out how the unlikely() function use,
select from empty table and then cross join in right order from that tables did
not changed query plan as well. What are other options to control query plans
over rtree queries plans?
I am asking because , we were forced to upgrade sqlite version in our
application from 3.7.6 because of occurences of disk I/O error codes in
execution of such statements on larger datasets, we run our application on
windows OS, after upgrade to 3.8.0 disk I/O errors dissapeared, but now with
latest 3.8.0.2 we have completely different logic of query plans over rtree
tables and are unable to rewrite them to be executed like in former versions of
sqlite. This is a case when NGQP find worst plan and execution time jumps from
minutes to years.
Can anyone help solve this problem?
thank you for any reply
Best regards
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users