Hello , we are facing performance regression in sqlite version 3.8 and higher versions on queries over r-tree tables and are unable to solve them , so we are asking professionals to consider severity of described problem: description is little Littler but , contains everything relevat. we have 1 table containing set of nodes with their X,Y coordinates Base_node_set (ID,X,Y) Then we have r-tree table with bounding boxes BBox_set_r_tree (ID,MAX_X,MIN_X,MAX_Y,MIN_Y) then we want to select for all nodes all its bounding boxes , the node lies in we do it by executing this statement
select * from base_node_set as base join BBox_set_r_tree as r_tree on r_tree.MIN_X <= base.X and r_tree.MAX_X >= base.X and r_tree.MIN_Y <= base.Y and r_tree.MAX_Y >= base.Y order by base.ID in prior versions of sqlite query planner query plan was: ORDER DETAIL 1. SCAN TABLE base_node_set AS base USING INTEGER PRIMARY KEY (~1000000 rows) 2. SCAN TABLE BBox_set_r_tree AS r_tree VIRTUAL TABLE INDEX 2:BaDbBc (~0 rows) On sqlite 3.8.0 and higher query plan of the same statement is little different order DETAIL 1. SCAN TABLE osm_road_nodes_r_tree AS r_tree VIRTUAL TABLE INDEX 2: 2. SCAN TABLE sp_house_numbers AS base 0. 0 USE TEMP B-TREE FOR ORDER BY does first plan say?: Scan nodes and for every node find its bboxes where the node lies (no step for order by is required because of use primary integer key in 1. scan) Does second one say? scan bboxes and for every bbox find nodes in it, then order result by nodes ID Do I interpret such query plan correctly? If so, I think that this is a bug in query planning with r_trees. I dont know how to force new version of sqlite query planner to use old version plan. Have tried to use CROSS JOIN instead of JOIN and the resulting plan was the same. we now use the newest 3.8.0.2 and compile time options are: -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_MEMORY_MANAGEMENT=1 -DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_STAT3=1 analyze has been run with no change to query plan, this both tables contain milions of entries, but is more often that the rtree table is much more larger than node set table, so it‘s our intention to reduce searches in r_tree table to minimum because of its size. We also ran such queries on less populated tables, but, the diference in query plans makes huge impact on performance on them too. I have noticed 3.8.1 core function unlikely() ,but in such simple query cant find out how the unlikely() function use, select from empty table and then cross join in right order from that tables did not changed query plan as well. What are other options to control query plans over rtree queries plans? I am asking because , we were forced to upgrade sqlite version in our application from 3.7.6 because of occurences of disk I/O error codes in execution of such statements on larger datasets, we run our application on windows OS, after upgrade to 3.8.0 disk I/O errors dissapeared, but now with latest 3.8.0.2 we have completely different logic of query plans over rtree tables and are unable to rewrite them to be executed like in former versions of sqlite. This is a case when NGQP find worst plan and execution time jumps from minutes to years. Can anyone help solve this problem? thank you for any reply Best regards _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users