having said all that, i don't know if the optimizer will pick the population index - since it will think it has to scan all rows of that index to figure out what rows meet the lat/long conditions.
As you point out knowing that only the top 50 rows are necessary is key to costing the plan. Mike Matrigali wrote: > I'll start by saying that I am not an optimizer expert. I don't think > the optimizer does anything special with limits - but I could be wrong, > maybe someone else with more knowledge in that area can let us know. > > Having said that, the optimizer definitely will try to eliminate sorts > necessary by order by if there are existing indexes which it can use > to get the necessary order - in the case below it needs a descending > index on the population column in cities. Do you have such an index? > > I think without such an index there is no choice in derby but to find > all 50,000 > rows so that it can determine the 50 most populous, which meet your > criteria. The Derby optimizer does not consider creating intermediate > sort nodes in the query plan. Also the sorter is not optimized for > limited result sets, it > tends to randomly create merge buckets and then sort those merge buckets > and finally merge all the merge buckets - that is about the opposite > of what you want for a limited result. You rather have buckets for > ranges and then once you had 50 more than a given value you could throw > away all those bigger/less than the value. > > The limit function as you say I believe mostly just cuts off the rows, > it doesn't affect the processing much. When possible derby trys to > stream rows back to the user rather than processing them all before > returning the first row - but an order by without a supporting index is > a case where all the results of the query will be determined and then > as the final step they are thrown into the sorter before returning > results to the caller. Derby query execution does not add intermediate > sort nodes. > > Stan wrote: > > >>Does Derby optimize queries that use "ORDER BY" and "setMaxRows()" when >>indexes are present? >> >>My situation: I have an existing Derby table listing 2 million cities, >>with the name, latitude, longitude, and population of each city. I want to >>efficiently find the 50 most populous cities between (for example) >>latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9. The query is simple: >> >>Statement s = >>DriverManager.getConnection("jdbc:derby:test;create=false").createStatement(); >> >>s.setMaxRows(50); >> >>rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and >>lon>19.8 and lon<27.9 ORDER BY population desc"); >> >>There are indexes on lat, lon, and population, but the query seems to take >>a long time. In particular, setting "s.setMaxRows(50)" doesn't seem to >>speed things up at all. It looks like Derby finds ALL the cities in the >>specified latitude/longitude range (over 50,000 of them), and that >>setMaxRows() just limits how many rows it shows me, not how many rows it >>computes. >> >>I know that MySQL optimizes queries with LIMITs -- does Derby do the same? >> >>At the risk of offending, is there a better way to do what I want (by >>creating my own data structures for example) without necessarily using >>Derby? >> >>(In case anyone's interested, I'm trying to update a map, so finding the >>biggest cities in a given area quickly is important) >> >> > >