having said all that, i don't know if the optimizer will pick
the population index - since it will think it has to scan all rows
of that index to figure out what rows meet the lat/long conditions.

As you point out knowing that only the top 50 rows are necessary is
key to costing the plan.

Mike Matrigali wrote:

> I'll start by saying that I am not an optimizer expert.  I don't think
> the optimizer does anything special with limits - but I could be wrong,
> maybe someone else with more knowledge in that area can let us know.
> 
> Having said that, the optimizer definitely will try to eliminate sorts
> necessary by order by if there are existing indexes which it can use
> to get the necessary order - in the case below it needs a descending
> index on the population column in cities.  Do you have such an index?
> 
> I think without such an index there is no choice in derby but to find
> all 50,000
> rows so that it can determine the 50 most populous, which meet your
> criteria. The Derby optimizer does not consider creating intermediate
> sort nodes in the query plan.  Also the sorter is not optimized for
> limited result sets, it
> tends to randomly create merge buckets and then sort those merge buckets
> and finally merge all the merge buckets - that is about the opposite
> of what you want for a limited result.  You rather have buckets for
> ranges and then once you had 50 more than a given value you could throw
> away all those bigger/less than the value.
> 
> The limit function as you say I believe mostly just cuts off the rows,
> it doesn't affect the processing much.  When possible derby trys to
> stream rows back to the user rather than processing them all before
> returning the first row - but an order by without a supporting index is
> a case where all the results of the query will be determined and then
> as the final step they are thrown into the sorter before returning
> results to the caller.  Derby query execution does not add intermediate
> sort nodes.
> 
> Stan wrote:
> 
> 
>>Does Derby optimize queries that use "ORDER BY" and "setMaxRows()" when
>>indexes are present?
>>
>>My situation: I have an existing Derby table listing 2 million cities,
>>with the name, latitude, longitude, and population of each city. I want to
>>efficiently find the 50 most populous cities between (for example)
>>latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9. The query is simple:
>>
>>Statement s =
>>DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
>>            
>>s.setMaxRows(50);
>>            
>>rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and 
>>lon>19.8 and lon<27.9 ORDER BY population desc");
>>
>>There are indexes on lat, lon, and population, but the query seems to take
>>a long time. In particular, setting "s.setMaxRows(50)" doesn't seem to
>>speed things up at all. It looks like Derby finds ALL the cities in the
>>specified latitude/longitude range (over 50,000 of them), and that
>>setMaxRows() just limits how many rows it shows me, not how many rows it
>>computes.
>>
>>I know that MySQL optimizes queries with LIMITs -- does Derby do the same?
>>
>>At the risk of offending, is there a better way to do what I want (by
>>creating my own data structures for example) without necessarily using
>>Derby?
>>
>>(In case anyone's interested, I'm trying to update a map, so finding the
>>biggest cities in a given area quickly is important)
>>
>>
> 
> 

Reply via email to