On Jan 5, 2017, at 8:50 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote: > > The index filters using bounding boxes. A long, diagonal route will have a > large bounding box, relative to the area you actually care about (within a > narrow strip of the route). Use ST_Segmentize() to add points to your route, > ST_DumpPoints() to dump those out as point and ST_MakeLine to generate new > lines from those points, each line very short. The maximum index > effectiveness will come when your line length is close to your buffer width. > > P
Ok, I think I understand the concept. So attempting to follow your advice, I modified the query to be: SELECT elevation FROM data WHERE ST_DWithin( location, (SELECT ST_MakeLine(geom)::geography as split_line FROM (SELECT (ST_DumpPoints( ST_Segmentize( ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'), 600 )::geometry )).geom ) s1), 600 ) ORDER BY elevation DESC limit 1; It took some fiddling to find a syntax that Postgresql would accept, but eventually that's what I came up with. Unfortunately, far from improving performance, it killed it - in running the query, it went from 22 seconds to several minutes (EXPLAIn ANALYZE has yet to return a result). Looking at the query execution plan shows, at least partially, why: QUERY PLAN ------------------------------------------------------------------------------ Limit (cost=17119748.98..17119748.98 rows=1 width=4) InitPlan 1 (returns $0) -> Aggregate (cost=17.76..17.77 rows=1 width=32) -> Result (cost=0.00..5.25 rows=1000 width=32) -> Sort (cost=17119731.21..17171983.43 rows=20900890 width=4) Sort Key: data.elevation DESC -> Seq Scan on data (cost=0.00..17015226.76 rows=20900890 width=4) Filter: st_dwithin(location, $0, '600'::double precision) (8 rows) So apparently it is now doing a sequential scan on data rather than using the index. And, of course, sorting 20 million rows is not trivial either. Did I do something wrong with forming the query? ----------------------------------------------- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 ----------------------------------------------- > > On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <isr...@ravnalaska.net > <mailto:isr...@ravnalaska.net>> wrote: > I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of latitude > (numeric), longitude(numeric), elevation(integer) data, along with a PostGIS > (2.3.0) geometry column (location), running on a CentOS 6.8 box with 64GB RAM > and a RAID10 SSD data drive. I'm trying to get the maximum elevation along a > path, for which purpose I've come up with the following query (for one > particular path example): > > SELECT elevation FROM data > > > > > WHERE ST_DWithin(location, > ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 > 71.285833)'), 600) > > > ORDER BY elevation > LIMIT 1; > > The EXPLAIN ANALYZE output of this particular query > (https://explain.depesz.com/s/heZ <https://explain.depesz.com/s/heZ>) shows: > > > > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 > rows=1 loops=1) > -> Sort (cost=4.83..4.83 rows=1 width=4) (actual > time=22653.837..22653.837 rows=1 loops=1) > Sort Key: elevation DESC > Sort Method: top-N heapsort Memory: 25kB > -> Index Scan using location_gix on data (cost=0.42..4.82 rows=1 > width=4) (actual time=15.786..22652.041 rows=11081 loops=1) > Index Cond: (location && > '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography) > Filter: > (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography > && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, > '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography, > '600'::double precision, true)) > Rows Removed by Filter: 4934534 > Planning time: 0.741 ms > Execution time: 22653.906 ms > (10 rows) > > So it is using the index properly, but still takes a good 22 seconds to run, > most of which appears to be in the Index Scan. > > Is there any way to improve this, or is this going to be about as good as it > gets with the number of rows being dealt with? I was planning to use this for > a real-time display - punch in a couple of points, get some information about > the route between, including maximum elevation - but with it taking 22 > seconds for the longer routes at least, that doesn't make for the best user > experience. > > It's perhaps worth noting that the example above is most likely a worst case > scenario. I would expect the vast majority of routes to be significantly > shorter, and I want to say the shorter routes query much faster [testing > needed]. That said, the faster the better, even for short routes :-) > ----------------------------------------------- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 <tel:(907)%20450-7293> > ----------------------------------------------- > > > > > >