> On Jan 5, 2017, at 10:38 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>
> Yes, you did. You want a query that spits out a tupleset of geometries (one
> for each wee segment), and then you can join that set to your main table
> using st_dwithin() as the join clause.
> So start by ditching the main table and just work on a query that generates a
> pile of wee segments.
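Paul's suggestion - generate the short segments first, then join - boils down to pairing each vertex of a densified line with its successor. A minimal sketch of that pairing logic (illustration only, in Python; the actual work is done in SQL with a self-join on the dumped point index):

```python
# Sketch (not from the thread): turn an ordered list of vertices into
# consecutive two-point segments. This is the same pairing the SQL
# achieves by joining pts a to pts b on a.vert = b.vert - 1.

def make_segments(points):
    """Pair each vertex with its successor, yielding short segments."""
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]

# A toy five-vertex "route" yields four short segments.
route = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
segments = make_segments(route)
```

Each output segment is tiny, so each gets a tiny bounding box, which is what makes the spatial index effective.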
Ahhh, I see you've done this sort of thing before
(http://blog.cleverelephant.ca/2015/02/breaking-linestring-into-segments.html) :-)

So following that advice I came up with the following query:

WITH dump AS (
    SELECT ST_DumpPoints(
        ST_Segmentize(
            ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
            600
        )::geometry
    ) AS pt
),
pts AS (
    SELECT (pt).geom, (pt).path[1] AS vert FROM dump
)
SELECT elevation
FROM data
INNER JOIN (SELECT ST_MakeLine(ARRAY[a.geom, b.geom]) AS short_line
            FROM pts a
            INNER JOIN pts b ON a.vert = b.vert - 1 AND b.vert > 1) segments
    ON ST_DWithin(location, segments.short_line, 600)
ORDER BY elevation DESC
LIMIT 1;

Which yields the following EXPLAIN ANALYZE (https://explain.depesz.com/s/RsTD):

                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11611706.90..11611706.91 rows=1 width=4) (actual time=1171.814..1171.814 rows=1 loops=1)
   CTE dump
     ->  Result  (cost=0.00..5.25 rows=1000 width=32) (actual time=0.024..1.989 rows=1939 loops=1)
   CTE pts
     ->  CTE Scan on dump  (cost=0.00..20.00 rows=1000 width=36) (actual time=0.032..4.071 rows=1939 loops=1)
   ->  Sort  (cost=11611681.65..11611768.65 rows=34800 width=4) (actual time=1171.813..1171.813 rows=1 loops=1)
         Sort Key: data.elevation DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.55..11611507.65 rows=34800 width=4) (actual time=0.590..1167.615 rows=28408 loops=1)
               ->  Nested Loop  (cost=0.00..8357.50 rows=1665 width=64) (actual time=0.046..663.475 rows=1938 loops=1)
                     Join Filter: (a.vert = (b.vert - 1))
                     Rows Removed by Join Filter: 3755844
                     ->  CTE Scan on pts b  (cost=0.00..22.50 rows=333 width=36) (actual time=0.042..0.433 rows=1938 loops=1)
                           Filter: (vert > 1)
                           Rows Removed by Filter: 1
                     ->  CTE Scan on pts a  (cost=0.00..20.00 rows=1000 width=36) (actual time=0.000..0.149 rows=1939 loops=1938)
               ->  Index Scan using location_gix on data  (cost=0.55..6968.85 rows=1 width=36) (actual time=0.085..0.256 rows=15 loops=1938)
                     Index Cond: (location && _st_expand((st_makeline(ARRAY[a.geom, b.geom]))::geography, '600'::double precision))
                     Filter: (((st_makeline(ARRAY[a.geom, b.geom]))::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, (st_makeline(ARRAY[a.geom, b.geom]))::geography, '600'::double precision, true))
                     Rows Removed by Filter: 7
 Planning time: 4.318 ms
 Execution time: 1171.994 ms
(22 rows)

So not bad. It went from 20+ seconds to a little over 1 second. Still noticeable for an end user, but definitely usable - and as mentioned, that's a worst-case scenario query. Thanks!

Of course, if you have any suggestions for further improvement, I'm all ears :-)

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

> On Thu, Jan 5, 2017 at 11:36 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> On Jan 5, 2017, at 8:50 AM, Paul Ramsey <pram...@cleverelephant.ca> wrote:
>>
>> The index filters using bounding boxes. A long, diagonal route will have a
>> large bounding box, relative to the area you actually care about (within a
>> narrow strip of the route). Use ST_Segmentize() to add points to your route,
>> ST_DumpPoints() to dump those out as points, and ST_MakeLine to generate new
>> lines from those points, each line very short. The maximum index
>> effectiveness will come when your line length is close to your buffer width.
>>
>> P
>
> Ok, I think I understand the concept.
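Paul's point about the index filtering on bounding boxes can be made concrete with a planar back-of-the-envelope sketch (hypothetical numbers, Python for illustration): one long diagonal line produces one huge box, while the same line split into short pieces produces many tiny boxes whose total area is a small fraction of the original.

```python
# Sketch (planar approximation, hypothetical numbers): why chopping a
# long diagonal line shrinks the area the bounding-box index must scan.

def bbox_area(p1, p2):
    """Area of the axis-aligned bounding box around a two-point segment."""
    return abs(p2[0] - p1[0]) * abs(p2[1] - p1[1])

# One diagonal line from (0, 0) to (100, 100): a single 100 x 100 box.
whole = bbox_area((0, 0), (100, 100))

# The same line as 100 unit-length steps: 100 boxes of 1 x 1 each.
points = [(i, i) for i in range(101)]
split = sum(bbox_area(points[i], points[i + 1]) for i in range(100))

# The segmented version covers 1% of the area, so the && bbox test
# discards far more candidate rows before the exact distance check.
```

This matches the plan above: the per-segment index scans remove only 7 rows each with the exact filter, versus millions removed in the single-line version.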
> So attempting to follow your advice, I modified the query to be:
>
> SELECT elevation
> FROM data
> WHERE
>     ST_DWithin(
>         location,
>         (SELECT ST_MakeLine(geom)::geography AS split_line
>          FROM (SELECT
>              (ST_DumpPoints(
>                  ST_Segmentize(
>                      ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>                      600
>                  )::geometry
>              )).geom
>          ) s1),
>         600
>     )
> ORDER BY elevation DESC LIMIT 1;
>
> It took some fiddling to find a syntax that PostgreSQL would accept, but
> eventually that's what I came up with. Unfortunately, far from improving
> performance, it killed it - in running the query, it went from 22 seconds to
> several minutes (EXPLAIN ANALYZE has yet to return a result). Looking at the
> query execution plan shows, at least partially, why:
>
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Limit  (cost=17119748.98..17119748.98 rows=1 width=4)
>    InitPlan 1 (returns $0)
>      ->  Aggregate  (cost=17.76..17.77 rows=1 width=32)
>            ->  Result  (cost=0.00..5.25 rows=1000 width=32)
>    ->  Sort  (cost=17119731.21..17171983.43 rows=20900890 width=4)
>          Sort Key: data.elevation DESC
>          ->  Seq Scan on data  (cost=0.00..17015226.76 rows=20900890 width=4)
>                Filter: st_dwithin(location, $0, '600'::double precision)
> (8 rows)
>
> So apparently it is now doing a sequential scan on data rather than using the
> index. And, of course, sorting 20 million rows is not trivial either. Did I
> do something wrong with forming the query?
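Part of why this attempt couldn't help even in principle: the `ST_MakeLine(geom)` aggregate in the subquery reassembles all the dumped points back into one long line, whose bounding box is identical to the original route's. A small planar sketch of that identity (hypothetical coordinates, Python for illustration):

```python
# Sketch: rebuilding ALL the densified points into a single line (what
# the ST_MakeLine(geom) aggregate in the subquery does) restores the
# original bounding box, so the extra vertices buy nothing.

def bbox(points):
    """Axis-aligned bounding box (minx, miny, maxx, maxy) of a point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

original = [(0, 0), (100, 100)]
densified = [(i, i) for i in range(101)]  # ST_Segmentize only adds vertices

# One line through all the densified points has the same box as before.
same_box = bbox(densified) == bbox(original)
```

Splitting only pays off when each short piece is kept as its own row, so each gets its own small box to test against the index.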
>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
>
>> On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of latitude
>> (numeric), longitude (numeric), elevation (integer) data, along with a PostGIS
>> (2.3.0) geometry column (location), running on a CentOS 6.8 box with 64GB
>> RAM and a RAID10 SSD data drive. I'm trying to get the maximum elevation
>> along a path, for which purpose I've come up with the following query (for
>> one particular path example):
>>
>> SELECT elevation
>> FROM data
>> WHERE ST_DWithin(
>>     location,
>>     ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>>     600)
>> ORDER BY elevation DESC
>> LIMIT 1;
>>
>> The EXPLAIN ANALYZE output of this particular query
>> (https://explain.depesz.com/s/heZ) shows:
>>
>>                                  QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 rows=1 loops=1)
>>    ->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.837..22653.837 rows=1 loops=1)
>>          Sort Key: elevation DESC
>>          Sort Method: top-N heapsort  Memory: 25kB
>>          ->  Index Scan using location_gix on data  (cost=0.42..4.82 rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
>>                Index Cond: (location && '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography)
>>                Filter: (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography, '600'::double precision, true))
>>                Rows Removed by Filter: 4934534
>> Planning time: 0.741 ms
>> Execution time: 22653.906 ms
>> (10 rows)
>>
>> So it is using the index properly, but it still takes a good 22 seconds to run,
>> most of which appears to be in the Index Scan.
>>
>> Is there any way to improve this, or is this going to be about as good as it
>> gets with the number of rows being dealt with? I was planning to use this
>> for a real-time display - punch in a couple of points, get some information
>> about the route between them, including maximum elevation - but with it taking
>> 22 seconds for the longer routes at least, that doesn't make for the best user
>> experience.
>>
>> It's perhaps worth noting that the example above is most likely a worst-case
>> scenario. I would expect the vast majority of routes to be significantly
>> shorter, and I want to say the shorter routes query much faster [testing
>> needed]. That said, the faster the better, even for short routes :-)
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
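For scale: a rough spherical-earth check (a sketch only; PostGIS geography uses spheroid distances, so exact counts differ slightly) of how long this example route is, and roughly how many pieces of at most 600 m `ST_Segmentize(..., 600)` would cut it into - consistent with the ~1939 points seen in the working plan earlier in the thread.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, r=6371000.0):
    """Great-circle distance in metres on a spherical Earth (approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# The example route from the query: roughly 1160 km.
dist = haversine_m(-150.008056, 61.179167, -156.77, 71.285833)

# Number of <= 600 m pieces the route would be split into: around 1900-2000.
n_segments = math.ceil(dist / 600)
```

At ~1160 km with only a 600 m search radius, the single line's bounding box forces the index scan to visit (and then discard) nearly 5 million rows, which is where the 22 seconds go.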