I have the following query:

SELECT
        read_date,
        to_char(read_date,'YYYY-MM-DD') as text_date,
        CASE WHEN lat_2>lat_1 THEN
                ns_dist*-1
        ELSE
                ns_dist
        END as ns_dist,
        CASE WHEN lon_2>lon_1 THEN
                ew_dist*-1
        ELSE
                ew_dist
        END as ew_dist,
        CASE WHEN alt_2>alt_1 THEN
                (ud_dist*-1)::float
        ELSE
                ud_dist::float
        END as ud_dist,
        lat_e/10::float as lat_e,
        lon_e/10::float as lon_e,
        alt_e/10::float as alt_e,
        CASE WHEN rapid THEN
            1
        ELSE
            0
        END as rapid
FROM
        (SELECT
                g1.read_date as read_date,
                g1.rapid as rapid,
                g1.lat as lat_1,
                g1.lon as lon_1,
                g1.alt as alt_1,
                g2.lat as lat_2,
                g2.lon as lon_2,
                g2.alt as alt_2,
                ST_Distance(ST_SetSRID( ST_Point( g1.lon, g1.lat), 
4326)::geography,
                                        ST_SetSRID( ST_Point( g1.lon, g2.lat), 
4326)::geography)/1000 as ns_dist,
                ST_Distance(ST_SetSRID( ST_Point( g1.lon, g1.lat), 
4326)::geography,
                                        ST_SetSRID( ST_Point( g2.lon, g1.lat), 
4326)::geography)/1000 as ew_dist,
                g1.alt-g2.alt ud_dist,
                sqrt(g1.lat_e^2+g2.lat_e^2) as lat_e,
                sqrt(g1.lon_e^2+g2.lon_e^2) as lon_e,
                sqrt(g1.alt_e^2+g2.alt_e^2) as alt_e
        FROM gps_data g1
        INNER JOIN gps_data g2 ON g2.read_date=g1.read_date
        WHERE g1.station=%s AND g2.station=%s
        ) g3
ORDER BY read_date;

Which, when executed on my system for a particular par of g1.station and 
g2.station takes about 190ms to run (explain output here: 
https://explain.depesz.com/s/8Nf9 <https://explain.depesz.com/s/8Nf9> ). This 
is not horrible, but when running this query for a dozen stations, it becomes 
noticeable. As such, I was wondering if I could do better. Looking at that 
explain output, I noticed that the largest time sink was the Merge Join at 
114ms. Some research into that gave me the command "set enable_mergejoin=off” 
which I tried just for fun to see what would happen. 

Somewhat to my surprise, this actually shaved a good 80ms off the total run 
time (see https://explain.depesz.com/s/kpncZ 
<https://explain.depesz.com/s/kpncZ>). Looking at that, I see it does a 
Parallel Hash Join, which looks like it should be slower, but I guess since it 
can do it in parallel, it winds up faster? Not sure on that, but the overall 
time is definitely significantly faster. Which leaves me with a couple of 
questions:

1) Since not using a merge join is, in this case apparently better, is there 
some way I can tweak the query/settings such that it knows this without having 
to force the merge join off?

2) If not, are there any side effects/gotchas I should be aware of if I simply 
call that set command before each query (or, presumably before the series of 
queries)?

3) Is there anything else I can do to optimize the query? I have tried using a 
pre-calculated point column rather than calculating ST_Point( g1.lon, g1.lat), 
as well as providing “false” as the last argument to ST_Distance, but while 
these did speed things up some, it was only about 10ms or so, so not terribly 
significant.

If it helps, the goal behind the somewhat odd arguments to ST_Distance is to 
find the component North/South and East/West distances, rather than the actual 
total “point a to point b” distance. Perhaps there is a better way of obtaining 
that goal?

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

Reply via email to