Hi Rory,

I recommend starting with the latest MobilityDB extension (the Docker image seems to be outdated) and importing and partitioning the BerlinMOD workshop data, as described in https://docs.mobilitydb.com/MobilityDB-BerlinMOD/master/mobilitydb-berlinmod.pdf.

The query below should give you a good starting point for your analysis: for a given period, it outputs the average time each car remains in each region.

SELECT T.CarId,
       R.RegionId,
       avg(duration(atGeometry(
             atPeriod(T.Trip, '[2007-05-28 00:00:00+00, 2007-05-29 00:00:00+00]'::period),
             R.Geom))) AS avg_duration
FROM Trips T, Regions R
WHERE T.Trip && stbox(R.Geom, '[2007-05-28 00:00:00+00, 2007-05-29 00:00:00+00]'::period)
  AND ST_Intersects(trajectory(atPeriod(T.Trip, '[2007-05-28 00:00:00+00, 2007-05-29 00:00:00+00]'::period)),
                    R.Geom)
GROUP BY T.CarId, R.RegionId;

Just let me know if this works out for you.

Florian


On 24.02.2022 at 08:53, Rory Meyer wrote:
I haven't looked at MobilityDB (although I will now). In addition to PostGIS, I'm using TimescaleDB to build "continuous aggregates" of the GPS data in order to get things like the latest position per hour, the average speed per day, etc. It's pretty handy for that, but it can't handle more complex aggregates involving window functions, DISTINCT, trajectory creation, etc.
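
For context, the kind of continuous aggregate I mean looks roughly like this (just a sketch; gps_hourly and the one-hour bucket are made up, the columns are from my gps hypertable):

'''
-- Sketch of a TimescaleDB continuous aggregate over the gps hypertable.
CREATE MATERIALIZED VIEW gps_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', event_time) AS bucket,
       gps_id,
       avg(speed)             AS avg_speed,  -- average speed per hour
       last(geom, event_time) AS last_pos    -- latest position per hour; whether last() is
                                             -- allowed here depends on the TimescaleDB version
FROM gps
GROUP BY bucket, gps_id;
'''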

I suppose the main part of my query is the portion that distributes the difference between the lead/lag and current event_times (the time delta) over the grid cells that the lead/lag line passes over:

    sum((st_length(st_intersection(traj.traj, grid.geom)) * traj.time_delta) / traj.traj_dist)
Does MobilityDB have a function that could help with this?
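
Something along these lines is what I'm hoping exists (purely a hypothetical sketch, assuming the GPS points were already assembled into one tgeompoint "trip" per gps_id in a table trips(gps_id, trip); I haven't checked what MobilityDB actually offers):

'''
-- Hypothetical: trips(gps_id, trip tgeompoint); my_hex_grid(gid, geom) as in my query below.
-- Idea: clip each trip to the hex cell and sum the time actually spent inside it.
SELECT grid.gid,
       sum(duration(atGeometry(t.trip, grid.geom))) AS time_in_cell
FROM my_hex_grid AS grid
JOIN trips AS t
  ON st_intersects(trajectory(t.trip), grid.geom)
GROUP BY grid.gid;
'''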

I tried to stay away from the trajectory data type since it is only (x,y,t), and what I really need is (x,y,t,m1,m2,m3,...) so that I can distribute the speed/bearing/dataX over the hexagons, weighted by the time associated with each vertex.
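
Concretely, per hex cell and per measure I'm after a time-weighted average, e.g. for speed (a sketch in terms of my current lead/lag approach; segments(traj, traj_dist, time_delta, speed) is a hypothetical table holding the per-point segments from the query in my earlier mail, plus the speed at the starting vertex):

'''
-- Hypothetical: segments(traj geometry, traj_dist float, time_delta float, speed float)
-- Weight each segment's speed by the time it spends inside the cell:
-- time_in_cell ~ time_delta * (length of segment inside cell / total segment length)
SELECT grid.gid,
       sum(seg.speed * seg.time_delta * st_length(st_intersection(seg.traj, grid.geom)) / seg.traj_dist)
         / nullif(sum(seg.time_delta * st_length(st_intersection(seg.traj, grid.geom)) / seg.traj_dist), 0)
         AS tw_avg_speed
FROM my_hex_grid AS grid
JOIN segments AS seg
  ON st_intersects(seg.traj, grid.geom)
GROUP BY grid.gid;
'''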

Regards,
Rory

------------------------------------------------------------------------
*From:* Florian Nadler <florian.nad...@cybertec.at>
*Sent:* 24 February 2022 07:45
*To:* PostGIS Users Discussion <postgis-users@lists.osgeo.org>; Rory Meyer <rory.me...@vliz.be>
*Subject:* Re: [postgis-users] Non-linear time cost; please suggest a better way to structure the query

Hi,


apart from Paul's advice, have you considered using MobilityDB for this kind of spatial question?


This would also involve creating trajectories out of GPS points, but it might simplify the query design and reduce processing time, since the extension is built for exactly this kind of query.


Check out https://docs.mobilitydb.com/MobilityDB-BerlinMOD/master/mobilitydb-berlinmod.pdf, chapter 3.3, which deals with a quite similar analysis.


Florian


On 23.02.2022 at 15:14, Rory Meyer wrote:
Afternoon all,

I've got a database full of GPS points (along with associated data like speed, bearing, time, GPS_ID, class, etc.) and I'm trying to do complex aggregations with the data. I'm trying to build up a "heatmap" of the data by first creating a grid of polygons using ST_HexagonGrid and then using a window function to overlay lines (created from a lead/lag window of each GPS point and the next one from the same GPS_ID) over the grid. I'd like to get the number of seconds that GPS-carrying vehicles spend in each hex cell, grouped by class, speed, date, direction, etc. The end goal would be to query a lon/lat and get back a bunch of aggregated data for different classes and speed/bearing distributions.

Here's a simplified look at the SQL (sorry, it's not really simple...):

'''
SELECT
    grid.gid,
    grid.geom,
    avg(traj.bearing) AS avg_bearing,
    avg(traj.time_delta) AS avg_time_delta,
    sum(st_length(st_intersection(traj.traj, grid.geom)) * traj.time_delta / traj.traj_dist) AS cum_time_in_grid
FROM my_hex_grid AS grid
LEFT JOIN (
    SELECT
        subquery.gps_id,
        subquery.event_date,
        subquery.bearing,
        subquery.time_delta,
        st_makeline(subquery.geom, subquery.geom2) AS traj,
        st_distance(subquery.geom, subquery.geom2) AS traj_dist
    FROM (
        SELECT
            gps.gps_id,
            gps.event_time::date AS event_date,
            date_part('epoch', lead(gps.event_time) OVER time_order - gps.event_time) AS time_delta,
            gps.geom,
            gps.bearing,
            lead(gps.geom) OVER time_order AS geom2
        FROM gps
        WHERE gps.event_time >= '<Start Time>'
          AND gps.event_time <= '<End Time>'
        WINDOW time_order AS (PARTITION BY gps.gps_id ORDER BY gps.event_time)
    ) AS subquery
) AS traj ON st_intersects(traj.traj, grid.geom)
GROUP BY grid.gid, grid.geom
'''

My issue is that the time the query takes to complete grows non-linearly. If <Start Time> to <End Time> spans a couple of hours, it takes a couple of seconds to run. If it's a day, it takes a couple of minutes. If it's a week, it takes a couple of hours.

I'd like to run this for over a year of data but that won't be feasible at this rate.

Is there some way to avoid this non-linear increase in time or would it be best to just write some python code to loop through smaller chunks of data and write the results somewhere?
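
By chunking I mean something along these lines (a rough sketch, whether driven from Python or directly in plpgsql; heatmap_for() is a hypothetical wrapper around the query above that takes a start and end time and returns the per-cell rows):

'''
-- Hypothetical helper: heatmap_for(tstart, tend) wraps the aggregation query above.
CREATE TABLE IF NOT EXISTS grid_heatmap (
    day              date,
    gid              bigint,
    avg_bearing      double precision,
    avg_time_delta   double precision,
    cum_time_in_grid double precision
);

DO $$
DECLARE
    d date;
BEGIN
    -- one-day chunks over an example year; results accumulate in grid_heatmap
    FOR d IN SELECT generate_series('2021-01-01'::date, '2021-12-31'::date, interval '1 day')::date LOOP
        INSERT INTO grid_heatmap (day, gid, avg_bearing, avg_time_delta, cum_time_in_grid)
        SELECT d, h.gid, h.avg_bearing, h.avg_time_delta, h.cum_time_in_grid
        FROM heatmap_for(d, d + 1) AS h;
    END LOOP;
END $$;
'''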

Regards,
Rory


--
CYBERTEC PostgreSQL International GmbH
Römerstraße 19, A-2752 Wöllersdorf
Web: https://www.cybertec-postgresql.com
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users
