[graph-tool] Re: Expected memory usage for large-scale road networks

Mathias Versichele Mon, 05 Jul 2021 05:22:52 -0700

Anyone able to scan if I'm using the library correctly? Or any other tips ?
The MWE is provided through a download link.


Much appreciated!

On Wed, Jun 30, 2021 at 10:17 PM Mathias Versichele <
[email protected]> wrote:

> Fair point. I managed to find some time to make a MWE. Here's the link to
> a zip file containing both the input data (network and costs as csv dumps
> from our postgres db) and the code:
> https://www.dropbox.com/s/3iby8nil7zyi7q3/MWE.zip?dl=0
>
> To use: just run "python mwe_main.py". The last line calculates a shortest
> path tree until 60mins for cost structure "49" (these are costs associated
> with car travel, the specific id has no meaning). Because this is only
> Belgium, mem usage when I run this is manageable (under 1Gb).
>
> In general my question is: can this code be made more performant, both in
> terms of mem and speed ? Am I right in using GraphViews for each cost
> structure, where the view only contains the edges which have costs
> associated with them ? Does this make gt.shortest_distance() faster ?
>
> The most production-critical part of the code here is point_drivetimes().
> Any gain there in calculation time would be very valuable. The loading of
> the network and cost structure only need to happen once, so can be a bit
> slower.
>
> Thx for helping me !
>
> On Tue, Jun 22, 2021 at 9:43 AM Tiago de Paula Peixoto <[email protected]>
> wrote:
>
>> Dear Mathias,
>>
>> It is not reasonable to expect us to make this kind of evaluation just
>> from partial code. As with anything, we need a minimal working example
>> to be able to say something concrete.
>>
>> I would recommend you to try to separate the pandas dataframe
>> manipulation from the graph-tool side in order to determine which is
>> consuming more memory.
>>
>> Best,
>> Tiago
>>
>> Am 22.06.21 um 09:24 schrieb Mathias Versichele:
>> > Hi all. Anyone can provide me with some insights here ? I know it's
>> > quite an open question here, and it might take some effort of course.
>> > Would anyone be available/willing to do an actual code audit of the
>> code
>> > that I have ? This would be compensated of course. Feel free to contact
>> > me to discuss.
>> >
>> > Kind regards
>> >
>> > On Tue, Jun 15, 2021 at 8:45 PM Mathias Versichele
>> > <[email protected] <mailto:[email protected]>>
>> wrote:
>> >
>> >     Hi, I've been using graph-tool for the last year or so for
>> >     calculating shortest-path trees on large-scale road networks. We
>> >     used to do this in a postgres database with the pgrouting extension,
>> >     but were continually confronted with unacceptable startup costs. The
>> >     switch to a python module using graph-tool has considerably sped up
>> >     our routing queries, but we are suffering from this service using
>> >     too much memory. I have the feeling I might be using graph-tool in a
>> >     wrong way, but before I dive into that, it would be good to know
>> >     what is the expected memory footprint for my use case.
>> >
>> >     Take for example a road network with 30Mio edges and 31 Mio nodes
>> >     (the combined road network of Belgium, Netherland, France and
>> >     Germany in OSM). For this road network, I need to calculate shortest
>> >     paths using different edge weights (edge property map). What would
>> >     be a  very rough estimate how much memory this would use ? For the
>> >     network only + per edge-property-map. In our setup, there would be
>> >     one edge-property-map with edge weights per country. We're currently
>> >     seeing usage of over 50Gb easily, spiking even higher when we're
>> >     loading extra cost structures or networks. Is that expected ? Or am
>> >     I experiencing memory leaks somewhere ?
>> >
>> >     How I'm using graph-tool right now:
>> >
>> >     *1) loading network*
>> >     /nw = dataframe with edges info in the structure: startnode-id,
>> >     endnode-id, edge-id, country/
>> >
>> >     G = gt.Graph(directed=True)
>> >     G.ep["edge_id"] = G.new_edge_property("int")
>> >     G.ep["country_id"] = G.new_edge_property("int16_t")
>> >     eprops = [G.ep["edge_id"], G.ep["country_id"]]
>> >
>> >     n = G.add_edge_list(nw.to_numpy(), hashed=True, eprops=eprops)
>> >     G.vertex_properties["n"] = n
>> >
>> >     *2) loading edge costs: I'm using GraphViews*
>> >     *
>> >     *
>> >     /countries = list of country-codes/
>> >     edge_filter = np.in1d(G.ep["country_id"].a, [get_country_id(c) for c
>> >     in countries])*
>> >     *
>> >     GV = gt.GraphView(G, efilt=edge_filter)
>> >
>> >     edges = GV.get_edges([GV.edge_index])
>> >     sources = G.vertex_properties["n"].a[edges[:,0]]
>> >     targets = G.vertex_properties["n"].a[edges[:,1]]
>> >     idxs = edges[:,2]
>> >
>> >     /db_costs = pandas dataframe with structure: source, target, cost
>> >     /
>> >
>> >     sti = np.vstack((idxs,sources,targets)).T
>> >     sti_df = pd.DataFrame({'idx': sti[:, 0], 'source': sti[:, 1],
>> >     'target': sti[:, 2]})
>> >     m = pd.merge(sti_df, db_costs, on=['source', 'target'], how='left',
>> >     sort=False)[['idx', 'c']]
>> >     wgts_list = m.sort_values(by=['idx']).T.iloc[1].to_numpy()
>> >     wgts_list = np.where(wgts_list==np.nan, np.iinfo(np.int32).max,
>> >     wgts_list)
>> >
>> >     wgts = GV.new_edge_property("int32_t")
>> >     wgts.fa = wgts_list
>> >     wgts.fa = np.where(wgts.fa==-2147483648, np.iinfo(np.int32).max,
>> >     wgts.fa)
>> >     GV.edge_properties[cs_ids_str] = wgts
>> >
>> >     GV2 = gt.GraphView(GV, efilt=wgts.fa != np.inf)
>> >
>> >     *3) I then use GV2 for calculating Dijkstra and such...*
>> >
>> >
>> >     I could of course work on an MWE of some sorts. But would be very
>> >     nice to get an estimate on mem footprint, and to see if I'm doing
>> >     sth really silly in the code above.
>> >
>> >     Thx!
>> >
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > graph-tool mailing list -- [email protected]
>> > To unsubscribe send an email to [email protected]
>> >
>>
>>
>> --
>> Tiago de Paula Peixoto <[email protected]>
>> _______________________________________________
>> graph-tool mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>>
>

_______________________________________________
graph-tool mailing list -- [email protected]
To unsubscribe send an email to [email protected]

[graph-tool] Re: Expected memory usage for large-scale road networks

Reply via email to