Anyone able to scan if I'm using the library correctly? Or any other tips ? The MWE is provided through a download link.
Much appreciated! On Wed, Jun 30, 2021 at 10:17 PM Mathias Versichele < [email protected]> wrote: > Fair point. I managed to find some time to make a MWE. Here's the link to > a zip file containing both the input data (network and costs as csv dumps > from our postgres db) and the code: > https://www.dropbox.com/s/3iby8nil7zyi7q3/MWE.zip?dl=0 > > To use: just run "python mwe_main.py". The last line calculates a shortest > path tree until 60mins for cost structure "49" (these are costs associated > with car travel, the specific id has no meaning). Because this is only > Belgium, mem usage when I run this is manageable (under 1Gb). > > In general my question is: can this code be made more performant, both in > terms of mem and speed ? Am I right in using GraphViews for each cost > structure, where the view only contains the edges which have costs > associated with them ? Does this make gt.shortest_distance() faster ? > > The most production-critical part of the code here is point_drivetimes(). > Any gain there in calculation time would be very valuable. The loading of > the network and cost structure only need to happen once, so can be a bit > slower. > > Thx for helping me ! > > On Tue, Jun 22, 2021 at 9:43 AM Tiago de Paula Peixoto <[email protected]> > wrote: > >> Dear Mathias, >> >> It is not reasonable to expect us to make this kind of evaluation just >> from partial code. As with anything, we need a minimal working example >> to be able to say something concrete. >> >> I would recommend you to try to separate the pandas dataframe >> manipulation from the graph-tool side in order to determine which is >> consuming more memory. >> >> Best, >> Tiago >> >> Am 22.06.21 um 09:24 schrieb Mathias Versichele: >> > Hi all. Anyone can provide me with some insights here ? I know it's >> > quite an open question here, and it might take some effort of course. >> > Would anyone be available/willing to do an actual code audit of the >> code >> > that I have ? This would be compensated of course. Feel free to contact >> > me to discuss. >> > >> > Kind regards >> > >> > On Tue, Jun 15, 2021 at 8:45 PM Mathias Versichele >> > <[email protected] <mailto:[email protected]>> >> wrote: >> > >> > Hi, I've been using graph-tool for the last year or so for >> > calculating shortest-path trees on large-scale road networks. We >> > used to do this in a postgres database with the pgrouting extension, >> > but were continually confronted with unacceptable startup costs. The >> > switch to a python module using graph-tool has considerably sped up >> > our routing queries, but we are suffering from this service using >> > too much memory. I have the feeling I might be using graph-tool in a >> > wrong way, but before I dive into that, it would be good to know >> > what is the expected memory footprint for my use case. >> > >> > Take for example a road network with 30Mio edges and 31 Mio nodes >> > (the combined road network of Belgium, Netherland, France and >> > Germany in OSM). For this road network, I need to calculate shortest >> > paths using different edge weights (edge property map). What would >> > be a very rough estimate how much memory this would use ? For the >> > network only + per edge-property-map. In our setup, there would be >> > one edge-property-map with edge weights per country. We're currently >> > seeing usage of over 50Gb easily, spiking even higher when we're >> > loading extra cost structures or networks. Is that expected ? Or am >> > I experiencing memory leaks somewhere ? >> > >> > How I'm using graph-tool right now: >> > >> > *1) loading network* >> > /nw = dataframe with edges info in the structure: startnode-id, >> > endnode-id, edge-id, country/ >> > >> > G = gt.Graph(directed=True) >> > G.ep["edge_id"] = G.new_edge_property("int") >> > G.ep["country_id"] = G.new_edge_property("int16_t") >> > eprops = [G.ep["edge_id"], G.ep["country_id"]] >> > >> > n = G.add_edge_list(nw.to_numpy(), hashed=True, eprops=eprops) >> > G.vertex_properties["n"] = n >> > >> > *2) loading edge costs: I'm using GraphViews* >> > * >> > * >> > /countries = list of country-codes/ >> > edge_filter = np.in1d(G.ep["country_id"].a, [get_country_id(c) for c >> > in countries])* >> > * >> > GV = gt.GraphView(G, efilt=edge_filter) >> > >> > edges = GV.get_edges([GV.edge_index]) >> > sources = G.vertex_properties["n"].a[edges[:,0]] >> > targets = G.vertex_properties["n"].a[edges[:,1]] >> > idxs = edges[:,2] >> > >> > /db_costs = pandas dataframe with structure: source, target, cost >> > / >> > >> > sti = np.vstack((idxs,sources,targets)).T >> > sti_df = pd.DataFrame({'idx': sti[:, 0], 'source': sti[:, 1], >> > 'target': sti[:, 2]}) >> > m = pd.merge(sti_df, db_costs, on=['source', 'target'], how='left', >> > sort=False)[['idx', 'c']] >> > wgts_list = m.sort_values(by=['idx']).T.iloc[1].to_numpy() >> > wgts_list = np.where(wgts_list==np.nan, np.iinfo(np.int32).max, >> > wgts_list) >> > >> > wgts = GV.new_edge_property("int32_t") >> > wgts.fa = wgts_list >> > wgts.fa = np.where(wgts.fa==-2147483648, np.iinfo(np.int32).max, >> > wgts.fa) >> > GV.edge_properties[cs_ids_str] = wgts >> > >> > GV2 = gt.GraphView(GV, efilt=wgts.fa != np.inf) >> > >> > *3) I then use GV2 for calculating Dijkstra and such...* >> > >> > >> > I could of course work on an MWE of some sorts. But would be very >> > nice to get an estimate on mem footprint, and to see if I'm doing >> > sth really silly in the code above. >> > >> > Thx! >> > >> > >> > >> > >> > >> > >> > _______________________________________________ >> > graph-tool mailing list -- [email protected] >> > To unsubscribe send an email to [email protected] >> > >> >> >> -- >> Tiago de Paula Peixoto <[email protected]> >> _______________________________________________ >> graph-tool mailing list -- [email protected] >> To unsubscribe send an email to [email protected] >> >
_______________________________________________ graph-tool mailing list -- [email protected] To unsubscribe send an email to [email protected]
