[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
On 22.10.21 at 21:45, Sam G wrote:
> thanks for your reply
>
> > It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?
>
> this is clear to me. however, 2GB * 10 ~= 20GB, and the machine has 260GB of memory. so unless the algorithm is creating 10 copies of the graph within each iteration, it should be within bounds.

The inference algorithm has internal data structures that take more memory than the original graph. For instance, it needs to keep track of the number of edges between groups, the degree distribution inside each group, several partitions during the bisection search, etc. This is all done with a trade-off in favor of overall speed, rather than memory usage.

> ideally the parallelization would not make a copy of the entire graph. since the edges are fixed, all it needs to do is make vertex properties. i haven't figured that out yet. i was hoping someone might have tips on how to do this in graph-tool. there are ways to do this with other objects (data frames, lists), but i am not sure how to approach it with graph-tool graphs.

Graph-tool graphs are no different from any other data structure in Python in this regard. Parallelization with the multiprocessing library is implemented by spawning multiple processes (via fork()), each with its own copy of the data. Most systems have a copy-on-write implementation, where initially the cloned processes point to the same memory, but as soon as it is modified, the memory usage is duplicated. Sometimes in Python even a read can trigger a copy because of reference counting, but this should not happen with Graph objects.

I think the memory use you are experiencing is simply due to the 10 different BlockState objects being created, together with the memory required by the minimize_blockmodel_dl() algorithm. I don't think there is much you can do, except use a model that requires less internal book-keeping, e.g. PPBlockState.
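[Editor's note: the fork/copy-on-write behaviour described above can be illustrated with a small stand-alone sketch in plain Python, without graph-tool. Here `EDGES` is a hypothetical stand-in for a large graph: workers created by the pool inherit it from the parent without pickling it, and only the small per-vertex result lists travel back to the parent.]

```python
import multiprocessing as mp

# A large read-only structure, created before the pool is forked.
# On fork-based platforms (e.g. Linux), child processes share these
# pages copy-on-write, so the data is not duplicated as long as the
# workers only read it.
EDGES = [(i, i + 1) for i in range(100_000)]  # stand-in for a large graph

def degree_count():
    # Workers read the inherited global; only this result list
    # (one entry per vertex) is pickled back to the parent.
    deg = [0] * (len(EDGES) + 1)
    for u, v in EDGES:
        deg[u] += 1
        deg[v] += 1
    return deg

if __name__ == "__main__":
    with mp.Pool(2) as pool:
        futures = [pool.apply_async(degree_count) for _ in range(2)]
        results = [f.get() for f in futures]
    print(len(results), results[0][0], results[0][1])  # → 2 1 2
```

Note this only avoids duplication for data that is never written to; anything the worker mutates (such as the internal BlockState book-keeping above) is copied per process regardless.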
Best,
Tiago

--
Tiago de Paula Peixoto

___
graph-tool mailing list -- graph-tool@skewed.de
To unsubscribe send an email to graph-tool-le...@skewed.de
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
thanks for your reply

> It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?

this is clear to me. however, 2GB * 10 ~= 20GB, and the machine has 260GB of memory. so unless the algorithm is creating 10 copies of the graph within each iteration, it should be within bounds.

ideally the parallelization would not make a copy of the entire graph. since the edges are fixed, all it needs to do is make vertex properties. i haven't figured that out yet. i was hoping someone might have tips on how to do this in graph-tool. there are ways to do this with other objects (data frames, lists), but i am not sure how to approach it with graph-tool graphs.

> Maybe you can reproduce the problem for a smaller graph, or show how much memory you are actually using for a single process. It would also be important to tell us what version of graph-tool you are running.

will do this soon
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
On 22.10.21 at 06:08, Sam G wrote:
> thanks for your reply. here's an example running minimize_blockmodel_dl() 10 times on 10 cores. when i run this on a large network (2GB, 2M vertices, 20M edges) i get a MemoryError.

It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?

BTW, your example is not self-contained, because I cannot run it without the 'large_graph' file. Maybe you can reproduce the problem for a smaller graph, or show how much memory you are actually using for a single process. It would also be important to tell us what version of graph-tool you are running.

Best,
Tiago

--
Tiago de Paula Peixoto
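[Editor's note: the single-process memory figure requested above can be obtained with the standard-library `resource` module. A minimal sketch, Unix-only; the helper name `peak_rss_mb` is my own, not part of any library.]

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of the current process, in MiB.

    getrusage() reports ru_maxrss in kilobytes on Linux but in
    bytes on macOS, so normalise per platform.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

before = peak_rss_mb()
buf = b"x" * (80 * 1024 * 1024)  # touch ~80 MiB so the peak actually grows
after = peak_rss_mb()
print(f"peak RSS before: {before:.0f} MiB, after: {after:.0f} MiB")
```

Calling `peak_rss_mb()` before and after a single `minimize_blockmodel_dl()` run would show how much of the budget one chain consumes, and hence how many chains fit in 260GB.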
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
thanks for your reply. here's an example running minimize_blockmodel_dl() 10 times on 10 cores. when i run this on a large network (2GB, 2M vertices, 20M edges) i get a MemoryError.

```
import multiprocessing as mp

import graph_tool.all as gt

# load the graph once in the parent; forked workers inherit it
g = gt.load_graph("large_graph", fmt="graphml")

N_iter = 10
N_core = 10

def fit_sbm():
    state = gt.minimize_blockmodel_dl(g)
    b = state.get_blocks()
    return b

def _parallel_sbm(iter=N_iter):
    pool = mp.Pool(N_core)
    future_res = [pool.apply_async(fit_sbm) for m in range(iter)]
    res = [f.get() for f in future_res]
    return res

def parallel_fit_sbm(iter=N_iter):
    return _parallel_sbm(iter)

if __name__ == "__main__":
    results = parallel_fit_sbm()
```
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
On 21.10.21 at 22:38, Sam G wrote:
> hi, i was wondering if anyone had any tips on conserving memory when running graph-tool MCMCs on large networks in parallel. i have a large network (2GB), and about 260GB of memory, and was surprised to receive a MemoryError when i ran 10 MCMC chains in parallel using multiprocessing. my impression was that MCMC is relatively light on memory.

As usual, it is not possible to say anything concrete without a minimal complete working example that shows the problem.

--
Tiago de Paula Peixoto