[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
On 22.10.21 at 21:45, Sam G wrote:
> thanks for your reply
>
> > It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?
>
> this is clear to me. however, 2GB * 10 ~= 20GB, and the machine has 260GB of memory. so unless the algorithm is creating 10 copies of the graph within each iteration, it should be within bounds.

The inference algorithm has internal data structures that take more memory than the original graph. For instance, it needs to keep track of the number of edges between groups, the degree distribution inside each group, several partitions during the bisection search, etc. This is all done with a trade-off in favor of overall speed, rather than memory usage.

> ideally the parallelization would not make a copy of the entire graph. since the edges are fixed, all it needs to do is make vertex properties. i haven't figured that out yet. i was hoping someone might have tips on how to do this in graph-tool. there are ways to do this with other objects (data frames, lists), but i am not sure how to approach it with graph-tool graphs.

Graph-tool graphs are no different from any other data structure in Python in this regard. Parallelization with the multiprocessing library is implemented by spawning multiple processes (via fork()), each with its own copy of the data. Most systems have a copy-on-write implementation, where initially the cloned processes point to the same memory, but as soon as it is modified, the memory usage is duplicated. Sometimes in Python even a read can trigger a copy because of reference counting, but this should not happen with Graph objects.

I think the memory use you are experiencing is simply due to the 10 different BlockState objects being created, together with the memory required by the minimize_blockmodel_dl() algorithm. I don't think there is much you can do, except use a model that requires less internal book-keeping, e.g. PPBlockState.
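[Editor's note: the fork/copy-on-write behaviour described above can be illustrated with a small stand-alone sketch in plain Python, without graph-tool. Here `EDGES` is a hypothetical stand-in for a large graph: workers created by the pool inherit it from the parent without pickling it, and only the small per-vertex result lists travel back to the parent.]

```python
import multiprocessing as mp

# A large read-only structure, created before the pool is forked.
# On fork-based platforms (e.g. Linux), child processes share these
# pages copy-on-write, so the data is not duplicated as long as the
# workers only read it.
EDGES = [(i, i + 1) for i in range(100_000)]  # stand-in for a large graph

def degree_count():
    # Workers read the inherited global; only this result list
    # (one entry per vertex) is pickled back to the parent.
    deg = [0] * (len(EDGES) + 1)
    for u, v in EDGES:
        deg[u] += 1
        deg[v] += 1
    return deg

if __name__ == "__main__":
    with mp.Pool(2) as pool:
        futures = [pool.apply_async(degree_count) for _ in range(2)]
        results = [f.get() for f in futures]
    print(len(results), results[0][0], results[0][1])  # → 2 1 2
```

Note this only avoids duplication for data that is never written to; anything the worker mutates (such as the internal BlockState book-keeping above) is copied per process regardless.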
Best,
Tiago

--
Tiago de Paula Peixoto

___
graph-tool mailing list -- graph-tool@skewed.de
To unsubscribe send an email to graph-tool-le...@skewed.de
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
thanks for your reply

> It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?

this is clear to me. however, 2GB * 10 ~= 20GB, and the machine has 260GB of memory. so unless the algorithm is creating 10 copies of the graph within each iteration, it should be within bounds.

ideally the parallelization would not make a copy of the entire graph. since the edges are fixed, all it needs to do is make vertex properties. i haven't figured that out yet. i was hoping someone might have tips on how to do this in graph-tool. there are ways to do this with other objects (data frames, lists), but i am not sure how to approach it with graph-tool graphs.

> Maybe you can reproduce the problem for a smaller graph, or show how much memory you are actually using for a single process. It would also be important to tell us what version of graph-tool you are running.

will do this soon
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
On 22.10.21 at 06:08, Sam G wrote:
> thanks for your reply. here's an example running minimize_blockmodel_dl() 10 times on 10 cores. when i run this on a large network (2GB, 2M vertices, 20M edges) i get a MemoryError.

It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?

BTW, your example is not self-contained, because I cannot run it without the 'large_graph' file. Maybe you can reproduce the problem for a smaller graph, or show how much memory you are actually using for a single process. It would also be important to tell us what version of graph-tool you are running.

Best,
Tiago

--
Tiago de Paula Peixoto
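[Editor's note: the single-process memory figure requested above can be obtained with the standard-library `resource` module. A minimal sketch, Unix-only; the helper name `peak_rss_mb` is my own, not part of any library.]

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of the current process, in MiB.

    getrusage() reports ru_maxrss in kilobytes on Linux but in
    bytes on macOS, so normalise per platform.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

before = peak_rss_mb()
buf = b"x" * (80 * 1024 * 1024)  # touch ~80 MiB so the peak actually grows
after = peak_rss_mb()
print(f"peak RSS before: {before:.0f} MiB, after: {after:.0f} MiB")
```

Calling `peak_rss_mb()` before and after a single `minimize_blockmodel_dl()` run would show how much of the budget one chain consumes, and hence how many chains fit in 260GB.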
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
thanks for your reply. here's an example running minimize_blockmodel_dl() 10 times on 10 cores. when i run this on a large network (2GB, 2M vertices, 20M edges) i get a MemoryError.

```
import multiprocessing as mp

import graph_tool.all as gt

# load the graph once in the parent; forked workers inherit it
g = gt.load_graph("large_graph", fmt="graphml")

N_iter = 10
N_core = 10

def fit_sbm():
    state = gt.minimize_blockmodel_dl(g)
    b = state.get_blocks()
    return b

def _parallel_sbm(iter=N_iter):
    pool = mp.Pool(N_core)
    future_res = [pool.apply_async(fit_sbm) for m in range(iter)]
    res = [f.get() for f in future_res]
    return res

def parallel_fit_sbm(iter=N_iter):
    return _parallel_sbm(iter)

if __name__ == "__main__":
    results = parallel_fit_sbm()
```
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
On 21.10.21 at 22:38, Sam G wrote:
> hi, i was wondering if anyone had any tips on conserving memory when running graph-tool MCMCs on large networks in parallel. i have a large network (2GB), and about 260GB of memory, and was surprised to receive a MemoryError when i ran 10 MCMC chains in parallel using multiprocessing. my impression was that MCMC is relatively light on memory.

As usual, it is not possible to say anything concrete without a minimal complete working example that shows the problem.

--
Tiago de Paula Peixoto