[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
thanks for your reply.

> It is clear to you that if you run 10 processes in parallel you will need 10 times more memory, right?

this is clear to me. however, 2GB * 10 ~= 20GB, and the machine has 260GB of memory, so unless the algorithm creates many copies of the graph within each iteration, it should be well within bounds. ideally the parallelization would not copy the entire graph at all: since the edges are fixed, each worker only needs its own vertex properties. i haven't figured out how to do that yet, and was hoping someone might have tips on how to do it in graph-tool. there are ways to do this with other objects (data frames, lists), but i am not sure how to approach it with graph-tool graphs.

> Maybe you can reproduce the problem for a smaller graph, or show how much memory you are actually using for a single process. It would also be important to tell us what version of graph-tool you are running.

will do this soon.
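One way to get close to this, sketched below under the assumption of a Linux-style "fork" start method: a graph loaded before the pool is created is inherited by the workers through copy-on-write pages, so the graph structure is not duplicated up front, and each worker sends back only a plain numpy array of block labels rather than a PropertyMap. The function name `fit_sbm_shared` and the per-task seeding are illustrative, not from the thread.

```python
import multiprocessing as mp

import numpy as np
import graph_tool.all as gt

# load the graph once, before forking, so workers share it copy-on-write
g = gt.load_graph("large_graph", fmt="graphml")

def fit_sbm_shared(seed):
    gt.seed_rng(seed)                     # give each chain its own RNG state
    state = gt.minimize_blockmodel_dl(g)  # reads the shared graph
    # return plain values, not a PropertyMap, so pickling the result back
    # to the parent does not drag graph internals along with it
    return np.array(state.get_blocks().a)

if __name__ == "__main__":
    with mp.get_context("fork").Pool(10) as pool:
        blocks = pool.map(fit_sbm_shared, range(10))
```

Note that the MCMC itself still allocates its own state in each worker, so memory use will grow with the number of chains; only the read-only graph structure should stay shared.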
[graph-tool] Re: memory handling when using graph-tool's MCMCs and multiprocessing on large networks
thanks for your reply. here's an example running minimize_blockmodel_dl() 10 times on 10 cores. when i run this on a large network (2GB, 2M vertices, 20M edges) i get a MemoryError.

```python
import multiprocessing as mp

import numpy as np
import graph_tool.all as gt

g = gt.load_graph("large_graph", fmt="graphml")

N_iter = 10
N_core = 10

def fit_sbm():
    # fit the SBM and return the block membership property map
    state = gt.minimize_blockmodel_dl(g)
    b = state.get_blocks()
    return b

def _parallel_sbm(iter=N_iter):
    pool = mp.Pool(N_core)
    future_res = [pool.apply_async(fit_sbm) for m in range(iter)]
    res = [f.get() for f in future_res]
    return res

def parallel_fit_sbm(iter=N_iter):
    results = _parallel_sbm(iter)
    return results

results = parallel_fit_sbm()
```
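To answer the request above about how much memory a single process actually uses, a hypothetical instrumented variant of `fit_sbm` could report its own peak RSS via the standard library's `resource` module (ru_maxrss is in KiB on Linux, bytes on macOS):

```python
import resource

import numpy as np
import graph_tool.all as gt

g = gt.load_graph("large_graph", fmt="graphml")

def fit_sbm_measured():
    state = gt.minimize_blockmodel_dl(g)
    # peak resident set size of this process so far
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return np.array(state.get_blocks().a), peak

blocks, peak_kib = fit_sbm_measured()
print(f"peak RSS of this process: {peak_kib / 1024:.1f} MiB")
```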
[graph-tool] memory handling when using graph-tool's MCMCs and multiprocessing on large networks
hi, i was wondering if anyone had tips on conserving memory when running graph-tool MCMCs on large networks in parallel. i have a large network (2GB), and about 260GB of memory, and was surprised to get a MemoryError when i ran 10 MCMC chains in parallel using multiprocessing. my impression was that MCMC was relatively light on memory. cheers, -sam
[graph-tool] running minimize_blockmodel_dl() in parallel
hi, here is a simple example where i run minimize_blockmodel_dl() 10 times in parallel using multiprocessing and collect the entropy. when i run this, i get the same value of entropy every single time.

```python
import multiprocessing as mp
import time

import numpy as np
import graph_tool.all as gt

# load graph
g = gt.collection.data["celegansneural"]

N_iter = 10

def get_sbm_entropy():
    np.random.seed()
    state = gt.minimize_blockmodel_dl(g)
    return state.entropy()

def _parallel_mc(iter=N_iter):
    pool = mp.Pool(10)
    future_res = [pool.apply_async(get_sbm_entropy) for _ in range(iter)]
    res = [f.get() for f in future_res]
    return res

def parallel_monte_carlo(iter=N_iter):
    entropies = _parallel_mc(iter)
    return entropies

parallel_monte_carlo()
```

result: [8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546, 8331.810102822546]

ultimately i would like to use this to keep the entropy as well as the block membership vector for each iteration. any ideas?

cheers, -sam
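One likely explanation, offered as a sketch rather than a confirmed diagnosis: minimize_blockmodel_dl() draws from graph-tool's own internal RNG, not numpy's, and every forked worker inherits that RNG in the same state, so np.random.seed() has no effect on the fit. Reseeding graph-tool's generator per task with gt.seed_rng() should make the chains diverge; `get_sbm_entropy_seeded` below is a hypothetical variant that also returns the block membership vector, as asked.

```python
import multiprocessing as mp

import numpy as np
import graph_tool.all as gt

g = gt.collection.data["celegansneural"]

def get_sbm_entropy_seeded(seed):
    # reseed graph-tool's internal RNG, which is what the SBM fit uses;
    # np.random.seed() only affects numpy's generator
    gt.seed_rng(seed)
    state = gt.minimize_blockmodel_dl(g)
    return state.entropy(), np.array(state.get_blocks().a)

if __name__ == "__main__":
    with mp.Pool(10) as pool:
        results = pool.map(get_sbm_entropy_seeded, range(1, 11))
    entropies = [e for e, _ in results]
    memberships = [b for _, b in results]
```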