It does now seem to work once I transform the node ids as we discussed. igraph still takes 4GB of RAM for only 62.5 million edges and 31.25 million vertices but at least that fits.
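As a rough sanity check against the 32 bytes per edge and 16 bytes per vertex that Tamas gives further down the thread, the graph structure alone should come to about 2.5 GB, so the rest of the 4 GB is presumably the NCOL name mapping plus general Python overhead:

# Back-of-the-envelope check using the per-edge/per-vertex figures quoted
# below; the edge and vertex counts are the ones from this thread.
edges = 62500000
vertices = 31250000
graph_bytes = edges * 32 + vertices * 16
print(graph_bytes / 1e9)   # -> 2.5, i.e. ~2.5 GB for the graph structure alone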
Is there a function to write a sparse adjacency matrix of a graph to a
file? I see "write_adjacency" but the docs don't indicate it gives a
sparse matrix. I just need to write the adjacency matrix in any format
that scipy can read. Clearly it has to be sparse, otherwise the file
would be vast.

Raphael

On 1 August 2016 at 20:58, Raphael Clifford <[email protected]> wrote:
> It does now seem to work once I transform the node ids as we
> discussed. igraph still takes 4GB of RAM for only 62.5 million edges
> and 31.25 million vertices but at least that fits.
>
> Is there a function to write a sparse adjacency matrix of a graph to a
> file? I see "write_adjacency" but the docs don't indicate it gives a
> sparse matrix.
>
> Raphael
>
> On 1 August 2016 at 14:57, Tamas Nepusz <[email protected]> wrote:
>> Yes, it's probably the best if you do the relabeling externally. Let
>> us know if it still doesn't work after using Read_Edgelist() with a
>> relabeled file.
>> T.
>>
>> On Mon, Aug 1, 2016 at 2:37 PM, Raphael C <[email protected]> wrote:
>>> Thank you for the quick reply. My system is certainly 64 bit. The
>>> problem is just the amount of RAM
>>>
>>> g = Graph.Read_Ncol('edges.txt')
>>>
>>> uses, it seems.
>>>
>>> Here is some code to produce a fake edge list that reproduces my problem.
>>>
>>> import random
>>>
>>> # Number of edges, vertices
>>> m = 62500000
>>> n = m/2
>>>
>>> for i in xrange(m):
>>>     fromnode = str(random.randint(0, n-1)).zfill(9)
>>>     tonode = str(random.randint(0, n-1)).zfill(9)
>>>     print fromnode, tonode
>>>
>>> If I produce a file edges.txt using this code and then run
>>>
>>> from igraph import Graph
>>> g = Graph.Read_Ncol('edges.txt')
>>>
>>> it runs out of RAM.
>>>
>>> To get a better picture of the RAM usage I ran the same test with
>>> m = 20000000 (that is about one third of the edges and vertices).
>>>
>>> /usr/bin/time -v python ./test.py
>>>
>>> shows
>>>
>>> Maximum resident set size (kbytes): 3172988
>>>
>>> With m = 30000000 I see Maximum resident set size (kbytes): 4750440
>>>
>>> Maybe one solution is to relabel the nodes myself externally so I
>>> can avoid the overhead of Ncol?
>>>
>>> Raphael
>>>
>>> On 1 August 2016 at 10:23, Tamas Nepusz <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> Read_Edgelist() won't work because that assumes that the vertex IDs
>>>> are in the range [0; |V|-1], so it would create lots of isolated
>>>> vertices if your vertex ID range has "gaps" in it. Read_Ncol() is the
>>>> way to go, but it has an additional space penalty as it has to
>>>> maintain a mapping from the numeric IDs in the file to the range
>>>> [0; |V|-1].
>>>>
>>>> igraph requires 32 bytes per edge and 16 bytes per vertex to store the
>>>> graph itself, plus additional data structures to store the vertex/edge
>>>> attributes. Therefore, a graph of your size would require ~2.5 GB of
>>>> memory plus the attributes. 8 GB of RAM should therefore be enough --
>>>> however, note that Python might not be able to utilize all that
>>>> memory. In particular, 32-bit Python on Windows is limited to 2 or 3
>>>> GB of memory (see
>>>> https://msdn.microsoft.com/en-us/library/aa366778(v=vs.85).aspx#memory_limits
>>>> ). If you happen to use a 32-bit Python on a 64-bit machine, you will
>>>> need to install a 64-bit Python with a corresponding igraph package
>>>> that is also built for 64-bit, and then try again.
>>>>
>>>> Best,
>>>> T.
>>>>
>>>> On Mon, Aug 1, 2016 at 9:52 AM, Raphael C <[email protected]> wrote:
>>>>> I have 8GB of RAM and I have a simple edge list text file of size
>>>>> 1.2GB. It has 62500000 edges and about half that many vertices. Each
>>>>> line looks like
>>>>>
>>>>> 287111206 357850135
>>>>>
>>>>> I would like to read in the graph and output a sparse adjacency
>>>>> matrix. I am failing on all counts. I have tried
>>>>>
>>>>> g = Graph.Read_Edgelist('edges.txt')
>>>>>
>>>>> but this fails immediately with
>>>>>
>>>>> MemoryError: Error at vector.pmt:439: cannot reserve space for vector,
>>>>> Out of memory
>>>>>
>>>>> This seems unrelated to the size of the graph and is just a function
>>>>> of the node ids being large.
>>>>>
>>>>> So instead I tried
>>>>>
>>>>> g = Graph.Read_Ncol('edges.txt')
>>>>>
>>>>> This eats up all the RAM in my PC, forcing me to kill the code.
>>>>>
>>>>> In fact I tested g = Graph.Read_Ncol('edges.txt') with the first 1/5
>>>>> of the edges and had the same memory problem.
>>>>>
>>>>> Each node id is a 32 bit integer so the graph should fit easily in
>>>>> 8GB of RAM.
>>>>>
>>>>> What can I do?
>>>>>
>>>>> Thanks very much for any help.
>>>>> Raphael
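On the sparse adjacency question at the top of this message: write_adjacency appears to write the full (dense) matrix, but one workable route is to build a scipy sparse matrix directly from the edge list and save that. Below is a minimal, untested sketch; the file names are placeholders and it assumes the graph has already been loaded so the vertices carry igraph's contiguous internal IDs.

# Rough sketch (untested): export the adjacency as a scipy sparse matrix.
# It works from g.get_edgelist(), which itself costs extra RAM for a graph
# of this size.
import numpy as np
import scipy.sparse as sp
import scipy.io
from igraph import Graph

g = Graph.Read_Edgelist('edges_relabelled.txt')   # placeholder file name

n = g.vcount()
edges = np.array(g.get_edgelist(), dtype=np.int64)    # shape (m, 2)
data = np.ones(len(edges), dtype=np.int8)             # unweighted: 1 per edge
adj = sp.coo_matrix((data, (edges[:, 0], edges[:, 1])), shape=(n, n))
# For an undirected graph you may want adj = adj + adj.T to symmetrise.

# Matrix Market is plain text and can be read back with scipy.io.mmread();
# scipy.sparse.save_npz('adjacency.npz', adj.tocsr()) is a more compact
# binary alternative on recent scipy versions.
scipy.io.mmwrite('adjacency.mtx', adj)

Reading it back on the scipy side is then just scipy.io.mmread('adjacency.mtx').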

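And on the suggestion of relabeling externally: a minimal sketch of rewriting the edge list so the vertex IDs become a contiguous 0..|V|-1 range, after which Read_Edgelist() can load it without the NCOL name-mapping overhead. The file names ('edges.txt', 'edges_relabelled.txt', 'id_map.txt') are just illustrative.

# Rough sketch (untested): relabel arbitrary numeric vertex IDs to 0..|V|-1
# so the result can be loaded with Graph.Read_Edgelist().
id_map = {}   # original ID (string) -> new contiguous integer ID

def new_id(old):
    # Assign new IDs in order of first appearance.
    if old not in id_map:
        id_map[old] = len(id_map)
    return id_map[old]

with open('edges.txt') as src, open('edges_relabelled.txt', 'w') as dst:
    for line in src:
        u, v = line.split()
        dst.write('%d %d\n' % (new_id(u), new_id(v)))

# Keep the mapping so results can be translated back to the original IDs later.
with open('id_map.txt', 'w') as f:
    for old, new in id_map.items():
        f.write('%s %d\n' % (old, new))

The dict still costs memory, of course, but that cost is paid in a separate pass before igraph loads anything, so the peak while Read_Edgelist() runs should be close to the ~2.5 GB needed for the graph itself.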