Hi,

This could be a bug in the pickle implementation (not in igraph, but in
Python itself):

https://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb
https://bugs.python.org/issue24658

The workaround is to pickle the object into a bytes object in memory first
(e.g. with pickle.dumps()), and then write those bytes to the file in chunks
of less than 2^31 bytes each.
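
A minimal, untested sketch of that workaround (g is your graph; the chunk
size and the file name are just examples):

import pickle

# Serialize the whole graph into a single in-memory bytes object first...
data = pickle.dumps(g, pickle.HIGHEST_PROTOCOL)

# ...then write it out in chunks that stay below the problematic size
chunk_size = 2 ** 30
with open("graph.pkl", "wb") as fp:
    for start in range(0, len(data), chunk_size):
        fp.write(data[start:start + chunk_size])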

However, note that pickle is not a terribly efficient format -- since it
needs to support serializing an arbitrary set of Python objects that may
link to each other and form cycles in any conceivable configuration, it has
to do a lot of extra bookkeeping so that object cycles and objects embedded
within themselves do not trip up the implementation. That's why the memory
usage rockets up to 35 GB during pickling. If you only have a name and one
additional attribute for each vertex, you could potentially gain some speed
(and cut down on the memory usage) if you brew your own custom format -- for
instance, you could get the edge list and the two vertex attributes, stuff
them into a Python dict, and then save the dict in JSON format:

import json

def graph_as_json(graph):
    # Collect the two vertex attributes and the edge list in a plain dict
    # that the json module can serialize directly
    return {
        "vertices": {
            "name": graph.vs["name"],
            "pt": graph.vs["pt"]
        },
        "edges": graph.get_edgelist()
    }

with open("output.json", "w") as fp:
    json.dump(graph_as_json(graph), fp)

You could also use gzip.open() instead of open() to compress the saved data
on the fly (note that gzip.open() defaults to binary mode, so you'd pass
mode="wt" when writing text). You'll also need a json_as_graph() function to
perform the conversion in the opposite direction, for instance:
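
A minimal sketch of that reverse direction, assuming the JSON layout
produced by graph_as_json() above:

from igraph import Graph
import json

def json_as_graph(data):
    # Pass the vertex count explicitly so that any isolated vertices at the
    # end of the attribute lists are not lost
    n = len(data["vertices"]["name"])
    graph = Graph(n=n, edges=data["edges"])
    graph.vs["name"] = data["vertices"]["name"]
    graph.vs["pt"] = data["vertices"]["pt"]
    return graph

with open("output.json") as fp:
    graph = json_as_graph(json.load(fp))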


T.

On Tue, Apr 18, 2017 at 9:25 PM, Nick Eubank <nickeub...@gmail.com> wrote:

> Hello all,
>
> I'm trying to pickle a very large graph (23 million vertices, 152 million
> edges, two vertex attributes), but keep getting an `OSError: [Errno 22]
> Invalid argument` error. However, I think that's erroneous, as if I
> subsample the graph and save it with the exact same code, I have no problems.
> Here's the traceback:
>
>
> g.summary()
> Out[8]: 'IGRAPH UN-- 23331862 152099394 -- \n+ attr: name (v), pt (v)'
>
> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms,
> voz))
> Traceback (most recent call last):
>
>  File "<ipython-input-9-6b5409a79251>", line 1, in <module>
>    
> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms,
> voz))
>
>  File "/Users/Nick/anaconda/lib/python3.5/site-packages/igraph/__init__.py",
> line 1778, in write_pickle
>    result=pickle.dump(self, fname, version)
>
> OSError: [Errno 22] Invalid argument
>
>
> g=g.vs[range(3331862)].subgraph()
>
> g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms,
> voz))
>
>     [success]
>
> The graph takes up about 10gb in memory, and the pickle command expands
> Python's memory footprint to about 35gb before the exception gets thrown,
> but I'm on a machine with 80gb ram, so that's not the constraint.
>
> Any suggestions as to what might be going on / is there a workaround for
> saving?
>
> Thanks!
>
> Nick
>
_______________________________________________
igraph-help mailing list
igraph-help@nongnu.org
https://lists.nongnu.org/mailman/listinfo/igraph-help
