Hey,
I have a large (by my standards) graph and I would like to reduce its size
so it all fits in memory.
This is the same Twitter graph I mentioned earlier: 2.5 million Nodes and
250 million Relationships.

The goal is for the graph to still have the same topology and
characteristics after it has been made more sparse.
My plan for doing this is to select Relationships for deletion uniformly at
random, until the graph is small enough.

My first approach is basically this:

until (graph_is_small_enough)
  random_relationship = get_relationship_by_id(random_number)
  random_relationship.delete()

I'm using the transactional GraphDatabaseService at the moment, rather than
the BatchInserter... mostly because I'm not inserting anything and I assumed
the optimizations made to the BatchInserter were only for write operations.
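
To make that concrete, here is roughly what the loop looks like against the
embedded Java API (1.x style, with tx.success()/tx.finish()). The store path,
the id upper bound and the target count are made-up placeholders, and since a
random id may not correspond to a live relationship, I just catch
NotFoundException and draw again:

import java.util.Random;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.NotFoundException;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class RandomRelationshipThinner
{
    public static void main( String[] args )
    {
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "path/to/twitter-db" ); // placeholder path
        Random random = new Random();
        long highestRelationshipId = 250000000L; // rough upper bound on relationship ids (assumed)
        long target = 125000000L;                // aim: delete half of the ~250M Relationships
        long deleted = 0;

        while ( deleted < target )
        {
            Transaction tx = graphDb.beginTx();
            try
            {
                long id = (long) ( random.nextDouble() * highestRelationshipId );
                Relationship rel = graphDb.getRelationshipById( id );
                rel.delete();
                deleted++;
                tx.success();
            }
            catch ( NotFoundException e )
            {
                // the random id didn't hit a live Relationship; just draw again
            }
            finally
            {
                tx.finish();
            }
        }
        graphDb.shutdown();
    }
}

So at the moment every single delete gets its own transaction, which I suspect
is part of why it is so slow.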

The reasons I want to delete Relationships instead of Nodes are:
  (1) I don't want to accidentally delete any "super nodes", as these are
what give Twitter its unique structure
  (2) The number of Nodes is not the main thing keeping me from being able
to store the graph in RAM

The problem with the current approach is that it feels like I'm working
against Neo4j's strengths, and it is very, very slow... I waited over an hour
and fewer than 1,000,000 Relationships had been deleted. Given that my aim is
to halve the number of Relationships (i.e. delete about 125 million), at that
rate it would take me over 125 hours (roughly 5 days) to complete this
process. In the worst case this is what I'll resort to, but I'd rather not if
there's a better way.

My questions are:
(1) Can you think of an alternative, faster way to reduce this graph's size
that still preserves its structure and characteristics?
(2) Using the same method I'm using now, are there some magical
optimizations that will greatly improve performance?

Thanks,
Alex