It depends on what you want to do with the data. By default Cypher uses BFS for path finding, which will accumulate a lot of state (backtracking).
I think you'll be much better off with the Java API and/or the Traversal API for your needs. But you should be clear on how you want to process the results efficiently in a streaming fashion, so that no more memory is held than absolutely necessary. Neo4j is able to stream path-matching results from the graph; you just have to provide the correct consumption mechanism. Check out the bidirectional traversal description for the most effective handling (a minimal sketch follows below the quoted message).

http://api.neo4j.org/current/org/neo4j/graphdb/traversal/BidirectionalTraversalDescription.html
http://stackoverflow.com/questions/21288678/neo4j-which-traversal-should-one-use

Michael

On Fri, May 30, 2014 at 8:28 PM, John Fry <frydom.j...@gmail.com> wrote:
> Hello all,
>
> I hope this question is relevant to this community. Please let me know.
>
> The question is along the lines of: how do you avoid unexpected heap issues
> or garbage-collection thrashing that causes 'timeouts' when handling large
> graphs?
> The application I am trying to write depends on ~11M nodes with ~100M
> relationships with 2 or 3 properties.
>
> In the application I will select a handful of nodes, find the
> connection paths between them, and then expand sub-graphs from the results.
> Conceivably the expansion of the sub-graphs could return hundreds of thousands
> or millions of nodes, and even more relationships and associated
> properties. I will run some analytics on the sub-graphs and then update the
> properties. I guess a fairly standard use model. (The machine running the
> DB has 16G of RAM.)
>
> As I run trials on the queries I am seeing 'almost random' heap usage that
> every now and again causes 'out of heap' related errors. I understand the
> use of limits and batching to a reasonable level, but feel that there should
> be a solid and consistent programmatic way to protect against heap
> problems. As the heap usage increases, I am seeing increasingly wider
> variations in query performance for repeat queries.
>
> Ideally I wouldn't have to artificially limit the size of the sub-graphs
> that I work on, as it erodes the performance of my analytic algorithms. In
> my application, reliability is my #1 goal; consistent performance is my #2
> goal; absolute performance is my #3 goal.
>
> So the question is: is there a solid and consistent programmatic way to
> protect against heap problems for all classes of queries and large volumes
> of property updates?
>
> Best regards, John.
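To make the streaming point concrete, here is a minimal sketch of a bounded, bidirectional traversal with the embedded Java API (Neo4j 2.x). The store path, node ids, relationship type and depth limit are placeholders I picked for illustration, not values from your setup, and process() stands in for whatever analytics you run per path:

import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.traversal.*;

public class StreamingTraversalSketch {
    public static void main(String[] args) {
        // Placeholder store directory.
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("/path/to/graph.db");

        try (Transaction tx = db.beginTx()) {
            Node start = db.getNodeById(1);   // placeholder start node
            Node end   = db.getNodeById(2);   // placeholder end node

            // One side of the traversal: breadth-first, bounded depth,
            // NODE_PATH uniqueness keeps the per-path state small.
            TraversalDescription side = db.traversalDescription()
                    .breadthFirst()
                    .relationships(DynamicRelationshipType.withName("CONNECTED"), Direction.BOTH)
                    .evaluator(Evaluators.toDepth(4))
                    .uniqueness(Uniqueness.NODE_PATH);

            // Bidirectional traversal: expand from both ends and meet in the middle.
            Traverser paths = db.bidirectionalTraversalDescription()
                    .mirroredSides(side)
                    .traverse(start, end);

            // The Traverser is lazy: consume each Path as it is produced instead
            // of collecting everything into a list, so only the current path and
            // the traversal frontier are held on the heap.
            for (Path path : paths) {
                process(path);
            }
            tx.success();
        }
        db.shutdown();
    }

    private static void process(Path path) {
        System.out.println(path.length());   // replace with your analytics
    }
}

The important part is consuming the Traverser incrementally inside the transaction rather than materializing all paths; the depth evaluator and uniqueness setting are the main knobs for bounding how much state the traversal itself accumulates.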