It depends on what you want to do with the data.

By default Cypher uses BFS for path finding, which accumulates a lot of
state for backtracking.

I think you'll be much better off with the Java API and/or the
Traversal API for your needs.

But you should be clear on how you want to process the results efficiently
in a streaming fashion, so that no more memory is held than absolutely
necessary.

Neo4j is able to stream path-matching results from the graph; you just
have to provide the correct consumption mechanism.
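
For illustration, something along these lines with the Traversal API (just
an untested sketch: db is your embedded GraphDatabaseService, start a node
you already looked up, and KNOWS a placeholder relationship type; adapt
depth and uniqueness to your data):

  import org.neo4j.graphdb.Direction;
  import org.neo4j.graphdb.DynamicRelationshipType;
  import org.neo4j.graphdb.Path;
  import org.neo4j.graphdb.Transaction;
  import org.neo4j.graphdb.traversal.Evaluators;
  import org.neo4j.graphdb.traversal.TraversalDescription;
  import org.neo4j.graphdb.traversal.Uniqueness;

  try (Transaction tx = db.beginTx()) {
      TraversalDescription td = db.traversalDescription()
              .breadthFirst()
              // placeholder relationship type, use your own
              .relationships(DynamicRelationshipType.withName("KNOWS"),
                      Direction.OUTGOING)
              .uniqueness(Uniqueness.NODE_GLOBAL)
              // cap the depth so the frontier stays bounded
              .evaluator(Evaluators.toDepth(4));

      // traverse() is lazy: each path is produced on demand,
      // so consume it here instead of collecting paths in a list
      for (Path path : td.traverse(start)) {
          process(path); // hypothetical consumer; don't retain the paths
      }
      tx.success();
  }

The important part is the for-loop: as long as you don't accumulate the
paths yourself, only the traversal state is kept on the heap, not the
whole result set.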

Check out the BidirectionalTraversalDescription for the most effective
handling:
http://api.neo4j.org/current/org/neo4j/graphdb/traversal/BidirectionalTraversalDescription.html
http://stackoverflow.com/questions/21288678/neo4j-which-traversal-should-one-use
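
Roughly like this (again just a sketch under the same assumptions, with
start and end being the two nodes you want to connect):

  import org.neo4j.graphdb.Direction;
  import org.neo4j.graphdb.DynamicRelationshipType;
  import org.neo4j.graphdb.Path;
  import org.neo4j.graphdb.Transaction;
  import org.neo4j.graphdb.traversal.BidirectionalTraversalDescription;
  import org.neo4j.graphdb.traversal.Uniqueness;

  try (Transaction tx = db.beginTx()) {
      // the same side description is mirrored onto both ends;
      // Neo4j reverses the relationship direction on the end side
      BidirectionalTraversalDescription bidir =
              db.bidirectionalTraversalDescription()
                .mirroredSides(db.traversalDescription()
                        .breadthFirst()
                        .relationships(
                                DynamicRelationshipType.withName("KNOWS"),
                                Direction.OUTGOING)
                        // per-path uniqueness so the two sides can meet
                        .uniqueness(Uniqueness.NODE_PATH));

      for (Path path : bidir.traverse(start, end)) { // lazy as well
          process(path); // hypothetical consumer, as above
      }
      tx.success();
  }

Because both sides expand simultaneously and meet in the middle, each side
only has to cover roughly half the depth, which keeps the per-side state
much smaller than a single BFS from start to end.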

Michael



On Fri, May 30, 2014 at 8:28 PM, John Fry <frydom.j...@gmail.com> wrote:

> Hello all,
>
> I hope this question is relevant to this community. Please let me know.
>
> The question is along the lines of how do you avoid unexpected heap issues
> or garbage collection thrashing that causes 'timeouts' when handling large
> graphs.
> The application I am trying to write depends on ~11M nodes with ~100M
> relationships with 2 or 3 properties.
>
> In the application I will select a handful of nodes, find the connection
> paths between them, and then expand sub-graphs from the results.
> Conceivably the expansion of the sub-graphs could return hundreds of
> thousands or even millions of nodes, and even more relationships and
> associated properties. I will run some analytics on the sub-graphs and
> then update the properties. I guess a fairly standard use model. (The
> machine running the DB has 16G RAM.)
>
> As I run trials on the queries I am seeing 'almost random' heap usage
> that every now and again causes 'out of heap' related errors. I
> understand the use of limits and batching to a reasonable level, but I
> feel there should be a solid and consistent programmatic way to protect
> against heap problems. As heap usage increases, I am seeing increasingly
> wide variations in query performance for repeated queries.
>
> Ideally I wouldn't have to artificially limit the size of the sub-graphs
> that I work on, as that erodes the performance of my analytic algorithms.
> In my application, reliability is my #1 goal; consistent performance is
> my #2 goal; absolute performance is my #3 goal.
>
> So the question is: is there a solid and consistent programmatic way to
> protect against heap problems for all classes of queries and large
> volumes of property updates?
>
> Best regards, John.
>
