John,
Here are my initial thoughts:
You probably have too much memory assigned to the JVM heap, given that Neo keeps 
its page cache off-heap. You also have a lot of nodes in memory at any given 
time (550k nodes per batch if I understand correctly), so the OS is probably 
paging like mad to keep up, which is consistent with the profile you are seeing. 
You need to reduce the number of nodes you load in a transaction and likely 
decrease the JVM heap size.
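For example (rough numbers only, the right split depends on your data and access 
pattern), on a 32G box I would start from a much smaller heap plus an explicit 
page cache, rather than -Xmx24000m with the page cache left at its default:

# conf/neo4j.properties
dbms.pagecache.memory=16g

# JVM flags: small fixed heap, leaving room for the page cache and the OS
-Xms8g -Xmx8g -XX:+UseG1GC

The heap then only has to hold your batch state, the off-heap page cache holds 
the store files, and heap plus page cache stay comfortably under the 32G of 
physical RAM.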
If your graph is actually connected, meaning that a destination node may be 
attached to more than one source node, you could probably do this more 
efficiently so that you do not have to reload the destination nodes multiple 
times.
If your graph is NOT that connected and you just have 
(:source)-[]->(:connected) 15 million times and nothing else, then I might even 
question why you are using a graph database.
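Assuming the connected case, here is a very rough single-threaded sketch of what 
I mean: commit in much smaller batches, and keep a cache of the destination 
values so a destination shared by many sources is only read from the store once. 
The label and property names ("Source", "weight", "score") and the batch size 
are made up; adapt them to your model.

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.tooling.GlobalGraphOperations;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedRelationshipUpdate {

    // Placeholder label/property names and batch size -- substitute your own.
    private static final Label SOURCE = DynamicLabel.label("Source");
    private static final int BATCH_SIZE = 10_000;

    public static void main(String[] args) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("/path/to/graph.db");

        // Pass 1: collect the source node ids so each batch can run in its own
        // short transaction. (15M Longs is itself a few hundred MB of heap; a
        // primitive long array or id ranges would be lighter.)
        List<Long> sourceIds = new ArrayList<>();
        try (Transaction tx = db.beginTx()) {
            for (Node n : GlobalGraphOperations.at(db).getAllNodesWithLabel(SOURCE)) {
                sourceIds.add(n.getId());
            }
            tx.success();
        }

        // Cache of the per-destination value the statistics need, so a destination
        // shared by many sources is only read once. Unbounded here; bound or shard
        // it if it grows too large.
        Map<Long, Double> destCache = new HashMap<>();

        for (int i = 0; i < sourceIds.size(); i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, sourceIds.size());
            try (Transaction tx = db.beginTx()) {
                for (int j = i; j < end; j++) {
                    Node source = db.getNodeById(sourceIds.get(j));
                    for (Relationship rel : source.getRelationships(Direction.OUTGOING)) {
                        Node dest = rel.getEndNode();
                        Double destVal = destCache.get(dest.getId());
                        if (destVal == null) {
                            // "weight" stands in for whatever destination property
                            // feeds your statistics
                            destVal = ((Number) dest.getProperty("weight", 0.0)).doubleValue();
                            destCache.put(dest.getId(), destVal);
                        }
                        // Placeholder for the real calculation; write the result
                        // back onto the relationship
                        rel.setProperty("score", destVal);
                    }
                }
                tx.success(); // commit one small batch at a time
            }
        }

        db.shutdown();
    }
}

With your 8 worker threads you would partition sourceIds across the workers and 
give each its own transaction; keep the per-commit batch small enough that the 
live objects per transaction stay in the tens of thousands rather than ~1M.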


> On May 4, 2016, at 1:30 PM, John Fry <[email protected]> wrote:
> 
> Hi All,
> 
> I am seeing slow and worsening memory performance and ~6 hour run times for 
> the application detailed below.
> I am running close to using all 32G of RAM and often the AWS instance 
> stalls/fails due to memory allocation issues. This is despite limiting the 
> application/JVM to 24G.
> Garbage collection rates worsen as the application progresses. 
> 
> What I need help with is the following:
> given the description below - is a 6 hour run time and a 32G footprint about 
> correct?
> how do you know when you have tuned a single-instance embedded use of neo4j 
> to optimal memory performance? (is there a benchmark to tune against?)
> is there anything obviously wrong or naive with the approach below?
> what other tuning options are available for me to try?
> Let me know if you need to see any code.
> 
> Many thanks in advance for help, John.
> 
> Environment:
> AWS m4.2xlarge - 16 VCPUs; 32G RAM
> Application is using neo4j embedded in Java
> neo4j-community-2.3.0
> 
> 
> Graph Size:
> 15M Nodes - with properties: a name/string; some floating point values
> 170M Relationships - with properties: 5 floating point values
> approximately 15G of data
> 
> Algorithm Intent:
> Fetch every source node in turn (all 15M), its outgoing relationships and 
> connecting destination nodes
> Calculate some statistics and parameters based on the properties of the 
> source and destination nodes and their connecting relationships
> For every outgoing relationship, update and write back the 5 floating point 
> properties 
> 
> Implementation & Runtime Details:
> Using 8 threads in a thread pool to queue up the algorithm in batches
> Batch sizes of 50,000 nodes 
> tx.success is therefore posted every 50k iterations of the algorithm after 
> touching 50k source nodes and about 500k destination nodes and 500k 
> relationships (~1000k objects + properties)
> JVM params: -Xms16000m -Xmx24000m -XX:NewRatio=1 -XX:+UseG1GC
> neo4j.properties - everything commented out, including:
> #dbms.pagecache.memory=10g
