Re: [Neo4j] Enhancing Performance For Batched Read/Writes

Clark Richey Wed, 04 May 2016 19:35:01 -0700

Glad I could help!


> On May 4, 2016, at 21:55, John Fry <[email protected]> wrote:
> 
> Many many thanks Clark - a huge help....
> 
> See inline:
> 
>> On Wednesday, May 4, 2016 at 1:11:44 PM UTC-7, Clark Richey wrote:
>> John,
>> Here are my initial thoughts:
>> You likely have too much memory assigned to the JVM, given that Neo uses 
>> off-heap memory for node caches. You also have a lot of nodes in memory at a 
>> given time (550k nodes, if I understand correctly). The OS is probably paging 
>> like mad to keep up, which is consistent with the profile you are seeing. You 
>> need to reduce the number of nodes you are loading in a transaction and 
>> likely decrease the JVM size.
> I did not know that nodes were cached off-heap! I reduced the allocation by 
> roughly 8G and ran -Xms12000m -Xmx16000m.
> I also decreased the batch size to 5k source nodes to keep the total number 
> of nodes computed over per batch to less than 100k.
> The application now completes in about 20 minutes with uniform loading across 
> threads :)
>  
>> If your graph is actually connected, meaning that a destination node may be 
>> attached to more than one source node, you could probably do this more 
>> efficiently so as to not have to reload the destination nodes multiple times.
> There is probably a clever way to do what I need to do but with a 20min run 
> time it is now good enough and readable etc. 
>> If your graph is NOT that connected and you just have 
>> (:source)-[]->(:connected) 15 million times and nothing else then I might 
>> even question why you are using a graph database.
>  The graph is highly connected, with an average in/out degree of 10.
>> 
>>> On May 4, 2016, at 1:30 PM, John Fry <[email protected]> wrote:
>>> 
>>> Hi All,
>>> 
>>> I am seeing slow and worsening memory performance and ~6 hour run times for 
>>> the application detailed below.
>>> I am running close to using all 32G of RAM and often the AWS instance 
>>> stalls/fails due to memory allocation issues. This is despite limiting the 
>>> application/JVM to 24G.
>>> Garbage collection rates worsen as the application progresses. 
>>> 
>>> What I need help with is the following:
>>> given the description below - is 6 hours run-time and 32G footprint about 
>>> correct?
>>> how do you know when you have tuned a single instance embedded use of neo4j 
>>> to optimal memory performance? (is there a benchmark to tune against?)
>>> is there anything obviously wrong or naive with the approach below?
>>> what other tuning options are available for me to try?
>>> Let me know if you need to see any code.
>>> 
>>> Many thanks in advance for help, John.
>>> 
>>> Environment:
>>> AWS m4.2large - 16 VCPUs; 32G RAM
>>> Application is using neo4j embedded in Java
>>> neo4j-community-2.3.0
>>> 
>>> 
>>> Graph Size:
>>> 15M Nodes - with properties: a name/string; some floating point values
>>> 170M Relationships - with properties: 5 floating point values
>>> approximately 15G of data
>>> 
>>> Algorithm Intent:
>>> Fetch every source node in turn (all 15M), its outgoing relationships and 
>>> connecting destination nodes
>>> Calculate some statistics and parameters based on the properties in the 
>>> source and destination nodes and their connecting relationships
>>> For every outgoing relationship update and write back the 5 floating point 
>>> properties 
>>> 
>>> Implementation & Runtime Details:
>>> Using 8 threads in a thread pool to queue up the algorithm in batches
>>> Batch sizes of 50,000 nodes 
>>> tx.success is therefore posted every 50k iterations of the algorithm after 
>>> touching 50k source nodes and about 500k destination nodes and 500k 
>>> relationships (~1000k objects + properties)
>>> JVM params: -Xms16000m -Xmx24000m -XX:NewRatio=1 -XX:+UseG1GC
>>> neo4j-properties - everything commented out including - 
>>> #dbms.pagecache.memory=10g
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
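For reference, the memory advice in this thread comes down to simple arithmetic: the JVM heap and the off-heap page cache both have to fit in the 32G of RAM, with something left for the OS. The sketch below is only an illustration of that check. The 10G page cache is taken from the commented-out dbms.pagecache.memory line (so it may not match what was actually in effect), the 2G OS reserve is an assumed figure, and the heap sizes are the -Xmx values rounded to whole gigabytes. The class and method names are hypothetical.

```java
// Sketch: does heap + page cache leave headroom on a fixed-RAM host?
// Only the 32G total and the two -Xmx values come from the thread;
// the page cache size and OS reserve below are assumptions.
final class MemoryBudget {

    /** Memory (in GB) left for the OS after the JVM heap and the off-heap page cache. */
    static long headroomGb(long totalRamGb, long maxHeapGb, long pageCacheGb) {
        return totalRamGb - maxHeapGb - pageCacheGb;
    }

    /** True when the remaining memory falls below the reserve kept for the OS. */
    static boolean isOvercommitted(long totalRamGb, long maxHeapGb,
                                   long pageCacheGb, long osReserveGb) {
        return headroomGb(totalRamGb, maxHeapGb, pageCacheGb) < osReserveGb;
    }

    public static void main(String[] args) {
        long totalRamGb = 32;  // from the thread
        long pageCacheGb = 10; // assumption: the commented-out 10g setting
        long osReserveGb = 2;  // assumption: illustrative reserve for the OS

        // Original settings (-Xmx24000m, rounded to 24G), then revised (-Xmx16000m, 16G).
        for (long heapGb : new long[] {24, 16}) {
            System.out.println("heap=" + heapGb + "G headroom="
                    + headroomGb(totalRamGb, heapGb, pageCacheGb) + "G overcommitted="
                    + isOvercommitted(totalRamGb, heapGb, pageCacheGb, osReserveGb));
        }
    }
}
```

Under these assumed figures the original 24G heap leaves no room at all, while the 16G heap leaves about 6G, which is in line with the paging behaviour and the improvement reported above.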

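The batching scheme discussed in the thread (a fixed pool of 8 threads, one transaction per batch of 5k source nodes) might be structured roughly as below. This is a minimal sketch and not the original application code: it has no Neo4j dependency, the per-node work is a placeholder counter, and BatchRunner, partition, and runAll are hypothetical names. In the real application each task would open one transaction, process its batch, and commit at the end of the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class BatchRunner {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    static List<List<Long>> partition(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }

    /** Runs every batch on a fixed thread pool; returns the number of ids processed. */
    static long runAll(List<Long> ids, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong processed = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Long> batch : partition(ids, batchSize)) {
                futures.add(pool.submit(() -> {
                    // Placeholder: one transaction per batch would start here.
                    for (Long id : batch) {
                        processed.incrementAndGet(); // stands in for per-node work
                    }
                    // Placeholder: the transaction would be committed here.
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the batch and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 12_500; i++) {
            ids.add(i);
        }
        System.out.println("batches: " + partition(ids, 5_000).size());
        System.out.println("processed: " + runAll(ids, 5_000, 8));
    }
}
```

Keeping each batch small bounds how many nodes and relationships a single transaction holds in memory at once, which is the change that brought the run time down in the thread.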
