Many, many thanks, Clark, a huge help. See inline:
On Wednesday, May 4, 2016 at 1:11:44 PM UTC-7, Clark Richey wrote:
>
> John,
> Here are my initial thoughts:
>
> - You probably have too much memory assigned to the JVM, given that Neo
> uses off-heap memory for node caches. You have a lot of nodes in memory at
> a given time (550k nodes, if I understand correctly). The OS is probably
> paging like mad to keep up, which is consistent with the profile you are
> seeing. You need to reduce the number of nodes you are loading in a
> transaction and likely decrease the JVM size.

I did not know that nodes were cached off-heap! I reduced the allocation by roughly 8G and ran with -Xms12000m -Xmx16000m. I also decreased the batch size to 5k source nodes, to keep the total number of nodes computed over per batch below 100k. The application now completes in about 20 minutes with uniform load across threads :)

> - If your graph is actually connected, meaning that a destination
> node may be attached to more than one source node, you could probably do
> this more efficiently so as not to have to reload the destination nodes
> multiple times.

There is probably a clever way to do what I need, but with a 20-minute run time it is now good enough, and readable.

> - If your graph is NOT that connected and you just have
> (:source)-[]->(:connected) 15 million times and nothing else, then I might
> even question why you are using a graph database.

The graph is highly connected, with an average in/out degree of 10.

> On May 4, 2016, at 1:30 PM, John Fry <[email protected]> wrote:
>
> > Hi All,
> >
> > I am seeing slow and worsening memory performance and ~6 hour run times
> > for the application detailed below.
> > I am running close to using all 32G of RAM, and the AWS instance often
> > stalls or fails due to memory allocation issues. This is despite limiting
> > the application/JVM to 24G.
> > Garbage collection rates worsen as the application progresses.
> >
> > What I need help with is the following:
> >
> > - Given the description below, are a 6-hour run time and a 32G footprint
> > about right?
> > - How do you know when you have tuned a single-instance embedded use
> > of neo4j to optimal memory performance? (Is there a benchmark to tune
> > against?)
> > - Is there anything obviously wrong or naive with the approach below?
> > - What other tuning options are available for me to try?
> >
> > Let me know if you need to see any code.
> >
> > Many thanks in advance for your help, John.
> >
> > Environment:
> >
> > - AWS m4.2large: 16 vCPUs, 32G RAM
> > - Application is using neo4j embedded in Java
> > - neo4j-community-2.3.0
> >
> > Graph size:
> >
> > - 15M nodes, with properties: a name/string and some floating-point
> > values
> > - 170M relationships, with properties: 5 floating-point values
> > - approximately 15G of data
> >
> > Algorithm intent:
> >
> > - Fetch every source node in turn (all 15M), its outgoing
> > relationships and the connected destination nodes
> > - Calculate some statistics and parameters from the properties
> > of the source and destination nodes and their connecting relationships
> > - For every outgoing relationship, update and write back the 5
> > floating-point properties
> >
> > Implementation & runtime details:
> >
> > - Using 8 threads in a thread pool to queue up the algorithm in batches
> > - Batch sizes of 50,000 nodes
> > - tx.success is therefore called every 50k iterations of the algorithm,
> > after touching 50k source nodes, about 500k destination nodes and 500k
> > relationships (~1000k objects + properties)
> > - JVM params: -Xms16000m -Xmx24000m -XX:NewRatio=1 -XX:+UseG1GC
> > - neo4j.properties: everything commented out, including
> > #dbms.pagecache.memory=10g
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Neo4j" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
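For readers following along, here is a minimal, self-contained sketch of the batching scheme described in this thread: source nodes split into fixed-size batches, one unit of work (one transaction) per batch, run on an 8-thread pool. It deliberately leaves Neo4j out so it compiles on its own. `BatchPlanner` and `BatchWork` are hypothetical names; in an embedded 2.3 application the body of `BatchWork.process` is roughly where you would open a transaction with `db.beginTx()`, walk each source node's outgoing relationships, write the five properties and call `tx.success()`. The sketch also assumes source node IDs are dense in 0..N-1, which the thread does not state; if they are not, partition a collected list of IDs instead. `nodesPerBatch` just reproduces the arithmetic from the thread: 50,000 × (1 + 10) = 550,000 nodes per batch originally, and 5,000 × 11 = 55,000 after the change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: plans fixed-size batches and runs them on a thread pool.
// No Neo4j API is used here; BatchWork stands in for "one transaction per batch".
final class BatchPlanner {

    /** Hypothetical callback: processes source nodes with ids in [start, end). */
    interface BatchWork {
        long process(long start, long end) throws Exception;
    }

    /** Splits [0, totalNodes) into half-open ranges of at most batchSize ids. */
    static List<long[]> batches(long totalNodes, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < totalNodes; start += batchSize) {
            ranges.add(new long[] {start, Math.min(start + batchSize, totalNodes)});
        }
        return ranges;
    }

    /** Rough count of nodes touched per batch: each source plus its destinations. */
    static long nodesPerBatch(int batchSize, int avgOutDegree) {
        return (long) batchSize * (1 + avgOutDegree);
    }

    /** Runs every batch on a fixed pool of `threads` workers; sums the per-batch counts. */
    static long runAll(long totalNodes, int batchSize, int threads, BatchWork work)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] range : batches(totalNodes, batchSize)) {
                // One task per batch; in a real app this is one transaction.
                Callable<Long> task = () -> work.process(range[0], range[1]);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // rethrows any failure from a batch
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping batches small bounds how many nodes and relationships a single transaction holds at once. The thread reports that this, together with a smaller heap, brought the run time down. The pool size and batch size are independent knobs, so they can be tuned separately.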
