Glad to hear that. Perhaps we'll see each other during the graph tour in Europe :)
Cheers, Michael

On Thu, Feb 1, 2018 at 3:09 PM, Vincent Mooser <vincent.moo...@gmail.com> wrote:

> Hi Michael,
> I applied all your recommendations and performance is better now. Next
> step will be the SSD.
>
> Thank you for your help
> Vincent
>
> On Tuesday, January 30, 2018 at 7:34:30 PM UTC+1, Michael Hunger wrote:
>>
>> Hi Vincent,
>>
>>
>> On Tue, Jan 30, 2018 at 4:27 PM, Vincent Mooser <vincent...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How much memory does the machine have?
>>>
>>> The machine has 64g of memory, so I think I can increase my page cache.
>>> But I should have at least twice this memory to be able to load the whole
>>> graph in the page cache.
>>>
>>
>> I would definitely increase the page-cache.
>>
>> If it's only 100k nodes that you're loading it should be fine.
>> The page-cache is emptied by utilization (LRU-K) so if those 100k nodes
>> keep getting used, their pages stay in.
>> Although if a lot of other data is loaded they might get unloaded.
>> There is no idle eviction.
>>
>> For the node-properties there are separate pages.
>> From your description it would be 2 or at most 3 property-records per
>> node.
>>
>> The disk is the biggest issue; if you can compensate with the larger
>> page-cache to avoid disk hits, that will help (at least for reads).
>>
>> Switch to 3.3.2
>> Use 12G heap
>> Use 48G page-cache
>>
>> Then this should be better.
>> Also try my query suggestion.
>>
>> Cheers, Michael
>>
>>
>>> In my use case, as Solr only contains a subset of the FOLDER nodes (about
>>> 100000 nodes), I was thinking of executing a query that selects these
>>> 100000 nodes at start, for warming up the cache and to be sure that the
>>> page cache contains (at least) these nodes. Will they be evicted from the
>>> page cache after a certain amount of time?
>>>
>>> Which properties of the nodes do you need to be returned? the full nodes?
>>>
>>> Yes, the full nodes have to be returned.
>>> They contain 1 oid (String), 1
>>> property 'name' (String), 4 boolean properties used as flags for business
>>> tasks and 2 long properties (creation and modification date)
>>>
>>> Thank you,
>>> Vincent
>>>
>>> On Tuesday, January 30, 2018 at 3:04:50 AM UTC+1, Michael Hunger wrote:
>>>>
>>>> Hi,
>>>> this query should be better:
>>>>
>>>> MATCH (node:FOLDER) WHERE node.oid IN {uuidList} RETURN node
>>>>
>>>> You definitely have a really bad system for this graph size.
>>>> How much memory does the machine have?
>>>>
>>>> 0. Switch to Neo4j Enterprise 3.3.2 which is more memory efficient
>>>> 1. *use an SSD*
>>>> 2. use more memory
>>>> 3. use a constraint instead of an index
>>>>
>>>> Otherwise you are effectively measuring disk speed.
>>>>
>>>> The problem is that the nodes might be distributed across the disk and
>>>> then it might have to load up to 200 pages with the HDD having to seek to
>>>> each of the blocks.
>>>>
>>>> Which properties of the nodes do you need to be returned? the full
>>>> nodes?
>>>>
>>>>
>>>> On Mon, Jan 29, 2018 at 5:11 PM, Vincent Mooser <vincent...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I am currently facing some performance problems when loading nodes
>>>>> using an indexed UUID. My use case is the following:
>>>>>
>>>>> - I initiate a search query in Apache Solr which returns a list of 200
>>>>> UUIDs (max)
>>>>> - I load the 200 nodes corresponding to the UUIDs with the following
>>>>> Cypher:
>>>>>
>>>>> UNWIND {uuidList} AS uuid
>>>>> MATCH (node:FOLDER {oid: uuid}) RETURN node
>>>>>
>>>>> The uuidList is a query param containing the list of UUIDs (strings)
>>>>>
>>>>> When the query has no page fault, it takes about 10-20ms to load the
>>>>> 200 nodes. But when some page faults appear in the query log, the query
>>>>> can take up to 4 seconds. I understand that some nodes have to be
>>>>> loaded directly from the disk, but for 200 nodes, it looks very slow to
>>>>> me.
>>>>>
>>>>> The FOLDER nodes are organized like folders in a filesystem and are
>>>>> attached together with a 'PARENT' relationship. The only folder that does
>>>>> not have any parent is the root folder.
>>>>>
>>>>> Environment specs are:
>>>>> - 300M nodes
>>>>> - 600M relationships
>>>>> - 110M nodes with the label 'FOLDER'
>>>>> - all FOLDER nodes have a property 'oid' whose index is online
>>>>> - the graph.db directory is about 125g (without transaction logs)
>>>>> - neo4j enterprise 3.2.6 and java driver 1.4.4
>>>>> - 8g of Heap
>>>>> - 32g of page cache
>>>>> - no SSD
>>>>>
>>>>> Any hints for improving performance?
>>>>>
>>>>> Thank you
>>>>> Vincent
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to neo4j+un...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
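
As a rough illustration of the "effectively measuring disk speed" point in this thread, here is a back-of-the-envelope sketch in Python. It is not a model of how Neo4j actually schedules reads. The 10 ms seek time is an assumption (a commonly quoted ballpark for spinning disks, not a figure from the thread), and the function name and the "property pages per node" parameter are made up for this example. It also assumes the worst case, where every page needed by the query is uncached and lies somewhere else on the disk.

```python
# Worst-case cold-read estimate for a batch of nodes on a spinning disk.
# ASSUMPTIONS (not from the thread): ~10 ms per random HDD seek, and every
# page touched by the query is uncached and needs its own seek.

HDD_SEEK_MS = 10.0  # assumed average seek + rotational latency


def cold_read_ms(nodes: int, property_pages_per_node: int,
                 seek_ms: float = HDD_SEEK_MS) -> float:
    """One seek per node page plus one seek per property page."""
    pages = nodes * (1 + property_pages_per_node)
    return pages * seek_ms


if __name__ == "__main__":
    # 200 nodes is the batch size from the original question.
    for prop_pages in (0, 1, 2, 3):
        ms = cold_read_ms(200, prop_pages)
        print(f"{prop_pages} property pages/node: ~{ms / 1000:.1f} s")
```

Under these assumptions, the 200 node pages alone already cost about 2 seconds, so a 4 second query with page faults is plausible once property pages are read as well. Real numbers will be lower whenever some pages are already in the page cache or several records share a page.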