Re: [Neo4j] Loading nodes matching an indexed UUID

'Michael Hunger' via Neo4j Thu, 01 Feb 2018 16:47:35 -0800

Glad to hear that.

Perhaps we see each other during the graph tour in Europe :)


Cheers, Michael

On Thu, Feb 1, 2018 at 3:09 PM, Vincent Mooser <vincent.moo...@gmail.com>
wrote:

> Hi Michael,
> I applied all your recommendations and performance is better now. Next
> step will be the SSD.
>
> Thank you for your help
> Vincent
>
> On Tuesday, January 30, 2018 at 7:34:30 PM UTC+1, Michael Hunger wrote:
>>
>> Hi Vincent,
>>
>>
>> On Tue, Jan 30, 2018 at 4:27 PM, Vincent Mooser <vincent...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How much memory does the machine have?
>>>
>>> The machine has 64g of memory, so I think I can increase my page cache.
>>> But I should have at least twice this memory to be able to load the whole
>>> graph in the page cache.
>>>
>>
>> I would definitely increase the page-cache.
>>
>> If it's only 100k nodes that you're loading it should be fine.
>> The page-cache is emptied by utilization (LRU-K) so if those 100k nodes
>> keep getting used, their pages stay in.
>> Although if a lot of other data is loaded they might get unloaded.
>> There is no idle eviction.
>>
>> For the node-properties there are separate pages.
>> From your description it would be 2 or at most 3 property-records per
>> node.
>>
>> The disk is the biggest issue; if you can compensate with a larger
>> page-cache to avoid disk hits, that will help (at least for reads).
>>
>> Switch to 3.3.2
>> Use 12G heap
>> Use 48G page-cache
>>
>> Then this should be better.
>> Also try my query suggestion.
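>>
>> A sketch of how the two memory settings could look in neo4j.conf,
>> assuming the standard 3.x setting names. Setting the initial heap size
>> equal to the maximum is an assumption here, not something stated above:
>>
>> ```properties
>> # Heap: 12G (initial = max is an assumption, to avoid heap resizing)
>> dbms.memory.heap.initial_size=12g
>> dbms.memory.heap.max_size=12g
>> # Page cache: 48G
>> dbms.memory.pagecache.size=48g
>> ```
>>
>> Together that is 60G of the machine's 64G, which leaves about 4G for the
>> operating system and everything else.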
>>
>> Cheers, Michael
>>
>>
>>> In my use case, as Solr only contains a subset of the FOLDER nodes (about
>>> 100000 nodes), I was thinking of executing a query that selects these
>>> 100000 nodes at start, for warming up the cache and to be sure that the
>>> page cache contains (at least) these nodes. Will they be evicted from the
>>> page cache after a certain amount of time?
>>>
>>> Which properties of the nodes do you need to be returned? the full nodes?
>>>
>>> Yes, the full nodes have to be returned. They contain 1 oid (String), 1
>>> property 'name' (String), 4 boolean properties used as flags for business
>>> tasks and 2 long properties (creation and modification date)
>>>
>>> Thank you,
>>> Vincent
>>>
>>> On Tuesday, January 30, 2018 at 3:04:50 AM UTC+1, Michael Hunger wrote:
>>>>
>>>> Hi,
>>>> this query should be better:
>>>>
>>>> match(node : FOLDER) where node.oid IN {uuidList} return node
>>>>
>>>> You definitely have a really bad system for this graph size.
>>>> How much memory does the machine have?
>>>>
>>>> 0. Switch to Neo4j Enterprise 3.3.2 which is more memory efficient
>>>> 1. *use an SSD*
>>>> 2. use more memory
>>>> 3. use a constraint instead of an index
>>>>
>>>> Otherwise you are effectively measuring disk speed.
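>>>>
>>>> For item 3, a minimal sketch in Neo4j 3.x Cypher syntax. It assumes the
>>>> existing plain index on :FOLDER(oid) has to be dropped first, because a
>>>> uniqueness constraint brings its own backing index, and it assumes every
>>>> oid value is already unique (otherwise creating the constraint fails).
>>>> Run each statement separately:
>>>>
>>>> ```cypher
>>>> // Assumption: a plain index on :FOLDER(oid) exists and is removed first.
>>>> DROP INDEX ON :FOLDER(oid);
>>>> // 3.x syntax: creates the uniqueness constraint and its backing index.
>>>> CREATE CONSTRAINT ON (f:FOLDER) ASSERT f.oid IS UNIQUE;
>>>> ```
>>>>
>>>> Between the two statements there is no index on oid, and building the
>>>> constraint over 110M FOLDER nodes will take a while, so this is probably
>>>> best done in a maintenance window.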
>>>>
>>>> The problem is that the nodes might be distributed across the disk and
>>>> then it might have to load up to 200 pages with the HDD having to seek to
>>>> each of the blocks.
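>>>>
>>>> As a rough back-of-the-envelope check, assuming about 10 ms per random
>>>> HDD access (an assumed typical figure, not measured on this system) and
>>>> the worst case of one page fault per node:
>>>>
>>>> ```latex
>>>> t_{\text{worst}} \approx 200 \ \text{page faults} \times 10\ \text{ms per seek} = 2000\ \text{ms} = 2\ \text{s}
>>>> ```
>>>>
>>>> If each node also needed a separate property page, the fault count and
>>>> the estimate would double, so query times of several seconds are
>>>> plausible on a spinning disk.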
>>>>
>>>> Which properties of the nodes do you need to be returned? the full
>>>> nodes?
>>>>
>>>>
>>>> On Mon, Jan 29, 2018 at 5:11 PM, Vincent Mooser <vincent...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I am currently facing some performance problems when loading nodes
>>>>> using an indexed UUID. My use case is the following:
>>>>>
>>>>> - I initiate a search query in Apache Solr which returns a list of up
>>>>> to 200 UUIDs
>>>>> - I load the nodes corresponding to those UUIDs with the following
>>>>> cypher:
>>>>>
>>>>> unwind {uuidList} as uuid
>>>>> match(node : FOLDER { oid : uuid}) return node
>>>>>
>>>>> The uuidList is a query param containing the list of UUIDs (strings)
>>>>>
>>>>> When the query has no page faults, it takes about 10-20ms to load the
>>>>> 200 nodes. But when page faults appear in the query log, the query
>>>>> can take up to 4 seconds. I understand that some nodes have to be
>>>>> loaded directly from the disk, but for 200 nodes, that looks very slow
>>>>> to me.
>>>>>
>>>>> The FOLDER nodes are organized like folders in a filesystem and are
>>>>> attached together with a 'PARENT' relationship. The only folder that does
>>>>> not have any parent is the root folder.
>>>>>
>>>>> Environment specs are:
>>>>> - 300M nodes
>>>>> - 600M relationships
>>>>> - 110M nodes with the label 'FOLDER'
>>>>> - all FOLDER nodes have a property 'oid' whose index is online
>>>>> - the graph.db directory is about 125g (without transaction logs)
>>>>> - neo4j enterprise 3.2.6 and java driver 1.4.4
>>>>> - 8g of Heap
>>>>> - 32g of page cache
>>>>> - no SSD
>>>>>
>>>>> Any hints for improving performance?
>>>>>
>>>>> Thank you
>>>>> Vincent
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to neo4j+un...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>

