Re: [Neo4j] Re: Poor performance on relatively big db

Matias Burak Fri, 30 Dec 2016 06:57:18 -0800

Ok, the USING INDEX had to go before WHERE, but doing that the query never 
finishes…


Also, the first query you proposed is slower than the original.


> El 30 dic. 2016, a las 11:32, Matias Burak <mbu...@gmail.com> escribió:
> 
> I think most of the slowness is when doing count. If I just get a small set 
> doing limit it works fine.
> 
> Also, while trying to test your suggestions, I wan’t able to get USING INDEX 
> to work, is there anything wrong in the syntax?
> 
> neo4j> profile
>        MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
>        USING INDEX n1:Label1(prop1)
>        MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
>        USING INDEX n:Label2(prop2)
>        RETURN COUNT(*) AS `Count`;
> Invalid input 'S': expected 'n/N' (line 4, column 2 (offset: 56))
> "USING INDEX n1:Label1(prop1)"
>             ^
> 
> Also, how can I make sure is it completely memory mapped? I have 16gb on the 
> environment I’m testing it and and I left the neo4j defaults for memory.
>  
>> El 29 dic. 2016, a las 18:12, 'Michael Hunger' via Neo4j 
>> <neo4j@googlegroups.com <mailto:neo4j@googlegroups.com>> escribió:
>> 
>> In 3.1  (and then also 3.2)
>> you could also use the compiled runtime (except for the count)
>> 
>> So a query like
>> 
>> cypher runtime=compiledExperimentalFeatureNotSupportedForProductionUse 
>> profile 
>> MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND 
>> n1.`prop2`='bbb' RETURN 1
>> 
>> should use the compiled runtime and run faster than the interpreted runtime. 
>> In 3.2 it will be again (runtime=compiled) or actually the default in 
>> enterprise.
>> 
>> Michael
>> 
>> 
>> On Thu, Dec 29, 2016 at 10:07 PM, Michael Hunger 
>> <michael.hun...@neotechnology.com <mailto:michael.hun...@neotechnology.com>> 
>> wrote:
>> It depends a bit.
>> 
>> If you can aggressively filter on both sides (doesn't seem so with your > 2M 
>> results).
>> 
>> Then you can do an index lookup for both sides and do a WHERE on the cross 
>> product.
>> 
>> The other thing you can try is this (make sure n1 is the label with smaller 
>> cardinality, which you can determine upfront quickly, by running MATCH 
>> (:Label)-[:TYPE]->()  return count(*)
>> 
>> profile 
>> MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes
>> MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes 
>> RETURN COUNT(*) AS `Count`;
>> 
>> 
>> profile 
>> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
>> USING INDEX n1:Label1(prop1)
>> MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa'
>> USING INDEX n:Label2(prop2)
>> RETURN COUNT(*) AS `Count`;
>> 
>> profile 
>> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb'
>> USING INDEX n1:Label1(prop1)
>> MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1)
>> USING INDEX n:Label2(prop2)
>> RETURN COUNT(*) AS `Count`;
>> 
>> Please try these variants, I can imagine reasons for either being faster.
>> And let us know.
>> 
>> Your store is not that big (3M nodes 100M rels) but have you made sure it is 
>> completely memory mapped?
>> 
>> Michael
>> 
>> On Thu, Dec 29, 2016 at 3:10 PM, Matias Burak <mbu...@gmail.com 
>> <mailto:mbu...@gmail.com>> wrote:
>> Thanks Michael, maybe using the boolean property as label would help but 
>> it's very complicated for us to do that because we are allowing the users to 
>> build their own data model, so there might be a lot of boolean properties, 
>> and we would have to turn all those properties into labels, making the 
>> database very complex and the way to retrieve data too.
>> Also, it's not just about booleans, we also have other cases like filtering 
>> by strings instead of boolean, and that's slow too.
>> The players/clubs graph was something i made up to protect our data. 
>> This is what we get when we try to filter on 2 properties on related nodes.
>> 
>> neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE 
>> n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`;
>> +---------+
>> | Count   |
>> +---------+
>> | 2127237 |
>> +---------+
>> 1 row
>> 5747 ms
>> 
>> Compiler CYPHER 3.1
>> 
>> Planner COST
>> 
>> Runtime INTERPRETED
>> 
>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   | 
>> Other                                         |
>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>> | +ProduceResults   |            498 |       1 |       0 | Count       | 
>> Count                                         |
>> | |                 
>> +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +EagerAggregation |            498 |       1 |       0 | Count       |     
>>                                           |
>> | |                 
>> +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +Filter           |         248298 | 2127237 | 4815283 | n, n1, r1   | 
>> n.prop1 == {  AUTOSTRING0} AND n:Label2       |
>> | |                 
>> +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 | 
>> (n1)<-[r1:REL1]-(n)                           |
>> | |                 
>> +----------------+---------+---------+-------------+-----------------------------------------------+
>> | +NodeIndexSeek    |              1 |       1 |       2 | n1          | 
>> :Label1(prop2)                                |
>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>> 
>> Total database accesses: 7259857
>> 
>> 
>> 
>> El miércoles, 28 de diciembre de 2016, 8:06:45 (UTC-3), Michael Hunger 
>> escribió:
>> I would turn boolean properties, like international into a label.
>> 
>> 
>> like this:
>> MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name 
>> <http://n1.name/>='FC Barcelona' RETURN COUNT(*) AS `Count`
>> 
>> Something seems to be off with your data too.
>> 
>> As there are hardly 2.2M international players in FC Barcelona (and 2.4M in 
>> total) it will take a bit to expand / load them.
>> What kind of Hardware do you run this on?
>> 
>> Or is this some computer game and players there?
>> 
>> Please also try something like this:
>> 
>> MATCH (c:Club {name:"FC Barcelona"})
>> MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c)
>> RETURN count(*);
>> 
>> Michael
>> 
>> 
>> On Tue, Dec 27, 2016 at 9:38 PM, Matias Burak <mbu...@gmail.com <>> wrote:
>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE 
>> n.`international` = true AND n1.name <http://n1.name/>='FC Barcelona' RETURN 
>> COUNT(*) AS `Count`;
>> +---------+
>> | Count   |
>> +---------+
>> | 2181771 |
>> +---------+
>> 1 row
>> 7912 ms
>> 
>> Compiler CYPHER 3.1
>> 
>> Planner COST
>> 
>> Runtime INTERPRETED
>> 
>> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   | 
>> Other                                            |
>> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
>> | +ProduceResults   |            593 |       1 |       0 | Count       | 
>> Count                                            |
>> | |                 
>> +----------------+---------+---------+-------------+--------------------------------------------------+
>> | +EagerAggregation |            593 |       1 |       0 | Count       |     
>>                                              |
>> | |                 
>> +----------------+---------+---------+-------------+--------------------------------------------------+
>> | +Filter           |         351532 | 2181771 | 4869817 | n, n1, r1   | 
>> n.international == {  AUTOBOOL0} AND n:Player    |
>> | |                 
>> +----------------+---------+---------+-------------+--------------------------------------------------+
>> | +Expand(All)      |         815288 | 2444571 | 2444572 | n, r1 -- n1 | 
>> (n1)<-[r1:CLUB]-(n)                              |
>> | |                 
>> +----------------+---------+---------+-------------+--------------------------------------------------+
>> | +NodeIndexSeek    |              1 |       1 |       2 | n1          | 
>> :Club(name)                                      |
>> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+
>> 
>> Total database accesses: 7314391
>> 
>> That's the actual query and the execution plan. And yes, we do have indexes 
>> on :Club(name) and :Player(international)
>> 
>> 
>> 
>> 
>> El martes, 27 de diciembre de 2016, 1:00:41 (UTC-3), Max De Marzi Jr. 
>> escribió:
>> 
>> Have you created Indexes for the fields you will be searching on? For 
>> example:
>> 
>> CREATE INDEX ON :Club(name)
>> 
>> What is the actual query you are trying to optimize and what is the actual 
>> query execution plan of the query?
>> 
>> On Sunday, December 25, 2016 at 1:17:30 PM UTC-6, Matias Burak wrote:
>> Well, actually I was simplifying the query but what we really need to do is 
>> to filter by some property in club too (like club.name <http://club.name/> = 
>> 'FC Barcelona') so that solution wouldn't work...
>> 
>> 
>> El 25 de dic. de 2016 13:27 -0300, Max De Marzi Jr. <maxde...@gmail.com <>>, 
>> escribió:
>>> Try:
>>> 
>>> MATCH (n:`Player`) WHERE n.`international` = true RETURN 
>>> SUM(size((n)-[:CLUB]->()) ) AS `Count`;
>>> 
>>> On Saturday, December 24, 2016 at 3:55:26 AM UTC-6, Matias Burak wrote:
>>> We are working with the latest 3.1 version, we have a database of around 
>>> 3.5M nodes and 100M relationships.
>>> Doing some basic queries is very slow that makes the application unusable.
>>> 
>>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE 
>>> n.`international` = true RETURN COUNT(*) AS `Count`;
>>> +---------+
>>> | Count   |
>>> +---------+
>>> | 2181771 <> |
>>> +---------+
>>> 1 row
>>> 28676 ms
>>> 
>>> Compiler CYPHER 3.1
>>> 
>>> Planner COST
>>> 
>>> Runtime INTERPRETED
>>> 
>>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>>> | Operator          | Estimated Rows | Rows    | DB Hits | Variables   | 
>>> Other                                         |
>>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>>> | +ProduceResults   |           1027 |       1 |       0 | Count       | 
>>> Count                                         |
>>> | |                 
>>> +----------------+---------+---------+-------------+-----------------------------------------------+
>>> | +EagerAggregation |           1027 |       1 |       0 | Count       |    
>>>                                            |
>>> | |                 
>>> +----------------+---------+---------+-------------+-----------------------------------------------+
>>> | +Filter           |        1054596 <> | 2181771 <> | 4872405 <> | n, n1, 
>>> r1   | n.international == {  AUTOBOOL0} AND n:Player |
>>> | |                 
>>> +----------------+---------+---------+-------------+-----------------------------------------------+
>>> | +Expand(All)      |        2445865 <> | 2445865 <> | 2445868 <> | n, r1 
>>> -- n1 | (n1)<-[r1:CLUB]-(n)                           |
>>> | |                 
>>> +----------------+---------+---------+-------------+-----------------------------------------------+
>>> | +NodeByLabelScan  |              3 |       3 |       4 | n1          | 
>>> :Club                                         |
>>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+
>>> 
>>> Total database accesses: 7318277 <>
>>> There are around 2.3M players and 3 clubs. 
>>> It's taking 28secs the first time, then in takes around 9-10 secs.
>>> It's really annoying because it's a very simple query.
>>> Is there anything we are doing wrong? Is this something we can do about or 
>>> it's the expected behaviour? This is running on a 16gbRAM machine, getting 
>>> more RAM is the only way to improve it?
>>> 
>>> Thanks,
>>> Matias. 
>>>  
>>> 
>>> --
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "Neo4j" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe 
>>> <https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe>.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> neo4j+un...@googlegroups.com <>.
>>> For more options, visit https://groups.google.com/d/optout 
>>> <https://groups.google.com/d/optout>.
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to neo4j+un...@googlegroups.com <>.
>> For more options, visit https://groups.google.com/d/optout 
>> <https://groups.google.com/d/optout>.
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to neo4j+unsubscr...@googlegroups.com 
>> <mailto:neo4j+unsubscr...@googlegroups.com>.
>> For more options, visit https://groups.google.com/d/optout 
>> <https://groups.google.com/d/optout>.
>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Neo4j" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe 
>> <https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe>.
>> To unsubscribe from this group and all its topics, send an email to 
>> neo4j+unsubscr...@googlegroups.com 
>> <mailto:neo4j+unsubscr...@googlegroups.com>.
>> For more options, visit https://groups.google.com/d/optout 
>> <https://groups.google.com/d/optout>.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to neo4j+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [Neo4j] Re: Poor performance on relatively big db

Reply via email to