Ok, the USING INDEX had to go before WHERE, but doing that the query never finishes…
Also, the first query you proposed is slower than the original. > El 30 dic. 2016, a las 11:32, Matias Burak <mbu...@gmail.com> escribió: > > I think most of the slowness is when doing count. If I just get a small set > doing limit it works fine. > > Also, while trying to test your suggestions, I wan’t able to get USING INDEX > to work, is there anything wrong in the syntax? > > neo4j> profile > MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb' > USING INDEX n1:Label1(prop1) > MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa' > USING INDEX n:Label2(prop2) > RETURN COUNT(*) AS `Count`; > Invalid input 'S': expected 'n/N' (line 4, column 2 (offset: 56)) > "USING INDEX n1:Label1(prop1)" > ^ > > Also, how can I make sure is it completely memory mapped? I have 16gb on the > environment I’m testing it and and I left the neo4j defaults for memory. > >> El 29 dic. 2016, a las 18:12, 'Michael Hunger' via Neo4j >> <neo4j@googlegroups.com <mailto:neo4j@googlegroups.com>> escribió: >> >> In 3.1 (and then also 3.2) >> you could also use the compiled runtime (except for the count) >> >> So a query like >> >> cypher runtime=compiledExperimentalFeatureNotSupportedForProductionUse >> profile >> MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE n.`prop1` = 'aaa' AND >> n1.`prop2`='bbb' RETURN 1 >> >> should use the compiled runtime and run faster than the interpreted runtime. >> In 3.2 it will be again (runtime=compiled) or actually the default in >> enterprise. >> >> Michael >> >> >> On Thu, Dec 29, 2016 at 10:07 PM, Michael Hunger >> <michael.hun...@neotechnology.com <mailto:michael.hun...@neotechnology.com>> >> wrote: >> It depends a bit. >> >> If you can aggressively filter on both sides (doesn't seem so with your > 2M >> results). >> >> Then you can do an index lookup for both sides and do a WHERE on the cross >> product. >> >> The other thing you can try is this (make sure n1 is the label with smaller >> cardinality, which you can determine upfront quickly, by running MATCH >> (:Label)-[:TYPE]->() return count(*) >> >> profile >> MATCH (n:`Label2`) WHERE n.`prop1` = 'aaa' WITH collect(n) as nodes >> MATCH (n1:`Label1`)<-[:REL1]-(n) WHERE n1.`prop2`='bbb' AND n IN nodes >> RETURN COUNT(*) AS `Count`; >> >> >> profile >> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb' >> USING INDEX n1:Label1(prop1) >> MATCH (n:`Label2`)-[:REL1]->(n1) WHERE n.`prop2`='aaa' >> USING INDEX n:Label2(prop2) >> RETURN COUNT(*) AS `Count`; >> >> profile >> MATCH (n1:`Label1`) WHERE n1.`prop1` = 'bbb' >> USING INDEX n1:Label1(prop1) >> MATCH (n:`Label2`) WHERE n.`prop2`='aaa' AND (n)-[:REL1]->(n1) >> USING INDEX n:Label2(prop2) >> RETURN COUNT(*) AS `Count`; >> >> Please try these variants, I can imagine reasons for either being faster. >> And let us know. >> >> Your store is not that big (3M nodes 100M rels) but have you made sure it is >> completely memory mapped? >> >> Michael >> >> On Thu, Dec 29, 2016 at 3:10 PM, Matias Burak <mbu...@gmail.com >> <mailto:mbu...@gmail.com>> wrote: >> Thanks Michael, maybe using the boolean property as label would help but >> it's very complicated for us to do that because we are allowing the users to >> build their own data model, so there might be a lot of boolean properties, >> and we would have to turn all those properties into labels, making the >> database very complex and the way to retrieve data too. >> Also, it's not just about booleans, we also have other cases like filtering >> by strings instead of boolean, and that's slow too. >> The players/clubs graph was something i made up to protect our data. >> This is what we get when we try to filter on 2 properties on related nodes. >> >> neo4j-sh (?)$ profile MATCH (n1:`Label1`)<-[r1:REL1]-(n:`Label2`) WHERE >> n.`prop1` = 'aaa' AND n1.`prop2`='bbb' RETURN COUNT(*) AS `Count`; >> +---------+ >> | Count | >> +---------+ >> | 2127237 | >> +---------+ >> 1 row >> 5747 ms >> >> Compiler CYPHER 3.1 >> >> Planner COST >> >> Runtime INTERPRETED >> >> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+ >> | Operator | Estimated Rows | Rows | DB Hits | Variables | >> Other | >> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+ >> | +ProduceResults | 498 | 1 | 0 | Count | >> Count | >> | | >> +----------------+---------+---------+-------------+-----------------------------------------------+ >> | +EagerAggregation | 498 | 1 | 0 | Count | >> | >> | | >> +----------------+---------+---------+-------------+-----------------------------------------------+ >> | +Filter | 248298 | 2127237 | 4815283 | n, n1, r1 | >> n.prop1 == { AUTOSTRING0} AND n:Label2 | >> | | >> +----------------+---------+---------+-------------+-----------------------------------------------+ >> | +Expand(All) | 815288 | 2444571 | 2444572 | n, r1 -- n1 | >> (n1)<-[r1:REL1]-(n) | >> | | >> +----------------+---------+---------+-------------+-----------------------------------------------+ >> | +NodeIndexSeek | 1 | 1 | 2 | n1 | >> :Label1(prop2) | >> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+ >> >> Total database accesses: 7259857 >> >> >> >> El miércoles, 28 de diciembre de 2016, 8:06:45 (UTC-3), Michael Hunger >> escribió: >> I would turn boolean properties, like international into a label. >> >> >> like this: >> MATCH (n1:`Club`)<-[r1:CLUB]-(n:International) WHERE n1.name >> <http://n1.name/>='FC Barcelona' RETURN COUNT(*) AS `Count` >> >> Something seems to be off with your data too. >> >> As there are hardly 2.2M international players in FC Barcelona (and 2.4M in >> total) it will take a bit to expand / load them. >> What kind of Hardware do you run this on? >> >> Or is this some computer game and players there? >> >> Please also try something like this: >> >> MATCH (c:Club {name:"FC Barcelona"}) >> MATCH (p:Player:International) WHERE (p)-[:CLUB]->(c) >> RETURN count(*); >> >> Michael >> >> >> On Tue, Dec 27, 2016 at 9:38 PM, Matias Burak <mbu...@gmail.com <>> wrote: >> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE >> n.`international` = true AND n1.name <http://n1.name/>='FC Barcelona' RETURN >> COUNT(*) AS `Count`; >> +---------+ >> | Count | >> +---------+ >> | 2181771 | >> +---------+ >> 1 row >> 7912 ms >> >> Compiler CYPHER 3.1 >> >> Planner COST >> >> Runtime INTERPRETED >> >> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+ >> | Operator | Estimated Rows | Rows | DB Hits | Variables | >> Other | >> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+ >> | +ProduceResults | 593 | 1 | 0 | Count | >> Count | >> | | >> +----------------+---------+---------+-------------+--------------------------------------------------+ >> | +EagerAggregation | 593 | 1 | 0 | Count | >> | >> | | >> +----------------+---------+---------+-------------+--------------------------------------------------+ >> | +Filter | 351532 | 2181771 | 4869817 | n, n1, r1 | >> n.international == { AUTOBOOL0} AND n:Player | >> | | >> +----------------+---------+---------+-------------+--------------------------------------------------+ >> | +Expand(All) | 815288 | 2444571 | 2444572 | n, r1 -- n1 | >> (n1)<-[r1:CLUB]-(n) | >> | | >> +----------------+---------+---------+-------------+--------------------------------------------------+ >> | +NodeIndexSeek | 1 | 1 | 2 | n1 | >> :Club(name) | >> +-------------------+----------------+---------+---------+-------------+--------------------------------------------------+ >> >> Total database accesses: 7314391 >> >> That's the actual query and the execution plan. And yes, we do have indexes >> on :Club(name) and :Player(international) >> >> >> >> >> El martes, 27 de diciembre de 2016, 1:00:41 (UTC-3), Max De Marzi Jr. >> escribió: >> >> Have you created Indexes for the fields you will be searching on? For >> example: >> >> CREATE INDEX ON :Club(name) >> >> What is the actual query you are trying to optimize and what is the actual >> query execution plan of the query? >> >> On Sunday, December 25, 2016 at 1:17:30 PM UTC-6, Matias Burak wrote: >> Well, actually I was simplifying the query but what we really need to do is >> to filter by some property in club too (like club.name <http://club.name/> = >> 'FC Barcelona') so that solution wouldn't work... >> >> >> El 25 de dic. de 2016 13:27 -0300, Max De Marzi Jr. <maxde...@gmail.com <>>, >> escribió: >>> Try: >>> >>> MATCH (n:`Player`) WHERE n.`international` = true RETURN >>> SUM(size((n)-[:CLUB]->()) ) AS `Count`; >>> >>> On Saturday, December 24, 2016 at 3:55:26 AM UTC-6, Matias Burak wrote: >>> We are working with the latest 3.1 version, we have a database of around >>> 3.5M nodes and 100M relationships. >>> Doing some basic queries is very slow that makes the application unusable. >>> >>> neo4j-sh (?)$ profile MATCH (n1:`Club`)<-[r1:CLUB]-(n:`Player`) WHERE >>> n.`international` = true RETURN COUNT(*) AS `Count`; >>> +---------+ >>> | Count | >>> +---------+ >>> | 2181771 <> | >>> +---------+ >>> 1 row >>> 28676 ms >>> >>> Compiler CYPHER 3.1 >>> >>> Planner COST >>> >>> Runtime INTERPRETED >>> >>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+ >>> | Operator | Estimated Rows | Rows | DB Hits | Variables | >>> Other | >>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+ >>> | +ProduceResults | 1027 | 1 | 0 | Count | >>> Count | >>> | | >>> +----------------+---------+---------+-------------+-----------------------------------------------+ >>> | +EagerAggregation | 1027 | 1 | 0 | Count | >>> | >>> | | >>> +----------------+---------+---------+-------------+-----------------------------------------------+ >>> | +Filter | 1054596 <> | 2181771 <> | 4872405 <> | n, n1, >>> r1 | n.international == { AUTOBOOL0} AND n:Player | >>> | | >>> +----------------+---------+---------+-------------+-----------------------------------------------+ >>> | +Expand(All) | 2445865 <> | 2445865 <> | 2445868 <> | n, r1 >>> -- n1 | (n1)<-[r1:CLUB]-(n) | >>> | | >>> +----------------+---------+---------+-------------+-----------------------------------------------+ >>> | +NodeByLabelScan | 3 | 3 | 4 | n1 | >>> :Club | >>> +-------------------+----------------+---------+---------+-------------+-----------------------------------------------+ >>> >>> Total database accesses: 7318277 <> >>> There are around 2.3M players and 3 clubs. >>> It's taking 28secs the first time, then in takes around 9-10 secs. >>> It's really annoying because it's a very simple query. >>> Is there anything we are doing wrong? Is this something we can do about or >>> it's the expected behaviour? This is running on a 16gbRAM machine, getting >>> more RAM is the only way to improve it? >>> >>> Thanks, >>> Matias. >>> >>> >>> -- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "Neo4j" group. >>> To unsubscribe from this topic, visit >>> https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe >>> <https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe>. >>> To unsubscribe from this group and all its topics, send an email to >>> neo4j+un...@googlegroups.com <>. >>> For more options, visit https://groups.google.com/d/optout >>> <https://groups.google.com/d/optout>. >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Neo4j" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to neo4j+un...@googlegroups.com <>. >> For more options, visit https://groups.google.com/d/optout >> <https://groups.google.com/d/optout>. >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Neo4j" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to neo4j+unsubscr...@googlegroups.com >> <mailto:neo4j+unsubscr...@googlegroups.com>. >> For more options, visit https://groups.google.com/d/optout >> <https://groups.google.com/d/optout>. >> >> >> >> -- >> You received this message because you are subscribed to a topic in the >> Google Groups "Neo4j" group. >> To unsubscribe from this topic, visit >> https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe >> <https://groups.google.com/d/topic/neo4j/uZtQWJOC1HE/unsubscribe>. >> To unsubscribe from this group and all its topics, send an email to >> neo4j+unsubscr...@googlegroups.com >> <mailto:neo4j+unsubscr...@googlegroups.com>. >> For more options, visit https://groups.google.com/d/optout >> <https://groups.google.com/d/optout>. > -- You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.