Re: [Neo4j] Re: Why is Neo4j slower(totally dead) with many nodes and relationships in lower specification of pc/notebook while MySQL is not?

Rio Eduardo Mon, 31 Mar 2014 04:09:27 -0700

Hello again Michael.

I just want to make sure that my queries for finding friends of friends 
at depths four and five are correct. Could you please check them?


Query at depth of four:
MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
WHERE U.user_id=1
WITH DISTINCT U, FU, FFU
WHERE FFU<>U 
WITH DISTINCT U, FU, FFU
MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
WHERE FFFU<>FU
WITH DISTINCT U, FFU, FFFU
MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
RETURN DISTINCT FFFFU.username;
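
A minimal cross-check for the query above, assuming the same User/Friend model and a test graph small enough to enumerate every path: a single variable-length pattern, where `*4` means exactly four hops. This is only a sketch for comparing counts, not a replacement. It skips the intermediate-node filters, so the totals may differ slightly, and it walks every path, so it will be slow on a dense graph. The names X and users_at_depth_four are made up for illustration.

```cypher
// Hypothetical cross-check: users reachable from user 1 over a path of
// exactly four Friend relationships, excluding user 1 and direct friends.
MATCH (U:User)-[:Friend*4]->(X:User)
WHERE U.user_id = 1
  AND X <> U
  AND NOT (U)-[:Friend]->(X)
RETURN count(DISTINCT X) AS users_at_depth_four;
```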

Query at depth of five:
MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
WHERE U.user_id=1
WITH DISTINCT U, FU, FFU
WHERE FFU<>U 
WITH DISTINCT U, FU, FFU
MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
WHERE FFFU<>FU
WITH DISTINCT U, FFU, FFFU
MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
WHERE FFFFU<>FFU
WITH DISTINCT U, FFFU, FFFFU
MATCH (FFFFU:User)-[FFFFF:Friend]->(FFFFFU:User)
WHERE FFFFFU<>FFFU AND FFFFFU<>U AND NOT (U)-[:Friend]->(FFFFFU)
RETURN DISTINCT FFFFFU.username;
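
The effect of the intermediate WITH DISTINCT steps can be made visible by counting matched paths next to distinct end nodes. A sketch under the same assumptions as the queries above (the column aliases are made up for illustration):

```cypher
// Compare how many two-hop paths are matched with how many
// distinct second-level users those paths actually end at.
MATCH (U:User)-[:Friend]->(FU:User)-[:Friend]->(FFU:User)
WHERE U.user_id = 1 AND FFU <> U
RETURN count(*) AS two_hop_paths,
       count(DISTINCT FFU) AS distinct_second_level_users;
```

With roughly 50 friends per user the path count is on the order of 50 * 50 = 2500, while the distinct count can never exceed the number of User nodes. Carrying only the distinct nodes into the next MATCH is what keeps the row count from multiplying by about 50 at every hop.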

I need your help so much.
Thank you.

On Sunday, March 30, 2014 7:42:27 PM UTC+7, Michael Hunger wrote:
>
> Split it up with one more intermediate step. The intermediate steps are 
> there to get the cardinality down, so it doesn't have to match billions of 
> paths, only millions or hundreds of thousands.
>
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.user_id=1
> WITH DISTINCT U, FU, FFU
> WHERE FFU<>U 
> WITH DISTINCT U, FFU
> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
> WHERE NOT (U)-[:Friend]->(FFFU)
> RETURN distinct FFFU.username;
>
>
>
>
> On Sun, Mar 30, 2014 at 1:29 PM, Rio Eduardo <rioedu...@gmail.com> wrote:
>
>> Please help me again, Michael.
>>
>> You once said:
>>
>> I would also change:
>>
>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>> RETURN FFU.username
>>
>> to
>>
>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>> WHERE U.user_id=1 
>> WITH distinct U, FFU
>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>> RETURN FFU.username
>>
>> The query above finds friends of friends at depth two. I would like to 
>> find friends of friends at depth three, but when I follow the model of 
>> your query, it takes longer than mine and returns many more rows. Here 
>> is the model of your query at depth three:
>>
>> MATCH 
>> (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>> WHERE U.user_id=1
>> WITH DISTINCT U, FU, FFU, FFFU
>> WHERE FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>> RETURN FFFU.username;
>>
>> ...
>>
>> 118858 rows
>> 20090 ms
>>
>> Mine:
>> MATCH 
>> (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>> WHERE U.user_id=1 AND FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>> RETURN DISTINCT FFFU.username;
>>
>> ...
>>
>> 950 rows
>> 18133 ms
>>
>> Please help me: why does the model of your query take longer than mine 
>> and return many more rows?
>>
>> Thank you.
>>
>>
>>
>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>
>>> Rio,
>>>
>>> Was this your first run of both statements? If so, please run them a 
>>> second time.
>>> And did you create an index or constraint for :User(user_id)?
>>>
>>> MATCH (U:User) RETURN COUNT(U);
>>>
>>> I would also change:
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.username
>>>
>>> to
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1 
>>> WITH distinct U, FFU
>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.username
>>>  
>>> I quickly created a dataset on my machine:
>>>
>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>
>>> create constraint on (u:User) assert u.id is unique;  
>>>
>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1 create 
>>> (u1)-[:Friend]->(u2);
>>>
>>> Relationships created: 99974
>>>
>>> 778 ms
>>>
>>> match (u:User) return count(*);
>>>
>>> +----------+
>>> | count(*) |
>>> +----------+
>>> | 1000     |
>>> +----------+
>>> 1 row
>>> *4 ms*
>>>
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.id=1 
>>> WITH distinct U, FFU
>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.id;
>>>
>>> ...
>>>
>>> 910 rows
>>>
>>> 101 ms
>>>
>>> but even your query takes only
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.id;
>>>
>>> ...
>>>
>>> 8188 rows
>>>
>>> 578 ms
>>>
>>>
>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <lundin....@gmail.com> wrote:
>>> >
>>> > ms, it is milliseconds.
>>> >
>>> > What is the corresponding result for a SQL db ?
>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>> >
>>> > Albeit a valid search, is it something useful? I would think finding a
>>> > specific person's FoFoF at either end, as a starting point or end point,
>>> > would be a very realistic scenario. Add an index on :User(name), query
>>> > for a User named Rio, and try to find his FoFoF.
>>> >
>>> > Yes, Neo4j has kindly exposed various functions in Cypher, such as
>>> > shortestPath:
>>> > http://docs.neo4j.org/refcard/2.0/
>>> >
>>> > Also look at some gist examples
>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>> >
>>> > On Friday, March 28, 2014 05:00:22 UTC+1, Rio Eduardo wrote:
>>> >>
>>> >> Thank you so much for the reply, Lundin. I really appreciate it.
>>> >> Yesterday I ran my experiment again, and the result was not what I
>>> >> expected. Before testing 1M users, I reduced the number of users to
>>> >> 1000 and tested not in my social network but directly in the database
>>> >> (Neo4j Shell), to check that the slowness was not caused by the
>>> >> performance of my PC. But counting the 1000 users took 200 ms (1 row),
>>> >> and returning friends at depth two took 85000 ms (2500 rows). Are
>>> >> 200 ms and 85000 ms fast to you? And what does ms stand for:
>>> >> milliseconds or microseconds?
>>> >>
>>> >> the query I use for returning 1000 users is
>>> >> MATCH (U:User) RETURN COUNT(U);
>>> >>
>>> >> and the query I use for returning friends at depth of two is
>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>> >> RETURN FFU.username
>>> >>
>>> >> Please note that I tested with the default configuration of Neo4j,
>>> >> with 1000 randomly generated user nodes and 50000 random friend
>>> >> relationships (each user has 50 friends). Each relationship has the
>>> >> type Friend and no properties. Each node has the label User and 4
>>> >> properties: user_id, username, password and profile_picture. Each
>>> >> property value is 1-60 characters long: user_id values range from 1 to
>>> >> 1000, every username is 10 random characters, every password is 60
>>> >> characters because I MD5 it, and profile_picture is 1-60 characters.
>>> >>
>>> >> As for your statement "Otherwise, if you really need to present that
>>> >> many "things", just page the result with SKIP and LIMIT. It has never
>>> >> made sense to present 1M of anything at a time to a user.": I already
>>> >> did that, but it is still the same. Neo4j is still slower at returning
>>> >> results.
>>> >>
>>> >> I'm also wondering whether Neo4j already has graph algorithms
>>> >> (shortest path, Dijkstra, A*, etc.) built in.
>>> >>
>>> >> Thank you.
>>> >>
>>> >>
>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>> >>>
>>> >>> Rio, any version will do. They can all handle millions of nodes on
>>> >>> common hardware, no magic at all. With hundreds of millions or
>>> >>> billions, we might need to look into the specification in more detail.
>>> >>> But with that kind of data there are other bottlenecks for a social
>>> >>> network or any web app that need to be taken care of as well.
>>> >>>
>>> >>> you said:
>>> >>>>
>>> >>>> Given any two persons chosen at random, is there a path that
>>> >>>> connects them that is at most five relationships long? For a social
>>> >>>> network containing 1,000,000 people, each with approximately 50
>>> >>>> friends, the results strongly suggest that graph databases are the
>>> >>>> best choice for connected data. And a graph database can still work
>>> >>>> 150 times faster than a relational database at the third degree and
>>> >>>> 1000 times faster at the fourth degree.
>>> >>>
>>> >>>
>>> >>> I fail to see how this is connected to your attempt to list 1M users
>>> >>> in one go on the first page. You would want to check whether there is
>>> >>> a relationship and return that path between users. You need two start
>>> >>> nodes and seek a path by traversing the relationships rather than
>>> >>> scanning tables; that would be the comparison.
>>> >>> Otherwise, if you really need to present that many "things", just
>>> >>> page the result with SKIP and LIMIT. It has never made sense to
>>> >>> present 1M of anything at a time to a user. Again, that wouldn't
>>> >>> really do your experiment much good in proving graph theory.
>>> >>>
>>> >>> What is the result of MATCH(U:User) RETURN count(U); ?
>>> >>>
>>> >>> Also, when you run your test, make sure to account for the warm/cold
>>> >>> cache effect (better/worse performance).
>>> >>>
>>> >>> On Thursday, March 27, 2014 17:57:10 UTC+1, Rio Eduardo wrote:
>>> >>>>
>>> >>>> I just learned about memory allocation and read Neo4j's Server
>>> >>>> Performance Tuning guide. In neo4j.properties:
>>> >>>> # Default values for the low-level graph engine
>>> >>>>
>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>> >>>>
>>> >>>> Should I change these to get better performance? If so, please
>>> >>>> suggest values.
>>> >>>>
>>> >>>> I also just learned about the Neo4j licenses: Community, Personal,
>>> >>>> Startups, Business and Enterprise. All of their features are
>>> >>>> explained on the Neo4j website. So which Neo4j should I use for my
>>> >>>> case, which has millions of nodes and relationships?
>>> >>>>
>>> >>>> Please answer. I need your help so much.
>>> >>>>
>>> >>>> Thanks.
>>> >>>>
>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
>>> >>>>>
>>> >>>>> I'm testing my thesis, which is about transforming a relational
>>> >>>>> database into a graph database. After the transformation, I will
>>> >>>>> compare their performance by query response time and throughput.
>>> >>>>> For the relational database I use MySQL, and for the graph database
>>> >>>>> I use Neo4j. I will have 3 million more nodes and 6 million more
>>> >>>>> relationships. But when I had added just 60000 nodes, Neo4j was
>>> >>>>> already dead: when I tried to return all 60000 nodes, it returned
>>> >>>>> "unknown". I did the same in MySQL, adding 60000 records, and it
>>> >>>>> could return all 60000 records. It's weird because it contradicts
>>> >>>>> the papers I read, which say that graph databases are faster than
>>> >>>>> relational databases. So why is Neo4j slower (totally dead) on a
>>> >>>>> lower-specification PC/notebook while MySQL is not? And what
>>> >>>>> PC/notebook specification should I use to get the best performance
>>> >>>>> when testing with millions of nodes and relationships?
>>> >>>>>
>>> >>>>> Thank you.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google 
>>> Groups "Neo4j" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to neo4j+un...@googlegroups.com.
>>>
>>> > For more options, visit https://groups.google.com/d/optout.
>>>
