Re: [Neo4j] Re: Why is Neo4j slower(totally dead) with many nodes and relationships in lower specification of pc/notebook while MySQL is not?

Michael Hunger Mon, 31 Mar 2014 06:09:49 -0700

Just use a dataset that you can reason about and check whether your queries
return the correct results on it.
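
As a rough illustration of that kind of check (this is not Neo4j code; it is a
minimal Python sketch, and the `friends` dict and the `depth_four` helper are
made-up names), the steps of the depth-four query can be mirrored on a
five-user graph small enough to verify by hand. It assumes no self-loops and at
most one Friend relationship per ordered pair of users, and it returns user ids
instead of usernames:

```python
# Tiny hand-checkable graph: user_id -> set of user_ids reached via :Friend.
# Chain 1 -> 2 -> 3 -> 4 -> 5, plus 4 -> 1 to exercise the FFFFU<>U filter.
friends = {1: {2}, 2: {3}, 3: {4}, 4: {5, 1}, 5: set()}


def depth_four(friends, start):
    """Mirror the depth-four Cypher query step by step (hypothetical helper)."""
    # MATCH (U)-->(FU)-->(FFU) WHERE FFU<>U, then WITH DISTINCT U, FU, FFU
    step2 = {(fu, ffu)
             for fu in friends[start]
             for ffu in friends[fu]
             if ffu != start}
    # MATCH (FFU)-->(FFFU) WHERE FFFU<>FU, then WITH DISTINCT U, FFU, FFFU
    step3 = {(ffu, fffu)
             for fu, ffu in step2
             for fffu in friends[ffu]
             if fffu != fu}
    # MATCH (FFFU)-->(FFFFU) WHERE FFFFU<>FFU AND FFFFU<>U
    #   AND NOT (U)-[:Friend]->(FFFFU), then RETURN DISTINCT
    return {ffffu
            for ffu, fffu in step3
            for ffffu in friends[fffu]
            if ffffu != ffu and ffffu != start and ffffu not in friends[start]}


print(depth_four(friends, 1))  # only user 5 is four hops away from user 1
```

If the real query returns anything other than the user you worked out on paper
for a graph like this, the query is wrong, not the database.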


It's hard for me to be the consistency checker for your queries :)

In general, if you really want to do these deep traversals, you might be
better off (in terms of performance) using the traversal API with an
appropriate uniqueness constraint, like node-path.
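
To illustrate what node-path uniqueness means (a plain Python model, not the
Neo4j traversal API itself, which is a Java API; `nodes_at_depth` and the sample
`friends` graph are invented for the example): a node may show up in many
different paths, but never twice within the same path.

```python
def nodes_at_depth(friends, start, depth):
    """Return the end nodes of all outgoing paths of exactly `depth` hops
    in which no node occurs twice (a model of node-path uniqueness)."""
    found = set()

    def walk(node, path, remaining):
        if remaining == 0:
            found.add(node)
            return
        for nxt in friends.get(node, ()):
            # Node-path uniqueness: skip a node already on the current path.
            if nxt not in path:
                walk(nxt, path | {nxt}, remaining - 1)

    walk(start, {start}, depth)
    return found


# Sample graph with cycles back to user 1 (2 -> 1 and 4 -> 1).
friends = {1: [2, 3], 2: [3, 1], 3: [4], 4: [1, 5], 5: []}
print(nodes_at_depth(friends, 1, 3))  # users 4 and 5
```

The cycles back to user 1 are pruned as soon as they appear, so the traversal
never expands paths that the WHERE clauses in the Cypher version would only
filter out afterwards.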




On Mon, Mar 31, 2014 at 1:09 PM, Rio Eduardo <rioeduard...@gmail.com> wrote:

> Hello again Michael.
>
> I just want to make sure that my query is correct to find friends of
> friends at depth of four and five. Please help me by checking my query.
>
> Query at depth of four:
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.user_id=1
> WITH DISTINCT U, FU, FFU
> WHERE FFU<>U
> WITH DISTINCT U, FU, FFU
> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
> WHERE FFFU<>FU
> WITH DISTINCT U, FFU, FFFU
> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
> WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
> RETURN DISTINCT FFFFU.username;
>
> Query at depth of five:
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.user_id=1
> WITH DISTINCT U, FU, FFU
> WHERE FFU<>U
> WITH DISTINCT U, FU, FFU
> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
> WHERE FFFU<>FU
> WITH DISTINCT U, FFU, FFFU
> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
> WHERE FFFFU<>FFU
> WITH DISTINCT U, FFFU, FFFFU
> MATCH (FFFFU:User)-[FFFFF:Friend]->(FFFFFU:User)
> WHERE FFFFFU<>FFFU AND FFFFFU<>U AND NOT (U)-[:Friend]->(FFFFFU)
> RETURN DISTINCT FFFFFU.username;
>
> I need your help so much.
> Thank you.
>
>
> On Sunday, March 30, 2014 7:42:27 PM UTC+7, Michael Hunger wrote:
>
>> Split it up with one more intermediate step. The intermediate steps are
>> there to get the cardinality down, so it doesn't have to match billions of
>> paths, only millions or 100k.
>>
>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>> WHERE U.user_id=1
>> WITH DISTINCT U, FU, FFU
>> WHERE FFU<>U
>> WITH DISTINCT U, FFU
>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>> WHERE NOT (U)-[:Friend]->(FFFU)
>> RETURN distinct FFFU.username;
>>
>>
>>
>>
>> On Sun, Mar 30, 2014 at 1:29 PM, Rio Eduardo <rioedu...@gmail.com> wrote:
>>
>>> Please help me again Michael.
>>>
>>> You once said:
>>>
>>> I would also change:
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.username
>>>
>>> to
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1
>>> WITH distinct U, FFU
>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.username
>>>
>>> The query above finds friends of friends at depth two. I would like to
>>> find friends of friends at depth three, but when I follow the model of
>>> your query, it takes longer than mine and returns many more rows than
>>> mine. Here is your query model at depth three:
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>> WHERE U.user_id=1
>>> WITH DISTINCT U, FU, FFU, FFFU
>>> WHERE FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>> RETURN FFFU.username;
>>>
>>> ...
>>>
>>> 118858 rows
>>> 20090 ms
>>>
>>> Mine:
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>> WHERE U.user_id=1 AND FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>> RETURN DISTINCT FFFU.username;
>>>
>>> ...
>>>
>>> 950 rows
>>> 18133 ms
>>>
>>> Please help me: why does the query modeled on yours take longer than
>>> mine and return many more results than mine?
>>>
>>> Thank you.
>>>
>>>
>>>
>>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>>
>>>> Rio,
>>>>
>>>> was this your first run of both statements? If so, please run them for
>>>> a second time.
>>>> And did you create an index or constraint for :User(user_id) ?
>>>>
>>>> MATCH (U:User) RETURN COUNT(U);
>>>>
>>>> I would also change:
>>>>
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>> RETURN FFU.username
>>>>
>>>> to
>>>>
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.user_id=1
>>>> WITH distinct U, FFU
>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>> RETURN FFU.username
>>>>
>>>> I quickly created a dataset on my machine:
>>>>
>>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>>
>>>> create constraint on (u:User) assert u.id is unique;
>>>>
>>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1 create
>>>> (u1)-[:Friend]->(u2);
>>>>
>>>> Relationships created: 99974
>>>>
>>>> 778 ms
>>>>
>>>> match (u:User) return count(*);
>>>>
>>>> +----------+
>>>> | count(*) |
>>>> +----------+
>>>> | 1000     |
>>>> +----------+
>>>> 1 row
>>>> *4 ms*
>>>>
>>>>
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.id=1
>>>> WITH distinct U, FFU
>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>> RETURN FFU.id;
>>>>
>>>> ...
>>>>
>>>> 910 rows
>>>>
>>>> 101 ms
>>>>
>>>> but even your query takes only
>>>>
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>>> RETURN FFU.id;
>>>>
>>>> ...
>>>>
>>>> 8188 rows
>>>>
>>>> 578 ms
>>>>
>>>>
>>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <lundin....@gmail.com> wrote:
>>>> >
>>>> > ms, it is milliseconds.
>>>> >
>>>> > What is the corresponding result for a SQL db ?
>>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>>> >
>>>> > Albeit a valid search, is it something useful? I would think finding
>>>> > a specific person's FoFoF at either end, as a starting point or end
>>>> > point, would be a very realistic scenario. Add an index on User:name,
>>>> > query for a User with name:Rio, and try to find his FoFoF.
>>>> >
>>>> > Yes, Neo4j has been kind and exposed various functions in Cypher,
>>>> > like shortestPath:
>>>> > http://docs.neo4j.org/refcard/2.0/
>>>> >
>>>> > Also look at some gist examples
>>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>>> >
>>>> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
>>>> >>
>>>> >> Thank you so much for the reply, Lundin. I really appreciate it.
>>>> >> Yesterday I ran my experiment again, and the result was not what I
>>>> >> expected. Before testing 1M users, I reduced the number of users to
>>>> >> 1000 and tested directly in the database (Neo4j Shell) rather than
>>>> >> through my social network, to rule out the performance of the PC as
>>>> >> the cause. Returning the count of 1000 users took 200 ms (1 row), and
>>>> >> returning friends at depth two took 85000 ms (2500 rows). Are 200 ms
>>>> >> and 85000 ms fast to you? And what does ms stand for: milliseconds or
>>>> >> microseconds?
>>>> >>
>>>> >> the query I use for returning 1000 users is
>>>> >> MATCH (U:User) RETURN COUNT(U);
>>>> >>
>>>> >> and the query I use for returning friends at depth of two is
>>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>> >> RETURN FFU.username
>>>> >>
>>>> >> Please note that I tested with the default configuration of Neo4j. I
>>>> >> created 1000 random User nodes and 50000 random Friend relationships
>>>> >> (each user has 50 friends). Each relationship has the type Friend and
>>>> >> no properties. Each node has the label User and 4 properties:
>>>> >> user_id, username, password and profile_picture. Each property value
>>>> >> is 1-60 characters long: user_id values are 1-1000, all usernames are
>>>> >> 10 random characters, all passwords are 60 characters because I MD5
>>>> >> them, and profile_picture is 1-60 characters.
>>>> >>
>>>> >> As for your statement "Otherwise if you really need to present that
>>>> >> many "things" just paging the result with SKIP,LIMIT. It has never
>>>> >> made sense to present 1M of anything at a time for a user.": I
>>>> >> already did that, but the outcome is still the same; Neo4j returns
>>>> >> results more slowly.
>>>> >>
>>>> >> I'm also wondering whether Neo4j already implements any graph
>>>> >> algorithms (shortest path, Dijkstra, A*, etc.) or not.
>>>> >>
>>>> >> Thank you.
>>>> >>
>>>> >>
>>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>>> >>>
>>>> >>> Rio, any version will do. They can all handle millions of nodes on
>>>> >>> common hardware, no magic at all. With hundreds of millions or
>>>> >>> billions, we might need to look into the specification in more
>>>> >>> detail. But with that kind of data there are other bottlenecks for a
>>>> >>> social network or any web app that need to be taken care of as well.
>>>> >>>
>>>> >>> you said:
>>>> >>>>
>>>> >>>>  Given any two persons chosen at random, is there a path that
>>>> >>>>  connects them that is at most five relationships long? For a social
>>>> >>>>  network containing 1,000,000 people, each with approximately 50
>>>> >>>>  friends, the results strongly suggest that graph databases are the
>>>> >>>>  best choice for connected data. And graph database can still work
>>>> >>>>  150 times faster than relational database at third degree and 1000
>>>> >>>>  times faster at fourth degree
>>>> >>>
>>>> >>>
>>>> >>> I fail to see how this is connected to your attempt to list 1M users
>>>> >>> in one go on the first page. You would want to check whether there is
>>>> >>> a relationship and return that path between users. You need two start
>>>> >>> nodes and seek a path by traversing the relationships rather than
>>>> >>> scanning tables; that would be the comparison.
>>>> >>> Otherwise if you really need to present that many "things" just
>>>> >>> paging the result with SKIP,LIMIT. It has never made sense to present
>>>> >>> 1M of anything at a time for a user. Again, that wouldn't really do
>>>> >>> your experiment much good in proving graph theory.
>>>> >>>
>>>> >>> What is the result of MATCH(U:User) RETURN count(U); ?
>>>> >>>
>>>> >>> Also, when you do your test, make sure to account for the warm/cold
>>>> >>> cache effect (better/worse performance).
>>>> >>>
>>>> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
>>>> >>>>
>>>> >>>> I just learned about memory allocation and read the Server
>>>> >>>> Performance Tuning section for Neo4j. neo4j.properties:
>>>> >>>> # Default values for the low-level graph engine
>>>> >>>>
>>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>>> >>>>
>>>> >>>> Should I change these to get high performance? If yes, please
>>>> >>>> suggest values.
>>>> >>>>
>>>> >>>> I also just learned about the Neo4j licenses: Community, Personal,
>>>> >>>> Startups, Business and Enterprise. All features are explained on
>>>> >>>> the Neo4j website. So which Neo4j should I use for my case, which has
>>>> >>>> millions of nodes and relationships?
>>>> >>>>
>>>> >>>> Please answer. I need your help so much.
>>>> >>>>
>>>> >>>> Thanks.
>>>> >>>>
>>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
>>>> >>>>>
>>>> >>>>> I'm working on my thesis, which is about transforming a relational
>>>> >>>>> database into a graph database. After the transformation, I will
>>>> >>>>> compare their performance by query response time and throughput. I
>>>> >>>>> use MySQL as the relational database and Neo4j as the graph
>>>> >>>>> database. I will have 3 million more nodes and 6 million more
>>>> >>>>> relationships. But when I had added just 60000 nodes, my Neo4j was
>>>> >>>>> already dead. When I tried to return all 60000 nodes, it returned
>>>> >>>>> unknown. I did the same in MySQL: I added 60000 records, and it
>>>> >>>>> could return all 60000 records. It's weird because it goes against
>>>> >>>>> the papers I read, which told me graph databases are faster than
>>>> >>>>> relational databases. So why is Neo4j slower (totally dead) on a
>>>> >>>>> lower-specification PC/notebook while MySQL is not? And what
>>>> >>>>> PC/notebook specification should I use to get the best performance
>>>> >>>>> when testing with millions of nodes and relationships?
>>>> >>>>>
>>>> >>>>> Thank you.
>>>> >
>>>> > --
>>>> > You received this message because you are subscribed to the Google
>>>> Groups "Neo4j" group.
>>>> > To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to neo4j+un...@googlegroups.com.
>>>>
>>>> > For more options, visit https://groups.google.com/d/optout.
>>>>
