Re: [Neo4j] Re: Why is Neo4j slower(totally dead) with many nodes and relationships in lower specification of pc/notebook while MySQL is not?

Rio Eduardo Tue, 01 Apr 2014 21:48:22 -0700

Oh yeah Michael, I'm new to the traversal API. After reading the doc about 
it, do I have to use it from Java? I mean, is there another way to use the 
traversal API, e.g. can I run a traversal from the Neo4j shell or over 
HTTP? If there is, please point me to a reference on how to use the 
traversal API without Java.


Thank you.

On Tuesday, April 1, 2014 8:31:25 PM UTC+7, Michael Hunger wrote:
>
> For the traversal framework check out: 
> http://docs.neo4j.org/chunked/milestone/tutorial-traversal.html
>
>
> On Tue, Apr 1, 2014 at 3:09 PM, Rio Eduardo <rioedu...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> you said "In general if you really want to do these deep traversals you 
>> might be better off (in terms of performance) using the traversal-API with 
>> an appropriate uniqueness constraint, like node-path". Please give me some 
>> references so I can learn it. Or do you mean I should use Gremlin?
>>
>> Thank you.
>>
>>
>> On Monday, March 31, 2014 8:09:32 PM UTC+7, Michael Hunger wrote:
>>
>>> Just use a dataset that you can reason about and check whether the 
>>> queries work correctly on it.
>>>
>>> Hard for me to be the consistency checker on your queries :)
>>>
>>> In general if you really want to do these deep traversals you might be 
>>> better off (in terms of performance) using the traversal-API with an 
>>> appropriate uniqueness constraint, like node-path.
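
To make the idea of a traversal with node-path uniqueness more concrete, here is a minimal Python sketch. It is not the Neo4j traversal API (which is a Java API); it only illustrates the concept. It assumes that "node-path" uniqueness means a node may not appear twice within a single path, while different paths may still reach the same node. The toy graph `friends` and the functions `paths_at_depth` and `friends_at_depth` are hypothetical names made up for this illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical toy graph: user id -> ids of users reached by a Friend edge.
friends: Dict[int, List[int]] = {
    1: [2, 3],
    2: [1, 4],
    3: [4],
    4: [1, 5],
    5: [],
}


def paths_at_depth(graph: Dict[int, List[int]], start: int,
                   depth: int) -> Iterator[Tuple[int, ...]]:
    """Yield every path of exactly `depth` hops that starts at `start`.

    Assumed node-path rule: a node may not repeat within one path,
    but two different paths may pass through or end at the same node.
    """
    stack: List[Tuple[int, ...]] = [(start,)]
    while stack:
        path = stack.pop()
        if len(path) - 1 == depth:
            yield path
            continue
        for neighbour in graph.get(path[-1], []):
            if neighbour not in path:  # the node-path uniqueness check
                stack.append(path + (neighbour,))


def friends_at_depth(graph: Dict[int, List[int]], start: int,
                     depth: int) -> Set[int]:
    """Distinct end nodes of all such paths, excluding direct friends."""
    direct = set(graph.get(start, []))
    ends = {path[-1] for path in paths_at_depth(graph, start, depth)}
    return ends - direct - {start}


print(sorted(paths_at_depth(friends, 1, 2)))
print(friends_at_depth(friends, 1, 3))
```

The uniqueness check replaces the hand-written `FFU<>U`, `FFFU<>FU` style filters: a path that would revisit one of its own nodes is simply never expanded.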
>>>
>>>
>>>
>>>
>>> On Mon, Mar 31, 2014 at 1:09 PM, Rio Eduardo <rioedu...@gmail.com> wrote:
>>>
>>>> Hello again Michael.
>>>>
>>>> I just want to make sure that my queries for finding friends of 
>>>> friends at depths four and five are correct. Please help me by checking them.
>>>>
>>>> Query at depth of four:
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.user_id=1
>>>> WITH DISTINCT U, FU, FFU
>>>> WHERE FFU<>U 
>>>> WITH DISTINCT U, FU, FFU
>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>> WHERE FFFU<>FU
>>>> WITH DISTINCT U, FFU, FFFU
>>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>>> WHERE FFFFU<>FFU AND FFFFU<>U AND NOT (U)-[:Friend]->(FFFFU)
>>>> RETURN DISTINCT FFFFU.username;
>>>>
>>>> Query at depth of five:
>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>> WHERE U.user_id=1
>>>> WITH DISTINCT U, FU, FFU
>>>> WHERE FFU<>U 
>>>> WITH DISTINCT U, FU, FFU
>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>> WHERE FFFU<>FU
>>>> WITH DISTINCT U, FFU, FFFU
>>>> MATCH (FFFU:User)-[FFFF:Friend]->(FFFFU:User)
>>>> WHERE FFFFU<>FFU
>>>> WITH DISTINCT U, FFFU, FFFFU
>>>> MATCH (FFFFU:User)-[FFFFF:Friend]->(FFFFFU:User)
>>>> WHERE FFFFFU<>FFFU AND FFFFFU<>U AND NOT (U)-[:Friend]->(FFFFFU)
>>>> RETURN DISTINCT FFFFFU.username;
>>>>
>>>> I need your help so much.
>>>> Thank you.
>>>>
>>>>
>>>> On Sunday, March 30, 2014 7:42:27 PM UTC+7, Michael Hunger wrote:
>>>>
>>>>> Split it up with one more intermediate step. The intermediate steps are 
>>>>> there to get the cardinality down, so it doesn't have to match billions 
>>>>> of paths, only millions or 100k.
>>>>>
>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE U.user_id=1
>>>>> WITH DISTINCT U, FU, FFU
>>>>> WHERE FFU<>U 
>>>>> WITH DISTINCT U, FFU
>>>>> MATCH (FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>> WHERE NOT (U)-[:Friend]->(FFFU)
>>>>> RETURN distinct FFFU.username;
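
The effect of the intermediate steps can be sketched in plain Python. This is a simplified model under stated assumptions, not how Neo4j executes Cypher: it ignores the `<>` and `NOT` filters, uses a small random graph, and counts one unit of work per row produced at each hop. The helper names `random_graph`, `expand_all_paths` and `expand_distinct` are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def random_graph(n: int, p: float, seed: int) -> Dict[int, List[int]]:
    """Directed random graph: each ordered pair (u, v), u != v, is an edge with probability p."""
    rng = random.Random(seed)
    return {u: [v for v in range(n) if v != u and rng.random() < p]
            for u in range(n)}


def expand_all_paths(graph: Dict[int, List[int]], start: int,
                     hops: int) -> Tuple[Set[int], int]:
    """Carry one row per path through every hop; return (end nodes, rows produced)."""
    rows = [start]
    work = 0
    for _ in range(hops):
        rows = [v for u in rows for v in graph[u]]
        work += len(rows)
    return set(rows), work


def expand_distinct(graph: Dict[int, List[int]], start: int,
                    hops: int) -> Tuple[Set[int], int]:
    """Deduplicate the frontier after each hop, in the spirit of WITH DISTINCT."""
    frontier = {start}
    work = 0
    for _ in range(hops):
        rows = [v for u in frontier for v in graph[u]]
        work += len(rows)
        frontier = set(rows)
    return frontier, work


graph = random_graph(n=200, p=0.05, seed=7)
ends_all, work_all = expand_all_paths(graph, start=0, hops=3)
ends_distinct, work_distinct = expand_distinct(graph, start=0, hops=3)
print(ends_all == ends_distinct, work_all, work_distinct)
```

Both variants reach the same set of end nodes, but the deduplicated one never expands the same intermediate node twice, so the number of rows it has to produce per hop is bounded by the number of nodes times their degree rather than by the number of paths.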
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Mar 30, 2014 at 1:29 PM, Rio Eduardo <rioedu...@gmail.com> wrote:
>>>>>
>>>>>> Please help me again Michael.
>>>>>>
>>>>>> You once said:
>>>>>>
>>>>>> I would also change:
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.username
>>>>>>
>>>>>> to
>>>>>>
>>>>>>  MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>> WHERE U.user_id=1 
>>>>>> WITH distinct U, FFU
>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>> RETURN FFU.username
>>>>>>
>>>>>> The query above finds friends of friends at depth two. I would 
>>>>>> like to find friends of friends at depth three, but when I follow the 
>>>>>> model of your query, it takes longer than mine and returns many more 
>>>>>> rows than mine. Here is your query model at depth three:
>>>>>>
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>>> WHERE U.user_id=1
>>>>>> WITH DISTINCT U, FU, FFU, FFFU
>>>>>> WHERE FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>>> RETURN FFFU.username;
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 118858 rows
>>>>>> 20090 ms
>>>>>>
>>>>>> Mine:
>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)-[FFF:Friend]->(FFFU:User)
>>>>>> WHERE U.user_id=1 AND FFU<>U AND FFFU<>FU AND NOT (U)-[:Friend]->(FFFU)
>>>>>> RETURN DISTINCT FFFU.username;
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> 950 rows
>>>>>> 18133 ms
>>>>>>
>>>>>> Please help me. Why does your query model take longer than mine and 
>>>>>> return many more rows than mine?
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>>>>>
>>>>>>> Rio,
>>>>>>>
>>>>>>> was this your first run of both statements? If so, please run them 
>>>>>>> for a second time.
>>>>>>> And did you create an index or constraint for :User(user_id) ?
>>>>>>>
>>>>>>> MATCH (U:User) RETURN COUNT(U);
>>>>>>>
>>>>>>> I would also change:
>>>>>>>
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>>>>> RETURN FFU.username
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> WHERE U.user_id=1 
>>>>>>> WITH distinct U, FFU
>>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>>> RETURN FFU.username
>>>>>>>  
>>>>>>> I quickly created a dataset on my machine:
>>>>>>>
>>>>>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>>>>>
>>>>>>> create constraint on (u:User) assert u.id is unique;  
>>>>>>>
>>>>>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1 create (u1)-[:Friend]->(u2);
>>>>>>>
>>>>>>> Relationships created: 99974
>>>>>>>
>>>>>>> 778 ms
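
As a quick sanity check on the relationship count reported above, the expected number can be worked out by hand. The sketch below assumes the `match (u1:User),(u2:User)` step yields all 1000 x 1000 ordered pairs (self-pairs included, since nothing in the statement excludes them) and that each pair is kept independently with probability 0.1. The function name `expected_relationships` is made up for this illustration.

```python
import math
from typing import Tuple


def expected_relationships(n_users: int, p: float) -> Tuple[float, float]:
    """Mean and standard deviation of a binomial count over n_users**2 ordered pairs."""
    pairs = n_users * n_users
    mean = pairs * p
    std_dev = math.sqrt(pairs * p * (1 - p))
    return mean, std_dev


mean, std_dev = expected_relationships(1000, 0.1)
observed = 99974  # "Relationships created" from the shell output above
z_score = (observed - mean) / std_dev
print(round(mean), round(std_dev), round(z_score, 2))
```

Under these assumptions the mean is about 100000 with a standard deviation of about 300, so the observed 99974 is well within normal random variation. It also means each user has roughly 100 outgoing Friend relationships in this test dataset.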
>>>>>>>
>>>>>>> match (u:User) return count(*);
>>>>>>>
>>>>>>> +----------+
>>>>>>> | count(*) |
>>>>>>> +----------+
>>>>>>> | 1000     |
>>>>>>> +----------+
>>>>>>> 1 row
>>>>>>> 4 ms
>>>>>>>
>>>>>>>
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> WHERE U.id=1 
>>>>>>> WITH distinct U, FFU
>>>>>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>>>>>> RETURN FFU.id;
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> 910 rows
>>>>>>>
>>>>>>> 101 ms
>>>>>>>
>>>>>>> but even your query takes only
>>>>>>>
>>>>>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>>>>>> RETURN FFU.id;
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> 8188 rows
>>>>>>>
>>>>>>> 578 ms
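
The gap between the two row counts above (910 vs. 8188) is what one would expect if the query without DISTINCT returns one row per matching two-hop path, while the WITH DISTINCT version returns one row per distinct end node. The Python sketch below models this on a freshly generated random graph. It is only an approximation of the Cypher semantics: it skips self-loops, uses its own random seed, and so will not reproduce the exact numbers from the shell session. `build_graph` and `two_hop_rows` are hypothetical helper names.

```python
import random
from typing import Dict, List, Set


def build_graph(n: int, p: float, seed: int) -> Dict[int, Set[int]]:
    """Directed random graph without self-loops; each ordered pair is an edge with probability p."""
    rng = random.Random(seed)
    return {u: {v for v in range(n) if v != u and rng.random() < p}
            for u in range(n)}


def two_hop_rows(graph: Dict[int, Set[int]], start: int) -> List[int]:
    """One entry per path start -> fu -> ffu where ffu is neither start nor a direct friend."""
    direct = graph[start]
    return [ffu
            for fu in direct
            for ffu in graph[fu]
            if ffu != start and ffu not in direct]


graph = build_graph(n=1000, p=0.1, seed=1)
rows = two_hop_rows(graph, start=0)
print("rows without DISTINCT:", len(rows))
print("rows with DISTINCT:   ", len(set(rows)))
```

With about 100 friends per user, each non-friend is typically reachable through several different intermediate friends, so the path count is several times larger than the number of distinct users.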
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <lundin....@gmail.com> wrote:
>>>>>>> >
>>>>>>> > ms, it is milliseconds.
>>>>>>> >
>>>>>>> > What is the corresponding result for a SQL db ?
>>>>>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>>>>>> >
>>>>>>> > Albeit a valid search, is it something useful? I would think 
>>>>>>> > finding a specific person's FoFoF at either end, as a starting point 
>>>>>>> > or end point, would be a very realistic scenario. Add an index on 
>>>>>>> > :User(name), query for a User with name "Rio", and try to find his 
>>>>>>> > FoFoF.
>>>>>>> >
>>>>>>> > Yes, Neo4j has been kind and exposed various functions, like 
>>>>>>> > shortestPath, in Cypher:
>>>>>>> > http://docs.neo4j.org/refcard/2.0/
>>>>>>> >
>>>>>>> > Also look at some gist examples
>>>>>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>>>>>> >
>>>>>>> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
>>>>>>> >>
>>>>>>> >> Thank you so much for the reply, Lundin. I really appreciate it. 
>>>>>>> >> Yesterday I ran my experiment again, and the result was not what I 
>>>>>>> >> had imagined and expected. Before testing 1M users, I reduced the 
>>>>>>> >> number of users to 1000 and tested not in my social network but 
>>>>>>> >> directly in the database (Neo4j Shell), to check that the problem 
>>>>>>> >> was not caused by the performance of the PC. But counting the 1000 
>>>>>>> >> users took 200 ms and returned 1 row, and returning friends at depth 
>>>>>>> >> two took 85000 ms and returned 2500 rows. Are 200 ms and 85000 ms 
>>>>>>> >> fast to you? And what does ms stand for? Is it milliseconds or 
>>>>>>> >> microseconds?
>>>>>>> >>
>>>>>>> >> the query I use for returning 1000 users is
>>>>>>> >> MATCH (U:User) RETURN COUNT(U);
>>>>>>> >>
>>>>>>> >> and the query I use for returning friends at depth of two is
>>>>>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>>>>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>>>>>> >> RETURN FFU.username
>>>>>>> >>
>>>>>>> >> Please note that I tested with the default configuration of Neo4j, 
>>>>>>> >> created 1000 random user nodes, and created 50000 random friend 
>>>>>>> >> relationships (each user has 50 friends). Each relationship has the 
>>>>>>> >> type Friend and no properties. Each node has the label User and 4 
>>>>>>> >> properties: user_id, username, password and profile_picture. Each 
>>>>>>> >> property has a value of 1-60 characters: user_id takes the values 
>>>>>>> >> 1-1000, all usernames have 10 random characters, all passwords have 
>>>>>>> >> 60 characters because I MD5 it, and profile_picture has 1-60 
>>>>>>> >> characters.
>>>>>>> >>
>>>>>>> >> And about your statement "Otherwise if you really need to present 
>>>>>>> >> that many "things" just paging the result with SKIP,LIMIT. It has 
>>>>>>> >> never made sense to present 1M of anything at a time for a user.", I 
>>>>>>> >> already did what you suggested, but it is still the same: Neo4j 
>>>>>>> >> returns the result more slowly.
>>>>>>> >>
>>>>>>> >> And I'm wondering whether Neo4j already implements any graph 
>>>>>>> >> algorithms (shortest path, Dijkstra, A*, etc.) in its system or not.
>>>>>>> >>
>>>>>>> >> Thank you.
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>>>>>> >>>
>>>>>>> >>> Rio, any version will do. They can all handle millions of nodes on 
>>>>>>> >>> common hardware, no magic at all. When it is hundreds of millions or 
>>>>>>> >>> billions, then we might need to look into the specification in more 
>>>>>>> >>> detail. But in that case, with that kind of data, there are other 
>>>>>>> >>> bottlenecks for a social network or any web app that need to be 
>>>>>>> >>> taken care of as well.
>>>>>>> >>>
>>>>>>> >>> you said:
>>>>>>> >>>>
>>>>>>> >>>> Given any two persons chosen at random, is there a path that 
>>>>>>> >>>> connects them that is at most five relationships long? For a social 
>>>>>>> >>>> network containing 1,000,000 people, each with approximately 50 
>>>>>>> >>>> friends, the results strongly suggest that graph databases are the 
>>>>>>> >>>> best choice for connected data. And graph database can still work 
>>>>>>> >>>> 150 times faster than relational database at third degree and 1000 
>>>>>>> >>>> times faster at fourth degree
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> I fail to see how this is connected to your attempt to list 1M 
>>>>>>> >>> users in one go on the first page. You would want to check whether 
>>>>>>> >>> there is a relationship and return that path between users. You need 
>>>>>>> >>> two start nodes and seek a path by traversing the relationships 
>>>>>>> >>> rather than scanning tables, and that would be the comparison.
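
The kind of query described here, checking whether two given users are connected by a path of at most five relationships, can be sketched as a depth-limited breadth-first search. This is a plain Python illustration over an in-memory adjacency dict, not Neo4j code; the function name `path_within` and the toy chain graph are hypothetical, and edges are treated as directed.

```python
from collections import deque
from typing import Dict, List, Optional


def path_within(graph: Dict[int, List[int]], source: int, target: int,
                max_hops: int) -> Optional[List[int]]:
    """Return a shortest path from source to target using at most max_hops edges, else None."""
    if source == target:
        return [source]
    parent: Dict[int, Optional[int]] = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt in parent:
                continue  # already reached by a path that is no longer
            parent[nxt] = node
            if nxt == target:
                # Walk the parent links back to the source.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append((nxt, dist + 1))
    return None


# Toy chain 1 -> 2 -> ... -> 7 for illustration.
chain = {i: [i + 1] for i in range(1, 7)}
print(path_within(chain, 1, 6, max_hops=5))
print(path_within(chain, 1, 7, max_hops=5))
```

The search only touches nodes within five hops of the start node, which is the contrast with scanning and joining whole tables that the comparison is meant to show.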
>>>>>>> >>> Otherwise, if you really need to present that many "things", just 
>>>>>>> >>> page the result with SKIP and LIMIT. It has never made sense to 
>>>>>>> >>> present 1M of anything at a time to a user. Again, that wouldn't 
>>>>>>> >>> really help your experiment prove graph theory.
>>>>>>> >>>
>>>>>>> >>> What is the result of MATCH(U:User) RETURN count(U); ?
>>>>>>> >>>
>>>>>>> >>> Also, when you run your test, make sure to account for the warm/cold 
>>>>>>> >>> cache effect (better/worse performance).
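
One way to account for the warm/cold cache effect in a benchmark is to time the first run separately from repeated runs. The sketch below is a generic Python timing helper, not tied to any Neo4j or MySQL driver: `run_query` is a hypothetical zero-argument callable that the reader would wire up to their own database call, and `time_cold_and_warm` is a name made up for this example.

```python
import time
from statistics import median
from typing import Callable, Dict


def time_cold_and_warm(run_query: Callable[[], object],
                       warm_runs: int = 5) -> Dict[str, float]:
    """Time the first (cold) run, then report the median of the following (warm) runs, in ms."""

    def timed_ms() -> float:
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0

    cold_ms = timed_ms()
    warm_ms = [timed_ms() for _ in range(warm_runs)]
    return {"cold_ms": cold_ms, "warm_median_ms": median(warm_ms)}


# Stand-in workload so the sketch runs on its own; replace with a real query call.
calls = []


def fake_query() -> int:
    calls.append(1)
    return sum(range(10000))


result = time_cold_and_warm(fake_query, warm_runs=5)
print(result)
```

Reporting both numbers for each database keeps the comparison fair: a first run that has to load data from disk is measured against the other system's first run, and cached runs against cached runs.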
>>>>>>> >>>
>>>>>>> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
>>>>>>> >>>>
>>>>>>> >>>> I just learned about memory allocation and read the Server 
>>>>>>> >>>> Performance Tuning section of the Neo4j docs. In neo4j.properties:
>>>>>>> >>>> # Default values for the low-level graph engine
>>>>>>> >>>>
>>>>>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>>>>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>>>>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>>>>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>>>>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>>>>>> >>>>
>>>>>>> >>>> Should I change these to get higher performance? If yes, please 
>>>>>>> >>>> suggest values.
>>>>>>> >>>>
>>>>>>> >>>> And I just learned about the Neo4j licenses: Community, Personal, 
>>>>>>> >>>> Startups, Business and Enterprise. All the features are explained 
>>>>>>> >>>> on the Neo4j website. So which Neo4j edition should I use for my 
>>>>>>> >>>> case, which has millions of nodes and relationships?
>>>>>>> >>>>
>>>>>>> >>>> Please answer. I need your help so much.
>>>>>>> >>>>
>>>>>>> >>>> Thanks.
>>>>>>> >>>>
>>>>>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> I'm testing my thesis, which is about transforming a relational 
>>>>>>> >>>>> database into a graph database. After the transformation, I will 
>>>>>>> >>>>> compare their performance by query response time and throughput. 
>>>>>>> >>>>> I use MySQL as the relational database and Neo4j as the graph 
>>>>>>> >>>>> database. I will have more than 3 million nodes and more than 6 
>>>>>>> >>>>> million relationships. But when I had added just 60000 nodes, my 
>>>>>>> >>>>> Neo4j was already dead: when I tried to return all 60000 nodes, it 
>>>>>>> >>>>> returned unknown. I did the same with MySQL, adding 60000 records, 
>>>>>>> >>>>> and it could return all 60000 records. It's weird because it 
>>>>>>> >>>>> contradicts the papers I read, which told me a graph database is 
>>>>>>> >>>>> faster than a relational database. So why is Neo4j slower (totally 
>>>>>>> >>>>> dead) on a lower-spec PC/notebook while MySQL is not? And what 
>>>>>>> >>>>> PC/notebook specification should I use to get the best performance 
>>>>>>> >>>>> when testing with millions of nodes and relationships?
>>>>>>> >>>>>
>>>>>>> >>>>> Thank you.
>>>>>>> >
>>>>>>> > --
>>>>>>> > You received this message because you are subscribed to the Google 
>>>>>>> > Groups "Neo4j" group.
>>>>>>> > To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> > send an email to neo4j+un...@googlegroups.com.
>>>>>>> > For more options, visit https://groups.google.com/d/optout.
>>>>>>>
