Thank you so much for your help and your explanation, Michael. It really helps.
On Saturday, March 29, 2014 6:35:06 PM UTC+7, Michael Hunger wrote:
>
> Probably a faster CPU on my machine?
>
> A constraint also guarantees uniqueness and creates an index automatically,
> but adds more cost on insertion.
> An index optimizes lookups.
>
> It depends on your needs: one property might be unique, so you want a
> constraint; other properties you might want to search by, so you add an index.
>
>
> On Sat, Mar 29, 2014 at 3:33 AM, Rio Eduardo <rioedu...@gmail.com> wrote:
>
>> Thank you for the reply, Michael. Yes, and I already tried it again a
>> second time.
>> I just realized that was my mistake. I always thought that the new
>> Labels feature already applied an index or constraint, so I had never
>> created an index or constraint when I was using Cypher.
>>
>> And after I created a constraint for :User(user_id), I got the result I
>> expected:
>>
>> match (u:User) return count(*);
>> +----------+
>> | count(*) |
>> +----------+
>> | 1000     |
>> +----------+
>> 1 row
>> 7 ms
>>
>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>> WHERE U.user_id=1
>> WITH distinct U, FFU
>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>> RETURN FFU.user_id;
>>
>> ...
>> 879 rows
>> 187 ms
>>
>> Now that my Cypher is faster than before, I have a question again:
>> why is the execution time different between you and me?
>>
>> Yours:
>> +----------+
>> | count(*) |
>> +----------+
>> | 1000     |
>> +----------+
>> 1 row
>> 4 ms
>>
>> ...
>> 910 rows
>> 101 ms
>>
>> Mine:
>> +----------+
>> | count(*) |
>> +----------+
>> | 1000     |
>> +----------+
>> 1 row
>> 7 ms
>>
>> ...
>> 879 rows
>> 187 ms
>>
>> Is it because the size and the number of the properties on each node are
>> different?
>>
>> And what is the difference between an index and a constraint?
>> Should I create both of them?
>> If I already created an index, should I create a constraint as well?
>> Or if I already created a constraint, should I create an index as well?
>>
>> Thank you.
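For reference, the two statements Michael contrasts look like this in Neo4j 2.0 Cypher (a sketch using the thread's :User(user_id) schema; you would run one or the other, not both on the same property):

```cypher
// Index: speeds up lookups by :User(user_id), but does NOT enforce uniqueness
CREATE INDEX ON :User(user_id);

// Constraint: enforces uniqueness AND creates a backing index automatically,
// so no separate index on the same property is needed (at some insertion cost)
CREATE CONSTRAINT ON (u:User) ASSERT u.user_id IS UNIQUE;
```

So, answering the question in the quoted message: if a constraint already exists on a property, creating an index on that same property is redundant.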
>>
>> On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>>
>>> Rio,
>>>
>>> was this your first run of both statements? If so, please run them a
>>> second time.
>>> And did you create an index or constraint for :User(user_id)?
>>>
>>> MATCH (U:User) RETURN COUNT(U);
>>>
>>> I would also change:
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.username
>>>
>>> to
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.user_id=1
>>> WITH distinct U, FFU
>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.username
>>>
>>> I quickly created a dataset on my machine:
>>>
>>> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>>>
>>> create constraint on (u:User) assert u.id is unique;
>>>
>>> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1
>>> create (u1)-[:Friend]->(u2);
>>>
>>> Relationships created: 99974
>>> 778 ms
>>>
>>> match (u:User) return count(*);
>>> +----------+
>>> | count(*) |
>>> +----------+
>>> | 1000     |
>>> +----------+
>>> 1 row
>>> *4 ms*
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.id=1
>>> WITH distinct U, FFU
>>> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.id;
>>>
>>> ...
>>> 910 rows
>>> 101 ms
>>>
>>> but even your query takes only
>>>
>>> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
>>> RETURN FFU.id;
>>>
>>> ...
>>> 8188 rows
>>> 578 ms
>>>
>>> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <lundin....@gmail.com> wrote:
>>> >
>>> > ms, it is milliseconds.
>>> >
>>> > What is the corresponding result for a SQL db?
>>> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
>>> >
>>> > Albeit a valid search, is it something useful?
>>> > I would think finding a specific person's FoFoF from either end, as a
>>> > starting point or an end point, would be a very realistic scenario.
>>> > Add an index on :User(name) and, querying for a User with name "Rio",
>>> > try to find his FoFoF.
>>> >
>>> > Yes, Neo4j has been kind and exposed various functions, like
>>> > shortestPath in Cypher:
>>> > http://docs.neo4j.org/refcard/2.0/
>>> >
>>> > Also look at some GraphGist examples:
>>> > https://github.com/neo4j-contrib/graphgist/wiki
>>> >
>>> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
>>> >>
>>> >> Thank you so much for the reply, Lundin. I really appreciate it.
>>> >> Okay, yesterday I tested my experiment again, and the result was not
>>> >> what I had imagined and expected. Before testing 1M users, I reduced
>>> >> the number of users to 1000 and tested not in my social network but
>>> >> directly in the database (Neo4j Shell), to rule out the PC's
>>> >> performance as the cause. Counting the 1000 users took 200 ms
>>> >> (1 row), and returning friends at a depth of two took 85000 ms
>>> >> (2500 rows). Are 200 ms and 85000 ms fast to you? And what does ms
>>> >> stand for? Is it milliseconds or microseconds?
>>> >>
>>> >> The query I use for counting the 1000 users is
>>> >> MATCH (U:User) RETURN COUNT(U);
>>> >>
>>> >> and the query I use for returning friends at a depth of two is
>>> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
>>> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id
>>> >>   AND NOT (U)-[:Friend]->(FFU)
>>> >> RETURN FFU.username
>>> >>
>>> >> Please note that I tested with the default configuration of Neo4j,
>>> >> created 1000 random user nodes, and created 50000 random friend
>>> >> relationships (each user has 50 friends). Each relationship has the
>>> >> type Friend and no properties on it. Each node has the label User and
>>> >> 4 properties: user_id, username, password and profile_picture.
>>> >> Each property has a value of 1-60 characters: user_id values range
>>> >> from 1 to 1000, all usernames have 10 random characters, all
>>> >> passwords have 60 characters because I MD5 them, and profile_picture
>>> >> has 1-60 characters.
>>> >>
>>> >> And about your statement "Otherwise if you really need to present
>>> >> that many "things" just paging the result with SKIP,LIMIT. It has
>>> >> never made sense to present 1M of anything at a time for a user.",
>>> >> I already did what you suggested, but it is still the same: Neo4j
>>> >> returns the result slower.
>>> >>
>>> >> And I'm wondering whether Neo4j has already applied any of the graph
>>> >> algorithms (shortest path, Dijkstra, A*, etc.) in its system or not.
>>> >>
>>> >> Thank you.
>>> >>
>>> >>
>>> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
>>> >>>
>>> >>> Rio, any version will do. They can all handle a million nodes on
>>> >>> common hardware, no magic at all. When we get to hundreds of
>>> >>> millions or billions, then we might need to look into the
>>> >>> specification in more detail. But in that case, with that kind of
>>> >>> data, there are other bottlenecks for a social network or any web
>>> >>> app that need to be taken care of as well.
>>> >>>
>>> >>> You said:
>>> >>>>
>>> >>>> Given any two persons chosen at random, is there a path that
>>> >>>> connects them that is at most five relationships long? For a
>>> >>>> social network containing 1,000,000 people, each with
>>> >>>> approximately 50 friends, the results strongly suggest that graph
>>> >>>> databases are the best choice for connected data. And a graph
>>> >>>> database can still work 150 times faster than a relational
>>> >>>> database at the third degree and 1000 times faster at the fourth
>>> >>>> degree.
>>> >>>
>>> >>> I fail to see how this is connected to your attempt to list 1M
>>> >>> users in one go on the first page. You would want to seek whether
>>> >>> there is a relationship and return that path between users.
>>> >>> You need two start nodes, and you seek a path by traversing the
>>> >>> relationships rather than scanning tables; that would be the
>>> >>> comparison.
>>> >>> Otherwise, if you really need to present that many "things", just
>>> >>> page the result with SKIP and LIMIT. It has never made sense to
>>> >>> present 1M of anything at a time to a user. Again, that wouldn't
>>> >>> really do your experiment much good in proving graph theory.
>>> >>>
>>> >>> What is the result of MATCH (U:User) RETURN count(U); ?
>>> >>>
>>> >>> Also, when you do your test, make sure to account for the warm/cold
>>> >>> cache effect (better/worse performance).
>>> >>>
>>> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
>>> >>>>
>>> >>>> I just learned about memory allocation and just read the Server
>>> >>>> Performance Tuning section of the Neo4j docs. neo4j.properties:
>>> >>>>
>>> >>>> # Default values for the low-level graph engine
>>> >>>>
>>> >>>> #neostore.nodestore.db.mapped_memory=25M
>>> >>>> #neostore.relationshipstore.db.mapped_memory=50M
>>> >>>> #neostore.propertystore.db.mapped_memory=90M
>>> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
>>> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
>>> >>>>
>>> >>>> Should I change this to get high performance? If yes, please
>>> >>>> advise me.
>>> >>>>
>>> >>>> And I just learned about the Neo4j licenses: Community, Personal,
>>> >>>> Startups, Business and Enterprise. All the features are explained
>>> >>>> on the Neo4j website. So which Neo4j edition should I use for my
>>> >>>> case, which has millions of nodes and relationships?
>>> >>>>
>>> >>>> Please answer. I need your help so much.
>>> >>>>
>>> >>>> Thanks.
>>> >>>>
>>> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
>>> >>>>>
>>> >>>>> I'm testing my thesis, which is about transforming a relational
>>> >>>>> database into a graph database. After the transformation, I will
>>> >>>>> test their performance in terms of query response time and
>>> >>>>> throughput.
>>> >>>>> For the relational database I use MySQL, while for the graph
>>> >>>>> database I use Neo4j. I will have 3 million-plus nodes and
>>> >>>>> 6 million-plus relationships. But when I had added just 60000
>>> >>>>> nodes, my Neo4j was already dead. When I tried to return all
>>> >>>>> 60000 nodes, it returned unknown. I did the same with MySQL: I
>>> >>>>> added 60000 records, and it could return all 60000 records. It's
>>> >>>>> weird, because it goes against the papers I read, which told me a
>>> >>>>> graph database is faster than a relational database. So why is
>>> >>>>> Neo4j slower (totally dead) on a lower-specification PC/notebook
>>> >>>>> while MySQL is not? And what specification of PC/notebook should
>>> >>>>> I use to get the best performance when testing with millions of
>>> >>>>> nodes and relationships?
>>> >>>>>
>>> >>>>> Thank you.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "Neo4j" group.
>>> > To unsubscribe from this group and stop receiving emails from it,
>>> > send an email to neo4j+un...@googlegroups.com.
>>> > For more options, visit https://groups.google.com/d/optout.