Thank you for the reply, Michael. Yes, and I have already run them again a second time. I just realized my mistake: I always thought that the new Labels feature already applied an index or constraint, so I had never created an index or constraint when using Cypher.
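A sketch of the kind of statement this refers to, assuming Neo4j 2.0 Cypher syntax and the same form as the constraint in Michael's quoted reply, with user_id in place of id. The commented-out index statement is only the alternative asked about further down and is an assumption, not something reported as run in this thread:

```cypher
// Uniqueness constraint on the user_id property of nodes labelled User
// (same form as Michael's statement, with user_id instead of id).
CREATE CONSTRAINT ON (u:User) ASSERT u.user_id IS UNIQUE;

// Assumed alternative, not run here: a plain index on the same property.
// CREATE INDEX ON :User(user_id);
```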
After I created a constraint for :User(user_id), I got the result I expected:

match (u:User) return count(*);
+----------+
| count(*) |
+----------+
| 1000     |
+----------+
1 row
7 ms

MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
WHERE U.user_id=1
WITH distinct U, FFU
WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
RETURN FFU.user_id;
...
879 rows
187 ms

Now that my Cypher query is faster than before, I have another question: why is the execution time different between yours and mine?

yours
+----------+
| count(*) |
+----------+
| 1000     |
+----------+
1 row
4 ms
...
910 rows
101 ms

mine
+----------+
| count(*) |
+----------+
| 1000     |
+----------+
1 row
7 ms
...
879 rows
187 ms

Is it because the size and the number of the properties on each node are different?

And what is the difference between an index and a constraint? Should I create both of them? If I have already created an index, should I also create a constraint? Or if I have already created a constraint, should I also create an index?

Thank you.

On Friday, March 28, 2014 8:30:20 PM UTC+7, Michael Hunger wrote:
>
> Rio,
>
> was this your first run of both statements? If so, please run them for a
> second time.
> And did you create an index or constraint for :User(user_id) ?
>
> MATCH (U:User) RETURN COUNT(U);
>
> I would also change:
>
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
> RETURN FFU.username
>
> to
>
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.user_id=1
> WITH distinct U, FFU
> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
> RETURN FFU.username
>
> I quickly created a dataset on my machine:
>
> cypher 2.0 foreach (i in range(1,1000) | create (:User {id:i}));
>
> create constraint on (u:User) assert u.id is unique;
>
> match (u1:User),(u2:User) with u1,u2 where rand() < 0.1
> create (u1)-[:Friend]->(u2);
>
> Relationships created: 99974
>
> 778 ms
>
> match (u:User) return count(*);
>
> +----------+
> | count(*) |
> +----------+
> | 1000     |
> +----------+
> 1 row
> *4 ms*
>
>
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.id=1
> WITH distinct U, FFU
> WHERE FFU<>U AND NOT (U)-[:Friend]->(FFU)
> RETURN FFU.id;
>
> ...
>
> 910 rows
>
> 101 ms
>
> but even your query takes only
>
> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> WHERE U.id=1 AND FFU.id<>U.id AND NOT (U)-[:Friend]->(FFU)
> RETURN FFU.id;
>
> ...
>
> 8188 rows
>
> 578 ms
>
>
> On Fri, Mar 28, 2014 at 2:08 PM, Lundin <lundin....@gmail.com> wrote:
> >
> > ms, it is milliseconds.
> >
> > What is the corresponding result for a SQL db ?
> > MATCH (n:User)-[:Friend*3]-(FoFoF) return FoFoF;
> >
> > Albeit a valid search is it something useful ? I would think finding a
> > specific persons FoFoF in either end, as a starting point or end point,
> > would be a very realistic scenario. Adding an Index on User:name and query
> > for a User with name:Rio try to find his FoFoF.
> >
> > Yes, neo4j has been kind and exposed various functions, like shortestpath
> > in cypher
> > http://docs.neo4j.org/refcard/2.0/
> >
> > Also look at some gist examples
> > https://github.com/neo4j-contrib/graphgist/wiki
> >
> > On Friday, March 28, 2014 at 05:00:22 UTC+1, Rio Eduardo wrote:
> >>
> >> Thank you so much for the reply Lundin. I really appreciate it. Okay,
> >> yesterday I just tested my experiment again. And the result was not what I
> >> imagined and expected before. Okay, before I tested 1M users, I reduced the
> >> number of users to 1000 users and tested it not in my social network but
> >> directly in the database only (Neo4j Shell) to find out that it was not
> >> caused by the performance of the pc. But the result of returning 1000 users
> >> was 200ms and 1 row and the result of returning friends at depth of two was
> >> 85000ms and 2500 rows. Are 200ms and 85000ms fast to you? And what does ms
> >> stand for? Is it milliseconds or microseconds?
> >>
> >> the query I use for returning 1000 users is
> >> MATCH (U:User) RETURN COUNT(U);
> >>
> >> and the query I use for returning friends at depth of two is
> >> MATCH (U:User)-[F:Friend]->(FU:User)-[FF:Friend]->(FFU:User)
> >> WHERE U.user_id=1 AND FFU.user_id<>U.user_id AND NOT (U)-[:Friend]->(FFU)
> >> RETURN FFU.username
> >>
> >> Please note that I tested with default configuration of Neo4j and
> >> created users with 1000 random nodes and created friends relationships with
> >> 50000 random relationships (1 user has 50 friends). Each relationship has a
> >> label Friend and no properties on it. Each node has a label User, 4
> >> properties: user_id, username, password and profile_picture. Each property
> >> has a value of 1-60 characters. average of characters of user_id=1-1000
> >> characters, all usernames have 10 characters randomly, all passwords have
> >> 60 characters because I MD5 it, and profile_picture has 1-60 characters.
> >>
> >> And about your statement "Otherwise if you really need to present that
> >> many "things" just paging the result with SKIP,LIMIT. It has never made
> >> sense to present 1M of anything at a time for a user.", I already did
> >> according to your statement above but it is still the same, Neo4j returns
> >> the result slower.
> >>
> >> And I'm wondering if Neo4j already applied one of the graph
> >> algorithms (shortest path, Dijkstra, A*, etc) in its system or not.
> >>
> >> Thank you.
> >>
> >>
> >> On Friday, March 28, 2014 3:43:49 AM UTC+7, Lundin wrote:
> >>>
> >>> Rio, any version will do. They can all handle millions of nodes on common
> >>> hardware, no magic at all. When it is hundreds of millions or billions
> >>> then we might need to look into the specification in more detail. But in
> >>> that case with that kind of data there are other bottlenecks for a social
> >>> network or any web app that need to be taken care of as well.
> >>>
> >>> you said:
> >>>>
> >>>> Given any two persons chosen at random, is there a path that
> >>>> connects them that is at most five relationships long? For a social network
> >>>> containing 1,000,000 people, each with approximately 50 friends, the
> >>>> results strongly suggest that graph databases are the best choice for
> >>>> connected data. And graph database can still work 150 times faster than
> >>>> relational database at third degree and 1000 times faster at fourth degree
> >>>
> >>>
> >>> I fail to see how this is connected to your attempt to list 1M users
> >>> in one go at the first page. You would want to seek if there is a
> >>> relationship and return that path between users. You need two start nodes
> >>> and seek a path by traversing the relationship rather than scanning tables
> >>> and that would be the comparison.
> >>> Otherwise if you really need to present that many "things" just paging
> >>> the result with SKIP,LIMIT. It has never made sense to present 1M of
> >>> anything at a time for a user. Again, that wouldn't really serve your
> >>> experiment much good to prove graph theory.
> >>>
> >>> What is the result of MATCH(U:User) RETURN count(U); ?
> >>>
> >>> Also when you do your test make sure to add the warm/cold cache effect
> >>> (better/worse performance)
> >>>
> >>> On Thursday, March 27, 2014 at 17:57:10 UTC+1, Rio Eduardo wrote:
> >>>>
> >>>> I just learned about memory allocation and just read Server Performance
> >>>> Tuning of Neo4j. neo4j.properties:
> >>>> # Default values for the low-level graph engine
> >>>>
> >>>> #neostore.nodestore.db.mapped_memory=25M
> >>>> #neostore.relationshipstore.db.mapped_memory=50M
> >>>> #neostore.propertystore.db.mapped_memory=90M
> >>>> #neostore.propertystore.db.strings.mapped_memory=130M
> >>>> #neostore.propertystore.db.arrays.mapped_memory=130M
> >>>>
> >>>> Should I change this to get high performance? If yes, please suggest
> >>>> values for me.
> >>>>
> >>>> And I just learned about Neo4j Licenses, they are Community, Personal,
> >>>> Startups, Business and Enterprise. And at the Neo4j website all features
> >>>> are explained. So which Neo4j should I use for my case that has millions
> >>>> of nodes and relationships?
> >>>>
> >>>> Please answer. I need your help so much.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> On Tuesday, March 25, 2014 12:03:58 AM UTC+7, Rio Eduardo wrote:
> >>>>>
> >>>>> I'm testing my thesis which is about transforming from relational
> >>>>> database to graph database. After transforming from relational database to
> >>>>> graph database, I will test their performance according to query
> >>>>> response time and throughput. In relational database, I use MySQL while in
> >>>>> graph database I use Neo4j for testing. I will have more than 3 Million
> >>>>> nodes and more than 6 Million relationships. But when I just added 60000
> >>>>> nodes, my Neo4j is already dead. When I tried to return all 60000 nodes,
> >>>>> it returned unknown. I did the same to MySQL, I added 60000 records but it
> >>>>> could return all 60000 records.
> >>>>> It's weird because it's against the papers I read that
> >>>>> told me graph database is faster than relational database. So why is Neo4j
> >>>>> slower (totally dead) on a lower specification pc/notebook while MySQL is
> >>>>> not? And what specification of pc/notebook should I use to give the best
> >>>>> performance during testing with millions of nodes and relationships?
> >>>>>
> >>>>> Thank you.
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Neo4j" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to neo4j+un...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>