Re: [Neo4j] Performance issue on nodes with lots of relationships
Niels, that sounds fantastic, great work everyone so far!

Cheers,
/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Fri, Jul 8, 2011 at 1:27 AM, Niels Hoogeveen wrote:
> I did a write up on indexed relationships in the Git repo:
> https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships
> [...]
Re: [Neo4j] Performance issue on nodes with lots of relationships
I did a write up on indexed relationships in the Git repo:
https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships

A performance comparison would indeed be great. Anecdotally, I have witnessed the difference when trying to load all entries of DBpedia. With 2.5 GB of heap space, loading becomes problematic after some 70,000 relationships have been added to the supernode. With the indexed relationship no such problems arise, and 1.6 million relationships are easily created without performance degradation.

Having real performance figures would be nice though.

Niels

> From: michael.hun...@neotechnology.com
> Date: Thu, 7 Jul 2011 22:56:17 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
> [...]
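Niels's index keeps a dense node's relationships sorted, so reading "the most recent additions" never has to touch the older millions. The real graph-collections implementation stores the index in the graph itself; what follows is only a minimal in-memory sketch of the idea, with invented names (`RelEntry`, `RelationshipIndex`), using a `TreeMap` keyed by creation time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch only -- not the graph-collections API.
// A dense node's relationships live in an index sorted by a key
// (here: creation time), so fetching "the N newest" stops early
// instead of scanning every relationship on the node.
class RelEntry {
    final long targetNodeId;
    final long createdAt; // sort key

    RelEntry(long targetNodeId, long createdAt) {
        this.targetNodeId = targetNodeId;
        this.createdAt = createdAt;
    }
}

class RelationshipIndex {
    // createdAt -> entries created at that instant
    private final TreeMap<Long, List<RelEntry>> byTime = new TreeMap<>();

    void add(RelEntry e) {
        byTime.computeIfAbsent(e.createdAt, k -> new ArrayList<>()).add(e);
    }

    // Newest first; returns as soon as `limit` entries have been seen.
    List<RelEntry> newest(int limit) {
        List<RelEntry> out = new ArrayList<>();
        for (List<RelEntry> bucket : byTime.descendingMap().values()) {
            for (RelEntry e : bucket) {
                if (out.size() == limit) return out;
                out.add(e);
            }
        }
        return out;
    }
}
```

Backed by graph storage instead of a `TreeMap`, the same shape gives the sorted traversal Niels describes.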
Re: [Neo4j] Performance issue on nodes with lots of relationships
Niels, could you perhaps write up a blog post detailing the usage (also for your own scenario, and how that solution would compare to the naive supernode with just millions of relationships)?

Also, I'd like to see a performance comparison of both approaches.

Thanks so much for your work.

Michael

On 07.07.2011 at 22:24, Niels Hoogeveen wrote:
> I am glad to see a solution will be provided at the core level.
> [...]
Re: [Neo4j] Performance issue on nodes with lots of relationships
I am glad to see a solution will be provided at the core level.

Today, I pushed IndexedRelationships and IndexedRelationshipExpander to Git, see:
https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship

This provides a solution to the issue, but is certainly not as fast as a solution in core would be. However, it does solve my issues, and as a bonus, indexed relationships can be traversed in sorted order; this is especially pleasant, since I usually want to know only the recent additions of dense relationships.

Niels

> Date: Thu, 7 Jul 2011 21:37:26 +0200
> From: matt...@neotechnology.com
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
> [...]
Re: [Neo4j] Performance issue on nodes with lots of relationships
2011/7/7 Agelos Pikoulas
> I think it's the same problem pattern that has been in discussion lately
> with dense nodes or supernodes (check
> http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
> [...]

Yes, I'm positive that something will be done on a core level to make getting relationships of a specific type fast, regardless of the total number of relationships. In the foreseeable future, hopefully.
Re: [Neo4j] Performance issue on nodes with lots of relationships
I think it's the same problem pattern that has been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html).

Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/).

Ideally this would be abstracted & implemented in the core distribution so that all APIs (including Cypher & Tinkerpop Pipes/Gremlin) can use it efficiently...

Agelos

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White wrote:
> I use the shell as-is, but the messages.log is reporting...
> [...]
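Michael's expander trick boils down to routing lookups for the rare relationship types through an index, instead of scanning the dense node's full relationship chain. A pure-Java sketch of that access pattern (the `TypeIndexedNode` class and the type names are hypothetical, not Neo4j's actual `RelationshipExpander` API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a node with millions of relationships of one
// type and a handful of another. Keying relationships by type means
// expanding the rare type costs O(size of that type), not O(total
// relationships on the node).
class TypeIndexedNode {
    private final Map<String, List<Long>> relsByType = new HashMap<>();

    void addRel(String type, long otherNodeId) {
        relsByType.computeIfAbsent(type, k -> new ArrayList<>()).add(otherNodeId);
    }

    // Expanding by type touches only that type's own list.
    List<Long> expand(String type) {
        return relsByType.getOrDefault(type, Collections.emptyList());
    }
}
```

The same idea, implemented in core, is what Mattias is hinting at: per-type relationship retrieval whose cost is independent of the node's total degree.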
Re: [Neo4j] Performance issue on nodes with lots of relationships
I use the shell as-is, but the messages.log is reporting...

Physical mem: 3962MB, Heap size: 881MB

My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data? Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once, and thus a cache shouldn't matter at all.

In this particular case, I can't imagine taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher, in which it is building a set of Rs before applying `count` rather than making count accept an iterable stream.

Andrew

On 07/06/2011 11:33 PM, David Montag wrote:
> Hi Andrew,
>
> How big is your configured Java heap? It could be that all the nodes and
> relationships don't fit into the cache.
> [...]

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
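Andrew's hypothesis is that `count` materializes every matched relationship before counting, instead of consuming a stream. The difference is easy to state in plain Java (a sketch of the general pattern, not of Cypher's actual execution engine): a streaming count keeps O(1) state no matter how many elements pass through, while collecting into a set first holds all of them in the heap.

```java
import java.util.Iterator;

final class StreamingCount {
    // Walks the iterable once, keeping only a counter -- memory use is
    // constant regardless of how many elements are counted.
    static long count(Iterable<?> items) {
        long n = 0;
        for (Object ignored : items) n++;
        return n;
    }

    // Lazily yields the integers 0..n-1 without ever storing them,
    // standing in for a lazily-read relationship stream.
    static Iterable<Integer> range(int n) {
        return () -> new Iterator<Integer>() {
            int i = 0;
            public boolean hasNext() { return i < n; }
            public Integer next() { return i++; }
        };
    }
}
```

With this shape, counting 3.4M elements needs no more heap than counting 10, which is exactly the property Andrew is asking Cypher's `count` to have.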
Re: [Neo4j] Performance issue on nodes with lots of relationships
Hi Andrew,

How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.

David

On Wed, Jul 6, 2011 at 8:03 PM, Andrew White wrote:
> Here are some interesting stats to consider. First, I split my nodes into
> two groups, one node with 1.4M children and the other with 3.4M children.
> While I do see some cache warm-up improvements, the traversal doesn't seem
> to scale linearly; i.e. the larger super-node has 2.4x more children but
> takes 17x longer to traverse.
>
> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> +----------+
> | count(r) |
> +----------+
> |  1468486 |
> +----------+
> 1 rows, 25724 ms
> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> +----------+
> | count(r) |
> +----------+
> |  1468486 |
> +----------+
> 1 rows, 19763 ms
>
> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> +----------+
> | count(r) |
> +----------+
> |  3472174 |
> +----------+
> 1 rows, 565448 ms
> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> +----------+
> | count(r) |
> +----------+
> |  3472174 |
> +----------+
> 1 rows, 337975 ms
>
> Any ideas on this?
> Andrew
>
> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> > Andrew,
> > if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> > order to count the relationships of a node, not returning them:
> >
> > start n=(1) match (n)-[r]-(x) return count(r)
> >
> > and try that several times to see if cold caches are initially slowing
> > things down, or something along these lines. In the shell's `ls` and in
> > Neoclipse, the output and visualization will be slow for that amount
> > of data.
> >
> > Cheers,
> > /peter neubauer
> >
> > GTalk: neubauer.peter
> > Skype: peter.neubauer
> > Phone: +46 704 106975
> > LinkedIn: http://www.linkedin.com/in/neubauer
> > Twitter: http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org - Your high performance graph database.
> > http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> > On Wed, Jul 6, 2011 at 4:15 PM, Andrew White wrote:
> > > I have a graph with roughly 10M nodes. Some of these nodes are highly
> > > connected to other nodes. For example I may have a single node with
> > > 1M+ relationships. A good analogy is a population that has a
> > > "lives-in" relationship to a state. Now the problem...
> > >
> > > Both neoclipse and neo4j-shell are terribly slow when working with
> > > these nodes. In the shell I would expect a `cd` to be very fast,
> > > much like selecting via a rowid in a standard DB. Instead, I usually
> > > see several seconds of delay. Doing an `ls` takes so long that I
> > > usually have to just kill the process. In fact `ls` never outputs
> > > anything, which is odd since I would expect it to "stream" the output
> > > as it found it. I have very similar performance issues with neoclipse.
> > >
> > > I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> > > Disclaimer, I am new to Neo4j.
> > >
> > > Thanks,
> > > Andrew

--
David Montag
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag
Re: [Neo4j] Performance issue on nodes with lots of relationships
Here are some interesting stats to consider. First, I split my nodes into two groups: one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvement, the traversal doesn't seem to scale linearly; i.e. the larger supernode has 2.4x more children but takes 17x longer to traverse.

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 25724 ms

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 19763 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 565448 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 337975 ms

Any ideas on this?
Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> Andrew,
> if you upgrade to 1.4.M06, your shell should be able to use Cypher to
> count the relationships of a node without returning them:
>
>     start n=(1) match (n)-[r]-(x) return count(r)
>
> or something along these lines, and try that several times to see
> whether cold caches are initially slowing things down. In the shell's
> `ls` and in Neoclipse, the output and visualization will be slow for
> that amount of data.
>
> Cheers,
> /peter neubauer
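The non-linear scaling Andrew reports can be sanity-checked directly from the shell transcripts above; a quick look at the ratios (using the warm second runs):

```python
# Ratios computed from the shell output quoted above (warm second runs).
small_rels, large_rels = 1_468_486, 3_472_174
small_ms, large_ms = 19_763, 337_975

size_ratio = large_rels / small_rels   # how many more children the big node has
time_ratio = large_ms / small_ms       # how much longer its count took

print(f"{size_ratio:.1f}x children, {time_ratio:.1f}x slower")
# -> 2.4x children, 17.1x slower
```

So a ~2.4x larger relationship set costs ~17x more time, which is far worse than linear and consistent with the supernode problem discussed in this thread.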
Re: [Neo4j] Performance issue on nodes with lots of relationships
I just tested with 1.4.M06 and performance seems about the same. Also, only the supernodes are affected; the child nodes are very fast.

On 07/06/2011 09:31 AM, Michael Hunger wrote:
> Andrew,
>
> could you please also try to access the graph via the latest milestone,
> 1.4.M06, to see if things have improved.
>
> Does this behaviour only affect the supernodes, or every node in your
> graph (e.g. when you access, cd, or ls a person node)?
>
> We've been discussing some changes to the initial loading/caching that
> might improve performance on heavily connected (super)nodes.
>
> If our changes and tests are successful, these changes will be
> integrated into early 1.5 milestones.
>
> Cheers
>
> Michael
Re: [Neo4j] Performance issue on nodes with lots of relationships
I am on a standard filesystem (ext4). I haven't seen the issue again today, so I wonder if it was a fluke.

Andrew

On 07/06/2011 12:29 PM, Paul Bandler wrote:
>> Any hints on the memory map issue are welcomed too.
> I experienced that on Solaris when I'd placed the db on a filesystem
> that didn't support memory-mapped I/O, such as NFS.
>
> Sent from my iPhone
Re: [Neo4j] Performance issue on nodes with lots of relationships
Just noticed that the shell's "ls" reads all relationships before displaying them... I'll fix this tomorrow.

2011/7/6 Mattias Persson
> 2011/7/6 Jim Webber
>> Hi Rick,
>>
>> > Are you thinking maybe of lazily loading relationships in 1.5? That
>> > might be a huge boost.
>>
>> Added to the backlog to be discussed for inclusion in 1.5.
>
> Neo4j _is_ lazily loading relationships, and has done so since before
> 1.0. Maybe there's some issue with the shell only.
>
>> Jim

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
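For anyone curious why a buffering `ls` feels hung on a supernode even though the underlying loading is lazy, here is a toy Python sketch (not Neo4j code; the function names are made up for illustration) of the difference between collecting every result before printing and streaming results as they are found:

```python
def relationships(n):
    """Stand-in for lazily walking a node's relationship chain."""
    for i in range(n):
        yield f"rel-{i}"

def ls_buffered(node_degree):
    # Touches every relationship before anything can be shown:
    # time-to-first-output grows with the node's degree.
    return list(relationships(node_degree))

def ls_streaming(node_degree):
    # First result is available after a single step of the chain,
    # regardless of how many relationships the node has.
    yield from relationships(node_degree)

first = next(ls_streaming(10**7))  # cheap even for a 10M-degree node
print(first)  # rel-0
```

Laziness in the store doesn't help if the shell command drains the whole iterator before producing its first line of output, which matches the behaviour Andrew observed.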
Re: [Neo4j] Performance issue on nodes with lots of relationships
2011/7/6 Jim Webber
> Hi Rick,
>
> > Are you thinking maybe of lazily loading relationships in 1.5? That
> > might be a huge boost.
>
> Added to the backlog to be discussed for inclusion in 1.5.

Neo4j _is_ lazily loading relationships, and has done so since before 1.0. Maybe there's some issue with the shell only.

> Jim

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Performance issue on nodes with lots of relationships
Logs are attached. I am using the Sun 64-bit HotSpot JVM (see logs). For this particular graph I simply have a single root reference node (0) and millions of nodes with a 1:1 relationship to the root. For all intents and purposes, this version of the graph is like a flat table with all elements sharing the same parent. This is the simplest graph I could construct that will eventually represent a subgraph in a more complex system.

Some file sizes for the db store are...

 43M neostore.nodestore.db
424M neostore.propertystore.db
193M neostore.propertystore.db.arrays
1.1K neostore.propertystore.db.index
1.1K neostore.propertystore.db.index.keys
238M neostore.propertystore.db.strings
156M neostore.relationshipstore.db
  10 neostore.relationshiptypestore.db
 129 neostore.relationshiptypestore.db.names

Andrew

On 07/06/2011 12:03 PM, Michael Hunger wrote:
> Ok, then it is checking the connectedness, which actually traverses all
> the relationships between the current and the target node.
>
> Could you share the whole messages.log file from that graph store?
> Which JVM are you running?
>
> If you can't share the db, could you please describe the structure of
> the graph, i.e. which category of nodes has what number of (types of)
> relationships to which others?
>
> Also, does your node 0 contain the many rels, or the node with id 1?
>
> Cheers
>
> Michael
>
> Am 06.07.2011 um 18:48 schrieb Andrew White:
> > When using `cd -a` it is indeed very fast. As to the logs, those were
> > from messages.log.
> >
> > Sharing the graph-db would be tough considering I am generating this
> > graph off of several GB of data and my local upload is very limited.
> > Any hints on the memory map issue are welcomed too.
> >
> > Thanks for all of your help so far. I am going to try/reply to the
> > other recommendations in other e-mails soonish.
> >
> > Andrew
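As a side note, the store file sizes Andrew lists are roughly consistent with the graph shape he describes. Assuming the fixed record sizes commonly cited for the Neo4j 1.x store format (about 9 bytes per node record and 33 bytes per relationship record; these sizes are an assumption, not something stated in this thread), a back-of-the-envelope estimate gives:

```python
# Assumed fixed record sizes for the Neo4j 1.x store format (approximate).
NODE_RECORD_BYTES = 9
REL_RECORD_BYTES = 33

node_store_bytes = 43 * 10**6    # 43M  neostore.nodestore.db
rel_store_bytes = 156 * 10**6    # 156M neostore.relationshipstore.db

nodes = node_store_bytes // NODE_RECORD_BYTES  # roughly 4.8 million
rels = rel_store_bytes // REL_RECORD_BYTES     # roughly 4.7 million
```

Roughly five million nodes and five million relationships, i.e. about one relationship per non-root node, which matches "millions of nodes with a 1:1 relationship to the root".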
Re: [Neo4j] Performance issue on nodes with lots of relationships
>Any hints on the memory map issue are welcomed too.

I experienced that on Solaris when I'd placed the db on a filesystem that didn't support memory-mapped I/O, such as NFS.

Sent from my iPhone

On 6 Jul 2011, at 17:48, Andrew White wrote:
> Any hints on the memory map issue are welcomed too.
Re: [Neo4j] Performance issue on nodes with lots of relationships
Ok, then it is checking the connectedness, which actually traverses all the relationships between the current and the target node.

Could you share the whole messages.log file from that graph store? Which JVM are you running?

If you can't share the db, could you please describe the structure of the graph, i.e. which category of nodes has what number of (types of) relationships to which others?

Also, does your node 0 contain the many rels, or the node with id 1?

Cheers

Michael

Am 06.07.2011 um 18:48 schrieb Andrew White:
> When using `cd -a` it is indeed very fast. As to the logs, those were
> from messages.log.
>
> Sharing the graph-db would be tough considering I am generating this
> graph off of several GB of data and my local upload is very limited.
> Any hints on the memory map issue are welcomed too.
>
> Thanks for all of your help so far. I am going to try/reply to the
> other recommendations in other e-mails soonish.
>
> Andrew
Re: [Neo4j] Performance issue on nodes with lots of relationships
When using `cd -a` it is indeed very fast. As to the logs, those were from messages.log.

Sharing the graph-db would be tough considering I am generating this graph off of several GB of data and my local upload is very limited. Any hints on the memory map issue are welcomed too.

Thanks for all of your help so far. I am going to try/reply to the other recommendations in other e-mails soonish.

Andrew

On 07/06/2011 11:32 AM, Michael Hunger wrote:
> Andrew,
>
> can you by chance share your graph-db, or perhaps your generator
> script? Then we could evaluate that and see where the performance hit
> occurs.
>
> Neo4j-shell checks the connectedness of the graph so that you can't get
> lost while navigating.
>
> Could you try to use `cd -a 1` (this does absolute jumps without
> checking connectedness)?
>
> Are those logs you showed from Neoclipse as well, or from messages.log
> in the graph-db directory?
>
> The "unable to memory map" doesn't sound good; that shouldn't be a
> problem on Ubuntu.
>
> Cheers,
>
> Michael
Re: [Neo4j] Performance issue on nodes with lots of relationships
Andrew,

can you by chance share your graph-db, or perhaps your generator script? Then we could evaluate that and see where the performance hit occurs.

Neo4j-shell checks the connectedness of the graph so that you can't get lost while navigating.

Could you try to use `cd -a 1` (this does absolute jumps without checking connectedness)?

Are those logs you showed from Neoclipse as well, or from messages.log in the graph-db directory?

The "unable to memory map" doesn't sound good; that shouldn't be a problem on Ubuntu.

Cheers,

Michael

Am 06.07.2011 um 16:59 schrieb Andrew White:
> This is consistently slow. I made a graph which just goes off of the
> root reference node (0) and I am seeing the following...
>
>    (0)$ cd 1
>    (1)$ cd 0
>    (0)$ cd 1
>
> It's almost like it is scanning the entire relationship list before
> actually looking up the next node. Of note, I have found the following
> when running Neoclipse...
>
>    WARNING: [/neostore.relationshipstore.db] Unable to memory map
>
> And I see this in the logs...
>
>    neostore.nodestore.db.mapped_memory=20M
>    neostore.propertystore.db.arrays.mapped_memory=130M
>    neostore.propertystore.db.index.keys.mapped_memory=1M
>    neostore.propertystore.db.index.mapped_memory=1M
>    neostore.propertystore.db.mapped_memory=90M
>    neostore.propertystore.db.strings.mapped_memory=130M
>    neostore.relationshipstore.db.mapped_memory=100M
>
> Am I missing something obvious? Even without memory maps, I would
> expect this to be somewhat faster, since reading 156MB (the size of my
> neostore.relationshipstore.db file) of relationship data should be very
> fast. Also, is there any way to do a pre-warm-up so that the first hit
> isn't so slow? I would hate for my first user in PROD to get hammered
> because a cache wasn't warmed up.
>
> Thanks,
> Andrew
Re: [Neo4j] Performance issue on nodes with lots of relationships
Hi Rick,

> Are you thinking maybe of lazily loading relationships in 1.5? That
> might be a huge boost.

Added to the backlog to be discussed for inclusion in 1.5.

Jim
Re: [Neo4j] Performance issue on nodes with lots of relationships
This is consistently slow. I made a graph which just goes off of the root reference node (0) and I am seeing the following...

   (0)$ cd 1
   (1)$ cd 0
   (0)$ cd 1

It's almost like it is scanning the entire relationship list before actually looking up the next node. Of note, I have found the following when running Neoclipse...

   WARNING: [/neostore.relationshipstore.db] Unable to memory map

And I see this in the logs...

   neostore.nodestore.db.mapped_memory=20M
   neostore.propertystore.db.arrays.mapped_memory=130M
   neostore.propertystore.db.index.keys.mapped_memory=1M
   neostore.propertystore.db.index.mapped_memory=1M
   neostore.propertystore.db.mapped_memory=90M
   neostore.propertystore.db.strings.mapped_memory=130M
   neostore.relationshipstore.db.mapped_memory=100M

Am I missing something obvious? Even without memory maps, I would expect this to be somewhat faster, since reading 156MB (the size of my neostore.relationshipstore.db file) of relationship data should be very fast. Also, is there any way to do a pre-warm-up so that the first hit isn't so slow? I would hate for my first user in PROD to get hammered because a cache wasn't warmed up.

Thanks,
Andrew

On 07/06/2011 09:24 AM, Rick Bullotta wrote:
> Hi, Andrew.
>
> In general, this scenario (1 million+ relationships on a node) can be
> slow, but usually only the first time you access the node. If you're
> only accessing the node once in a session, then yes, it will seem
> sluggish. The Neoclipse issue is probably a combination of two issues:
> the first is lazily loading the node information the first time, and
> the second is the visual rendering of all those relationships.
>
> Rick
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On Behalf Of Andrew White
> Sent: Wednesday, July 06, 2011 10:15 AM
> To: user@lists.neo4j.org
> Subject: [Neo4j] Performance issue on nodes with lots of relationships
>
> I have a graph with roughly 10M nodes. Some of these nodes are highly
> connected to other nodes. For example, I may have a single node with
> 1M+ relationships. A good analogy is a population that has a "lives-in"
> relationship to a state. Now the problem...
>
> Both Neoclipse and neo4j-shell are terribly slow when working with
> these nodes. In the shell I would expect a `cd` to be very fast, much
> like selecting via a rowid in a standard DB. Instead, I usually see a
> delay of several seconds. Doing an `ls` takes so long that I usually
> have to just kill the process. In fact `ls` never outputs anything,
> which is odd since I would expect it to "stream" the output as it finds
> it. I have very similar performance issues with Neoclipse.
>
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> Disclaimer, I am new to Neo4j.
>
> Thanks,
> Andrew
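The warning and the settings Andrew lists suggest the relationship store (156MB on disk) does not fit in its 100M memory-mapped window. One thing worth trying, sketched here with illustrative values only (the right numbers depend on how much RAM is left after the JVM heap), is raising that setting in the Neo4j configuration so the whole file can be mapped:

```properties
# Illustrative values, not a recommendation: the relationship store file
# is 156M, so give its mapped-memory window headroom above that size.
# The sum of all windows plus the JVM heap must fit in physical RAM.
neostore.relationshipstore.db.mapped_memory=180M
neostore.nodestore.db.mapped_memory=50M
```

If the combined windows plus heap exceed available memory, the OS cannot map the files and warnings like the one above can appear even on a filesystem that supports mmap.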
Re: [Neo4j] Performance issue on nodes with lots of relationships
Andrew,

if you upgrade to 1.4.M06, your shell should be able to use Cypher to count the relationships of a node without returning them:

   start n=(1) match (n)-[r]-(x) return count(r)

or something along these lines, and try that several times to see whether cold caches are initially slowing things down.

In the shell's `ls` and in Neoclipse, the output and visualization will be slow for that amount of data.

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Wed, Jul 6, 2011 at 4:15 PM, Andrew White wrote:
> I have a graph with roughly 10M nodes. Some of these nodes are highly
> connected to other nodes. For example, I may have a single node with
> 1M+ relationships. A good analogy is a population that has a "lives-in"
> relationship to a state. Now the problem...
>
> Both Neoclipse and neo4j-shell are terribly slow when working with
> these nodes. In the shell I would expect a `cd ` to be very fast, much
> like selecting via a rowid in a standard DB. Instead, I usually see a
> delay of several seconds. Doing an `ls` takes so long that I usually
> have to just kill the process. In fact `ls` never outputs anything,
> which is odd since I would expect it to "stream" the output as it finds
> it. I have very similar performance issues with Neoclipse.
>
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> Disclaimer, I am new to Neo4j.
>
> Thanks,
> Andrew
Re: [Neo4j] Performance issue on nodes with lots of relationships
Hi, Michael.

Are you thinking maybe of lazily loading relationships in 1.5? That might be a huge boost.

Rick

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
On Behalf Of Michael Hunger
Sent: Wednesday, July 06, 2011 10:32 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships

Andrew,

could you please also try to access the graph via the latest milestone, 1.4.M06, to see if things have improved.

Does this behaviour only affect the supernodes, or every node in your graph (e.g. when you access, cd, or ls a person node)?

We've been discussing some changes to the initial loading/caching that might improve performance on heavily connected (super)nodes.

If our changes and tests are successful, these changes will be integrated into early 1.5 milestones.

Cheers

Michael
Re: [Neo4j] Performance issue on nodes with lots of relationships
Andrew,

could you please also try to access the graph via the latest milestone, 1.4.M06, to see if things have improved.

Does this behaviour affect only the supernodes, or every node in your graph (e.g. when you access, cd, or ls a person node)?

We've been discussing some changes to the initial loading/caching that might improve performance on heavily connected (super-)nodes. If our changes and tests are successful, these changes will be integrated in early 1.5 milestones.

Cheers,
Michael

On 06.07.2011 at 16:15, Andrew White wrote:

> I have a graph with roughly 10M nodes. Some of these nodes are highly
> connected to other nodes. For example, I may have a single node with 1M+
> relationships. A good analogy is a population that has a "lives-in"
> relationship to a state. Now the problem...
>
> Both neoclipse and neo4j-shell are terribly slow when working with these
> nodes. In the shell I would expect a `cd ` to be very fast, much like
> selecting via a rowid in a standard DB. Instead, I usually see several
> seconds of delay. Doing an `ls` takes so long that I usually have to just
> kill the process. In fact, `ls` never outputs anything, which is odd,
> since I would expect it to "stream" the output as it found it. I have
> very similar performance issues with neoclipse.
>
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> Disclaimer, I am new to Neo4j.
>
> Thanks,
> Andrew

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
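[Editor's note] The loading/caching change Michael alludes to can be pictured as loading a supernode's relationships per requested type on demand, instead of pulling all of them into the cache on first access. A hypothetical sketch (Python, with invented names; nothing here is Neo4j's actual implementation):

```python
# Hypothetical sketch of lazy, per-type relationship loading on a supernode.
# LazyNode and its store are made-up illustrations, not Neo4j internals.

class LazyNode:
    def __init__(self, store):
        self._store = store   # maps relationship type -> list of records
        self._cache = {}      # populated only for types actually requested

    def relationships(self, rel_type):
        if rel_type not in self._cache:
            # Load just this type; the millions of relationships of other
            # types on the supernode are never touched.
            self._cache[rel_type] = list(self._store.get(rel_type, []))
        return self._cache[rel_type]

# A "state" supernode: huge LIVES_IN fan-in, a handful of other edges.
store = {
    "LIVES_IN": [f"person-{i}" for i in range(1_000_000)],
    "BORDERS": ["state-2", "state-3"],
}
node = LazyNode(store)

assert node.relationships("BORDERS") == ["state-2", "state-3"]
assert "LIVES_IN" not in node._cache  # the expensive type stays unloaded
```

Under this scheme, a `cd` or a typed traversal on a supernode would only pay for the relationship types it actually asks for.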
Re: [Neo4j] Performance issue on nodes with lots of relationships
Hi Andrew,

In general, this scenario (1M+ relationships on a node) can be slow, but usually only the first time you access the node. If you're only accessing the node once in a session, then yes, it will seem sluggish.

The Neoclipse issue is probably a combination of two things: lazily loading the node information the first time, and the visual rendering of all those relationships.

Rick

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Andrew White
Sent: Wednesday, July 06, 2011 10:15 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Performance issue on nodes with lots of relationships

I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example, I may have a single node with 1M+ relationships. A good analogy is a population that has a "lives-in" relationship to a state. Now the problem...

Both neoclipse and neo4j-shell are terribly slow when working with these nodes. In the shell I would expect a `cd ` to be very fast, much like selecting via a rowid in a standard DB. Instead, I usually see several seconds of delay. Doing an `ls` takes so long that I usually have to just kill the process. In fact, `ls` never outputs anything, which is odd, since I would expect it to "stream" the output as it found it. I have very similar performance issues with neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. Disclaimer, I am new to Neo4j.

Thanks,
Andrew

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
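[Editor's note] Rick's point about first-access cost can be sketched as a cache that pays the full load price exactly once (hypothetical Python, not Neo4j internals):

```python
# Sketch of first-access cost: the first lookup loads every relationship
# record; later lookups in the same session are served from the cache.

class CachingNodeStore:
    def __init__(self, degree):
        self.degree = degree
        self.loads = 0        # counts how many times we hit the store
        self._cache = None

    def get_relationships(self):
        if self._cache is None:
            self.loads += 1
            # Expensive: materialize all records on first access.
            self._cache = list(range(self.degree))
        return self._cache

node = CachingNodeStore(1_000_000)
node.get_relationships()   # slow: loads all 1M records
node.get_relationships()   # fast: served straight from the cache
assert node.loads == 1
```

If a session touches the supernode only once, all it ever sees is that slow first load, which matches the sluggishness Andrew reports in the shell.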
[Neo4j] Performance issue on nodes with lots of relationships
I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example, I may have a single node with 1M+ relationships. A good analogy is a population that has a "lives-in" relationship to a state. Now the problem...

Both neoclipse and neo4j-shell are terribly slow when working with these nodes. In the shell I would expect a `cd ` to be very fast, much like selecting via a rowid in a standard DB. Instead, I usually see several seconds of delay. Doing an `ls` takes so long that I usually have to just kill the process. In fact, `ls` never outputs anything, which is odd, since I would expect it to "stream" the output as it found it. I have very similar performance issues with neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. Disclaimer, I am new to Neo4j.

Thanks,
Andrew

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user