Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Peter Neubauer
Niels,
that sounds fantastic, great work everyone so far!

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Fri, Jul 8, 2011 at 1:27 AM, Niels Hoogeveen
 wrote:
>
> I did a write up on indexed relationships in the Git repo: 
> https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships
> A performance comparison would indeed be great. Anecdotally, I have witnessed 
> the difference when trying to load all entries of Dbpedia. With 2.5 G heap 
> space, loading becomes problematic after some 70,000 relationships have been 
> added to the supernode. With the indexed relationship no such problems arise 
> and 1.6 million relationships are easily created without  performance 
> degradation.
> Having real performance figures would be nice though.
> Niels
>
>> From: michael.hun...@neotechnology.com
>> Date: Thu, 7 Jul 2011 22:56:17 +0200
>> To: user@lists.neo4j.org
>> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
>>
>> Niels could you perhaps write up a blog post detailing the usage (also for 
>> your own scenario and how that solution would compare to the naive 
>> supernodes with just millions of relationships.
>>
>> Also I'd like to see a performance comparision of both approaches.
>>
>> Thanks so much for your work
>>
>> Michael
>>
>> Am 07.07.2011 um 22:24 schrieb Niels Hoogeveen:
>>
>> >
>> > I am glad to see a solution will be provided at the core level.
>> > Today, I pushed IndexedRelationships and IndexedRelationshipExpander to 
>> > Git, see: 
>> > https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship
>> > This provides a solution to the issue, but is certainly not as fast as a 
>> > solution in core would be.
>> > However, it does solve my issues and as a bonus, indexed relationships can 
>> > be traversed in sorted order,this is especially pleasant, since I usually 
>> > want to know only the recent additions of dense relationships.
>> > Niels
>> >
>> >
>> >> Date: Thu, 7 Jul 2011 21:37:26 +0200
>> >> From: matt...@neotechnology.com
>> >> To: user@lists.neo4j.org
>> >> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
>> >>
>> >> 2011/7/7 Agelos Pikoulas 
>> >>
>> >>> I think its the same problem pattern that been in discussion lately with
>> >>> dense nodes or supernodes (check
>> >>> http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
>> >>>
>> >>> Michael Hunger has provided a quick solution to visiting the *few*
>> >>> RelationshipTypes on a node that has *millions* of others, utilizing a
>> >>> RelationshipExpander with an Index (check
>> >>> http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)
>> >>>
>> >>> Ideally this would be abstracted & implemented in the core distribution 
>> >>> so
>> >>> that all API's (including Cypher & tinkerpop Pipes/Gremlin) can use it
>> >>> efficiently...
>> >>>
>> >>
>> >> Yes, I'm positive that something will be done on a core level to make
>> >> getting relationships of a specific type regardless of the total number of
>> >> relationships fast. In the foreseeable future hopefully.
>> >>
>> >>>
>> >>> Agelos
>> >>>
>> >>> On Thu, Jul 7, 2011 at 3:16 PM, Andrew White 
>> >>> wrote:
>> >>>
>> >>>> I use the shell as-is, but the messages.log is reporting...
>> >>>>
>> >>>>    Physical mem: 3962MB, Heap size: 881MB
>> >>>>
>> >>>> My point is that if you ignore caching altogether, why did one run take
>> >>>> 17x longer with only 2.4x more data? Considering this is a rather
>> >>>> iterative algorithm, I don't see why you would even read a node or
>> >>>> relationship more than once and thus a cache shouldn't matter at all.
>> >>>>
>> >>>> In this particular 

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Niels Hoogeveen

I did a write-up on indexed relationships in the Git repo: 
https://github.com/peterneubauer/graph-collections/wiki/Indexed-relationships
A performance comparison would indeed be great. Anecdotally, I have witnessed 
the difference when trying to load all entries of DBpedia. With 2.5 GB of heap 
space, loading becomes problematic after some 70,000 relationships have been 
added to the supernode. With indexed relationships no such problems arise, 
and 1.6 million relationships are easily created without performance 
degradation. 
Having real performance figures would be nice, though.
Niels
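
For readers who want the gist without the wiki page: the idea is to stop 
attaching millions of relationships directly to the supernode and instead 
spread them over small intermediate nodes that the supernode points to. Below 
is a minimal sketch against the plain Neo4j 1.x embedded API; the class, 
relationship-type and property names are illustrative only and are not the 
graph-collections API (the actual IndexedRelationship does something similar 
with a sorted tree, which is why entries can also be traversed in order, as 
mentioned above).

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

public class BucketedRelationships {
    static final RelationshipType LIVES_IN = DynamicRelationshipType.withName("LIVES_IN");
    static final RelationshipType BUCKET   = DynamicRelationshipType.withName("BUCKET");
    static final RelationshipType CURRENT  = DynamicRelationshipType.withName("CURRENT_BUCKET");
    static final int BUCKET_SIZE = 10000; // tune for the workload

    // person -[LIVES_IN]-> bucket -[BUCKET]-> state, instead of millions of rels on 'state'
    static void connect(GraphDatabaseService db, Node state, Node person) {
        Transaction tx = db.beginTx();
        try {
            Node bucket = bucketWithRoom(db, state);
            person.createRelationshipTo(bucket, LIVES_IN);
            bucket.setProperty("size", (Integer) bucket.getProperty("size") + 1);
            tx.success();
        } finally {
            tx.finish();
        }
    }

    // Find the bucket currently being filled; start a new one when it is full.
    private static Node bucketWithRoom(GraphDatabaseService db, Node state) {
        Relationship current = state.getSingleRelationship(CURRENT, Direction.OUTGOING);
        Node bucket = current == null ? null : current.getEndNode();
        if (bucket == null || (Integer) bucket.getProperty("size") >= BUCKET_SIZE) {
            if (current != null) current.delete();        // retire the full bucket
            bucket = db.createNode();
            bucket.setProperty("size", 0);
            bucket.createRelationshipTo(state, BUCKET);   // still reachable from 'state'
            state.createRelationshipTo(bucket, CURRENT);
        }
        return bucket;
    }
}

With 1.6 million people this leaves the state node with a couple of hundred 
BUCKET relationships instead of 1.6 million LIVES_IN ones, which is what keeps 
loading cheap.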

> From: michael.hun...@neotechnology.com
> Date: Thu, 7 Jul 2011 22:56:17 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
> 
> Niels could you perhaps write up a blog post detailing the usage (also for 
> your own scenario and how that solution would compare to the naive supernodes 
> with just millions of relationships.
> 
> Also I'd like to see a performance comparision of both approaches.
> 
> Thanks so much for your work
> 
> Michael
> 
> Am 07.07.2011 um 22:24 schrieb Niels Hoogeveen:
> 
> > 
> > I am glad to see a solution will be provided at the core level. 
> > Today, I pushed IndexedRelationships and IndexedRelationshipExpander to 
> > Git, see: 
> > https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship
> > This provides a solution to the issue, but is certainly not as fast as a 
> > solution in core would be. 
> > However, it does solve my issues and as a bonus, indexed relationships can 
> > be traversed in sorted order,this is especially pleasant, since I usually 
> > want to know only the recent additions of dense relationships.
> > Niels
> > 
> > 
> >> Date: Thu, 7 Jul 2011 21:37:26 +0200
> >> From: matt...@neotechnology.com
> >> To: user@lists.neo4j.org
> >> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
> >> 
> >> 2011/7/7 Agelos Pikoulas 
> >> 
> >>> I think its the same problem pattern that been in discussion lately with
> >>> dense nodes or supernodes (check
> >>> http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
> >>> 
> >>> Michael Hunger has provided a quick solution to visiting the *few*
> >>> RelationshipTypes on a node that has *millions* of others, utilizing a
> >>> RelationshipExpander with an Index (check
> >>> http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)
> >>> 
> >>> Ideally this would be abstracted & implemented in the core distribution so
> >>> that all API's (including Cypher & tinkerpop Pipes/Gremlin) can use it
> >>> efficiently...
> >>> 
> >> 
> >> Yes, I'm positive that something will be done on a core level to make
> >> getting relationships of a specific type regardless of the total number of
> >> relationships fast. In the foreseeable future hopefully.
> >> 
> >>> 
> >>> Agelos
> >>> 
> >>> On Thu, Jul 7, 2011 at 3:16 PM, Andrew White 
> >>> wrote:
> >>> 
> >>>> I use the shell as-is, but the messages.log is reporting...
> >>>> 
> >>>>Physical mem: 3962MB, Heap size: 881MB
> >>>> 
> >>>> My point is that if you ignore caching altogether, why did one run take
> >>>> 17x longer with only 2.4x more data? Considering this is a rather
> >>>> iterative algorithm, I don't see why you would even read a node or
> >>>> relationship more than once and thus a cache shouldn't matter at all.
> >>>> 
> >>>> In this particular case, I can't imagine taking 9+ minutes to read a
> >>>> mear 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an
> >>>> artifact of Cypher in which it is building a set of Rs before applying
> >>>> `count` rather than making count accept an iterable stream.
> >>>> 
> >>>> Andrew
> >>>> 
> >>>> On 07/06/2011 11:33 PM, David Montag wrote:
> >>>>> Hi Andrew,
> >>>>> 
> >>>>> How big is your configured Java heap? It could be that all the nodes
> >>> and
> >>>>> relationships don't fit into the cache.
> >>>>> 
> >>>>> David
> >>>>> 
> >>>>> On Wed, Jul 6, 2011 at 8:

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Michael Hunger
Niels, could you perhaps write up a blog post detailing the usage (also for your 
own scenario), and how that solution would compare to naive supernodes with 
just millions of relationships?

Also, I'd like to see a performance comparison of both approaches.

Thanks so much for your work

Michael

Am 07.07.2011 um 22:24 schrieb Niels Hoogeveen:

> 
> I am glad to see a solution will be provided at the core level. 
> Today, I pushed IndexedRelationships and IndexedRelationshipExpander to Git, 
> see: 
> https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship
> This provides a solution to the issue, but is certainly not as fast as a 
> solution in core would be. 
> However, it does solve my issues and as a bonus, indexed relationships can be 
> traversed in sorted order,this is especially pleasant, since I usually want 
> to know only the recent additions of dense relationships.
> Niels
> 
> 
>> Date: Thu, 7 Jul 2011 21:37:26 +0200
>> From: matt...@neotechnology.com
>> To: user@lists.neo4j.org
>> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
>> 
>> 2011/7/7 Agelos Pikoulas 
>> 
>>> I think its the same problem pattern that been in discussion lately with
>>> dense nodes or supernodes (check
>>> http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
>>> 
>>> Michael Hunger has provided a quick solution to visiting the *few*
>>> RelationshipTypes on a node that has *millions* of others, utilizing a
>>> RelationshipExpander with an Index (check
>>> http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)
>>> 
>>> Ideally this would be abstracted & implemented in the core distribution so
>>> that all API's (including Cypher & tinkerpop Pipes/Gremlin) can use it
>>> efficiently...
>>> 
>> 
>> Yes, I'm positive that something will be done on a core level to make
>> getting relationships of a specific type regardless of the total number of
>> relationships fast. In the foreseeable future hopefully.
>> 
>>> 
>>> Agelos
>>> 
>>> On Thu, Jul 7, 2011 at 3:16 PM, Andrew White 
>>> wrote:
>>> 
>>>> I use the shell as-is, but the messages.log is reporting...
>>>> 
>>>>Physical mem: 3962MB, Heap size: 881MB
>>>> 
>>>> My point is that if you ignore caching altogether, why did one run take
>>>> 17x longer with only 2.4x more data? Considering this is a rather
>>>> iterative algorithm, I don't see why you would even read a node or
>>>> relationship more than once and thus a cache shouldn't matter at all.
>>>> 
>>>> In this particular case, I can't imagine taking 9+ minutes to read a
>>>> mear 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an
>>>> artifact of Cypher in which it is building a set of Rs before applying
>>>> `count` rather than making count accept an iterable stream.
>>>> 
>>>> Andrew
>>>> 
>>>> On 07/06/2011 11:33 PM, David Montag wrote:
>>>>> Hi Andrew,
>>>>> 
>>>>> How big is your configured Java heap? It could be that all the nodes
>>> and
>>>>> relationships don't fit into the cache.
>>>>> 
>>>>> David
>>>>> 
>>>>> On Wed, Jul 6, 2011 at 8:03 PM, Andrew White
>>>> wrote:
>>>>> 
>>>>>> Here is some interesting stats to consider. First, I split my nodes
>>> into
>>>>>> two groups, one node with 1.4M children and the other with 3.4M
>>>>>> children. While I do see some cache warm-up improvements, the
>>>>>> transversal doesn't seem to scale linearly; ie the larger super-node
>>> has
>>>>>> 2.4x more children but takes 17x longer to transverse.
>>>>>> 
>>>>>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>>>>>> +--+
>>>>>> | count(r) |
>>>>>> +--+
>>>>>> | 1468486  |
>>>>>> +--+
>>>>>> 1 rows, 25724 ms
>>>>>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>>>>>> +--+
>>>>>> | count(r) |
>>>>>> +--+
>>>>>> | 1468486  |
>>>>>> +--+
>>>>>> 1 rows, 19763 ms
&

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Niels Hoogeveen

I am glad to see a solution will be provided at the core level. 
Today, I pushed IndexedRelationships and IndexedRelationshipExpander to Git, 
see: 
https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship
This provides a solution to the issue, but is certainly not as fast as a 
solution in core would be. 
However, it does solve my issues, and as a bonus, indexed relationships can be 
traversed in sorted order; this is especially pleasant, since I usually want to 
know only the recent additions among dense relationships.
Niels


> Date: Thu, 7 Jul 2011 21:37:26 +0200
> From: matt...@neotechnology.com
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
> 
> 2011/7/7 Agelos Pikoulas 
> 
> > I think its the same problem pattern that been in discussion lately with
> > dense nodes or supernodes (check
> > http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
> >
> > Michael Hunger has provided a quick solution to visiting the *few*
> > RelationshipTypes on a node that has *millions* of others, utilizing a
> > RelationshipExpander with an Index (check
> > http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)
> >
> > Ideally this would be abstracted & implemented in the core distribution so
> > that all API's (including Cypher & tinkerpop Pipes/Gremlin) can use it
> > efficiently...
> >
> 
> Yes, I'm positive that something will be done on a core level to make
> getting relationships of a specific type regardless of the total number of
> relationships fast. In the foreseeable future hopefully.
> 
> >
> > Agelos
> >
> > On Thu, Jul 7, 2011 at 3:16 PM, Andrew White 
> > wrote:
> >
> > > I use the shell as-is, but the messages.log is reporting...
> > >
> > > Physical mem: 3962MB, Heap size: 881MB
> > >
> > > My point is that if you ignore caching altogether, why did one run take
> > > 17x longer with only 2.4x more data? Considering this is a rather
> > > iterative algorithm, I don't see why you would even read a node or
> > > relationship more than once and thus a cache shouldn't matter at all.
> > >
> > > In this particular case, I can't imagine taking 9+ minutes to read a
> > > mear 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an
> > > artifact of Cypher in which it is building a set of Rs before applying
> > > `count` rather than making count accept an iterable stream.
> > >
> > > Andrew
> > >
> > > On 07/06/2011 11:33 PM, David Montag wrote:
> > > > Hi Andrew,
> > > >
> > > > How big is your configured Java heap? It could be that all the nodes
> > and
> > > > relationships don't fit into the cache.
> > > >
> > > > David
> > > >
> > > > On Wed, Jul 6, 2011 at 8:03 PM, Andrew White
> > >  wrote:
> > > >
> > > >> Here is some interesting stats to consider. First, I split my nodes
> > into
> > > >> two groups, one node with 1.4M children and the other with 3.4M
> > > >> children. While I do see some cache warm-up improvements, the
> > > >> transversal doesn't seem to scale linearly; ie the larger super-node
> > has
> > > >> 2.4x more children but takes 17x longer to transverse.
> > > >>
> > > >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> > > >> +--+
> > > >> | count(r) |
> > > >> +--+
> > > >> | 1468486  |
> > > >> +--+
> > > >> 1 rows, 25724 ms
> > > >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> > > >> +--+
> > > >> | count(r) |
> > > >> +--+
> > > >> | 1468486  |
> > > >> +--+
> > > >> 1 rows, 19763 ms
> > > >>
> > > >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> > > >> +--+
> > > >> | count(r) |
> > > >> +--+
> > > >> | 3472174  |
> > > >> +--+
> > > >> 1 rows, 565448 ms
> > > >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> > > >> +--+
> > > >> | count(r) |
> > > >> +--+
> > > >> | 3472174  |
> > > >> +--+
> > > >> 

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Mattias Persson
2011/7/7 Agelos Pikoulas 

> I think its the same problem pattern that been in discussion lately with
> dense nodes or supernodes (check
> http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
>
> Michael Hunger has provided a quick solution to visiting the *few*
> RelationshipTypes on a node that has *millions* of others, utilizing a
> RelationshipExpander with an Index (check
> http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)
>
> Ideally this would be abstracted & implemented in the core distribution so
> that all API's (including Cypher & tinkerpop Pipes/Gremlin) can use it
> efficiently...
>

Yes, I'm positive that something will be done at the core level to make
getting relationships of a specific type fast, regardless of the total number of
relationships. Hopefully in the foreseeable future.

>
> Agelos
>
> On Thu, Jul 7, 2011 at 3:16 PM, Andrew White 
> wrote:
>
> > I use the shell as-is, but the messages.log is reporting...
> >
> > Physical mem: 3962MB, Heap size: 881MB
> >
> > My point is that if you ignore caching altogether, why did one run take
> > 17x longer with only 2.4x more data? Considering this is a rather
> > iterative algorithm, I don't see why you would even read a node or
> > relationship more than once and thus a cache shouldn't matter at all.
> >
> > In this particular case, I can't imagine taking 9+ minutes to read a
> > mear 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an
> > artifact of Cypher in which it is building a set of Rs before applying
> > `count` rather than making count accept an iterable stream.
> >
> > Andrew
> >
> > On 07/06/2011 11:33 PM, David Montag wrote:
> > > Hi Andrew,
> > >
> > > How big is your configured Java heap? It could be that all the nodes
> and
> > > relationships don't fit into the cache.
> > >
> > > David
> > >
> > > On Wed, Jul 6, 2011 at 8:03 PM, Andrew White
> >  wrote:
> > >
> > >> Here is some interesting stats to consider. First, I split my nodes
> into
> > >> two groups, one node with 1.4M children and the other with 3.4M
> > >> children. While I do see some cache warm-up improvements, the
> > >> transversal doesn't seem to scale linearly; ie the larger super-node
> has
> > >> 2.4x more children but takes 17x longer to transverse.
> > >>
> > >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> > >> +--+
> > >> | count(r) |
> > >> +--+
> > >> | 1468486  |
> > >> +--+
> > >> 1 rows, 25724 ms
> > >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> > >> +--+
> > >> | count(r) |
> > >> +--+
> > >> | 1468486  |
> > >> +--+
> > >> 1 rows, 19763 ms
> > >>
> > >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> > >> +--+
> > >> | count(r) |
> > >> +--+
> > >> | 3472174  |
> > >> +--+
> > >> 1 rows, 565448 ms
> > >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> > >> +--+
> > >> | count(r) |
> > >> +--+
> > >> | 3472174  |
> > >> +--+
> > >> 1 rows, 337975 ms
> > >>
> > >> Any ideas on this?
> > >> Andrew
> > >>
> > >> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> > >>> Andrew,
> > >>> if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> > >>> order to count the relationships of a node, not returning them:
> > >>>
> > >>> start n=(1) match (n)-[r]-(x) return count(r)
> > >>>
> > >>> and try that several times to see if cold caches are initially
> slowing
> > >>> down things.
> > >>>
> > >>> or something along these lines. In the LS and Neoclipse the output
> and
> > >>> visualization will be slow for that amount of data.
> > >>>
> > >>> Cheers,
> > >>>
> > >>> /peter neubauer
> > >>>
> > >>> GTalk:  neubauer.peter
> > >>> Skype   peter.neubauer
> > >>> Phone   +46 704 106975
> > >>> LinkedIn   http://www.linkedin.com/in/neubauer
> > >>> Twitter  http://twitter.com/peterneubauer
> > >>>
> > >>> http://www.neo4j.org   - Your high performance graph
> > >> database.
> > >>> http://startupbootcamp.org/- Öresund - Innovation happens HERE.
> > >>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing
> > party.
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White
> > >>   wrote:
> >  I have a graph with roughly 10M nodes. Some of these nodes are
> highly
> >  connected to other nodes. For example I may have a single node with
> > 1M+
> >  relationships. A good analogy is a population that has a  "lives-in"
> >  relationship to a state. Now the problem...
> > 
> >  Both neoclipse or neo4j-shell are terribly slow when working with
> > these
> >  nodes. In the shell I would expect a `cd` to be very fast,
> >  much like selecting via a rowid in a standard DB. Instead, I usually
> > see
> >  several seconds delay. Doing a `ls` takes so long that I usually
> have
> > to
> >  just kill the process. In fact `ls` never outputs anything which is
> > odd
> >  sin

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Agelos Pikoulas
I think it's the same problem pattern that has been in discussion lately with
dense nodes or supernodes (check
http://lists.neo4j.org/pipermail/user/2011-July/009832.html).

Michael Hunger has provided a quick solution to visiting the *few*
RelationshipTypes on a node that has *millions* of others, utilizing a
RelationshipExpander with an Index (check
http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/)

Ideally this would be abstracted and implemented in the core distribution so
that all APIs (including Cypher and Tinkerpop Pipes/Gremlin) can use it
efficiently...

Agelos
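
In case the pastebin link goes away, the shape of that workaround is roughly 
the following. This is a sketch only, assuming the 1.3-era RelationshipExpander 
interface (expand(Node) plus reversed()) and a RelationshipIndex to which every 
relationship was added under a "type" key at creation time; the exact details 
are in Michael's paste.

import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipExpander;
import org.neo4j.graphdb.index.RelationshipIndex;

public class IndexBackedExpander implements RelationshipExpander {
    private final RelationshipIndex index; // relationships indexed under "type" when created
    private final String type;             // the rare type we want to follow, e.g. "KNOWS"

    public IndexBackedExpander(RelationshipIndex index, String type) {
        this.index = index;
        this.type = type;
    }

    public Iterable<Relationship> expand(Node node) {
        // Answer from the index: only relationships of 'type' starting at 'node'
        // are touched; the supernode's millions of other relationships are never loaded.
        // (Query again with (null, node) if incoming relationships are needed too.)
        return index.get("type", type, node, null);
    }

    public RelationshipExpander reversed() {
        return this; // good enough for this sketch
    }
}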

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White  wrote:

> I use the shell as-is, but the messages.log is reporting...
>
> Physical mem: 3962MB, Heap size: 881MB
>
> My point is that if you ignore caching altogether, why did one run take
> 17x longer with only 2.4x more data? Considering this is a rather
> iterative algorithm, I don't see why you would even read a node or
> relationship more than once and thus a cache shouldn't matter at all.
>
> In this particular case, I can't imagine taking 9+ minutes to read a
> mear 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an
> artifact of Cypher in which it is building a set of Rs before applying
> `count` rather than making count accept an iterable stream.
>
> Andrew
>
> On 07/06/2011 11:33 PM, David Montag wrote:
> > Hi Andrew,
> >
> > How big is your configured Java heap? It could be that all the nodes and
> > relationships don't fit into the cache.
> >
> > David
> >
> > On Wed, Jul 6, 2011 at 8:03 PM, Andrew White
>  wrote:
> >
> >> Here is some interesting stats to consider. First, I split my nodes into
> >> two groups, one node with 1.4M children and the other with 3.4M
> >> children. While I do see some cache warm-up improvements, the
> >> transversal doesn't seem to scale linearly; ie the larger super-node has
> >> 2.4x more children but takes 17x longer to transverse.
> >>
> >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> >> +--+
> >> | count(r) |
> >> +--+
> >> | 1468486  |
> >> +--+
> >> 1 rows, 25724 ms
> >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> >> +--+
> >> | count(r) |
> >> +--+
> >> | 1468486  |
> >> +--+
> >> 1 rows, 19763 ms
> >>
> >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> >> +--+
> >> | count(r) |
> >> +--+
> >> | 3472174  |
> >> +--+
> >> 1 rows, 565448 ms
> >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> >> +--+
> >> | count(r) |
> >> +--+
> >> | 3472174  |
> >> +--+
> >> 1 rows, 337975 ms
> >>
> >> Any ideas on this?
> >> Andrew
> >>
> >> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> >>> Andrew,
> >>> if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> >>> order to count the relationships of a node, not returning them:
> >>>
> >>> start n=(1) match (n)-[r]-(x) return count(r)
> >>>
> >>> and try that several times to see if cold caches are initially slowing
> >>> down things.
> >>>
> >>> or something along these lines. In the LS and Neoclipse the output and
> >>> visualization will be slow for that amount of data.
> >>>
> >>> Cheers,
> >>>
> >>> /peter neubauer
> >>>
> >>> GTalk:  neubauer.peter
> >>> Skype   peter.neubauer
> >>> Phone   +46 704 106975
> >>> LinkedIn   http://www.linkedin.com/in/neubauer
> >>> Twitter  http://twitter.com/peterneubauer
> >>>
> >>> http://www.neo4j.org   - Your high performance graph
> >> database.
> >>> http://startupbootcamp.org/- Öresund - Innovation happens HERE.
> >>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing
> party.
> >>>
> >>>
> >>>
> >>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White
> >>   wrote:
>  I have a graph with roughly 10M nodes. Some of these nodes are highly
>  connected to other nodes. For example I may have a single node with
> 1M+
>  relationships. A good analogy is a population that has a  "lives-in"
>  relationship to a state. Now the problem...
> 
>  Both neoclipse or neo4j-shell are terribly slow when working with
> these
>  nodes. In the shell I would expect a `cd` to be very fast,
>  much like selecting via a rowid in a standard DB. Instead, I usually
> see
>  several seconds delay. Doing a `ls` takes so long that I usually have
> to
>  just kill the process. In fact `ls` never outputs anything which is
> odd
>  since I would expect it to "stream" the output as it found it. I have
>  very similar performance issues with neoclipse.
> 
>  I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>  Disclaimer, I am new to Neo4j.
> 
>  Thanks,
>  Andrew
>  ___
>  Neo4j mailing list
>  User@lists.neo4j.org
>  https://lists.neo4j.org/mailman/listinfo/user
> 
> >>> ___
> >>> Neo4j 

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-07 Thread Andrew White
I use the shell as-is, but the messages.log is reporting...

 Physical mem: 3962MB, Heap size: 881MB

My point is that if you ignore caching altogether, why did one run take 
17x longer with only 2.4x more data? Considering this is a rather 
iterative algorithm, I don't see why you would even read a node or 
relationship more than once, and thus a cache shouldn't matter at all.

In this particular case, I can't imagine taking 9+ minutes to read a 
mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an 
artifact of Cypher, in which it builds a set of Rs before applying 
`count` rather than having `count` accept an iterable stream.

Andrew
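
For what it's worth, the same count through the embedded Java API streams the 
relationships one by one instead of collecting anything first; whether that is 
actually faster than the Cypher version would need measuring. A sketch (store 
path and node id below are placeholders; node 2 is the 3.4M-child node from 
the stats quoted further down):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class CountRels {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("/path/to/graph.db"); // placeholder
        try {
            Node supernode = db.getNodeById(2);
            long count = 0;
            for (Relationship r : supernode.getRelationships()) {
                count++; // lazy iteration; nothing is materialized
            }
            System.out.println(count + " relationships");
        } finally {
            db.shutdown();
        }
    }
}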

On 07/06/2011 11:33 PM, David Montag wrote:
> Hi Andrew,
>
> How big is your configured Java heap? It could be that all the nodes and
> relationships don't fit into the cache.
>
> David
>
> On Wed, Jul 6, 2011 at 8:03 PM, Andrew White  wrote:
>
>> Here is some interesting stats to consider. First, I split my nodes into
>> two groups, one node with 1.4M children and the other with 3.4M
>> children. While I do see some cache warm-up improvements, the
>> transversal doesn't seem to scale linearly; ie the larger super-node has
>> 2.4x more children but takes 17x longer to transverse.
>>
>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>> +--+
>> | count(r) |
>> +--+
>> | 1468486  |
>> +--+
>> 1 rows, 25724 ms
>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>> +--+
>> | count(r) |
>> +--+
>> | 1468486  |
>> +--+
>> 1 rows, 19763 ms
>>
>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>> +--+
>> | count(r) |
>> +--+
>> | 3472174  |
>> +--+
>> 1 rows, 565448 ms
>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>> +--+
>> | count(r) |
>> +--+
>> | 3472174  |
>> +--+
>> 1 rows, 337975 ms
>>
>> Any ideas on this?
>> Andrew
>>
>> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
>>> Andrew,
>>> if you upgrade to 1.4.M06, your shell should be able to do Cypher in
>>> order to count the relationships of a node, not returning them:
>>>
>>> start n=(1) match (n)-[r]-(x) return count(r)
>>>
>>> and try that several times to see if cold caches are initially slowing
>>> down things.
>>>
>>> or something along these lines. In the LS and Neoclipse the output and
>>> visualization will be slow for that amount of data.
>>>
>>> Cheers,
>>>
>>> /peter neubauer
>>>
>>> GTalk:  neubauer.peter
>>> Skype   peter.neubauer
>>> Phone   +46 704 106975
>>> LinkedIn   http://www.linkedin.com/in/neubauer
>>> Twitter  http://twitter.com/peterneubauer
>>>
>>> http://www.neo4j.org   - Your high performance graph
>> database.
>>> http://startupbootcamp.org/- Öresund - Innovation happens HERE.
>>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>>>
>>>
>>>
>>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White
>>   wrote:
 I have a graph with roughly 10M nodes. Some of these nodes are highly
 connected to other nodes. For example I may have a single node with 1M+
 relationships. A good analogy is a population that has a  "lives-in"
 relationship to a state. Now the problem...

 Both neoclipse or neo4j-shell are terribly slow when working with these
 nodes. In the shell I would expect a `cd` to be very fast,
 much like selecting via a rowid in a standard DB. Instead, I usually see
 several seconds delay. Doing a `ls` takes so long that I usually have to
 just kill the process. In fact `ls` never outputs anything which is odd
 since I would expect it to "stream" the output as it found it. I have
 very similar performance issues with neoclipse.

 I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
 Disclaimer, I am new to Neo4j.

 Thanks,
 Andrew
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

>>> ___
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>>>
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
>

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread David Montag
Hi Andrew,

How big is your configured Java heap? It could be that all the nodes and
relationships don't fit into the cache.

David
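
For anyone unsure what heap their shell or embedded app actually got, the JVM 
will tell you directly; a tiny check to compare against the figure reported in 
messages.log:

public class HeapCheck {
    public static void main(String[] args) {
        // Maximum heap the JVM was started with, in MB.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxHeapMb + "MB");
    }
}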

On Wed, Jul 6, 2011 at 8:03 PM, Andrew White  wrote:

> Here is some interesting stats to consider. First, I split my nodes into
> two groups, one node with 1.4M children and the other with 3.4M
> children. While I do see some cache warm-up improvements, the
> transversal doesn't seem to scale linearly; ie the larger super-node has
> 2.4x more children but takes 17x longer to transverse.
>
> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> +--+
> | count(r) |
> +--+
> | 1468486  |
> +--+
> 1 rows, 25724 ms
> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> +--+
> | count(r) |
> +--+
> | 1468486  |
> +--+
> 1 rows, 19763 ms
>
> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> +--+
> | count(r) |
> +--+
> | 3472174  |
> +--+
> 1 rows, 565448 ms
> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> +--+
> | count(r) |
> +--+
> | 3472174  |
> +--+
> 1 rows, 337975 ms
>
> Any ideas on this?
> Andrew
>
> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> > Andrew,
> > if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> > order to count the relationships of a node, not returning them:
> >
> > start n=(1) match (n)-[r]-(x) return count(r)
> >
> > and try that several times to see if cold caches are initially slowing
> > down things.
> >
> > or something along these lines. In the LS and Neoclipse the output and
> > visualization will be slow for that amount of data.
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk:  neubauer.peter
> > Skype   peter.neubauer
> > Phone   +46 704 106975
> > LinkedIn   http://www.linkedin.com/in/neubauer
> > Twitter  http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org   - Your high performance graph
> database.
> > http://startupbootcamp.org/- Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> >
> >
> > On Wed, Jul 6, 2011 at 4:15 PM, Andrew White
>  wrote:
> >> I have a graph with roughly 10M nodes. Some of these nodes are highly
> >> connected to other nodes. For example I may have a single node with 1M+
> >> relationships. A good analogy is a population that has a  "lives-in"
> >> relationship to a state. Now the problem...
> >>
> >> Both neoclipse or neo4j-shell are terribly slow when working with these
> >> nodes. In the shell I would expect a `cd` to be very fast,
> >> much like selecting via a rowid in a standard DB. Instead, I usually see
> >> several seconds delay. Doing a `ls` takes so long that I usually have to
> >> just kill the process. In fact `ls` never outputs anything which is odd
> >> since I would expect it to "stream" the output as it found it. I have
> >> very similar performance issues with neoclipse.
> >>
> >> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> >> Disclaimer, I am new to Neo4j.
> >>
> >> Thanks,
> >> Andrew
> >> ___
> >> Neo4j mailing list
> >> User@lists.neo4j.org
> >> https://lists.neo4j.org/mailman/listinfo/user
> >>
> > ___
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
>
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
David Montag 
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
Here are some interesting stats to consider. First, I split my nodes into 
two groups: one node with 1.4M children and the other with 3.4M 
children. While I do see some cache warm-up improvement, the 
traversal doesn't seem to scale linearly; i.e. the larger supernode has 
2.4x more children but takes 17x longer to traverse.

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+--+
| count(r) |
+--+
| 1468486  |
+--+
1 rows, 25724 ms
neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+--+
| count(r) |
+--+
| 1468486  |
+--+
1 rows, 19763 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+--+
| count(r) |
+--+
| 3472174  |
+--+
1 rows, 565448 ms
neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+--+
| count(r) |
+--+
| 3472174  |
+--+
1 rows, 337975 ms

Any ideas on this?
Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> Andrew,
> if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> order to count the relationships of a node, not returning them:
>
> start n=(1) match (n)-[r]-(x) return count(r)
>
> and try that several times to see if cold caches are initially slowing
> down things.
>
> or something along these lines. In the LS and Neoclipse the output and
> visualization will be slow for that amount of data.
>
> Cheers,
>
> /peter neubauer
>
> GTalk:  neubauer.peter
> Skype   peter.neubauer
> Phone   +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter  http://twitter.com/peterneubauer
>
> http://www.neo4j.org   - Your high performance graph database.
> http://startupbootcamp.org/- Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White  wrote:
>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>> connected to other nodes. For example I may have a single node with 1M+
>> relationships. A good analogy is a population that has a  "lives-in"
>> relationship to a state. Now the problem...
>>
>> Both neoclipse or neo4j-shell are terribly slow when working with these
>> nodes. In the shell I would expect a `cd` to be very fast,
>> much like selecting via a rowid in a standard DB. Instead, I usually see
>> several seconds delay. Doing a `ls` takes so long that I usually have to
>> just kill the process. In fact `ls` never outputs anything which is odd
>> since I would expect it to "stream" the output as it found it. I have
>> very similar performance issues with neoclipse.
>>
>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>> Disclaimer, I am new to Neo4j.
>>
>> Thanks,
>> Andrew
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
I just tested with 1.4.M06 and performance seems about the same. Also, 
only the supernodes are affected; the child nodes are very fast.

On 07/06/2011 09:31 AM, Michael Hunger wrote:
> Andrew,
>
> could you please also try to access the graph via the latest Milestone 
> 1.4.M06 to see if things have improved.
>
> Does this behaviour only affect the supernodes, or every node in your graph 
> (e.g. when you access, cd, or ls a person-node)?
>
> We've been discussing some changes to the initial loading/caching that might 
> improve performance on heavily connected (super-)nodes.
>
> If our changes and tests are successful, these changes will be integrated in 
> early 1.5 milestones.
>
> Cheers
>
> Michael
>
> Am 06.07.2011 um 16:15 schrieb Andrew White:
>
>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>> connected to other nodes. For example I may have a single node with 1M+
>> relationships. A good analogy is a population that has a  "lives-in"
>> relationship to a state. Now the problem...
>>
>> Both neoclipse or neo4j-shell are terribly slow when working with these
>> nodes. In the shell I would expect a `cd` to be very fast,
>> much like selecting via a rowid in a standard DB. Instead, I usually see
>> several seconds delay. Doing a `ls` takes so long that I usually have to
>> just kill the process. In fact `ls` never outputs anything which is odd
>> since I would expect it to "stream" the output as it found it. I have
>> very similar performance issues with neoclipse.
>>
>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>> Disclaimer, I am new to Neo4j.
>>
>> Thanks,
>> Andrew
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
I am on a standard filesystem (ext4). I haven't seen the issue again 
today so I wonder if it was a fluke.

Andrew

On 07/06/2011 12:29 PM, Paul Bandler wrote:
>> Any hints on the memory map issue are welcomed too.
> I experienced that on Solaris when I'd placed the db on a filesystem that 
> didn't support memory mapped I/o such as NFS
>
> Sent from my iPhone
>
> On 6 Jul 2011, at 17:48, Andrew White  wrote:
>
>> Any
>> hints on the memory map issue are welcomed too.
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Mattias Persson
Just noticed that "ls" shell reads all relationships before displaying
them... I'll fix this tomorrow.
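
The difference, in sketch form (illustrative only, not the shell's actual 
code): buffering the whole list first means nothing is printed until millions 
of relationship objects sit in memory, while streaming prints the first line 
immediately and uses constant memory.

import java.util.ArrayList;
import java.util.List;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;

class LsStyles {
    // What "ls" effectively does today: buffer everything before any output.
    static void eager(Node node) {
        List<Relationship> all = new ArrayList<Relationship>();
        for (Relationship r : node.getRelationships()) {
            all.add(r); // millions of objects held before the first line appears
        }
        for (Relationship r : all) {
            System.out.println(r);
        }
    }

    // The streaming alternative: print as the lazy iterator produces them.
    static void streaming(Node node) {
        for (Relationship r : node.getRelationships()) {
            System.out.println(r);
        }
    }
}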

2011/7/6 Mattias Persson 

>
>
> 2011/7/6 Jim Webber 
>
>> Hi Rick,
>>
>> > Are you thinking maybe of lazily loading relationships in 1.5?  That
>> might be a huge boost.
>>
>> Added to the backlog to be discussed for inclusion in 1.5.
>>
>
> Neo4j _is_ lazily loading relationships... and have done since before 1.0.
> Maybe there's some issue with the shell only.
>
>>
>> Jim
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Mattias Persson
2011/7/6 Jim Webber 

> Hi Rick,
>
> > Are you thinking maybe of lazily loading relationships in 1.5?  That
> might be a huge boost.
>
> Added to the backlog to be discussed for inclusion in 1.5.
>

Neo4j _is_ lazily loading relationships... and has done so since before 1.0.
Maybe the issue is with the shell only.

>
> Jim
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
Logs are attached. I am using the Sun 64-bit HotSpot JVM (see logs). For 
this particular graph I simply have a single root reference node (0) and 
millions of nodes, each with a 1:1 relationship to the root. For all 
intents and purposes, this version of the graph is like a flat table with all 
elements sharing the same parent. It is the simplest graph I could 
construct that will eventually represent a subgraph of a more complex 
system.
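
Since the original generator script isn't easy to share, here is a minimal 
stand-in that produces the same shape (one root, millions of children, one 
relationship each, committed in batches). Store path, property and 
relationship-type names are made up for the example.

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class StarGraphGenerator {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("target/star.db"); // placeholder path
        RelationshipType childOf = DynamicRelationshipType.withName("CHILD_OF");
        Node root = db.getReferenceNode(); // node 0
        int total = 3400000, batch = 10000;
        for (int done = 0; done < total; ) {
            Transaction tx = db.beginTx();
            try {
                for (int i = 0; i < batch && done < total; i++, done++) {
                    Node child = db.createNode();
                    child.setProperty("name", "child-" + done);
                    child.createRelationshipTo(root, childOf); // every child points at the root
                }
                tx.success();
            } finally {
                tx.finish();
            }
        }
        db.shutdown();
    }
}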


Some file sizes for the db store are...

  43M  neostore.nodestore.db
 424M  neostore.propertystore.db
 193M  neostore.propertystore.db.arrays
 1.1K  neostore.propertystore.db.index
 1.1K  neostore.propertystore.db.index.keys
 238M  neostore.propertystore.db.strings
 156M  neostore.relationshipstore.db
   10  neostore.relationshiptypestore.db
  129  neostore.relationshiptypestore.db.names

Andrew

On 07/06/2011 12:03 PM, Michael Hunger wrote:

Ok, then it is checking the connectedness which actually really traverses all 
the relationships between the current and the target node.

Could you share the whole messages.log file from that graph store?

Which JVM are you running?

If you can't share the db, could you please describe the structure of the 
graph, so which category of nodes has what number of (types of) relationships 
to which others?

Also does your node 0 contain the many rels or the node with the id 1 ?

Cheers

Michael

Am 06.07.2011 um 18:48 schrieb Andrew White:


When using `cd -a` it is indeed very fast. As to the logs, those where
from messages.log.

Sharing the graph-db would be tough considering I am generating this
graph off of several GB of data and my local upload is very limited. Any
hints on the memory map issue are welcomed too.

Thanks for all of your help so far. I am going to try/reply to the other
recommendations in other e-mails soonish.

Andrew

On 07/06/2011 11:32 AM, Michael Hunger wrote:

Andrew,

can you by chance share you graph-db or perhaps your generator script? Then we 
could evaluate that and see where the performance hit occurs.

Neo4j-shell checks the connectedness of the graph so that you can't get lost 
just while navigating.

Could you try to use cd -a 1 (this does absolute jumps w/o checking 
connectedness).

Are those logs you showed from neoclipse as well, or in messages.log in the 
graph-db directory?

The "unable to memory map" sounds not so good, that shouldn't be a problem in 
Ubuntu.

Cheers,

Michael

Am 06.07.2011 um 16:59 schrieb Andrew White:


This is consistently slow. I made a graph which just goes off of the
root reference node (0) and I am seeing the following...

(0)$ cd 1
(1)$ cd 0
(0)$ cd 1


It's almost like it is scanning the entire relationship list before
actually looking up the next node. Of note I have found the following
when running neoclipse...

WARNING: [/neostore.relationshipstore.db] Unable
to memory map


And I see this in the logs...

neostore.nodestore.db.mapped_memory=20M
neostore.propertystore.db.arrays.mapped_memory=130M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.relationshipstore.db.mapped_memory=100M

Am I missing something obvious? Even without memory maps, I would expect
this to be somewhat faster since reading 156MB (the size of my
neostore.relationshipstore.db file) of relation data should be very
fast. Also, is there anyway to do a pre-warm up so that the first hit
isn't so slow? I would hate for my first user in PROD to get hammered
because a cache wasn't warmed up.

Thanks,
Andrew


On 07/06/2011 09:24 AM, Rick Bullotta wrote:

Hi, Andrew.

In general, this scenario (1 million+ relationships on a node) can be slow, but 
usually only the first time you access the node.  If you're only accessing the 
node once in a session, then yes, it will seem sluggish.  The Neoclipse issue 
is probably a combination of two issues: the first is lazily loading the node 
information the first time, and the second is the visual rendering of all those 
relationships.

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Andrew White
Sent: Wednesday, July 06, 2011 10:15 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Performance issue on nodes with lots of relationships

I have a graph with roughly 10M nodes. Some of these nodes are highly
connected to other nodes. For example I may have a single node with 1M+
relationships. A good analogy is a population that has a  "lives-in"
relationship to a state. Now the problem...

Both neoclipse or neo4j-shell are terribly slow when working with these
nodes. In the shell I would expect a `cd` to be very fast,
much like selecting via a rowid in a standard DB. Instead, I usually see
several seconds delay. Doin

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Paul Bandler
> Any hints on the memory map issue are welcomed too.
I experienced that on Solaris when I'd placed the db on a filesystem that 
didn't support memory-mapped I/O, such as NFS.

Sent from my iPhone

On 6 Jul 2011, at 17:48, Andrew White  wrote:

> Any 
> hints on the memory map issue are welcomed too.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Michael Hunger
OK, then it is the connectedness check, which actually traverses all 
the relationships between the current node and the target node. 

Could you share the whole messages.log file from that graph store?

Which JVM are you running?

If you can't share the db, could you please describe the structure of the 
graph, i.e. which category of nodes has what number (and types) of relationships 
to which others?

Also, does your node 0 contain the many rels, or the node with id 1?

Cheers

Michael

Am 06.07.2011 um 18:48 schrieb Andrew White:

> When using `cd -a` it is indeed very fast. As to the logs, those where 
> from messages.log.
> 
> Sharing the graph-db would be tough considering I am generating this 
> graph off of several GB of data and my local upload is very limited. Any 
> hints on the memory map issue are welcomed too.
> 
> Thanks for all of your help so far. I am going to try/reply to the other 
> recommendations in other e-mails soonish.
> 
> Andrew
> 
> On 07/06/2011 11:32 AM, Michael Hunger wrote:
>> Andrew,
>> 
>> can you by chance share you graph-db or perhaps your generator script? Then 
>> we could evaluate that and see where the performance hit occurs.
>> 
>> Neo4j-shell checks the connectedness of the graph so that you can't get lost 
>> just while navigating.
>> 
>> Could you try to use cd -a 1 (this does absolute jumps w/o checking 
>> connectedness).
>> 
>> Are those logs you showed from neoclipse as well, or in messages.log in the 
>> graph-db directory?
>> 
>> The "unable to memory map" sounds not so good, that shouldn't be a problem 
>> in Ubuntu.
>> 
>> Cheers,
>> 
>> Michael
>> 
>> Am 06.07.2011 um 16:59 schrieb Andrew White:
>> 
>>> This is consistently slow. I made a graph which just goes off of the
>>> root reference node (0) and I am seeing the following...
>>> 
>>>(0)$ cd 1
>>>(1)$ cd 0
>>>(0)$ cd 1
>>> 
>>> 
>>> It's almost like it is scanning the entire relationship list before
>>> actually looking up the next node. Of note I have found the following
>>> when running neoclipse...
>>> 
>>>WARNING: [/neostore.relationshipstore.db] Unable
>>>to memory map
>>> 
>>> 
>>> And I see this in the logs...
>>> 
>>>neostore.nodestore.db.mapped_memory=20M
>>>neostore.propertystore.db.arrays.mapped_memory=130M
>>>neostore.propertystore.db.index.keys.mapped_memory=1M
>>>neostore.propertystore.db.index.mapped_memory=1M
>>>neostore.propertystore.db.mapped_memory=90M
>>>neostore.propertystore.db.strings.mapped_memory=130M
>>>neostore.relationshipstore.db.mapped_memory=100M
>>> 
>>> Am I missing something obvious? Even without memory maps, I would expect
>>> this to be somewhat faster since reading 156MB (the size of my
>>> neostore.relationshipstore.db file) of relation data should be very
>>> fast. Also, is there anyway to do a pre-warm up so that the first hit
>>> isn't so slow? I would hate for my first user in PROD to get hammered
>>> because a cache wasn't warmed up.
>>> 
>>> Thanks,
>>> Andrew
>>> 
>>> 
>>> On 07/06/2011 09:24 AM, Rick Bullotta wrote:
>>>> Hi, Andrew.
>>>> 
>>>> In general, this scenario (1 million+ relationships on a node) can be 
>>>> slow, but usually only the first time you access the node.  If you're only 
>>>> accessing the node once in a session, then yes, it will seem sluggish.  
>>>> The Neoclipse issue is probably a combination of two issues: the first is 
>>>> lazily loading the node information the first time, and the second is the 
>>>> visual rendering of all those relationships.
>>>> 
>>>> Rick
>>>> 
>>>> -Original Message-
>>>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] 
>>>> On Behalf Of Andrew White
>>>> Sent: Wednesday, July 06, 2011 10:15 AM
>>>> To: user@lists.neo4j.org
>>>> Subject: [Neo4j] Performance issue on nodes with lots of relationships
>>>> 
>>>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>>>> connected to other nodes. For example I may have a single node with 1M+
>>>> relationships. A good analogy is a population that has a  "lives-in"
>>>> relation

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
When using `cd -a` it is indeed very fast. As for the logs, those were 
from messages.log.

Sharing the graph-db would be tough considering I am generating this 
graph off of several GB of data and my local upload is very limited. Any 
hints on the memory map issue are welcomed too.

Thanks for all of your help so far. I am going to try/reply to the other 
recommendations in other e-mails soonish.

Andrew

On 07/06/2011 11:32 AM, Michael Hunger wrote:
> Andrew,
>
> can you by chance share you graph-db or perhaps your generator script? Then 
> we could evaluate that and see where the performance hit occurs.
>
> Neo4j-shell checks the connectedness of the graph so that you can't get lost 
> just while navigating.
>
> Could you try to use cd -a 1 (this does absolute jumps w/o checking 
> connectedness).
>
> Are those logs you showed from neoclipse as well, or in messages.log in the 
> graph-db directory?
>
> The "unable to memory map" sounds not so good, that shouldn't be a problem in 
> Ubuntu.
>
> Cheers,
>
> Michael
>
> Am 06.07.2011 um 16:59 schrieb Andrew White:
>
>> This is consistently slow. I made a graph which just goes off of the
>> root reference node (0) and I am seeing the following...
>>
>> (0)$ cd 1
>> (1)$ cd 0
>> (0)$ cd 1
>>
>>
>> It's almost like it is scanning the entire relationship list before
>> actually looking up the next node. Of note I have found the following
>> when running neoclipse...
>>
>> WARNING: [/neostore.relationshipstore.db] Unable
>> to memory map
>>
>>
>> And I see this in the logs...
>>
>> neostore.nodestore.db.mapped_memory=20M
>> neostore.propertystore.db.arrays.mapped_memory=130M
>> neostore.propertystore.db.index.keys.mapped_memory=1M
>> neostore.propertystore.db.index.mapped_memory=1M
>> neostore.propertystore.db.mapped_memory=90M
>> neostore.propertystore.db.strings.mapped_memory=130M
>> neostore.relationshipstore.db.mapped_memory=100M
>>
>> Am I missing something obvious? Even without memory maps, I would expect
>> this to be somewhat faster since reading 156MB (the size of my
>> neostore.relationshipstore.db file) of relation data should be very
>> fast. Also, is there anyway to do a pre-warm up so that the first hit
>> isn't so slow? I would hate for my first user in PROD to get hammered
>> because a cache wasn't warmed up.
>>
>> Thanks,
>> Andrew
>>
>>
>> On 07/06/2011 09:24 AM, Rick Bullotta wrote:
>>> Hi, Andrew.
>>>
>>> In general, this scenario (1 million+ relationships on a node) can be slow, 
>>> but usually only the first time you access the node.  If you're only 
>>> accessing the node once in a session, then yes, it will seem sluggish.  The 
>>> Neoclipse issue is probably a combination of two issues: the first is 
>>> lazily loading the node information the first time, and the second is the 
>>> visual rendering of all those relationships.
>>>
>>> Rick
>>>
>>> -Original Message-
>>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
>>> Behalf Of Andrew White
>>> Sent: Wednesday, July 06, 2011 10:15 AM
>>> To: user@lists.neo4j.org
>>> Subject: [Neo4j] Performance issue on nodes with lots of relationships
>>>
>>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>>> connected to other nodes. For example I may have a single node with 1M+
>>> relationships. A good analogy is a population that has a  "lives-in"
>>> relationship to a state. Now the problem...
>>>
>>> Both neoclipse or neo4j-shell are terribly slow when working with these
>>> nodes. In the shell I would expect a `cd` to be very fast,
>>> much like selecting via a rowid in a standard DB. Instead, I usually see
>>> several seconds delay. Doing a `ls` takes so long that I usually have to
>>> just kill the process. In fact `ls` never outputs anything which is odd
>>> since I would expect it to "stream" the output as it found it. I have
>>> very similar performance issues with neoclipse.
>>>
>>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>>> Disclaimer, I am new to Neo4j.
>>>
>>> Thanks,
>>> Andrew
>>> ___
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>>> ___
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>>>
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Michael Hunger
Andrew,

can you by chance share your graph-db or perhaps your generator script? Then we 
could evaluate it and see where the performance hit occurs.

Neo4j-shell checks the connectedness of the graph so that you can't get lost 
while navigating.

Could you try to use cd -a 1 (this does absolute jumps w/o checking 
connectedness).

Are those logs you showed from neoclipse as well, or in messages.log in the 
graph-db directory? 

The "unable to memory map" sounds not so good, that shouldn't be a problem in 
Ubuntu.

Cheers,

Michael

Am 06.07.2011 um 16:59 schrieb Andrew White:

> This is consistently slow. I made a graph which just goes off of the 
> root reference node (0) and I am seeing the following...
> 
>(0)$ cd 1 
>(1)$ cd 0 
>(0)$ cd 1 
> 
> 
> It's almost like it is scanning the entire relationship list before 
> actually looking up the next node. Of note I have found the following 
> when running neoclipse...
> 
>WARNING: [/neostore.relationshipstore.db] Unable
>to memory map
> 
> 
> And I see this in the logs...
> 
>neostore.nodestore.db.mapped_memory=20M
>neostore.propertystore.db.arrays.mapped_memory=130M
>neostore.propertystore.db.index.keys.mapped_memory=1M
>neostore.propertystore.db.index.mapped_memory=1M
>neostore.propertystore.db.mapped_memory=90M
>neostore.propertystore.db.strings.mapped_memory=130M
>neostore.relationshipstore.db.mapped_memory=100M
> 
> Am I missing something obvious? Even without memory maps, I would expect 
> this to be somewhat faster since reading 156MB (the size of my 
> neostore.relationshipstore.db file) of relation data should be very 
> fast. Also, is there anyway to do a pre-warm up so that the first hit 
> isn't so slow? I would hate for my first user in PROD to get hammered 
> because a cache wasn't warmed up.
> 
> Thanks,
> Andrew
> 
> 
> On 07/06/2011 09:24 AM, Rick Bullotta wrote:
>> Hi, Andrew.
>> 
>> In general, this scenario (1 million+ relationships on a node) can be slow, 
>> but usually only the first time you access the node.  If you're only 
>> accessing the node once in a session, then yes, it will seem sluggish.  The 
>> Neoclipse issue is probably a combination of two issues: the first is lazily 
>> loading the node information the first time, and the second is the visual 
>> rendering of all those relationships.
>> 
>> Rick
>> 
>> -Original Message-----
>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
>> Behalf Of Andrew White
>> Sent: Wednesday, July 06, 2011 10:15 AM
>> To: user@lists.neo4j.org
>> Subject: [Neo4j] Performance issue on nodes with lots of relationships
>> 
>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>> connected to other nodes. For example I may have a single node with 1M+
>> relationships. A good analogy is a population that has a  "lives-in"
>> relationship to a state. Now the problem...
>> 
>> Both neoclipse or neo4j-shell are terribly slow when working with these
>> nodes. In the shell I would expect a `cd` to be very fast,
>> much like selecting via a rowid in a standard DB. Instead, I usually see
>> several seconds delay. Doing a `ls` takes so long that I usually have to
>> just kill the process. In fact `ls` never outputs anything which is odd
>> since I would expect it to "stream" the output as it found it. I have
>> very similar performance issues with neoclipse.
>> 
>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>> Disclaimer, I am new to Neo4j.
>> 
>> Thanks,
>> Andrew
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>> ___
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>> 
> 
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Jim Webber
Hi Rick,

> Are you thinking maybe of lazily loading relationships in 1.5?  That might be 
> a huge boost.

Added to the backlog to be discussed for inclusion in 1.5.

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
This is consistently slow. I made a graph that just hangs off the 
root reference node (0), and I am seeing the following...

(0)$ cd 1 
(1)$ cd 0 
(0)$ cd 1 


It's almost like it is scanning the entire relationship list before 
actually looking up the next node. Of note I have found the following 
when running neoclipse...

WARNING: [/neostore.relationshipstore.db] Unable
to memory map


And I see this in the logs...

neostore.nodestore.db.mapped_memory=20M
neostore.propertystore.db.arrays.mapped_memory=130M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.relationshipstore.db.mapped_memory=100M
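
(For reference, a sketch of how such mapped-memory settings could be raised when 
constructing the embedded database, should tuning them help — the store path and 
the values below are illustrative only, not a recommendation:)

    import java.util.HashMap;
    import java.util.Map;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class TunedStartup {
        public static void main(String[] args) {
            // illustrative values only; size them to the store files and available RAM
            Map<String, String> config = new HashMap<String, String>();
            config.put("neostore.relationshipstore.db.mapped_memory", "200M");
            config.put("neostore.nodestore.db.mapped_memory", "50M");
            GraphDatabaseService db = new EmbeddedGraphDatabase("path/to/db", config);
            // ... work with the database ...
            db.shutdown();
        }
    }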

Am I missing something obvious? Even without memory mapping, I would expect 
this to be somewhat faster, since reading 156MB (the size of my 
neostore.relationshipstore.db file) of relationship data should be very 
fast. Also, is there any way to do a pre-warm-up so that the first hit 
isn't so slow? I would hate for my first user in PROD to get hammered 
because a cache wasn't warmed up.
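
(One possible pre-warm-up, sketched against the embedded Java API: touch the dense 
node's relationships once at startup so the caches are populated before real 
traffic arrives. The node id and store path below are assumptions for the sketch:)

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class WarmUp {
        public static void main(String[] args) {
            GraphDatabaseService db = new EmbeddedGraphDatabase("path/to/db");
            // iterate the dense node's relationships once so they are cached
            // before the first real user hits the node
            Node supernode = db.getNodeById(1); // hypothetical id of the dense node
            int count = 0;
            for (Relationship rel : supernode.getRelationships()) {
                count++;
            }
            System.out.println("warmed " + count + " relationships");
            db.shutdown();
        }
    }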

Thanks,
Andrew


On 07/06/2011 09:24 AM, Rick Bullotta wrote:
> Hi, Andrew.
>
> In general, this scenario (1 million+ relationships on a node) can be slow, 
> but usually only the first time you access the node.  If you're only 
> accessing the node once in a session, then yes, it will seem sluggish.  The 
> Neoclipse issue is probably a combination of two issues: the first is lazily 
> loading the node information the first time, and the second is the visual 
> rendering of all those relationships.
>
> Rick
>
> -Original Message-
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
> Behalf Of Andrew White
> Sent: Wednesday, July 06, 2011 10:15 AM
> To: user@lists.neo4j.org
> Subject: [Neo4j] Performance issue on nodes with lots of relationships
>
> I have a graph with roughly 10M nodes. Some of these nodes are highly
> connected to other nodes. For example I may have a single node with 1M+
> relationships. A good analogy is a population that has a  "lives-in"
> relationship to a state. Now the problem...
>
> Both neoclipse or neo4j-shell are terribly slow when working with these
> nodes. In the shell I would expect a `cd` to be very fast,
> much like selecting via a rowid in a standard DB. Instead, I usually see
> several seconds delay. Doing a `ls` takes so long that I usually have to
> just kill the process. In fact `ls` never outputs anything which is odd
> since I would expect it to "stream" the output as it found it. I have
> very similar performance issues with neoclipse.
>
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> Disclaimer, I am new to Neo4j.
>
> Thanks,
> Andrew
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Peter Neubauer
Andrew,
if you upgrade to 1.4.M06, your shell should be able to run Cypher, so you
can count the relationships of a node without returning them:

start n=(1) match (n)-[r]-(x) return count(r)

Try that (or something along those lines) several times to see whether cold 
caches are what slows things down initially.

With `ls` and in Neoclipse, the output and visualization themselves will be 
slow for that amount of data.

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Wed, Jul 6, 2011 at 4:15 PM, Andrew White  wrote:
> I have a graph with roughly 10M nodes. Some of these nodes are highly
> connected to other nodes. For example I may have a single node with 1M+
> relationships. A good analogy is a population that has a  "lives-in"
> relationship to a state. Now the problem...
>
> Both neoclipse or neo4j-shell are terribly slow when working with these
> nodes. In the shell I would expect a `cd ` to be very fast,
> much like selecting via a rowid in a standard DB. Instead, I usually see
> several seconds delay. Doing a `ls` takes so long that I usually have to
> just kill the process. In fact `ls` never outputs anything which is odd
> since I would expect it to "stream" the output as it found it. I have
> very similar performance issues with neoclipse.
>
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> Disclaimer, I am new to Neo4j.
>
> Thanks,
> Andrew
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Rick Bullotta
Hi, Michael.

Are you thinking maybe of lazily loading relationships in 1.5?  That might be a 
huge boost.

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Michael Hunger
Sent: Wednesday, July 06, 2011 10:32 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships

Andrew,

could you please also try to access the graph via the latest Milestone 1.4.M06 
to see if things have improved.

Does this behaviour only affect the supernodes, or every node in your graph 
(e.g. when you access, cd, or ls a person node)?

We've been discussing some changes to the initial loading/caching that might 
improve performance on heavily connected (super-)nodes.

If our changes and tests are successful, these changes will be integrated in 
early 1.5 milestones.

Cheers

Michael

Am 06.07.2011 um 16:15 schrieb Andrew White:

> I have a graph with roughly 10M nodes. Some of these nodes are highly 
> connected to other nodes. For example I may have a single node with 1M+ 
> relationships. A good analogy is a population that has a  "lives-in" 
> relationship to a state. Now the problem...
> 
> Both neoclipse or neo4j-shell are terribly slow when working with these 
> nodes. In the shell I would expect a `cd ` to be very fast, 
> much like selecting via a rowid in a standard DB. Instead, I usually see 
> several seconds delay. Doing a `ls` takes so long that I usually have to 
> just kill the process. In fact `ls` never outputs anything which is odd 
> since I would expect it to "stream" the output as it found it. I have 
> very similar performance issues with neoclipse.
> 
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. 
> Disclaimer, I am new to Neo4j.
> 
> Thanks,
> Andrew
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Michael Hunger
Andrew,

could you please also try to access the graph via the latest Milestone 1.4.M06 
to see if things have improved.

Does this behaviour only affect the supernodes, or every node in your graph 
(e.g. when you access, cd, or ls a person node)?

We've been discussing some changes to the initial loading/caching that might 
improve performance on heavily connected (super-)nodes.

If our changes and tests are successful, these changes will be integrated in 
early 1.5 milestones.

Cheers

Michael

Am 06.07.2011 um 16:15 schrieb Andrew White:

> I have a graph with roughly 10M nodes. Some of these nodes are highly 
> connected to other nodes. For example I may have a single node with 1M+ 
> relationships. A good analogy is a population that has a  "lives-in" 
> relationship to a state. Now the problem...
> 
> Both neoclipse or neo4j-shell are terribly slow when working with these 
> nodes. In the shell I would expect a `cd ` to be very fast, 
> much like selecting via a rowid in a standard DB. Instead, I usually see 
> several seconds delay. Doing a `ls` takes so long that I usually have to 
> just kill the process. In fact `ls` never outputs anything which is odd 
> since I would expect it to "stream" the output as it found it. I have 
> very similar performance issues with neoclipse.
> 
> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. 
> Disclaimer, I am new to Neo4j.
> 
> Thanks,
> Andrew
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Rick Bullotta
Hi, Andrew.

In general, this scenario (1 million+ relationships on a node) can be slow, but 
usually only the first time you access the node.  If you're only accessing the 
node once in a session, then yes, it will seem sluggish.  The Neoclipse issue 
is probably a combination of two issues: the first is lazily loading the node 
information the first time, and the second is the visual rendering of all those 
relationships.
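
(A quick way to observe that first-touch cost, sketched against the embedded Java 
API — the node id and store path are assumptions; the second pass should be far 
faster once the relationships are cached:)

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Relationship;
    import org.neo4j.kernel.EmbeddedGraphDatabase;

    public class FirstTouchTiming {
        public static void main(String[] args) {
            GraphDatabaseService db = new EmbeddedGraphDatabase("path/to/db");
            Node supernode = db.getNodeById(1); // hypothetical id of the dense node
            // first pass loads relationships from disk, second pass reads the cache
            for (int pass = 1; pass <= 2; pass++) {
                long start = System.currentTimeMillis();
                int count = 0;
                for (Relationship rel : supernode.getRelationships()) {
                    count++;
                }
                System.out.println("pass " + pass + ": " + count + " relationships in "
                        + (System.currentTimeMillis() - start) + " ms");
            }
            db.shutdown();
        }
    }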

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Andrew White
Sent: Wednesday, July 06, 2011 10:15 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Performance issue on nodes with lots of relationships

I have a graph with roughly 10M nodes. Some of these nodes are highly 
connected to other nodes. For example, I may have a single node with 1M+ 
relationships. A good analogy is a population that has a "lives-in" 
relationship to a state. Now the problem...

Both neoclipse and neo4j-shell are terribly slow when working with these 
nodes. In the shell I would expect a `cd ` to be very fast, 
much like selecting via a rowid in a standard DB. Instead, I usually see 
several seconds of delay. Doing an `ls` takes so long that I usually have to 
just kill the process. In fact, `ls` never outputs anything, which is odd 
since I would expect it to "stream" the output as it finds it. I have 
very similar performance issues with neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. 
Disclaimer: I am new to Neo4j.

Thanks,
Andrew
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
I have a graph with roughly 10M nodes. Some of these nodes are highly 
connected to other nodes. For example, I may have a single node with 1M+ 
relationships. A good analogy is a population that has a "lives-in" 
relationship to a state. Now the problem...

Both neoclipse and neo4j-shell are terribly slow when working with these 
nodes. In the shell I would expect a `cd ` to be very fast, 
much like selecting via a rowid in a standard DB. Instead, I usually see 
several seconds of delay. Doing an `ls` takes so long that I usually have to 
just kill the process. In fact, `ls` never outputs anything, which is odd 
since I would expect it to "stream" the output as it finds it. I have 
very similar performance issues with neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. 
Disclaimer: I am new to Neo4j.

Thanks,
Andrew
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user