Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-02 Thread Rick Bullotta
Hi, Niels.

That's what we're doing now, but it has performance issues with large #'s of 
relationships when cars are constantly being added, since the color nodes 
become synchronization bottlenecks for updates.

Rick


From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of 
Niels Hoogeveen [pd_aficion...@hotmail.com]
Sent: Sunday, May 01, 2011 9:41 AM
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

One option would be to create a unique value node for each distinct color and 
create a relationship from car to that value node. The value nodes can be 
grouped together with relationships to some reference node.

This gives the opportunity of finding all distinct colors, and it allows you to 
find all cars with that particular color.
 Date: Sun, 1 May 2011 14:41:40 +0200
 From: matt...@neotechnology.com
 To: user@lists.neo4j.org
 Subject: Re: [Neo4j] Lucene/Neo Indexing Question

 2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
  Hi, Mattias.
 
  Here's a use case:
 
  I have a million nodes representing cars, and those nodes are all tagged 
  with some value, let's say a color name, as a property.  I have indexed 
  those nodes on the color property value.  Now I'd like to present a list of 
  the distinct color values with which nodes (cars) have been tagged.  At 
  present, I'd need to iterate through all million, read the property, and 
  maintain a distinct HashSet as I iterate through them.
 
  I've tried using relationships from the car node(s) to a set of color 
  node(s), but had scalability/performance issues when there are lots of car 
  nodes being added/deleted (the color node quickly becomes a hot 
  spot/synchronization choke point).

 All right, yeah, such nodes can become bottlenecks, so I see your
 problem for sure.
 
  Rick
 
 
  -----Original Message-----
  From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
  Behalf Of Mattias Persson
  Sent: Tuesday, April 26, 2011 2:17 PM
  To: Neo4j user discussions
  Subject: Re: [Neo4j] Lucene/Neo Indexing Question
 
  Hi Rick,
 
  No, not really. What's the use case for having such a method?
 
  2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
  Hi, all.
 
  Is there a method or suggested approach for obtaining a list of all of the 
  distinct key values in a given index?  I don't care about the indexed 
  nodes or relationships themselves, just the value(s) of the key.
 
  Thanks,
 
  Rick
 
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
 
 
 
  --
  Mattias Persson, [matt...@neotechnology.com]
  Hacker, Neo Technology
  www.neotechnology.com


Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-02 Thread Michael Hunger
Perhaps then it is sensible to introduce a second layer of nodes, so that you 
split down your supernodes and distribute the write contention?

Would be interesting if putting a round robin on that second level of color 
nodes would be enough to spread lock contention?
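
A minimal sketch of the second-layer idea, in plain Python rather than the Neo4j API. Each logical color node is split into a fixed number of sub-nodes, and new cars are attached round robin so that concurrent writers for the same color lock different sub-nodes. The class name `ShardedColorNode`, the `shards` parameter, and the use of `threading.Lock` as a stand-in for Neo4j's node locks are all assumptions made for illustration.

```python
import itertools
import threading


class ShardedColorNode:
    """One logical color node split into several sub-nodes (shards)."""

    def __init__(self, color, shards=4):
        self.color = color
        # Each sub-node has its own lock and its own set of attached car ids.
        self._locks = [threading.Lock() for _ in range(shards)]
        self._cars = [set() for _ in range(shards)]
        # Round-robin counter deciding which sub-node the next write goes to.
        self._next = itertools.count()
        self._counter_lock = threading.Lock()

    def add_car(self, car_id):
        with self._counter_lock:  # very short critical section
            shard = next(self._next) % len(self._cars)
        with self._locks[shard]:  # only this sub-node is locked for the write
            self._cars[shard].add(car_id)
        return shard

    def cars(self):
        # Reads have to visit every sub-node and merge the results.
        result = set()
        for lock, cars in zip(self._locks, self._cars):
            with lock:
                result |= cars
        return result


red = ShardedColorNode("red", shards=4)
placements = [red.add_car(car_id) for car_id in range(8)]
```

The sketch only covers additions. Removing a car would require knowing (or searching for) the sub-node it was attached to, which is part of the cost of spreading the writes.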

This is what Peter talks about in his activity stream update scenario.

And in general perhaps a step to a more performant in-graph index.

When thinking about in-graph indexes I thought it might perhaps be interesting 
to re-use the HashMap approach of declaring x (2^n) bucket nodes, then having 
relationships from the index-root node to the bucket nodes, with the 
(re-distributed) hashcode & (x-1) as relationship type, and below each bucket 
node relationships to the concrete nodes, with the concrete value as a 
relationship attribute.
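
The bucket layout described above can be sketched roughly as follows, again in plain Python and not with Neo4j calls. `NUM_BUCKETS`, `bucket_of` and `BucketIndex` are hypothetical names, and `zlib.crc32` is used only as a convenient deterministic hash; the masking step is the `hashcode & (x-1)` part.

```python
import zlib

NUM_BUCKETS = 8  # x = 2^n, so that (x - 1) is an all-ones bit mask


def bucket_of(value):
    # Deterministic hash of the value, masked down to a bucket number.
    # This plays the role of "hashcode & (x - 1)".
    return zlib.crc32(value.encode("utf-8")) & (NUM_BUCKETS - 1)


class BucketIndex:
    def __init__(self):
        # index root -> bucket number (the "relationship type") -> bucket node.
        # A bucket node holds (value, node_id) pairs, i.e. relationships that
        # carry the concrete value as an attribute.
        self.buckets = {b: [] for b in range(NUM_BUCKETS)}

    def add(self, value, node_id):
        self.buckets[bucket_of(value)].append((value, node_id))

    def get(self, value):
        # Only one bucket is touched; the value attribute filters collisions.
        return [n for v, n in self.buckets[bucket_of(value)] if v == value]


index = BucketIndex()
index.add("red", 1)
index.add("blue", 2)
index.add("red", 3)
```

In this sketch all cars of one color still end up under the same bucket node, so it spreads load across values rather than within a single popular value.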

I think this will be addressed even better with Craig's indexes or the 
Collection abstractions that Andreas Kollegger is working on.

Cheers

Michael



Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-02 Thread Niels Hoogeveen

Have you thought about using the in-graph Timeline index for this? Make each 
color node the root of a Timeline and add the car nodes as entries to that 
index. This may reduce your synchronization problems and is something you can 
probably test without having to make too much of an investment. 

 From: rick.bullo...@thingworx.com
 To: user@lists.neo4j.org
 Date: Mon, 2 May 2011 04:09:59 -0700
 Subject: Re: [Neo4j] Lucene/Neo Indexing Question
 
 Hi, Michael.
 
 The nature of the domain model really doesn't lend itself to any logical 
 partitioning of supernodes, so it would indeed have to be something very 
 arbitrary/random.  
 
 For now, I think we will have to either deal with the performance issues or 
 switch to using Lucene for the indexing, but we can't do that yet until we 
 have the ability to query the list of terms for a given key (which is a 
 necessary function in our domain model).  We could perhaps keep a list of 
 terms as nodes *and* index them, but that seems redundant.
 
 Ultimately, I think the solution is to hide the complexity via the indexing 
 framework and to offer a variety of in-graph indexing models that address 
 specific types of domain requirements.
 
 Rick
 
 

Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-02 Thread Craig Taverner
Thinking back to your original domain description, cars with colors, surely
you have more properties than just colors to index?

If you have two or more properties, then you use combinations of properties
for the first level of the index tree, which provides your logical
partitioning of supernodes in a domain specific way. For example,
consider having the four properties color, manufacturer, model, year. The
first level of index nodes would be the set of unique combinations of all
possible properties (all existing combinations, actually). This set is much
larger than the set of colors. So red will occur many times. As a result you
dramatically reduce node contention, and the number of relationships per
node is much less. Then if you want to perform the query for all red cars,
actually your traverser needs to be only slightly more complex, basically
'find all cars with color red and any value of the other properties'.
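
As a rough illustration of the combined-property idea (this is not the amanzi-index code, just a plain Python sketch under assumed names), the first index level can be modelled as one index node per existing combination of property values. `KEY_PROPERTIES`, `CompositeIndex` and the sample cars are hypothetical.

```python
from collections import defaultdict

# Hypothetical property order used to build the composite key.
KEY_PROPERTIES = ("color", "manufacturer", "model", "year")


class CompositeIndex:
    def __init__(self):
        # One index node per existing combination of property values.
        self.index_nodes = defaultdict(set)

    def add(self, car_id, props):
        key = tuple(props[name] for name in KEY_PROPERTIES)
        self.index_nodes[key].add(car_id)

    def find(self, **wanted):
        # Match the given properties and accept any value for the others.
        hits = set()
        for key, cars in self.index_nodes.items():
            combo = dict(zip(KEY_PROPERTIES, key))
            if all(combo[name] == value for name, value in wanted.items()):
                hits |= cars
        return hits


index = CompositeIndex()
index.add(1, {"color": "red", "manufacturer": "Volvo", "model": "V70", "year": 2009})
index.add(2, {"color": "red", "manufacturer": "Saab", "model": "9-3", "year": 2010})
index.add(3, {"color": "blue", "manufacturer": "Volvo", "model": "V70", "year": 2009})
red_cars = index.find(color="red")
```

Here 'red' is spread over two index nodes, so writers adding red Volvos and red Saabs no longer touch the same node, while the query for all red cars scans the index nodes instead of following a single one.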

This is the design of the 'amanzi-index' I started on github in December
(but did not complete). It was focusing on doing queries on multiple
properties at the same time, but does effectively cover your case of
reducing node contention, if you can add more properties to the index. It
also has the concept of a mapper from the domain specific property to the
index key, which was designed to reduce the number of index nodes, but in
your case you could also use it to increase the number of index nodes, using
some of the ideas by Jim and Michael. Jim suggested that instead of 'red'
always mapping to the same node, it could map to a set of different nodes
(randomly selected, or round robin). Michael discussed a distributed
hash-code, which I do not fully understand, but it does sound relevant :-)

So, in short, using the design of the amanzi-index you could help this
problem in two ways:

   - index together with other properties to get a domain-specific
   partitioning of the 'supernodes'
   - Add a mapper between the color and the index key to get partitioning of
   the supernodes



Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-02 Thread Rick Bullotta
Ah, if only it were so...

The number of indexable properties (tags) is completely variable on a per car 
basis (e.g. I can add a driverMood tag for just a subset of cars) - meaning 
that the domain objects themselves can have a variable number of tags and can 
indeed even be tagged with two values from the same vocabulary (e.g. a car can 
have two-color paint, red and blue).

The round-robin idea has some merit, but of course, identifying/determining the 
sub-tree width (# of randomly assigned index subnodes) is somewhat 
subjective in terms of determining what would help address the concurrency 
issues at the possible expense of traversal performance.  Also, the hotspot 
or supernode issue exists a number of other places in our application 
wherever we are constantly adding (or removing) content related to an entity in 
the system.  It seems that a lot of the current users of Neo are doing bulk 
loads and using it for analysis as opposed to using it like an OLTP data store 
(like we are), so I'm guessing the hotspot issue is unique to our domain.
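
The width trade-off can be made concrete with a deliberately simple model. It assumes cars are spread uniformly over the subnodes and that a full read must visit every subnode; the function name `fan_out_tradeoff` and the figure of 1,000,000 cars sharing one tag value are hypothetical.

```python
def fan_out_tradeoff(total_cars, width):
    """Return (cars per sub-node, sub-nodes a full read must visit)."""
    cars_per_subnode = total_cars / width
    return cars_per_subnode, width


for width in (1, 4, 16, 64):
    per_node, hops = fan_out_tradeoff(1_000_000, width)
    print(f"width={width:>2}  cars/sub-node={per_node:>9.0f}  sub-nodes read={hops}")
```

A wider sub-tree lowers the number of relationships (and competing writers) per subnode, but every traversal for that tag value has to fan out over more subnodes.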

I'm still leaning towards Lucene, but will experiment with a few approaches to 
see what works best in different scenarios, and will try implementing something 
along the lines of what you describe.




Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-01 Thread Mattias Persson
2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
 Hi, Mattias.

 Here's a use case:

 I have a million nodes representing cars, and those nodes are all tagged 
 with some value, let's say a color name, as a property.  I have indexed those 
 nodes on the color property value.  Now I'd like to present a list of the 
 distinct color values with which nodes (cars) have been tagged.  At present, 
 I'd need to iterate through all million, read the property, and maintain a 
 distinct HashSet as I iterate through them.

 I've tried using relationships from the car node(s) to a set of color 
 node(s), but had scalability/performance issues when there are lots of car 
 nodes being added/deleted (the color node quickly becomes a hot 
 spot/synchronization choke point).

All right, yeah, such nodes can become bottlenecks, so I see your
problem for sure.
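
For reference, the full scan Rick describes amounts to something like the following plain Python sketch, where a `set` plays the role of the HashSet. The function name `distinct_values` and the dictionary-based car records are assumptions; no Neo4j API is used.

```python
def distinct_values(nodes, key):
    """Collect the distinct values of one property by scanning every node."""
    seen = set()  # plays the role of the HashSet in the description
    for node in nodes:
        value = node.get(key)
        if value is not None:
            seen.add(value)
    return seen


cars = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
    {"id": 4},  # a car without a color tag
]
colors = distinct_values(cars, "color")
```

The cost grows with the number of cars, not with the number of distinct colors, which is why a million-node scan is unattractive for a short list of values.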

 Rick


 -----Original Message-----
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
 Behalf Of Mattias Persson
 Sent: Tuesday, April 26, 2011 2:17 PM
 To: Neo4j user discussions
 Subject: Re: [Neo4j] Lucene/Neo Indexing Question

 Hi Rick,

 No, not really. What's the use case for having such a method?

 2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
 Hi, all.

 Is there a method or suggested approach for obtaining a list of all of the 
 distinct key values in a given index?  I don't care about the indexed nodes 
 or relationships themselves, just the value(s) of the key.

 Thanks,

 Rick



Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-01 Thread Rick Bullotta
Hi, Mattias.

I floated a proposal a couple days ago for enhancements to the index framework 
to support this type of stuff.

Here's what I was thinking:

Any thoughts on those suggestions?

Best,

Rick




Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-01 Thread Rick Bullotta
G.  I hate webmail clients.  In any case, here are my thoughts:

-   A method to get the underlying terms for a given key
-   A method to get all keys for the index
-   A method to remove all entities from an index that contain a given 
key/value or term (I know this could be done by searching then removing each 
one iteratively, but I suspect there are substantial performance optimizations 
that could be achieved if it were an atomic method call, plus this makes it 
REST friendly)
-   Utility functions for performing intersections and unions on multiple 
IndexHits iterators/search results (again, do-able today, but could probably be 
optimized at a lower level in the framework)
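
To make the four proposals concrete, here is a toy in-memory index in plain Python. It is only a sketch of the requested behaviour: `TermIndex`, `keys`, `terms`, `remove_all`, `intersect` and `union` are hypothetical names and not part of the Neo4j index framework.

```python
from collections import defaultdict


class TermIndex:
    """Toy inverted index: key -> term -> set of entity ids."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(set))

    def add(self, entity_id, key, value):
        self._data[key][value].add(entity_id)

    def get(self, key, value):
        return set(self._data.get(key, {}).get(value, ()))

    def terms(self, key):
        # Proposal 1: the distinct terms stored under one key.
        return sorted(self._data.get(key, {}))

    def keys(self):
        # Proposal 2: all keys present in the index.
        return sorted(self._data)

    def remove_all(self, key, value):
        # Proposal 3: drop every entry for a key/value pair in one call.
        terms = self._data.get(key)
        if terms is None:
            return set()
        removed = terms.pop(value, set())
        if not terms:
            del self._data[key]
        return removed


def intersect(*hits):
    # Proposal 4: entities present in every result set.
    sets = [set(h) for h in hits]
    return set.intersection(*sets) if sets else set()


def union(*hits):
    # Proposal 4: entities present in at least one result set.
    return set().union(*hits)


index = TermIndex()
index.add(1, "color", "red")
index.add(2, "color", "blue")
index.add(3, "color", "red")
index.add(3, "driverMood", "happy")
```

With such an index, `index.terms("color")` answers the original question about distinct key values without touching the indexed entities themselves.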





Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-01 Thread Niels Hoogeveen

One option would be to create a unique value node for each distinct color and 
create a relationship from car to that value node. The value nodes can be 
grouped together with relationships to some reference node. 

This gives the opportunity of finding all distinct colors, and it allows you to 
find all cars with that particular color. 
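
A small plain Python sketch of this value-node layout, assuming a dictionary as a stand-in for the reference node and its value nodes. `ValueNodeGraph` and its method names are hypothetical, and no Neo4j API is used.

```python
class ValueNodeGraph:
    def __init__(self):
        # reference node -> value nodes: here simply the keys of this dict.
        # value node -> cars: the set stored under each key.
        self.value_nodes = {}

    def tag(self, car_id, color):
        # Create the value node on first use, then relate the car to it.
        self.value_nodes.setdefault(color, set()).add(car_id)

    def distinct_colors(self):
        # Follow the relationships from the reference node; no car is read.
        return set(self.value_nodes)

    def cars_with(self, color):
        return set(self.value_nodes.get(color, ()))


graph = ValueNodeGraph()
graph.tag(1, "red")
graph.tag(2, "blue")
graph.tag(3, "red")
```

Listing the distinct colors is cheap in this layout, but every write for a given color goes through the same value node.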
 Date: Sun, 1 May 2011 14:41:40 +0200
 From: matt...@neotechnology.com
 To: user@lists.neo4j.org
 Subject: Re: [Neo4j] Lucene/Neo Indexing Question
 
 2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
  Hi, Mattias.
 
  Here's a use case:
 
  I have a million nodes representing cars, and those nodes are all tagged 
  with some value, let's say a color name, as a property.  I have indexed 
  those nodes on the color property value.  Now I'd like to present a list of 
  the distinct color values with which nodes (cars) have been tagged.  At 
  present, I'd need to iterate through all million, read the property, and 
  maintain a distinct HashSet as I iterate through them.
 
  I've tried using relationships from the car node(s) to a set of color 
  node(s), but had scalability/performance issues when there are lots of car 
  nodes being added/deleted (the color node quickly becomes a hot 
  spot/synchronization choke point).
 
 All right, yeah, such nodes can become bottlenecks, so I see your
 problem for sure.
 
  Rick
 
 
  -----Original Message-----
  From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
  Behalf Of Mattias Persson
  Sent: Tuesday, April 26, 2011 2:17 PM
  To: Neo4j user discussions
  Subject: Re: [Neo4j] Lucene/Neo Indexing Question
 
  Hi Rick,
 
  No, not really. What's the use case for having such a method?
 
  2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
  Hi, all.
 
  Is there a method or suggested approach for obtaining a list of all of the 
  distinct key values in a given index?  I don't care about the indexed 
  nodes or relationships themselves, just the value(s) of the key.
 
  Thanks,
 
  Rick
 
  


Re: [Neo4j] Lucene/Neo Indexing Question

2011-04-26 Thread Mattias Persson
Hi Rick,

No, not really. What's the use case for having such a method?

2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
 Hi, all.

 Is there a method or suggested approach for obtaining a list of all of the 
 distinct key values in a given index?  I don't care about the indexed nodes 
 or relationships themselves, just the value(s) of the key.

 Thanks,

 Rick





-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Lucene/Neo Indexing Question

2011-04-26 Thread Rick Bullotta
Hi, Mattias.

Here's a use case:

I have a million nodes representing cars, and those nodes are all tagged with 
some value, let's say a color name, as a property.  I have indexed those nodes 
on the color property value.  Now I'd like to present a list of the distinct 
color values with which nodes (cars) have been tagged.  At present, I'd need to 
iterate through all million, read the property, and maintain a distinct 
HashSet as I iterate through them.  
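
A minimal plain-Java sketch of the scan described above, for illustration only. It does not use the Neo4j or Lucene API: each car node is modeled as a plain property map, and the names DistinctColorScan, distinctValues, and car are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: each "node" is just a map of property name to value.
class DistinctColorScan {

    // Visit every node once, read the property, and collect the distinct values.
    static Set<String> distinctValues(List<Map<String, Object>> nodes, String key) {
        Set<String> distinct = new HashSet<>();
        for (Map<String, Object> properties : nodes) {
            Object value = properties.get(key);
            if (value != null) { // nodes without the property are skipped
                distinct.add(value.toString());
            }
        }
        return distinct;
    }

    // Helper that builds a car "node" carrying a single color property.
    static Map<String, Object> car(String color) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("color", color);
        return properties;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> cars = new ArrayList<>();
        cars.add(car("red"));
        cars.add(car("blue"));
        cars.add(car("red"));
        System.out.println(distinctValues(cars, "color"));
    }
}
```

The work grows with the number of cars rather than the number of colors, so a million nodes are read to produce what is probably a short list.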

I've tried using relationships from the car node(s) to a set of color 
node(s), but had scalability/performance issues when there are lots of car 
nodes being added/deleted (the color node quickly becomes a hot 
spot/synchronization choke point).

Rick


-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Mattias Persson
Sent: Tuesday, April 26, 2011 2:17 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

Hi Rick,

No, not really. What's the use case for having such a method?

2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:
 Hi, all.

 Is there a method or suggested approach for obtaining a list of all of the 
 distinct key values in a given index?  I don't care about the indexed nodes 
 or relationships themselves, just the value(s) of the key.

 Thanks,

 Rick
