Re: [Neo4j] Lucene/Neo Indexing Question
Hi, Niels.

That's what we're doing now, but it performs poorly with large numbers of relationships when cars are constantly being added, since the color nodes become synchronization bottlenecks for updates.

Rick

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
Sent: Sunday, May 01, 2011 9:41 AM
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

One option would be to create a unique value node for each distinct color and create a relationship from each car to that value node. The value nodes can be grouped together with relationships to some reference node. This lets you find all distinct colors, and it lets you find all cars with a particular color.

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Lucene/Neo Indexing Question
Perhaps it would be sensible to introduce a second layer of nodes, so that you split up your supernodes and distribute the write contention? It would be interesting to see whether a round robin over that second level of color nodes is enough to spread the lock contention. This is what Peter talks about in his activity stream update scenario, and in general it is perhaps a step towards a more performant in-graph index.

When thinking about in-graph indexes, I thought it might be interesting to reuse the HashMap approach: declare x (2^n) bucket nodes; connect the index root node to the bucket nodes with relationships whose types are given by the (re-distributed) hashcode & (x-1); and, below each bucket node, connect to the concrete nodes with relationships that carry the concrete value as a relationship attribute.

I think this will be addressed even better with Craig's indexes or the Collection abstractions that Andreas Kollegger is working on.

Cheers
Michael

On 02.05.2011 at 12:16, Rick Bullotta wrote:

Hi, Niels.

That's what we're doing now, but it performs poorly with large numbers of relationships when cars are constantly being added, since the color nodes become synchronization bottlenecks for updates.

Rick
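The HashMap-style layout Michael describes in this message could be sketched roughly as follows. This is an in-memory illustration in Python, not Neo4j code: the class name `BucketIndex`, its methods, and the bit-spreading step are all assumptions made for the example, with plain dictionaries standing in for bucket nodes and their relationships.

```python
from collections import defaultdict


class BucketIndex:
    """Hypothetical stand-in for an index root node with 2**n bucket nodes."""

    def __init__(self, n_bits=4):
        self.size = 1 << n_bits  # x = 2**n buckets
        # One dict per bucket node: value -> ids of the concrete nodes.
        # The dict key plays the role of the value stored on the relationship.
        self.buckets = [defaultdict(set) for _ in range(self.size)]

    def _bucket_of(self, value):
        # Re-distribute the hash bits, then mask with (x - 1), HashMap-style.
        h = hash(value)
        h ^= h >> 16
        return h & (self.size - 1)

    def add(self, value, entity_id):
        self.buckets[self._bucket_of(value)][value].add(entity_id)

    def lookup(self, value):
        # Only one bucket has to be visited for a given value.
        return set(self.buckets[self._bucket_of(value)].get(value, ()))

    def distinct_values(self):
        # Listing all values means walking every bucket, never the entities.
        return sorted(v for bucket in self.buckets for v in bucket)
```

Note that hashing on the value spreads different values over different buckets, but all entries for one hot value (say, every red car) still meet in the same bucket, so on its own this layout would not remove the write hotspot for a single popular value.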
Re: [Neo4j] Lucene/Neo Indexing Question
Have you thought about using the in-graph Timeline index for this? Make each color node the root of a Timeline and add the car nodes as entries to that index. This may reduce your synchronization problems, and it is something you can probably test without too much of an investment.

From: rick.bullo...@thingworx.com
To: user@lists.neo4j.org
Date: Mon, 2 May 2011 04:09:59 -0700
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

Hi, Michael.

The nature of the domain model really doesn't lend itself to any logical partitioning of supernodes, so it would indeed have to be something very arbitrary/random. For now, I think we will have to either deal with the performance issues or switch to using Lucene for the indexing, but we can't do that until we have the ability to query the list of terms for a given key (which is a necessary function in our domain model). We could perhaps keep a list of terms as nodes *and* index them, but that seems redundant.

Ultimately, I think the solution is to hide the complexity via the indexing framework and to offer a variety of in-graph indexing models that address specific types of domain requirements.

Rick

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Michael Hunger [michael.hun...@neotechnology.com]
Sent: Monday, May 02, 2011 3:49 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

Perhaps it would be sensible to introduce a second layer of nodes, so that you split up your supernodes and distribute the write contention? It would be interesting to see whether a round robin over that second level of color nodes is enough to spread the lock contention. This is what Peter talks about in his activity stream update scenario, and in general it is perhaps a step towards a more performant in-graph index.

When thinking about in-graph indexes, I thought it might be interesting to reuse the HashMap approach: declare x (2^n) bucket nodes; connect the index root node to the bucket nodes with relationships whose types are given by the (re-distributed) hashcode & (x-1); and, below each bucket node, connect to the concrete nodes with relationships that carry the concrete value as a relationship attribute.

I think this will be addressed even better with Craig's indexes or the Collection abstractions that Andreas Kollegger is working on.

Cheers
Michael
Re: [Neo4j] Lucene/Neo Indexing Question
Thinking back to your original domain description, cars with colors: surely you have more properties than just colors to index? If you have two or more properties, you can use combinations of properties for the first level of the index tree, which gives you a logical partitioning of the supernodes in a domain-specific way.

For example, consider the four properties color, manufacturer, model and year. The first level of index nodes would be the set of unique combinations of all possible property values (all existing combinations, actually). This set is much larger than the set of colors, so red will occur many times. As a result, you dramatically reduce node contention, and the number of relationships per node is much lower. Then, if you want to query for all red cars, your traverser needs to be only slightly more complex: basically, 'find all cars with color red and any value of the other properties'.

This is the design of the 'amanzi-index' I started on GitHub in December (but did not complete). It focused on querying multiple properties at the same time, but it does effectively cover your case of reducing node contention, if you can add more properties to the index. It also has the concept of a mapper from the domain-specific property to the index key. That was designed to reduce the number of index nodes, but in your case you could also use it to increase the number of index nodes, using some of the ideas from Jim and Michael. Jim suggested that instead of 'red' always mapping to the same node, it could map to a set of different nodes (randomly selected, or round robin). Michael discussed a distributed hash code, which I do not fully understand, but it does sound relevant :-)

So, in short, using the design of the amanzi-index you could address this problem in two ways:
- Index together with other properties to get a domain-specific partitioning of the 'supernodes'
- Add a mapper between the color and the index key to get partitioning of the supernodes

On Mon, May 2, 2011 at 1:09 PM, Rick Bullotta rick.bullo...@thingworx.com wrote:

Hi, Michael.

The nature of the domain model really doesn't lend itself to any logical partitioning of supernodes, so it would indeed have to be something very arbitrary/random. For now, I think we will have to either deal with the performance issues or switch to using Lucene for the indexing, but we can't do that until we have the ability to query the list of terms for a given key (which is a necessary function in our domain model). We could perhaps keep a list of terms as nodes *and* index them, but that seems redundant.

Ultimately, I think the solution is to hide the complexity via the indexing framework and to offer a variety of in-graph indexing models that address specific types of domain requirements.

Rick
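The combination-of-properties idea in Craig's message can be illustrated with a small in-memory sketch. It is written in Python for brevity and is not the amanzi-index code: `CombinationIndex`, `FIELDS`, and the sample property values are hypothetical, and a dictionary keyed by value tuples stands in for the first level of index nodes.

```python
from collections import defaultdict

# The four properties used as the example in the message above.
FIELDS = ("color", "manufacturer", "model", "year")


class CombinationIndex:
    """Hypothetical index with one 'index node' per existing value combination."""

    def __init__(self):
        # (color, manufacturer, model, year) -> ids of cars under that node
        self.nodes = defaultdict(set)

    def add(self, car_id, **props):
        key = tuple(props.get(field) for field in FIELDS)
        self.nodes[key].add(car_id)

    def find(self, **wanted):
        """Match the given properties; every property not named is a wildcard."""
        hits = set()
        for key, cars in self.nodes.items():
            if all(key[FIELDS.index(f)] == v for f, v in wanted.items()):
                hits |= cars
        return hits
```

A query such as `find(color="red")` has to visit every combination node whose color is red, which is the "slightly more complex traverser" trade-off: writes are spread over many small nodes, while a single-property read fans out over several of them.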
Re: [Neo4j] Lucene/Neo Indexing Question
Ah, if only it were so...

The number of indexable properties (tags) is completely variable on a per-car basis (e.g. I can add a driverMood tag for just a subset of cars). The domain objects themselves can have a variable number of tags and can even be tagged with two values from the same vocabulary (e.g. a car can have two-color paint, red and blue).

The round-robin idea has some merit, but choosing the sub-tree width (the number of randomly assigned index subnodes) is somewhat subjective: it means deciding what would help with the concurrency issues at the possible expense of traversal performance.

Also, the hotspot or supernode issue exists in a number of other places in our application, wherever we are constantly adding (or removing) content related to an entity in the system. It seems that a lot of the current users of Neo are doing bulk loads and using it for analysis, as opposed to using it like an OLTP data store (as we are), so I'm guessing the hotspot issue is unique to our domain.

I'm still leaning towards Lucene, but will experiment with a few approaches to see what works best in different scenarios, and will try implementing something along the lines of what you describe.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Craig Taverner
Sent: Monday, May 02, 2011 11:29 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

Thinking back to your original domain description, cars with colors: surely you have more properties than just colors to index? If you have two or more properties, you can use combinations of properties for the first level of the index tree, which gives you a logical partitioning of the supernodes in a domain-specific way.

For example, consider the four properties color, manufacturer, model and year. The first level of index nodes would be the set of unique combinations of all possible property values (all existing combinations, actually). This set is much larger than the set of colors, so red will occur many times. As a result, you dramatically reduce node contention, and the number of relationships per node is much lower. Then, if you want to query for all red cars, your traverser needs to be only slightly more complex: basically, 'find all cars with color red and any value of the other properties'.

This is the design of the 'amanzi-index' I started on GitHub in December (but did not complete). It focused on querying multiple properties at the same time, but it does effectively cover your case of reducing node contention, if you can add more properties to the index. It also has the concept of a mapper from the domain-specific property to the index key. That was designed to reduce the number of index nodes, but in your case you could also use it to increase the number of index nodes, using some of the ideas from Jim and Michael. Jim suggested that instead of 'red' always mapping to the same node, it could map to a set of different nodes (randomly selected, or round robin). Michael discussed a distributed hash code, which I do not fully understand, but it does sound relevant :-)

So, in short, using the design of the amanzi-index you could address this problem in two ways:
- Index together with other properties to get a domain-specific partitioning of the 'supernodes'
- Add a mapper between the color and the index key to get partitioning of the supernodes
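The width trade-off Rick raises for the round-robin idea can be made concrete with a small sketch. It is in-memory Python rather than Neo4j code, and `ShardedValueNode` with its `width` parameter is a hypothetical name: each set stands in for one sub-node of a logical value node such as "red".

```python
import itertools


class ShardedValueNode:
    """Hypothetical value node split over `width` sub-nodes, filled round-robin.

    Assumes width >= 1.
    """

    def __init__(self, width):
        self.shards = [set() for _ in range(width)]
        self._next = itertools.cycle(range(width))
        self._home = {}  # entity id -> shard number, so removal hits one shard

    def add(self, entity_id):
        if entity_id in self._home:  # already attached, keep its shard
            return self._home[entity_id]
        shard = next(self._next)
        self.shards[shard].add(entity_id)
        self._home[entity_id] = shard
        return shard

    def remove(self, entity_id):
        shard = self._home.pop(entity_id)
        self.shards[shard].discard(entity_id)

    def members(self):
        # A read has to visit every sub-node: cost grows with the width.
        return set().union(*self.shards)
```

Each write touches only one of the `width` sub-nodes, so a larger width should spread lock contention further, while `members()` has to visit all of them. That is the concurrency-versus-traversal balance described in the message; the right width would have to be found by measurement.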
Re: [Neo4j] Lucene/Neo Indexing Question
2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:

Hi, Mattias.

Here's a use case: I have a million nodes representing cars, and those nodes are all tagged with some value, let's say a color name, as a property. I have indexed those nodes on the color property value. Now I'd like to present a list of the distinct color values with which nodes (cars) have been tagged. At present, I'd need to iterate through all million, read the property, and maintain a distinct HashSet as I iterate through them.

I've tried using relationships from the car node(s) to a set of color node(s), but had scalability/performance issues when there are lots of car nodes being added/deleted (the color node quickly becomes a hot spot/synchronization choke point).

Rick

All right, yeah, such nodes can become bottlenecks, so I see your problem for sure.

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Lucene/Neo Indexing Question
Hi, Mattias.

I floated a proposal a couple of days ago for enhancements to the index framework to support this type of thing. Here's what I was thinking:

Any thoughts on those suggestions?

Best,
Rick

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson [matt...@neotechnology.com]
Sent: Sunday, May 01, 2011 5:41 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

All right, yeah, such nodes can become bottlenecks, so I see your problem for sure.
Re: [Neo4j] Lucene/Neo Indexing Question
G. I hate webmail clients. In any case, here are my thoughts:

- A method to get the underlying terms for a given key
- A method to get all keys for the index
- A method to remove all entities from an index that contain a given key/value or term (I know this could be done by searching and then removing each one iteratively, but I suspect there are substantial performance optimizations that could be achieved if it were an atomic method call, plus this makes it REST-friendly)
- Utility functions for performing intersections and unions on multiple IndexHits iterators/search results (again, doable today, but could probably be optimized at a lower level in the framework)

From: Rick Bullotta
Sent: Sunday, May 01, 2011 9:25 AM
To: Neo4j user discussions
Subject: RE: [Neo4j] Lucene/Neo Indexing Question

Hi, Mattias.

I floated a proposal a couple of days ago for enhancements to the index framework to support this type of thing. Here's what I was thinking:

Any thoughts on those suggestions?

Best,
Rick
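To make the four proposed operations in this message concrete, here is a minimal in-memory sketch in Python. It is not the Neo4j index API: `TermIndex`, `remove_all`, `intersect` and `union` are hypothetical names chosen for the illustration, and plain sets stand in for IndexHits results.

```python
from collections import defaultdict


class TermIndex:
    """Hypothetical key/value index exposing the proposed operations."""

    def __init__(self):
        # key -> term -> ids of indexed entities
        self._data = defaultdict(lambda: defaultdict(set))

    def add(self, entity_id, key, value):
        self._data[key][value].add(entity_id)

    def get(self, key, value):
        return set(self._data.get(key, {}).get(value, ()))

    def keys(self):
        """All keys present in the index."""
        return sorted(self._data)

    def terms(self, key):
        """Distinct terms for one key, without touching the entities."""
        return sorted(self._data.get(key, {}))

    def remove_all(self, key, value):
        """Drop every entry for key/value in one call; return how many went."""
        removed = self._data.get(key, {}).pop(value, set())
        if key in self._data and not self._data[key]:
            del self._data[key]  # no terms left, so the key disappears too
        return len(removed)


def intersect(*hit_lists):
    """Entities present in every result set."""
    sets = [set(hits) for hits in hit_lists]
    return set.intersection(*sets) if sets else set()


def union(*hit_lists):
    """Entities present in at least one result set."""
    return set().union(*hit_lists)
```

For example, after indexing a few cars under `color` and `driverMood`, `terms("color")` returns the distinct colors directly, and `intersect(idx.get("color", "blue"), idx.get("driverMood", "happy"))` combines two searches without iterating over unrelated entities.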
Re: [Neo4j] Lucene/Neo Indexing Question
One option would be to create a unique value node for each distinct color and create a relationship from each car to that value node. The value nodes can be grouped together with relationships to some reference node. This lets you find all distinct colors, and it lets you find all cars with a particular color.

Date: Sun, 1 May 2011 14:41:40 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

All right, yeah, such nodes can become bottlenecks, so I see your problem for sure.
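The value-node layout suggested in this message might look like the following in-memory sketch. It is Python rather than Neo4j code, and `ValueNodeGraph` and its method names are assumptions made for the example: the dictionary plays the role of the reference node, and each set plays the role of one color's value node with its incoming car relationships.

```python
class ValueNodeGraph:
    """Hypothetical reference node -> value nodes -> cars structure."""

    def __init__(self):
        # color -> ids of cars related to that color's value node
        self.value_nodes = {}

    def tag(self, car_id, color):
        # Creates the value node on first use, then adds the relationship.
        self.value_nodes.setdefault(color, set()).add(car_id)

    def untag(self, car_id, color):
        cars = self.value_nodes.get(color)
        if cars is None:
            return
        cars.discard(car_id)
        if not cars:
            del self.value_nodes[color]  # last car gone, drop the value node

    def distinct_colors(self):
        # Only the value nodes are visited, never the cars.
        return sorted(self.value_nodes)

    def cars_with(self, color):
        return set(self.value_nodes.get(color, ()))

    def degree(self, color):
        # Relationship count on one value node: the potential hotspot.
        return len(self.value_nodes.get(color, ()))
```

Listing distinct colors is cheap here, but every `tag` or `untag` for a popular color modifies the same value node, which matches the synchronization bottleneck reported elsewhere in this thread.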
Re: [Neo4j] Lucene/Neo Indexing Question
Hi Rick,

No, not really. What's the use case for having such a method?

2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:

Hi, all.

Is there a method or suggested approach for obtaining a list of all of the distinct key values in a given index? I don't care about the indexed nodes or relationships themselves, just the value(s) of the key.

Thanks,
Rick

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Lucene/Neo Indexing Question
Hi, Mattias.

Here's a use case: I have a million nodes representing cars, and those nodes are all tagged with some value, let's say a color name, as a property. I have indexed those nodes on the color property value. Now I'd like to present a list of the distinct color values with which nodes (cars) have been tagged. At present, I'd need to iterate through all million, read the property, and maintain a distinct HashSet as I iterate through them.

I've tried using relationships from the car node(s) to a set of color node(s), but had scalability/performance issues when there are lots of car nodes being added/deleted (the color node quickly becomes a hot spot/synchronization choke point).

Rick

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
Sent: Tuesday, April 26, 2011 2:17 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

Hi Rick,

No, not really. What's the use case for having such a method?
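The full-scan approach described in this message, and one possible alternative, can be sketched as follows. This is illustrative Python, not Neo4j or Lucene code: cars are modeled as plain dictionaries, and `distinct_by_scan` and `TermCounter` are hypothetical names.

```python
from collections import Counter


def distinct_by_scan(cars):
    """Read every car's color and collect the distinct values in a set."""
    seen = set()
    for car in cars:  # O(number of cars), even if there are only a few colors
        color = car.get("color")
        if color is not None:
            seen.add(color)
    return seen


class TermCounter:
    """Keeps a per-color count up to date so that no scan is needed."""

    def __init__(self):
        self._counts = Counter()

    def on_add(self, color):
        self._counts[color] += 1

    def on_remove(self, color):
        self._counts[color] -= 1
        if self._counts[color] <= 0:
            del self._counts[color]  # last car with this color is gone

    def distinct(self):
        return set(self._counts)
```

The counter answers the "distinct values" question in time proportional to the number of colors rather than the number of cars. It is itself a shared structure that every write touches, though, so under heavy concurrent updates it may run into a contention problem similar to the one reported for color nodes.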