Re: [Neo4j] Lucene Index on Relationships
Hi,

thanks a lot for the feedback.

There are a lot of applications where indexed relationships will provide a speed benefit, but only up to a limit. The advantage of indexing a property that is sparse across a lot of edges (i.e. 100 relationships with the property, as opposed to 1000 relationships without it) holds up to a "critical mass", beyond which it makes no difference whether you index or iterate.
I have some ideas on this and will try to write down my workarounds for this problem and post them.

I also have an idea for how to avoid indexing the whole graph. Normally one would create an index entry with (Relationship, key, value). Would it make sense to use the property name as the key and the start node ID as the value? Would this create smaller buckets?

Best regards
Marius

2010/6/21 Craig Taverner:
> A side comment, since I think indexing relationships with lucene might be
> good, but think there might be alternatives for your current example.

--
"Programs must be written for people to read, and only incidentally for machines to execute."
- Abelson & Sussman, SICP, preface to the first edition
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
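The idea in Marius's last paragraph (property name as key, start node ID as value) can be sketched in memory as follows. This is only an illustration under stated assumptions: plain Python dictionaries stand in for the index, and the record layout, `build_index` and `lookup` are hypothetical names, not Neo4j or Lucene API.

```python
from collections import defaultdict

# Hypothetical relationship records: id -> (start node id, end node id, properties)
relationships = {
    1: (10, 11, {"similarity": 0.9}),
    2: (10, 12, {}),
    3: (10, 13, {"similarity": 0.2}),
    4: (20, 11, {"similarity": 0.8}),
}


def build_index(rels):
    """Index each relationship under (property name, start node id)."""
    index = defaultdict(set)
    for rel_id, (start, _end, props) in rels.items():
        for name in props:
            index[(name, start)].add(rel_id)
    return index


def lookup(index, rels, name, start, minimum):
    """Fetch one small bucket, then filter it by the stored property value."""
    hits = []
    for rel_id in sorted(index.get((name, start), ())):
        if rels[rel_id][2][name] >= minimum:
            hits.append(rel_id)
    return hits


index = build_index(relationships)
strong = lookup(index, relationships, "similarity", 10, 0.5)
```

Each bucket holds only the relationships of one start node that carry the property, so relationship 2 (no property) is never touched. The float comparison still runs over the whole bucket, which is where the "critical mass" effect would show up again for very dense nodes.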
Re: [Neo4j] Lucene Index on Relationships
You got me again, Rick. I have not (yet) used my idea of ranged relationship types, and still use "buckets", or intermediate nodes (all over the place!). However, I am thinking of using a combination of the two approaches for my "composite" index. I have deviated from the classic binary tree because the total number of nodes created is unnecessarily high for an index (I don't want the index to exceed the original data in size). Making fewer buckets, and compensating with relationship types, leads to a better balance (IMHO).

On Mon, Jun 21, 2010 at 4:35 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:
> FWIW, we achieve this today with Neo directly using the concept of "bucket"
> nodes. Instead of having to create different relationship types for each
> range of values, as Craig has suggested, we achieve a similar result by a
> set of intermediate nodes that all have a relationship to a "bucket
> collection" node, and individual nodes are attached via a common
> relationship type to the appropriate "bucket" based on one or more values
> in the node.
Re: [Neo4j] Lucene Index on Relationships
I think the combination of relationship type + relevant property value(s) is a more appropriate context for an index than "all relationships in the graph".

FWIW, we achieve this today with Neo directly, using the concept of "bucket" nodes. Instead of creating a different relationship type for each range of values, as Craig has suggested, we get a similar result with a set of intermediate nodes that all have a relationship to a "bucket collection" node. Individual nodes are attached via a common relationship type to the appropriate "bucket" based on one or more values in the node.

This gives us a fairly fast way to narrow down the # of nodes, without the need for an external index.

Just a thought.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
Sent: Monday, June 21, 2010 8:36 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene Index on Relationships

Hi,

how do you guys expect indexing for relationships to work? Would it be an index just as for nodes... or per node? I often hear that it'd speed up traversals if a node has many, many neighbours. But if the relationship index would be for the entire graph (not per node) that wouldn't really help, would it?
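A rough in-memory sketch of the "bucket" node layout Rick describes, under stated assumptions: the class `BucketCollection`, its method names, the choice of 10 buckets and the restriction to a single value in [0, 1] are all hypothetical, and a dictionary of sets stands in for the bucket nodes and their relationships.

```python
from collections import defaultdict


class BucketCollection:
    """Stand-in for a "bucket collection" node and its bucket nodes."""

    def __init__(self, bucket_count=10):
        self.bucket_count = bucket_count
        # bucket number -> ids of the nodes attached to that bucket node
        self.buckets = defaultdict(set)

    def _bucket_for(self, value):
        # Clamp so that 1.0 falls into the last bucket instead of a new one.
        return min(int(value * self.bucket_count), self.bucket_count - 1)

    def attach(self, node_id, value):
        """Attach a node to the bucket that covers its value."""
        self.buckets[self._bucket_for(value)].add(node_id)

    def candidates(self, low, high):
        """Collect nodes from every bucket overlapping [low, high]."""
        found = set()
        for number in range(self._bucket_for(low), self._bucket_for(high) + 1):
            found |= self.buckets.get(number, set())
        return found


collection = BucketCollection(bucket_count=10)
collection.attach("a", 0.05)
collection.attach("b", 0.55)
collection.attach("c", 0.95)
middle = collection.candidates(0.5, 0.75)
```

Only the buckets overlapping the requested range are visited, so the candidate set shrinks before any property is read. Nodes in the first and last visited bucket may still fall outside the exact range and would need a final check against the stored value.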
Re: [Neo4j] Lucene Index on Relationships
I am not sure what a per-node relationship index is.

I concur that a relationship index doesn't help if each node has a relationship of the type we are interested in (as in a graph of employees, where each employee would have a Manager relation). However, in a graph where there are lots of nodes and only a few of them have a relationship of the type we are interested in, it seems logical to me that the optimal way to start a query is with the index into the relationships.

-Paul

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
Sent: Monday, June 21, 2010 8:36 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene Index on Relationships

Hi,

how do you guys expect indexing for relationships to work? Would it be an index just as for nodes... or per node? I often hear that it'd speed up traversals if a node has many, many neighbours. But if the relationship index would be for the entire graph (not per node) that wouldn't really help, would it?
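Paul's point about selectivity can be illustrated with a small sketch. Everything here is an assumption made for illustration: the edge tuples, the `MENTORS` type, and the helper names `index_by_type` and `selectivity` are hypothetical, and a dictionary stands in for a graph-wide relationship index.

```python
from collections import defaultdict

# Hypothetical edges as (start node, relationship type, end node).
edges = [
    ("alice", "MANAGER", "carol"),
    ("bob", "MANAGER", "carol"),
    ("carol", "MANAGER", "dave"),
    ("bob", "MENTORS", "alice"),
]
node_count = 4  # alice, bob, carol, dave


def index_by_type(rels):
    """Graph-wide index: relationship type -> list of (start, end) pairs."""
    index = defaultdict(list)
    for start, rel_type, end in rels:
        index[rel_type].append((start, end))
    return index


def selectivity(index, rel_type, total_nodes):
    """Fraction of nodes that start at least one relationship of this type."""
    starts = {start for start, _end in index.get(rel_type, [])}
    return len(starts) / total_nodes


by_type = index_by_type(edges)
common = selectivity(by_type, "MANAGER", node_count)
rare = selectivity(by_type, "MENTORS", node_count)
```

With these toy numbers the `MANAGER` lookup returns most of the graph, so it saves little over scanning nodes, while the `MENTORS` lookup returns a single entry and makes a good starting point for a query.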
Re: [Neo4j] Lucene Index on Relationships
Hi,

how do you guys expect indexing for relationships to work? Would it be an index just as for nodes... or per node? I often hear that it'd speed up traversals if a node has many, many neighbours. But if the relationship index were for the entire graph (not per node), that wouldn't really help, would it?

2010/6/21 Craig Taverner:
> A side comment, since I think indexing relationships with lucene might be
> good, but think there might be alternatives for your current example.

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Lucene Index on Relationships
A side comment, since I think indexing relationships with lucene might be good, but think there might be alternatives for your current example.

You said that the relationship property is a float from 0 to 1, so you cannot use relationship types. But when you consider that any index is usually created by breaking data ranges (continuous or discrete) into fewer, more discrete ranges, you can use a relationship type to represent a range of floats. For example, if you have a roughly even distribution of floats between 0 and 1, try dividing that into 100 parts (0%-100%, or 0.01 to 1.00), and make a relationship type for each. This would certainly facilitate traversing relationships of specific float values (at least improve the performance dramatically, as in an index).

Of course, this example focuses on traversing from a particular document. If you are searching for all relationships in the entire database with particular float values, then a separate index would be better.

On Mon, Jun 21, 2010 at 2:11 PM, Marius Kubatz wrote:
> Hello guys, hello community!
>
> I'm currently evaluating neo4j for my thesis and have a wish :)
> I have already opened a ticket for this
> ( https://trac.neo4j.org/ticket/241 ), but
> I would like to hear what you guys think about it.
>
> Basically it just involves the ability to index Neo4j Relationships with
> the Lucene Index.
>
> Neo4j works great on sparse graphs, but what happens when you have a very
> tight graph with several thousand neighbors to one node?
> Additionally, as soon as you store information on Relationships you will
> get into trouble, because you will have to iterate through all those
> edges to find the properties you seek.
>
> If this sounds far-fetched, please take a look at this example where one
> might need properties on Relationships:
> One "Document" node is related to another Document node by a similarity
> function, whose result is stored in the Relationships between those
> document nodes.
> Let's just say that we save a float in [0, 1] on those relationships,
> which makes it impossible to create RelationshipTypes for every value.
>
> Using the Index to fetch Relationships by their indexed properties would
> greatly speed up the process and increase the attractiveness of using
> properties on Relationships. I would love to have quick access to
> Relationship properties, where I could add and implement fuzzy logic,
> probabilities, Bayesian networks, similarities, ranking ... and so on ...
> As said, thank you for Relationship properties, they are great and
> already there, but what I miss is quick access to them.
>
> Thank you very much and best regards!
>
> Marius
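Craig's ranged relationship types could look roughly like the sketch below. It is an illustration only: the `SIMILAR_nn` naming scheme and the functions `bucket_index`, `similarity_type` and `types_for_range` are hypothetical, and the code merely computes type names without touching any Neo4j API.

```python
BUCKETS = 100  # assumption: 100 equal-width ranges over [0, 1]


def bucket_index(value, buckets=BUCKETS):
    """Return the 0-based range number that a similarity value falls into."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    # 1.0 would land in bucket 100, so clamp it into the last bucket.
    return min(int(value * buckets), buckets - 1)


def similarity_type(value, buckets=BUCKETS):
    """Hypothetical relationship type name for a similarity value."""
    return "SIMILAR_%02d" % bucket_index(value, buckets)


def types_for_range(low, high, buckets=BUCKETS):
    """Type names whose ranges overlap [low, high]; traverse only these."""
    first, last = bucket_index(low, buckets), bucket_index(high, buckets)
    return ["SIMILAR_%02d" % number for number in range(first, last + 1)]
```

A traversal from one document would then follow only the relationship types returned by `types_for_range`, and compare the stored float on the relationships in the first and last range. Values close to a range boundary may land in the neighbouring range because of floating-point rounding, so the exact float should stay on the relationship as a property.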