Re: [Neo4j] Lucene Index on Relationships

2010-06-21 Thread Marius Kubatz
Hi,

thanks a lot for the feedback.

There are a lot of applications where indexed relationships will provide a
speed benefit, but only up to a limit. The advantage comes from properties
that are sparse across many edges (e.g. 100 relationships with a property, as
opposed to 1000 relationships without). It holds up to a "critical mass",
beyond which indexing and iterating make no difference. I have some ideas on
this and will try to write down my workarounds for this problem and post them.

I also have an idea for how to avoid indexing the whole graph. Normally
one would create an index entry with (Relationship, key, value).
Would it make sense to use the property name as the key and the start node ID
as the value? Would this create smaller buckets?
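
A minimal sketch of the two index layouts being compared, using plain Python
dictionaries rather than Lucene or the Neo4j API. The class RelationshipIndex,
the sample relationships and the helper similar_from are all hypothetical and
only illustrate the idea: keying on (property name, start node ID) yields one
small bucket per node, which is then filtered by the actual property value.

```python
from collections import defaultdict


class RelationshipIndex:
    """Toy (key, value) -> relationship ids index; not the Lucene/Neo4j API."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, key, value, rel_id):
        self._entries[(key, value)].add(rel_id)

    def get(self, key, value):
        # Return a copy so callers cannot modify the index by accident.
        return set(self._entries.get((key, value), ()))


# Hypothetical sample data: rel_id -> (start node, end node, properties)
relationships = {
    1: (10, 20, {"similarity": 0.91}),
    2: (10, 21, {}),
    3: (10, 22, {"similarity": 0.40}),
    4: (11, 20, {"similarity": 0.91}),
}

by_value = RelationshipIndex()  # usual layout: (property name, property value)
by_start = RelationshipIndex()  # proposed layout: (property name, start node ID)
for rel_id, (start, _end, props) in relationships.items():
    for name, value in props.items():
        by_value.add(name, value, rel_id)
        by_start.add(name, start, rel_id)


def similar_from(start_node_id, threshold):
    """Relationships leaving one node whose similarity is >= threshold."""
    candidates = by_start.get("similarity", start_node_id)
    return sorted(
        r for r in candidates
        if relationships[r][2]["similarity"] >= threshold
    )
```

In this layout the bucket for a node contains only its relationships that
carry the property, so the remaining iteration is over the sparse subset
rather than over every edge of the node.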

Best regards

Marius

2010/6/21 Craig Taverner 

> A side comment: I think indexing relationships with Lucene might be
> good, but I think there might be alternatives for your current example.
>
> You said that the relationship property is a float from 0 to 1, so you
> cannot use relationship types. But when you consider that any
> index is usually created by breaking data ranges (continuous or discrete)
> into fewer, more discrete ranges, you can use a relationship type to
> represent a range of floats. For example, if you have a roughly even
> distribution of floats between 0 and 1, try dividing that into 100 parts
> (0%-100%, or 0.01 to 1.00), and make a relationship type for each. This
> would certainly facilitate traversing relationships with specific float
> values (or at least improve the performance dramatically, as an index
> would).
>
> Of course, this example focuses on traversing from a particular document.
> If you are searching for all relationships in the entire database with
> particular float values, then a separate index would be better.
>
> On Mon, Jun 21, 2010 at 2:11 PM, Marius Kubatz wrote:
>
> > Hello guys, hello community!
> >
> > I'm currently evaluating neo4j for my thesis and have a wish :)
> > I have already opened a ticket for this
> > ( https://trac.neo4j.org/ticket/241 ), but
> > I would like to hear what you guys think about it.
> >
> > Basically it just involves the ability to index Neo4j Relationships with
> > a Lucene index.
> >
> > Neo4j works great on sparse graphs, but what happens when you have a very
> > dense graph with several thousand neighbors per node?
> > Additionally, as soon as you store information on Relationships you will
> > get into trouble, because you will have to iterate through all those edges
> > to find the properties you seek.
> >
> > If this sounds far-fetched, please take a look at this example where one
> > might need properties on Relationships:
> > One "Document" node is related to another Document node by a similarity
> > function whose value is stored in the Relationships between those document
> > nodes. Let's just say that we save a float in [0, 1] on those
> > relationships, which makes it impossible to create RelationshipTypes for
> > every value.
> >
> > Using an index to fetch Relationships by their indexed properties would
> > greatly speed up the process and increase the attractiveness of using
> > properties on Relationships. I would love to have quick access to
> > Relationship properties where I could add and implement fuzzy logic,
> > probabilities, Bayesian networks, similarities, ranking ... and so on ...
> > As said, thank you for Relationship properties, they are great and
> > already there, but what I miss is quick access to them.
> >
> > Thank you very much and best regards!
> >
> > Marius
> >
> > --
> > "Programs must be written for people to read, and only incidentally for
> > machines to execute."
> >
> > - Abelson & Sussman, SICP, preface to the first edition
> > ___
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >



-- 
"Programs must be written for people to read, and only incidentally for
machines to execute."

- Abelson & Sussman, SICP, preface to the first edition


Re: [Neo4j] Lucene Index on Relationships

2010-06-21 Thread Craig Taverner
You got me again, Rick.

I have not (yet) used my idea of ranged relationship types, and still use
"buckets", or intermediate nodes (all over the place!). However, I am
thinking of using a combination of the two approaches for my "composite"
index. I have deviated from the classic binary tree because the total number
of nodes created is unnecessarily high for an index (don't want the index to
exceed the original data in size). Making fewer buckets, and compensating
using relationship types, leads to a better balance (IMHO).

On Mon, Jun 21, 2010 at 4:35 PM, Rick Bullotta <
rick.bullo...@burningskysoftware.com> wrote:

> I think the combination of relationship type + relevant property value(s)
> is a more appropriate context for an index, as opposed to for "all
> relationships in the graph".
>
> FWIW, we achieve this today with Neo directly using the concept of "bucket"
> nodes.  Instead of having to create different relationship types for each
> range of values, as Craig has suggested, we achieve a similar result by a
> set of intermediate nodes that all have a relationship to a "bucket
> collection" node, and individual nodes are attached via a common
> relationship type to the appropriate "bucket" based on one or more values
> in the node.
>
> This gives us a fairly fast way to narrow down the number of candidate
> nodes, without the need for an external index.
>
> Just a thought.

Re: [Neo4j] Lucene Index on Relationships

2010-06-21 Thread Rick Bullotta
I think the combination of relationship type + relevant property value(s) is
a more appropriate context for an index, as opposed to for "all
relationships in the graph".

FWIW, we achieve this today with Neo directly using the concept of "bucket"
nodes.  Instead of having to create different relationship types for each
range of values, as Craig has suggested, we achieve a similar result by a
set of intermediate nodes that all have a relationship to a "bucket
collection" node, and individual nodes are attached via a common
relationship type to the appropriate "bucket" based on one or more values in
the node.

This gives us a fairly fast way to narrow down the number of candidate nodes,
without the need for an external index.

Just a thought.
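
A minimal sketch of the "bucket" node layout described above, modelled with
plain Python containers instead of real Neo4j nodes and relationships. The
class BucketCollection, its methods and the choice of ten equal-width buckets
over [0, 1] are assumptions made for illustration; the message does not say
how the buckets are actually chosen.

```python
from collections import defaultdict


class BucketCollection:
    """Stands in for the "bucket collection" node.

    Each key of `buckets` plays the role of one intermediate "bucket" node,
    and each list entry plays the role of a node attached to that bucket.
    """

    def __init__(self, count=10):
        self.count = count
        self.buckets = defaultdict(list)  # bucket number -> [(node id, value)]

    def _bucket_of(self, value):
        # Clamp so that 1.0 lands in the last bucket.
        return min(int(value * self.count), self.count - 1)

    def attach(self, node_id, value):
        """Attach a node to the bucket that covers its value in [0, 1]."""
        self.buckets[self._bucket_of(value)].append((node_id, value))

    def find(self, low, high):
        """Visit only the buckets overlapping [low, high], then filter."""
        hits = []
        for number in range(self._bucket_of(low), self._bucket_of(high) + 1):
            for node_id, value in self.buckets.get(number, []):
                if low <= value <= high:
                    hits.append(node_id)
        return hits


collection = BucketCollection()
collection.attach("doc-a", 0.12)
collection.attach("doc-b", 0.55)
collection.attach("doc-c", 0.91)
collection.attach("doc-d", 0.97)
```

A range lookup such as collection.find(0.9, 1.0) touches a single bucket here,
which corresponds to following one relationship from the collection node
instead of scanning every attached node.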




Re: [Neo4j] Lucene Index on Relationships

2010-06-21 Thread Paul A. Jackson
I am not sure what a per-node relationship index is.  I concur that a
relationship index doesn't help if each node has a relationship of the type we 
are interested in (like in a graph of employees, each employee would have a 
Manager relation).  However, in a graph where there are lots of nodes and only 
a few of them have a relationship of the type we are interested in, it seems 
logical to me that the optimal way to start a query is with the index into the 
relationships.

-Paul



Re: [Neo4j] Lucene Index on Relationships

2010-06-21 Thread Mattias Persson
Hi,

how do you guys expect indexing for relationships to work? Would it be
an index just as for nodes... or per node? I often hear that it'd
speed up traversals if a node has many, many neighbours. But if the
relationship index would be for the entire graph (not per node) that
wouldn't really help, would it?

-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Lucene Index on Relationships

2010-06-21 Thread Craig Taverner
A side comment: I think indexing relationships with Lucene might be
good, but I think there might be alternatives for your current example.

You said that the relationship property is a float from 0 to 1, so you
cannot use relationship types. But when you consider that any
index is usually created by breaking data ranges (continuous or discrete)
into fewer, more discrete ranges, you can use a relationship type to
represent a range of floats. For example, if you have a roughly even
distribution of floats between 0 and 1, try dividing that into 100 parts
(0%-100%, or 0.01 to 1.00), and make a relationship type for each. This
would certainly facilitate traversing relationships with specific float values
(or at least improve the performance dramatically, as an index would).

Of course, this example focuses on traversing from a particular document. If
you are searching for all relationships in the entire database with
particular float values, then a separate index would be better.
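
A minimal sketch of the range-to-relationship-type mapping suggested above,
in plain Python with no Neo4j dependency. The function names, the SIMILARITY
prefix and the zero-padded naming scheme are assumptions for illustration;
only the idea of 100 equal ranges over [0, 1] comes from the text.

```python
def bucket_type(value, buckets=100, prefix="SIMILARITY"):
    """Map a float in [0, 1] to a relationship-type name, e.g. SIMILARITY_025."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    # Clamp so that 1.0 falls into the last bucket instead of opening a new one.
    index = min(int(value * buckets), buckets - 1)
    return f"{prefix}_{index:03d}"


def types_for_range(low, high, buckets=100, prefix="SIMILARITY"):
    """Names of every bucket type that may hold values in [low, high]."""
    first = min(int(low * buckets), buckets - 1)
    last = min(int(high * buckets), buckets - 1)
    return [f"{prefix}_{i:03d}" for i in range(first, last + 1)]
```

A traversal would then follow only the types returned by types_for_range and
still compare the stored float on the first and last bucket, since those may
contain values just outside the requested range. Because of floating-point
rounding, a value very close to a boundary can land in the neighbouring
bucket, so the same function should be used for writing and for reading.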
