Re: [Neo] Indexing Relationships?

2010-05-15 Thread Tobias Ivarsson
There is no indexing component for Relationships and there has never been
one.
The interesting question that you should have asked is: _will_ there ever be
one.

The answer to that question is: maybe, it has been prototyped as part of a
simplification of the entire indexing API.

The interesting thing to me would be to get a concrete use case for this.
I've heard requests for being able to index relationships a number of times,
but never a concrete use case for being able to do so. It's always been
vague hand-waving, as in this case: "we have data that is heavily centered
on the relationships rather than nodes". WHAT is that data? WHY does it need
to be centered around the relationships? If you say that you have use cases
like these, I believe that you do; I have no reason to think that you are
lying. But I want to understand those use cases, and I want
to understand them in a setting where having support for indexing
relationships adds value to the business.

I would like it if we were able to index Relationships as part of the core
API by version 1.2, and having an actual use case for when it would improve
the implementation of an actual domain would certainly help speed up the
process; perhaps we could even sneak it into version 1.1.

Cheers,
Tobias

On Fri, May 14, 2010 at 5:05 PM, Alex D'Amour wrote:

> Hi all,
>
> I am working on an application that stores large network data from multiple
> domains in Neo4j databases. The object is to allow users to upload network
> datasets and then expose them to researchers over the web, allowing
> researchers to subset the data and eventually download their own subgraph
> of
> the original dataset.
>
> Many of the operations that we intend to support are covered by the Lucene
> and Traversal frameworks. However, we'd also like to perform relationship
> lookups in the same way that we perform node lookups since many networks
> have data that are heavily centered on the Relationships rather than nodes.
> Is there or has there ever been an indexing component for Relationships in
> Neo4j? If not, how difficult would it be to port the LuceneIndexService to
> index relationships as well as nodes (i.e. how much of the code is specific
> to Nodes rather than PropertyContainers)?
>
> I realize that this probably isn't the ideal way to interact with the graph
> and that better domain modeling would probably solve this if the framework
> didn't have to be generic. But in this case we'd like to support this type
> of interaction with simple graph structures with only one type of node and
> only one type of relationship since they are the structures that social
> network researchers are the most interested in.
>
> Thanks,
> Alex
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Tobias Ivarsson 
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Indexing Relationships?

2010-05-15 Thread Patrick Durusau
Tobias,

On 5/15/2010 7:32 AM, Tobias Ivarsson wrote:
> There is no indexing component for Relationships and there has never been
> one.
> The interesting question that you should have asked is: _will_ there ever be
> one.
>
> The answer to that question is: maybe, it has been prototyped as part of a
> simplification of the entire indexing API.
>
> The interesting thing to me would be to get a concrete use case for this.
> I've heard requests for being able to index relationships a number of times,
> but never a concrete use case for being able to do so. It's always been
> vague hand waving like in this case "we have data that is heavily centered
> on the relationships rather than nodes", WHAT is that data? WHY does it need
> to be centered around the relationships? If you say that you have use cases
> like these I believe that you do, I have no reason to believe that you are
> lying, why would you. But I want to understand those use cases, and I want
> to understand them in a setting where having support for indexing
> relationships adds value to the business.
>
>
I have never tried to formulate a specific use case for indexing 
relationships, but your question prompted me to do some searching on the 
issue.

Devanand Rajoo Ravindran - KeyConcept: Exploiting Hierarchical 
Relationships for Conceptually Indexed Data (thesis, 
http://www.ittc.ku.edu/research/thesis/documents/devanand__ravindran_thesis.pdf).
Exploits the hierarchical relationships for pruning and retrieval.

Xiao Renguo, et al. - An Indexing Structure for Aggregation 
Relationships in OODB 
(http://www.springerlink.com/content/5mj5k9mgdntjvdxp/). Features of 
aggregation relationships are discussed. (I am not logged in, so all I can see 
is the abstract.)

Hsinchun Chen, et al. - Semantic Indexing and searching using a Hopfield 
net (http://ai.arizona.edu/intranet/papers/SemanitcIndexing.pdf). 
Generated *10,000,000 relationships.*

The point being that words/terms occur in *relationship* to each other, 
authors, documents, domains, etc.

Without context (read relationships) express or implied, there is no 
semantic.

The ability to explore relationships, which are the basis for any 
semantic, would be enhanced by the ability to index relationships. Yes?

Hope you are having a great weekend!

Patrick


-- 
Patrick Durusau
patr...@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)



Re: [Neo] Indexing Relationships?

2010-05-15 Thread Kim Soldal


I am also interested in this functionality, and would like to contribute a 
possible use case.
We have just wrapped up a project using Neo4j to describe a Transportation 
Network Graph (http://code.google.com/p/gotogate/). In this graph each node is 
a stop for some transport, and there is one relation between two nodes for 
each transportation line that travels between them. The interesting part is 
that if this graph were translated to a map overlay showing a single route, 
for example at the user's request, we would currently have to traverse 
the entire graph until we find a relation with a name that matches the 
requested transport name. This would be less than optimal in a large 
transportation network, such as one for an entire country. Obviously, indexing 
all relations in such a graph would be costly, but I agree with 
Alex D'Amour that the functionality is useful: traversing the graph to find 
one specific transportation line would be far more costly. I would like to see 
this functionality as optional on the index service, since it would slow down 
implementations that do not need relation indexing.
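
To make the trade-off concrete, here is a minimal sketch of the lookup described above, written in plain Java rather than against any Neo4j API. The names Leg and LegIndex are hypothetical and exist only for illustration; the assumption is that every relation carries the name of its transportation line and that the index is updated whenever a relation is created.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a relation between two stops; not a Neo4j type.
final class Leg {
    final String fromStop;
    final String toStop;
    final String lineName;

    Leg(String fromStop, String toStop, String lineName) {
        this.fromStop = fromStop;
        this.toStop = toStop;
        this.lineName = lineName;
    }
}

// Hypothetical in-memory index from a line name to the legs that belong to it.
final class LegIndex {
    private final Map<String, List<Leg>> byLine = new HashMap<String, List<Leg>>();

    // Called whenever a leg is created, so the index stays in step with the graph.
    // This is the extra write cost that implementations without the index avoid.
    void add(Leg leg) {
        List<Leg> legs = byLine.get(leg.lineName);
        if (legs == null) {
            legs = new ArrayList<Leg>();
            byLine.put(leg.lineName, legs);
        }
        legs.add(leg);
    }

    // Direct lookup: the work depends on the size of the result, not of the graph.
    List<Leg> legsOf(String lineName) {
        List<Leg> legs = byLine.get(lineName);
        if (legs == null) {
            return Collections.<Leg>emptyList();
        }
        return Collections.unmodifiableList(legs);
    }
}
```

Without such an index, finding the legs of one line means visiting every relation in the network and comparing names, which is the full traversal the message wants to avoid.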
 
Cheers
Kim



[Neo] Implementing new persistence source

2010-05-15 Thread Jawad Stouli


Hi everyone, 

I would be very interested in getting more information
that would help me implement new persistence sources. I have read (here:
http://www.mail-archive.com/user@lists.neo4j.org/msg6.html) that it
should not be that difficult (or, at least, that it is possible), but I still
have some difficulty navigating through the sources to understand
exactly how it should be done. 

Besides, I have read that using MySQL was
less efficient than Nioneo. Was the difference really significant? 

Best,


Jawad


[Neo] Fwd: Node not in use exception when using tx event handler

2010-05-15 Thread Garrett Smith
Is this something I should open a ticket for, or is it something the
dev team is aware of? Or is it user error?

Garrett


-- Forwarded message --
From: Garrett Smith 
Date: Thu, May 13, 2010 at 2:52 PM
Subject: Node not in use exception when using tx event handler
To: Neo4j Users 


I'm running into the exception below when I try to delete a node when
first starting up a graph database.

I'm experimenting with a transaction event handler. The error,
however, occurs before my handler gets called.

org.neo4j.kernel.impl.nioneo.store.InvalidRecordException: Node[10] not in use
       at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.nodeGetProperties(WriteTransaction.java:1009)
       at org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaConnection$NodeEventConsumerImpl.getProperties(NeoStoreXaConnection.java:228)
       at org.neo4j.kernel.impl.nioneo.xa.NioNeoDbPersistenceSource$NioNeoDbResourceConnection.nodeLoadProperties(NioNeoDbPersistenceSource.java:432)
       at org.neo4j.kernel.impl.persistence.PersistenceManager.loadNodeProperties(PersistenceManager.java:100)
       at org.neo4j.kernel.impl.core.NodeManager.loadProperties(NodeManager.java:628)
       at org.neo4j.kernel.impl.core.NodeImpl.loadProperties(NodeImpl.java:84)
       at org.neo4j.kernel.impl.core.Primitive.ensureFullLightProperties(Primitive.java:591)
       at org.neo4j.kernel.impl.core.Primitive.getAllCommittedProperties(Primitive.java:604)
       at org.neo4j.kernel.impl.core.LockReleaser.populateNodeRelEvent(LockReleaser.java:855)
       at org.neo4j.kernel.impl.core.LockReleaser.getTransactionData(LockReleaser.java:740)
       at org.neo4j.kernel.impl.core.NodeManager.getTransactionData(NodeManager.java:914)
       at org.neo4j.kernel.impl.core.TransactionEventsSyncHook.beforeCompletion(TransactionEventsSyncHook.java:39)
       at org.neo4j.kernel.impl.transaction.TransactionImpl.doBeforeCompletion(TransactionImpl.java:341)
       at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:556)
       at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:103)
       at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:410)
       at gv.graph.Nodes.deleteNode(Nodes.java:349)
       at gv.graph.NodeDelete.handle(NodeDelete.java:20)
       at gv.graph.MessageHandler.run(MessageHandler.java:59)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:619)
May 13, 2010 2:42:56 PM org.neo4j.kernel.impl.transaction.TransactionImpl doBeforeCompletion
WARNING: Caught exception from tx syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6] beforeCompletion()
May 13, 2010 2:42:56 PM org.neo4j.kernel.impl.transaction.TransactionImpl doAfterCompletion
WARNING: Caught exception from tx syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6] afterCompletion()

Code details:

URL: https://svn.neo4j.org/components/kernel/trunk
Repository Root: https://svn.neo4j.org
Repository UUID: 0b971d98-bb2f-0410-8247-b05b2b5feb2a
Revision: 4415


Re: [Neo] Fwd: Node not in use exception when using tx event handler

2010-05-15 Thread Tobias Ivarsson
Create a ticket for it. I've tagged it for review when I get back to the
office; you had the great misfortune to send this right at the beginning of a
4 day Swedish holiday.

If you could supply code that can reproduce it, that would be even better.

Cheers,
Tobias




-- 
Tobias Ivarsson 
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857


Re: [Neo] Indexing Relationships?

2010-05-15 Thread Niels Hoogeveen

I use relationships to encode paths in the graph based on the meta model.
For example:
Class(Article) --> Relationship(Author) --> Class(User) --> Property(Username)
Right now I encode this using an MD5 hash of the above path: I add a property 
to the first entity in the path, using the MD5 hash as the key (the value 
is irrelevant), and relationships (with a DynamicRelationshipType whose name 
equals the MD5 key) are used to link the various items in the path.
Finding the path requires a traversal from the first Class node in the path, 
following the given relationships. This traversal can potentially be expensive 
when a class has many instances (all of which have a relationship to the class). 
If relationships were indexed, the path could be encoded by giving each 
relationship making up the path a property encoding the path, then using the 
index to retrieve all relationships making up the path and laying those 
relationships head to tail to construct the path. A traversal would no longer be 
necessary, and the cost of the operation would depend only on the number of 
elements in the path, not on the number of relationships that any of the 
elements in the path can potentially have.
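
The head-to-tail step can be sketched as follows. This is plain Java with hypothetical names (Link, PathAssembler), not Neo4j code. It assumes the index lookup returns the relationships of exactly one simple, non-cyclic path in arbitrary order, and that each relationship exposes its start and end element.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a relationship returned by an index lookup.
final class Link {
    final String start;
    final String end;

    Link(String start, String end) {
        this.start = start;
        this.end = end;
    }
}

final class PathAssembler {
    // Orders the links of one simple path head to tail.
    // The work is proportional to the number of links, not to the node degrees.
    static List<Link> headToTail(List<Link> links) {
        Map<String, Link> byStart = new HashMap<String, Link>();
        Set<String> ends = new HashSet<String>();
        for (Link link : links) {
            byStart.put(link.start, link);
            ends.add(link.end);
        }
        // The head of the path is the one start element that no link ends at.
        String current = null;
        for (Link link : links) {
            if (!ends.contains(link.start)) {
                current = link.start;
                break;
            }
        }
        List<Link> ordered = new ArrayList<Link>();
        while (current != null && byStart.containsKey(current)) {
            Link next = byStart.remove(current);
            ordered.add(next);
            current = next.end;
        }
        // Cycles, branches, or disconnected pieces leave some links unused.
        if (ordered.size() != links.size()) {
            throw new IllegalArgumentException("links do not form a single path");
        }
        return ordered;
    }
}
```

For the example path, the three links Article to Author, Author to User and User to Username come back in that order regardless of the order in which the lookup returned them.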
Niels



[Neo] Metamodel DataRange class

2010-05-15 Thread Niels Hoogeveen

The class DataRange in the meta model component is currently a subclass of 
RdfDataTypeRange, which in my opinion is not optimal. DataRange is used to 
enumerate the values a certain PropertyType can have, and in that sense it can 
be seen as a DatatypeClassRange with further restrictions. 

DatatypeClassRange has some dependencies on RDF, but only in its 
rdfLiteralToJavaObject and javaObjectToRdfLiteral methods, neither of which 
has to be used. DataRange, on the other hand, has dependencies on RDF in 
the internalLoad and internalStore methods, whose use is not optional.

As a result, it is possible to pass the class java.lang.String to the 
DatatypeClassRange constructor and do the appropriate cast from Object to 
String in user code. The same is not possible with DataRange, whose 
constructor takes a String as its first argument, which needs to correspond to 
one of the predefined RDF types. So instead of giving the argument 
"java.lang.String" and doing the proper cast in user code, the argument needs 
to be "http://www.w3.org/2001/XMLSchema#string". 

This dependency on RDF is far from ideal. I would like to be able to say that a 
DataRange can have any type of class, and do the proper casting/transformation 
in user code. With DatatypeClassRange I can do that. It is possible to use any 
class for DatatypeClassRange and do serialization to and from a property value 
in user code (after all, any serializable class can be written into a byte 
array or into a String). 

My suggestion is to make DataRange a subclass of DatatypeClassRange, changing 
the first constructor argument from String to Class, and to check that all 
Objects passed as the second constructor argument conform to that Class. 
Of course I am willing to make this change, but I'd like to have feedback 
before doing so.
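
As a rough illustration of the proposed constructor check only, here is a standalone sketch. TypedDataRange is a hypothetical class written for this example; it is not the DataRange of the meta model component, it does not extend DatatypeClassRange, and it leaves out loading and storing entirely.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the actual DataRange from the meta model component.
final class TypedDataRange<T> {
    private final Class<T> type;
    private final List<T> values;

    // First argument is a Class instead of an RDF type URI given as a String.
    TypedDataRange(Class<T> type, Collection<?> candidates) {
        this.type = type;
        List<T> checked = new ArrayList<T>();
        for (Object candidate : candidates) {
            // Reject any value (including null) that does not conform to the class.
            if (!type.isInstance(candidate)) {
                throw new IllegalArgumentException(
                        candidate + " is not an instance of " + type.getName());
            }
            checked.add(type.cast(candidate));
        }
        this.values = Collections.unmodifiableList(checked);
    }

    Class<T> getType() {
        return type;
    }

    List<T> getValues() {
        return values;
    }

    boolean contains(Object value) {
        return values.contains(value);
    }
}
```

With this shape, a range of strings is declared with String.class, and any conversion to or from a stored property value stays in user code.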
Niels Hoogeveen   


[Neo] Traversal Speed is just 1 millisecond per node

2010-05-15 Thread suryadev vasudev
We are considering Neo4j for a decision-making application. The application
is analogous to a library holding 15 million books. We have BOOKS, PUBLISHERS
and STUDENTS as nodes. Every book has a PUBLISHED_BY relationship to
one publisher. STUDENTS may borrow a book, reserve a book or return a
borrowed book. Each of these is a relationship type (BORROWED_BY, RESERVED_BY
and RETURNS) between BOOKS and STUDENTS.
When we traverse starting from a publisher, the traversal speed is 200-1000
nodes per millisecond. This is pure traversal to get a book count by
publisher.
Neo is failing us when we make a slightly more complex query:
- Starting with a publisher, retrieve all books that are currently lent out.
- Starting with a publisher, retrieve all books that were borrowed between May
  1 2010 and May 10 2010.
The response time we got was 1-2 milliseconds per book.
Before running the test, we created between 0 and 3 relationships for each book.
We have seeded 15,000 students, 1,000 publishers and 15 million books.
The server is a machine with 8 GB of RAM.
Why is this traversal so drastically slower?
Regards
SDev


Re: [Neo] Traversal Speed is just 1 millisecond per node

2010-05-15 Thread Craig Taverner
My 2 cents, without knowing the structure of your data (which is needed to
really answer the question).

I assume when you say 'slightly complex query' you are probably using a
custom traverser that looks at properties of nodes and/or relationships to
make the decision, or possibly even follows a relationship to make the
decision. All of these options will slow things down. Your original
traverser probably only considered relationship types and directions,
loading from only the relationships table. The new one hits the properties
tables, possibly for both nodes and relationships.

If this is the case, the improvement is much the same as what you would do in a
relational database, which is to index the data. However, indexing is
different in a graph, and I think the best way to do that in your case is to
build additional graph structures that allow the new traverser to look only
at relationships. For example, you say that you are interested in books from
a particular publisher that are currently lent out. Consider having the
publisher not have direct relationships to its books (a publisher index), but
instead have relationships to 'borrowed' and 'not borrowed' nodes, which in
turn are related to the books (effectively a combined
publisher-borrowing_status index). When a book is borrowed, move its
relationship. Since any given book is borrowed only occasionally, over long
periods (days or weeks), this database edit has a negligible performance cost,
but makes the query you are looking for very
fast. To add a time period to this situation, consider the TimeLineIndex.
Alternatively, extend the previous concept to have nodes representing books
borrowed on certain days, for example.
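
A minimal in-memory sketch of that structure, in plain Java rather than Neo4j code: the hypothetical class PublisherShelf plays the role of one publisher node, and its two sets stand in for the 'borrowed' and 'not borrowed' intermediate nodes. The class and method names are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one publisher with two status buckets. In the graph,
// each bucket would be an intermediate node between the publisher and its books.
final class PublisherShelf {
    private final Set<String> borrowed = new HashSet<String>();
    private final Set<String> notBorrowed = new HashSet<String>();

    // New books start out in the 'not borrowed' bucket.
    void addBook(String bookId) {
        notBorrowed.add(bookId);
    }

    // Borrowing moves the book between buckets: one small edit per loan.
    void borrow(String bookId) {
        if (!notBorrowed.remove(bookId)) {
            throw new IllegalStateException(bookId + " is not available");
        }
        borrowed.add(bookId);
    }

    // Returning a book moves it back.
    void giveBack(String bookId) {
        if (!borrowed.remove(bookId)) {
            throw new IllegalStateException(bookId + " is not lent out");
        }
        notBorrowed.add(bookId);
    }

    // The query touches only the 'borrowed' bucket and reads no properties.
    Set<String> currentlyLentOut() {
        return Collections.unmodifiableSet(borrowed);
    }
}
```

The query for books currently lent out then visits only the books that are actually lent out, instead of checking every book of the publisher.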

The real solution depends on your data and the kinds of queries
you plan to make. You probably already made the publisher-book relationships
because you planned to make a query like that. The more complex the queries you
wish to make, the more complex the structure you will probably devise. Neo4j is
great in that you can keep optimizing by adding appropriate structure
without removing previous capabilities.



Re: [Neo] Traversal Speed is just 1 millisecond per node

2010-05-15 Thread Marko Rodriguez
Hi,

Adding onto Craig's thoughts, I'd like to point you to some related work in 
this area:

1. Modeling a library as a graph.
    - slides: http://www.slideshare.net/slidarko/a-practical-ontology-for-the-largescale-modeling-of-scholarly-artifacts-and-their-usage-3879791
    - article: http://arxiv.org/abs/0708.1150

2. Doing 'slightly complex queries' as graph traversal over graph databases 
such as Neo4j:
    - software framework: http://pipes.tinkerpop.com
    - pipes give you fine-grained control over your walker with good speed: http://bit.ly/aa29MO
    - related article: http://arxiv.org/abs/0806.2274
    - related article: http://arxiv.org/abs/1004.1001

Take care,
Marko.

http://tinkerpop.com
http://markorodriguez.com

On May 15, 2010, at 4:05 PM, Craig Taverner wrote:

> My 2 cents, without knowing the structure of your data (which is needed to
> really answer the question).
> 
> I assume when you say 'slightly complex query' you are probably using a
> custom traverser that looks at properties of nodes and/or relationships to
> make the decision, or possibly even follows a relationship to make the
> decision. All of these options will slow things down. Your original
> traverser probably only considered relationship types and directions,
> loading from only the relationships table. The new one hits the properties
> tables, possibly for both nodes and relationships.
> 
> If this is the case, the improvement is much the same as you would do in a
> relational database, which is to index the data. However, indexing is
> different in a graph, and I think the best way to do that in your case is to
> build additional graph structures that allow the new traverser to only look
> at relationships. For example, you say that you are interested in books from
> a particular publisher that are currently lent out. Consider having the
> publisher not have direct relationships to their books (a publisher index),
> but instead have relationships to 'borrowed' and 'not borrowed' nodes, with
> those related to the books (effectively a combined
> publisher-borrowing_status index). When a book is borrowed, move its
> relationship. Since borrowing a book occurs only occasionally, over long
> periods (days or weeks), this database edit has negligible performance cost,
> but it makes the query you are looking for very fast. To add a time period to
> this situation, consider the TimeLineIndex.
> Alternatively extend the previous concept to have nodes representing books
> borrowed on certain days, for example.
> 
> The real solution is really dependent on your data and the kinds of queries
> you plan to make. You probably already made the publisher-book relationships
> because you planned to make a query like that. The more complex the queries
> you wish to make, the more complex the structure you will probably devise.
> Neo4j is
> great in that you can keep optimizing by adding appropriate structure
> without removing previous capabilities.
> 
> On Sat, May 15, 2010 at 11:34 PM, suryadev vasudev <
> suryadev.vasu...@gmail.com> wrote:
> 
>> We are considering Neo4j for a decision-making application. The application
>> is analogous to a library with 15 million books. We have BOOKS, PUBLISHERS
>> and STUDENTS as nodes. Every book has a PUBLISHED_BY relationship to one
>> publisher. STUDENTS may borrow a book, reserve a book or return a borrowed
>> book. Each is a relationship type (BORROWED_BY, RESERVED_BY and RETURNS)
>> between BOOKS and STUDENTS.
>> When we traverse starting from a publisher, the traversal speed is 200-1000
>> nodes per millisecond. This is a pure traversal to get a book count by
>> publisher.
>> Neo is failing us when we make a slightly more complex query.
>> Starting with a publisher, retrieve all books that are currently lent out.
>> Starting with a publisher, retrieve all books that were borrowed between
>> May 1 2010 and May 10 2010.
>> The response time we got was 1-2 milliseconds per book.
>> Before running the test, we created between 0 and 3 relationships for each
>> book.
>> We have seeded 15,000 students, 1,000 publishers and 15 million books.
>> The server is an 8 GB RAM machine.
>> I wonder why the traversal is so drastically slow?
>> Regards
>> SDev


Re: [Neo] Traversal Speed is just 1 millisecond per node

2010-05-15 Thread rick . bullotta
Don't forget that there is a DRAMATIC difference between "warm" benchmark
results and "cold" results. If you can run a few extensive queries first to
pre-load the nodes/relationships, the results should be much better.
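
To make the warm/cold distinction concrete, here is a minimal Python sketch.
It does not use Neo4j at all: ToyStore, load_node and the disk_reads counter
are hypothetical stand-ins for a store with a cache in front of slow storage.
The point is only that the first pass pays for every load, while a repeat
pass over the same nodes is served from memory, so a benchmark should report
the two separately.

```python
class ToyStore:
    """Hypothetical stand-in for a graph store with an in-memory cache."""

    def __init__(self, node_count):
        self.node_count = node_count
        self.cache = {}        # node id -> loaded record
        self.disk_reads = 0    # how many loads missed the cache

    def load_node(self, node_id):
        # A cache miss models a slow read from disk; a hit is served from memory.
        if node_id not in self.cache:
            self.disk_reads += 1
            self.cache[node_id] = {"id": node_id}
        return self.cache[node_id]

    def traverse_all(self):
        """Touch every node once and report how many reads missed the cache."""
        before = self.disk_reads
        for node_id in range(self.node_count):
            self.load_node(node_id)
        return self.disk_reads - before


store = ToyStore(node_count=1000)
cold_misses = store.traverse_all()   # first pass: every node comes from "disk"
warm_misses = store.traverse_all()   # second pass: everything is already cached
print(cold_misses, warm_misses)
```

In a real test the same idea applies: run the query once or twice untimed,
then time the later runs.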



Also, it would be useful to look at your code, as I suspect there is
something in there that is causing the three-orders-of-magnitude reduction
in performance.


Re: [Neo] Traversal Speed is just 1 millisecond per node

2010-05-15 Thread rick . bullotta
Also, can you describe how you are using properties in this scenario? What
types of properties, approximate size of the data, etc...





Re: [Neo] Traversal Speed is just 1 millisecond per node

2010-05-15 Thread suryadev vasudev
Here is a rough design, with volumes:

NODE:PUBLISHER (1000 publishers)
published_id
Publisher_name
Publisher address
Publisher City
Publisher State
Publisher Country
Publisher Primary Email
Publisher URL

NODE:STUDENT (15,000 students)
Student_id
Student_first_name
Student_last_name
Student_registration_date
Student_course_completion_date
Student_Email_id

NODE:BOOK (15 million books)
Book_id
Book_ISBN
Book_name
Book_Primary_Author
Book_Secondary_Author
Book_Published_year
Book_subject

RELATIONSHIP: PUBLISHED_BY
Purchase_date
Purchase_approved_by
Purchase_contract_number

RELATIONSHIP: BORROWED_BY
borrowed_date
due_date

RELATIONSHIP: RETURNED_BY
borrowed_date
due_date
returned_date
due_amount_paid

RELATIONSHIP: RESERVED_BY
reservation_date

The BORROWED_BY relationship is maintained for an active borrowing. This
relationship is deleted and a RETURNED_BY relationship is created when the
book is returned. So there can be at most one BORROWED_BY relationship for
any one book. Of course, there can be more than one RETURNED_BY for a book.
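
A minimal Python sketch of this lifecycle, using plain in-memory objects
rather than Neo4j. The Book class and its method names are hypothetical; they
only illustrate the rule that a book has at most one active BORROWED_BY but
any number of RETURNED_BY records.

```python
from datetime import date


class Book:
    """Hypothetical in-memory model of one BOOK node and its lending relationships."""

    def __init__(self, book_id):
        self.book_id = book_id
        self.borrowed_by = None   # at most one active BORROWED_BY
        self.returned_by = []     # any number of RETURNED_BY records

    def borrow(self, student_id, borrowed_date, due_date):
        if self.borrowed_by is not None:
            raise ValueError("book is already lent out")
        self.borrowed_by = {
            "student_id": student_id,
            "borrowed_date": borrowed_date,
            "due_date": due_date,
        }

    def give_back(self, returned_date, due_amount_paid=0.0):
        if self.borrowed_by is None:
            raise ValueError("book is not lent out")
        # Delete the BORROWED_BY relationship and create a RETURNED_BY one.
        record = dict(self.borrowed_by)
        record["returned_date"] = returned_date
        record["due_amount_paid"] = due_amount_paid
        self.returned_by.append(record)
        self.borrowed_by = None


book = Book(book_id=1)
book.borrow("s1", date(2010, 5, 1), date(2010, 5, 15))
book.give_back(date(2010, 5, 10))
book.borrow("s2", date(2010, 5, 11), date(2010, 5, 25))
```

Note that answering "was this book borrowed between two dates" in this model
means reading the date properties of every record, which is the kind of
property access the pure count traversal never had to do.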

Many students can reserve the same book at any time. All of them will get an
email when the book is returned.

The application is expected to provide dashboard services and analytical
reports:

Student dashboard:
All books borrowed, returned and reserved by a student for a date range
Book dashboard:
Lending history of a book for a given date range
Publisher dashboard:
All books for a particular publisher, lending history
Librarian dashboard:
Lending activities for a given date range (by publisher, by hour of day etc)
How many books were not in the library for a given day


Coming from a strong RDBMS background, I had instructed my team to stick to
nodes and their natural relationships. Creating an artificial relationship
such as CURRENTLY_BORROWED between publisher and book had not occurred to us.
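
For comparison, the restructuring suggested earlier in the thread (a
publisher pointing at 'borrowed' and 'not borrowed' nodes, which in turn
point at the books) can be sketched in plain Python. This is not Neo4j code:
PublisherIndex and its methods are hypothetical names, and the two sets stand
in for the two intermediate nodes. The query for currently lent-out books
then reads only one bucket and never inspects a book's properties.

```python
class PublisherIndex:
    """Hypothetical stand-in for a publisher with 'borrowed' / 'not borrowed' nodes."""

    def __init__(self, book_ids):
        self.not_borrowed = set(book_ids)  # books under the 'not borrowed' node
        self.borrowed = set()              # books under the 'borrowed' node

    def mark_borrowed(self, book_id):
        # Moving the relationship: remove from one bucket, add to the other.
        self.not_borrowed.remove(book_id)
        self.borrowed.add(book_id)

    def mark_returned(self, book_id):
        self.borrowed.remove(book_id)
        self.not_borrowed.add(book_id)

    def currently_lent_out(self):
        # Only the 'borrowed' bucket is read.
        return sorted(self.borrowed)


index = PublisherIndex(book_ids=range(1, 6))
index.mark_borrowed(2)
index.mark_borrowed(4)
index.mark_returned(2)
lent_out = index.currently_lent_out()
print(lent_out)
```

The cost is one extra update whenever a book is borrowed or returned, in
exchange for a query that touches only the lent-out books.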

When I first read about traversal speeds of 1000-3000 nodes/millisecond, I
added some buffer and assumed 500/millisecond as a realistic speed. I am not
giving up so easily after seeing 1/millisecond. I look forward to responses
from other users.

The real challenges will be around queries for a publisher. A publisher will
have around 15,000 books, and a query like "Given a publisher ID, what
percentage of their books were never borrowed" will need a full pass over
those books. My hope was that I could browse through them and get the answer
in 30 milliseconds. But at 1 book per millisecond it looks like it will take
a minimum of 15 seconds.

Some publishers will have 50,000 books and I can't imagine a response time
of 50 seconds.

So, I have to achieve at least 500/millisecond if not the original 1000.
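
The arithmetic behind these figures, as a small Python sketch. The function
name is hypothetical; the book counts and rates are the ones quoted in this
message.

```python
def response_time_ms(books, books_per_ms):
    """Time to visit `books` books at a given traversal rate, in milliseconds."""
    return books / books_per_ms


observed = response_time_ms(15_000, 1)    # observed rate: about 1 book per ms
hoped = response_time_ms(15_000, 500)     # assumed realistic rate: 500 per ms
large = response_time_ms(50_000, 1)       # a publisher with 50,000 books

print(observed / 1000, "s")   # 15.0 s
print(hoped, "ms")            # 30.0 ms
print(large / 1000, "s")      # 50.0 s
```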

Regards
SDev

On Sat, May 15, 2010 at 4:59 PM, wrote:

>   Also, can you describe how you are using properties in this
>   scenario?  What types of properties, approximate size of the data,
>   etc...