Re: [Neo] Traversal Speed is just 1 millisecond per node
Here is a rough design and volume.

NODE:PUBLISHER (1000 publishers)
  published_id
  Publisher_name
  Publisher address
  Publisher City
  Publisher State
  Publisher Country
  Publisher Primary Email
  Publisher URL

NODE:STUDENT (15,000 students)
  Student_id
  Student_first_name
  Student_last_name
  Student_registration_date
  Student_course_completion_date
  Student_Email_id

NODE:BOOK (15 million books)
  Book_id
  Book_ISBN
  Book_name
  Book_Primary_Author
  Book_Secondary_Author
  Book_Published_year
  Book_subject

RELATIONSHIP: PUBLISHED_BY
  Purchase_date
  Purchase_approved_by
  Purchase_contract_number

RELATIONSHIP: BORROWED_BY
  borrowed_date
  due_date

RELATIONSHIP: RETURNED_BY
  borrowed_date
  due_date
  returned_date
  due_amount_paid

RELATIONSHIP: RESERVED_BY
  reservation_date

The BORROWED_BY relationship is maintained for an active borrowing. It is deleted and a RETURNED_BY relationship is created when the book is returned, so there can be at most one BORROWED_BY relationship for any one book. Of course, there can be more than one RETURNED_BY for a book. Many students can reserve a book at any time, and all of them get an email when the book is returned.

The application is expected to provide dashboard services and analytical reports:

Student dashboard: All books borrowed, returned and reserved by a student for a date range
Book dashboard: Lending history of a book for a given date range
Publisher dashboard: All books for a particular publisher, lending history
Librarian dashboard: Lending activities for a given date range (by publisher, by hour of day etc); how many books were not in the library on a given day

Coming from a strong RDBMS background, I had instructed my team to stick to nodes and their natural relationships. Creating an artificial relationship such as CURRENTLY_BORROWED between publisher and book was not on our minds. When I first read about traversal speeds of 1000-3000/millisecond, I added some buffer and assumed 500/millisecond as a realistic speed.
I am not giving up so easily after seeing 1/millisecond. I look forward to responses from other users.

The real challenges will be around queries for a publisher. A publisher will have around 15,000 books, and a query like "Given a publisher ID, what percentage of its books were never borrowed" will need a full traversal of those books. My hope was that I could browse through them and get the answer in 30 milliseconds, but at 1/millisecond it looks like it will take a minimum of 15 seconds. Some publishers will have 50,000 books, and I can't imagine a response time of 50 seconds. So I have to achieve at least 500/millisecond, if not the original 1000.

Regards
SDev

On Sat, May 15, 2010 at 4:59 PM, wrote:
> Also, can you describe how you are using properties in this
> scenario? What types of properties, approximate size of the data,
> etc...

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
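The response-time estimates in the message above follow from dividing the number of books by the traversal rate. A minimal sketch of that arithmetic, using only the book counts and rates quoted in the message (the function name `scan_time_ms` is made up for illustration):

```python
# Back-of-the-envelope response times for visiting every book of one publisher.
# Book counts and traversal rates are the figures quoted in the message above.

def scan_time_ms(books: int, nodes_per_ms: float) -> float:
    """Milliseconds needed to visit `books` nodes at `nodes_per_ms` nodes/ms."""
    return books / nodes_per_ms


if __name__ == "__main__":
    for books in (15_000, 50_000):
        for rate in (1, 500, 1000):
            ms = scan_time_ms(books, rate)
            print(f"{books:>6} books at {rate:>4}/ms: {ms:>8.0f} ms ({ms / 1000:.2f} s)")
```

At 1 node/ms the 15,000-book publisher takes 15 seconds and the 50,000-book publisher 50 seconds; at 500 nodes/ms the same scans take 30 ms and 100 ms.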
Re: [Neo] Traversal Speed is just 1 millisecond per node
Also, can you describe how you are using properties in this scenario? What types of properties, approximate size of the data, etc...
Re: [Neo] Traversal Speed is just 1 millisecond per node
Don't forget that there is a DRAMATIC difference between "warm" benchmark results and "cold" results. If you can do a few extensive queries to pre-load the nodes/relationships, the results should be much better.

Also, it would be useful to look at your code, as I suspect there is something in there that is causing the three-orders-of-magnitude reduction in performance.
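To make the warm-versus-cold point concrete, here is a small sketch of how one might separate the two measurements. It does not use Neo4j at all: `SimulatedStore` is a hypothetical stand-in whose first access to a record counts as a "cold" load, and the `lent_out` property is invented for the example.

```python
import time


class SimulatedStore:
    """Toy stand-in for a graph store: the first access to a record is 'cold'."""

    def __init__(self, records, cold_delay_s=0.0):
        self._disk = dict(records)
        self._cache = {}
        self.cold_loads = 0
        self._cold_delay_s = cold_delay_s  # hypothetical per-record disk cost

    def get(self, key):
        if key not in self._cache:
            self.cold_loads += 1
            if self._cold_delay_s:
                time.sleep(self._cold_delay_s)
            self._cache[key] = self._disk[key]
        return self._cache[key]


def count_lent_out(store, book_ids):
    """Count books whose (made-up) 'lent_out' property is true."""
    return sum(1 for book_id in book_ids if store.get(book_id)["lent_out"])


def timed(fn, *args):
    """Return (result, elapsed milliseconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    ids = list(range(1000))
    store = SimulatedStore({i: {"lent_out": i % 3 == 0} for i in ids},
                           cold_delay_s=0.0001)
    _, cold_ms = timed(count_lent_out, store, ids)   # first run pays the load cost
    _, warm_ms = timed(count_lent_out, store, ids)   # second run hits the cache
    print(f"cold: {cold_ms:.1f} ms, warm: {warm_ms:.1f} ms")
```

The idea is simply to run the query once to load everything, then report the second timing separately from the first.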
Re: [Neo] Traversal Speed is just 1 millisecond per node
Hi,

Adding onto Craig's thoughts, I'd like to point you to some related work in this area:

1. Modeling a library as a graph.
   - slides: http://www.slideshare.net/slidarko/a-practical-ontology-for-the-largescale-modeling-of-scholarly-artifacts-and-their-usage-3879791
   - article: http://arxiv.org/abs/0708.1150

2. Doing 'slightly complex queries' as graph traversal over graph databases such as Neo4j:
   - software framework: http://pipes.tinkerpop.com
   - pipes give you fine-grained control over your walker with good speed: http://bit.ly/aa29MO
   - related article: http://arxiv.org/abs/0806.2274
   - related article: http://arxiv.org/abs/1004.1001

Take care,
Marko.

http://tinkerpop.com
http://markorodriguez.com
Re: [Neo] Traversal Speed is just 1 millisecond per node
My 2 cents, without knowing the structure of your data (which is needed to really answer the question).

I assume when you say 'slightly complex query' you are probably using a custom traverser that looks at properties of nodes and/or relationships to make the decision, or possibly even follows a relationship to make the decision. All of these options will slow things down. Your original traverser probably only considered relationship types and directions, loading from only the relationships table. The new one hits the properties tables, possibly for both nodes and relationships.

If this is the case, the improvement is much the same as you would do in a relational database, which is to index the data. However, indexing is different in a graph, and I think the best way to do that in your case is to build additional graph structures that allow the new traverser to only look at relationships.

For example, you say that you are interested in books from a particular publisher that are currently lent out. Consider having the publisher not have direct relationships to their books (a publisher index), but instead have relationships to 'borrowed' and 'not borrowed' nodes, with those nodes related to the books (effectively a combined publisher-borrowing_status index). When a book is borrowed, move its relationship. Since borrowing a book occurs only occasionally over very long times (days or weeks), this database edit has negligible performance cost, but makes the query you are looking for very fast.

To add a time period to this situation, consider the TimeLineIndex. Alternatively, extend the previous concept to have nodes representing books borrowed on certain days, for example.

The real solution depends on your data and the kinds of queries you plan to make. You probably already made the publisher-book relationships because you planned to make a query like that. The more complex the queries you wish to make, the more complex the structure you will probably devise. Neo4j is great in that you can keep optimizing by adding appropriate structure without removing previous capabilities.
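The combined publisher-borrowing_status index described above can be sketched in a few lines. This is a plain in-memory illustration, not Neo4j code: the class `LibraryGraph`, its method names and the status labels are all hypothetical, and each `(publisher_id, status)` key stands in for one of the intermediate 'borrowed' / 'not borrowed' nodes.

```python
from collections import defaultdict


class LibraryGraph:
    """Toy graph where books hang off per-publisher status nodes."""

    def __init__(self):
        # (publisher_id, status) -> set of book ids; each key plays the role
        # of one 'borrowed' or 'not_borrowed' node under a publisher.
        self._status_nodes = defaultdict(set)
        self._publisher_of = {}

    def add_book(self, book_id, publisher_id):
        self._publisher_of[book_id] = publisher_id
        self._status_nodes[(publisher_id, "not_borrowed")].add(book_id)

    def _move(self, book_id, src, dst):
        publisher_id = self._publisher_of[book_id]
        # Raises KeyError if the book is not currently under `src`.
        self._status_nodes[(publisher_id, src)].remove(book_id)
        self._status_nodes[(publisher_id, dst)].add(book_id)

    def borrow(self, book_id):
        self._move(book_id, "not_borrowed", "borrowed")

    def give_back(self, book_id):
        self._move(book_id, "borrowed", "not_borrowed")

    def currently_lent_out(self, publisher_id):
        # Touches only the books under the 'borrowed' node; no per-book
        # property or relationship check is needed.
        return set(self._status_nodes[(publisher_id, "borrowed")])


if __name__ == "__main__":
    g = LibraryGraph()
    for book in (1, 2, 3):
        g.add_book(book, "p1")
    g.borrow(2)
    print(g.currently_lent_out("p1"))
```

The cost of the "currently lent out" query then depends on how many books are out, not on how many books the publisher has, at the price of one extra relationship move per borrow and return.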
[Neo] Traversal Speed is just 1 millisecond per node
We are considering Neo4J for a decision-making application. The application is analogous to a library having 15 million books. We have BOOKS, PUBLISHERS and STUDENTS as nodes. Every book will have a PUBLISHED_BY relationship to one publisher. STUDENTS may borrow a book, reserve a book or return a borrowed book. Each is a relationship type, meaning BORROWED_BY, RESERVED_BY and RETURNS between BOOKS and STUDENTS.

When we traverse starting from a publisher, the traversal speed is 200-1000 nodes per millisecond. This is pure traversal to get a book count by publisher.

Neo is failing us when we make a slightly complex query:

Starting with a publisher, retrieve all books that are currently lent out.
Starting with a publisher, retrieve all books that were borrowed between May 1 2010 and May 10 2010.

The response time we got was 1-2 milliseconds per book.

Before running the test, we created between 0 and 3 relationships for each book. We have seeded 15,000 students, 1000 publishers and 15 million books. The server is an 8GB RAM machine.

I wonder why the traversal is so drastically slow?

Regards
SDev
[Neo] Metamodel DataRange class
The class DataRange in the meta model component is at this moment a subclass of RdfDataTypeRange, which in my opinion is not optimal.

DataRange is used to enumerate the values a certain PropertyType can have, and in that sense can be seen as a DatatypeClassRange with further restrictions.

DatatypeClassRange has some dependencies on RDF, but only in its rdfLiteralToJavaObject and javaObjectToRdfLiteral methods, neither of which has to be used. DataRange, on the other hand, has dependencies on RDF in the internalLoad and internalStore methods, whose use is not optional.

As a result, it is possible to give the DatatypeClassRange constructor the class java.lang.String as argument and do the appropriate cast of Object to String in user code. The same is not possible with DataRange, whose constructor takes a String as first argument, and that String needs to correspond with one of some predefined RDF types. So instead of giving the argument "java.lang.String" and doing the proper cast in user code, the argument needs to be "http://www.w3.org/2001/XMLSchema#string".

This dependency on RDF is far from ideal. I would like to be able to say that a DataRange can have any type of class, and do the proper casting/transformation in user code. With DatatypeClassRange I can do that. It is possible to use any class for DatatypeClassRange and do serialization to and from a property value in user code (after all, any serializable class can be written into a byte array or into a String).

My suggestion is to make DataRange a subclass of DatatypeClassRange, changing the first constructor argument from String into Class, and to check that all Objects passed as the second constructor argument conform to that Class.

Of course I am willing to make this change, but I'd like to have feedback before doing so.

Niels Hoogeveen
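The proposed class relationship can be illustrated with a short sketch. This is a Python analogue written only for this thread, not the meta model component's actual Java code: the constructor signatures and the `is_valid` method are assumptions, and only the class names DatatypeClassRange and DataRange come from the message above.

```python
class DatatypeClassRange:
    """Range restricted to instances of one class (hypothetical analogue)."""

    def __init__(self, range_class):
        self.range_class = range_class

    def is_valid(self, value):
        return isinstance(value, self.range_class)


class DataRange(DatatypeClassRange):
    """Enumerated range: a DatatypeClassRange further restricted to listed values."""

    def __init__(self, range_class, *values):
        super().__init__(range_class)
        # The check suggested in the message: every enumerated value must
        # conform to the class given as the first argument.
        for value in values:
            if not isinstance(value, range_class):
                raise TypeError(
                    f"{value!r} is not an instance of {range_class.__name__}"
                )
        self.values = tuple(values)

    def is_valid(self, value):
        return super().is_valid(value) and value in self.values


if __name__ == "__main__":
    colours = DataRange(str, "red", "green")
    print(colours.is_valid("red"), colours.is_valid("blue"))
```

With this shape the range is described by a class rather than by an RDF type URI, and any conversion to or from a stored property value stays in user code.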
Re: [Neo] Indexing Relationships?
I use relationships to encode paths in the graph based on the meta model. For example:

Class(Article) --> Relationship(Author) --> Class(User) --> Property(Username)

Right now I encode this using an md5 encoding of the above path. I add a property to the first entity in the path, using the md5 encoding as the key (the value is irrelevant), and relationships (with a DynamicRelationshipType whose name equals the md5 key) are used to link the various items in the path.

Finding the path requires a traversal from the first Class node in the path, following the given relationships. This traversal can potentially be expensive when a class has many instances (all have a relationship to the class).

If relationships were indexed, the path could be encoded by giving each relationship making up the path a property encoding the path, then using the index to retrieve all relationships making up the path and laying those relationships head to toe to construct the path. A traversal would no longer be necessary, and the cost of the operation would depend only on the number of elements in the path, not on the number of relationships one of the elements in the path can potentially have.

Niels

> From: tobias.ivars...@neotechnology.com
> Date: Sat, 15 May 2010 13:32:36 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo] Indexing Relationships?
>
> There is no indexing component for Relationships and there has never been
> one.
> The interesting question that you should have asked is: _will_ there ever be
> one.
>
> The answer to that question is: maybe, it has been prototyped as part of a
> simplification of the entire indexing API.
>
> The interesting thing to me would be to get a concrete use case for this.
> I've heard requests for being able to index relationships a number of times,
> but never a concrete use case for being able to do so. It's always been
> vague hand waving like in this case "we have data that is heavily centered
> on the relationships rather than nodes", WHAT is that data? WHY does it need
> to be centered around the relationships? If you say that you have use cases
> like these I believe that you do, I have no reason to believe that you are
> lying, why would you. But I want to understand those use cases, and I want
> to understand them in a setting where having support for indexing
> relationships adds value to the business.
>
> I would like it if we were able to index Relationships as part of the core
> API by version 1.2, and having an actual use case for when it would improve
> the implementation of an actual domain would certainly help speed up the
> process, perhaps we could even sneak it into version 1.1.
>
> Cheers,
> Tobias
>
> On Fri, May 14, 2010 at 5:05 PM, Alex D'Amour wrote:
>
> > Hi all,
> >
> > I am working on an application that stores large network data from multiple
> > domains in Neo4j databases. The object is to allow users to upload network
> > datasets and then expose them to researchers over the web, allowing
> > researchers to subset the data and eventually download their own subgraph
> > of the original dataset.
> >
> > Many of the operations that we intend to support are covered by the Lucene
> > and Traversal frameworks. However, we'd also like to perform relationship
> > lookups in the same way that we perform node lookups since many networks
> > have data that are heavily centered on the Relationships rather than nodes.
> > Is there or has there ever been an indexing component for Relationships in
> > Neo4j? If not, how difficult would it be to port the LuceneIndexService to
> > index relationships as well as nodes (i.e. how much of the code is specific
> > to Nodes rather than PropertyContainers)?
> >
> > I realize that this probably isn't the ideal way to interact with the graph
> > and that better domain modeling would probably solve this if the framework
> > didn't have to be generic. But in this case we'd like to support this type
> > of interaction with simple graph structures with only one type of node and
> > only one type of relationship since they are the structures that social
> > network researchers are the most interested in.
> >
> > Thanks,
> > Alex
>
> --
> Tobias Ivarsson
> Hacker, Neo Technology
> www.neotechnology.com
> Cellphone: +46 706 534857
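The index-based path lookup Niels describes can be sketched without Neo4j. Everything below is a hypothetical in-memory illustration: `path_key`, `RelationshipIndex` and `assemble_path` are names invented here, and the separator used when hashing the path is an assumption, since the message does not say how the path string is built.

```python
import hashlib
from collections import defaultdict


def path_key(elements):
    """md5 hex digest of a meta-model path, used as the index key."""
    return hashlib.md5(" --> ".join(elements).encode("utf-8")).hexdigest()


class RelationshipIndex:
    """Toy relationship index: key -> list of (start, end) relationships."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def index(self, key, start, end):
        self._by_key[key].append((start, end))

    def get(self, key):
        return list(self._by_key.get(key, []))


def assemble_path(relationships, first):
    """Lay the relationships head to toe, starting from `first`."""
    next_by_start = {start: end for start, end in relationships}
    path = [first]
    # The length guard stops the loop if the relationships form a cycle.
    while path[-1] in next_by_start and len(path) <= len(relationships):
        path.append(next_by_start[path[-1]])
    return path


if __name__ == "__main__":
    elements = ["Class(Article)", "Relationship(Author)",
                "Class(User)", "Property(Username)"]
    key = path_key(elements)
    idx = RelationshipIndex()
    # Index order does not matter; the path is rebuilt from start/end pairs.
    idx.index(key, elements[2], elements[3])
    idx.index(key, elements[0], elements[1])
    idx.index(key, elements[1], elements[2])
    print(assemble_path(idx.get(key), elements[0]))
```

The lookup touches only the relationships stored under the key, so its cost grows with the length of the path rather than with the number of instances attached to a class.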
Re: [Neo] Fwd: Node not in use exception when using tx event handler
Create a ticket for it. I've tagged it for review when I get back to the office; you had the great misfortune to send this right at the beginning of a 4-day Swedish holiday. If you could supply code that can reproduce it, that would be even better.

Cheers,
Tobias

On Sat, May 15, 2010 at 8:42 PM, Garrett Smith wrote:
> Is this something I should open a ticket for, or is it something the
> dev team is aware of? Or is it user error?
>
> Garrett

--
Tobias Ivarsson
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
[Neo] Fwd: Node not in use exception when using tx event handler
Is this something I should open a ticket for, or is it something the dev team is aware of? Or is it user error?

Garrett

-- Forwarded message --
From: Garrett Smith
Date: Thu, May 13, 2010 at 2:52 PM
Subject: Node not in use exception when using tx event handler
To: Neo4j Users

I'm running into the exception below when I try to delete a node when first starting up a graph database.

I'm experimenting with a transaction event handler. The error, however, occurs before my handler gets called.

org.neo4j.kernel.impl.nioneo.store.InvalidRecordException: Node[10] not in use
    at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.nodeGetProperties(WriteTransaction.java:1009)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaConnection$NodeEventConsumerImpl.getProperties(NeoStoreXaConnection.java:228)
    at org.neo4j.kernel.impl.nioneo.xa.NioNeoDbPersistenceSource$NioNeoDbResourceConnection.nodeLoadProperties(NioNeoDbPersistenceSource.java:432)
    at org.neo4j.kernel.impl.persistence.PersistenceManager.loadNodeProperties(PersistenceManager.java:100)
    at org.neo4j.kernel.impl.core.NodeManager.loadProperties(NodeManager.java:628)
    at org.neo4j.kernel.impl.core.NodeImpl.loadProperties(NodeImpl.java:84)
    at org.neo4j.kernel.impl.core.Primitive.ensureFullLightProperties(Primitive.java:591)
    at org.neo4j.kernel.impl.core.Primitive.getAllCommittedProperties(Primitive.java:604)
    at org.neo4j.kernel.impl.core.LockReleaser.populateNodeRelEvent(LockReleaser.java:855)
    at org.neo4j.kernel.impl.core.LockReleaser.getTransactionData(LockReleaser.java:740)
    at org.neo4j.kernel.impl.core.NodeManager.getTransactionData(NodeManager.java:914)
    at org.neo4j.kernel.impl.core.TransactionEventsSyncHook.beforeCompletion(TransactionEventsSyncHook.java:39)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.doBeforeCompletion(TransactionImpl.java:341)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:556)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:103)
    at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:410)
    at gv.graph.Nodes.deleteNode(Nodes.java:349)
    at gv.graph.NodeDelete.handle(NodeDelete.java:20)
    at gv.graph.MessageHandler.run(MessageHandler.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
May 13, 2010 2:42:56 PM org.neo4j.kernel.impl.transaction.TransactionImpl doBeforeCompletion
WARNING: Caught exception from tx syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6] beforeCompletion()
May 13, 2010 2:42:56 PM org.neo4j.kernel.impl.transaction.TransactionImpl doAfterCompletion
WARNING: Caught exception from tx syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6] afterCompletion()

Code details:

URL: https://svn.neo4j.org/components/kernel/trunk
Repository Root: https://svn.neo4j.org
Repository UUID: 0b971d98-bb2f-0410-8247-b05b2b5feb2a
Revision: 4415
[Neo] Implementing new persistence source
Hi everyone,

I would be very interested in getting more information that would help me implement new persistence sources. I have read (here: http://www.mail-archive.com/user@lists.neo4j.org/msg6.html) that it should not be that difficult (or, at least, that it is possible), but I still have some difficulty navigating through the sources to understand exactly how it should be done.

Besides, I have read that using MySQL was less efficient than Nioneo. Was the difference really significant?

Best,
Jawad
Re: [Neo] Indexing Relationships?
I am also interested in this functionality and would like to contribute a possible use case.

We have just wrapped up a project using Neo4j to describe a Transportation Network Graph (http://code.google.com/p/gotogate/). In this graph each node is a stop for some transport, and there is one relationship between two nodes for each transport line that travels between them.

The interesting part is this: if the graph were to be translated to a map overlay for a single route, presented to the user on request for example, we would currently have to traverse the entire graph until we find a relationship with a name that matches the requested transport name. This would be less than optimal in a large transportation network, such as one for an entire country.

Obviously, indexing all relationships in such a graph would be costly, but I agree with Alex D'Amour that the functionality is useful. Traversing the graph to find one specific transportation line would be far more costly.

I would like to see this functionality as optional in the index service, since it would slow down implementations that do not need relationship indexing.

Cheers
Kim

> From: tobias.ivars...@neotechnology.com
> Date: Sat, 15 May 2010 13:32:36 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo] Indexing Relationships?
>
> There is no indexing component for Relationships and there has never been one. The interesting question that you should have asked is: _will_ there ever be one?
>
> The answer to that question is: maybe. It has been prototyped as part of a simplification of the entire indexing API.
>
> The interesting thing to me would be to get a concrete use case for this. I've heard requests for being able to index relationships a number of times, but never a concrete use case for being able to do so. It's always been vague hand-waving, like in this case: "we have data that is heavily centered on the relationships rather than nodes". WHAT is that data? WHY does it need to be centered around the relationships? If you say that you have use cases like these, I believe that you do; I have no reason to believe that you are lying, why would you? But I want to understand those use cases, and I want to understand them in a setting where having support for indexing relationships adds value to the business.
>
> I would like it if we were able to index Relationships as part of the core API by version 1.2, and having an actual use case for when it would improve the implementation of an actual domain would certainly help speed up the process. Perhaps we could even sneak it into version 1.1.
>
> Cheers,
> Tobias
>
> On Fri, May 14, 2010 at 5:05 PM, Alex D'Amour wrote:
>
> > Hi all,
> >
> > I am working on an application that stores large network data from multiple domains in Neo4j databases. The object is to allow users to upload network datasets and then expose them to researchers over the web, allowing researchers to subset the data and eventually download their own subgraph of the original dataset.
> >
> > Many of the operations that we intend to support are covered by the Lucene and Traversal frameworks. However, we'd also like to perform relationship lookups in the same way that we perform node lookups, since many networks have data that are heavily centered on the Relationships rather than nodes. Is there or has there ever been an indexing component for Relationships in Neo4j? If not, how difficult would it be to port the LuceneIndexService to index relationships as well as nodes (i.e. how much of the code is specific to Nodes rather than PropertyContainers)?
> >
> > I realize that this probably isn't the ideal way to interact with the graph and that better domain modeling would probably solve this if the framework didn't have to be generic. But in this case we'd like to support this type of interaction with simple graph structures with only one type of node and only one type of relationship, since they are the structures that social network researchers are the most interested in.
> >
> > Thanks,
> > Alex
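
Editor's sketch: the transport-line lookup Kim describes can be illustrated with a toy in-memory model. This is not the Neo4j API or the gotogate code; the `Leg` type, the stop and line names, and all function names are hypothetical and exist only to contrast scanning every relationship with looking one up through an index keyed on the line name.

```python
from collections import defaultdict
from typing import NamedTuple


class Leg(NamedTuple):
    """One relationship: a transport line running between two stops."""
    start: str
    end: str
    line: str


# Hypothetical toy network; stop and line names are made up for illustration.
LEGS = [
    Leg("Central", "Harbour", "Bus 20"),
    Leg("Harbour", "Airport", "Bus 20"),
    Leg("Central", "University", "Tram 1"),
    Leg("University", "Airport", "Tram 1"),
    Leg("Central", "Harbour", "Ferry A"),
]


def legs_by_scan(legs, line):
    """Without an index: inspect every relationship in the graph."""
    return [leg for leg in legs if leg.line == line]


def build_line_index(legs):
    """Index relationships by line name; the cost is paid once, up front."""
    index = defaultdict(list)
    for leg in legs:
        index[leg.line].append(leg)
    return index


def legs_by_index(index, line):
    """With an index: go straight to the matching relationships."""
    return index.get(line, [])


line_index = build_line_index(LEGS)
route = legs_by_index(line_index, "Tram 1")
assert route == legs_by_scan(LEGS, "Tram 1")
print([(leg.start, leg.end) for leg in route])
```

The scan touches every relationship for each request, while the index touches only the legs of the requested line. The trade-off Kim mentions shows up in `build_line_index`: every relationship has to be written to the index, which is why making it optional seems reasonable.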
Re: [Neo] Indexing Relationships?
Tobias,

On 5/15/2010 7:32 AM, Tobias Ivarsson wrote:
> There is no indexing component for Relationships and there has never been one. The interesting question that you should have asked is: _will_ there ever be one?
>
> The answer to that question is: maybe. It has been prototyped as part of a simplification of the entire indexing API.
>
> The interesting thing to me would be to get a concrete use case for this. I've heard requests for being able to index relationships a number of times, but never a concrete use case for being able to do so. It's always been vague hand-waving, like in this case: "we have data that is heavily centered on the relationships rather than nodes". WHAT is that data? WHY does it need to be centered around the relationships? If you say that you have use cases like these, I believe that you do; I have no reason to believe that you are lying, why would you? But I want to understand those use cases, and I want to understand them in a setting where having support for indexing relationships adds value to the business.

I have never tried to formulate a specific use case for indexing relationships, but your question prompted me to do some searching on the issue.

Devanand Rajoo Ravindran - KeyConcept: Exploiting Hierarchical Relationships for Conceptually Indexed Data (thesis, http://www.ittc.ku.edu/research/thesis/documents/devanand__ravindran_thesis.pdf)
Exploits the hierarchical relationships for pruning and retrieval.

Xiao Renguo, et al. - An Indexing Structure for Aggregation Relationships in OODB (http://www.springerlink.com/content/5mj5k9mgdntjvdxp/)
Features of aggregation relationships discussed. (I am not logged in, so all I can see is the abstract.)

Hsinchun Chen, et al. - Semantic Indexing and searching using a Hopfield net (http://ai.arizona.edu/intranet/papers/SemanitcIndexing.pdf)
Generated *10,000,000 relationships.*

The point being that words/terms occur in *relationship* to each other, authors, documents, domains, etc. Without context (read: relationships), express or implied, there is no semantic. The ability to explore relationships, which are the basis for any semantic, would be enhanced by the ability to index relationships. Yes?

Hope you are having a great weekend!

Patrick

--
Patrick Durusau
patr...@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
Re: [Neo] Indexing Relationships?
There is no indexing component for Relationships and there has never been one. The interesting question that you should have asked is: _will_ there ever be one?

The answer to that question is: maybe. It has been prototyped as part of a simplification of the entire indexing API.

The interesting thing to me would be to get a concrete use case for this. I've heard requests for being able to index relationships a number of times, but never a concrete use case for being able to do so. It's always been vague hand-waving, like in this case: "we have data that is heavily centered on the relationships rather than nodes". WHAT is that data? WHY does it need to be centered around the relationships? If you say that you have use cases like these, I believe that you do; I have no reason to believe that you are lying, why would you? But I want to understand those use cases, and I want to understand them in a setting where having support for indexing relationships adds value to the business.

I would like it if we were able to index Relationships as part of the core API by version 1.2, and having an actual use case for when it would improve the implementation of an actual domain would certainly help speed up the process. Perhaps we could even sneak it into version 1.1.

Cheers,
Tobias

On Fri, May 14, 2010 at 5:05 PM, Alex D'Amour wrote:

> Hi all,
>
> I am working on an application that stores large network data from multiple domains in Neo4j databases. The object is to allow users to upload network datasets and then expose them to researchers over the web, allowing researchers to subset the data and eventually download their own subgraph of the original dataset.
>
> Many of the operations that we intend to support are covered by the Lucene and Traversal frameworks. However, we'd also like to perform relationship lookups in the same way that we perform node lookups, since many networks have data that are heavily centered on the Relationships rather than nodes. Is there or has there ever been an indexing component for Relationships in Neo4j? If not, how difficult would it be to port the LuceneIndexService to index relationships as well as nodes (i.e. how much of the code is specific to Nodes rather than PropertyContainers)?
>
> I realize that this probably isn't the ideal way to interact with the graph and that better domain modeling would probably solve this if the framework didn't have to be generic. But in this case we'd like to support this type of interaction with simple graph structures with only one type of node and only one type of relationship, since they are the structures that social network researchers are the most interested in.
>
> Thanks,
> Alex

--
Tobias Ivarsson
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
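
Editor's sketch: Alex's question about how much of an index is specific to Nodes rather than PropertyContainers can be illustrated with a small in-memory model. This is not the LuceneIndexService and makes no claim about its internals; the classes and the `PropertyIndex` name below are hypothetical. The only assumption carried over from the thread is that nodes and relationships share a property-container supertype, so an index that only reads properties can accept either.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class PropertyContainer:
    """Anything that carries key/value properties."""
    properties: dict = field(default_factory=dict)


@dataclass(eq=False)
class Node(PropertyContainer):
    pass


@dataclass(eq=False)
class Relationship(PropertyContainer):
    start: Optional[Node] = None
    end: Optional[Node] = None


class PropertyIndex:
    """Hypothetical exact-match index over any PropertyContainer."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)

    def index(self, entity: PropertyContainer, key: str) -> None:
        # Only the properties are read, so nodes and relationships
        # are handled by the same code path.
        self._entries[(key, entity.properties[key])].append(entity)

    def get(self, key: str, value: Any) -> list:
        return list(self._entries.get((key, value), []))


alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})
knows = Relationship({"since": 2009}, start=alice, end=bob)

idx = PropertyIndex()
idx.index(alice, "name")
idx.index(knows, "since")

assert idx.get("name", "Alice") == [alice]
assert idx.get("since", 2009) == [knows]
```

In this toy model nothing in the index depends on the entity being a node. Whether the same holds for a real index service depends on how it stores entity ids and resolves hits back to entities, which the sketch does not address.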