Re: [Neo] Garbage nodes
All very good points! And I have learnt never to underestimate others' experiences. However, I think that in some applications garbage collection might be useful, especially apps with very interconnected graphs. A traverser would do the job, but what if there were a hook in Neo that could be called whenever a connection/node is created/deleted? Then, at the end of the transaction, we could check whether a node has no more incoming connections and is not a root node. If so, the node can be safely deleted, and this would cascade such that all unreachable nodes are deleted. Since a hook can be implemented outside of the Neo core, this would be an optional feature.

Just a thought.

-atle

On Wed, 2009-11-11 at 12:43 -0500, Todd Stavish wrote:

Based on my experience with object databases, I can't recommend a persistent garbage collector (PGC) implementation. Just as not all parts of your object model need to be expressed in your graph, persistence doesn't need garbage collectors. Based on what I've seen implemented by customers, GCs are meant for dynamic memory allocation only.

One consequence, we found, is that the PGC often delays the application. To avoid this, you end up avoiding or delaying the PGC for a long time, which means there is no automatic GC, so ultimately the application turns a supposedly automatic operation into a manual process. The delay also ends up bloating the database, i.e. the impact of not deleting the nodes is that you use more storage space: when you create new nodes you end up allocating more because the old ones are still around. Not sure about Neo4j, but this also ended up fragmenting the database files. So for us, we found that we were setting ourselves up for bloat and fragmentation, and for poor performance because you are going to have to access more data.

It was also kind of fragile, because if you make a mistake and don't root something, you may end up having the PGC accidentally delete something that you wanted. We had a tragically sad experience where a customer was complaining about the PGC taking a long time; the PGC was deleting more objects than they intended, and good data was lost (the performance was fine).

Please take this as a data point on the decision. It's possible that PGCs could be implemented in a way that doesn't experience these problems, and end users could be guided to avoid pitfalls, or PGCs might work for a certain class of applications and not others.

-Todd

On Fri, Nov 6, 2009 at 12:47 PM, Johan Svensson jo...@neotechnology.com wrote:

I'll jump in on this one then. Node or relationship IDs will only be reused if they are deleted. There is no possibility that any IDs change without deleting.

Regards,
-Johan

On Fri, Nov 6, 2009 at 3:11 PM, Andrea Puddu kiedi...@gmail.com wrote:

Hi Peter, I'm sorry if I jump in the discussion. I have a question about garbage collection. Are node IDs reused only if some node is deleted? I mean, is there any possibility that some IDs change without deleting any nodes? And what about relationship IDs? Thank you in advance.

Andrea

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
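
The hook idea at the top of this message (at the end of the transaction, check whether a node has lost all incoming connections and is not a root, then delete it and cascade) might look roughly like the sketch below. It is Python on a tiny in-memory graph, not Neo4j code: the `Graph` class and its `connect`, `disconnect`, and `commit` methods are made-up names for illustration, and nothing in this thread says Neo exposes such a hook.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory directed graph; a stand-in model, not the Neo4j API."""

    def __init__(self, roots):
        self.roots = set(roots)
        self.nodes = set(roots)
        self.out_edges = defaultdict(set)  # node -> nodes it points to
        self.in_degree = defaultdict(int)  # node -> count of incoming connections
        self.touched = set()               # nodes that lost a connection in this transaction

    def connect(self, src, dst):
        self.nodes.update((src, dst))
        if dst not in self.out_edges[src]:
            self.out_edges[src].add(dst)
            self.in_degree[dst] += 1

    def disconnect(self, src, dst):
        if dst in self.out_edges[src]:
            self.out_edges[src].remove(dst)
            self.in_degree[dst] -= 1
            self.touched.add(dst)  # hypothetical hook: remember for the commit-time check

    def commit(self):
        """End-of-transaction check: delete orphaned non-root nodes, cascading."""
        pending = list(self.touched)
        self.touched.clear()
        deleted = []
        while pending:
            node = pending.pop()
            if node not in self.nodes or node in self.roots or self.in_degree[node] > 0:
                continue
            # Deleting the node removes its outgoing connections, which may orphan its children.
            for child in self.out_edges.pop(node, set()):
                self.in_degree[child] -= 1
                pending.append(child)
            self.nodes.discard(node)
            self.in_degree.pop(node, None)
            deleted.append(node)
        return deleted


g = Graph(roots={"root"})
g.connect("root", "a")
g.connect("a", "b")
g.connect("a", "c")
g.connect("root", "c")
g.disconnect("root", "a")
deleted = g.commit()  # "a" and "b" go; "c" is still held by "root"
```

One caveat with counting incoming connections: two nodes that point at each other keep each other's count above zero, so a cycle cut off from the root would survive this check.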
Re: [Neo] Garbage nodes
Based on my experience with object databases, I can't recommend a persistent garbage collector (PGC) implementation. Just as not all parts of your object model need to be expressed in your graph, persistence doesn't need garbage collectors. Based on what I've seen implemented by customers, GCs are meant for dynamic memory allocation only.

One consequence, we found, is that the PGC often delays the application. To avoid this, you end up avoiding or delaying the PGC for a long time, which means there is no automatic GC, so ultimately the application turns a supposedly automatic operation into a manual process. The delay also ends up bloating the database, i.e. the impact of not deleting the nodes is that you use more storage space: when you create new nodes you end up allocating more because the old ones are still around. Not sure about Neo4j, but this also ended up fragmenting the database files. So for us, we found that we were setting ourselves up for bloat and fragmentation, and for poor performance because you are going to have to access more data.

It was also kind of fragile, because if you make a mistake and don't root something, you may end up having the PGC accidentally delete something that you wanted. We had a tragically sad experience where a customer was complaining about the PGC taking a long time; the PGC was deleting more objects than they intended, and good data was lost (the performance was fine).

Please take this as a data point on the decision. It's possible that PGCs could be implemented in a way that doesn't experience these problems, and end users could be guided to avoid pitfalls, or PGCs might work for a certain class of applications and not others.

-Todd

On Fri, Nov 6, 2009 at 12:47 PM, Johan Svensson jo...@neotechnology.com wrote:

I'll jump in on this one then. Node or relationship IDs will only be reused if they are deleted. There is no possibility that any IDs change without deleting.

Regards,
-Johan

On Fri, Nov 6, 2009 at 3:11 PM, Andrea Puddu kiedi...@gmail.com wrote:

Hi Peter, I'm sorry if I jump in the discussion. I have a question about garbage collection. Are node IDs reused only if some node is deleted? I mean, is there any possibility that some IDs change without deleting any nodes? And what about relationship IDs? Thank you in advance.

Andrea
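
To picture Johan's answer (an ID only becomes available again once its node or relationship is deleted, and live IDs never change), here is a small Python sketch of an allocator with a free list. It is only an illustration: `IdAllocator` is a made-up class, and the choice to hand out the most recently freed ID first is an assumption, since the thread does not say which freed ID Neo4j picks.

```python
class IdAllocator:
    """Hypothetical ID allocator with a free list; not Neo4j's actual implementation."""

    def __init__(self):
        self.next_id = 0   # next never-used ID
        self.free = []     # IDs released by deletes, available for reuse
        self.live = set()  # IDs currently in use

    def create(self):
        # Prefer a freed ID (assumed policy: most recently freed first), else a fresh one.
        if self.free:
            node_id = self.free.pop()
        else:
            node_id = self.next_id
            self.next_id += 1
        self.live.add(node_id)
        return node_id

    def delete(self, node_id):
        self.live.remove(node_id)
        self.free.append(node_id)


ids = IdAllocator()
a, b, c = ids.create(), ids.create(), ids.create()
ids.delete(a)
d = ids.create()  # reuses the ID freed by deleting a; b and c keep theirs
```

In this model a newer node (`d`) ends up with a smaller ID than an older one (`c`), and that can only happen after a delete.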
Re: [Neo] Garbage nodes
I'd vote for a garbage collector too. I often have code that deletes large trees of nodes, and it takes time, but I'd rather disconnect the tree, load my new data and then delete the old data later (during idle time). Obviously this can be done at application level, but it would be a nice convenience to have a sweeper that does this for us. Even one called manually, but preferably a garbage collector.

I guess one issue with this is that, for performance reasons, you have two competing requirements:

* Wanting data loaded and deleted in larger chunks (to reduce disk fragmentation)
* Having garbage collection run inconspicuously so it does not affect app performance

(I think these two conflict)

On Fri, Nov 6, 2009 at 3:21 PM, Atle Prange atle.pra...@gmail.com wrote:

Thanks for the quick reply

On Fri, 2009-11-06 at 14:57 +0100, Peter Neubauer wrote:

Hi Atle, there is no automatic garbage collection in Neo4j. If you disconnect a node or subgraph from the root, it will still exist in the DB, but of course it is harder to reach via traversals; you will have to look up the graph by IDs.

Ah, that was what I expected, but not what I hoped for (or even dreamed about?). Then I guess I have to write a garbage-collector traverser that cleans up nodes that are no longer reachable from well-known/root nodes, and run the traversal at given intervals.

-atle

However, on a storage level, node IDs are reused when they are freed by deleting a node, so you can end up with new nodes having smaller IDs than older nodes if you deleted a node with a very low ID before. Is that what you meant by garbage collection?

Cheers,
/peter neubauer

Neo Technology
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://www.oredev.se - The best thing since the wall came down.
http://www.linkedprocess.org - Distributed computing on LinkedData scale

On Fri, Nov 6, 2009 at 2:29 PM, Atle Prange atle.pra...@gmail.com wrote:

Hi, is there some sort of garbage collection in Neo4j? I have a vague memory of reading that all nodes must be reachable from the root node, and that if they are not, they are removed. Is this correct?

-atle
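
The "garbage-collector traverser" Atle describes above (walk from the well-known root nodes, then clean up whatever was not reached) could be sketched along these lines. This is Python over plain sets and dicts, not the Neo4j traversal API; `reachable_from`, `sweep`, and the `out_edges` mapping are illustrative names only.

```python
from collections import deque


def reachable_from(roots, out_edges):
    """Mark phase: breadth-first walk from the well-known root nodes."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for neighbour in out_edges.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def sweep(nodes, roots, out_edges):
    """Sweep phase: delete every node the walk did not reach; returns them sorted."""
    garbage = sorted(nodes - reachable_from(roots, out_edges))
    for node in garbage:
        nodes.discard(node)
        out_edges.pop(node, None)
    return garbage


nodes = {"root", "a", "b", "x", "y"}
out_edges = {"root": {"a"}, "a": {"b"}, "x": {"y"}, "y": {"x"}}
removed = sweep(nodes, {"root"}, out_edges)  # "x" and "y" are cut off from the root
```

Because it works from reachability, a walk like this also removes disconnected cycles such as `x` and `y` here. The cost is that every run touches the whole reachable graph, which is the tension with app performance raised at the top of this message.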