Re: [Neo] Garbage nodes

2009-11-13 Thread Atle Prange
All very good points! And I have learned never to underestimate others'
experiences. However, I think that in some applications garbage
collection might be useful, especially apps with very interconnected
graphs.

A traverser would do the job, but what if there were a hook in Neo
that was called whenever a relationship or node is created or deleted? Then
we could check at the end of the transaction whether a node has no more
incoming relationships and is not a root node. Such a node can be safely
deleted, and this would cascade so that all unreachable nodes are
deleted.
Since a hook can be implemented outside of the Neo core, this would
be an optional feature.
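A minimal sketch of the hook idea, assuming a toy in-memory graph rather than Neo4j's real API: `Graph`, `connect`, `disconnect`, and `commit` are hypothetical names. Removing a relationship records its end node, and `commit` plays the role of the end-of-transaction hook, deleting every recorded non-root node whose incoming-relationship count has reached zero and cascading to the nodes it pointed at.

```python
from collections import defaultdict


class Graph:
    """Toy in-memory graph (hypothetical, not the Neo4j API) with a
    reference-counting cleanup hook run at the end of a transaction."""

    def __init__(self):
        self.nodes = set()
        self.roots = set()                  # well-known nodes, never collected
        self.out_edges = defaultdict(set)   # node -> nodes it points to
        self.in_count = defaultdict(int)    # node -> number of incoming relationships
        self.touched = set()                # nodes that lost an incoming relationship

    def create_node(self, node, root=False):
        self.nodes.add(node)
        if root:
            self.roots.add(node)

    def connect(self, start, end):
        if end not in self.out_edges[start]:
            self.out_edges[start].add(end)
            self.in_count[end] += 1

    def disconnect(self, start, end):
        # The "relationship deleted" hook: remember the end node for commit().
        if end in self.out_edges[start]:
            self.out_edges[start].discard(end)
            self.in_count[end] -= 1
            self.touched.add(end)

    def commit(self):
        """Delete touched non-root nodes with no incoming relationships,
        cascading through their outgoing relationships. Returns deleted nodes."""
        deleted = []
        pending = list(self.touched)
        self.touched.clear()
        while pending:
            node = pending.pop()
            if node in self.roots or node not in self.nodes or self.in_count[node] > 0:
                continue
            self.nodes.discard(node)
            deleted.append(node)
            # Dropping this node removes one incoming relationship from each target.
            for target in self.out_edges.pop(node, set()):
                self.in_count[target] -= 1
                pending.append(target)
            self.in_count.pop(node, None)
        return deleted
```

One design caveat: counting incoming relationships cannot reclaim cycles. If `x` and `y` point at each other and the last relationship from the root to `x` is removed, each still has one incoming relationship, so both survive even though nothing reachable points at them. That matters for the very interconnected graphs mentioned above. Only nodes that lost an incoming relationship are checked, so a freshly created, unconnected node is left alone (a choice made for this sketch).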

Just a thought.

-atle


On Wed, 2009-11-11 at 12:43 -0500, Todd Stavish wrote:

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Garbage nodes

2009-11-11 Thread Todd Stavish
Based on my experience with object databases, I can't recommend a
persistent garbage collector (PGC) implementation. Just as not every part
of your object model needs to be expressed in your graph,
persistence doesn't need garbage collection.

Based on what I've seen customers implement, GCs are meant for
dynamic memory allocation only. One consequence, we found, is that
the PGC often delays the application. To avoid this, you end up
avoiding or postponing the PGC for a long time, which means there is no
automatic GC; ultimately the application turns a supposedly automatic
operation into a manual process.

The delay also bloats the database: because the unreferenced nodes are
not deleted, you use more storage space, and when you create new nodes
you allocate more because the old ones are still around.

Not sure about Neo4j, but this also fragmented the database
files. So we found that we were setting ourselves up for
bloat, fragmentation, and poor performance, because you have to
access more data. It was also fragile: if you make a mistake and
fail to root something, the PGC may accidentally delete something
that you wanted. We had a tragically sad experience where a customer
complained about the PGC taking a long time; the PGC was deleting
more objects than they intended, and good data was lost (the
performance was fine).

Please take this as one data point for the decision. It's possible that
PGCs could be implemented in a way that avoids these problems, that
end users could be guided around the pitfalls, or that PGCs might
work for a certain class of applications and not others.

-Todd

On Fri, Nov 6, 2009 at 12:47 PM, Johan Svensson jo...@neotechnology.com wrote:
 I'll jump in on this one then.

 Node or relationship IDs will only be reused if they are deleted.
 There is no possibility that any IDs change without deleting.

 Regards,
 -Johan

 On Fri, Nov 6, 2009 at 3:11 PM, Andrea Puddu kiedi...@gmail.com wrote:
 Hi Peter,

 Sorry for jumping into the discussion, but I have a question about
 garbage collection.
 Are node IDs reused only if some node is deleted? I mean, is there any
 possibility that some IDs change without deleting any nodes? And what
 about relationship IDs?

 Thank you in advance.
 Andrea



Re: [Neo] Garbage nodes

2009-11-07 Thread Craig Taverner
I'd vote for a garbage collector too. I often have code that deletes large
trees of nodes, and it takes time, but I'd rather disconnect the tree, load
my new data, and then delete the old data later (during idle time). Obviously
this can be done at the application level, but it would be a nice convenience
to have a sweeper that does this for us. Even one called manually would help,
but preferably a garbage collector.

I guess one issue with this is that, for performance reasons, you have two
competing requirements:
* Loading and deleting data in larger chunks (to reduce disk
fragmentation)
* Running garbage collection inconspicuously so it does not affect app
performance
(I think these two conflict.)
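A minimal sketch of the "disconnect now, delete later" sweeper, assuming the detached trees are available as a plain `children` mapping. `DeferredSweeper`, `detach`, `sweep`, and `pending` are hypothetical names, and appending to `deleted` stands in for the real deletion, which would presumably run inside one transaction per batch. The `batch_size` parameter is where the two competing requirements meet: larger batches mean fewer, bigger deletions; smaller batches mean shorter pauses.

```python
from collections import deque


class DeferredSweeper:
    """Hypothetical sweeper: old trees are detached cheaply in the foreground
    and deleted later, a bounded batch at a time (e.g. during idle time)."""

    def __init__(self, children, batch_size=1000):
        self.children = children      # node -> list of child nodes in the detached trees
        self.batch_size = batch_size  # max nodes deleted per sweep() call
        self.queue = deque()          # nodes waiting to be deleted
        self.deleted = []             # stand-in for the real delete operation

    def detach(self, tree_root):
        # Foreground step: only remember the root of the old tree.
        self.queue.append(tree_root)

    def sweep(self, max_nodes=None):
        """Delete up to max_nodes nodes (default: batch_size); return the count."""
        limit = self.batch_size if max_nodes is None else max_nodes
        count = 0
        while self.queue and count < limit:
            node = self.queue.popleft()
            # Queue the children before deleting the node so none are lost.
            self.queue.extend(self.children.pop(node, []))
            self.deleted.append(node)
            count += 1
        return count

    def pending(self):
        return len(self.queue)
```

Usage: call `detach(old_root)` right after disconnecting the old tree and loading the new data, then call `sweep()` repeatedly from an idle-time task until `pending()` returns 0.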

On Fri, Nov 6, 2009 at 3:21 PM, Atle Prange atle.pra...@gmail.com wrote:

 Thanks for the quick reply.

 On Fri, 2009-11-06 at 14:57 +0100, Peter Neubauer wrote:
  Hi Atle,
  there is no automatic garbage collection in Neo4j. If you disconnect a
  node or subgraph from the root, it will still exist in the DB, but of
  course it is harder to reach via traversals; you will have to look up
  its nodes by ID.
 
 Ah, that was what I expected, but not what I hoped for (or even dreamed
 about?)

 Then I guess I have to write a garbage-collector traverser that deletes
 nodes that are no longer reachable from well-known/root nodes, and run
 the traversal at regular intervals.

 -atle
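A minimal sketch of such a garbage-collector traverser as a mark-and-sweep pass over a toy graph. `collect_garbage` is a hypothetical helper, not a Neo4j API, and `edges` is assumed to list every relationship the traversal should follow (add both directions if relationships should count as undirected). It marks everything reachable from the root nodes and returns the rest for deletion.

```python
from collections import deque


def collect_garbage(nodes, edges, roots):
    """Mark-and-sweep over a toy graph (hypothetical helper, not a Neo4j API).

    nodes: set of node ids
    edges: dict mapping a node id to the ids it is connected to
    roots: well-known nodes that are always considered live
    Returns the set of nodes not reachable from any root.
    """
    # Mark phase: breadth-first traversal from every root.
    marked = {r for r in roots if r in nodes}
    frontier = deque(marked)
    while frontier:
        node = frontier.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour in nodes and neighbour not in marked:
                marked.add(neighbour)
                frontier.append(neighbour)
    # Sweep phase: whatever was not marked is garbage.
    return nodes - marked
```

Running it "at regular intervals" would mean calling it from a scheduled job and deleting the returned nodes. Two properties follow from the design: an unreachable cycle such as `x` and `y` pointing at each other is collected, and the result is only as safe as the `roots` argument. An empty or incomplete root set marks live data as garbage, which is the failure mode Todd describes in his message.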


  However, on a storage level, node IDs are reused when they are freed
  by deleting a node, so you can end up with new nodes having smaller
  IDs than older nodes if you previously deleted a node with a very low ID.
  Is that what you meant by garbage collection?
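A toy model of the ID reuse Peter and Johan describe, assuming a simple free list. `IdAllocator` is hypothetical and is not Neo4j's actual ID generator; handing back the lowest freed ID first is an arbitrary choice for the sketch. IDs only return to circulation through `free`, so existing IDs never change, but a new node can receive a lower ID than an older one.

```python
import heapq


class IdAllocator:
    """Toy ID allocator with a free list (hypothetical, not Neo4j's id generator).
    Assumes each ID is freed at most once."""

    def __init__(self):
        self.high_mark = 0    # next never-used ID
        self.free_ids = []    # min-heap of IDs released by deletions

    def allocate(self):
        # Reuse a freed ID if there is one; otherwise take a fresh one.
        if self.free_ids:
            return heapq.heappop(self.free_ids)
        new_id = self.high_mark
        self.high_mark += 1
        return new_id

    def free(self, entity_id):
        # Called only when a node (or relationship) is deleted.
        heapq.heappush(self.free_ids, entity_id)
```

In this sketch, nodes and relationships would each get their own allocator instance (an assumption of the sketch). Allocating 0, 1, 2, deleting node 0, and then creating a new node hands out 0 again, a smaller ID than the still-existing node 2; the next new node gets 3.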
 
  Cheers,
 
  /peter neubauer
 
  Neo Technology
 
  On Fri, Nov 6, 2009 at 2:29 PM, Atle Prange atle.pra...@gmail.com wrote:
   Hi,
  
   Is there some sort of garbage collection in Neo4j? I have a vague memory
   of reading that all nodes must be reachable from the root
   node, and that those that are not are removed. Is this correct?
  
   -atle
  