Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi everybody,

Sorry for the delay in my answer.

First, I made slides (a PDF) about the embed method and how its algorithm works:
https://drive.google.com/file/d/0B8eKvcE21R9RMkNPYlZiOTEwZnM/view

I don't know whether the Smalltalk community or other communities have this kind of method (embed), but I believe not, because otherwise we would probably know about it! It is true that gc is not a new concept in the db world, but perhaps until now it has been used quite separately in the db.

What is new in the embed method? I think that at least the high-level method/abstraction is new, where the result of the embed operation is defined clearly between two memory systems (i.e. “the network level understanding”). I also think that, at the next lower level, binding the update part and the gc part together (“embed = update + gc”) is a new idea. What is important in this binding is that the update part is able to collect information for the gc part, and the gc part has full network/graph-level understanding of the db even though it is done locally and therefore efficiently.

Reference cycles (circular substructures) are not a problem in the algorithm, as is described in the demo implementation. On the other hand, gc can be done more easily when the garbage (or the db) does not contain circular substructures, and both algorithms can be applied in real systems, the simpler one first.

It is true that storing reference counts takes some extra bytes in the nodes in the db. That does not sound like a big problem, because objects often contain quite a lot of user data and memories are quite big today, even in small devices.

Updating reference counts is not a big performance problem in my opinion. If reference counts must be updated, it often means that you are making a big update operation in which updating the reference counts is a small part. In addition, you can often update the reference count with the same SQL update statement (in O/R systems) that updates the fields containing user data.
(However, in a pure reference loss you are not updating the user data of an object.) Also note that it is possible to keep reference count information (the sum of increments and decrements) in run-time memory during the embed method. Therefore if, for example, a node loses a reference and gets a new reference in the same update operation, no update of its reference count is needed. I demonstrate this in the demo implementation at the level of a single (parent) node (the method handleReferencesFromGrayNodesInDB()).

Perhaps in practice most update operations do not change the references of objects. In these cases the embed method is, of course, a pure store/update operation.

Personally I think that the db model where persistence is a concept in the db, and is defined by reachability from persistent root nodes, is natural for programmers and designers, and very object-db-like. Defining the persistence of a node or some nodes temporarily, during a single update operation, does not sound theoretically or practically clear to me.

Note also that when persistence is defined by reachability, you can at any time make a selected node in the db a persistent root node, to be sure that it, and the nodes reachable from it, will not disappear in future operations. This simple method is implemented in the demo implementation. It is incrORC(); the opposite is decrORC() (which is a general delete operation).

Just as Tilmann says, I also think that the embed method has a solid theoretical foundation(!). Therefore, in my opinion, the embed method is not just one more update method. If I think in terms of protocols then, in my opinion, the embed method defines a new, natural higher-level layer between a client program and the db (remote memory). This layer is clean and makes updating a natural and easy operation for a programmer; updating becomes nearly as easy as modifying object structures in run-time memory.
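The idea of keeping the sum of reference-count increments and decrements in run-time memory can be sketched as follows. This is only a minimal illustration under stated assumptions, not the demo implementation: the class RefCountDeltas, its method names and the string node ids are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: accumulates reference-count changes in memory during one update. */
class RefCountDeltas {
    private final Map<String, Integer> delta = new HashMap<>();

    void referenceAdded(String nodeId)   { delta.merge(nodeId, 1, Integer::sum); }
    void referenceRemoved(String nodeId) { delta.merge(nodeId, -1, Integer::sum); }

    /** Nodes whose stored count really has to change, with the net change for each. */
    Map<String, Integer> pendingWrites() {
        Map<String, Integer> out = new HashMap<>();
        delta.forEach((id, d) -> { if (d != 0) out.put(id, d); });
        return out;
    }
}

class RefCountDeltasDemo {
    public static void main(String[] args) {
        RefCountDeltas deltas = new RefCountDeltas();
        deltas.referenceRemoved("B"); // one parent stops pointing at B ...
        deltas.referenceAdded("B");   // ... another starts pointing at B: net change 0
        deltas.referenceRemoved("D"); // D loses an incoming reference
        System.out.println(deltas.pendingWrites()); // prints {D=-1}
    }
}
```

In this sketch node B needs no counter write at all, because its increment and decrement cancel out before anything is sent to the database.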
And even though the embed method fits complex update operations best, it is also useful in quite simple situations, for example when a single reference is replaced with a new one and perhaps some scalar fields of some objects are updated at the same time.

There are of course some worst, or tedious, cases for the plain embed method. However, it is possible to derive additional “decorated” methods from the plain embed method. These methods are still quite general and easy for a user to understand as network-level operations. I describe examples of these methods (“handlings”) in the Notes section of my blog:
http://hvirkkun.blogspot.fi/2016/04/a-new-communication-theory-on-complex.html

Gc can be made even more efficient, and it can be helped easily if an application has extra knowledge of non-garbage nodes, as I explain in the demo implementation. And, of course, embed can be called for any substructure, which increases the efficiency of the operation. These features tackle many inefficiencies and make embed more versatile. However, still also simple update commands provided separately can be u
Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi Heikki,

> On May 20, 2016, at 10:55 AM, Hei Virk wrote:
>
> Hi everybody:
>
> I will comment more later.
>
> I am about to give a presentation on the topic at my university here in
> Finland for theorists, and I am making those slides at the moment.
>
> One social observation:
> In my understanding, the situation has been that people (me too) have not
> understood that a modified object structure can always be returned to the
> database in a deterministic and meaningful way. I mean that a well-defined
> result always exists for an operation. This is because the modifications in
> run-time memory can be so arbitrary.

The way JDO works is that you can retrieve complex object graphs from the datastore in a single operation (or multiple operations). Once the object graphs are in memory, the relationship between the objects in memory and the persistent representation (objects in an ODBMS or rows in an RDBMS) is managed by the PersistenceManager.

If a user makes a change to any of the objects in the graph, the change is automatically detected, and when changes are flushed to the datastore (either explicitly or at transaction commit), the changes are made persistent.

Many datastores require that changes be applied in a specific order so as to avoid introducing inconsistencies (foreign key constraints). The implementation is required to understand the constraints and avoid the inconsistencies. Reachability of all objects might be defined as a constraint in an object datastore.

The exact mechanism for making changes is decided by the implementation. Garbage collection of persistent objects could be done as part of the flush/commit operation, depending on how the implementation works and which datastore is used. Leaving unreachable objects in the datastore might be considered an inconsistency that the implementation should avoid.

Regards,
Craig

> For this reason no one has been able to think about the problem? Therefore the
> method cannot have been in any db standard either.
>
> Regards,
> Heikki
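Craig's point that changes must be flushed in an order that respects foreign key constraints can be sketched as a dependency ordering. This is only an illustration under stated assumptions: the class FlushOrder, the row names and the map layout are hypothetical, and a real JDO implementation is free to order its writes in any way it chooses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders pending inserts so referenced rows are written first. */
class FlushOrder {
    /** Key: a pending row. Value: the rows it references through foreign keys. */
    static List<String> order(Map<String, List<String>> references) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String row : references.keySet()) {
            visit(row, references, done, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(String row, Map<String, List<String>> references,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(row)) return;
        if (!inProgress.add(row)) {
            // A cycle of foreign keys cannot be satisfied by ordering alone.
            throw new IllegalStateException("circular references involving " + row);
        }
        for (String target : references.getOrDefault(row, Collections.emptyList())) {
            visit(target, references, done, inProgress, ordered);
        }
        inProgress.remove(row);
        done.add(row);
        ordered.add(row); // every row this one references is already in the list
    }
}

class FlushOrderDemo {
    public static void main(String[] args) {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        refs.put("order_line", Arrays.asList("order", "product"));
        refs.put("order", Arrays.asList("customer"));
        refs.put("customer", Collections.emptyList());
        System.out.println(FlushOrder.order(refs)); // prints [customer, order, product, order_line]
    }
}
```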
Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi Heikki,

I found something in the old ODMG 3.0 standard:

  6.3.1.2 Object Deletion
  In the Smalltalk binding, as in Smalltalk, there is no notion of explicit deletion of objects. An object is removed from the database during garbage collection if that object is not referenced by any other persistent object. The delete() operation from interface 'Object' is not supported.

-> So it seems that some kind of garbage collection was envisaged by the standard, if only for Smalltalk. However, as I said before, I'm not aware that this was ever implemented outside the Smalltalk community, and I don't know Smalltalk well enough to tell how much its GC might differ from Java GC.

So while the term 'GC in databases' doesn't seem to be entirely new, your approach may bring new ideas to the table, including a solid theoretical foundation. Maybe it would even explain why it hasn't been implemented in any major DBMS. For example, POET Navajo supports ODMG 3 but apparently chose _not_ to support GC (first page, last paragraph):
http://www.odbms.org/wp-content/uploads/2013/11/038.01-Jordan-An-Object-Database-for-Embedded-Environments-April-2000.pdf

Regards,
Tilmann

On 20.05.2016 19:55, Hei Virk wrote:
> Hi everybody:
>
> I will comment more later.
>
> I am about to give a presentation on the topic at my university here in
> Finland for theorists, and I am making those slides at the moment.
>
> One social observation:
> In my understanding, the situation has been that people (me too) have not
> understood that a modified object structure can always be returned to the
> database in a deterministic and meaningful way. I mean that a well-defined
> result always exists for an operation. This is because the modifications in
> run-time memory can be so arbitrary.
>
> For this reason no one has been able to think about the problem? Therefore the
> method cannot have been in any db standard either.
>
> Regards,
> Heikki
Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi everybody:

I will comment more later.

I am about to give a presentation on the topic at my university here in Finland for theorists, and I am making those slides at the moment.

One social observation:
In my understanding, the situation has been that people (me too) have not understood that a modified object structure can always be returned to the database in a deterministic and meaningful way. I mean that a well-defined result always exists for an operation. This is because the modifications in run-time memory can be so arbitrary.

For this reason no one has been able to think about the problem? Therefore the method cannot have been in any db standard either.

Regards,
Heikki

On Fri, May 20, 2016 at 7:49 PM, Tilmann Zäschke wrote:
> Hello Hei,
> sorry, I first posted my answer only to the dev list, and I'm not sure you
> are on it. So here it is again, with some updates:
>
> Hi,
>
> I haven't read the whole post in detail, but I understand the main idea is
> garbage collection in databases. This reminds me of similar concepts in
> some object databases, such as GemStone/Smalltalk (
> https://downloads.gemtalksystems.com/docs/GemStone64/3.3.x/GS64-ReleaseNotes-3.3/GS64-ReleaseNotes-3.3.htm).
> However, I'm not aware of any major Java ODBMS that supports it.
>
> Some comments:
>
> - As Andy describes, it could be supported via a property of the PM.
> However, there may be scenarios where 'embed' and 'makePersistent' are
> regularly used in the same transaction, so if the feature were widely
> used, it might make sense to have a dedicated 'embed' method.
>
> - As far as I am aware, the JVM has moved away from reference counting.
> The situation in a DBMS may be different from an in-memory JVM, but if an
> ODBMS implements GC, it may be worth checking out other garbage collection
> techniques. One problem is reference cycles; I haven't read the full
> blog post to see how this is solved.
>
> - It seems that enabling garbage collection via reference counting will
> have quite some impact on performance, because objects get bigger (little
> impact) and because it requires updating additional objects in the
> database to update the reference counters (the counters also need updating
> when _adding_ references). I would therefore argue that support for garbage
> collection should be determined at database creation time, or as a flag
> when defining the database schema. This would allow avoiding the GC
> overhead when GC is not required.
>
> - We would need to be able to identify root objects which should not be
> garbage collected. This could also be done via a flag on the schema, i.e.
> every object of a class that has no reference counters is automatically a
> root object.
>
> Cheers,
> Tilmann
Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hello Hei,

sorry, I first posted my answer only to the dev list, and I'm not sure you are on it. So here it is again, with some updates:

Hi,

I haven't read the whole post in detail, but I understand the main idea is garbage collection in databases. This reminds me of similar concepts in some object databases, such as GemStone/Smalltalk (https://downloads.gemtalksystems.com/docs/GemStone64/3.3.x/GS64-ReleaseNotes-3.3/GS64-ReleaseNotes-3.3.htm). However, I'm not aware of any major Java ODBMS that supports it.

Some comments:

- As Andy describes, it could be supported via a property of the PM. However, there may be scenarios where 'embed' and 'makePersistent' are regularly used in the same transaction, so if the feature were widely used, it might make sense to have a dedicated 'embed' method.

- As far as I am aware, the JVM has moved away from reference counting. The situation in a DBMS may be different from an in-memory JVM, but if an ODBMS implements GC, it may be worth checking out other garbage collection techniques. One problem is reference cycles; I haven't read the full blog post to see how this is solved.

- It seems that enabling garbage collection via reference counting will have quite some impact on performance, because objects get bigger (little impact) and because it requires updating additional objects in the database to update the reference counters (the counters also need updating when _adding_ references). I would therefore argue that support for garbage collection should be determined at database creation time, or as a flag when defining the database schema. This would allow avoiding the GC overhead when GC is not required.

- We would need to be able to identify root objects which should not be garbage collected. This could also be done via a flag on the schema, i.e. every object of a class that has no reference counters is automatically a root object.

Cheers,
Tilmann

On 13.05.2016 15:49, Andy Jefferson wrote:
> Hi,
>
> > -I have invented a new declarative method to update object databases and I
> > would like to introduce that method to you.
> >
> > My question is:
> > Should the functionality of this method be included in some future versions
> > of JDO (and perhaps some other definitions/systems under Apache)?
> > Also I am interested in your comments/critique on the method.
> >
> > The name of the method is "embed" and it resembles the command
> > pm.makePersistent(s);
> > used in JDO-enabled applications. The embed method is called in an
> > equivalent way,
> > db.embed(s);
> > but it has a larger functionality than the current makePersistent method.
>
> My opinion/comments, others may have other thoughts:
>
> For something to be part of JDO, it needs to be widely applicable across
> other datastores, not just for one "type" of datastore. If it is only
> implementable on 1 type of datastore then standardising it would be
> "premature"; that isn't to say that it isn't applicable to others.
>
> From what I read of your method it is basically a makePersistent (attach)
> but with "deleteOrphans" enabled; i.e. it addresses the object graph defined
> by the input object, and additionally deletes orphaned objects. Firstly it
> is entirely reasonable to be able to do that right now in terms of API
> (**assuming the JDO implementation provided the internals**) since the user
> can call pm.makePersistent and can have set properties on the
> PersistenceManager just before that call (pm.setProperty(...), so can set
> some "deleteOrphans" flag). From that I can conclude that I don't think
> there would be any change to the API needed to provide your mode of
> operation.
>
> Deleting orphans may vary dependent on the precise relation, hence why JDO
> allows metadata on each field, so some orphans can be deleted and others not.
>
> Clearly a JDO implementation would need to provide a "deleteOrphans" mode
> internally to support this.
>
> Detail of how your method works in an object database, whilst interesting,
> is not of relevance to the JDO spec question, since the spec doesn't define
> HOW an implementation implements features.
>
> Regards
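Tilmann's two concerns about reference counting (adding a reference means writing to another object's counter, and cycles are not reclaimed by counters alone) can be illustrated with a small sketch. It shows plain, naive reference counting only, not the embed algorithm, which according to Heikki handles cycles. The names RcNode, NaiveRefCounting, pin and release are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RcNode {
    final String name;
    int refCount;                                // number of incoming references
    final List<RcNode> out = new ArrayList<>();  // outgoing references
    RcNode(String name) { this.name = name; }
}

/** Hypothetical store that frees a node as soon as its counter drops to zero. */
class NaiveRefCounting {
    final Set<RcNode> heap = new LinkedHashSet<>();

    RcNode create(String name) { RcNode n = new RcNode(name); heap.add(n); return n; }

    /** Stands for a reference held by a root. */
    void pin(RcNode n) { n.refCount++; }

    /** Adding a reference also writes to the target node's counter. */
    void addRef(RcNode from, RcNode to) { from.out.add(to); to.refCount++; }

    /** Drops one incoming reference; frees the node and releases its children at zero. */
    void release(RcNode target) {
        if (--target.refCount == 0) {
            heap.remove(target);
            for (RcNode child : target.out) release(child);
        }
    }
}

class NaiveRefCountingDemo {
    public static void main(String[] args) {
        NaiveRefCounting store = new NaiveRefCounting();

        RcNode c = store.create("c"), d = store.create("d");
        store.pin(c);
        store.addRef(c, d);
        store.release(c);                        // acyclic chain: c and d are both freed
        System.out.println(store.heap.size());   // prints 0

        RcNode a = store.create("a"), b = store.create("b");
        store.pin(a);
        store.addRef(a, b);
        store.addRef(b, a);                      // a and b form a cycle
        store.release(a);                        // the root lets go of a
        System.out.println(store.heap.size());   // prints 2: the cycle is unreachable but not freed
    }
}
```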
Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi,

As far as I can tell, garbage collection is an established concept in object databases. I think the oldest one that supports it is GemStone/Smalltalk (https://downloads.gemtalksystems.com/docs/GemStone64/3.3.x/GS64-ReleaseNotes-3.3/GS64-ReleaseNotes-3.3.htm). However, I'm not aware of any major Java ODBMS that supports it.

Some comments:

- As Andy describes, it could be supported via a property of the PM. However, there may be scenarios where 'embed' and 'makePersistent' are regularly used in the same transaction, so if the feature were widely used, it might make sense to have a dedicated 'embed' method.

- As far as I am aware, the JVM has moved away from reference counting. The situation in a DBMS may be different from an in-memory JVM, but if an ODBMS implements GC, it may be worth checking out other garbage collection techniques.

- It seems that enabling garbage collection via reference counting will have quite some impact on performance, because objects get bigger (little impact) and because it requires updating additional objects in the database to update the reference counters (the counters also need updating when _adding_ references). I would therefore argue that support for garbage collection should be determined at database creation time, or as a flag when defining the database schema. This would allow avoiding the GC overhead when GC is not required.

Cheers,
Tilmann

On 13.05.2016 15:49, Andy Jefferson wrote:
> Hi,
>
> > -I have invented a new declarative method to update object databases and I
> > would like to introduce that method to you.
> >
> > My question is:
> > Should the functionality of this method be included in some future versions
> > of JDO (and perhaps some other definitions/systems under Apache)?
> > Also I am interested in your comments/critique on the method.
> >
> > The name of the method is "embed" and it resembles the command
> > pm.makePersistent(s);
> > used in JDO-enabled applications. The embed method is called in an
> > equivalent way,
> > db.embed(s);
> > but it has a larger functionality than the current makePersistent method.
>
> My opinion/comments, others may have other thoughts:
>
> For something to be part of JDO, it needs to be widely applicable across
> other datastores, not just for one "type" of datastore. If it is only
> implementable on 1 type of datastore then standardising it would be
> "premature"; that isn't to say that it isn't applicable to others.
>
> From what I read of your method it is basically a makePersistent (attach)
> but with "deleteOrphans" enabled; i.e. it addresses the object graph defined
> by the input object, and additionally deletes orphaned objects. Firstly it
> is entirely reasonable to be able to do that right now in terms of API
> (**assuming the JDO implementation provided the internals**) since the user
> can call pm.makePersistent and can have set properties on the
> PersistenceManager just before that call (pm.setProperty(...), so can set
> some "deleteOrphans" flag). From that I can conclude that I don't think
> there would be any change to the API needed to provide your mode of
> operation.
>
> Deleting orphans may vary dependent on the precise relation, hence why JDO
> allows metadata on each field, so some orphans can be deleted and others not.
>
> Clearly a JDO implementation would need to provide a "deleteOrphans" mode
> internally to support this.
>
> Detail of how your method works in an object database, whilst interesting,
> is not of relevance to the JDO spec question, since the spec doesn't define
> HOW an implementation implements features.
>
> Regards
Re: A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi,

> -I have invented a new declarative method to update object databases and I
> would like to introduce that method to you.
>
> My question is:
> Should the functionality of this method be included in some future versions
> of JDO (and perhaps some other definitions/systems under Apache)?
> Also I am interested in your comments/critique on the method.
>
> The name of the method is "embed" and it resembles the command
> pm.makePersistent(s);
> used in JDO-enabled applications. The embed method is called in an
> equivalent way,
> db.embed(s);
> but it has a larger functionality than the current makePersistent method.

My opinion/comments, others may have other thoughts:

For something to be part of JDO, it needs to be widely applicable across other datastores, not just for one "type" of datastore. If it is only implementable on 1 type of datastore then standardising it would be "premature"; that isn't to say that it isn't applicable to others.

From what I read of your method it is basically a makePersistent (attach) but with "deleteOrphans" enabled; i.e. it addresses the object graph defined by the input object, and additionally deletes orphaned objects. Firstly it is entirely reasonable to be able to do that right now in terms of API (**assuming the JDO implementation provided the internals**) since the user can call pm.makePersistent and can have set properties on the PersistenceManager just before that call (pm.setProperty(...), so can set some "deleteOrphans" flag). From that I can conclude that I don't think there would be any change to the API needed to provide your mode of operation.

Deleting orphans may vary dependent on the precise relation, hence why JDO allows metadata on each field, so some orphans can be deleted and others not.

Clearly a JDO implementation would need to provide a "deleteOrphans" mode internally to support this.

Detail of how your method works in an object database, whilst interesting, is not of relevance to the JDO spec question, since the spec doesn't define HOW an implementation implements features.

Regards
--
Andy
DataNucleus (Web: http://www.datanucleus.org   Twitter: @datanucleus)
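The point that orphan deletion can differ per relation can be sketched with a field-level marker. This is only an illustration under stated assumptions: the annotation @DeleteWhenOrphaned, the classes and the helper OrphanPolicy are all hypothetical stand-ins and are not JDO's own metadata or API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker: the referenced object is owned and is deleted when orphaned. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DeleteWhenOrphaned {}

class Address { final String city; Address(String city) { this.city = city; } }
class Country { final String name; Country(String name) { this.name = name; } }

class Person {
    @DeleteWhenOrphaned Address address; // owned: the old value is deleted on replacement
    Country country;                      // shared: the old value is kept
    Person(Address address, Country country) { this.address = address; this.country = country; }
}

class OrphanPolicy {
    /** Replaces a field value and returns the old values that the field's policy says to delete. */
    static List<Object> replace(Object owner, String fieldName, Object newValue) {
        try {
            Field f = owner.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            Object old = f.get(owner);
            f.set(owner, newValue);
            List<Object> toDelete = new ArrayList<>();
            if (old != null && old != newValue && f.isAnnotationPresent(DeleteWhenOrphaned.class)) {
                toDelete.add(old);
            }
            return toDelete;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

class OrphanPolicyDemo {
    public static void main(String[] args) {
        Person p = new Person(new Address("Helsinki"), new Country("Finland"));
        System.out.println(OrphanPolicy.replace(p, "address", new Address("Espoo")).size());  // prints 1
        System.out.println(OrphanPolicy.replace(p, "country", new Country("Sweden")).size()); // prints 0
    }
}
```

The sketch assumes the old Address has no other referrers; a real implementation would still have to check that before deleting it.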
A new general method to update object databases / Should it be included in Java Data Objects, JDO ?
Hi the JDO developer community and user community!

I have invented a new declarative method to update object databases and I would like to introduce that method to you.

My question is: Should the functionality of this method be included in some future versions of JDO (and perhaps some other definitions/systems under Apache)? Also I am interested in your comments/critique on the method.

The name of the method is "embed" and it resembles the command
pm.makePersistent(s);
used in JDO-enabled applications. The embed method is called in an equivalent way,
db.embed(s);
but it has a larger functionality than the current makePersistent method. The embed method not only stores or updates the content of the object structure (graph) s in the object database (persistence-by-reachability), but it also automatically removes possible garbage objects from the object database if those appear during an update operation.

The embed method, or rather its implementation, requires some natural properties from the object database. Persistence in the object database is defined by reachability from persistent root objects in the database. The embed method applies well-known reference counting techniques. Therefore the nodes (objects) in the database have one field for the number of incoming references and also one field which defines whether a node is a persistent root node (object).

The method is efficient, because it works locally in the database; it understands the local topology of the database and examines only related objects in the object database, not the whole database, which would not be acceptable.

The method works under quite general conditions because it models both the object database and the modified object structure s in run-time memory as directed graphs of nodes or objects without any artificial limitations. For example, different kinds of circular substructures and their modifications are allowed.

The method relies on a general communication theory (as I call it at the moment) which states that modified complex information represented as a directed graph of nodes can always be transferred back to its original system in an exact and meaningful way, which includes garbage collection.

I have written a blog post about the embed method and related things, and I put the same content on arXiv.
http://hvirkkun.blogspot.fi/2016/04/a-new-communication-theory-on-complex.html

Comments and critique are welcome.

Best Regards,
Heikki Virkkunen

Email: hvirk...@gmail.com
Phone: +358 40 706 3912
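The observable effect of "embed = store/update + garbage removal" can be sketched with a toy in-memory database. This is a sketch under stated assumptions and not the author's algorithm: for brevity it finds garbage with a whole-database mark-and-sweep from the roots, whereas the post describes a local, reference-count-based method. The class ToyGraphDb and its fields are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy database: nodes keyed by id, persistence defined by reachability from root nodes. */
class ToyGraphDb {
    final Map<String, Set<String>> edges = new HashMap<>(); // stored node -> ids it references
    final Set<String> roots = new HashSet<>();               // persistent root nodes

    /** Stores or overwrites every node of the given graph, then removes unreachable nodes. */
    Set<String> embed(Map<String, Set<String>> graph) {
        for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
            edges.put(e.getKey(), new HashSet<>(e.getValue())); // update part
        }
        return collectGarbage();                                 // gc part
    }

    /** Whole-database mark-and-sweep; a simple stand-in, NOT the local algorithm of the post. */
    private Set<String> collectGarbage() {
        Set<String> live = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            String id = todo.pop();
            if (live.add(id)) {
                todo.addAll(edges.getOrDefault(id, Collections.emptySet()));
            }
        }
        Set<String> garbage = new TreeSet<>(edges.keySet());
        garbage.removeAll(live);
        edges.keySet().removeAll(garbage); // removing keys removes the stored nodes
        return garbage;
    }
}

class ToyGraphDbDemo {
    static Set<String> refs(String... ids) { return new HashSet<>(Arrays.asList(ids)); }

    public static void main(String[] args) {
        ToyGraphDb db = new ToyGraphDb();
        db.roots.add("r");

        Map<String, Set<String>> s = new HashMap<>();
        s.put("r", refs("a"));
        s.put("a", refs("b"));
        s.put("b", refs("c"));
        s.put("c", refs("b"));                   // b and c form a cycle
        System.out.println(db.embed(s));         // prints []

        Map<String, Set<String>> modified = new HashMap<>();
        modified.put("a", refs());               // a drops its reference to b
        System.out.println(db.embed(modified));  // prints [b, c]
    }
}
```

The second call shows the behaviour that distinguishes embed from a plain store: the cycle formed by b and c becomes unreachable from the root and is removed as part of the same operation.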