From: Matthias Troyer <[EMAIL PROTECTED]>

>1) "Definition of serialization": ...

agree
>2) "Serialization engine".

agree - except on whether arrays should be primitive types. We differ
on this, but I believe that this is actually a small point that would
ultimately be resolved by running some tests on a given implementation,
so it is way premature to try to agree on it now.

>3) "Archive preamble": ..

agree - I believe that the archive preamble and maybe a "postamble"
are useful and almost necessary for robust systems. However, I think
the library should encourage rather than require this. In any case
this is local to the "Serialization Engine" ("archive" in the
submitted system).

>4) "Serialization of UDT (user defined types)": is the next level up.

agree

>5) "Versioning": The next level for me is versioning support. We have
>discussed versioning support on a per-archive and a per-class level. I
>would like to see both variants supported. Per-class versioning is more
>flexible, but has two disadvantages: i) it introduces overhead and ii)
>it writes extra information into the stream, which might make the
>output incompatible with some applications.

>Regarding i: we have to write the version number for each UDT
>encountered, but want to write it only once per UDT. We thus have to
>keep track of which UDTs have been serialized so far, and whenever a
>new UDT is encountered, its version number must be written to the
>archive. This introduces overhead, especially if many small objects
>have to be serialized.

The overhead for the version number is 1 or 2 bytes per class
definition. Tracking the classes serialized so far is not expensive.
What is expensive is tracking all the objects serialized so that
pointers can be correctly handled.

>I see a two-pronged approach as the best solution:

>a) both per-archive versioning, per-class versioning and no versioning
>should be supported for compatibility with other formats (issue ii)
>above)

Per-archive versioning can easily be handled by appending to a default
preamble.
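To make the cost of per-class versioning concrete, here is a minimal
sketch of the "write the version once per class" bookkeeping discussed
above. It is not the submitted library's code: sketch_oarchive,
class_version, save_primitive and point are hypothetical names
invented for illustration, and the sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical trait: per-class version number, 0 unless specialized.
template <class T>
struct class_version {
    static unsigned char value() { return 0; }
};

// Hypothetical archive that appends raw bytes to a vector.
class sketch_oarchive {
public:
    // Save a user-defined type. The version byte is written only the
    // first time a given class is encountered.
    template <class T>
    void save(const T& obj) {
        // insert() returns {iterator, true} only for a class not seen yet.
        if (seen_classes_.insert(std::type_index(typeid(T))).second) {
            bytes_.push_back(class_version<T>::value());
        }
        obj.save(*this);
    }

    // Save a primitive as its raw bytes: no version, no tracking.
    void save_primitive(std::int32_t v) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
        bytes_.insert(bytes_.end(), p, p + sizeof v);
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
    std::set<std::type_index> seen_classes_;  // one entry per class, not per object
};

// Example UDT with two 4-byte members.
struct point {
    std::int32_t x, y;
    void save(sketch_oarchive& ar) const {
        ar.save_primitive(x);
        ar.save_primitive(y);
    }
};

// Give point an explicit version number.
template <>
struct class_version<point> {
    static unsigned char value() { return 2; }
};

// Usage: 1000 points cost 8 payload bytes each plus 1 version byte total.
inline std::size_t demo_bytes_for_1000_points() {
    sketch_oarchive ar;
    for (std::int32_t i = 0; i < 1000; ++i) {
        point p = {i, -i};
        ar.save(p);
    }
    return ar.size();
}
```

In this sketch the set grows with the number of distinct classes, not
with the number of objects, which is why the per-class cost stays
small. Tracking individual object addresses for pointer serialization
would need one entry per object instead.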
>b) if per-class versioning is used, it should be possible to turn it
>off for some classes by a traits class - this will get rid of the
>overhead (issue i) above) when versioning is turned off for a UDT.

Hmmm - I will have to think about this. Using MFC one has to take
extra steps to include versioning. On my last commercial project using
MFC serialization I didn't do this for some classes because I was
assured that "that class will never change". Of course it did change
after the first version of the application shipped, and that ended up
creating a lot of extra work. So I resolved that I would just "spend"
the one byte per class definition and be done with it.

Similar logic applies to the archive preamble. My original motivation
was the concern that existing archives should never become obsolete
through improvements in code - including the archiving system. So I
needed a version for the archive system itself - hence the preamble.

>6) "Advanced functionality": Robert's serialization library includes
>further functionality, such as the serialization of pointers and of
>polymorphic types. Here I want to focus on serialization of pointers. I
>have not checked the implementation of Robert's library in detail, and
>thus please correct me if I view this wrongly. Serialization of
>pointers requires the conversion of a pointer to an integer. When
>serializing objects, the archive thus has to keep track of the
>addresses of objects, in order to later convert pointers into numbers.
>This again introduces overhead. Robert addresses this partially in his
>library by showing how to bypass this system for a UDT. His approach
>however requires that if I want to bypass the pointer serialization
>mechanism for a type T, then I have to re-implement serialization of
>all standard containers of type T, such as std::vector<T>,
>std::list<T>, std::stack<T>, ... for my type T.
>My proposal that I have
>mentioned before is, to just add another traits type, which specifies
>whether for a type T the pointer serialization scheme can be bypassed
>(like versioning above) and a faster, optimized serialization used.

This analysis is in general correct. Bookkeeping for objects that may
be serialized as pointers is inherently expensive, and the current
system doesn't provide a clean way to skip this bookkeeping for
objects that are known never to be serialized as pointers.

Lately, I have been cleaning up the implementation along the lines
suggested by G. Rozenthal. My intention was to make the library more
"provably correct" and "logically transparent". I didn't foresee any
change of functionality. However, as things get moved around into a
more logical organization, certain things sort of mysteriously appear.
In particular, the current library skips pointer bookkeeping for
fundamental types. In the future, the set of types for which the
bookkeeping is skipped will be alterable by the user, in a manner
similar to the one you suggest. I believe that you will find that this
addresses your concern in a natural and complete way.

A really, really fundamental issue in the submitted library is the
usage of "Archive" as a virtual base class. This is the traditional
way of separating interface from implementation.

Advantages
========
a) we're used to it
b) it permits total separation of UDT serialization specification from
archive implementation. UDT serialization specifications don't even
have to be recompiled for different archives.
c) logically decouples UDT serialization concept from archive
implementation concept.
d) permits any UDT serialization implementation to work with any
archive implementation
e) less compile time dependency - implies simpler code and faster
compilations.

Disadvantages
===========
a) Does not permit archive implementation and UDT serialization to be
coupled. This is the fundamental obstacle to serialization in XML
format.
b) virtual functions incur some extra overhead in calling

A newer way would be to use template specialization rather than a
virtual base class to implement the interface / implementation
paradigm.

Advantages
========
a) Permits archive implementation and UDT serialization to be coupled,
thereby permitting archives to be "smarter" and facilitating
implementation of something like XML.
b) no virtual function call overhead

Disadvantages
==========
a) we're really not used to it yet
b) requires coupling of archive and UDT specification. This can make
the system harder to understand and use in simple cases. The system
requires recompilation of everything for every combination of UDT and
archive used in a program.
c) significantly larger executables
d) much longer compile/build times

In the submitted library, I chose option 1 (the virtual base class)
primarily because of a). Whether or not this is the best choice really
depends on the other factors mentioned above, so I don't see an
obvious answer here. In fact, for most situations either would work
just as well.

Robert Ramey
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
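The two ways of separating interface from implementation compared
above can be sketched side by side. This is only an illustration under
stated assumptions, not the submitted library's interface: every name
here (oarchive_base, text_oarchive, xml_oarchive, save_int,
save_named, point_v, point_t) is hypothetical, and the sketch assumes
a C++11 compiler.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// --- Option 1: archive as a virtual base class (hypothetical names) ---
class oarchive_base {
public:
    virtual ~oarchive_base() {}
    virtual void save_int(int v) = 0;
};

class text_oarchive : public oarchive_base {
public:
    void save_int(int v) override { out_ << v << ' '; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct point_v {
    int x, y;
    // Compiled once; works unchanged with any class derived from
    // oarchive_base, at the price of a virtual call per member.
    void save(oarchive_base& ar) const {
        ar.save_int(x);
        ar.save_int(y);
    }
};

// --- Option 2: archive as a template parameter (hypothetical names) ---
class xml_oarchive {
public:
    // An archive-specific operation: the UDT passes a member name along.
    void save_named(const char* name, int v) {
        out_ << '<' << name << '>' << v << "</" << name << '>';
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct point_t {
    int x, y;
    // Instantiated separately for each archive type it is used with,
    // so it is compiled against that archive's concrete interface.
    template <class Archive>
    void save(Archive& ar) const {
        ar.save_named("x", x);
        ar.save_named("y", y);
    }
};

// Usage of option 1: the call goes through the virtual interface.
inline std::string demo_virtual() {
    text_oarchive ar;
    point_v p = {1, 2};
    p.save(ar);
    return ar.str();
}

// Usage of option 2: save<xml_oarchive> is generated at compile time.
inline std::string demo_template() {
    xml_oarchive ar;
    point_t p = {1, 2};
    p.save(ar);
    return ar.str();
}
```

In this sketch point_v::save never needs recompiling when a new
archive class is added, while point_t::save is regenerated for every
archive type it meets. That is the recompilation and code-size cost
listed among the disadvantages of the template approach.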