[boost] Serialization Review Results

David Abrahams Mon, 09 Dec 2002 13:15:30 -0800

   The Serialization library submission by Robert Ramey is not
   accepted into Boost at this time.


First of all, I'd like to acknowledge that this was a *very* difficult
review for all concerned. It was tough for the reviewers, for me as
review manager, and especially for Robert Ramey, the library author.
Rendering a decision on the library was correspondingly difficult.
I thank Robert for his work and his patience with the review process,
and I hope that he finds the energy to follow through until we have a
Boost library.

At one point during the review process, Robert wrote to me privately,
expressing the opinion that 

After spending the better part of a weekend looking over the library
documentation and re-reading all of the review commentary, I can
understand why Robert might be tempted to conclude that no single
serialization library design would satisfy Boost because there were
just too many conflicting desires on the part of reviewers.  However,
I hope he doesn't.  I believe that "no serialization library designed
by just one person is likely to satisfy Boost" is much closer to the
truth.

Fortunately, there was great interest in this library (which is why
the scrutiny was so intense) and Robert received many enthusiastic
offers of collaboration from reviewers.  I believe the best path
for Boost and for this library is as follows:

0. Reconsider the problem domain in a collaborative environment. If
   there are enough participants, a mailing list would be a good start
   (I can set up a SourceForge mailing list upon request), and adding
   a Wiki page is easy enough.  This process should give strong
   consideration to problem domains other than the ones originally
   envisioned for the library.  It should also reflect a reluctance to
   begin writing code too early.  

1. Agreement on terms. In particular, I strongly suggest beginning
   with the definitions of serialization and persistence outlined by
   Augustus Saunders in
   http://lists.boost.org/MailArchives/boost/msg39598.php.  I realize
   that Robert didn't like those definitions, but they resonated for
   most people (including me), and seem to provide an excellent
   starting point.

   Robert said "I didn't try to define Persistence as I see it as a
   more general notion". Distinctions are useful to the extent that
   they partition the space of things actually being considered.  If
   persistence is defined to be even more general than everything
   we're talking about, it's not useful to us. Since we get to choose
   the definition, let's choose one we can apply ;-)

2. Careful description of scope. Answer questions like: 
     * Is this a persistence or serialization library?
     * Is it important to be able to plug in arbitrary archive
       formats?
     * Is it important to be able to use the same UDT serialization
       code to write several different archive formats?
     * What kinds of applications are we intending to serve?
     * What kinds of applications are we explicitly NOT intending to
       serve? 

3. Careful consideration of the appropriate interface for describing
   the serialization of UDTs on a conforming compiler. In particular,
   consider the lexical cost of requiring users to specialize library
   templates. Also consider that the use of operator<< is going to
   invoke ADL anyway, so maybe the interface should just use
   that. Serialization of class template specializations and other
   classes should use the same mechanisms.

   Subsequent consideration of how close the interface can come on
   broken compilers, should the participants decide they wish to serve
   that user base.

4. Once coding begins, it should go quickly, and proceed in the Boost
   sandbox.

5. Well, Item 3 drifted a bit into technical issues, so here's a
   more comprehensive list of technical issues I'd like to see
   considered carefully and collaboratively.  I'm sorry that I didn't
   take the time to bring some of these up during the review period,
   which was a bit overwhelming just to watch ;-). 

   * Dave Harris suggested several times that integers should be
     written in the binary archive in a variable-length format.  This
     echoes a philosophy on serialization which I've had for years,
     provides many benefits and would seem to allow drastic
     simplification of the library if it is decided that the current
     scope will be retained, since it entirely obviates the need for a
     text archive format (the same could be done for floating point
     numbers). The only application I can imagine this approach being
     unsuitable for would be extremely fast, relatively small
     in-memory archives... and I'd have to see benchmarks and a real
     use-case to be convinced of that.

   * Boost already has a mechanism for exploring the internal
     structure of UDTs.  It's called visit_each, and it's used by the
     signals library to discover bound signal collaborators within
     function objects. Could this be exploited for serialization of
     composite types?

   * Boost already has a mechanism for registering inheritance
     relationships and convertibility among classes. It's not part of
     the public interface, but is an implementation detail of
     Boost.Python. Should this be exploited for serialization?

   * Objects without default constructors really should be
     deserializable.  One possible approach is offered by Python's
     serialization mechanism ("pickler").  A class' __getinitargs__
     function (if defined) will be called to get the arguments that
     should be passed to the class' constructor to reconstitute an
     instance of that class.  It should be possible to build a similar
     mechanism around boost::tuple.

   * Is it important to allow all UDTs to be separately versioned?
     Every time I have implemented serialization and started with such
     a system, I eventually dropped it in favor of a whole-archive
     version number.  Changing the format of a single class always
     creates a backward compatibility problem for new archives anyway.
     Allowing the archive to carry the version number also simplifies
     the [de]serialization interface.  If separate versioning is in
     fact important and useful, a rationale should be provided.

   * Registration of participating classes must not be required to be
     monolithic.  More generally, the library must support users who
     use polymorphism to insulate themselves from compilation
     dependencies.

   * Strong consideration should be given to a "you don't pay for what
     you don't use" approach.  As Ralf Grosse-Kunstleve pointed out to
     me, C++ is not really good at serialization, natively.  One of
     the only reasons to use it instead of a language with stronger
     reflection capabilities has got to be that it is fast.  Avoiding
     virtual function calls for serializing large arrays of small
     objects (e.g. complex or rational numbers) must be possible.

   * I would like to see the requirement to use *only* ANSI/ISO C++
     loosened.  Serialization is one of those areas which is simply
     not well-supported by standard C++, IMO.  Part of what we're
     doing here at Boost is expanding the scope of C++ by providing
     support for things like threading and the filesystem.  Much may
     be gained by allowing some components to use extra-legal
     constructs that can be easily ported to a majority of platforms.
     Two areas that spring to mind are pointer comparisons outside a
     single array for unserializing internal object pointers, and the
     use of type_info::name() for type identification.  Even if these
     were optional components to the library, they could provide
     enormous benefit for some applications.

     [BTW, since the review I have discovered some issues with
     type_info::name() and EDG compilers which may make it unsuitable
     for type identification in that context, depending on the
     application].
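
To make the ADL point in item 3 concrete, here is a minimal C++17 sketch. The archive type and the namespaces archive_sketch and user are hypothetical, not any existing Boost API. The idea shown is that a user-supplied operator<< placed next to the UDT is found by argument-dependent lookup, for plain classes and class template specializations alike, without specializing any library template.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical archive type (not a real Boost API): just enough to show
// operator<< dispatch through argument-dependent lookup (ADL).
namespace archive_sketch {
struct oarchive {
    std::ostringstream out;
};

// The library supplies overloads for built-in and standard types.
inline oarchive& operator<<(oarchive& ar, int v) {
    ar.out << v << ' ';
    return ar;
}

inline oarchive& operator<<(oarchive& ar, const std::string& s) {
    ar.out << s.size() << ':' << s << ' ';  // length-prefixed string
    return ar;
}
}  // namespace archive_sketch

namespace user {
struct point { int x, y; };

// An ordinary overload next to the UDT; ADL finds it because `point`
// lives in namespace user. No library template is specialized.
inline archive_sketch::oarchive& operator<<(archive_sketch::oarchive& ar,
                                            const point& p) {
    return ar << p.x << p.y;
}

// Class template specializations use exactly the same mechanism.
template <class T>
struct pair_of { T first, second; };

template <class T>
archive_sketch::oarchive& operator<<(archive_sketch::oarchive& ar,
                                     const pair_of<T>& p) {
    return ar << p.first << p.second;
}
}  // namespace user
```

Writing `ar << user::point{1, 2}` from any namespace then appends "1 2 " without the caller naming either namespace's operator.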
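
Dave Harris's variable-length integer suggestion could look something like the sketch below. The specific encoding (LEB128-style, 7 payload bits per byte with the high bit marking continuation, plus a zigzag mapping so small negative numbers stay short) is an assumption chosen for illustration; the review did not settle on a format, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends `v` as 7-bit groups, least significant first; every byte
// except the last has its high bit set.
inline void write_varint(std::vector<std::uint8_t>& buf, std::uint64_t v) {
    while (v >= 0x80) {
        buf.push_back(static_cast<std::uint8_t>(v | 0x80));  // low 7 bits + continuation
        v >>= 7;
    }
    buf.push_back(static_cast<std::uint8_t>(v));  // final byte, high bit clear
}

// Reads one varint starting at `pos` and advances `pos` past it.
// A truncated buffer makes at() throw std::out_of_range.
inline std::uint64_t read_varint(const std::vector<std::uint8_t>& buf,
                                 std::size_t& pos) {
    std::uint64_t v = 0;
    for (int shift = 0;; shift += 7) {
        if (shift >= 64) throw std::runtime_error("varint too long");
        const std::uint8_t byte = buf.at(pos++);
        v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return v;
    }
}

// Zigzag mapping for signed values: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
inline std::uint64_t zigzag(std::int64_t n) {
    return (static_cast<std::uint64_t>(n) << 1) ^ static_cast<std::uint64_t>(n >> 63);
}

inline std::int64_t unzigzag(std::uint64_t u) {
    return static_cast<std::int64_t>(u >> 1) ^ -static_cast<std::int64_t>(u & 1);
}
```

Because the format is defined byte by byte, it does not depend on the writer's word size or byte order, which is one reason such an encoding could stand in for a separate text format when portability is the goal.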
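
The visit_each question might be explored along the lines below. This sketch only borrows the general idea of a per-type member walk, using a common overload-ranking trick (user overloads take `int`, the fallback takes `long`) so that user overloads win. The names visit_members, sketch, user and text_writer are hypothetical, and this is not Boost's actual visit_each interface.

```cpp
#include <cassert>
#include <sstream>

namespace sketch {
// Fallback: a leaf value is handed straight to the visitor. The `long`
// parameter makes this a worse match than user overloads taking `int`.
template <class Visitor, class T>
void visit_members(Visitor& v, const T& t, long) { v(t); }

// Entry point; the unqualified call lets ADL find user overloads.
template <class Visitor, class T>
void visit_members(Visitor& v, const T& t) { visit_members(v, t, 0); }
}  // namespace sketch

namespace user {
struct point { int x, y; };
struct segment { point from, to; };

// Each composite type lists its members once...
template <class Visitor>
void visit_members(Visitor& v, const point& p, int) {
    sketch::visit_members(v, p.x);
    sketch::visit_members(v, p.y);
}

template <class Visitor>
void visit_members(Visitor& v, const segment& s, int) {
    sketch::visit_members(v, s.from);  // recurses into point's members
    sketch::visit_members(v, s.to);
}
}  // namespace user

// ...and any visitor can walk them; this one writes the leaves as text.
struct text_writer {
    std::ostringstream out;
    template <class T>
    void operator()(const T& leaf) { out << leaf << ' '; }
};
```

A serializer, a deserializer or a printer would each be just another visitor over the same member description.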
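
A rough C++ analog of Python's __getinitargs__ is sketched below, with std::tuple standing in for boost::tuple and C++17's std::make_from_tuple doing the unpacking. The class account, its init_args() member and construct_from are hypothetical names, and reading the tuple back out of an archive is elided.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// A class with no default constructor. Instead of raw state, it exposes
// the constructor arguments needed to rebuild it (hypothetical convention).
class account {
public:
    account(std::string owner, long balance)
        : owner_(std::move(owner)), balance_(balance) {}

    const std::string& owner() const { return owner_; }
    long balance() const { return balance_; }

    // What a serializer would save for this object.
    std::tuple<std::string, long> init_args() const { return {owner_, balance_}; }

private:
    std::string owner_;
    long balance_;
};

// Deserialization side: once the tuple has been read from the archive,
// construct the object directly; no default constructor is required.
template <class T, class Tuple>
T construct_from(Tuple&& args) {
    return std::make_from_tuple<T>(std::forward<Tuple>(args));
}
```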
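
A whole-archive version number, as described in the versioning bullet, might be consulted as in this sketch. The iarchive and widget types and the load function are hypothetical, and a whitespace-separated text stream stands in for a real archive format.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical input archive: the format version is read once from the
// header and is then visible to every class's load code.
struct iarchive {
    std::istringstream in;
    std::uint32_t version = 0;

    explicit iarchive(const std::string& data) : in(data) {
        if (!(in >> version)) throw std::runtime_error("missing archive version");
    }

    template <class T>
    iarchive& operator>>(T& value) {
        in >> value;
        return *this;
    }
};

struct widget {
    int width = 0;
    int height = 0;  // assumed to have been added in archive format version 2
};

// One load function per class, branching on the whole-archive version
// rather than on a per-class version number.
inline void load(iarchive& ar, widget& w) {
    ar >> w.width;
    if (ar.version >= 2) ar >> w.height;
}
```

The [de]serialization signature stays the same for every class; only the archive carries the version.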
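
One common way to keep registration from being monolithic is a static registrar object in each translation unit, sketched below with hypothetical names (registry, registrar, create). Loading code sees only the base class and a key read from the archive. Note that registrars placed in static libraries can be discarded by the linker if nothing else references their object file, so the pattern needs some care in practice.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct base {
    virtual ~base() = default;
    virtual std::string name() const = 0;
};

using factory = std::function<std::unique_ptr<base>()>;

// A function-local static avoids static-initialization-order problems
// between translation units.
inline std::map<std::string, factory>& registry() {
    static std::map<std::string, factory> r;
    return r;
}

// Constructing a registrar adds one class to the registry.
template <class T>
struct registrar {
    explicit registrar(const std::string& key) {
        registry()[key] = []() -> std::unique_ptr<base> { return std::make_unique<T>(); };
    }
};

// --- this part would live in derived.cpp, unseen by other files ---
struct derived : base {
    std::string name() const override { return "derived"; }
};
static registrar<derived> register_derived("derived");

// Used by the loader after reading a class key from the archive.
inline std::unique_ptr<base> create(const std::string& key) {
    auto it = registry().find(key);
    if (it == registry().end()) return nullptr;
    return it->second();
}
```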
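
To avoid virtual calls when serializing large arrays of small objects, the archive can be a template parameter rather than a virtual base class, with a bulk path for trivially copyable element types. This is a sketch with hypothetical names (binary_oarchive, rational, save, save_array). The bulk path writes the native in-memory representation, so it only suits archives read back on the same platform, such as the fast in-memory case mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Concrete archive: no virtual functions, so calls can be inlined.
struct binary_oarchive {
    std::vector<unsigned char> bytes;

    void write_bytes(const void* p, std::size_t n) {
        const auto* c = static_cast<const unsigned char*>(p);
        bytes.insert(bytes.end(), c, c + n);
    }
};

struct rational { int num, den; };

// Per-element path, resolved at compile time for each Archive type.
template <class Archive>
void save(Archive& ar, const rational& r) {
    ar.write_bytes(&r.num, sizeof r.num);
    ar.write_bytes(&r.den, sizeof r.den);
}

template <class Archive, class T>
void save_array(Archive& ar, const T* data, std::size_t n) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        ar.write_bytes(data, n * sizeof(T));  // one bulk copy for the whole array
    } else {
        for (std::size_t i = 0; i < n; ++i) save(ar, data[i]);  // still no virtual dispatch
    }
}
```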
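
The two platform-dependent techniques in the last bullet can be sketched roughly as follows (pointer_tracker and type_key are hypothetical names). The standard leaves the built-in < unspecified for pointers into unrelated objects, but std::less<const void*> is guaranteed to yield a total order, which is enough for an object-tracking map; testing whether a pointer falls inside another object's address range still goes beyond what the standard promises. type_info::name() returns an implementation-defined string, which is exactly the portability concern raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <typeinfo>
#include <utility>

// Object tracking: each distinct address gets an id the first time it is
// saved; later pointers to the same object can be written as back-references.
class pointer_tracker {
public:
    // Returns {id, true} the first time `p` is seen, {id, false} afterwards.
    std::pair<std::size_t, bool> track(const void* p) {
        auto result = ids_.emplace(p, ids_.size());
        return {result.first->second, result.second};
    }

private:
    std::map<const void*, std::size_t, std::less<const void*>> ids_;
};

// Type identification via type_info::name(). For a polymorphic object this
// names the dynamic type; the string itself is implementation-defined, so
// it is only stable between programs built with the same compiler.
template <class T>
std::string type_key(const T& obj) {
    return typeid(obj).name();
}
```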

Given the enormous interest in addressing this problem domain (or
domains) shown by Boost members, and the many offers of participation,
it would be a real shame if this review didn't ultimately produce a
Boost library that we can all stand behind. Broader collaboration in
the Boost tradition seems like the best way to get there.

Thanks to everyone for their participation in this review.
Special, extra thanks to Robert Ramey for bringing forward his
submission which stirred up this discussion and, I hope, gave us a
start in the right direction.

-- 
                       David Abrahams
   [EMAIL PROTECTED] * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
