muse on Compact Structs, pack/unpack

John M. Dlugosz Tue, 01 Apr 2008 03:13:20 -0700

#[ are there people paying attention to these issues on other mailing lists? ]


= on Compact structs
revision 1, initial posting

What functions serialize/deserialize to the C view?

If these are to be member functions, they would be applicable only if the 
struct is compact, and erroneous to call otherwise.  It seems like a compact 
struct ought to have a role supported for that purpose, and to make a compact 
struct you declare that role.  This would then allow the compiler to check at 
compile-time that all the properties were indeed native types, things also 
supporting the compact role, or declared to be non-state data.

Any class could be forced to be compact regardless of contents if it explicitly 
supplied the serialization functions.  

There are some cases where a memory layout can be variable, such as a 
polymorphic type.  The top-level class can read just the first few bytes, 
determine the correct variant, and create an instance of that.  The class would 
hold a pointer to the variant, and not appear to be a compact struct, except 
for the custom logic implemented in the pack/unpack functions.
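To make the polymorphic case concrete, here is a rough sketch in Python rather than Perl 6. Everything in it is an assumption made for illustration: the 2-byte little-endian tag, the tag values, and the names Shape, Circle and Rect are all hypothetical. The point is only that the top-level class reads the first few bytes, picks the variant, and holds a reference to it.

```python
import struct


class Circle:
    TAG = 1            # hypothetical tag value
    FORMAT = "<f"      # one float32: radius

    def __init__(self, radius):
        self.radius = radius

    def pack_body(self):
        return struct.pack(self.FORMAT, self.radius)

    @classmethod
    def unpack_body(cls, data, offset):
        (radius,) = struct.unpack_from(cls.FORMAT, data, offset)
        return cls(radius)


class Rect:
    TAG = 2            # hypothetical tag value
    FORMAT = "<ff"     # two float32: width, height

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def pack_body(self):
        return struct.pack(self.FORMAT, self.width, self.height)

    @classmethod
    def unpack_body(cls, data, offset):
        width, height = struct.unpack_from(cls.FORMAT, data, offset)
        return cls(width, height)


class Shape:
    """Top-level class: holds a reference to the variant, not the raw bytes."""

    VARIANTS = {Circle.TAG: Circle, Rect.TAG: Rect}
    TAG_FORMAT = "<H"  # assumed 2-byte little-endian discriminator

    def __init__(self, variant):
        self.variant = variant

    def pack(self):
        return struct.pack(self.TAG_FORMAT, self.variant.TAG) + self.variant.pack_body()

    @classmethod
    def unpack(cls, data):
        # Read just the first few bytes to determine the correct variant.
        (tag,) = struct.unpack_from(cls.TAG_FORMAT, data, 0)
        variant_cls = cls.VARIANTS[tag]
        body_offset = struct.calcsize(cls.TAG_FORMAT)
        return cls(variant_cls.unpack_body(data, body_offset))
```

From the outside, Shape still packs and unpacks like any compact struct, even though its in-memory form is a pointer to something else.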

A structure may be variable length, having a length-prefixed string or other 
array, immediately followed by other fields.  Having a @list of things as an 
attribute would normally render it non-compact, but it could have other 
properties attached that tell it how to read/write that list, in the spirit of 
IDL.  If nothing available works in this case, then attaching a custom 
reader/writer property to just that attribute would do the trick, something 
sorely missing in systems such as .NET Serialization.

        class example_record 
           does Compact 
           is finalized
           {
           has Str $.name
              is rw
                  should serialize (:length_prefix(2), :encoding(UTF16), 
                                    padding => {:char("\0"), :strip} );
           has int16 $.val is rw;
           has Str $!presentation_cache;  # private, not included in compact struct I/O
           method presentation () returns Str
              {
                  unless defined $!presentation_cache {
                     ... # compute it
                         }
                  return $!presentation_cache;
                  }
           }

I'm not entirely sure from S12, so I'd like confirmation on this:  The 
trait 'serialize' is declared by a role nested inside role Compact, so it 
is available (with this meaning) in this scope.

My example uses a mundane int16, which knows to pack/unpack as 2 bytes.  But it 
also has a length-prefixed string, which is explained using the trait, so the 
class as a whole is still able to pack/unpack.  It also has private data, which 
by default is ignored by the pack/unpack logic and does not interfere with its 
ability.
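For comparison, this is roughly the byte layout I have in mind for example_record, written out by hand in Python. It is a sketch under assumptions the declaration above does not pin down: little-endian byte order, a length prefix that counts bytes rather than characters, and UTF-16 without a byte-order mark. The class name ExampleRecord and its methods are hypothetical.

```python
import struct


class ExampleRecord:
    def __init__(self, name, val):
        self.name = name
        self.val = val
        self._presentation_cache = None  # private; never part of the packed form

    def pack(self):
        encoded = self.name.encode("utf-16-le")
        # Assumed: 2-byte little-endian prefix holding the byte count.
        prefix = struct.pack("<H", len(encoded))
        return prefix + encoded + struct.pack("<h", self.val)

    @classmethod
    def unpack(cls, data):
        (length,) = struct.unpack_from("<H", data, 0)
        raw_name = data[2:2 + length]
        # Mirrors the :strip option: drop trailing NUL padding characters.
        name = raw_name.decode("utf-16-le").rstrip("\0")
        (val,) = struct.unpack_from("<h", data, 2 + length)
        return cls(name, val)
```

The int16 needs no annotation at all, the string needs the extra hints, and the cache field simply never shows up in the bytes.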

This is what I envision for a class that can take the place of pack/unpack in a 
declarative way.  I've used pack/unpack for disk-based records and wished that 
the P5 facility could be extended with my own format codes to go with 
already-defined packable structures.  I've also used .NET serialization and found its 
declarative ability to be lacking to the point where it is often easier to 
write the function from scratch.

Meanwhile, how do I use it?

        my Buf $temp = $record;
        $stream.print ($temp);
        
        $stream.print (Buf $record);

That is a bit baroque.  Two issues here: is 'print' still the only way to 
output?  I think 'print' would convert all arguments to string form, so 
printing an int16 would format the number into text.  It also deals with 
encoding issues.  Printing a Buf would seem to turn that Buf back into a Str 
using all the rules set up for that.

So how about a binary output function, 'write'?  It will imply binary output, 
so knowing this can help hide the differences between text/binary on platforms 
that have it.  It will convert its arguments to Buf by default.

        $stream.write ($record);  # just what I need
        
        my int16 $x = 42;
        $stream.write ($x);  # emits 2 bytes, exactly as stored in the primitive
        $stream.print ($x);  # emits characters "4" and "2" in the proper encoding
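The same distinction exists in other languages, which may help show what I am asking for. Here is a small Python sketch; the little-endian byte order and the UTF-16 text encoding are arbitrary choices of mine, picked only to make the difference visible.

```python
import io
import struct

x = 42

# "write": binary output, the 2 bytes of an int16 exactly as stored.
binary_stream = io.BytesIO()
binary_stream.write(struct.pack("<h", x))
binary_bytes = binary_stream.getvalue()

# "print": text output, the number formatted as characters and then encoded.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-16-le")
print(x, end="", file=text_stream)
text_stream.flush()
text_bytes = text_stream.buffer.getvalue()
```

The binary stream ends up holding 2 bytes, while the text stream holds the encoded characters "4" and "2".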

So as mentioned at the start of this muse, where do the functions live?  It is 
stated that the type name used as a listop is a conversion function.  Is that 
special syntax that knows to look for some way to accomplish that?  I'll 
proceed on this assumption, and basic C++ ideas as a strawman, that it looks 
for things in each of the two classes (coming and going) and built-in rules, 
perhaps a chain of things.

Suppose that one of the places it looks is 

        multi conversion:<Buf> () { ... }
        
that can take adverbs to control the conversion, or additional positional 
arguments.

This is supplied by the Compact role, whose implementation handles common 
adverbs and invariant logic.  But it calls another function, pack, for the 
basic packing work.  The version supplied with the Compact role would know 
about primitive types and to recurse on other Compact items, and iterate over 
the contents of the class.  To do something different, a class can write its 
own pack function with the required signature to knock out the default version 
in the role.
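The division of labor described above can be sketched in Python, with a base class standing in for the Compact role. The names here are all hypothetical: to_buf plays the part of the conversion to Buf, FIELDS is a made-up way of declaring the layout, and big_endian is an invented example of a common adverb.

```python
import struct


class Compact:
    """Stand-in for the Compact role.

    FIELDS is a hypothetical declaration: pairs of (attribute name, code),
    where code is a struct format character, or None for a nested Compact.
    """

    FIELDS = ()

    def to_buf(self, **adverbs):
        # Invariant logic: interpret the common options, then delegate to pack.
        byte_order = ">" if adverbs.get("big_endian") else "<"
        return self.pack(byte_order)

    def pack(self, byte_order):
        # Default version: iterate over the declared contents of the class.
        out = b""
        for name, code in self.FIELDS:
            value = getattr(self, name)
            if code is None:
                out += value.pack(byte_order)  # recurse on other Compact items
            else:
                out += struct.pack(byte_order + code, value)  # primitive type
        return out


class Point(Compact):
    FIELDS = (("x", "h"), ("y", "h"))

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Segment(Compact):
    FIELDS = (("start", None), ("end", None), ("color", "B"))

    def __init__(self, start, end, color):
        self.start = start
        self.end = end
        self.color = color


class Text(Compact):
    """Supplies its own pack, which replaces the default field walk."""

    def __init__(self, s):
        self.s = s

    def pack(self, byte_order):
        raw = self.s.encode("utf-8")
        return struct.pack(byte_order + "H", len(raw)) + raw
```

Segment gets everything for free, while Text overrides only pack and still benefits from the option handling in to_buf.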

        class VLI<2.0.1 cpan:DLUGOSZ> 
           does Compact
           {
           # implements packing/unpacking as described in
           # <http://www.dlugosz.com/ZIP2/VLI.html>
           has Int $.value is rw;
           multi pack (*%adverbs) returns Buf
                {
                ...  # respects standard signed/unsigned option,
                     # supports unique options for encoding variations
                }
           } # end class VLI

That is an example where my "Compact"ness can clearly be seen as a pack/unpack 
format.  Other examples would be structures that appear in the Win32 API, that 
I can discuss if necessary.

Now something I've learned from doing .NET serialization (XML as it happens, 
but that does not matter to the issue).  Here I have an Int, that is a 
perfectly ordinary Int like any other except for its serialization logic.  I 
can use it in a larger class to make it serialize how I want.  But to access 
the value in that class, I have an extra layer of indirection.

        class C does Compact is rw
           {
           has int32 $.x;
           has VLI $.y;
           }
           
        my C $c;  # for a Compact class, empty prototype is not undef but default values
        $c.x = 5;  # OK
        $c.y.value = 1234567812345678;  # yuck

I'd really like to attach the serialization semantics class to the Int declared 
in the outer class, where that semantics class does not contain the data itself.

        class VLI<2.0.1 cpan:DLUGOSZ> 
           does Compact::helper
           {
           # implements packing/unpacking as described in
           # <http://www.dlugosz.com/ZIP2/VLI.html>
           multi pack (Int $value, *%adverbs) returns Buf
                {
                ...  # respects standard signed/unsigned option,
                     # supports unique options for encoding variations
                }
           } # end class VLI

        class C does Compact is rw
                {
                has int32 $.x;
                has Int $.y is VLI;
                }
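A rough Python sketch of the helper idea follows. The codec object carries only the serialization logic, and the outer class keeps a plain integer. Note that the codec is unsigned LEB128, used purely as a stand-in variable-length encoding; it is NOT the VLI format from the page linked above. The names Leb128Helper, Int32Helper and HELPERS are hypothetical.

```python
import struct


class Leb128Helper:
    """Stand-in codec: unsigned LEB128, not the VLI format from the post."""

    @staticmethod
    def pack(value):
        if value < 0:
            raise ValueError("this sketch handles unsigned values only")
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)  # more bytes follow
            else:
                out.append(byte)         # final byte
                return bytes(out)


class Int32Helper:
    @staticmethod
    def pack(value):
        return struct.pack("<i", value)  # assumed little-endian


class C:
    # Hypothetical declaration: each attribute names the helper that packs it.
    # The helpers hold no data; the attributes stay ordinary Python ints.
    HELPERS = (("x", Int32Helper), ("y", Leb128Helper))

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def pack(self):
        return b"".join(
            helper.pack(getattr(self, name)) for name, helper in self.HELPERS
        )
```

With this arrangement the access is just c.y = 300, with no extra layer of indirection, and the variable-length encoding only comes into play when the object is packed.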

Finally, different kinds of serialization can use similar mechanisms, and more 
than one can be applied to the same class and all the traits/properties/roles 
should play nice with each other.  Such combined classes should also cascade.
