This and other RFCs are available on the web at http://dev.perl.org/rfc/ =head1 TITLE hooks in pack/unpack =head1 VERSION Maintainer: Ilya Zakharevich <[EMAIL PROTECTED]> Date: 16 September 2000 Mailing List: [EMAIL PROTECTED] Number: 250 Version: 1 Status: Developing =head1 ABSTRACT How to specify pack()/unpack() user-recipes. =head1 DESCRIPTION The following enhancement covers almost all the of the remaining ways to store binary data, but it is substantially higher on the "bizzareness" scale: C<'R[ID,TYPE]'> in a TEMPLATE: during unpack() extracts a value as TYPE and uses it to choose between several choices of templates. Behaves as if C<R[ID,TYPE]> is replaced by the chosen template. The templates to chose from are looked in/via the array/hash/subroutine referenced by C<$unpack::recipes[ID]>. Similarly, C<'R{ID,TYPE}'> uses C<$unpack::recipes{ID}> instead. The returned template may be replaced by a reference to array of the form [TEMPLATE, \&POSTPROCESS] In such a case a value is extracted with TEMPLATE, then is postprocessed by calling C<POSTPROCESS($extracted)>, the return value replaces the extracted value. Optional: one should be able to specify that some bits of the last extracted value which are ignored: C<'R[ID,FROM..TO,TYPE]'> uses bits from FROM to TO (shifted right by FROM) as the index. C<'R[ID,mod,TYPE]'> uses the last extracted value modulo the length of the array referenced by $unpack::recipes[ID]. This extracts UTF8 chars of up to 2-byte encoded length: sub utf8_2byte_postprocess { (($_[0] & 0x1F00) >> 2) | ($_[0] & 0x3F) } local $unpack::recipes{UTF8} = [ 'C', [ 'n', \&utf8_2byte_postprocess ] ]; $n = unpack 'R{UTF8,7..7,C}', $str; Symmetrically, during pack() C<'R[ID]'> etc. make ID lookup in %pack::recipes or @pack::recipes. The resulting array/hash/subroutine reference is indexed-by/called-with the next argument to pack. The result is appended to the target string. The symmetric example to the UTF8 example above: sub utf8_2byte_save { return pack "C", $_[0] if $_[0] <= 127; pack 'n', 0x80C0 | (($_[0] & 0x7C0)<<2) | ($_[0] & 0x3F); }; local $pack::recipes{UTF8} = \&utf8_2byte_save; $str = pack 'R{UTF8}', $n; Optionally, to allow a usage of the same TEMPLATE during pack() and during unpack(), anything after the first comma in the argument to C<'R'> is ignored: $str = pack 'R{UTF8}', $n; $str = pack 'R{UTF8,7:7,C}', $n; are equivalent. The usage of pack() with I<only one> C<'R'> type inside is obviously an overkill, but it comes very handy if C<'R'> is a part of a more complicated construct, as in $str = pack 'N/R{UTF8,7:7,C}', @array; or $str = pack 'N/( \g[ N/R{UTF8,7:7,C} ] )', @array_of_arrays; In addition to "funny ways to encode simple data", this same proposal allows handling of streams which consist of repeated blocks of the form int type; union { struct type_0 t0; struct type_1 t1; ... } as well as many other similar problems. =head1 MIGRATION ISSUES None. =head1 IMPLEMENTATION Straightforward. =head1 REFERENCES RFC 142: Enhanced Pack/Unpack RFC 246: pack/unpack uncontrovercial enhancements RFC 248: enhanced groups in pack/unpack