If you use U0 or C0 mode switches in an unpack format like s(nU0v)2S,
the n is in a rather weird state: it is in C0 mode on the first round
and in U0 mode on later ones.
And in a format like C/(U0)stuff, whether U0 mode applies to stuff
depends on the value of the count: with a count of 0 the group is never
executed, so the U0 never takes effect.
Both of these convinced me that U0/C0 mode should really be scoped
to the enclosing () (making it feel more like a compile-time pragma than
a run-time one). The following patch makes it so.
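Before (the U0 inside the group leaks out, so the trailing U reads all
three bytes as one UTF-8 character):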
perl -wle 'print for unpack("a(U0)U", "b\341\277\274")'
b
8188
After (the U0 ends with the group, so the trailing U reads a single byte):
./perl -Ilib -wle 'print for unpack("a(U0)U", "b\341\277\274")'
b
225
Patch is relative to 5.8.6:
--- pp_pack.c.old Sat Jan 29 13:26:27 2005
+++ pp_pack.c Sat Jan 29 14:05:27 2005
@@ -618,6 +618,10 @@
             while (len--) {
                 symptr->patptr = savsym.grpbeg;
                 unpack_rec(symptr, ss, strbeg, strend, &ss );
+                if (savsym.flags & FLAG_UNPACK_DO_UTF8)
+                    symptr->flags |= FLAG_UNPACK_DO_UTF8;
+                else
+                    symptr->flags &= ~FLAG_UNPACK_DO_UTF8;
                 if (ss == strend && savsym.howlen == e_star)
                     break; /* No way to continue */
             }