In message <[EMAIL PROTECTED]>
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> utf8 and utf16 are both variable length encodings for space reasons.
> There's not much reason to space-compact something then expand the heck out
> of it. On the other hand, I'd really, *really* rather not have Unicode
> constants in anything other than UTF-32, so I'd as soon we chopped out the
> utf-8 and utf-16 constant support from this.
>
> A should be the prefix for US-ASCII characters.
> U should be the prefix for Unicode characters
> N should be the prefix for the native character set (and the default)
>
> Beyond that I'm not sure what, if anything, we should accommodate in the
> assembler.
Attached is a patch to drop the U8, U16 and U32 prefixes and
add U and N prefixes.
I haven't added the A prefix because I'm still not clear what
encoding those are supposed to map to. I can understand the
following mappings:
N => enc_native
U => enc_utf32
But what exactly is A supposed to map to? Or is the assembler
supposed to convert an A string into an N or U string and then
put it in the bytecode in one of those formats?
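
To make the N and U mappings concrete, here is a minimal standalone
sketch of what the patched assembler does with a prefixed string. It is
not part of the patch: describe_constant is a hypothetical stand-in for
constantize_string that only reports the string and the encoding number
it was given, and the input lines are made-up examples.

```perl
use strict;
use warnings;

# Prefix-to-encoding table as in the patch: no prefix and N both select
# the native encoding (0); U selects UTF-32 (3).
my %encodings = ('' => 0, 'N' => 0, 'U' => 3);

# Hypothetical stand-in for constantize_string(): it only reports what it
# was given instead of adding the string to a constant table.
sub describe_constant {
    my ($string, $prefix) = @_;
    $prefix = '' unless defined $prefix;   # no prefix means native
    return "[string=$string encoding=$encodings{$prefix}]";
}

# Same pattern as the patched replace_string_constants().
sub replace_string_constants {
    my $code = shift;
    $code =~ s/([NU])?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/describe_constant($2,$1)/eg;
    return $code;
}

print replace_string_constants('print U"hello"'), "\n";
print replace_string_constants('print N"hello"'), "\n";
print replace_string_constants('print "hello"'), "\n";
```

The unprefixed and N forms end up with the same encoding number, which
is why an A prefix needs a decision: it either gets its own entry in
%encodings or is rewritten into one of the existing two.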
Tom
--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Index: Assembler.pm
===================================================================
RCS file: /home/perlcvs/parrot/Parrot/Assembler.pm,v
retrieving revision 1.8
diff -u -w -r1.8 Assembler.pm
--- Assembler.pm 2001/10/09 02:45:36 1.8
+++ Assembler.pm 2001/10/09 21:25:28
@@ -279,7 +279,7 @@
=cut
-my %encodings=('' => 0, 'U8' => 1, 'U16' => 2, 'U32' => 3);
+my %encodings=('' => 0, 'N' => 0, 'U' => 3);
my %opcodes = Parrot::Opcode::read_ops( -f "../opcode_table" ? "../opcode_table" :
"opcode_table" );
@@ -662,7 +662,7 @@
sub replace_string_constants {
my $code = shift;
- $code =~ s/(U(?:8|16|32))?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
+ $code =~ s/([NU])?\"([^\\\"]*(?:\\.[^\\\"]*)*)\"/constantize_string($2,$1)/eg;
return $code;
}