I created a new branch called charset_massacre that contains my proposed charset/encoding merge. All the string function pointers now live in a single string vtable. The following convenience macros call the string functions:

STRING_length
STRING_byte_length
STRING_max_bytes_per_codepoint

STRING_equal
STRING_compare
STRING_index
STRING_rindex
STRING_hash
STRING_validate

STRING_scan
STRING_ord
STRING_substr

STRING_is_cclass
STRING_find_cclass
STRING_find_not_cclass

STRING_get_grapemes // typo, will be fixed
STRING_compose
STRING_decompose

STRING_upcase
STRING_downcase
STRING_titlecase
STRING_upcase_first
STRING_downcase_first
STRING_titlecase_first

STRING_ITER_INIT
STRING_iter_get
STRING_iter_skip
STRING_iter_get_and_advance
STRING_iter_set_and_advance
STRING_iter_set_position

These macros replace the old CHARSET_* and ENCODING_* macros. I also renamed some of the functions to match the corresponding Parrot opcodes. My longer-term plan is to switch a lot of Parrot_str_* calls to those macros. Another notable change to the string API is that the charset argument has been removed from Parrot_str_new_init.
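To illustrate the general idea of one vtable per string type with macros that dispatch through it, here is a minimal sketch. Everything in it is hypothetical: the demo_* types, the DEMO_STRING_* macros and the two-entry vtable are simplified stand-ins made up for this example and are not the definitions used in the branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types for illustration only.
 * These are NOT Parrot's real STRING or vtable definitions. */
struct demo_string;

typedef struct demo_str_vtable {
    const char *name;
    /* number of code points in the string */
    size_t (*length)(const struct demo_string *s);
    /* nonzero if both strings hold the same bytes in the same encoding */
    int (*equal)(const struct demo_string *a, const struct demo_string *b);
} demo_str_vtable;

typedef struct demo_string {
    const demo_str_vtable *encoding; /* the single per-string vtable */
    const char            *buf;      /* raw bytes, not owned */
    size_t                 bufused;  /* number of bytes in buf */
} demo_string;

/* Convenience macros that dispatch through the string's vtable.
 * Note that the argument is evaluated twice. */
#define DEMO_STRING_length(s)   ((s)->encoding->length(s))
#define DEMO_STRING_equal(a, b) ((a)->encoding->equal((a), (b)))

/* Fixed-width encoding: one byte per code point. */
static size_t ascii_length(const demo_string *s) {
    return s->bufused;
}

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_length(const demo_string *s) {
    size_t i, n = 0;
    for (i = 0; i < s->bufused; i++)
        if (((unsigned char)s->buf[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Byte-wise comparison; only meaningful when the encodings match. */
static int bytes_equal(const demo_string *a, const demo_string *b) {
    return a->encoding == b->encoding
        && a->bufused == b->bufused
        && memcmp(a->buf, b->buf, a->bufused) == 0;
}

static const demo_str_vtable demo_ascii = { "ascii", ascii_length, bytes_equal };
static const demo_str_vtable demo_utf8  = { "utf8",  utf8_length,  bytes_equal };

/* Wrap a NUL-terminated C string without copying it. */
static demo_string demo_new(const char *cstr, const demo_str_vtable *enc) {
    demo_string s;
    assert(cstr != NULL && enc != NULL);
    s.encoding = enc;
    s.buf      = cstr;
    s.bufused  = strlen(cstr);
    return s;
}
```

With this layout, a caller writes DEMO_STRING_length(&s) and gets the right behaviour for whichever vtable the string carries, without naming a charset or an encoding at the call site.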

The charset has also been removed from the packfile. I'm not sure what this entails.

The API of the ByteBuffer PMC has changed a little. The get_string and build_string methods no longer have a charset argument.

Another minor issue that affected some tests is that trans_charset to "unicode" still works, but the resulting strings will have a charsetname of "utf8".

I also removed the interactive charset and encoding configuration step. Parrot doesn't work with only a subset of charsets or encodings. It probably wouldn't even compile.

The following opcodes can be deprecated:

- charset
- charsetname
- find_charset
- trans_charset

Any code that uses these opcodes should replace them with the corresponding encoding opcodes. The list of supported encodings is:

- ascii
- iso-8859-1
- binary
- utf8
- utf16
- ucs2
- ucs4

If code currently uses both trans_charset and trans_encoding, only the trans_encoding call is needed.
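For anyone migrating names by hand, the mapping could be sketched roughly as follows. This is only an illustration: legacy_charset_to_encoding is a made-up helper, not part of Parrot. The only mapping taken from this post is "unicode" to "utf8"; passing through names that already appear in the encoding list above is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encoding names as listed in this post. */
static const char *const supported_encodings[] = {
    "ascii", "iso-8859-1", "binary", "utf8", "utf16", "ucs2", "ucs4"
};

/* Hypothetical helper (not a Parrot function): return the encoding
 * name to use in place of an old charset name, or NULL if the name
 * is not recognised. */
static const char *legacy_charset_to_encoding(const char *name) {
    size_t i;
    const size_t n = sizeof supported_encodings / sizeof supported_encodings[0];

    assert(name != NULL);

    /* From the post: the old "unicode" charset now reports "utf8". */
    if (strcmp(name, "unicode") == 0)
        return "utf8";

    /* Assumption: a name that is already a supported encoding
     * is kept unchanged. */
    for (i = 0; i < n; i++)
        if (strcmp(name, supported_encodings[i]) == 0)
            return supported_encodings[i];

    return NULL;
}
```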

Especially if you are a language implementer, it would be nice if you could test your implementation with the new branch.

Nick
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
