On Mon, 2005-12-26 at 00:40 +0100, Leopold Toetsch wrote:
> On Dec 25, 2005, at 23:48, Steve Gunnell wrote:
[snip]
>
> > When using the Read/Readline opcodes how do we specify what encoding is
> > to be assumed for the incoming string?
>
> There is one output encoding filter currently:
>
>    pout = getstdout
>    push pout, 'utf8'
>
> The same should work with an import filter, that is a (TODO) read
> method implemented in src/io/io_utf8.c or similar.
> Patches welcome.
[snip]

Does this look like a suitable implementation of PIO_utf8_read?

-------------------------------------------------------------------
static size_t
PIO_utf8_read(theINTERP, ParrotIOLayer *l, ParrotIO *io, STRING *s)
{
    size_t got;

    /* Read through the lower layer first, then tag the result. */
    got = PIO_read_down(interpreter, l->down, io, s);
    s->charset  = Parrot_unicode_charset_ptr;
    s->encoding = Parrot_utf8_encoding_ptr;
    return got;
}
-------------------------------------------------------------------

I'm assuming here that what I'm reading is already UTF-8, not something
that needs converting.

I assume it is safer to set the charset and encoding after the read, in
case a lower layer replaces "s" without copying its attributes?

I also had a look at PIO_utf8_peek, but the lower layers such as
PIO_buf_peek seem hard-coded to return 1 byte, and we might need up to
4 bytes to return a UTF-8 character.

Any comments?

Cheers,
Steve Gunnell
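
P.S. To make the peek problem concrete, here is a minimal sketch in plain
C. It is not Parrot code: utf8_seq_len and utf8_bytes_missing are
hypothetical helper names, and the byte ranges assume UTF-8 as defined in
RFC 3629 (at most 4 bytes per character). The lead byte says how long the
whole sequence is, so a utf8 peek would have to ask the lower layer for
the bytes that are still missing after the first one.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Parrot): length in bytes of the UTF-8
 * sequence introduced by lead byte b, or 0 if b cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)              return 1;  /* 0xxxxxxx: ASCII            */
    if (b >= 0xC2 && b <= 0xDF) return 2; /* 110xxxxx: 2-byte sequence  */
    if (b >= 0xE0 && b <= 0xEF) return 3; /* 1110xxxx: 3-byte sequence  */
    if (b >= 0xF0 && b <= 0xF4) return 4; /* 11110xxx: 4-byte sequence  */
    /* Continuation bytes (0x80..0xBF) and the invalid lead bytes
     * 0xC0, 0xC1 and 0xF5..0xFF cannot start a character. */
    return 0;
}

/* Hypothetical helper: given the `have` bytes already peeked into buf,
 * return how many more bytes are needed to hold one whole character.
 * Returns 0 when the character is complete and (size_t)-1 when the
 * first byte is not a valid lead byte. */
static size_t utf8_bytes_missing(const unsigned char *buf, size_t have)
{
    size_t need;

    if (have == 0)
        return 1;                 /* nothing peeked yet: need the lead byte */
    need = utf8_seq_len(buf[0]);
    if (need == 0)
        return (size_t)-1;        /* invalid lead byte */
    return have >= need ? 0 : need - have;
}
```

With a 1-byte peek of 0xE2 (the first byte of the euro sign, E2 82 AC),
utf8_bytes_missing reports 2, which is exactly what a 1-byte
PIO_buf_peek cannot supply.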