On Mon, 2005-12-26 at 00:40 +0100, Leopold Toetsch wrote:
> On Dec 25, 2005, at 23:48, Steve Gunnell wrote:
[snip]
> 
> > When using the Read/Readline opcodes how do we specify what encoding is
> > to be assumed for the incoming string?
> 
> There is one output encoding filter currently:
> 
>    pout = getstdout
>    push pout, 'utf8'
> 
> The same should work with an import filter, that is a (TODO) read 
> method implemented in src/io/io_utf8.c or similar.
> Patches welcome.
[snip]

Does this look like a suitable implementation of PIO_utf8_read?

-------------------------------------------------------------------
static size_t
PIO_utf8_read(theINTERP, ParrotIOLayer *l, ParrotIO *io, STRING *s)
{
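    size_t got;

    /* read raw bytes through the layer below this one */
    got = PIO_read_down(interpreter, l->down, io, s);

    /* no conversion: the bytes are assumed to already be UTF-8,
     * so just retag the string once the read has finished */
    s->charset = Parrot_unicode_charset_ptr;
    s->encoding = Parrot_utf8_encoding_ptr;

    return got;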
}
-------------------------------------------------------------------

I'm assuming here that the input is already UTF-8, not something that
needs converting.

I assume it is safer to perform the modifications in this order (after
the read) in case a lower layer replaces "s" without copying its
attributes?

I also had a look at PIO_utf8_peek, but the lower layers such as
PIO_buf_peek seem hard-coded to return 1 byte, and we might need up to
4 bytes (3 for characters in the Basic Multilingual Plane) to return a
UTF-8 character.

Any comments?

Cheers,

Steve Gunnell
