Re: [rust-dev] Proposed API for character encodings

Simon Sapin Sun, 22 Sep 2013 01:38:24 -0700

On 21/09/2013 16:38, Olivier Renaud wrote:

I'd expect this offset to be absolute. After all, the only thing that the
programmer can do with this information at this point is to report it to the
user; if the programmer wanted to handle the error, he could have done it by
using a trap. A relative offset has no meaning outside of the processing loop,
whereas an absolute offset can still be useful even outside of the program (if
the source of the stream is a file, then an absolute offset will give the exact
location of the error in the file).


A counter is super cheap, I wouldn't worry about its cost. Actually, it just
has to be incremented once for each call to 'feed'.
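To make the absolute-offset idea concrete, here is a minimal sketch in present-day Rust. The `AsciiDecoder` type, its `feed` signature and the shape of `DecodeError` are assumptions made up for illustration, not the proposed API. The decoder keeps one counter that is bumped once per call to `feed`, and adds the position within the current chunk only when an error is reported.

```rust
/// Hypothetical error type: `offset` is absolute, counted in bytes
/// from the start of the whole stream.
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    pub offset: usize,
}

/// Hypothetical incremental ASCII decoder, used only to show offset tracking.
pub struct AsciiDecoder {
    /// Number of bytes consumed by all previous successful calls to `feed`.
    consumed: usize,
}

impl AsciiDecoder {
    pub fn new() -> Self {
        AsciiDecoder { consumed: 0 }
    }

    /// Decodes one chunk of input, appending the result to `out`.
    pub fn feed(&mut self, chunk: &[u8], out: &mut String) -> Result<(), DecodeError> {
        // `relative` is the position of the first bad byte inside this chunk.
        if let Some(relative) = chunk.iter().position(|b| !b.is_ascii()) {
            // Keep the valid prefix, then report the absolute position.
            out.extend(chunk[..relative].iter().map(|&b| b as char));
            return Err(DecodeError { offset: self.consumed + relative });
        }
        out.extend(chunk.iter().map(|&b| b as char));
        // The stream-wide counter is updated once per call, not once per byte.
        self.consumed += chunk.len();
        Ok(())
    }
}
```

Feeding `b"hello "` and then `b"wor\xFFld"` reports offset 9, which is where the bad byte sits in the stream as a whole rather than in the second chunk.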

Well, to get the position inside a given chunk of input you still have to count individual bytes. (Maybe with Iterator::enumerate?) Unless maybe we do dirty pointer arithmetic…

If possible, I’d rather find a way to not have to pay that cost in the common case where the error handling is *not* abort and DecodeError is never used.

This is also a bit annoying as each implementation will have to repeat the counting logic, but maybe it’s still worth it.
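As a rough sketch of the two options in present-day Rust (the function names are hypothetical): the first counts with `Iterator::enumerate`, the second keeps no counter and recovers the position only on the error path, from the length of the slice iterator's remaining input. That second form is a safe stand-in for the pointer arithmetic. Whether the two differ in cost after optimization is not something this sketch measures.

```rust
/// Position of the first non-ASCII byte, counting as we go.
pub fn first_invalid_enumerate(chunk: &[u8]) -> Option<usize> {
    for (i, &b) in chunk.iter().enumerate() {
        if !b.is_ascii() {
            return Some(i);
        }
    }
    None
}

/// Same result, but the position is computed only when an error is found.
pub fn first_invalid_lazy(chunk: &[u8]) -> Option<usize> {
    let mut bytes = chunk.iter();
    while let Some(&b) = bytes.next() {
        if !b.is_ascii() {
            // The iterator has already advanced past the bad byte, so the
            // bad byte sits one position before the remaining input.
            return Some(chunk.len() - bytes.as_slice().len() - 1);
        }
    }
    None
}
```

Both functions return the same relative position. The counting logic could live in one shared helper like this so that each encoding implementation does not have to repeat it.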

Note: for the encoder, you will have to specify whether the offset is a 'code
point' count or a 'code unit' count.

Yes. I don’t know yet. If we do [1] and make the input generic it will probably have to be code points.


[1] https://mail.mozilla.org/pipermail/rust-dev/2013-September/005662.html

Otherwise, it may be preferable to match Str::slice and count UTF-8 bytes. (Which I suppose is what you call code units?)
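To illustrate the difference between the two counts, here is a small sketch in present-day Rust, where string slicing (`&s[a..b]`) takes UTF-8 byte indices. The helper `first_unencodable` is hypothetical and uses ISO-8859-1 as a stand-in target encoding. It reports both the code point index and the UTF-8 byte index of the first character that cannot be encoded.

```rust
/// Hypothetical helper: finds the first char that does not fit in ISO-8859-1
/// and returns (code point index, UTF-8 byte index) for it.
pub fn first_unencodable(input: &str) -> Option<(usize, usize)> {
    input
        .char_indices() // yields (byte index, char)
        .enumerate()    // adds the code point index in front
        .find(|&(_, (_, c))| (c as u32) > 0xFF)
        .map(|(code_point_index, (byte_index, _))| (code_point_index, byte_index))
}
```

For `"héllo€x"` the two counts disagree: the `€` is the sixth code point (index 5) but starts at byte 6, because `é` takes two bytes in UTF-8. Only the byte index can be used directly for slicing the string.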


--
Simon Sapin
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
