Markus Kuhn said:
>> > AFAICT iconv(3) requires that the length of the input be known in
>> > advance.
<snip>
> On the other hand, I have great difficulty to envision a real-world
> situation, where the user of iconv
>
>   - knows that the input is zero terminated

I have great difficulty in envisioning the opposite. Binary file formats
and network protocols have a lot of zero-terminated strings in all sorts
of encodings.

>   - does not know whether this is an 8-bit, 16-bit or 32-bit
>     wide and aligned zero

Again, for me it's rare that an application would not need to know what
data it's dealing with. Applications do not exist in a vacuum. You have to
do I/O, in which case the encoding of text is usually predefined or
negotiated. You do not always have the luxury of defining how text is
represented throughout the system.

> which is the situation it seems to me Michael might refer to.
>
> So the question to Michael is: What is the actual problem you want to
> solve?

Doing I/O with applications not written in a vacuum. Encoding and decoding
binary network protocols and serialized forms.

I have an alternative interface into libiconv that I use a lot and like
very much:

  size_t dec_mbsncpy(char **src,
      size_t sn,
      char *dst,
      size_t dn,
      int cn,
      const char *fromcode);

This decodes the string at src as fromcode (e.g. "EUC-JP") into dst as the
locale encoding (e.g. UTF-8). Unlike iconv this DOES stop when a decoded
UCS code of 0 is encountered. Also, if dst is NULL it goes through the
conversion but returns the precise number of bytes necessary to hold the
decoded string had dst not been NULL. There's also an enc_mbsncpy function
for encoding from the locale encoding to tocode.
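
The dst == NULL convention is the same measure-then-copy idiom that
snprintf(NULL, 0, ...) offers. Below is a minimal sketch of that idiom. It
is NOT encdec code: latin1_to_utf8 and latin1_to_utf8_dup are hypothetical
stand-ins that only handle ISO-8859-1 input and UTF-8 output, chosen so the
example needs no code tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in, not the encdec implementation. Decodes at most sn
 * bytes of ISO-8859-1 from src into UTF-8 and stops at the first zero byte.
 * If dst is NULL nothing is written, but the number of bytes the output
 * needs (terminator excluded) is still returned. Returns (size_t)-1 if dst
 * is non-NULL and dn is too small for the output plus its terminator. */
size_t latin1_to_utf8(const unsigned char *src, size_t sn,
                      char *dst, size_t dn)
{
    size_t out = 0;

    assert(src != NULL);
    for (size_t i = 0; i < sn && src[i] != 0; i++) {
        size_t need = src[i] < 0x80 ? 1 : 2;
        if (dst != NULL) {
            if (out + need >= dn)       /* keep one byte for the terminator */
                return (size_t)-1;
            if (need == 1) {
                dst[out] = (char)src[i];
            } else {
                dst[out] = (char)(0xC0 | (src[i] >> 6));
                dst[out + 1] = (char)(0x80 | (src[i] & 0x3F));
            }
        }
        out += need;
    }
    if (dst != NULL) {
        if (out >= dn)                  /* only reachable when dn == 0 */
            return (size_t)-1;
        dst[out] = '\0';
    }
    return out;
}

/* Two-pass use: measure with dst == NULL, allocate exactly, then convert.
 * The caller frees the result. */
char *latin1_to_utf8_dup(const unsigned char *src, size_t sn)
{
    size_t n = latin1_to_utf8(src, sn, NULL, 0);    /* pass 1: measure */
    char *dst = malloc(n + 1);

    if (dst == NULL)
        return NULL;
    if (latin1_to_utf8(src, sn, dst, n + 1) != n) { /* pass 2: convert */
        free(dst);
        return NULL;
    }
    return dst;
}
```

The caller never has to guess a buffer size, and the scan for the
terminator happens inside the converter rather than in a separate pass
over src.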

To do this stuff with iconv is much harder and in some cases impossible
(i.e. the topic of this post). The idea behind encdec [1] is to make 90%
of what people want to do really easy.

The problem I'm faced with right now is that encdec is just non-standard.
It requires dragging 900K of code tables from libiconv into the mix. Right
now I have a project that I want to be very plain and portable, so I want
to swap out my encdec routines with iconv ones. But the code is getting
ugly. I'll have to scan the src to determine its length in advance.
That's slow and dangerous.
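
For illustration, the scan-then-convert workaround might look roughly like
the sketch below. The names terminated_len and conv_terminated are
hypothetical, not part of iconv or encdec. The sketch assumes the caller
knows the terminator width (1, 2 or 4 bytes), that the terminator is
aligned to that width, and that the output encoding is an 8-bit one such
as UTF-8, so a single zero byte terminates dst.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Byte length of the text before a zero terminator that is `width` bytes
 * wide and aligned to `width`. Never reads past src + sn. Returns
 * (size_t)-1 if no terminator is found within sn bytes. */
size_t terminated_len(const char *src, size_t sn, size_t width)
{
    assert(src != NULL && (width == 1 || width == 2 || width == 4));
    for (size_t i = 0; i + width <= sn; i += width) {
        size_t k = 0;
        while (k < width && src[i + k] == 0)
            k++;
        if (k == width)
            return i;
    }
    return (size_t)-1;
}

/* Convert the zero-terminated string at src (at most sn bytes) from
 * fromcode to tocode. Returns the number of bytes written to dst, not
 * counting the terminator, or (size_t)-1 on any failure. */
size_t conv_terminated(const char *fromcode, const char *tocode,
                       const char *src, size_t sn, size_t width,
                       char *dst, size_t dn)
{
    size_t inleft = terminated_len(src, sn, width);  /* the extra pass */
    if (inleft == (size_t)-1 || dst == NULL || dn == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;     /* iconv wants char **; input is only read */
    char *out = dst;
    size_t outleft = dn - 1;    /* reserve one byte for the terminator */
    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out, &outleft);  /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *out = '\0';
    return (size_t)(out - dst);
}
```

A call such as conv_terminated("UTF-16LE", "UTF-8", buf, buflen, 2, dst,
sizeof dst) then reads the input twice: once to find the terminator and
once to convert. The sn bound is what keeps the scan from running off the
end of a buffer that has no terminator at all.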

Mike

[1] http://www.ioplex.com/~miallen/encdec/



--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
