Re: String rationale

James Mastros Mon, 29 Oct 2001 14:09:38 -0800

On Mon, Oct 29, 2001 at 08:32:16PM +0000, Tom Hughes wrote:
> We have established that the first two will not work because of the
> unicode problem.
Hm.  I think instead of requiring Unicode to support everything, we should
require Unicode to support /nothing/.  If A and B have no mutual transcoding
function, we should use Unicode as an intermediary.  (This means that
charsets that are lossy to Unicode need to transcode to each other directly,
like Far Eastern sets.  (And Klingon, but that can't transcode to anything.))


This still makes Unicode a special case, but not a terrible one.  (In fact,
Unicode can be treated like any other charset, except when we want to
transcode between mutually incompatible sets, since we always try both A->B
and A<-B.)

(Notational note: A->B means that A is implementing a transcoding from itself
to B.  A<-B means that A is implementing a transcoding from B to A.)

> That leaves the third, which is what I have implemented. When looking to
> transcode from A to B it will first ask A if it can transcode to B and
> if that fails then it will ask B if it can transcode from A.
I propose another variant on this:
If that fails, it asks A to transcode to Unicode, and B to transcode from
Unicode.  (Not Unicode to transcode to B; Unicode implements no transcodings.)
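
A minimal sketch of that lookup order, in Python for brevity.  Every name
here (Charset, find_transcoder, the to/frm tables) is hypothetical and made
up for illustration; none of it is taken from the actual string code.  It
assumes each charset carries a table of the transcodings it implements
itself, and that Unicode text is represented as a Python str.

```python
# Hypothetical sketch: these names are illustrative, not an existing API.
UNICODE = "unicode"


class Charset:
    def __init__(self, name, to=None, frm=None):
        self.name = name
        # A->B: transcodings this charset implements from itself to others.
        self.to = to or {}
        # A<-B: transcodings this charset implements from others to itself.
        self.frm = frm or {}


def find_transcoder(a, b):
    """Return a function transcoding data from charset a to b, or None."""
    # 1. Ask A if it can transcode to B.
    if b.name in a.to:
        return a.to[b.name]
    # 2. Ask B if it can transcode from A.
    if a.name in b.frm:
        return b.frm[a.name]
    # 3. Ask A to transcode to Unicode, and B to transcode from Unicode.
    #    Unicode itself implements nothing.
    if UNICODE in a.to and UNICODE in b.frm:
        to_unicode = a.to[UNICODE]
        from_unicode = b.frm[UNICODE]
        return lambda data: from_unicode(to_unicode(data))
    # No route at all (the Klingon case).
    return None


ascii_cs = Charset(
    "ascii",
    to={UNICODE: lambda data: data.decode("ascii")},
    frm={UNICODE: lambda text: text.encode("ascii")},
)
latin1_cs = Charset(
    "latin-1",
    to={UNICODE: lambda data: data.decode("latin-1")},
    frm={UNICODE: lambda text: text.encode("latin-1")},
)
klingon_cs = Charset("klingon")  # implements no transcodings

via_unicode = find_transcoder(ascii_cs, latin1_cs)
```

Here neither ASCII nor Latin-1 knows about the other, so the call falls
through to step 3 and goes by way of Unicode; asking for anything involving
klingon_cs returns None.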

> The problem it raises is, who is responsible for transcoding from ASCII to
> Latin-1? and back again? If we're not careful both ends will implement
> both translations and we will have effective duplication.
1) Neither.  Each must support transcoding to and from Unicode.
2) But either can support converting directly if it wants.

I also think that, for efficiency, we might want a "7-bit chars match ASCII"
flag, since most character sets do, and that means that we don't have to deal
with the overhead for strings that fit in 7 bits.  This smells of premature
optimization, though, so somebody just file this away in their head for
future reference.

That would also mean that neither is responsible for converting between
Latin-1 and ASCII, because core will do it, most of the time, and the rest
of the time, it isn't possible.
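
Roughly what I have in mind for the flag, again as a hypothetical Python
sketch (the function names and the boolean parameters are assumptions for
illustration only): if both charsets claim that their 7-bit range matches
ASCII and the string has no high-bit bytes, core hands the bytes back
untouched and never calls a transcoder.

```python
# Hypothetical sketch of the "7-bit chars match ASCII" shortcut.
def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits."""
    return all(byte < 0x80 for byte in data)


def transcode_with_shortcut(data, src_matches_ascii, dst_matches_ascii, slow_path):
    """Skip transcoding when both charsets agree with ASCII on 7-bit data.

    slow_path is whatever full transcoder would otherwise be used.
    """
    if src_matches_ascii and dst_matches_ascii and is_seven_bit(data):
        # Same bytes mean the same characters in both charsets.
        return data
    return slow_path(data)
```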

Hm.  But it isn't possible _losslessly_, though it is possible lossily.
IMHO, there should be two ways to transcode, or the transcoding function
should flag any loss to its caller somehow.
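
One way the two ideas could be combined, as a hypothetical Python sketch
(the function name, the allow_lossy parameter and the "?" substitute are all
assumptions of mine, not anything that exists): a strict mode that refuses
to lose data, and a lossy mode that substitutes and tells the caller that it
did so.

```python
# Hypothetical sketch: Latin-1 to ASCII with an explicit lossiness flag.
def latin1_to_ascii(data: bytes, allow_lossy: bool = False):
    """Return (ascii_bytes, was_lossy).

    In strict mode (the default) a byte outside ASCII raises ValueError.
    In lossy mode it is replaced with "?" and was_lossy is set to True.
    """
    out = bytearray()
    was_lossy = False
    for byte in data:
        if byte < 0x80:
            out.append(byte)
        elif allow_lossy:
            out.append(ord("?"))
            was_lossy = True
        else:
            raise ValueError("byte 0x%02x has no ASCII equivalent" % byte)
    return bytes(out), was_lossy
```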

(Sorry for the train-of-thought, but I think it's decently clear.)

(BTW, for those paying attention, I'm waiting on this discussion for my
chr/ord patch, since I want them in terms of charsets, not encodings.)

       -=- James Mastros
