On Fri, Dec 02, 2022 at 02:32:47PM -0800, Ali Çehreli via Digitalmars-d-learn wrote:
> On 12/2/22 13:44, rikki cattermole wrote:
>
> > Yeah you're right, its code unit not code point.
>
> This proves yet again how badly chosen those names are. I must look it
> up every time before using one or the other.
>
> So they are both "code"? One is a "unit" and the other is a "point"?
> Sheesh!
[...]
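First, the plain version. A code point is one Unicode scalar value, e.g. U+00C7, the "Ç" in your name. A code unit is the fixed-width chunk an encoding stores code points in: 8 bits for UTF-8, 16 for UTF-16, 32 for UTF-32 (D's char, wchar and dchar respectively). One code point may need several code units, and one user-perceived character may need several code points.

Here's a minimal sketch in Python, chosen only because it's quick to run. The helper `code_units` and the sample strings are made up for illustration. Python's standard library has no grapheme segmentation, so the combining-accent sample only shows two code points that display as a single character.

```python
# Hypothetical helper: count code units vs code points for the same text.
# The "-le" codecs are used so that no byte-order mark is prepended.

UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}  # bytes per code unit

def code_units(s: str, encoding: str) -> int:
    """Number of code units needed to encode s in the given UTF encoding."""
    return len(s.encode(encoding)) // UNIT_SIZE[encoding]

samples = {
    "C-cedilla": "\u00c7",             # 1 code point, inside the BMP
    "emoji": "\U0001F600",             # 1 code point, outside the BMP
    "e + combining acute": "e\u0301",  # 2 code points, displayed as one "é"
}

for name, s in samples.items():
    # len() on a Python str counts code points, not code units.
    print(
        f"{name}: code points={len(s)}, "
        f"UTF-8 units={code_units(s, 'utf-8')}, "
        f"UTF-16 units={code_units(s, 'utf-16-le')}, "
        f"UTF-32 units={code_units(s, 'utf-32-le')}"
    )
# C-cedilla: code points=1, UTF-8 units=2, UTF-16 units=1, UTF-32 units=1
# emoji: code points=1, UTF-8 units=4, UTF-16 units=2, UTF-32 units=1
# e + combining acute: code points=2, UTF-8 units=3, UTF-16 units=2, UTF-32 units=2
```

Only in UTF-32 does "one code unit per code point" always hold. Even there, a grapheme can span several code points.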
Think of Unicode as a vector space. A code point is a point in this space, and a code unit is one of the unit vectors. Although some points can be reached with a single unit vector, to get to a general point you need to combine one or more unit vectors. Furthermore, the set of unit vectors you have depends on which coordinate system (i.e., encoding) you're using. Re-encoding a Unicode string is essentially changing your coordinate system. ;-)

(Exercise for the reader: compute the transformation matrix for re-encoding. :-P)

Also, a grapheme is a curve through this space (you *graph* the curve, you see), and as we all know, a curve may consist of more than one point. :-D

(Exercise for the reader: what's the Hausdorff dimension of the set of strings over Unicode space? :-P)


T

--
First Rule of History: History doesn't repeat itself -- historians merely repeat each other.