<[EMAIL PROTECTED]> writes:

> Yes, you support (and worry about) encodings simply because of a C limitation
> dating from 1974, if I recall correctly...
> In Java, for example, a "char" is a very well defined datum, namely a Unicode
> point. While in C it can be some char or another (or an error!) depending on
> what encoding was used.

No, you're being confused by C's idiosyncratic terminology. "char" in C just
means a 1-byte integral data type. If you want to store a Unicode code point
you use a different data type.

Incidentally, I'm not sure, but I don't think it's true that "char" in Java
stores a Unicode code point. I thought Java used UTF-16 internally for strings,
and that strings stored arrays of chars. In that case a "char" in Java stores
two bytes of a UTF-16 encoded string, which is pretty analogous to storing
UTF-8 encoded strings in C, where each "char" stores one byte of a UTF-8
encoded string.

> Think of it this way: if I give you a Java String you will perfectly know what
> I meant; if I send you a C char* you don't know what it is in the absence of
> extra information - you can even use it as a uint8*, as it is actually done
> in md5.c.

That's because you're comparing apples to oranges. In C you don't even know
whether a char* is a string at all. It's a pointer to some bytes, and those
could contain anything.

And think about what happens in Java if you have to deal with UTF-8 encoded
strings or Big5 encoded strings. They aren't "strings" in the Java object
hierarchy, so when someone passes you a "MyString" you have the same problem of
needing to know what encoding was used. Presumably you would put that in a
member variable of the MyString class, but that just goes to how the data
structures in C are laid out and what you're considering "extra information".

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!
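
P.S. To make the "each char stores one byte of a UTF-8 encoded string" point
concrete, here is a minimal sketch. It assumes the input is valid,
NUL-terminated UTF-8, and the helper name utf8_codepoints is made up for
illustration; it is not taken from any existing code base.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count the code points in a valid, NUL-terminated
 * UTF-8 string. Each char holds one byte of the encoding, so we count
 * every byte that is NOT a continuation byte (bit pattern 10xxxxxx).
 * No validation is done; malformed input gives a meaningless count.
 */
static size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
    {
        /* Cast first: plain char may be signed on this platform. */
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "caf\xC3\xA9" ("café", with U+00E9 encoded as the two bytes
0xC3 0xA9), strlen() returns 5 while utf8_codepoints() returns 4. The char*
itself carries no hint of this; knowing the bytes are UTF-8 is exactly the
"extra information" being discussed.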