Re: Regarding hex strings

foobar Fri, 19 Oct 2012 11:51:26 -0700

On Friday, 19 October 2012 at 15:07:44 UTC, Don Clugston wrote:

On 19/10/12 16:07, foobar wrote:
On Friday, 19 October 2012 at 13:19:09 UTC, Don Clugston wrote:
We can still have both (assuming the code points are valid...):
string foo = "\ua1\ub2\uc3"; // no .dup
That doesn't compile.
Error: escape hex sequence has 2 hex digits instead of 4
Come on, "assuming the code points are valid". It says so 4 lines above!
It isn't the same.
Hex strings are the raw bytes, e.g. UTF8 code points. (i.e., it includes the high bits that indicate the length of each char).
\u makes dchars.
"\u00A1" is not the same as x"A1" nor is it x"00 A1". It's two non-zero bytes.

Yes, \u requires code points and not code units of a specific UTF encoding, and you are correct in pointing out that they take four hex digits and not two. This is a very reasonable choice to prevent or reduce Unicode encoding errors.
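
To make the code point versus code unit distinction concrete, here is a minimal sketch, assuming a reasonably recent D compiler with Phobos. The variable names are only illustrative. It checks that the single code point U+00A1 is stored in a D string as two UTF-8 code units:

```d
import std.range : walkLength;
import std.string : representation;

void main()
{
    string s = "\u00A1";                        // one code point: U+00A1

    // The raw UTF-8 code units that actually back the string.
    immutable(ubyte)[] raw = s.representation;

    assert(s.walkLength == 1);                  // decoded: a single character
    assert(raw.length == 2);                    // encoded: two bytes
    assert(raw[0] == 0xC2 && raw[1] == 0xA1);   // neither byte is zero
}
```

So \u describes the character, and the compiler picks the encoding; the byte 0xA1 alone is never what ends up in the string.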


http://dlang.org/lex.html#HexString states:

"Hex strings allow string literals to be created using hex data. The hex data need not form valid UTF characters."

I _already_ said that I consider this a major semantic bug, as it violates the principle of least surprise: the programmer's expectation that the D string types, which are Unicode according to the spec, actually contain _valid_ Unicode and _not_ arbitrary binary data. Given the above, the design of \u makes perfect sense for _strings_ - you can use _valid_ code points (not code units) in hex form.

For general-purpose binary data (i.e. _not_ UTF-encoded Unicode text), I also _already_ said that IMO it should be stored either as ubyte[] or, better yet, in its own type that ensures the correct invariants for the data, be it audio, video, or just a different text encoding.

In neither case is the hex string relevant, IMO. In the former it potentially violates the type's invariant, and in the latter we already have array literals.
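
A rough sketch of both cases, again assuming a recent D compiler with Phobos, and deliberately avoiding the x"..." literal itself since its exact semantics depend on the compiler version. The names `data` and `bogus` are made up for the example:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Binary data written as a plain array literal, typed as bytes.
    immutable ubyte[] data = [0xA1, 0xB2, 0xC3];

    // Forcing the same bytes into a string breaks the UTF-8 invariant:
    // 0xA1 is a continuation byte and cannot start a sequence.
    auto bogus = cast(string) data;
    assertThrown!UTFException(validate(bogus));
}
```

The array literal states up front that these are bytes; the string version only reveals the problem once something tries to decode it.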

Using a malformed _string_ to initialize a ubyte[] is IMO simply less readable. What did that article call such features, "WAT"?
