On Apr 28, 2004, at 4:57 AM, Bryan C. Warnock wrote:

{snipped, obviously}

Hmmm... very good.

One question.

Does (that which the masses normally refer to as) binary data
fall inside or outside the scope of a string?

Outside. Conceptually, a JPEG isn't a string any more than an XML document is an MP3.


Some languages make this very clear by providing a separate data type to hold a "blob of bytes". Java uses a byte[] for this (an array of bytes), rather than a String. And Objective-C (via the Foundation framework) has an NSData class for this (whereas strings are represented via NSString).
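Python (3.x) draws the same line, for what it's worth, with str for text and bytes for a blob of octets. A minimal sketch of the distinction:

```python
# str holds textual data (a sequence of characters);
# bytes holds a raw blob of octets.
text = "héllo"
blob = text.encode("utf-8")  # the UTF-8 encoding of that text

print(type(text))  # <class 'str'>
print(type(blob))  # <class 'bytes'>

# Round-tripping recovers the original text:
print(blob.decode("utf-8") == text)  # True
```

Crossing between the two requires an explicit encode/decode step, which is exactly the point: the byte blob and the text are different kinds of things.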

Now, languages such as Perl5 can get away with trojaning binary data into a string, because some encodings (for example, ISO-8859-1 and MacRoman) have the property that any sequence of bytes can be decoded into a string. That is, you can take an arbitrary blob of bytes, and _pretend_ that it represents textual data encoded in ISO-8859-1 (for example). But it's sort of a hack, and subverts the semantic purpose of a string. (And it implies that you can uppercase a JPEG, for instance.)

Only some encodings let you get away with this--for example, not every byte sequence is valid UTF-8, so an arbitrary byte blob likely wouldn't decode if you tried to pretend that it was the UTF-8-encoded version of something.

The major practical downside of doing something like this is that it leads to confusion, and propagates the viewpoint that a string is just a blob of bytes. And the conceptual downside is that if a string is fundamentally intended to represent textual data, then it doesn't make much sense to use it to represent something non-textual.
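To make the asymmetry concrete, here's a small sketch (in Python, just for illustration): latin-1 assigns a character to every one of the 256 byte values, so "decoding" an arbitrary blob always succeeds, while UTF-8 rejects it:

```python
blob = bytes([0xFF, 0xFE, 0x41])  # an arbitrary blob of bytes

# ISO-8859-1 (latin-1) maps every byte value to a character,
# so decoding always "succeeds"--even for non-textual data:
fake_text = blob.decode("latin-1")
print(fake_text)  # bytes masquerading as the text 'ÿþA'

# Not so for UTF-8: the byte 0xFF can never appear in
# well-formed UTF-8, so the same blob fails to decode:
try:
    blob.decode("utf-8")
except UnicodeDecodeError as e:
    print("not valid UTF-8:", e)
```

The latin-1 branch is exactly the pretend-it's-text hack described above: nothing stops you from calling .upper() on that "string," even though the underlying data was never text.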

Jeff
