On Wed, Apr 28, 2004 at 03:30:07PM -0700, Jeff Clites wrote:
: Outside. Conceptually, JPEG isn't a string any more than an XML 
: document is an MP3.

I'm not vehemently opposed to redefining the meaning of "string"
this way, but I would like to point out that the term used to have
a more general meaning.  Witness terms like "bit string".

: Some languages make this very clear by providing a separate data type 
: to hold a "blob of bytes". Java uses a byte[] for this (an array of 
: bytes), rather than a String. And Objective-C (via the Foundation 
: framework) has an NSData class for this (whereas strings are 
: represented via NSString).

Another approach is to say that (in general) strings are sequences
of abstract integers, and byte strings (and their ilk) impose size
constraints on those integers, while text strings impose various semantic constraints.
This is more in line with the historical usage of "string".
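To make the "abstract integers plus constraints" model concrete, here is a minimal Python sketch. The class names GeneralString, ByteString and TextString are hypothetical and illustrative only; they are not Perl 5, Perl 6 or Parrot types. The base class accepts any integer. The byte-string subclass adds a size constraint. The text-string subclass adds one possible semantic constraint: each element must be a Unicode scalar value.

```python
from typing import Iterable


class GeneralString:
    """Hypothetical: a string in the general sense, a sequence of abstract integers."""

    def __init__(self, items: Iterable[int]) -> None:
        items = tuple(items)
        for x in items:
            self.check(x)  # each subclass layers its own constraint on top
        self.items = items

    def check(self, x: int) -> None:
        # The general string only requires that its elements be integers.
        if not isinstance(x, int):
            raise TypeError(f"{x!r} is not an integer")

    def __len__(self) -> int:
        return len(self.items)


class ByteString(GeneralString):
    """Hypothetical size constraint: every element must fit in one octet."""

    def check(self, x: int) -> None:
        super().check(x)
        if not (0 <= x <= 0xFF):
            raise ValueError(f"{x} does not fit in a byte")


class TextString(GeneralString):
    """Hypothetical semantic constraint: every element is a Unicode scalar value."""

    def check(self, x: int) -> None:
        super().check(x)
        if not (0 <= x <= 0x10FFFF) or (0xD800 <= x <= 0xDFFF):
            raise ValueError(f"{x:#x} is not a Unicode scalar value")

    def __str__(self) -> str:
        # Only once every element is known to be a character can we render text.
        return "".join(chr(x) for x in self.items)
```

For example, ByteString([0xFF, 0xD8]) is accepted, while ByteString([0x263A]) raises ValueError. TextString([0x263A]) is accepted and renders as a single character. Subclassing is only one way to layer the constraints. A role- or trait-based design would express the same idea.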

: Now, languages such as Perl5 can get away with trojaning binary data 
: into a string, because some encodings (for example, ISO-8859-1 and 
: MacRoman) have the property that any sequence of bytes can be decoded 
: into a string. That is, you can take an arbitrary blob of bytes, and 
: _pretend_ that it represents textual data encoded in ISO-8859-1 (for 
: example). But it's sort of a hack, and subverts the semantic purpose of 
: a string.

Hmm, that implies a logical ordering of constraints that was not
present at the time.

: (And it implies that you can uppercase a JPEG, for instance). 
: Only some encodings let you get away with this--for example, not every 
: byte sequence is valid UTF-8, so an arbitrary byte blob likely wouldn't 
: decode if you tried to pretend that it was the UTF-8-encoded version of 
: something. The major practical downside of doing something like this is 
: that it leads to confusion, and propagates the viewpoint that a string 
: is just a blob of bytes. And the conceptual downside is that if a 
: string is fundamentally intended to represent textual data, then it 
: doesn't make much sense to use it to represent something non-textual.
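The encoding properties described above can be checked directly with a short Python sketch, which uses only the standard codecs. The four sample bytes are the start of a typical JFIF/JPEG file (the FF D8 start-of-image marker followed by FF E0). The helper name decodes_as is hypothetical. The sketch also shows what "uppercasing a JPEG" actually does once the bytes have been smuggled into a text string.

```python
# Start of a typical JPEG/JFIF file, used here only as sample binary data.
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])


def decodes_as(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `data` decodes under `encoding` without error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# ISO-8859-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so any blob "decodes"
# and re-encoding gives back exactly the original bytes.
as_latin1 = jpeg_header.decode("latin-1")
latin1_round_trip = as_latin1.encode("latin-1") == jpeg_header

# UTF-8 is stricter: the byte 0xFF can never occur in well-formed UTF-8.
utf8_ok = decodes_as(jpeg_header, "utf-8")

# Text operations now "work" on the binary data. Uppercasing U+00FF yields
# U+0178, which has no ISO-8859-1 byte, so the result cannot be turned back
# into bytes with the same encoding.
shouted = as_latin1.upper()
try:
    shouted.encode("latin-1")
    round_trips = True
except UnicodeEncodeError:
    round_trips = False
```

Here latin1_round_trip is true and utf8_ok is false. The uppercased "JPEG" is no longer even representable in the encoding it was pretended to be in.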

I think of a string as a fundamental data type that can be *used* to
represent text when properly typed.  But strings are more fundamental
than text--you can have a string of tokens, for instance.  Just because
various string types were confused in the past is no reason to settle
on a single string type as "the only true string".  If you can do it,
fine, but you'll have to come up with a substitute name for the more
general concept, or you're going to be fighting the culture continually
from here on out.  I don't like culture wars...

I'm speaking strictly on a cultural level there.  I'm certainly of
the opinion that Perl 6's Str type should assume textiness, and that
bit or byte or object strings should be declared some other way.
Alternately, the term "string" could be relegated to the category of
things that are too general to instantiate, and then we force text
strings to be declared as Text or some such.  "String" would become
a role or some such instead.  But that's language design, and I'm in
the wrong list for that...

Larry
