On Tuesday, January 13, 2015 at 10:38:23 PM UTC-5, ele...@gmail.com wrote:
>
> Probably right if the mutations for adding extensions etc are not
> conveniently available with Vector{uint8}.
>

It would certainly be possible to define these operations, e.g. concatenation of a string with a bytevector. But even then I think a bytevector would be the wrong choice. When I look at a filename, I don't want to see UInt8[0x66,0x6f,0x6f,0x2e,0x74,0x78,0x74], I want to see "foo.txt".

And by returning a (potentially invalid) UTF8String, that's what I get in the *vast* majority of cases; non-UTF8 filenames seem to be pretty rare nowadays even on Unix systems (e.g. many GNU/Linux systems have defaulted to displaying filenames as UTF-8 for a decade now). Even for a non-UTF8 filename where I get mojibake, in most cases it will be in some other 1-byte superset of ASCII, so the displayed results will still be somewhat useful: I'd much rather see "Foo££££????.txt" than a list of byte values.

I guess a third alternative would be to define an UnknownEncodingString type that stores an array of bytes and displays by default as UTF-8 (or even tries to guess the encoding) and supports concatenation and a few other carefully chosen operations, but not iteration over codepoints and other things that can't be implemented without knowing the encoding. The idea being to prevent programmers from trying to perform operations on filenames that may fail on strings with unknown encodings. But this seems like it would be a lot of hassle for little benefit these days.
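
For concreteness, here is a rough sketch of what that third alternative might look like. It is written in current Julia 1.x syntax, where `String` has replaced the `UTF8String` mentioned above and may hold invalid UTF-8. Only the name `UnknownEncodingString` comes from the paragraph above; the field name `data`, the constructors, and the particular `*` methods are assumptions for illustration, not an existing Base or package API.

```julia
# Hypothetical type: raw filename bytes whose encoding is not known.
struct UnknownEncodingString
    data::Vector{UInt8}
end

# Wrap an ordinary string by copying its bytes (assumed convenience constructor).
UnknownEncodingString(s::AbstractString) =
    UnknownEncodingString(collect(codeunits(String(s))))

# Concatenation works on bytes alone, so it is safe without knowing the encoding.
Base.:*(a::UnknownEncodingString, b::UnknownEncodingString) =
    UnknownEncodingString(vcat(a.data, b.data))
Base.:*(a::UnknownEncodingString, b::AbstractString) = a * UnknownEncodingString(b)
Base.:*(a::AbstractString, b::UnknownEncodingString) = UnknownEncodingString(a) * b

# Compare by byte content rather than by array identity.
Base.:(==)(a::UnknownEncodingString, b::UnknownEncodingString) = a.data == b.data

# Display by interpreting the bytes as UTF-8; invalid bytes appear as escapes.
# The copy matters because String(::Vector{UInt8}) takes ownership of its argument.
Base.show(io::IO, s::UnknownEncodingString) = show(io, String(copy(s.data)))

# Deliberately absent: iterate, length, indexing by codepoint, and anything
# else that cannot be implemented without knowing the encoding.

name = UnknownEncodingString(UInt8[0x66, 0x6f, 0x6f])  # the bytes of "foo"
full = name * ".txt"
show(stdout, full)
```

Even this much needs a constructor, three concatenation methods, equality, and display before it is usable for the simplest case, which is roughly the hassle the paragraph above has in mind.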