Re: [julia-users] readdir returns inconsistent types

Steven G. Johnson Wed, 14 Jan 2015 09:05:29 -0800


On Tuesday, January 13, 2015 at 10:38:23 PM UTC-5, ele...@gmail.com wrote:
>
> Probably right if the mutations for adding extensions etc are not 
> conveniently available with Vector{uint8}.
>


It would certainly be possible to define these operations, e.g.
concatenation of a string with a bytevector.  But even then I think a
bytevector would be the wrong choice.  When I look at a filename, I don't
want to see UInt8[0x66,0x6f,0x6f,0x2e,0x74,0x78,0x74], I want to see
"foo.txt".  And by returning a (potentially invalid) UTF8String, that's
what I get in the *vast* majority of cases: non-UTF8 filenames seem to be
pretty rare nowadays even on Unix systems (e.g. many GNU/Linux systems have
defaulted to displaying filenames as UTF-8 for a decade now).  Even for a
non-UTF8 filename where I get mojibake, in most cases it will be in some 
other 1-byte superset of ASCII, so the displayed results will still be 
somewhat useful: I'd much rather see "FooÂ£Â£Â£Â£????.txt" than a list of 
byte values.
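
To make the comparison concrete, here is a minimal sketch. It assumes a
current Julia release, where String fills the role that UTF8String had
when this was written; the Latin-1 filename is a made-up example, not
something readdir returned.

```julia
# The filename from the text, as raw bytes and as a string.
bytes = UInt8[0x66, 0x6f, 0x6f, 0x2e, 0x74, 0x78, 0x74]
name = String(copy(bytes))    # copy, because String(::Vector{UInt8}) takes over its argument
@assert name == "foo.txt"

# Hypothetical non-UTF-8 filename: "Foo£.txt" stored as Latin-1 (0xa3 is '£').
latin1 = UInt8[0x46, 0x6f, 0x6f, 0xa3, 0x2e, 0x74, 0x78, 0x74]
raw = String(copy(latin1))    # allowed: a String may hold invalid UTF-8
@assert !isvalid(raw)         # 0xa3 on its own is not a valid UTF-8 sequence

# Adding an extension still works and leaves the original bytes untouched.
backup = raw * ".bak"
@assert codeunits(backup) == vcat(latin1, codeunits(".bak"))

# If the encoding is known (or guessed) to be Latin-1, each byte maps
# directly to the codepoint with the same value.
guess = String(Char.(latin1))
@assert guess == "Foo£.txt"
```

The point is only that the invalid string stays usable for the common
operations (display, concatenation) without losing any bytes.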

I guess a third alternative would be to define an UnknownEncodingString 
type that stores an array of bytes and displays by default as UTF-8 (or 
even tries to guess the encoding) and supports concatenation and a few 
other carefully chosen operations, but not iteration over codepoints and 
other things that can't be implemented without knowing the encoding.  The
idea would be to prevent programmers from performing operations on
filenames that may fail on strings of unknown encoding.  But this seems
like it would be a lot of hassle for little benefit these days.
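
For illustration only, such a type might look roughly like the sketch
below. It assumes a current Julia release; UnknownEncodingString and
everything defined on it here are hypothetical and not part of Base.

```julia
# Hypothetical wrapper: raw bytes whose encoding is not known.
struct UnknownEncodingString
    data::Vector{UInt8}
end

# Build one from an ordinary string by copying its bytes.
UnknownEncodingString(s::AbstractString) =
    UnknownEncodingString(Vector{UInt8}(codeunits(String(s))))

# Display by treating the bytes as UTF-8; show escapes whatever is invalid.
Base.show(io::IO, s::UnknownEncodingString) = show(io, String(copy(s.data)))
Base.print(io::IO, s::UnknownEncodingString) = write(io, s.data)

# Concatenation is plain byte concatenation, so it needs no encoding.
Base.:*(a::UnknownEncodingString, b::UnknownEncodingString) =
    UnknownEncodingString(vcat(a.data, b.data))
Base.:*(a::UnknownEncodingString, b::AbstractString) = a * UnknownEncodingString(b)
Base.:*(a::AbstractString, b::UnknownEncodingString) = UnknownEncodingString(a) * b

Base.:(==)(a::UnknownEncodingString, b::UnknownEncodingString) = a.data == b.data

# Deliberately no iterate, length, or indexing methods: anything that
# needs codepoints raises a MethodError instead of silently guessing.

name = UnknownEncodingString(UInt8[0x46, 0x6f, 0x6f, 0xa3])   # "Foo" plus a Latin-1 '£'
backup = name * ".bak"
@assert backup.data == UInt8[0x46, 0x6f, 0x6f, 0xa3, 0x2e, 0x62, 0x61, 0x6b]
@assert !hasmethod(iterate, Tuple{UnknownEncodingString})
```

Even this small version needs a method for every operation one wants
to allow, which is the hassle referred to above.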
