On 2020-12-29, Walter Dnes <waltd...@waltdnes.org> wrote:
> On Tue, Dec 29, 2020 at 05:11:36PM +0200, Andreas K. Huettel wrote
>> Hi Walter, 
>> 
>> > "-pch -roaming -sendmail -spell -tcpd -udev -udisks -unicode -upower
>> > -xinerama"
>> 
>> mostly out of curiosity, why do you want to disable unicode support
>> here?
>> 
>> This feels odd to me since utf8 has effectively become the standard
>> encoding over the past years.
>
>   I don't know if this has improved over the years, but my initial
> experience with unicode was rather negative.  The fact that text
> files were twice as large wasn't a major problem in itself.  The
> real showstopper was that importing text files into spreadsheets
> and text-editors and word processors failed miserably.

You must be talking about a "wide" encoding, most likely UTF-16 (or
the older UCS-2), which stores each ordinary character in two bytes.
I've never seen a file like that.  On Linux and on the web, nearly
everything uses UTF-8 these days and has for years.  UTF-8 is a
superset of ASCII, and doesn't increase the size of a file unless
non-ASCII characters are used.  Converting an ASCII file to UTF-8
encoding is a no-op.  An ASCII file _is_ UTF-8.
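
A quick way to check this for yourself is a few lines of Python. This
is only a sketch; the sample strings and variable names are made up
for illustration.

```python
# Sketch: pure-ASCII text encodes to identical bytes in ASCII and UTF-8.
ascii_text = "plain ASCII text, 1234"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Only non-ASCII characters take more than one byte in UTF-8.
mixed_text = "naïve café"            # 10 characters, two of them non-ASCII
mixed_bytes = mixed_text.encode("utf-8")

print(same_bytes)                         # True
print(len(mixed_text), len(mixed_bytes))  # 10 12
```

The file only grows by the extra bytes needed for the characters that
ASCII could not represent in the first place.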

>   I looked at a unicode text file with a binary viewer.  It turns out
> that a simple text string like "1234" was actually...
>
> "1" binary-zero "2" binary-zero "3" binary-zero "4" binary-zero, etc.
>
>   This padding explains why the file was twice as large, and also why
> "a simple textfile import" failed miserably.

I've never seen a file like that. All the Unicode I run into is UTF-8,
and a UTF-8 file with the string "1234" is the same exact 4 bytes as
an ASCII file with the string "1234".
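
The byte pattern you describe is what little-endian UTF-16 would
produce, so that is my guess at what the file was. A small Python
sketch (the variable names are just for illustration) shows both
encodings side by side, plus a conversion back to UTF-8:

```python
# Sketch: the same four characters in UTF-8 and in little-endian UTF-16.
text = "1234"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(utf8)    # b'1234'
print(utf16)   # b'1\x002\x003\x004\x00'

# Re-encoding such a file as UTF-8 gives back the plain 4 bytes.
converted = utf16.decode("utf-16-le").encode("utf-8")
print(converted == utf8)   # True
```

A tool that expects one byte per character will choke on the zero
bytes, which would explain the failed imports. Real UTF-16 files often
start with a byte-order mark as well, which this sketch leaves out.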

>   On top of that Cyrillic letters like "m", "i", "c", and "o" are
> considered different from their English equivalents.  Security experts
> showed proof-of-concept attacks where clicking on "microsoft.com" can
> take you to a hostile domain (cue the jokes).  I don't speak or read
> or write any languages which have thousands of unique characters.
> Seeing Chinese spam "as it was intended to be seen", is not a priority
> for me.
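
For what it's worth, the lookalike problem is easy to demonstrate. The
following Python sketch is only an illustration: the spoofed name is a
hypothetical example, built by swapping one Latin "o" for the Cyrillic
letter U+043E. The two strings render almost identically but compare
as different, and the ASCII form used in DNS gives the spoof away.

```python
import unicodedata

# Sketch: a Latin domain name and a lookalike with one Cyrillic letter.
latin = "microsoft.com"
spoof = "micr\u043esoft.com"   # hypothetical spoof, U+043E in place of "o"

print(latin == spoof)               # False
print(unicodedata.name("\u043e"))   # CYRILLIC SMALL LETTER O

# The IDNA (punycode) form is what actually goes over the wire in DNS.
wire_latin = latin.encode("idna")
wire_spoof = spoof.encode("idna")
print(wire_latin)   # b'microsoft.com'
print(wire_spoof)   # an "xn--" label followed by ".com"
```

Browsers that show the "xn--" form for mixed-script names are relying
on this same difference.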




