On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:
On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
It's not hard. I think a lot of us remember when a 14.4 modem
was cutting-edge.
Well, then apparently you're unaware of how bloated web pages
are nowadays. It used to take me minutes to download popular
web pages _back then_ at _top speed_, and those pages were a
_lot_ smaller.
It's telling that you think the encoding of the text is anything
but the tiniest fraction of the problem. You should look at
where the actual weight of a "modern" web page comes from.
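To put even rough numbers on that, here's a quick sanity check worth trying (a Python sketch with a toy repeated Cyrillic sample; the exact figures will vary with real text):

import gzip

# Toy sample: a repeated Russian pangram.  In KOI8-R every letter is
# one byte; in UTF-8 every Cyrillic letter takes two.
text = "Съешь же ещё этих мягких французских булок, да выпей чаю. " * 50

utf8 = text.encode("utf-8")
koi8 = text.encode("koi8-r")

print(len(utf8), len(koi8))            # raw: roughly a 2:1 gap
print(len(gzip.compress(utf8)),        # after the gzip transfer encoding servers
      len(gzip.compress(koi8)))        # routinely apply, the gap shrinks to a sliver

Either way you're talking kilobytes of text next to the scripts, stylesheets and images that actually make up the weight of a "modern" page.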
Codepages and incompatible encodings were terrible then, too.
Never again.
This only shows you probably don't know the difference between
an encoding and a code page,
"I suggested a single-byte encoding for most languages, with
double-byte for the ones which wouldn't fit in a byte. Use some
kind of header or other metadata to combine strings of different
languages, _rather than encoding the language into every
character!_"
Yeah, that? That's codepages. And your exact proposal to put
encodings in the header was ALSO tried around the time that
Unicode was getting hashed out. It sucked. A lot. (Not as bad
as storing it in the directory metadata, though.)
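The core problem is easy to show concretely (Python here, purely for illustration): the same byte means a different letter under every single-byte code page, so something out-of-band, your header, has to ride along with every string and has to switch whenever the language does.

data = bytes([0xE9])

print(data.decode("latin-1"))   # 'é'  -- Western European code page
print(data.decode("koi8-r"))    # 'И'  -- Cyrillic code page
print(data.decode("cp1253"))    # 'ι'  -- Greek code page

# The byte alone is meaningless; out-of-band state (a header, an escape
# sequence, directory metadata...) has to say which mapping applies.
print("é И ι".encode("utf-8"))  # in UTF-8 each character gets its own
                                # unambiguous byte sequence; no state at all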
Well, when you _like_ a ludicrous encoding like UTF-8, not
sure your opinion matters.
It _is_ kind of ludicrous, isn't it? But it really is the
least-bad option for the most text. Sorry, bub.
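For what it's worth, here's part of why it's the least-bad option: ASCII is already valid UTF-8, and continuation bytes are recognizable on sight (they always match 10xxxxxx), so you can land at any byte offset and re-synchronize without a header or any stateful switching. A small sketch (Python, with a made-up helper name):

def char_start(buf: bytes, i: int) -> int:
    """Back up from an arbitrary byte offset to the start of the
    UTF-8 sequence containing it (illustrative only)."""
    while i > 0 and (buf[i] & 0xC0) == 0x80:   # 0b10xxxxxx = continuation byte
        i -= 1
    return i

s = "καφές".encode("utf-8")      # Greek text, two bytes per letter
print(char_start(s, 5))          # 4: byte 5 is the tail of the letter at byte 4
print(b"abc".decode("utf-8"))    # abc: plain ASCII is already valid UTF-8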
I think we can do a lot better.
Maybe. But no one's done it yet.
The vast majority of software is written for _one_ language,
the local one. You may think otherwise because the software
that sells the most and makes the most money is
internationalized software like Windows or iOS, because it can
be resold into many markets. But as a percentage of lines of
code written, such international code is almost nothing.
I'm surprised you think this even matters after talking about web
pages. The browser is your most common string processing
situation. Nothing else even comes close.
largely ignoring the possibilities of the header scheme I
suggested.
"Possibilities" that were considered and discarded decades ago by
people with way better credentials. The era of single-byte
encodings is gone, it won't come back, and good riddance to bad
rubbish.
I could call that "trolling" by all of you, :) but I'll instead
call it what it likely is, reactionary thinking, and move on.
It's not trolling to call you out for clearly not doing your
homework.
I don't think you understand: _you_ are the special case.
Oh, I understand perfectly. _We_ (whoever "we" are) can handle
any sequence of glyphs and combining characters (correctly-formed
or not) in any language at any time, so we're the special case...?
Yeah, it sounds funny to me, too.
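To be concrete about "correctly-formed or not": the same visible text can arrive either as precomposed code points or as base letters plus combining marks, and anything that claims to handle "any language" has to cope with both. A quick illustration (Python, for concreteness):

import unicodedata

single   = "\u00e9"      # 'é' as one precomposed code point
combined = "e\u0301"     # 'e' followed by a combining acute accent

print(single == combined)                                # False: different sequences
print(unicodedata.normalize("NFC", combined) == single)  # True: same text once normalized
print(single.encode("utf-8"), combined.encode("utf-8"))  # b'\xc3\xa9' vs b'e\xcc\x81'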
The 5 billion people outside the US and EU are _not the special
case_.
Fortunately, it works for them too.
The problem is all the rest, and those just below who cannot
afford it at all, in part because the tech is not as efficient
as it could be yet. Ditching UTF-8 will be one way to make it
more efficient.
All right, now you've found the special case; the case where the
generic, unambiguous encoding may need to be lowered to something
else: people for whom that encoding is suboptimal because of
_current_ network constraints.
I fully acknowledge it's a couple billion people and that's
nothing to sneeze at, but I also see that it's a situation that
will become less relevant over time.
-Wyatt