On Wednesday, 1 June 2016 at 16:45:04 UTC, Joakim wrote:
> On Wednesday, 1 June 2016 at 15:02:33 UTC, Wyatt wrote:
>> It's not hard. I think a lot of us remember when a 14.4 modem was cutting-edge.

> Well, then apparently you're unaware of how bloated web pages are nowadays. It used to take me minutes to download popular web pages _back then_ at _top speed_, and those pages were a _lot_ smaller.

It's telling that you think the encoding of the text is anything but the tiniest fraction of the problem. You should look at where the actual weight of a "modern" web page comes from.
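
To put some rough numbers on that (the figures below are assumptions for illustration, not measurements of any real page), here's a quick Python sketch of how little even a perfect halving of the text bytes would buy on a typical multi-megabyte page:

# Back-of-the-envelope sketch with assumed, ballpark figures: markup, scripts,
# images, and fonts dominate a modern page, so shrinking the visible text
# barely moves the total.
page_total_kb = 2048            # assumed total transfer size of a "modern" page
visible_text_kb = 50            # assumed size of the human-readable text in UTF-8
saved_kb = visible_text_kb / 2  # best case for a hypothetical 1-byte-per-char scheme

print(f"saves {saved_kb:.0f} KB of {page_total_kb} KB "
      f"({100 * saved_kb / page_total_kb:.1f}% of the page)")
# -> saves 25 KB of 2048 KB (1.2% of the page)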

>> Codepages and incompatible encodings were terrible then, too.
>>
>> Never again.
>
> This only shows you probably don't know the difference between an encoding and a code page,

"I suggested a single-byte encoding for most languages, with double-byte for the ones which wouldn't fit in a byte. Use some kind of header or other metadata to combine strings of different languages, _rather than encoding the language into every character!_"

Yeah, that? That's codepages. And your exact proposal to put encodings in the header was ALSO tried around the time that Unicode was getting hashed out. It sucked. A lot. (Not as bad as storing it in the directory metadata, though.)
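
Here's a minimal Python sketch of why that breaks down, with cp1251 and cp1253 picked purely as stand-in single-script code pages: the moment a string mixes scripts, a per-string "language header" has nothing valid to point at, while UTF-8 needs no out-of-band metadata at all.

mixed = "Привет, κόσμε!"   # Russian greeting plus Greek "world"

print(mixed.encode("utf-8"))         # works: 25 bytes, no header required

for legacy in ("cp1251", "cp1253"):  # Cyrillic-only and Greek-only code pages
    try:
        mixed.encode(legacy)
    except UnicodeEncodeError as exc:
        print(f"{legacy}: {exc.reason}")  # each code page rejects the other script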

> Well, when you _like_ a ludicrous encoding like UTF-8, not sure your opinion matters.

It _is_ kind of ludicrous, isn't it? But it really is the least-bad option for the most text. Sorry, bub.
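
For a rough sense of the tradeoff (sample strings chosen arbitrarily; only the byte counts matter), a small Python comparison of per-script sizes:

# UTF-8: 1 byte/char for ASCII, 2 for most alphabetic scripts, 3 for CJK;
# UTF-16 flips that around. Either way, one format has to cover everything.
samples = {
    "English": "hello",
    "Russian": "привет",
    "Greek":   "γειά",
    "Chinese": "你好",
}
for lang, text in samples.items():
    print(f"{lang:8} {len(text)} chars -> "
          f"{len(text.encode('utf-8')):2d} bytes UTF-8, "
          f"{len(text.encode('utf-16-le')):2d} bytes UTF-16")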

> I think we can do a lot better.

Maybe.  But no one's done it yet.

> The vast majority of software is written for _one_ language, the local one. You may think otherwise because the software that sells the most and makes the most money is internationalized software like Windows or iOS, because it can be resold into many markets. But as a percentage of lines of code written, such international code is almost nothing.

I'm surprised you think this even matters after talking about web pages. The browser is your most common string processing situation. Nothing else even comes close.

> largely ignoring the possibilities of the header scheme I suggested.

"Possibilities" that were considered and discarded decades ago by people with way better credentials. The era of single-byte encodings is gone, it won't come back, and good riddance to bad rubbish.

> I could call that "trolling" by all of you, :) but I'll instead call it what it likely is, reactionary thinking, and move on.

It's not trolling to call you out for clearly not doing your homework.

> I don't think you understand: _you_ are the special case.

Oh, I understand perfectly. _We_ (whoever "we" are) can handle any sequence of glyphs and combining characters (correctly-formed or not) in any language at any time, so we're the special case...?

Yeah, it sounds funny to me, too.
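
In case it isn't clear what "any sequence of glyphs and combining characters" costs in practice, here's a tiny Python illustration using the standard unicodedata module: the same rendered text can arrive precomposed or decomposed, and code that assumes one byte sequence per "character" gets it wrong until it normalizes.

import unicodedata

precomposed = "caf\u00e9"    # "café" with U+00E9, a single code point
decomposed  = "cafe\u0301"   # "cafe" plus U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                  # False: different code points
print(len(precomposed), len(decomposed))          # 4 vs 5
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True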

> The 5 billion people outside the US and EU are _not the special case_.

Fortunately, it works for them too.

> The problem is all the rest, and those just below who cannot afford it at all, in part because the tech is not as efficient as it could be yet. Ditching UTF-8 will be one way to make it more efficient.

All right, now you've found the special case; the case where the generic, unambiguous encoding may need to be lowered to something else: people for whom that encoding is suboptimal because of _current_ network constraints.

I fully acknowledge it's a couple billion people and that's nothing to sneeze at, but I also see that it's a situation that will become less relevant over time.

-Wyatt
