On Mon, 26 Nov 2018 at 15:21, Guy Dunphy via cctalk <cctalk@classiccmp.org> wrote:
> Defects in the ASCII code table. This was a great improvement at the time,
> but fails to implement several utterly essential concepts. The lack of these
> concepts in the character coding scheme underlying virtually all information
> processing since the 1960s, was unfortunate. Just one (of many) bad
> consequences has been the proliferation of 'patch-up' text coding schemes
> such as proprietary document formats (MS Word for eg), postscript, pdf, html
> (and its even more nutty academia-gone-mad variants like XML), UTF-8, unicode
> and so on.

This is fascinating stuff and I am very interested to see how it comes out, but I think there is a problem here which I wanted to highlight.

The thing is this: you seem to be discussing what you perceive as _general_ defects in ASCII, but I think they are not _general_ defects. They are specific to your purpose, and I don't know exactly what that is, but I have a feeling it is not a general, overall, universal goal.

Just consider what "A.S.C.I.I." stands for.

[1] It's American. Yes, it has lots of issues internationally, but it does the job well for American English. As a native English speaker I rue the absence of £, and the fact that Americans are so unfamiliar with the symbol that they even appropriated its name for the unrelated #, which already had a perfectly good name of its own. But ASCII is American, and Americans don't use £. Fine.

[2] The "I.I." bit. Historical accidents aside (vestigial traces of specific obsolete hardware implementations), it's _not a markup language_. Its function is unrelated to those of HTML or XML or anything like that. It's for "information interchange". That means from one computer or program to another computer or program. It's an encoding, and that's all. We needed a standard one. We got it. It has flaws, many flaws, but it worked.

No, it doesn't contain æ and å and ä and ø and ö. That's a problem for Scandinavians.
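(An aside of my own, not from the original mail: the point about æ/å/ä/ø/ö is easy to demonstrate in any language with explicit codecs. ASCII simply assigns those characters no code point at all, while an 8-bit set like Latin-1 gives each one byte and UTF-8 gives each two. A minimal Python sketch:)

```python
text = "æåäøö"

# ASCII has no code points for these characters, so encoding fails outright.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII cannot encode it:", exc.reason)

# Latin-1 (ISO 8859-1): one byte per character.
print(text.encode("latin-1").hex())  # e6e5e4f8f6

# UTF-8: two bytes per character for this range.
print(text.encode("utf-8").hex())    # c3a6c3a5c3a4c3b8c3b6
```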
It doesn't contain š and č and ṡ and ý (among others), and that's a problem for Roman-alphabet-using Slavs.

Even broadening the discussion to 8-bit ANSI... it does have a very poor way of encoding é and à and so on, which indicates the relative importance of Latin-language users in the Americas compared to Slavs and so on.

But markup languages, formatting, control signalling, all that sort of stuff is a separate discussion from encoding standards. Attempt to bring them into encoding systems and the problem explodes in complexity and becomes insoluble.

Additionally, it also makes a bit of a mockery of OSes focussed on raw text streams, such as Unix. And whereas I am no great lover of Unix, it does provide me with a job, and fewer headaches than Windows.

So, overall, all I wanted to say was: identify the problem domain specifically, and work out how to separate it from other, *overlapping* domains, before attacking ASCII for weaknesses that are not actually weaknesses at all but indeed strengths for a lot of its use-cases.

That said, I'd really like to read more about this project. It looks like it peripherally intersects with one of my own big ones.

--
Liam Proven - Profile: https://about.me/liamproven
Email: lpro...@cix.co.uk - Google Mail/Hangouts/Plus: lpro...@gmail.com
Twitter/Facebook/Flickr: lproven - Skype/LinkedIn: liamproven
UK: +44 7939-087884 - ČR (+ WhatsApp/Telegram/Signal): +420 702 829 053