Re: Punched tape (was: "Re: American English translation of character names")
António Martins-Tuválkin wrote:

> Anyway -- it was space for three holes, the small hole for the
> tractor wheel, and space for four more, IIRC.
>
> |O OoOO |
> ...
>
> Any bells ringing? Wouldn't this be a nice "complete" set of chars to
> be encoded, a la Braille patterns?...

This is not so much a script as a UTF. (In fact, Ken Whistler has already done something similar as a joke; search the Unicode mailing list archives for "BTF".)

The analogy with Braille is tempting, but Braille has mappings to many other alphabets besides the commonly seen English/Latin mapping. There is Cyrillic Braille, Hebrew Braille, kana Braille, etc. More importantly, there is the concept of "Grade 2 Braille", in which a single dot pattern or a combination of two or three is assigned a meaning that varies depending on context, and is not always mnemonically derived from the individual letters. Punched-tape codes and card codes don't have these characteristics.

You can find more codes for punched cards and tape, as well as internal codes for early computers, at Dik Winter's site:

http://homepages.cwi.nl/~dik/english/codes/

or at Roman Czyborra's site, rumored to be at http://czyborra.com but usually not available.

BTW, speaking of Roman, thanks to everyone who responded to my inquiries about his whereabouts. Actually, I admit I was primarily interested in what had happened to his *site*, since it was (and still is) usually unavailable.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
RE: Punched tape (was: "Re: American English translation of character names")
Your 7-bit paper tape system was rather unusual, actually, and was not a Telex system.

Telex systems used what was then termed a "5-level code"; i.e., 5 bits. The code was often called "Baudot" but its formal name was International Telegraph Alphabet #2 (ITA2). It was standardized by the International Telegraph Union (now the International Telecommunication Union, the second-oldest international treaty body and now a specialized agency of the UN).

ITA2 provided 32 code points. Several of these were reserved for special functions: carriage return, line feed, "letters" (forced a shift into letters case for subsequent characters), "figures" (forced a shift into figures and punctuation for subsequent characters), space and "blank". The remaining 26 code points represented A-Z (in letters case) and punctuation, digits, and other special symbols (vulgar fractions, meteorological symbols, bell signal, etc., depending on local conventions).

These systems used paper tape with 2 holes, a tractor hole for feeding the tape, and then 3 holes in a column; each of the 5 holes represented one of the bits in the 5-bit encoding.

Later, ASCII paper tape systems became common; many of these used 3 holes, a tractor hole, and 5 holes to represent 8-bit encodings (including a parity bit).

"7-level" tape systems were used in some special applications. Some of the ones that I encountered were based on ASCII without the 8th parity bit; others used special encodings to control typesetting equipment.

Two-level tape systems were used throughout the first six decades of the 20th century to key submarine telegraph cables.

All of these tape systems were mechanisms for storing information. They aren't alphabets.

-- Eric Scace

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of António Martins-Tuválkin
Sent: 2004 January 6 15:05
To: [EMAIL PROTECTED]
Subject: Punched tape (was: "Re: American English translation of character names")

On 2003.12.19, 00:24, Carl W. Brown <[EMAIL PROTECTED]> wrote:

> Jill,
>
>> I'm a programmer, and I'm older than most programmers. I'm old enough
>> to remember punched paper tape ...
<...>
> Yes I worked with paper tape as well. I even worked on one machine
> that I would write programs on paper tape loops.

Well, I'm only 34 but I did work with one of these, on a telex machine used as a terminal. I still keep some reels of tape, punched with some of my high school stuff.

Anyway -- it was space for three holes, the small hole for the tractor wheel, and space for four more, IIRC.

|O OoOO |
|O oOOO |
| OOo O O|
|OO oOO |
| O o OO|
|OO o|
|O Oo OO |
|O o|
| OOo OOO|
|O OoO |
| OOo OOO|
|O o OOO|

Any bells ringing? Wouldn't this be a nice "complete" set of chars to be encoded, a la Braille patterns?...

--
António MARTINS-Tuválkin
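
The hole layouts described in the message above (2 + tractor hole + 3 for 5-level tape, 3 + tractor hole + 5 for 8-level ASCII tape with a parity bit) can be illustrated with a minimal Python sketch. It only draws rows; it does not contain the ITA2 table. The function names, the choice of "O" and "o" for data and tractor holes, the decision to draw the most significant bit leftmost, and the use of even parity are all assumptions made for illustration, not something the thread specifies.

```python
def tape_row(value: int, left: int, right: int) -> str:
    """Draw one tape row: `left` data holes, the tractor hole, `right` data holes.

    'O' is a punched hole, ' ' is unpunched, 'o' is the tractor (sprocket) hole.
    Assumption: the most significant bit is drawn leftmost.
    """
    width = left + right
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    holes = ["O" if bit == "1" else " " for bit in format(value, f"0{width}b")]
    return "|" + "".join(holes[:left]) + "o" + "".join(holes[left:]) + "|"


def with_even_parity(code7: int) -> int:
    """Return the 7-bit code with an even-parity bit added as bit 8."""
    if not 0 <= code7 < 128:
        raise ValueError("expected a 7-bit code")
    parity = bin(code7).count("1") % 2
    return code7 | (parity << 7)


def five_level(value: int) -> str:
    """5-level (Telex-style) row: 2 holes, tractor hole, 3 holes."""
    return tape_row(value, left=2, right=3)


def eight_level(ch: str) -> str:
    """8-level row for one ASCII character: 3 holes, tractor hole, 5 holes."""
    return tape_row(with_even_parity(ord(ch)), left=3, right=5)


if __name__ == "__main__":
    for ch in "TAPE":
        print(eight_level(ch), ch)
```

A 7-level row of the kind António describes would be `tape_row(code, left=3, right=4)` under the same assumptions.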
Punched tape (was: "Re: American English translation of character names")
On 2003.12.19, 00:24, Carl W. Brown <[EMAIL PROTECTED]> wrote:

> Jill,
>
>> I'm a programmer, and I'm older than most programmers. I'm old enough
>> to remember punched paper tape ...
<...>
> Yes I worked with paper tape as well. I even worked on one machine
> that I would write programs on paper tape loops.

Well, I'm only 34 but I did work with one of these, on a telex machine used as a terminal. I still keep some reels of tape, punched with some of my high school stuff.

Anyway -- it was space for three holes, the small hole for the tractor wheel, and space for four more, IIRC.

|O OoOO |
|O oOOO |
| OOo O O|
|OO oOO |
| O o OO|
|OO o|
|O Oo OO |
|O o|
| OOo OOO|
|O OoO |
| OOo OOO|
|O o OOO|

Any bells ringing? Wouldn't this be a nice "complete" set of chars to be encoded, a la Braille patterns?...

--
António MARTINS-Tuválkin <[EMAIL PROTECTED]>
Rua Alberto Bramão, 8-1º d.to
PT-1700-132 LISBOA
+351 934 821 700
http://www.tuvalkin.web.pt/bandeira/
http://pagina.de/bandeiras/

I do not envy those who have carts, teams and estates;
I only envy those who drink the water of every spring.
RE: American English translation of character names
> particularly the 1930s, 40s, 50s, and 60s sections, and
> follow the many
> links from each entry. In particular, you can see the basic
> character set
> of the IBM 360 (as generated by the IBM 29 Card Punch) here:
>
> http://www.columbia.edu/acis/history/029.html
>
> (scroll down a bit after the photo).

http://www.unicode.org/Public/MAPPINGS/VENDORS/IBM/IBM360.TXT

404 Not Found :-)
RE: American English translation of character names
Jill,

>I'm a programmer, and I'm older than most
>programmers. I'm old enough to remember
>punched paper tape ... but not quite old
>enough to remember punched cards.

Don't feel bad. My first job was for IBM, helping them set up a production line for the 1401. This was a computer that had no vacuum tubes. All transistors. The plant manager was not happy because it was taking up valuable floor space that could have been used to build time clocks. After all, this was IBM's real business, not computers.

It used punched cards. However, when the 360 came out and IBM switched to EBCDIC, they changed the punch card encoding. The special characters changed to accommodate new characters like the not sign. The 024 keypunches had to be replaced. Be careful, because even after the 360, other companies like Control Data continued to use the old BCD punch card encoding.

Yes I worked with paper tape as well. I even worked on one machine that I would write programs on paper tape loops. You were very limited in what you could write. Branching was difficult.

The first APL system that I worked on was the 5101. It was a predecessor of the PC (5150). It had an APL keyboard. They also offered Basic, but the Basic was very limited. For example, strings were limited to 18 characters. Not bad for a $30,000+ machine.

Carl
RE: American English translation of character names
Philippe Verdy wrote:

> Isn't a caron a model (or trademark?) for crochet hooks?
> When I look at some handwritten texts using hacek, it looks much
> more like a rounded and oblique crochet hook than a
> reversed circumflex (as seen in Unicode charts).
>
> The handwritten hacek glyph looks approximately like this,
> it is completely rounded without the angular shape:
> (select a monospace font to view it)
>
> [ASCII-art sketch of the rounded glyph; spacing not preserved]
>
> It is easily read distinctly from the breve and acute accents,
> and it's not even a mirrored comma above.
> The glyph is visibly drawn as a continuous stroke from the
> middle-left to the thinner upper-right.

I should have noted also that this handwritten glyph is consistent with its possible notation on the right side of letters with large ascenders, notably d, L, l and t. This makes sense in that case, because this apostrophe is also more or less interpreted as a variant of the acute accent, and not simply a reversed circumflex.

"Hacek" (pronounced hatchek, with the 'h' aspirated, and with 'a' pronounced nearly like a short schwa) also means "little hook" in Czech... So the rounded "hook" glyph makes sense here, whereas the angular shape in Unicode charts is suspect and may have come from a historic misinterpretation of the Czech hatchek accent by other Latinists and typographers, who may have just borrowed the same metal shape used for the circumflex to print Czech texts.

If someone can find in a Czech library some old handwritten scripts, or even some source of Czech calligraphy, we could see whether the angular modern form of hacek corresponds to its initial shape.

__
<< ella for Spam Control >> has removed Spam messages and set aside Newsletters for me
You can use it too - and it's FREE! http://www.ellaforspam.com
Re: American English translation of character names
On Thursday 2003.12.18 04:05:53 -0800, Peter Kirk wrote:
> On 18/12/2003 02:51, Arcane Jill wrote:
>
> >...
> >In fact, until Kenneth Whistler's email about American English - I
> >actually thought the Unicode character names /were/ in American
> >English, because they are certainly not in my native dialect (although
> >I did know that most Americans don't say "full stop"). Rest assured,
> >Kenneth, we in Britain do /not/ refer to slash as "solidus",
> >underscore as "low line", backslash as "reverse solidus", paragraph
> >sign as "pilcrow sign", and so on. I have no idea where these terms
> >came from, but, take it from someone who lives here, they are not in
> >common usage in Britain. (With the exceptions of "full stop" and
> >"anticlockwise"). Curious -- I wonder where those "official" names
> >came from?
>
> They are not the names used by British programmers. But they are perhaps
> the names which were used by British typesetters, and maybe American
> ones too, in the old days of hot metal.
>
> >I've never attached any importance to the "proper" names (and I'm also
> >a programmer). In fact, I don't even see why a Unicode character /has/
> >to have a "proper name" at all. ASCII characters never had them. And,
> >hey - the official names for CJK Unified Ideographs Extension A (for

Hopefully most of you will agree that having official names for Unicode characters in ASCII-only English is very useful when various characters get discussed on mailing lists such as this one. It saves having to look up hex values endlessly, since many still don't have (or, as in my case, don't always have access to) Unicode-enabled email clients.

I personally think that it is an *interesting* omission that the CJK ideographs do not have meaningful names.
I'm probably going to be just opening up a can of worms by suggesting a meaningful CJK ideograph naming system (and I fully expect lots of comments back from the experts to the tune of "Yes, the CJK group considered all manner of things like this before, but it wouldn't work because of X, Y, and Z..." or "You really don't know what you are talking about"). But assuming that risk, I'm going to say it anyway and give some reasons for why I would do it this way:

A useful system for naming CJK ideographs would be to construct names by stringing together:

(1) An indicator if the character is simplified (SIMPLIFIED) or traditional (TRADITIONAL) for ideographs originating in China which come in both traditional and simplified forms, or an indicator for a variant form (VARIANT) if it is an encoded variant of another more commonly used glyph. Omit the indicator if the character of Chinese origin only comes in one form. If the character was "invented" by the Japanese, use "JAPANESE" as the indicator. If invented by the Koreans, use "KOREAN" as the indicator. If invented by the Vietnamese, use "VIETNAMESE" as the indicator.

(2) If the character is used in Chinese, then the primary pronunciation of the ideograph in modern standard Mandarin Chinese using pinyin, followed by a digit 1-4 to indicate the tone of the primary pronunciation. If the character does not appear in Chinese but rather was invented by the Japanese, Koreans, or historical Vietnamese, then provide the primary pronunciation in Japanese if used in Japan, Korean if used only in Korea, Vietnamese if used historically only in Vietnam.

(3) The primary meaning of the character in English according to the primary language in which that character appears.
For example:

爱 U+7231 SIMPLIFIED AI4 LOVE
愛 U+611B TRADITIONAL AI4 LOVE
戈 U+6208 GE1 SPEAR
為 U+70BA TRADITIONAL WEI2 TO BE
爲 U+7232 VARIANT WEI2 TO BE
圓 U+5713 TRADITIONAL YUAN2 CIRCLE
円 U+5186 JAPANESE EN YEN

Standardized names such as these, at least for the BMP CJK characters, would make it pretty clear to most knowledgeable readers what characters were being discussed even when unable to see the glyphs for whatever reasons.

Perhaps more importantly, if this were in the Unihan database, which is the database that most developers are going to access first, it would be trivial to query out various useful subsets of ideographs, such as the TRADITIONAL vs. SIMPLIFIED (vs. the "Doesn't change" subset), or those that are uniquely JAPANESE, etc.

I'm not saying it would be the complete solution for everything -- of course not. But it would put this information "at one's fingertips", so to speak, in a prominent database that many people look at.

> >example) tell me nothing more than the script and codepoint anyway. I
> >tend to regard them as "comments".
>
> Agreed. The names are useful for selecting a character from a drop-down
> list. But they are only useful if they are accurate. I agree with Doug
> that "As a programmer, I can't personally imagine designing a program
> that relies on the Unicode names to identify characters uniquely". I
> suspect that
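
The naming scheme proposed in this message (optional indicator, then reading, then English meaning) and the subset queries it mentions can be sketched in a few lines of Python. This is only an illustration of the proposal as worded here: `ProposedName`, `EXAMPLES` and `subset` are hypothetical names, the data is just the seven examples from the message, and nothing like this exists in the Unihan database.

```python
from typing import List, NamedTuple, Optional


class ProposedName(NamedTuple):
    """One entry in the hypothetical naming scheme from this message."""
    codepoint: int
    indicator: Optional[str]  # SIMPLIFIED, TRADITIONAL, VARIANT, JAPANESE, ... or None
    reading: str              # e.g. pinyin plus tone digit, such as "AI4"
    meaning: str              # primary meaning in English

    def name(self) -> str:
        # Rule (1) says the indicator is omitted when it does not apply.
        parts = [self.indicator, self.reading, self.meaning]
        return " ".join(p for p in parts if p)


# The seven examples given in the message.
EXAMPLES = [
    ProposedName(0x7231, "SIMPLIFIED", "AI4", "LOVE"),
    ProposedName(0x611B, "TRADITIONAL", "AI4", "LOVE"),
    ProposedName(0x6208, None, "GE1", "SPEAR"),
    ProposedName(0x70BA, "TRADITIONAL", "WEI2", "TO BE"),
    ProposedName(0x7232, "VARIANT", "WEI2", "TO BE"),
    ProposedName(0x5713, "TRADITIONAL", "YUAN2", "CIRCLE"),
    ProposedName(0x5186, "JAPANESE", "EN", "YEN"),
]


def subset(entries: List[ProposedName], indicator: Optional[str]) -> List[ProposedName]:
    """Return the entries carrying the given indicator (None = 'doesn't change')."""
    return [e for e in entries if e.indicator == indicator]


if __name__ == "__main__":
    for entry in EXAMPLES:
        print(f"U+{entry.codepoint:04X} {entry.name()}")
```

With the data in this shape, `subset(EXAMPLES, "TRADITIONAL")` is the kind of query the message has in mind.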
RE: American English translation of character names
>Yes, I did both cards and punched paper tape as a teenager.
>
I did them too. Nothing to do with Unicode, but those who would like an introduction to punched cards and early computing (mainly IBM oriented) are welcome to take a look at this:

http://www.columbia.edu/acis/history/

particularly the 1930s, 40s, 50s, and 60s sections, and follow the many links from each entry. In particular, you can see the basic character set of the IBM 360 (as generated by the IBM 29 Card Punch) here:

http://www.columbia.edu/acis/history/029.html

(scroll down a bit after the photo).

And for a fascinating (to some :-) history of the early development of IBM and ASCII character sets, see:

Mackenzie, Charles E., Coded Character Sets, History and Development, Addison-Wesley (1980).

It might be surprising to learn that there was almost as much discussion, argument, and compromise over the early 64- and 96-character and 8-bit character sets as there is today over the worldwide Universal Character Set. Well, maybe not so surprising since the demand for including characters was so great and the space so small.

- Frank
RE: American English translation of character names
Hi Jill -- I'll try to answer your questions.

Yes, I did both cards and punched paper tape as a teenager. In fact, I used paper tape on Teletype Corp model 33 ASR teleprinter machines. Sigh: didn't even think I was dating myself that badly *grin*.

I was lucky: my father got involved in computer programming very early on and he was thoughtful enough to teach me a couple of languages, carry my card decks to work, bring print-outs home, etc. Slow turnaround but...

Mechanical teleprinter machines were a special treat for me. I fell in love with them as a 14 year old kid when I saw them at the local meteorological office. I got some of my own around age 16, and learned how to maintain and repair them. When I got to college, I earned a lot of spending money as a free-lance repairman for all those Model 33 and 35 Teletype machines used as computer consoles in laboratories around campus.

As best as I recall, some versions of IBM 360 FORTRAN compilers and PL/C used the not symbol. It was represented on punched cards as an L-shaped character. Imagine an uppercase L, rotated 90° clockwise, and then reflected around the vertical axis so that the downward stroke is on the right. I haven't looked at the U+00AC glyph to see if it is the same.

If it is necessary to come up with some historical references, I'll check my college programming course material. I think the keypunch machines that produced the cards were known as IBM 2714s.

I think C chose "!" as the negation operator (to be precise) because it was a widely-available glyph on common keyboards which did not yet have a meaning assigned to it. But it's all a bit arbitrary, this assignment of programming operators to glyphs.

And... on the other aspects of the thread about keyboard layouts... laptop keyboards are often laid out rather differently when it comes to the less frequently used punctuation and diacritical marks.
This makes it quite entertaining when one jumps between American, German, Finnish and French laptop computers, as I was forced to do on a recent trip to Albania.

-- Eric Scace

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Arcane Jill
Sent: 2003 December 18 11:36
To: [EMAIL PROTECTED]
Subject: RE: American English translation of character names

> -----Original Message-----
> From: Eric Scace [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 18, 2003 3:57 PM
> To: John Cowan; Arcane Jill
> Cc: [EMAIL PROTECTED]
> Subject: RE: American English translation of character names
>
> The logical "not" glyph got into EBCDIC because the
> concept was needed in computer programming.

I'm a programmer, and I'm older than most programmers. I'm old enough to remember punched paper tape ... but not quite old enough to remember punched cards. I am interested in this, though. Could you possibly clarify which computer language used (the EBCDIC equivalent of) U+00AC? I only ask because I'm not aware of one, and I'm intrigued.

> In the late 1970s the C programming language was one of
> the first to use the glyph "!" to mean logical "not"; e.g.,
> "!=".

"!" is used to mean "logical not" in contexts other than just "not equal". As in, for example:

    bool b1 = ! b2;

(although there wasn't a bool type back then). I remember that BASIC used the keyword "NOT" for the same purpose. C also uses "~" as a "bitwise not".

So ... let me see if I have understood you correctly, because this is a tad confusing (but very interesting). You are saying that ... in the days of punched cards ... there was an EBCDIC code whose meaning was LOGICAL NOT. So far so good - but how would such a character code have been written? Was it written like the U+00AC glyph is now? Or did its visual appearance vary depending on who was writing it? Or ... did it even have a visual appearance at all?

I figure that, if it didn't have the visual appearance of the U+00AC glyph then "logical not" would map better to Unicode character U+223C TILDE OPERATOR (also known as "not", according to the code charts) which at least looks like the character mathematicians use. On the other hand, if it did have U+00AC appearance then fair enough.

> etc). Earlier keyboard languages used a different
> workaround; e.g., "<>" for "not equal".

Yeah, I always wondered why C chose to deploy ! to mean "not". Weird. Maybe they just picked a character at random and said "Ah yes - we'll use that one - no-one else seems to be using it for anything"

Jill
RE: American English translation of character names
Arcane Jill wrote:

> You are saying that ... in the days of punched cards ... there was an
> EBCDIC code whose meaning was LOGICAL NOT. So far so good - but how
> would such a character code have been /written/? Was it written like
> the U+00AC glyph is now?

Yes, exactly the same.

It appeared in original EBCDIC in 1964. See http://homepages.cwi.nl/~dik/english/codes/stand.html#ebcdic

It appeared on IBM mainframe terminal keyboards. It still appears on terminals in an EBCDIC environment.

Jim Allan
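
For readers who want to see the not sign in an EBCDIC code page today, here is a minimal Python sketch. It assumes Python's built-in `cp037` codec (EBCDIC US/Canada) as a stand-in; that is a later code page, not the 1964 EBCDIC discussed above, and other EBCDIC code pages place the character elsewhere.

```python
import unicodedata

NOT_SIGN = "\u00ac"  # U+00AC NOT SIGN

# Encode the not sign in one EBCDIC code page (cp037, an assumption of this
# sketch) and in ISO 8859-1, then decode the EBCDIC byte back again.
ebcdic_byte = NOT_SIGN.encode("cp037")
latin1_byte = NOT_SIGN.encode("latin-1")
round_trip = ebcdic_byte.decode("cp037")

if __name__ == "__main__":
    print("cp037:  ", ebcdic_byte.hex())
    print("latin-1:", latin1_byte.hex())
    print("name:   ", unicodedata.name(round_trip))
```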
RE: American English translation of character names
Michael Everson writes:

> >John Cowan wrote:
> >> The most mysterious term is "caron" for the hacek accent: this word
> >> seems to exist only in ISO standards, and nobody has any idea where it
> >> came from.
>
> This doesn't make any sense to me, but in any case it does not
> explain the origin of the word "caron". The most plausible suggestion
> I've ever come up with is folk-etymological: It's a CARet that sits
> ON the vowel. :-(

Isn't a caron a model (or trademark?) for crochet hooks? When I look at some handwritten texts using hacek, it looks much more like a rounded and oblique crochet hook than a reversed circumflex (as seen in Unicode charts).

The handwritten hacek glyph looks approximately like this; it is completely rounded without the angular shape (select a monospace font to view it):

[ASCII-art sketch of the rounded glyph; spacing not preserved]

It is easily read distinctly from the breve and acute accents, and it's not even a mirrored comma above. The glyph is visibly drawn as a continuous stroke from the middle-left to the thinner upper-right.
RE: American English translation of character names
John Cowan wrote:

> In the New York City subway system (of underground trains, that is,
> not underground pedestrian tunnels!), this letter has been
> consistently avoided since 1967, when the system of distinguishing trains
> by letter or number was instituted. The only other letters never used are
> I and O (presumably to avoid confusion with 1 and 0, though 0 has never
> been used either), and Y. Why Y is a mystery to me: perhaps there has
> simply never been a need for it.

Probably, having to get train "Why?" to reach one's workplace could have a negative effect on employees' attitude towards hard work.

_ Marco
RE: American English translation of character names
> -----Original Message-----
> From: Eric Scace [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 18, 2003 3:57 PM
> To: John Cowan; Arcane Jill
> Cc: [EMAIL PROTECTED]
> Subject: RE: American English translation of character names
>
> The logical "not" glyph got into EBCDIC because the
> concept was needed in computer programming.

I'm a programmer, and I'm older than most programmers. I'm old enough to remember punched paper tape ... but not quite old enough to remember punched cards. I am interested in this, though. Could you possibly clarify which computer language used (the EBCDIC equivalent of) U+00AC? I only ask because I'm not aware of one, and I'm intrigued.

> In the late 1970s the C programming language was one of
> the first to use the glyph "!" to mean logical "not"; e.g.,
> "!=".

"!" is used to mean "logical not" in contexts other than just "not equal". As in, for example:

    bool b1 = ! b2;

(although there wasn't a bool type back then). I remember that BASIC used the keyword "NOT" for the same purpose. C also uses "~" as a "bitwise not".

So ... let me see if I have understood you correctly, because this is a tad confusing (but very interesting). You are saying that ... in the days of punched cards ... there was an EBCDIC code whose meaning was LOGICAL NOT. So far so good - but how would such a character code have been written? Was it written like the U+00AC glyph is now? Or did its visual appearance vary depending on who was writing it? Or ... did it even have a visual appearance at all?

I figure that, if it didn't have the visual appearance of the U+00AC glyph then "logical not" would map better to Unicode character U+223C TILDE OPERATOR (also known as "not", according to the code charts) which at least looks like the character mathematicians use. On the other hand, if it did have U+00AC appearance then fair enough.

> etc). Earlier keyboard languages used a different
> workaround; e.g., "<>" for "not equal".

Yeah, I always wondered why C chose to deploy ! to mean "not". Weird. Maybe they just picked a character at random and said "Ah yes - we'll use that one - no-one else seems to be using it for anything"

Jill
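
The distinction drawn in this message between logical "not", bitwise "not" and "not equal" can be shown in a minimal sketch. It is written in Python rather than C, so the spellings differ: Python's `not` plays the role of C's `!`, while `~` and `!=` are spelled the same in both. The variable names are arbitrary.

```python
# Logical not: C writes `b1 = !b2;`, Python spells it with a keyword,
# much like the BASIC "NOT" mentioned above.
b2 = True
b1 = not b2

# Bitwise not: flips every bit; spelled `~` in both C and Python.
x = 0b0101
flipped = ~x           # Python integers are unbounded, so this is -(x + 1)
low_nibble = ~x & 0xF  # keep only the low four bits of the complement

# "Not equal": `!=` in C and Python; some earlier languages wrote `<>`.
differs = x != 3

if __name__ == "__main__":
    print(b1, flipped, bin(low_nibble), differs)
```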
RE: American English translation of character names
Arcane Jill wrote:

> (Incidentally, the code charts for U+00AC (NOT SIGN) also say "= angled
> dash (in typography)." So I'm still a bit confused about in which
> discipline it is actually known as "not sign").

The not sign is often used in logical notation in Boolean algebra or sentential logic. See http://whatis.techtarget.com/definition/0,,sid9_gci843775,00.html

Other conventions are often used instead, especially use of the tilde. I believe, but could be mistaken, that use of tilde for "logical not" is older usage and that the specific "logical not" sign was introduced as a substitution because the tilde most often suggests approximation in mathematical use.

The not sign is used on the IBM mainframe platform in some computer languages, notably REXX. See http://www.ilook.fsnet.co.uk/rexx/rexcmdc5.htm

The backslash was also given the meaning "logical not" in REXX at some stage as an alternate in environments where the "logical not" sign was not available. Versions of REXX adapted to ASCII generally replace the "logical not" sign by either ~ or ^, or allow either as well as recognizing the backslash.

See also http://www.uwm.edu/IMT/Computing/sasdoc8/sashtml/mindex/sc-index.htm for its use in another computer language.

Use of ^ meaning "logical not" generally derives from the use of "^" as a translation of the proper not sign in text files from EBCDIC to ASCII, where the two symbols are normally equated. For example, from http://www.printek.com/products/autoforms.html

<< The following commands use the logical not (¬) sign or a caret (^). IBM terminals generally have the logical not sign. PC's running a terminal emulation program have a caret. In either case, both characters are a shift 6 on the keyboard. >>

Jim Allan
Re: American English translation of character names
At 09:01 -0500 2003-12-18, John Cowan wrote:

> "Underscore" would suggest rather U+0332, the combining low line.
> As for "pilcrow", it's probably descended from a perversion of
> "paragraph", but nobody knows for sure.

The OED gives other forms for it: 15th-century pylcraft(e), pilecrafte; 16th-century pilcrowe; 17th-century pilkrow, pill-crow, peelcrow, pilgrow.

Apparently for pilled crow, cf. pilcord, pilgarlic. The application of the word, with the form pylcraft, has suggested that it originated in a perversion of PARAGRAPH, through pargrafte, *parcrafte, etc.: cf quote c 1460 and 1617. But the history of the word is obscure, and evidence is wanting.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
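
The formal names debated in this thread can be looked up directly. A minimal Python sketch using the standard `unicodedata` module follows; the helper name `formal_name` and the choice of characters are just for illustration.

```python
import unicodedata


def formal_name(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character, using the Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Characters whose formal names came up in the thread:
# slash, underscore, backslash, paragraph sign, period, combining low line.
DISCUSSED = ["/", "_", "\\", "\u00b6", ".", "\u0332"]

if __name__ == "__main__":
    for ch in DISCUSSED:
        print(formal_name(ch))
```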
RE: American English translation of character names
Arcane Jill wrote:

> Or, indeed, why the "proper" name for a character must be in
> English, and spellable in ASCII, instead of, say, Japanese.

The names are in English in the English version of the standard. The French version of 10646 appropriately has French names, not restricted to ASCII but to the repertoire of ISO 8859-15. See http://iquebec.ifrance.com/hapax/ListeDesNoms-4.0.0.txt (work in progress).

-- François
RE: American English translation of character names
The logical "not" glyph got into EBCDIC because the concept was needed in computer programming. An example is the instruction that if A does not equal B, then do something. IBM picked up the glyph and incorporated it into its punch card systems.

In the late 1970s the C programming language was one of the first to use the glyph "!" to mean logical "not"; e.g., "!=". This was a response to the use of mechanisms other than punch cards to enter program instructions (keyboards and CRTs, teletypewriters, etc). Earlier keyboard languages used a different workaround; e.g., "<>" for "not equal".

(Apologies if this duplicates earlier information; I'm jumping into the thread rather late.)

-- Eric Scace

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John Cowan
Sent: 2003 December 18 09:31
To: Arcane Jill
Cc: [EMAIL PROTECTED]
Subject: Re: American English translation of character names

Arcane Jill scripsit:

> For example, U+00AC (NOT
> SIGN) is something that most people I know would describe in terms like
> "oh - you know - that character to the left of 'one' on a keyboard, if
> you press shift".

On the standard U.S. keyboard, that gesture generates ~. If I turn on the U.S.-International keyboard, then RightAlt-\ gives me the NOT SIGN, where \ is the rightmost key in the QWERTYUIOP row.

> (And even then, the usual response is "Oh that one -
> I've never used it. What's it for?". Curiously, to a mathematician,
> tilde, overscore and prime are used in various contexts to mean "not
> sign"; whereas to a programmer, exclamation mark and tilde could mean
> "not sign".

In this case, it's logicians who use U+00AC for "not", or at least some of them. It got into EBCDIC, I don't know exactly how, and from there into ISO 8859-1.

> Curiously, [the BBC] never had the same problem
> with the name of the letter P.

In the New York City subway system (of underground trains, that is, not underground pedestrian tunnels!), this letter has been consistently avoided since 1967, when the system of distinguishing trains by letter or number was instituted. The only other letters never used are I and O (presumably to avoid confusion with 1 and 0, though 0 has never been used either), and Y. Why Y is a mystery to me: perhaps there has simply never been a need for it.

--
John Cowan <[EMAIL PROTECTED]>
http://www.reutershealth.com
http://www.ccil.org/~cowan
The Imperials are decadent, 300 pound free-range chickens (except they have teeth, arms instead of wings and dinosaurlike tails). --Elyse Grasso
RE: American English translation of character names
At 16:21 +0100 2003-12-18, Philippe Verdy wrote:

> John Cowan wrote:
>> The most mysterious term is "caron" for the hacek accent: this word
>> seems to exist only in ISO standards, and nobody has any idea where it
>> came from.
>
> I think it may have occurred in some typographic terminology, because
> the initial glyph looked more like a crochet hook than a reversed
> circumflex, i.e. caron was not angular in handwritten form, as it is
> now in typeset fonts, but looked like a rounded and oblique check mark
> (a slight variation of the acute accent with a small rounded hook on
> its bottom end, but still much more distinct from the lower half-circle
> form used by breve).

This doesn't make any sense to me, but in any case it does not explain the origin of the word "caron". The most plausible suggestion I've ever come up with is folk-etymological: It's a CARet that sits ON the vowel. :-(

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: American English translation of character names
John Cowan wrote:

> The most mysterious term is "caron" for the hacek accent: this word
> seems to exist only in ISO standards, and nobody has any idea where it
> came from.

I think it may have occurred in some typographic terminology, because the initial glyph looked more like a crochet hook than a reversed circumflex, i.e. caron was not angular in handwritten form, as it is now in typeset fonts, but looked like a rounded and oblique check mark (a slight variation of the acute accent with a small rounded hook on its bottom end, but still much more distinct from the lower half-circle form used by breve).
Re: American English translation of character names
On Thu, 18 Dec 2003 09:30:42 -0500, John Cowan wrote:

> In this case, it's logicians who use U+00AC for "not", or at least some of them. It got into EBCDIC, I don't know exactly how, and from there into ISO 8859-1.

Wasn't it used for that purpose in APL?

John.

--
Over 2000 webcams from ski resorts around the world - www.snoweye.com
Translate your technical documents and web pages - www.tradoc.fr
Re: American English translation of character names
Arcane Jill scripsit:

> For example, U+00AC (NOT SIGN) is something that most people I know would describe in terms like "oh - you know - that character to the left of 'one' on a keyboard, if you press shift".

On the standard U.S. keyboard, that gesture generates ~. If I turn on the U.S.-International keyboard, then RightAlt-\ gives me the NOT SIGN, where \ is the rightmost key in the QWERTYUIOP row.

> (And even then, the usual response is "Oh that one - I've never used it. What's it for?".) Curiously, to a mathematician, tilde, overscore and prime are used in various contexts to mean "not sign"; whereas to a programmer, exclamation mark and tilde could mean "not sign".

In this case, it's logicians who use U+00AC for "not", or at least some of them. It got into EBCDIC, I don't know exactly how, and from there into ISO 8859-1.

> Curiously, [the BBC] never had the same problem with the name of the letter P.

In the New York City subway system (of underground trains, that is, not underground pedestrian tunnels!), this letter has been consistently avoided since 1967, when the system of distinguishing trains by letter or number was instituted. The only other letters never used are I and O (presumably to avoid confusion with 1 and 0, though 0 has never been used either), and Y. Why Y is a mystery to me: perhaps there has simply never been a need for it.

--
John Cowan <[EMAIL PROTECTED]>
http://www.reutershealth.com
http://www.ccil.org/~cowan
The Imperials are decadent, 300 pound free-range chickens (except they have teeth, arms instead of wings and dinosaurlike tails). --Elyse Grasso
Re: American English translation of character names
Arcane Jill scripsit:

> In fact, until Kenneth Whistler's email about American English - I actually thought the Unicode character names /were/ in American English, because they are certainly not in my native dialect (although I did know that most Americans don't say "full stop").

My father and I never could convince my mother (native German speaker who immigrated to the U.S. at age 12) that the football (i.e. American rugby) player she dated in high school was a "fullback" and not a "full stop".

> Rest assured, Kenneth, we in Britain do /not/ refer to slash as "solidus", underscore as "low line", backslash as "reverse solidus", paragraph sign as "pilcrow sign", and so on.

"Solidus" is probably the most interesting one: it's Latin for "shilling", and until 1971 the usual way of writing "six shillings eightpence" was 6/8, i.e. "sex solidi octo denarii". In this use, the / descends from U+017F, the old "long s". "Underscore" would suggest rather U+0332, the combining low line. As for "pilcrow", it's probably descended from a perversion of "paragraph", but nobody knows for sure.

The most mysterious term is "caron" for the hacek accent: this word seems to exist only in ISO standards, and nobody has any idea where it came from.

--
John Cowan [EMAIL PROTECTED] www.reutershealth.com www.ccil.org/~cowan
Original line from _The Warrior's Apprentice_ by Lois McMaster Bujold: "Only on Barrayar would pulling a loaded needler start a stampede toward one."
English-to-Russian-to-English mangling thereof: "Only on Barrayar you risk to lose support instead of finding it when you threat with the charged weapon."
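For readers who want to check the formal names discussed in this thread, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `formal_names` and the selection of code points are illustrative assumptions, not something taken from the original posts, and the result depends on the Unicode data bundled with the Python build in use.

```python
import unicodedata

# Code points mentioned in the thread (the selection is illustrative).
CODE_POINTS = [
    0x002E,  # "full stop" / "period"
    0x002F,  # "slash"
    0x005C,  # "backslash"
    0x005F,  # "underscore"
    0x00AC,  # the "not" character
    0x00B6,  # paragraph sign
    0x017F,  # the old "long s"
    0x02C7,  # the hacek accent (spacing form)
    0x0332,  # combining underline
]


def formal_names(code_points):
    """Map each code point to its formal Unicode character name."""
    return {cp: unicodedata.name(chr(cp)) for cp in code_points}


if __name__ == "__main__":
    # Print one "U+XXXX  NAME" line per code point.
    for cp, name in formal_names(CODE_POINTS).items():
        print(f"U+{cp:04X}  {name}")
```

Running it lists the formal name next to each code point, which makes it easy to compare the standard's terminology with everyday usage.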
RE: American English translation of character names
Thanks, that's interesting. It may well be the case that printers, typesetters, etc., are the only people who actually need these things to have names, so I guess their names should be respected. The rest of us just seem to get by without them, somehow.

For example, U+00AC (NOT SIGN) is something that most people I know would describe in terms like "oh - you know - that character to the left of 'one' on a keyboard, if you press shift". (And even then, the usual response is "Oh that one - I've never used it. What's it for?".) Curiously, to a mathematician, tilde, overscore and prime are used in various contexts to mean "not sign"; whereas to a programmer, exclamation mark and tilde could mean "not sign". Do printers, typesetters, editors and publishers use U+00AC to actually mean "not sign" then, or is it an arbitrary name? (Incidentally, the code charts for U+00AC (NOT SIGN) also say "= angled dash (in typography)." So I'm still a bit confused about in which discipline it is actually known as "not sign").

Going back to the American English point, our terms for things are really not so far apart. "Counterclockwise" sounds just as acceptable to my ears as "Anticlockwise". I confess that "period" still sounds weird to my ears, but every programmer calls that character "dot" anyway. In short, Kenneth's "translation into American" is more understandable to me, in Britain, than the original.

Okay, so we now have an explanation - they are typesetters' terms. (I don't know if they are British or American, but don't think it really matters, now that we've established that the majority of the population don't use them).

As an amusing aside, when character names migrated from programmers to the general public via BBC television (because TV presenters started having to read out email addresses and URIs), they purposefully started a new trend of referring to slash (solidus) as "right-slash" or "forward-slash". Everyone else had called it "slash" for as long as I could remember, but the BBC couldn't allow their presenters to say "slash" because (in Britain, at least), the verb 'to slash' is a slang term meaning 'to urinate'. Curiously, they never had the same problem with the name of the letter P.

Jill

> -----Original Message-----
> From: Séamas Ó Brógáin [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 18, 2003 12:05 PM
> To: Unicode-L
> Subject: RE: American English translation of character names
>
> Jill Ramonsky wrote:
>
> > . . . I have no idea where these terms came from, but, take it from
> > someone who lives here, they are not in common usage in Britain.
>
> If you were a printer, typesetter, editor or publisher---i.e. one of
> those who _use_ all these characters and therefore must have names for
> them---you would probably be more familiar with traditional terminology.
>
> Séamas Ó Brógáin
RE: American English translation of character names
> Or, indeed, why the "proper" name for a character must be in English, and spellable in ASCII, instead of, say, Japanese.

Because it's an English character list; limiting the use of the list to those who know 15 languages wouldn't be of much help. And ASCII, because once you've restricted it to English, it's not much of a restriction, and there are few channels where ASCII gets restricted, but many where arbitrary UTF-8 isn't accepted.

> In fact, I don't even see why a Unicode character /has/ to have a "proper name" at all.

Because a great pain of Unicode is the lack of a standard JIS X0218-Unicode mapping, and part of the reason is that JIS X0218 is a glyph standard without proper names and definitions of what the characters are.

> ASCII characters never had them.

http://www.itscj.ipsj.or.jp/ISO-IR/006.pdf (ISO 646, USA Version X3.4 - 1968) certainly seems to have them.

> And, hey - the official names for CJK Unified Ideographs Extension A (for example) tell me nothing more than the script and codepoint anyway.

And they are the exceptions to the rules.
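The contrast drawn above between descriptive names and the CJK Unified Ideographs Extension A names can be illustrated with a minimal sketch, assuming Python's standard `unicodedata` module. The function name `describe` is hypothetical and chosen only for this example.

```python
import unicodedata


def describe(ch):
    """Return a 'U+XXXX NAME' string for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # An ASCII letter carries a descriptive name.
    print(describe("A"))
    # An Extension A ideograph's name is derived from its code point.
    print(describe("\u3400"))
    # The derived name still round-trips through a lookup by name.
    print(unicodedata.lookup("CJK UNIFIED IDEOGRAPH-3400") == "\u3400")
```

The ideograph's name carries only the block label and the code point, which matches the observation in the quoted text, while the ASCII letter's name describes the character itself.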
Re: American English translation of character names
On 18/12/2003 02:51, Arcane Jill wrote:

> ...
> In fact, until Kenneth Whistler's email about American English - I actually thought the Unicode character names /were/ in American English, because they are certainly not in my native dialect (although I did know that most Americans don't say "full stop"). Rest assured, Kenneth, we in Britain do /not/ refer to slash as "solidus", underscore as "low line", backslash as "reverse solidus", paragraph sign as "pilcrow sign", and so on. I have no idea where these terms came from, but, take it from someone who lives here, they are not in common usage in Britain. (With the exceptions of "full stop" and "anticlockwise"). Curious -- I wonder where those "official" names came from?

They are not the names used by British programmers. But they are perhaps the names which were used by British typesetters, and maybe American ones too, in the old days of hot metal.

> I've never attached any importance to the "proper" names (and I'm also a programmer). In fact, I don't even see why a Unicode character /has/ to have a "proper name" at all. ASCII characters never had them. And, hey - the official names for CJK Unified Ideographs Extension A (for example) tell me nothing more than the script and codepoint anyway. I tend to regard them as "comments".

Agreed. The names are useful for selecting a character from a drop-down list. But they are only useful if they are accurate. I agree with Doug that "As a programmer, I can't personally imagine designing a program that relies on the Unicode names to identify characters uniquely". I suspect that the issue is more that WG2 people who are not programmers decided on behalf of programmers, but without asking them, that stability of names would be a good thing. And maybe because they want to make sure their work lasts 1000 years.

Well, I don't want to be offensive to WG2 again, so I invite WG2 members to correct me on this and explain why stability of character names is considered so important. Don't just say "we promised stability so we must deliver"; I want to know why the promise was made and to whom. If the people to whom the promise was made don't actually want it, then maybe WG2 can be released from its unwise commitment.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
RE: American English translation of character names
Jill Ramonsky wrote:

> . . . I have no idea where these terms came from, but, take it from someone who lives here, they are not in common usage in Britain.

If you were a printer, typesetter, editor or publisher---i.e. one of those who _use_ all these characters and therefore must have names for them---you would probably be more familiar with traditional terminology.

Séamas Ó Brógáin
RE: American English translation of character names
> From: Christopher John Fynn [mailto:[EMAIL PROTECTED]]
> There is plenty of disagreement about what the "proper" name for many
> characters should be

Or, indeed, why the "proper" name for a character must be in English, and spellable in ASCII, instead of, say, Japanese.

> From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
> And, indeed, some of us have toyed around with the notion of
> publishing an American English translation of the Unicode
> names list, including such obvious improvements as:

In fact, until Kenneth Whistler's email about American English - I actually thought the Unicode character names /were/ in American English, because they are certainly not in my native dialect (although I did know that most Americans don't say "full stop"). Rest assured, Kenneth, we in Britain do /not/ refer to slash as "solidus", underscore as "low line", backslash as "reverse solidus", paragraph sign as "pilcrow sign", and so on. I have no idea where these terms came from, but, take it from someone who lives here, they are not in common usage in Britain. (With the exceptions of "full stop" and "anticlockwise"). Curious -- I wonder where those "official" names came from?

I've never attached any importance to the "proper" names (and I'm also a programmer). In fact, I don't even see why a Unicode character /has/ to have a "proper name" at all. ASCII characters never had them. And, hey - the official names for CJK Unified Ideographs Extension A (for example) tell me nothing more than the script and codepoint anyway. I tend to regard them as "comments".

Jill