Re: U+0140
On 2004.04.15, 19:47, Kenneth Whistler <[EMAIL PROTECTED]> wrote: > 0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L; 006C 00B7;... <...> > The character *was* in ISO 6937 for Catalan. And mistakenly so. > Noting the Catalan association in the Unicode names list is > different from any recommendation that U+0140 is the preferred > character for the representation of l followed by a middle dot in > Catalan text. But it is surely an excellent way to contribute to the (false) idea that Unicode doesn't serve the need of minority languages. :-( Is it so difficult to replace or remove that misleading indication?... --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
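The UnicodeData entry quoted above can be checked from a script. What follows is only a minimal sketch, assuming Python 3 and its standard unicodedata module; the variable names are illustrative and not from the thread. It shows that the mapping of U+0140 to l + U+00B7 is a compatibility decomposition, so only the NFKC/NFKD forms apply it.

```python
import unicodedata

L_MIDDLE_DOT = "\u0140"  # LATIN SMALL LETTER L WITH MIDDLE DOT

# The decomposition field carries the <compat> tag: this is a
# compatibility mapping, not a canonical one.
decomposition = unicodedata.decomposition(L_MIDDLE_DOT)

# Canonical normalization leaves U+0140 untouched ...
nfc = unicodedata.normalize("NFC", L_MIDDLE_DOT)
# ... while compatibility normalization replaces it with the
# two-character sequence l + U+00B7 MIDDLE DOT.
nfkc = unicodedata.normalize("NFKC", L_MIDDLE_DOT)

print(decomposition)
print([f"U+{ord(c):04X}" for c in nfkc])
```

In practice this means that text stored in NFC can still contain U+0140, so software that wants to treat both spellings alike has to fold them itself.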
Re: U+0140 Catalan middle-dot
On 2004.04.16, 09:40, Peter Kirk <[EMAIL PROTECTED]> wrote: > can you describe to me EXACTLY how the shape and behaviour of the > Catalan middle dot differs from the behaviour of U+2027 <...> This > strongly suggests that U+2027 is the appropriate character for > Catalan. Apparently U+2027 is indeed suitable for spelling Catalan, provided that it is not ignorable by search and matching routines -- like, f.i., a soft hyphen is. --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
Re: U+0140
Elaine Keown Tucson Dear Asmus Freytag and Ken Whistler: I would be pleased if you at Unicode would choose to further describe your existing middle dot collection. I have *no* interest in _more_ middle dots, enough is enough. Eventually I hope there *will* be helpful notes on which middle dot to use in Ancient Hebrew and/or Samaritan Hebrew and/or Dead Sea Scrolls Hebrew. Elaine Keown
Re: U+0140
On Saturday, April 17, 2004 10:28 PM TU+1, António Martins-Tuválkin wrote: >> As I wrote earlier, if you know the text under inspection is >> Catalan, a very simple regular expression will deal with that. Any >> half-decent Catalan word processor do it already, by the way. > > What about the odd Catalan phrase within a text in Guarani or > Cherokee? Then you do not know that the text under inspection is Catalan; the "if" does not hold, so you are not supposed to act accordingly. That is, nobody will blame you because a double click on col·legi does not select the whole word. Any reader can test their own word processor: please double-click the Catalan word above and see whether it is recognized as a single word, even though it is surrounded by bad English instead of Guarani! > Unicode, do not forget, supposedly brings correctness to > multilingual text... And then? Are you trying to say that selecting a word in multilingual text should always do the "right thing"? You are merely dreaming, I believe, and you know it perfectly well, having posted less than 2 minutes ago about the case of apostrophes, which are all but impossible to sort out in the average multilingual text. Furthermore, what "the right thing" is varies from person to person, so achieving perfection here is a mere dream. Or are you trying to make the point that inventing a new point for  in Catalan would bring any added correctness to multilingual texts?
It is certain that the compatibility encoding of U+0140 is not very welcome in my eyes, since:
- it is almost unused, but in case it does occur, programmers like me have to check for it: so it is just a waste of my time, I would say :-(
- someone who reads TUS and does not know Spanish usage in this respect might think that col·legi should be written coŀlegi, "co\u0140legi", because the former is not listed as a letter, and only the latter is annotated as "Catalan", without any mention of the "right thing to do"
- the only advantage I am able to see, namely that typographers will design the dot in U+0140 raised relative to its position in U+00B7, is not exploited in practice; we even see a lot of fonts where the dot in U+0140 is not balanced between the two l's, which clearly shows that the majority of typographers have no idea about the use of this character, and probably merely build it as a compound of U+006C and U+00B7... Others use a reduced size for the dot in U+0140 (which is unpleasing to my eyes). Only a few fonts provide U+0140 with a reduced width for the dot, which might be considered good typography.
Further note about typography: I have compared, in some widely available fonts, the layout of ŀl versus l·l and also the upper dot of the colon. I found that almost nobody uses the upper dot of the colon. One of the few I found, namely Linotype Palatino (I cite it since I generally consider it a nice design), does use the upper dot of the colon for ŀ. And the result is really ugly, because the dot is way too high (about 65% of l-height), thanks to the modern habit of higher x-heights... Antoine
Re: U+0140
At 03:49 PM 4/19/2004, Kenneth Whistler wrote: The Unicode Standard is not prescriptive about rendering, beyond the basics required to simply ensure correct mapping of textual content into streams of characters. If one font vendor wants to have a raised glyph for the MIDDLE DOT and another wants to have a lowered glyph for the same character, it is not the Unicode Standard's business to put the two vendors in a room until one gives up and admits the other one is correct. I'm sorry but that part of your answer is a bit disingenuous in the context of the issue most recently discussed on this thread. That involved the case of two characters 00B7 and 0387, which have been post-hoc unified via canonical equivalence. We are discovering that the vast majority of *multi-script* fonts make a distinction in glyph based on the character code (ignoring the canonical equivalence). This therefore is not the simple case of a Greek font using a higher dot for 00B7 as an ano teleia and a Latin font using a lower one for the mid dot. We clearly *do* see a variation of treatments of 00B7 across fonts, but in all cases that I've seen, these are intended as variations of the middle dot, not variations to accommodate the use of this character as ano teleia. In other words, we have an issue that the equivalence of identity of these two characters asserted by Unicode is fundamentally not respected by the implementers. And apparently it's not the case of a small minority. I think that kind of situation *is* a problem for the standard. A./
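The canonical equivalence of 0387 and 00B7 discussed above has a concrete consequence for stored text, which can be sketched as follows. This assumes Python 3 and its standard unicodedata module; the sample string and variable names are illustrative only.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00B7"  # MIDDLE DOT

# U+0387 has a singleton canonical decomposition to U+00B7, so every
# normalization form replaces it with the middle dot.
normalized = {
    form: unicodedata.normalize(form, ANO_TELEIA)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Greek text ending in an ano teleia no longer contains U+0387 once it
# has passed through a normalizing process.
greek = "\u03BA\u03B1\u03B9" + ANO_TELEIA
after_nfc = unicodedata.normalize("NFC", greek)

print({form: f"U+{ord(s):04X}" for form, s in normalized.items()})
print(ANO_TELEIA in after_nfc)
```

A font that draws the two code points differently will therefore render the same Greek text differently depending on whether it was normalized along the way.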
Re: U+0140
Peter Constable wrote: And if... someone finds a well documented script in which a true middle dot and an x-height dot are used contrastively, That would be a somewhat surprising and not-to-be-recommended design for a writing system. Not to be completely ruled out, though. But we can probably wait to cross that encoding bridge when we come to it. We already have contrasted use of a baseline dot (period or full stop) and a mid-dot (word separator or stylistic hyphen), so why would you be surprised by contrasted use of mid-dot and x-height dot? Vertical alignment is clearly sometimes a semantic feature. I've seen plenty of business cards in which the mid-dot is used as a stylistic division between parts of a telephone number instead of spaces, periods or hyphens. I don't like the style, but people do it. Presumably some Greek people do it also, in which case they are contrasting the mid-dot and the ano teleia. John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes In making him win. - Charles Peguy
Re: U+0140
Peter Kirk continued this... > On 19/04/2004 13:03, Kenneth Whistler wrote: > > >... Those other middle dots give > >people textual representation alternatives now, if they need to make > >distinctions, and textual rendering alternatives, if they need to make > >middle dots which display with slightly different heights, sizes, or > >spacings, depending on the rendering requirements. > > > > > > Ken, does Unicode specify height, size and spacing distinctions between > the various middle dots which you listed? No. > If I understand correctly, it > certainly doesn't do so exhaustively. Correct. > So in effect what you are > suggesting here is that people make and use their own private > distinctions between characters which are not defined by Unicode. Not at all. I am suggesting that people who use Unicode characters *will* use them according to their identity. However, that doesn't mean that identification of a character neatly solves all issues of their rendering, nor will it automatically make things neat and tidy when people use characters in different contexts which may have different rendering concerns. The Unicode Standard is not prescriptive about rendering, beyond the basics required to simply ensure correct mapping of textual content into streams of characters. If one font vendor wants to have a raised glyph for the MIDDLE DOT and another wants to have a lowered glyph for the same character, it is not the Unicode Standard's business to put the two vendors in a room until one gives up and admits the other one is correct. > This > sounds very like advising people to ignore Unicode character identiies > and properties and do their own thing. Rather strange advice from > someone in your position, surely? I love the way you put positions in peoples' mouths. 
By the way, I challenge you to point to the Unicode character properties in the Unicode Character Database which define the relative position for middle dots with respect to x-height of a font, or the spacing of middle dots, for example. > > Surely, in the current situation and if further proliferation of middle > dots is considered undesirable, It is undesirable, yes. > users should be advised to presume that > distinctions between middle dots are not a plain text matter No, they should not. Because the existence of multiple different middle dots in the standard which are *not* canonical equivalents of each other makes it a plain text matter. > and so > should be handled by markup, including language selection. In some cases, yes -- it depends on the effect which is intended, and the context and application it occurs in. > > And if (as I just suggested on the Hebrew list might be true of some > variant Hebrew pointing systems) someone finds a well documented script > in which a true middle dot and an x-height dot are used contrastively, > the correct approach would be either to accept, reluctantly, that at > least one new dot needs to be encoded; or else for Unicode to define > clearly which existing character should be used for which dot in this > script. Or: None of the Above The users of characters for particular domains bear their own responsibility to define their usage. It is not up to the Unicode Consortium to go around defining everyone's spelling rules and orthographic conventions for them. If there are things unclear in the standard which are making its use difficult for people in certain cases, then that is certainly a concern of the Unicode Technical Committee. 
And if someone brings in convincing evidence of the existence of a semantically significant plain text distinction between two dots that cannot plausibly be handled by *any* combination of the multitudinous dot characters already present in the standard, then the UTC might consider that sufficient justification to encode yet another middle dot. Given, however, the fact that there already are so many dot characters, and given that their rendering often varies by font, the chance of getting some additional pair of dot distinctions by height on the line canonized with yet another dot encoding seems unlikely to me. It is a will-'o-the-wisp to expect any and all multilingual Unicode text to display "correctly" to any arbitrary n-th degree of typographical rectitude with any and all Unicode-conformant fonts. The use of specific fonts with specific designs is *precisely* to enable plain text (or marked-up text, for that matter) to be displayed as desired for particular contexts. The criterion for Unicode plain text is basically *legible* text. > The worst thing that could happen would be for different text > providers to make different and incompatible selections among the > existing characters, leading to total confusion. But that seems to be > the approach which you, Ken, are advocating. I see. And thank you, Peter, for pointing that error out to me. Text providers have their own responsibility to ensure that they are using inte
RE: U+0140
> And if... someone finds a well documented script > in which a true middle dot and an x-height dot are used contrastively, That would be a somewhat surprising and not-to-be-recommended design for a writing system. Not to be completely ruled out, though. But we can probably wait to cross that encoding bridge when we come to it. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: U+0140
On 19/04/2004 13:03, Kenneth Whistler wrote: ... Those other middle dots give people textual representation alternatives now, if they need to make distinctions, and textual rendering alternatives, if they need to make middle dots which display with slightly different heights, sizes, or spacings, depending on the rendering requirements. Ken, does Unicode specify height, size and spacing distinctions between the various middle dots which you listed? If I understand correctly, it certainly doesn't do so exhaustively. So in effect what you are suggesting here is that people make and use their own private distinctions between characters which are not defined by Unicode. This sounds very like advising people to ignore Unicode character identities and properties and do their own thing. Rather strange advice from someone in your position, surely? Surely, in the current situation and if further proliferation of middle dots is considered undesirable, users should be advised to presume that distinctions between middle dots are not a plain text matter and so should be handled by markup, including language selection. And if (as I just suggested on the Hebrew list might be true of some variant Hebrew pointing systems) someone finds a well documented script in which a true middle dot and an x-height dot are used contrastively, the correct approach would be either to accept, reluctantly, that at least one new dot needs to be encoded; or else for Unicode to define clearly which existing character should be used for which dot in this script. The worst thing that could happen would be for different text providers to make different and incompatible selections among the existing characters, leading to total confusion. But that seems to be the approach which you, Ken, are advocating. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: U+0140
John Hudson responded to Michael Everson: > Michael Everson wrote: > > >> This would make the mid-dot too high. The top dot of the colon usually > >> sits toward the top of the x-height; the *mid*-dot should sit lower, > > John, I just don't believe you. I don't believe that in all the history > > of Greek and Catalan typography this careful hairsplitting has *always* > > taken place; certainly in scientific transcription the HALF TRIANGULAR > > COLON is just the top dot in the TRIANGULAR COLON, and in Americanist > > transcription where the dot-colons are used instead of triangles I would > > say the same applies. > > I never contested that the dots of a colon correspond to the triangles of the > linguistic > long vowel marker. They clearly do. What I contested was that the typographic > mid-point > (U+00B7) corresponded to the top dot of a colon. It clearly does not. It is called a > mid-point because it sits midway up the x-height. It is used in this position for a > variety of stylistic purposes, ... I think we have two typographers here arguing somewhat at cross-purposes. Clearly the typographic "mid-point" behaves as John has mentioned, and is designed as such in many fine fonts (examples seen among the exhibits that Asmus gathered). But just as clearly, there is a long, long tradition in Americanist orthographic practice (which is used widely for linguistic orthographies outside of Native America as well) of using a "raised dot" for an indication of vocalic (and occasionally consonantal) length. For 100 years, that raised dot was mechanically generated by, among other means, filing the lower dot off a colon key on a mechanical typewriter. (I have such a typewriter sitting on my desk.)
Linguists got used to this raised dot height, coordinated with a colon in design (which then could be used, among other things to indicate a prolonged length, when two degrees of length were in question), and that preference made its way into print, at least for many North American languages, where the raised dot could be printed at x-height, rather than at midway up the x-height, which would be too low for most of the linguistic usage. Enter the electronic age. ASCII had no MIDDLE DOT. It was period (.), colon (:) or the highway. Early linguistic material on computers made do with those, because they had no choice. The IBM PC and the Macintosh introduced a MIDDLE DOT (0xFA [= IBM CDRA SD63 "Middle Dot"] and 0xE1, respectively). When ISO 8859-1 was defined, it also had a MIDDLE DOT (0xB7). *Everybody* made use of that MIDDLE DOT for anything that was vaguely in the ballpark -- the typographical mid-point, the linguistic length mark, the mathematical multiplication operator, the Greek ano teleia, the dictionary hyphenation point, and, yes, the Catalan middle dot. The fact that each of those usages might have extremely fine typographical hairs to split regarding the rendering was so much horsepucky as far as the character identity was concerned. You used what you had available to represent your data. The Unicode Standard, for a variety of reasons -- some of which included compatibility mapping concerns to other standards which had started to proliferate middle dots -- added a collection of middle dots *besides* U+00B7, *the* middle dot, to its repertoire. Those other middle dots give people textual representation alternatives now, if they need to make distinctions, and textual rendering alternatives, if they need to make middle dots which display with slightly different heights, sizes, or spacings, depending on the rendering requirements. What is clear, however, is that it is utterly impossible to satisfy everybody regarding middle dots. 
Typographical purists will always want plain text to make more distinctions. Text processing requirements will abhor the splitting of text representation into more and more difficult-to- distinguish glyph representations without clear semantic differences. And dot proliferation *always* poses difficulty for establishing character properties. Before people bluster on too much further on this thread, it would be good for everyone to recall that the *reason* why U+00B7 has problematical properties is that it was inherently ambiguous in *preexisting* usage (that is, prior to Unicode altogether) as punctuation versus length mark (and other things as well). This puts it in the same grabbag of very difficult, ambiguous ASCII characters, such as "~", "*", and "'" which also acquired conflicting usages during their reign among the small set of available punctuation and symbols in ASCII. History has consequences. The history of a character's encoding also has consequences for how the Unicode Standard is to be used and interpreted. --Ken
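The property problem described above, punctuation versus length mark, shows up directly in text processing. The following is a rough sketch only, assuming Python 3 with its standard re and unicodedata modules; the sample strings "ta·ku" and "taˑku" are made up for illustration and the helper name words is hypothetical.

```python
import re
import unicodedata

candidates = {
    "MIDDLE DOT": "\u00B7",
    "MODIFIER LETTER TRIANGULAR COLON": "\u02D0",
    "MODIFIER LETTER HALF TRIANGULAR COLON": "\u02D1",
}

# General_Category decides whether a character counts as part of a word:
# U+00B7 is punctuation (Po), the modifier letters are letters (Lm).
categories = {name: unicodedata.category(ch) for name, ch in candidates.items()}


def words(text):
    """Split text into runs of word characters, as a naive tokenizer would."""
    return re.findall(r"\w+", text)


# The same made-up form, spelled once with U+00B7 and once with U+02D1.
with_middle_dot = words("ta\u00B7ku")
with_length_mark = words("ta\u02D1ku")

print(categories)
print(with_middle_dot, with_length_mark)
```

With U+00B7 as a length mark the form is split in two; with the modifier letter it stays whole, which is the kind of behavioural difference that the character's ambiguous history makes hard to resolve by changing properties.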
Re: U+0140
From: "John Hudson" <[EMAIL PROTECTED]> > 'Careful hairsplitting' always takes place when people care about typography. How very true. On one hand, there's people who put a cedilla under "a" when typesetting Polish, on the other hand, there's people who adjust the vertical position of hyphens when typesetting all-caps. And there's lot in-between. But it is important to realize that there _always_ were people who adjusted the hyphen in all-caps settings. Gutenberg's own typesetting was careful hairsplitting. This is a very typical and essential dilemma, which is one of the reasons why there is no easy answer to the glyph vs. character question, or more precisely, why the "character" definition in Unicode is so, well, vague. Since the decision on what is a "character" and what is "merely" a "glyph variant" is made somewhat arbitrarily (albeit in a committee process). There are far too many exceptions to the rule for Unicode to be consistent and easy-to-use. But since written human language never was consistent and easy-to-use, I guess it's something very natural and we will all live with that. Adam
Re: U+0140
> From Unicode's perspective, the consistent difference in treatment of 00B7 > and 0387 is embarrassing, given the fact of their canonical equivalence. There are, to be sure, features of Unicode that are "embarrassing", but I don't think this is one of them. Take another case: even if consistent practice in Poland is to have the grave accent in à at a different angle than what is the practice in France, that does not make it a mistake for us to have encoded both as à. These sorts of preferences can be taken into account in the tailoring of fonts to particular practices, and this issue does not require that we let a thousand middle dots bloom. And canonical equivalence was the mechanism for saying that two variants of a character really should never have been encoded (but we had to for compatibility reasons). Mark __ http://www.macchiato.com - Original Message - From: "Asmus Freytag" <[EMAIL PROTECTED]> To: "Michael Everson" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Sat, 2004 Apr 17 15:32 Subject: Re: U+0140 > At 01:54 PM 4/17/2004, Michael Everson wrote: > >The samples Asmus sent suggest to me that a school of typographers made a > >set of bad decisions, even if they were really famous and got paid lots of > >money and their fonts are widely shipped! > > In all charity, Michael, your opinion seems to be mainly your personal > point of view. I'd love to see any evidence of either mid-dot or ano teleia > being consistently shown the way you claim it should be, but can't find it. > > I've attached a second set of samples. > > As you can see there are a few fonts, most designed for user interfaces, > that give 00B7 and 0387 the same treatment. I've put them on the top. The > rest, and it's a diverse lot, does not. > > Also, as to your view of the relation between mid-dot and colon, it's clear > that this is not readily shared among typographers.
> > From Unicode's perspective, the consistent difference in treatment of 00B7 > and 0387 is embarrassing, given the fact of their canonical equivalence. > > A./ > > PS: John had written: > > >>This would make the mid-dot too high. The top dot of the colon usually > >>sits toward the top of the x-height; the *mid*-dot should sit lower, > >>optically midway up the x-height (which means slightly higher than the > >>actual halfway mark). The top dot of a colon is typically closer to the > >>height of the Greek ano teleia, which aligns with the x-height (and which > >>should align with the cap height in all-cap settings, and with the > >>small-cap height in smallcap settings). > > which pretty much is the way most of the samples have it, but there are > some interesting differences, esp. among the more decorative fonts.
Re: U+0140
Michael Everson wrote: This would make the mid-dot too high. The top dot of the colon usually sits toward the top of the x-height; the *mid*-dot should sit lower, optically midway up the x-height (which means slightly higher than the actual halfway mark). The top dot of a colon is typically closer to the height of the Greek ano teleia, which aligns with the x-height (and which should align with the cap height in all-cap settings, and with the small-cap height in smallcap settings). John, I just don't believe you. I don't believe that in all the history of Greek and Catalan typography this careful hairsplitting has *always* taken place; certainly in scientific transcription the HALF TRIANGULAR COLON is just the top dot in the TRIANGULAR COLON, and in Americanist transcription where the dot-colons are used instead of triangles I would say the same applies. I never contested that the dots of a colon correspond to the triangles of the linguistic long vowel marker. They clearly do. What I contested was that the typographic mid-point (U+00B7) corresponded to the top dot of a colon. It clearly does not. It is called a mid-point because it sits midway up the x-height. It is used in this position for a variety of stylistic purposes, e.g. in place of hyphens in phone numbers in stationery, which is why most type designers put it at this height. I can assure you that the vast majority of type designers don't even know that Catalan uses a dot, let alone that it might use this dot. The obvious solution to present usage is language system typographic tagging, in which a distinction can be made in the size, height and spacing of the dot for Catalan and non-Catalan use. 'Careful hairsplitting' always takes place when people care about typography. John Hudson -- Tiro Typeworkswww.tiro.com Vancouver, BC[EMAIL PROTECTED] I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes In making him win. 
- Charles Peguy
Re: U+0140
At 01:54 PM 4/17/2004, Michael Everson wrote: The samples Asmus sent suggest to me that a school of typographers made a set of bad decisions, even if they were really famous and got paid lots of money and their fonts are widely shipped! In all charity, Michael, your opinion seems to be mainly your personal point of view. I'd love to see any evidence of either mid-dot or ano teleia being consistently shown the way you claim it should be, but can't find it. I've attached a second set of samples. As you can see there are a few fonts, most designed for user interfaces, that give 00B7 and 0387 the same treatment. I've put them on the top. The rest, and it's a diverse lot, does not. Also, as to your view of the relation between mid-dot and colon, it's clear that this is not readily shared among typographers. From Unicode's perspective, the consistent difference in treatment of 00B7 and 0387 is embarrassing, given the fact of their canonical equivalence. A./ PS: John had written: This would make the mid-dot too high. The top dot of the colon usually sits toward the top of the x-height; the *mid*-dot should sit lower, optically midway up the x-height (which means slightly higher than the actual halfway mark). The top dot of a colon is typically closer to the height of the Greek ano teleia, which aligns with the x-height (and which should align with the cap height in all-cap settings, and with the small-cap height in smallcap settings). which pretty much is the way most of the samples have it, but there are some interesting differences, esp. among the more decorative fonts. <>
Re: U+0140
On 17/04/2004 13:57, Philippe Verdy wrote: ... Who's to blame there? Only software designers that have not offered better keyboards to enter a regular Ano Teleia on Greek keyboards, or accepted incorrectly to use the approximation between the middle-dot punctuation and the Greek Ano Teleia. Maybe the votes from Greek typographers were not heard at the ISO or UTC decision committees when such unification was incorrectly decided... In my opinion, the ones to blame are the UTC, for freezing canonical equivalences like this, also combining classes, character names etc, when they have obviously not been checked in detail with the user community. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: U+0140
At 09:03 -0700 2004-04-17, John Hudson wrote: Michael Everson wrote: So for me, MIDDLE DOT is to COLON as MODIFIER LETTER HALF TRIANGULAR COLON is to MODIFIER LETTER TRIANGULAR COLON. This would make the mid-dot too high. The top dot of the colon usually sits toward the top of the x-height; the *mid*-dot should sit lower, optically midway up the x-height (which means slightly higher than the actual halfway mark). The top dot of a colon is typically closer to the height of the Greek ano teleia, which aligns with the x-height (and which should align with the cap height in all-cap settings, and with the small-cap height in smallcap settings). John, I just don't believe you. I don't believe that in all the history of Greek and Catalan typography this careful hairsplitting has *always* taken place; certainly in scientific transcription the HALF TRIANGULAR COLON is just the top dot in the TRIANGULAR COLON, and in Americanist transcription where the dot-colons are used instead of triangles I would say the same applies. António said: Another nail in the coffin of "use U+00B7 : MIDDLE DOT for Catalan": Perhaps because it is exclusively used between "L"s (a "high" letter in both cases), Catalan middot is placed exactly as Michael has it: The top dot of a colon (careful Catalan typewriter users do/did just this, erasing or masking the bottom dot of a colon). This evidence would suggest to me that my analysis is correct. The samples Asmus sent suggest to me that a school of typographers made a set of bad decisions, even if they were really famous and got paid lots of money and their fonts are widely shipped! But that's just my opinion. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: U+0140
- Original Message - From: "John Hudson" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Saturday, April 17, 2004 6:03 PM Subject: Re: U+0140 > Michael Everson wrote: > > > I have had suboptimal connectivity over the last while, and so have > > missed some of this discussion. As a type designer I personally consider > > the middle dot to be ordinary punctuation that should harmonize with > > other punctuation marks. My solution to this is to treat it as the top > > dot of a colon. So for me, MIDDLE DOT is to COLON as MODIFIER LETTER > > HALF TRIANGULAR COLON is to MODIFIER LETTER TRIANGULAR COLON. > > This would make the mid-dot too high. The top dot of the colon usually sits toward the top > of the x-height; the *mid*-dot should sit lower, optically midway up the x-height (which > means slightly higher than the actual halfway mark). The top dot of a colon is typically > closer to the height of the Greek ano teleia, which aligns with the x-height (and which > should align with the cap height in all-cap settings, and with the small-cap height in > smallcap settings). So we can see three different vertical positions for this middle-dot, and two are encoded: (1) centered at the middle of the x-height and baseline: this is the mathematical middle-dot symbol, because most mathematical variables are lowercase letters, making this position appropriate to note a multiplication. There's some large horizontal gap between the two variables or numbers, and the horizontal position is centered between the right edge of the previous character and the left edge of the next character. This is basically the U+00B7 character which can also be used as a punctuation mark, notably in dictionary entries. Its weight should be the same as the regular dot on the baseline for sentence periods.
Note that Unicode also defines a superfluous mathematical middle-dot symbol (I wonder if this is caused by the fact that mathematical formulas often happen to use Greek letters; this symbol at U+22C5 however is thicker, but still thinner than the bullet operator U+2219, itself thinner than the bullet punctuation U+2022 which sits on the baseline...) (2) centered exactly at the x-height: this is the normal position for the Catalan symbol and for the Greek Ano Teleia. The horizontal gap is minimal, just enough to make the dot easily distinct, when reading, from the two surrounding characters. So the horizontal spacing is smaller than with the middle dot in (1). One bad thing is that Greek Ano Teleia was unified with the middle dot. If it had not been so, the Catalan middle dot could have been unified with the Greek Ano Teleia. It's significant that fonts actually do not respect the unification of Greek Ano Teleia (2) and the middle-dot symbol or punctuation (1): it demonstrates that these two should not have been unified with a canonical equivalence... (3) the upper dot of the colon or semi-colon is in fact a better position for the Catalan middle-dot; we can see them as a middle-dot diacritic centered above another character (a period or comma), but below the upper dot used on lowercase letters or uppercase letters. For the Catalan middle-dot, the base character should be the thinnest space (sixth of a cadratin) whose invisible height would be the middle of the x-height, under which other baseline punctuations are drawn (period, comma, connecting underscore).
Michael may be right in saying that this position should match the vertical position of the hyphen, in which case the hyphenation point is probably the best character to use for rendering the Catalan middle-dot: this dot or hyphen is not centered at the x-height but just below it, so that the dot fits fully under that x-height with a tiny vertical gap under it, approximately the weight of the dot or hyphen. A more exact definition would be computed by using exactly the middle of the M-height. Characters (2) and (3) are very close to each other, as they are both modifiers for surrounding letters, and not a symbol or punctuation themselves. But currently Unicode has unified the first 2 cases, by the canonical equivalence for Ano Teleia and the middle-dot symbol/punctuation, which is probably wrong, even if there's a legacy use of U+00B7 on keyboards that generate ISO 8859 Greek text. The unification in fact comes from the mapping of the ISO 8859 repertoire to Unicode, at the time when the hyphenation point did not exist, or possibly even before with some legacy mappings between unrelated ISO 8859 repertoires (notably between Basic-Latin/Greek and Basic-Latin/Latin1). Who's to blame there? Only software designers who have not offered better keyboards to enter a regular Ano Teleia on Greek keyboards, or who wrongly accepted the middle-dot punctuation as an approximation of the Greek Ano Teleia. Maybe the votes from Greek typographers were n
Re: U+0140
On 2004.04.16, 19:34, Antoine Leca <[EMAIL PROTECTED]> wrote: > As I wrote earlier, if you know the text under inspection is > Catalan, a very simple regular expression will deal with that. Any > half-decent Catalan word processor do it already, by the way. What about the odd Catalan phrase within a text in Guarani or Cherokee? Unicode, do not forget, supposedly brings correctness to multilingual text... -- António MARTINS-Tuválkin
Re: U+0140
On 2004.04.16, 14:26, Ernest Cline <[EMAIL PROTECTED]> wrote: >> From: Antoine Leca <[EMAIL PROTECTED]> >> >> ... it is vastly more easy to keep the obvious unification, rather >> than trying to distort it and trying to make a conditional mapping, >> if Mathematics, · => U+00B7, if Catalan, · => U+2027, if NoSeQue, · >> => some_other_random_middle_dot, etc. > > I don't see that as being any worse than the set of HYPHEN_MINUS, > HYPHEN, MINUS SIGN, etc. Or -- to bring this back to textual/orthographic widely used kludges in legacy data vs. typographical correctness "from now on" -- than U+0027 : APOSTROPHE vs. U+02BC : MODIFIER LETTER APOSTROPHE... -- António MARTINS-Tuválkin
Re: U+0140
On 2004.04.17, 17:03, John Hudson <[EMAIL PROTECTED]> wrote: > Michael Everson wrote: >> My solution to this is to treat it as the top dot of a colon. > > This would make the mid-dot too high. The top dot of the colon > usually sits toward the top of the x-height; the *mid*-dot should > sit lower, optically midway up the x-height Another nail in the coffin of "use U+00B7 : MIDDLE DOT for Catalan": Perhaps because it is exclusively used between "L"s (a "high" letter in both cases), Catalan middot is placed exactly as Michael has it: The top dot of a colon (careful Catalan typewriter users do/did just this, erasing or masking the bottom dot of a colon). -- António MARTINS-Tuválkin
RE: U+0140
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Kenneth Whistler > Thanks to Eric and Patrick for digging out my answer on this perennial > question from a couple years back, and saving me the trouble of > having to rummage around to find it. :-) > > Also, it should be noted... Last year, I started putting character stories online. I didn't know when I started it that I was about to move, so I only got a couple online and wasn't able to keep adding to it. Anyway, I've captured this one and added it to that small but perhaps growing collection: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=UnicodeCharacterStories Peter Constable
Re: U+0140
Michael Everson wrote: I have had suboptimal connectivity over the last while, and so have missed some of this discussion. As a type designer I personally consider the middle dot to be ordinary punctuation that should harmonize with other punctuation marks. My solution to this is to treat it as the top dot of a colon. So for me, MIDDLE DOT is to COLON as MODIFIER LETTER HALF TRIANGULAR COLON is to MODIFIER LETTER TRIANGULAR COLON. This would make the mid-dot too high. The top dot of the colon usually sits toward the top of the x-height; the *mid*-dot should sit lower, optically midway up the x-height (which means slightly higher than the actual halfway mark). The top dot of a colon is typically closer to the height of the Greek ano teleia, which aligns with the x-height (and which should align with the cap height in all-cap settings, and with the small-cap height in smallcap settings). John Hudson -- Tiro Typeworkswww.tiro.com Vancouver, BC[EMAIL PROTECTED] I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes In making him win. - Charles Peguy
Re: U+0140
Sent the previous message before it was ready. At 12:32 -0700 2004-04-15, Kenneth Whistler wrote: Note that while the particular combination <006C, 00B7, 006C> is a peculiarity of Catalan orthography, U+00B7 MIDDLE DOT (often called a 'raised period') is very widely used, indeed, in technical orthographies for many languages, particularly in the Americas, where it is used much more commonly than the IPA characters U+02D0 MODIFIER LETTER TRIANGULAR COLON or U+02D1 MODIFIER LETTER HALF TRIANGULAR COLON to indicate vocalic (or less commonly, consonantal) length. In Cornish lexicography, the middle dot is used regularly to mark the vowel of the stressed syllable when it is not penultimate (as it is in most words). I have had suboptimal connectivity over the last while, and so have missed some of this discussion. As a type designer I personally consider the middle dot to be ordinary punctuation that should harmonize with other punctuation marks. My solution to this is to treat it as the top dot of a colon. So for me, MIDDLE DOT is to COLON as MODIFIER LETTER HALF TRIANGULAR COLON is to MODIFIER LETTER TRIANGULAR COLON. For HYPHENATION POINT I would place its height at whatever the height of a HYPHEN was and be done with it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: U+0140
At 12:32 -0700 2004-04-15, Kenneth Whistler wrote: Note that while the particular combination <006C, 00B7, 006C> is a peculiarity of Catalan orthography, U+00B7 MIDDLE DOT (often called a 'raised period') is very widely used, indeed, in technical orthographies for many languages, particularly in the Americas, where it is used much more commonly than the IPA characters U+02D0 MODIFIER LETTER TRIANGULAR COLON or U+02D1 MODIFIER LETTER HALF TRIANGULAR COLON to indicate vocalic (or less commonly, consonantal) length. In Cornish lexicography, the middle dot is used regularly to mark the vowel of the stressed syllable when it is not penultimate (as it is in most words). I have had suboptimal connectivity over the last while, but as a type designer I personally consider the middle dot to be ordinary punctuation that should harmonize with other punctuation marks. My solution to this is to treat it as the top dot of a colon. So for me, MIDDLE DOT is to COLON as MODIFIER LETTER HALF TRIANGULAR COLON is to MODIFIER LETTER TRIANGULAR COLON. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: U+0140 Catalan middle-dot
At 06:16 PM 4/15/2004, Philippe Verdy wrote: The other reason is that the middle-dot, being a punctuation, would be likely to have extra spacing on both sides, which would make it inappropriate for rendering Catalan words. Also such punctuation would probably forbid kerning of the middle-dot within the open area of a uppercase L, something which would be acceptable for reading Catalan (as it was acceptable with U+2027 in Teletext/Videotex). In the sample I just sent out, you'll see that 00B7 in some fonts has a rather large space on the left of the dot. A./
GREEK ANO TELEIA (was: Re: U+0140)
* Asmus Freytag <[EMAIL PROTECTED]> [2004-04-16 11:28]: > weight. (See attached sample). If data is normalized, the appearance of ano > teleia will change (since 0387 will change into 00B7) and users will be > disappointed. Yes, I know - I've seen professionally published magazines with the wrong ano teleia glyph, bigger and lower than it should be. It was probably not even caused by normalization - I think most Greek keyboards produce 00B7 and not 0387. Since language-dependent glyph selection isn't very widespread for now, would it be too much to ask font designers to put a MIDDLE DOT glyph appropriate for Greek in fonts capable of displaying Greek text? That's just a wish, BTW - I don't expect designers to do what I say just because I sent a message in a mailing list ;-) -- Alexandros Diamantidis * [EMAIL PROTECTED]
Re: U+0140
On Friday, April 16, 2004 12:37 PM, Philippe Verdy va escriure: > In some future, we could see U+013F and U+0140 used more often than L > or l plus U+00B7... I (personally) hope we would not. > Notably in word processors that can detect these > sequences in Catalan text and substitute them with the ligatures, > which create a more acceptable letter form and allows easier text > handling for (e.g.) word selection in user interfaces and dictionnary > lookups. As I wrote earlier, if you know the text under inspection is Catalan, a very simple regular expression will deal with that. Any half-decent Catalan word processor do it already, by the way. > The fact that there's no such L-middle-dot on keyboards should not be > a limit: word processors have more key bindings and more intelligence > than the default keys found on keyboards. Yes yes yes. Particularly when I want to insert afterwards a · between two ll, when it appears I missed it on the first shot (yes, it happens). Or when I want to remove a superfluous one that I typed by mistake (yes, it happens too). With your "intelligence", this latter point will prove being a headache: on the first shot, a normal user will place the caret just after the dot, and press Rubout. Slurp, the whole U+0140 is swallowed, but usually the user will not notice it. So at the second sight (perhaps a lot of time after, perhaps after an useless additional printout), she will have to type in the first l. Intelligent keyboards might be great. But to be so, they have to bring *much* added value (like, obviously, to be able to type in a language impossible otherwise; or, more simply, to avoid typing every five minutes Alt+0156). If they bring only very little value, they are more annoying that anything else, particularly when they are non permanent but rather operate from time to time. This would be the case here: as Catalan writer, I type about texts sometimes in the word processor, where I would be "helped". 
And sometimes in the mail reader, or on the console, where I would not, for example because I do not want to wait two full minutes for the whole "helpers" to come in every time I have to type the name of the user of a given process... > When I see a Catalan word coded with it looks very > ugly (notably with monospaced fonts or in Teletext) and I'm sure that > Catalan readers don't like the default presentation. Yes it looks ugly. But this is in fact less ugly for me than seeing l.l or l-l. Ugliness is in the eye of the beholder, of course. When you are in the habit of seeing about every hour some rendering of l·l, you will not notice it. And in fact, I notice more when someone uses the kerned version advocated by Gabriel Valiente, because nowadays it is unusual. And I certainly would not use the kerned version for some institutional version, because I do not want to inconvenience my readers (this problem showed up about 20 days ago between us; and there was no debate). > They will much > appreciate the support for the ligated > encodings. What do you prefer? El col·legi Miguel Hernández de Riola? El co[]legi Miguel Hernández de Riola? ([] is ASCII art for a box, which is how many many people would see any use of U+013F...) > I don't think they can be considered "compatibility > characters" just introduced for compatibility with a past ISO > standard for Videotex and Telelext. Sorry, you are fighting a lost battle: no one here uses them, so all the corpus is already encoded without them. The mills of Don Quixote are in Mota del Cuervo, it is only about 200 km from here, but this is not the Catalan-speaking region ;-). > The only safe way to change things would then be to have a middle-dot > diacritic (combining but with combining class 0) to be used instead > of U+00B7, even if there's no canonical equivalence with the U+013F > and U+0140 ligatures...
A Catalan keyboard would then return this new > dot instead of U+00B7, and word processors or input method editors > would easily find a way to represent it using the ligature when it > follows a L. [snip] May I suggest U+1000B7 for this new character? Antoine
Re: U+0140
On Friday, April 16, 2004 3:26 PM, Ernest Cline va escriure: > I don't see that as being any worse than the set of HYPHEN_MINUS, > HYPHEN, MINUS SIGN, etc. Sorry, I did not make myself clear. I am not intending to say this is undoable, nor that the · case is particularly complex. It is doable (as I showed with the regular expressions), and it is NOT complex. I was just saying this is presently not done, and it is IMHO not worth doing. > Given the nature of U+0140 (and U+013F) when hyphenated, might it > not be a good idea to assign these two characters their own Line > Break class for the Line Breaking Algorithm of UAX #14? I do not know if it is a good idea or not (I am not the guy who can argue on this; furthermore these characters are very infrequent), but your understanding of the behaviour is correct. Antoine
Re: U+0140
At 12:26 AM 4/16/2004, Alexandros Diamantidis wrote: * Philippe Verdy <[EMAIL PROTECTED]> [2004-04-16 01:22]: > > U+0387 GREEK ANO TELEIA > wrong form? it's a small square, and is the greek semicolon, and is then > separating words. U+0387 is canonically equivalent to U+00B7. About its shape, whether it's square or round depends on what the full stop looks like in that font - they should look exactly the same, only the "ano teleia" (upper dot) should be at x-height. If two characters are canonically equivalent, they can't have a consistently distinct appearance. Nevertheless, most fonts appear to give a different glyph to 0387 than to 00B7, not only in height, but also in weight. (See attached sample). If data is normalized, the appearance of ano teleia will change (since 0387 will change into 00B7) and users will be disappointed. In any environment where data are normalized, getting the correct appearance requires the use of OpenType with language-dependent glyph selection (and a layout engine that supports this) - or the use of a Greek-specific font. A./ PS: it's water under the bridge by now, but in my opinion, this is another example of questionable unification of punctuation based on considering only the 'ink' and not the positioning of it. If one is considering only the roughest of plain text, having only a single code for a 'dot somewhere in the middle of the line' yields acceptable results, but it does make the use of such plain text as back-bone for typographically correct rendering unnecessarily difficult. The extreme form of such a 'plain text only' approach is using ` and ' as stand-ins for the single quotes. However, for paleo punctuation, where there's no comparable established typographical tradition requiring consistent differentiation, the use of unified punctuation is preferable.
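The normalization effect described in the message above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch; the helper name `survives_normalization` is made up for illustration and is not part of any library.

```python
import unicodedata

ANO_TELEIA = "\u0387"   # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"   # MIDDLE DOT

def survives_normalization(ch: str, form: str = "NFC") -> bool:
    """Return True if `ch` is left unchanged by the given normalization form.

    Hypothetical helper, for illustration only.
    """
    return unicodedata.normalize(form, ch) == ch

# U+0387 has a singleton canonical decomposition to U+00B7, so every
# normalization form replaces it with the middle dot.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    out = unicodedata.normalize(form, ANO_TELEIA)
    print(form, [f"U+{ord(c):04X}" for c in out])
```

Once text has passed through any normalizer, a font can no longer tell the two apart by code point alone, which is why the glyph choice has to come from language tagging or from a Greek-specific font.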
Re: U+0140
Elaine Keown Tucson Hi, I kept the amazing list of middle dots listed this week on the main Unicode list for future reference. Hebrew (Hebrew from 1200 B.C.E. - present) needs at least 1 middle dot. Elaine
Re: U+0140 Catalan middle-dot
On 16/04/2004 03:11, Philippe Verdy wrote: ... Did you read this PDF seriously: ... No, but I read what I needed to. ... it really discusses about a hack needed to reposition the middle-dot correctly so that the Catalan dot will: - not alter the interletter space - will be drawn on a higher position (approximately at the x-height) than middle-dot (in the middle of the x-height and baseline), with a horizontal position that centers it between the vertical stems of the two surrounding l or L (this makes a difference for the uppercase letter). These are matters for the font. This kind of horizontal and vertical kerning can be done easily with modern technologies. ... Most modern text renderers on computers display the 00B7 incorrectly for Catalan (notably in user interfaces and in web browsers). This is a matter of fonts, not of renderers. Most modern text renderers are capable of displaying either 00B7 or 2027 correctly if the font is set up for that, e.g. to display them as ligatures, or to move the dot depending on context. So, for a typographic point of view, the U+013F and U+0140 ligatures ... If these are ligatures, they don't need their own Unicode code points, and such code points should be treated as alphabetic presentation forms, included only for compatibility reasons. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: U+0140
Philippe Verdy wrote: - in "collège" where 'o' is often pronounced open, unlike in "colatéral" where "o" is always closed. Hmm, "collatéral" is written with two l's in French (from Latin cum + latus). Bernard
Re: U+0140
> [Original Message] > From: Antoine Leca <[EMAIL PROTECTED]> > > ... it is vastly more easy to keep the obvious unification, rather than > trying to distort it and trying to make a conditional mapping, if > Mathematics, · => U+00B7, if Catalan, · => U+2027, if NoSeQue, · => > some_other_random_middle_dot, etc. Unlike hyphenation rules (where the > mapping might very well be · => U+2027, by the way), which are pretty easy > to pinpoint, tagging Catalan in bulk text is clearly not a easy task. Even > when considering the fairly restrictive rules for it to occur (requiring > NFC): I don't see that as being any worse than the set of HYPHEN_MINUS, HYPHEN, MINUS SIGN, etc., which depending upon your taste in such matters could be seen as an example of what to do or what not to do. That said, let me switch the topic to something almost completely different. Given the nature of U+0140 (and U+013F) when hyphenated, might it not be a good idea to assign these two characters their own Line Break class for the Line Breaking Algorithm of UAX #14? These two characters if I understand the comments correctly, always provide a line breaking opportunity after them, but if that line break opportunity is taken, the dot must disappear, so an implementation that is not prepared to remove the dot should ignore the opportunity.
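The behaviour described above (a break opportunity inside l·l, where the dot must be replaced by a hyphen if the break is taken) could be sketched as follows. This is only an illustration under stated assumptions: the function name `break_at_middle_dot` is hypothetical, it assumes the dot is encoded as U+00B7 between two separate L letters, and it is not how UAX #14 or any real layout engine works.

```python
import re

MIDDLE_DOT = "\u00b7"

# l·l or L·L, never a mixture of cases (as noted elsewhere in the thread).
_L_DOT_L = re.compile(f"l{MIDDLE_DOT}l|L{MIDDLE_DOT}L")

def break_at_middle_dot(word: str):
    """Split `word` at its first l·l sequence.

    Returns (end_of_line, start_of_next_line) with the dot replaced by a
    hyphen, or None when the word offers no such break opportunity.
    Hypothetical helper, for illustration only.
    """
    m = _L_DOT_L.search(word)
    if m is None:
        return None
    i = m.start()
    head = word[: i + 1] + "-"   # keep the first l; the dot becomes a hyphen
    tail = word[i + 2 :]         # the second l starts the next line
    return head, tail

print(break_at_middle_dot("col\u00b7legi"))
print(break_at_middle_dot("llibre"))
```

An implementation that cannot rewrite the text at the break point would, as the message says, simply have to return no opportunity at all.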
Re: U+0140
On Friday, April 16, 2004 12:31 AM, Peter Kirk va escriure: >> Peter Kirk a écrit : >> >>> What is U+2027 intended for? The name suggests that it might be what >>> is needed for Catalan. >> >> Hyphenation point is primarily used to visibly indicate >> syllabification of words. Syllable breaks are potential line breaking >> opportunities in the middle of words. The hyphenation point It is >> mainly used in dictionaries and similar works. When an actual line >> break falls inside a word containing hyphenation point characters, >> the hyphenation point is rendered as a regular hyphen at the end of >> the line. > > Well, this sounds just like the required behaviour for Catalan, as > described by Anto'nio Martins-Tuva'lkin on 28th March. He wrote: > >> Something happends when the "L·L" coincides with a soft line end. I'm >> no expert in Catalan typesetting but IIRC the dot becomes a hyphen, >> while regular "LL"s cannot be broken. António is correct. But this is not the main point of ·. The main point of · is to disambiguate orthographies. Hyphenation behaviour is only a secondary role. Besides, it is vastly easier to keep the obvious unification, rather than trying to distort it and trying to make a conditional mapping, if Mathematics, · => U+00B7, if Catalan, · => U+2027, if NoSeQue, · => some_other_random_middle_dot, etc. Unlike hyphenation rules (where the mapping might very well be · => U+2027, by the way), which are pretty easy to pinpoint, tagging Catalan in bulk text is clearly not an easy task. Even when considering the fairly restrictive rules for it to occur (requiring NFC): /[aAàÀeEéÉèÈiIíÍïÏoOóÓòÒuUúÚ]l·l[aàeéèiíoóòuú]/ /[AÀEÉÈIÍÏOÓÒUÚ]L·L[AÀEÉÈIÍOÓÒUÚ]/ Antoine
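For anyone who wants to try the two expressions above, here is a minimal Python sketch. The patterns are copied from the message with the dot written as an explicit U+00B7; the function name `find_catalan_dot_sequences` is hypothetical, and the input is normalized to NFC first because the patterns assume precomposed vowels.

```python
import re
import unicodedata

DOT = "\u00b7"  # MIDDLE DOT, as typed on a Spanish keyboard

# Vowel + l·l + vowel, lowercase and uppercase variants as in the message.
LOWER = re.compile("[aAàÀeEéÉèÈiIíÍïÏoOóÓòÒuUúÚ]l" + DOT + "l[aàeéèiíoóòuú]")
UPPER = re.compile("[AÀEÉÈIÍÏOÓÒUÚ]L" + DOT + "L[AÀEÉÈIÍOÓÒUÚ]")

def find_catalan_dot_sequences(text: str):
    """Return every vowel+l·l+vowel match, lowercase pattern first.

    Hypothetical helper, for illustration only.
    """
    text = unicodedata.normalize("NFC", text)
    hits = []
    for pattern in (LOWER, UPPER):
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

print(find_catalan_dot_sequences("El col\u00b7legi Miguel Hernández de Riola"))
print(find_catalan_dot_sequences("x\u00b7y"))  # a mathematical use: no match
```

The sketch only shows that the contexts are easy to recognize; it says nothing about whether remapping the dot in those contexts is worth doing, which is the point under dispute.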
Re: U+0140
From: "Antoine Leca" <[EMAIL PROTECTED]> > And yes, similarly to Catalan, the emphatic/prolongated l sound is not > usualy marked. In French, the emphatic/prolongated l (written with a double l) is usually marked by altering the phonetic of the preceding vowel, such as - in "collège" where 'o' is often pronounced open, unlike in "colatéral" where "o" is always closed. - if the preceding vowel is a 'e' it is clearly and always pronounced like a 'è' in "désceller" instead of the neutral 'e' in "déceler". - If the preceding vowel is a 'i' with another previous vowel the non-final sequence 'ill' notes a 'y' half-vowel sound like in "maille"; if there's no vowel before that i, the i is a plain vowel, and the double l is generally non emphatic like in "ville" (or "village" or the imported English term "grill") with a long i that shortens the l sound, to compare with "vile" (the feminine form of the adjective "vil") or "vilénie" where the i is short and the l emphatic... - There are known exceptions when i is not preceded by another vowel; between "mille" (long i, emphatic l) and "grille" (long i, half-vowel 'y') - With a preceding 'u', "mûle" or "mûlet" or "tubulure" use a short 'ü' sound and a l which may be emphatic/long if terminal, unlike "bulle" with a long 'u' sound and a non emphatic short l... Historically, Catalan and French had the same writing system.
Re: U+0140
From: "Antoine Leca" <[EMAIL PROTECTED]> > On Thursday, April 15, 2004 8:16 PM, Philippe Verdy va escriure: > > I thought it was already answered in this list by a Catalan speaking > > contributor: the sequence L+middle-dot in Catalan is NOT a combining > > sequence. > > No? Then was is it? Looks like very much one, to me. It is more exactly a ligature, not a combining sequence. But the second character of the ligature works more like a diacritic, and not as a separate punctuation or symbol. In some future, we could see U+013F and U+0140 used more often than L or l plus U+00B7... Notably in word processors that can detect these sequences in Catalan text and substitute them with the ligatures, which create a more acceptable letter form and allows easier text handling for (e.g.) word selection in user interfaces and dictionnary lookups. The fact that there's no such L-middle-dot on keyboards should not be a limit: word processors have more key bindings and more intelligence than the default keys found on keyboards. When I see a Catalan word coded with it looks very ugly (notably with monospaced fonts or in Teletext) and I'm sure that Catalan readers don't like the default presentation. They will much appreciate the support for the ligated encodings. I don't think they can be considered "compatibility characters" just introduced for compatibility with a past ISO standard for Videotex and Telelext. The compatibility decompositions in the UCD are bad suggestions (only fallbacks) which create problems that did not exist in the Videotex standard (they already create a problem for internationalized domain names). But now that decomposition are normative, there's no way to change it in Unicode. The only safe way to change things would then be to have a middle-dot diacritic (combining but with combining class 0) to be used instead of U+00B7, even if there's no canonical equivalence with the U+013F and U+0140 ligatures... 
A Catalan keyboard would then return this new dot instead of U+00B7, and word processors or input method editors would easily find a way to represent it using the ligature when it follows a L. If such character was added, I would give it the general category "Mn", a combining class 0, to match linguistic expectations, and it would work with IRI and IDN as well, and would immediately work with all basic Unicode text processing without needing an exception for Catalan. This new character could have a compatibility decomposition into U+00B7 only as a fallback; and the existing ligatures U+013F and U+0140 could be commented by providing a better decomposition with this new character, than the compatibility decompositions with U+00B7.
Re: U+0140 Catalan middle-dot
From: "Peter Kirk" <[EMAIL PROTECTED]> > On 15/04/2004 18:16, Philippe Verdy wrote: > >So U+2027 (as well as the U+013F middle-dot found in ISO-8859-1/15) is not the > >exact character to represent this middle dot in all usages, ... > > Philippe, before jumping to this conclusion, please can you describe to > me EXACTLY how the shape and behaviour of the Catalan middle dot differs > from the behaviour of U+2027 defined in Unicode Standard Annex #14, > http://www.unicode.org/unicode/standard/reports/tr14/tr14-15.html: > > > 2027 > > HYPHENATION POINT > > A hyphenation point is a raised dot, which is used primarily to > > visibly indicate syllabification of words. Syllable breaks are > > potential line break opportunities in the middle of words. It is > > mainly used in dictionaries and similar works. When an actual line > > break falls inside a word containing hyphenation point characters, the > > hyphenation point is rendered as a regular hyphen at the end of the line. > > > > From the descriptions which you and Anto'nio have provided and from > http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf, it seems to me > that the Catalan behaviour is exactly as described for U+2027 in USA > #14, perhaps because the Catalan usage has been borrowed from dictionary > usage or vice versa. This strongly suggests that U+2027 is the > appropriate character for Catalan. Did you read this PDF seriously: it really discusses about a hack needed to reposition the middle-dot correctly so that the Catalan dot will: - not alter the interletter space - will be drawn on a higher position (approximately at the x-height) than middle-dot (in the middle of the x-height and baseline), with a horizontal position that centers it between the vertical stems of the two surrounding l or L (this makes a difference for the uppercase letter). 
So the encoded l-with-middle-dot and L-with-middle-dot, if properly created for Catalan using these guidelines, will render much better than 'L' or 'l' followed by U+00B7 and even better than U+2027. If rendering is not important for you (it matters when one wants to create a renderer), consider the case of collation, and text analysis. My view about the precombined ligatures L-with-middle-dot is that their "letter" general category makes things easier for writers and readers, even if both agree that there's no such dotted-L letter in Catalan, but clearly a single L with an additional but separate phonetic mark. Another point: the middle dot in Catalan seems to be used only between a pair of L letters. Typographers consider the double L with a middle-dot as a ligature, and Catalan uses a dotted pair to change the pronunciation (and even the meaning) of a double-L from the "L mouillé" (where it is pronounced like y between vowels), to a consonantal palatal L. Last note: Catalan words starting with a double-L exist, but they apparently never take a middle dot (because such a spelling always designates a consonantal palatal L, sometimes pronounced with some stress or with an audible palato-lingual click or some prenasalisation; this pronunciation depends on the 4 local dialects spoken). The phonetic distinction of medial double-L did not exist in medieval Catalan texts where this mark was not written (like in French). The Catalan middle-dot was then introduced later with a clear intent to not alter the number of letters and their relative positions in the typography. Most modern text renderers on computers display the 00B7 incorrectly for Catalan (notably in user interfaces and in web browsers). So, from a typographic point of view, the U+013F and U+0140 ligatures are much better than their compatibility decomposition. I don't think they can be described as compatibility characters.
So the ISO 6937 standard for Videotex was right when it defined this ligature to respect the normal typography, but the compatibility decompositions using U+00B7 in Unicode are certainly not the best ones (they are widely used today simply because the ligatures were missing in ISO-8859-1 and in Windows 1252, and there was no other alternative than using U+00B7 for that function).
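The general categories and decompositions argued about in this message can be inspected directly from the Unicode Character Database as shipped with Python. A minimal sketch follows; `describe` and `fold_l_with_dot` are hypothetical names, and NFKC is used only as a convenient stand-in for "apply the compatibility decomposition" (it folds many other characters too).

```python
import unicodedata

def describe(cp: int):
    """Return (name, general category, decomposition) for a code point.

    Hypothetical helper, for illustration only.
    """
    ch = chr(cp)
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.decomposition(ch))

def fold_l_with_dot(text: str) -> str:
    """Apply compatibility normalization, which maps U+0140 to l + U+00B7."""
    return unicodedata.normalize("NFKC", text)

for cp in (0x013F, 0x0140, 0x00B7, 0x2027, 0x0387):
    print(f"U+{cp:04X}", *describe(cp))

# "col·legi" spelled with the precomposed U+0140:
print([f"U+{ord(c):04X}" for c in fold_l_with_dot("co\u0140legi")])
```

The precomposed characters carry a letter category while the dot they decompose to is punctuation, which is the mismatch the message complains about; NFC leaves U+0140 alone, so the two spellings only converge under the compatibility forms.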
Re: U+0140
On Thursday, April 15, 2004 8:16 PM, Philippe Verdy va escriure: > I thought it was already answered in this list by a Catalan speaking > contributor: the sequence L+middle-dot in Catalan is NOT a combining > sequence. No? Then was is it? Looks like very much one, to me. > The middle dot in Catalan plays a role similar to an hyphen > between syllables, to mark a distinction with words where, for > example a double-L would create an alternate reading. Yes (although I am not sure we can write "similar to hyphens", since I do not know the history of the hyphen). > The dot indicates that each L must be read distinctly (or read > with a long or emphatic L). Ought to. I.e., it would be precious prononciation, at least for the Barcelonian way of speaking. In other places, the prolongated prononciation may be the default for litterate speech, too (this is the case here in Valencia). Colloquial speech definitively makes no difference between l·l and l. The very reason for the dot is to disambiguate between two identical orthographies inherited from the past, without actually changing the orthographies (i.e., dropping one l, or adopting the standard but bulky "tl" digraph). So, "ll" now unambiguously designs palatal l (the IPA code of which I am presently unable to found in Unicode, it is a turned y), coming form colloquial words, while "l·l" unambiguously designs may-be-prolongated [l] directly coming from Latin. Before the reform (~100 years ago), both were written identically, which leads to problems. > In French for example we have words like "maille" to be read as > /maj/, and the same "-ill-" written diphtongs after another vowel > occur in Catalan. It is written -i- (not ï nor í), occuring after some vowel. Like "mai" (never), which is sounded the same as "maille" in Parisian French. 
> But French will not write "-ill-" if it occurs > between two vowels where the two L must have the sound L (if this > occurs in french, only 1 L is written, and the emphatic/long sound is > not marked). Of course not "-ill-" (why on earth would someone introduce an -i- where there is no reason for it?), but rather "-ll-", like in "collège" or "parallèle". TWO L's ;-). This matches the two most used words in Catalan that have the ·, namely "col·legi" and "paral·lel". And yes, similarly to Catalan, the emphatic/prolongated l sound is not usually marked. > Catalan has this orthograph, and writes the > emphatic/long L distinctly. So it needs a symbol for that. The > middle-dot is then considered in Catalan as a letter, This is not a letter, just as hardly anyone would consider the apostrophe to be a letter in Romance languages (or in English either). Note that I am _not_ saying · is like an apostrophe in Catalan (the latter is a punctuation symbol, which separates words). But it is not a letter. Neither are ´ or ¸, either. > that will occur in the middle of words. Specifically between L (either lower or upper-case, but not a mixture). There are other rules, too, such as IIRC the letters surrounding the l should be vowels (Not 100% sure here, and did not care to check). > I don't know if the middle-dot can be used in Catalan as a candidate > position for a line break with hyphenation: It is. > if yes, is it kept before > the hyphen, or is the middle-dot used alone, or is the middle-dot > replaced by a regular hyphen? The latter. > I don't know. But if the middle-dot > must be replaced by a hyphen, then it is a punctuation (similar to > hyphens used in compound-words). What is the first k in a hyphenated "dicke" in German? (it becomes "dik-ke"). At any rate, I will not tag it as "punctuation"! Here we have a similar case: when l·l is hyphenated, the former "diglyph", i.e. "l·", is transformed to "l".
The obvious reason is that there is no more need to disambiguate, since a palatalized "ll" will never be hyphenated in Catalan (nor in Castilian, nor will "lh" in Portuguese or Occitan, nor will "gli" in Italian).

> But in Catalan, the middle dot should not be kerned into the preceding uppercase L, like it would appear if it was considered equivalent to .

Sorry, but who are you to dictate laws about kerning in Catalan? Kerning is essentially an optional feature related to fonts, and I do not see any reason to avoid "kerning" an L and a · (which would be in a title, moreover) if the result is otherwise aesthetically unpleasant, perhaps because the font designer did not consider the case.

> If there's something really missing for Catalan, it's a middle-dot letter with general category "Lo", and combining class 0 (i.e. NOT combining). It's unfortunate that almost all legacy Catalan text transcoded to Unicode are based on the middle-dot symbol (the one mapped in ISO-8859-1 and ISO-8859-15) which is not seen by Unicode as a letter (Lo) but as a symbol only.

Considering that the · is present on any Spanish keyboard these days (shift 3), and that on the other hand almost no keyboard except ancient typewriters do h
Re: U+0140 Catalan middle-dot
On 15/04/2004 18:16, Philippe Verdy wrote:

> ... The Catalan middle-dot is a plain orthographic letter and should be treated as such, and not by borrowing a punctuation sign or symbol which may have other conflicting uses. What I suggested is that the general category, despite its weak definition, is still a good indicator of which character to use. So U+2027 (as well as the U+00B7 middle-dot found in ISO-8859-1/15) is not the exact character to represent this middle dot in all usages, ...

Philippe, before jumping to this conclusion, please can you describe to me EXACTLY how the shape and behaviour of the Catalan middle dot differs from the behaviour of U+2027 defined in Unicode Standard Annex #14, http://www.unicode.org/unicode/standard/reports/tr14/tr14-15.html:

    2027 HYPHENATION POINT
    A hyphenation point is a raised dot, which is used primarily to visibly indicate syllabification of words. Syllable breaks are potential line break opportunities in the middle of words. It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line.

Please don't waste our time with further discussion of how various dictionaries indicate syllable breaks, especially when they don't use U+2027 at all, but rather a vertical line, i.e. a quite different character.

From the descriptions which you and António have provided and from http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf, it seems to me that the Catalan behaviour is exactly as described for U+2027 in UAX #14, perhaps because the Catalan usage has been borrowed from dictionary usage or vice versa. This strongly suggests that U+2027 is the appropriate character for Catalan.

-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: U+0140
On 15/04/2004 16:22, Philippe Verdy wrote:

> ...
>> There are also, including combining middle dots (most of these listed at U+00B7):
>>
>> U+0387 GREEK ANO TELEIA
>
> wrong form? it's a small square, and is the greek semicolon, and is then separating words.

This should not be a small square; it should be identical to U+00B7 to which it is canonically equivalent.

>> U+05BC HEBREW POINT DAGESH OR MAPIQ
>
> where would you position it according to the Catalan L letter which has a distinct directionality, and should not inherit of the complexity of the Hebrew script? Why isn't there even U+0307 COMBINING DOT ABOVE or U+0323 COMBINING DOT BELOW in your list?

Surely U+05BC doesn't have inherent directionality? I thought that combining characters took the directionality of their base characters.

I was only including middle height dots. The list of dots in other positions is much longer - at least four others just in Hebrew.

I wasn't seriously suggesting any of these as suitable for Catalan, except possibly for HYPHENATION POINT.

-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: U+0140
* Philippe Verdy <[EMAIL PROTECTED]> [2004-04-16 01:22]: > > U+0387 GREEK ANO TELEIA > wrong form? it's a small square, and is the greek semicolon, and is then > separating words. U+0387 is canonically equivalent to U+00B7. About its shape, whether it's square or round depends on what the full stop looks like in that font - they should look exactly the same, only the "ano teleia" (upper dot) should be at x-height. -- Αλέξανδρος Διαμαντίδης * [EMAIL PROTECTED]
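The canonical equivalence mentioned above can be checked mechanically. The following is a minimal sketch using Python's standard `unicodedata` module; the constant names are only illustrative, and the result reflects whatever version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT

# U+0387 has a singleton canonical decomposition to U+00B7, so every
# normalization form replaces it with the plain middle dot.
normalized = {
    form: unicodedata.normalize(form, ANO_TELEIA)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# A canonical mapping is listed without a <tag> prefix.
mapping = unicodedata.decomposition(ANO_TELEIA)
print(mapping, normalized)
```

One practical consequence is that the distinction between the two code points does not survive normalization, so any difference in dot shape or height has to come from the font or the rendering context, not from the encoded text.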
Re: U+0140
Kenneth Whistler wrote: 00B7;MIDDLE DOT;Po;0;ON;N; 10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N; 16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N; I was meaning to ask about this. I'm all over not encoding Yet Another middle dot, but I was wondering. In my research on Samaritan, I've found that they frequently write (you guessed it) a middle dot to separate words (they like to use space to enable them to do this cool columnar writing thing). I was assuming that this could be conflated with someone else's middle-dot-word-separator; would that be U+10101? As far as I am concerned, U+00B7 should be sufficient for that. I wasn't sure if character properties or whatever made a difference, since this is supposed to be a word separator. Whatever; I'm sufficiently confident that THIS dot, at least, won't have to be encoded. Note that as part of the ongoing work to cover Greek paleographic needs, a large number of multiple dot punctuation characters are currently under ballot for addition to 10646 (and Unicode). See 2056, 2058..205E at: http://www.unicode.org/alloc/Pipeline.html These are (proposed to be) encoded in the General Punctuation block to ensure that *everyone* is clear that their intended use is general, so we don't have to keep cloning more and more such dot combinations to handle the dot punctuation for each different paleographic tradition. Yeah, everyone uses dots. Samaritan cantillation has various colons and two-dot-leader looking things, and small circles... but also combinations, like colon-line, colon-angle, stuff like that. ~mark
Re: U+0140 Catalan middle-dot
From: "Patrick Andries" <[EMAIL PROTECTED]>
> Philippe Verdy wrote:
> >From: "Patrick Andries" <[EMAIL PROTECTED]>
> >>Peter Kirk wrote:
> >>>What is U+2027 intended for? The name suggests that it might be what is needed for Catalan.
> >>>[PA] Isn't this the one that should be used in dictionaries ?
> >>See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
> >>2027 HYPHENATION POINT
> >>Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line breaking opportunities in the middle of words. The hyphenation point It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line.
> >
> >This last sentence is wrong, at least in my Larousse dictionaries:
>
> I believe it simply describes certain practices (Anglo-Saxon, American ?), maybe this should be clearer.

This just demonstrates that the "only one dot character fits all" strategy is too simplistic. There are actual usages, in publications as serious as very common dictionaries, of multiple dots that have their own semantics and rendering particularities.

The Catalan middle-dot is a plain orthographic letter and should be treated as such, and not by borrowing a punctuation sign or symbol which may have other conflicting uses. What I suggested is that the general category, despite its weak definition, is still a good indicator of which character to use.
So U+2027 (as well as the U+00B7 middle-dot found in ISO-8859-1/15) is not the exact character to represent this middle dot in all usages, even if there's an important legacy history of using the ISO-8859-1 middle-dot in Catalan (or a legacy use of L-middle-dot in ISO 6937, which was defined just for convenience with older technologies that could not acceptably display the sequence in Catalan due to the excessive space. So a ligature was probably preferable in the Videotex context.)

My opinion is that U+0140 already meant two abstract characters in Teletext or Videotex, even for Catalan readers (and this can explain why there's a compatibility decomposition, as a legacy acceptable but poor fallback).

The other reason is that the middle-dot, being a punctuation, would be likely to have extra spacing on both sides, which would make it inappropriate for rendering Catalan words. Also such punctuation would probably forbid kerning of the middle-dot within the open area of an uppercase L, something which would be acceptable for reading Catalan (as it was acceptable with U+013F in Teletext/Videotex).

I looked for handwritten forms of two lowercase l with an intermediate middle dot, and they clearly show that Catalans write them without extra spacing: the dot fits well within the open area between the connecting baseline and the two ascending loops (and sometimes it appears as a horizontal or slanted medial stroke that connects the two loops, or as a ligature of the two lowercase l letters, or the dot is put within the ascending loop of the first l).

I don't know which form Catalan children learn at school to write the three letters correctly, or whether they are taught that this dot is a diacritic or a special hyphen... My readings only show that there's no such L-with-middle-dot in the Catalan alphabet, and it is most often not considered a letter even though it represents a distinctive sound.
An interesting article about Catalan typesetting with TeX is on: http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf * It is noted that the usual middle dot (which normally appears at half the baseline and the x-height) is not exactly what is needed for catalan (where it should be placed at half the current height of the current middle-dot and the ascender height). Another feature is that the dot should be at equal distance of the two vertical stems of lowercase or uppercase L, which keep their normal distance that would be used in absence of this dot...) * So the dot is naturally kerned into the first uppercase L, but usually not between lowercase letters where it takes its space within the inter-letter spacing. * It also discusses the allowed hyphenations and their correct rendering...
Re: U+0140
> >00B7;MIDDLE DOT;Po;0;ON;N; > >10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N; > >16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N; > I was meaning to ask about this. I'm all over not encoding Yet Another > middle dot, but I was wondering. In my research on Samaritan, I've > found that they frequently write (you guessed it) a middle dot to > separate words (they like to use space to enable them to do this cool > columnar writing thing). I was assuming that this could be conflated > with someone else's middle-dot-word-separator; would that be U+10101? As far as I am concerned, U+00B7 should be sufficient for that. But if you were looking for a punctuation mark distinguished from U+00B7, specifically for archaic textual practice, my choice would be U+16EB (and the Runic double dot, U+16EC) as an alternative. Scripts.txt treats these as common punctuation: 16EB..16ED; Common # Po [3] RUNIC SINGLE PUNCTUATION..RUNIC CROSS PUNCTUATION Unfortunately, software may be making over-aggressive assumptions about script identity in some cases, which can throw off implementations that pick up punctuation out of another script block. Note that as part of the ongoing work to cover Greek paleographic needs, a large number of multiple dot punctuation characters are currently under ballot for addition to 10646 (and Unicode). See 2056, 2058..205E at: http://www.unicode.org/alloc/Pipeline.html These are (proposed to be) encoded in the General Punctuation block to ensure that *everyone* is clear that their intended use is general, so we don't have to keep cloning more and more such dot combinations to handle the dot punctuation for each different paleographic tradition. --Ken
Re: U+0140
At 03:31 PM 4/15/2004, Peter Kirk wrote:

> [PA] Isn't this the one that should be used in dictionaries ?
> See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html

Why are you guys citing the 1999 (!) version of this TR? It's 2004, Unicode 4.0.1 has been published and we are up to http://www.unicode.org/unicode/standard/reports/tr14/tr14-15.html. While the text is not vastly different, there's at least one textual fix in the section cited.

A./
Re: U+0140
Philippe Verdy a écrit : From: "Patrick Andries" <[EMAIL PROTECTED]> Peter Kirk a écrit : What is U+2027 intended for? The name suggests that it might be what is needed for Catalan. [PA] Isn't this the one that should be used in dictionaries ? See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html 2027 HYPHENATION POINT Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line breaking opportunities in the middle of words. The hyphenation point It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line. This last sentence is wrong, at least in my Larousse dictionnaries: I believe it simply describes certain practices (Anglo-Saxon, American ?), maybe this should be clearer. P. A.
Re: U+0140
From: "Patrick Andries" <[EMAIL PROTECTED]>
> Peter Kirk wrote:
> > What is U+2027 intended for? The name suggests that it might be what is needed for Catalan.
> > [PA] Isn't this the one that should be used in dictionaries ?
>
> See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
> 2027 HYPHENATION POINT
> Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line breaking opportunities in the middle of words. The hyphenation point It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line.

This last sentence is wrong, at least in my Larousse dictionaries. For example, look at the entry for "Blattfeder". The entry is in fact "Blatt|feder", with a thin vertical line delimiting radicals. This entry has a subitem for "Blattlauskäfer", noted:

... || °~laus*- käfer ...

where the '*' above is in fact the hyphenation point, and the '-' is a regular hyphen added because there's a line break (additionally, the degree symbol '°' indicates that the radical symbolized by the long tilde '~' must have a capital initial letter).

So there is no mutation of the hyphenation point into a regular hyphen when there's a line break. Clearly, the hyphenation point is a notation that is not part of the normal orthography, unlike the regular hyphen at the end of lines, which would appear in normal texts outside the dictionary entries; so when line breaks occur, both symbols are used together.

This hyphenation point, used in German dictionaries for verbs with particles or for nouns and adjectives with prefixes, is thicker than a sentence-ending dot or period, and drawn above the baseline, but it is not a middle-dot, as its position is at the x-height. It is too thick and too high to be the Catalan middle-dot...
Re: U+0140
Kenneth Whistler wrote: Philippe opined: If there's something really missing for Catalan, it's a middle-dot letter with general category "Lo", and combining class 0 (i.e. NOT combining). The one thing for sure is that the Unicode Standard does not need to encode more middle dots: 00B7;MIDDLE DOT;Po;0;ON;N; 0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;N; 1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;N; 22C5;DOT OPERATOR;Sm;0;ON;N; 2F02;KANGXI RADICAL DOT;So;0;ON; 4E36N; 302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;N; 30FB;KATAKANA MIDDLE DOT;Pc;0;ON;N; FE45;SESAME DOT;Po;0;ON;N; FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON; 30FBN; 10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N; 1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;N; 2027;HYPHENATION POINT;Po;0;ON;N; 16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N; 1802;MONGOLIAN COMMA;Po;0;ON;N; 318D;HANGUL LETTER ARAEA;Lo;0;L; 119EN;HANGUL LETTER ALAE A 1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;N; I was meaning to ask about this. I'm all over not encoding Yet Another middle dot, but I was wondering. In my research on Samaritan, I've found that they frequently write (you guessed it) a middle dot to separate words (they like to use space to enable them to do this cool columnar writing thing). I was assuming that this could be conflated with someone else's middle-dot-word-separator; would that be U+10101? ~mark
Re: U+0140
- Original Message -
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Kenneth Whistler" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, April 16, 2004 12:03 AM
Subject: Re: U+0140

> On 15/04/2004 12:32, Kenneth Whistler wrote:
>
> >Philippe opined:
> >
> >>If there's something really missing for Catalan, it's a middle-dot letter with
> >>general category "Lo", and combining class 0 (i.e. NOT combining).
> >
> >The one thing for sure is that the Unicode Standard does not need
> >to encode more middle dots:
> >
> >00B7;MIDDLE DOT;Po;0;ON;N;
> >0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;N;
> >1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;N;
> >22C5;DOT OPERATOR;Sm;0;ON;N;
> >2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;N;
> >302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;N;
> >30FB;KATAKANA MIDDLE DOT;Pc;0;ON;N;
> >FE45;SESAME DOT;Po;0;ON;N;
> >FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;N;
> >10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N;
> >1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;N;
> >2027;HYPHENATION POINT;Po;0;ON;N;
> >16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N;
> >1802;MONGOLIAN COMMA;Po;0;ON;N;
> >318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;N;HANGUL LETTER ALAE A
> >1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;N;
> >
> >(and that's not considering the lowered dots "FULL STOP" and the raised
> >dots)
>
> There are also, including combining middle dots (most of these listed at
> U+00B7):
>
> U+0387 GREEK ANO TELEIA

wrong form? it's a small square, and is the Greek semicolon, and is then separating words.

> U+05BC HEBREW POINT DAGESH OR MAPIQ

where would you position it relative to the Catalan L letter, which has a distinct directionality and should not inherit the complexity of the Hebrew script? Why isn't there even U+0307 COMBINING DOT ABOVE or U+0323 COMBINING DOT BELOW in your list?

> U+2022 BULLET

too thick, and it is a word-breaking symbol with a candidate line break on either side.
most often is a bullet at the beginning of a sub-paragraph, but can be used for example to separate multiple titles (think about titles on CD-Audio) or dictionaries and lots of publication where it is a symbol mark which is used as a source anchor for a note. > U+2024 ONE DOT LEADER this is a spacing character, mostly a punctuation, and clearly word-breaking... > U+2219 BULLET OPERATOR this is a symbol with a evident word break on either sides (think about mathematical formulas) > U+2027 HYPHENATION POINT a good suggestion if this was not a punctuation... What is the exact status of this character? When I look into the UCD properties I see that: French name: POINT DE COUPURE DE MOT GC=Po: punctuation, other [not even a "connecting" Pc like the ASCII underscore], so a separator of words CC=0: not combining [OK] BD=ON: order neutral [OK] > What is U+2027 intended for? The name suggests that it might be what is > needed for Catalan. I think that this is better seen as an annotation used in dictionaries to note visually the position of candidate syllable breaks, (unlike the soft-hyphen which is normally not rendered except where the candidate line-break is realized). Many dictionnaries prefer a thin vertical line which extends from the descender to the ascender, and in fact there are fonts where this character is drawn like this, and which is not the same as the ASCII vertical line which is smaller and often thicker.) This notation symbol could be used in addition to and immediately after the Catalan middle-dot... My Larousse Catalan-French pocket dictionnary uses a very thin vertical line to mark word terminations and prefix/suffixes, in combination with a orthographic middle-dot in the Catalan word which is always noted. Question here: is that vertical line used in Larousse really the same as U+007C? 
In the same context I note that the ASCII TILDE (a large version aligned on the baseline) is used to note the common radical indicated by the vertical line symbol that separate prefixes and suffixes from the radical of the entry word... In the same dictionnary, the vertical line is also used, isolately or in a pair, and surrounded by a cadratin space, as a separator between definition items, to group them by semantic proximity; but in that case the vertical line is thicker and does not extend below the baseline, so this separator looks more like a true U+007C, i.e. a regular punctuation, with candidate line breaks occuring both before and after it (in fact at the position of the surrounding c
Re: U+0140
On 15/04/2004 15:13, Patrick Andries wrote:

> Peter Kirk wrote:
> > What is U+2027 intended for? The name suggests that it might be what is needed for Catalan.
>
> [PA] Isn't this the one that should be used in dictionaries ?
>
> See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
> 2027 HYPHENATION POINT
> Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line breaking opportunities in the middle of words. The hyphenation point It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line.

Well, this sounds just like the required behaviour for Catalan, as described by António Martins-Tuválkin on 28th March. He wrote:

    Something happens when the "L·L" coincides with a soft line end. I'm no expert in Catalan typesetting but IIRC the dot becomes a hyphen, while regular "LL"s cannot be broken.

-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
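The line-end behaviour described in this thread (the dot of l·l is dropped and a hyphen appears at the end of the line, while plain "ll" is never split) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `break_at_dotted_l` is hypothetical, the middle dot is assumed to be encoded as U+00B7, and real Catalan hyphenation involves many more rules than this single case.

```python
import re
from typing import Optional, Tuple

# Assumption: the Catalan dot is encoded as U+00B7 between two l's of the
# same case, as described in the thread (l·l or L·L, not a mixture).
DOTTED_L = re.compile("l\u00b7l|L\u00b7L")


def break_at_dotted_l(word: str) -> Optional[Tuple[str, str]]:
    """Split a word at its first l·l; return (line_end, next_line) or None."""
    match = DOTTED_L.search(word)
    if match is None:
        return None  # no l·l; a plain "ll" is never a break point here
    first_l_end = match.start() + 1
    # The dot is replaced by a hyphen at the end of the first line.
    return word[:first_l_end] + "-", word[first_l_end + 1:]


print(break_at_dotted_l("col\u00b7legi"))
print(break_at_dotted_l("paral\u00b7lel"))
print(break_at_dotted_l("castell"))
```

Because the dot disappears at the break, a renderer or text-reflow tool has to restore it when the word is rejoined, which is the same property the UAX #14 description attributes to U+2027.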
Re: U+0140
Peter Kirk a écrit : What is U+2027 intended for? The name suggests that it might be what is needed for Catalan. [PA] Isn't this the one that should be used in dictionaries ? See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html 2027 HYPHENATION POINT Hyphenation point is primarily used to visibly indicate syllabification of words. Syllable breaks are potential line breaking opportunities in the middle of words. The hyphenation point It is mainly used in dictionaries and similar works. When an actual line break falls inside a word containing hyphenation point characters, the hyphenation point is rendered as a regular hyphen at the end of the line.
Re: U+0140
On 15/04/2004 12:32, Kenneth Whistler wrote: Philippe opined: If there's something really missing for Catalan, it's a middle-dot letter with general category "Lo", and combining class 0 (i.e. NOT combining). The one thing for sure is that the Unicode Standard does not need to encode more middle dots: 00B7;MIDDLE DOT;Po;0;ON;N; 0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;N; 1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;N; 22C5;DOT OPERATOR;Sm;0;ON;N; 2F02;KANGXI RADICAL DOT;So;0;ON; 4E36N; 302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;N; 30FB;KATAKANA MIDDLE DOT;Pc;0;ON;N; FE45;SESAME DOT;Po;0;ON;N; FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON; 30FBN; 10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N; 1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;N; 2027;HYPHENATION POINT;Po;0;ON;N; 16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N; 1802;MONGOLIAN COMMA;Po;0;ON;N; 318D;HANGUL LETTER ARAEA;Lo;0;L; 119EN;HANGUL LETTER ALAE A 1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;N; (and that's not considering the lowered dots "FULL STOP" and the raised dots) There are also, including combining middle dots (most of these listed at U+00B7): U+0387 GREEK ANO TELEIA U+05BC HEBREW POINT DAGESH OR MAPIQ U+2022 BULLET U+2024 ONE DOT LEADER U+2027 HYPHENATION POINT U+2219 BULLET OPERATOR What is U+2027 intended for? The name suggests that it might be what is needed for Catalan. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: U+0140
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
> Philippe opined:
>
> > If there's something really missing for Catalan, it's a middle-dot letter with
> > general category "Lo", and combining class 0 (i.e. NOT combining).
>
> The one thing for sure is that the Unicode Standard does not need
> to encode more middle dots:
>
> 00B7;MIDDLE DOT;Po;0;ON;N;
> 0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;N;
> 1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;N;
> 22C5;DOT OPERATOR;Sm;0;ON;N;
> 2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;N;
> 302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;N;
> 30FB;KATAKANA MIDDLE DOT;Pc;0;ON;N;
> FE45;SESAME DOT;Po;0;ON;N;
> FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;N;
> 10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N;
> 1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;N;
> 2027;HYPHENATION POINT;Po;0;ON;N;
> 16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N;
> 1802;MONGOLIAN COMMA;Po;0;ON;N;
> 318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;N;HANGUL LETTER ALAE A
> 1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;N;
>
> (and that's not considering the lowered dots "FULL STOP" and the raised
> dots)

In that set there's only one "letter" (1427; Canadian Syllabics Final Middle Dot), which has the wrong script, although it is an appropriate "Lo" that would find a very unusual application for Catalan. I disregard the rest (including 2027, the hyphenation point, which regrettably is a punctuation, not a letter, and not explicitly "middle", meaning that it would render inappropriately for Catalan, although it still represents the Catalan function of this character).
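The general categories in the quoted listing can be rechecked against a current database. The sketch below groups the sixteen code points from the listing by the category reported by Python's `unicodedata` module; the helper name `by_category` is illustrative. Note the assumption involved: the interpreter carries a much newer Unicode version than 4.0, so a few values (for example those of the Katakana middle dots) may differ from the 2004 listing. With the data as quoted, U+318D is tagged Lo as well as U+1427.

```python
import unicodedata

# Code points from the listing quoted above.
CANDIDATES = [
    0x00B7, 0x0701, 0x1427, 0x22C5, 0x2F02, 0x302E, 0x30FB, 0xFE45,
    0xFF65, 0x10101, 0x1D16D, 0x2027, 0x16EB, 0x1802, 0x318D, 0x1D01B,
]


def by_category(code_points):
    """Group code points by their general category (e.g. 'Po', 'Lo')."""
    groups = {}
    for cp in code_points:
        category = unicodedata.category(chr(cp))
        groups.setdefault(category, []).append(f"U+{cp:04X}")
    return groups


groups = by_category(CANDIDATES)
for category, members in sorted(groups.items()):
    print(category, " ".join(members))
```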
Re: U+0140
Philippe opined:

> If there's something really missing for Catalan, it's a middle-dot letter with
> general category "Lo", and combining class 0 (i.e. NOT combining).

The one thing for sure is that the Unicode Standard does not need to encode more middle dots:

00B7;MIDDLE DOT;Po;0;ON;N;
0701;SYRIAC SUPRALINEAR FULL STOP;Po;0;AL;N;
1427;CANADIAN SYLLABICS FINAL MIDDLE DOT;Lo;0;L;N;
22C5;DOT OPERATOR;Sm;0;ON;N;
2F02;KANGXI RADICAL DOT;So;0;ON;<compat> 4E36;N;
302E;HANGUL SINGLE DOT TONE MARK;Mn;224;NSM;N;
30FB;KATAKANA MIDDLE DOT;Pc;0;ON;N;
FE45;SESAME DOT;Po;0;ON;N;
FF65;HALFWIDTH KATAKANA MIDDLE DOT;Pc;0;ON;<narrow> 30FB;N;
10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;N;
1D16D;MUSICAL SYMBOL COMBINING AUGMENTATION DOT;Mc;226;L;N;
2027;HYPHENATION POINT;Po;0;ON;N;
16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;N;
1802;MONGOLIAN COMMA;Po;0;ON;N;
318D;HANGUL LETTER ARAEA;Lo;0;L;<compat> 119E;N;HANGUL LETTER ALAE A
1D01B;BYZANTINE MUSICAL SYMBOL KENTIMA ARCHAION;So;0;L;N;

(and that's not considering the lowered dots "FULL STOP" and the raised dots)

> It's unfortunate that almost all legacy Catalan text transcoded to Unicode are based
> on the middle-dot symbol (the one mapped in ISO-8859-1 and ISO-8859-15) which is
> not seen by Unicode as a letter (Lo) but as a symbol only.

Actually, that is *fortunate*, not unfortunate, since it is the correct conversion from 8859-1 (and Windows 1252) data. How U+00B7 behaves in Catalan data is then a matter of local *adaptation* of software for the correct handling of the Catalan language.

Note that while the particular combination <006C, 00B7, 006C> is a peculiarity of Catalan orthography, U+00B7 MIDDLE DOT (often called a 'raised period') is very widely used, indeed, in technical orthographies for many languages, particularly in the Americas, where it is used much more commonly than the IPA characters U+02D0 MODIFIER LETTER TRIANGULAR COLON or U+02D1 MODIFIER LETTER HALF TRIANGULAR COLON to indicate vocalic (or less commonly, consonantal) length.
Obsessing about the behavior of U+00B7 in Catalan data while ignoring its use as a vowel length indicator in many, many other orthographies is rather pointless, it seems to me. --Ken
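The conversion point is easy to confirm: byte 0xB7 in ISO-8859-1, ISO-8859-15 and Windows 1252 all map to U+00B7, so transcoded legacy Catalan text naturally contains the sequence <006C, 00B7, 006C>. A minimal sketch with Python's built-in codecs follows; the sample word is the "col·legi" cited elsewhere in the thread, and the variable names are illustrative.

```python
# "col·legi" as it would be stored in an 8-bit legacy file: 0xB7 is the dot.
legacy_bytes = b"col\xb7legi"

decoded = {
    codec: legacy_bytes.decode(codec)
    for codec in ("iso8859-1", "iso8859-15", "cp1252")
}

for codec, text in decoded.items():
    # Show the code points around the dot: l, middle dot, l.
    print(codec, [f"U+{ord(ch):04X}" for ch in text[2:5]])
```

No decoder in this set produces U+0140, which matches the expectation stated in the thread that converted data stays as the two-character sequence.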
Re: U+0140
> Did you get an answer on this ? Why is there no decomposition associated
> to this character ?

Thanks to Eric and Patrick for digging out my answer on this perennial question from a couple years back, and saving me the trouble of having to rummage around to find it. :-)

Also, it should be noted that there *is* a decomposition for U+0140 in the Unicode Character Database, to wit:

0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L;<compat> 006C 00B7;...
                                                 ^^^^^^^^

It is a compatibility decomposition for two reasons. First, the decomposition into the sequence <006C, 00B7> may result in rendering differences (both because of potentially different decisions about where to render the dot and because the introduction of the U+00B7 MIDDLE DOT might impact line break decisions, depending on the implementation). Secondly, the properties of the characters in the sequence <006C, 00B7> are distinct from those for <0140> by itself, and may impact things such as identifier parsing, again depending on the implementation.

And, as I indicated before, U+0140 is itself basically a compatibility character, introduced for mapping to ISO 6937, a preexisting standard that was among the list of character encoding standards intended to be covered by the initial Unicode repertoire. The character *was* in ISO 6937 for Catalan. Noting the Catalan association in the Unicode names list is different from any recommendation that U+0140 is the preferred character for the representation of l followed by a middle dot in Catalan text.

Most existing Catalan data (8859-1, Windows 1252, primarily) would not use it, of course. Converted to Unicode, that data would also not use it, but be represented as the sequence <006C, 00B7>. And there is every expectation that new data created in Unicode would continue to use such a sequence for Catalan.

--Ken
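The difference between a compatibility and a canonical decomposition shows up directly in normalization. The sketch below uses Python's standard `unicodedata` module; the constant names are illustrative, and the values come from whichever Unicode version the interpreter bundles.

```python
import unicodedata

L_MIDDLE_DOT = "\u0140"      # LATIN SMALL LETTER L WITH MIDDLE DOT
CAP_L_MIDDLE_DOT = "\u013f"  # LATIN CAPITAL LETTER L WITH MIDDLE DOT

# The <compat> tag marks the mapping as a compatibility decomposition.
small_mapping = unicodedata.decomposition(L_MIDDLE_DOT)
capital_mapping = unicodedata.decomposition(CAP_L_MIDDLE_DOT)
print(small_mapping, "|", capital_mapping)

# Canonical forms (NFC, NFD) leave U+0140 alone; compatibility forms
# (NFKC, NFKD) replace it with the sequence <006C, 00B7>.
canonical = unicodedata.normalize("NFD", L_MIDDLE_DOT)
compat = unicodedata.normalize("NFKD", L_MIDDLE_DOT)

# The properties differ too: one letter versus a letter plus punctuation.
categories = [unicodedata.category(ch) for ch in L_MIDDLE_DOT + compat]
print(categories)
```

So text that must compare equal regardless of which representation was typed needs NFKC or NFKD (or a custom fold), since the canonical forms keep the two spellings distinct.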
Re: U+0140
Kenneth Whistler a écrit : Did you get an answer on this ? Why is there no decomposition associated to this character ? Thanks to Eric and Patrick for digging out my answer on this perennial question from a couple years back, and saving me the trouble of having to rummage around to find it. :-) Also, it should be noted that there *is* a decomposition for U+0140 in the Unicode Character Database, to wit: 0140;LATIN SMALL LETTER L WITH MIDDLE DOT;Ll;0;L; 006C 00B7;... ^^ Oops. Looked at the wrong place in BabelMap. Sorry (blushing). Patrick
Re: U+0140
From: "Patrick Andries" <[EMAIL PROTECTED]> > Anto'nio Martins-Tuva'lkin a écrit : > >>However I advise removal of the note "Catalan" under U+0140 and > >>U+013F, and perhaps replacement of the whole note with «for Catalan > >>use U+006C U+00B7» (resp. U+004C). > >> > Did you get an answer on this ? Why is there no decomposition associated > to this character ? > > Also did somewhat mention why U+0140 is even in Unicode since it could > be considered (by ignorami like me) as a precomposed character (l + > middle dot) ? Is it due to the polysemy of the middle dot ? I thought it was already answered in this list by a Catalan speaking contributor: the sequence L+middle-dot in Catalan is NOT a combining sequence. The middle dot in Catalan plays a role similar to an hyphen between syllables, to mark a distinction with words where, for example a double-L would create an alternate reading. The dot indicates that each L must be read distinctly (or read with a long or emphatic L). In French for example we have words like "maille" to be read as /maj/, and the same "-ill-" written diphtongs after another vowel occur in Catalan. But French will not write "-ill-" if it occurs between two vowels where the two L must have the sound L (if this occurs in french, only 1 L is written, and the emphatic/long sound is not marked). Catalan has this orthograph, and writes the emphatic/long L distinctly. So it needs a symbol for that. The middle-dot is then considered in Catalan as a letter, that will occur in the middle of words. I don't know if the middle-dot can be used in Catalan as a cadidate position for a line break with hyphenation: if yes, is it kept before the hyphen, or is the middle-dot used alone, or is the middle-dot replaced by a regular hyphen? I don't know. But if the middle-dot must be replaced by a hyphen, then it is a punctuation (similar to hyphens used in compound-words). 
But in Catalan, the middle dot should not be kerned into the preceding uppercase L, as it would appear if it were considered equivalent to . Catalan has no use for such a decomposition, and if such a decomposition had existed, it would have been into L + combining left-middle-dot, and not the same character.

If there's something really missing for Catalan, it's a middle-dot letter with general category "Lo" and combining class 0 (i.e. NOT combining). It's unfortunate that almost all legacy Catalan text transcoded to Unicode is based on the middle-dot symbol (the one mapped in ISO-8859-1 and ISO-8859-15), which is not seen by Unicode as a letter (Lo) but as a symbol only.
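The general category and combining class discussed above can be read directly from the character database. A minimal sketch, assuming Python's standard `unicodedata` module; the `props` name is illustrative only. In the data I am aware of, U+00B7 comes out as Po (other punctuation) with combining class 0, so it is neither a letter nor a combining mark, while U+0140 and U+013F are ordinary cased letters.

```python
import unicodedata

# Code points mentioned in this thread.
code_points = (0x006C, 0x00B7, 0x013F, 0x0140)

# Map each code point to (general category, canonical combining class).
props = {
    cp: (unicodedata.category(chr(cp)), unicodedata.combining(chr(cp)))
    for cp in code_points
}

for cp, (category, ccc) in props.items():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {category}, ccc={ccc}")
```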
Re: U+0140
Philippe Verdy wrote:

> From: "Patrick Andries" <[EMAIL PROTECTED]>
>
>> Anto'nio Martins-Tuva'lkin wrote:
>>
>>> However I advise removal of the note "Catalan" under U+0140 and
>>> U+013F, and perhaps replacement of the whole note with «for Catalan
>>> use U+006C U+00B7» (resp. U+004C).
>>
>> Did you get an answer on this? Why is there no decomposition
>> associated with this character?
>>
>> Also, did someone mention why U+0140 is even in Unicode, since it
>> could be considered (by ignorami like me) as a precomposed character
>> (l + middle dot)? Is it due to the polysemy of the middle dot?
>
> I thought it was already answered in this list by a Catalan speaking
> contributor: the sequence L+middle-dot in Catalan is NOT a combining
> sequence.

Are you referring to the person I quoted? Why doesn't U+0140 have a decomposition in Unicode?

P. A.
Re: U+0140
> [Original Message]
> From: Patrick Andries <[EMAIL PROTECTED]>
>
> did someone mention why U+0140 is even in Unicode since it could
> be considered (by ignorami like me) as a precomposed character
> (l + middle dot)? Is it due to the polysemy of the middle dot?

More likely it is due to this character being found in legacy character encodings. While I don't believe it is in any of the ISO 8859 character sets, this character (and U+013F) is found in the ISO 6937 Videotex standard, and probably in others as well.
Re: U+0140
Patrick Andries wrote:

> Anto'nio Martins-Tuva'lkin wrote:
>
>> However I advise removal of the note "Catalan" under U+0140 and
>> U+013F, and perhaps replacement of the whole note with «for Catalan
>> use U+006C U+00B7» (resp. U+004C).
>
> Did you get an answer on this? Why is there no decomposition
> associated with this character?
>
> Also, did someone mention why U+0140 is even in Unicode, since it
> could be considered (by ignorami like me) as a precomposed character
> (l + middle dot)? Is it due to the polysemy of the middle dot?

[PA] In the meantime Eric Muller forwarded some answers (dating back to 6/8/2002) in which Ken explains all this. Thank you, Eric.

« There is no particular reason to use the l· as a single character, when all the 8859-based and Windows 1252 implementations would be using U+00B7 for the middle dot.

Consider U+0140 as effectively a compatibility character for ISO 6937. It is mapped to 0xF7 in that standard. It is also mapped to 0xA9A8 in Code Page 949 (Korean) -- which probably got it from ISO 6937 in the first place.

Is U+0140 used in other languages? Not that I know of.

--Ken »

Patrick
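Ken's remarks about legacy mappings can be spot-checked for the code pages that ship with a programming language. The sketch below assumes Python's built-in `cp949`, `cp1252` and `latin-1` codecs; Python has no ISO 6937 codec that I know of, so the 0xF7 mapping is not checked here. Variable names are illustrative.

```python
# Code Page 949 (Korean): the two-byte sequence 0xA9 0xA8 cited by Ken.
korean = bytes([0xA9, 0xA8]).decode("cp949")
print(f"U+{ord(korean):04X}")

# ISO 8859-1 and Windows-1252 only offer the bare middle dot, at 0xB7.
dot_latin1 = bytes([0xB7]).decode("latin-1")
dot_cp1252 = bytes([0xB7]).decode("cp1252")
print(f"U+{ord(dot_latin1):04X}", f"U+{ord(dot_cp1252):04X}")

# U+0140 itself has no slot in Windows-1252, so encoding it fails.
try:
    "\u0140".encode("cp1252")
    encodable_in_1252 = True
except UnicodeEncodeError:
    encodable_in_1252 = False
print(encodable_in_1252)
```

This matches the point of the quoted answer: text coming from 8859-based or Windows-1252 sources can only ever contain l followed by U+00B7, never U+0140.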
Re: U+0140
Anto'nio Martins-Tuva'lkin wrote:

> However I advise removal of the note "Catalan" under U+0140 and
> U+013F, and perhaps replacement of the whole note with «for Catalan
> use U+006C U+00B7» (resp. U+004C).

Did you get an answer on this? Why is there no decomposition associated with this character?

Also, did someone mention why U+0140 is even in Unicode, since it could be considered (by ignorami like me) as a precomposed character (l + middle dot)? Is it due to the polysemy of the middle dot?

P. A.
Re: U+0140
On 2004.03.28, 22:25, Philippe Verdy <[EMAIL PROTECTED]> wrote:

>> More like a letter, from a typography point of view.
>
> Not really, if it can be freely changed into a regular hyphen at
> line breaks; now your comments interestingly make me think about an
> explicit and visible syllable break.

"If" indeed -- something I am not sure about; for I wrote:

>> I'm no expert in Catalan typesetting but IIRC the dot becomes a
>> hyphen, while regular "LL"s cannot be broken. I could ask about
>> this in Catalonia

"IIRC" means "if I recall correctly". So, do speculate at will, but please do not misquote me!

The substance of "Catalan middle dot vs. hyphenation" is interesting but not relevant to the requested editing of the comments under U+0140 in the standard. OTOH, I maintain that the Catalan middle dot is indeed to be treated like a letter, from a typography point of view -- namely for word counting and selecting purposes.

> I suppose that in Catalan, one could use the middle dot to mark this
> syllable break in words like "kilo.octet".

OK, now you are genuinely joking, aren't you?! If you haven't grasped it yet, the Catalan middle dot is to be found only between "L"s, for the reasons already explained. I see that one must indeed take your statements on unfamiliar matters with a very liberal pinch of salt. :-(

--
António MARTINS-Tuválkin <[EMAIL PROTECTED]>
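The line-break behaviour recalled above (the dot becoming a hyphen when "L·L" falls at a soft line end) could be sketched as follows. This takes the "IIRC" at face value and is not a statement of Catalan typesetting rules; the function name `break_at_dotted_ll` is hypothetical, and "llibre" is just an example of a word with an undotted "ll".

```python
import re
from typing import Optional, Tuple

# An L, a middle dot (U+00B7), and another L, in either case.
DOTTED_LL = re.compile("[lL]\u00b7[lL]")

def break_at_dotted_ll(word: str) -> Optional[Tuple[str, str]]:
    """Split `word` at its first L·L, turning the dot into a hyphen.

    Returns (end of first line, start of next line), or None when the
    word contains no L·L sequence.
    """
    m = DOTTED_LL.search(word)
    if m is None:
        return None
    first_l = m.start()
    # Keep the first L, drop the dot, and resume at the second L.
    return word[: first_l + 1] + "-", word[first_l + 2 :]

print(break_at_dotted_ll("paral·lel"))
print(break_at_dotted_ll("llibre"))
```

Whether a plain "ll" may be broken at all is a separate question that this sketch does not try to answer; it simply offers no break point there.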
Re: U+0140
- Original Message -
From: "Anto'nio Martins-Tuva'lkin" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, March 28, 2004 7:02 PM
Subject: Re: U+0140

> On 2004.03.27, 11:12, Philippe Verdy <[EMAIL PROTECTED]> wrote:
>
> >>> This becomes evident when composing with extra-space between
> >>> letters: there is no "tie" between the first "L" and the dot.
> >
> > Interesting comment, because I had always thought that this
> > middle-dot was a modifier of the previous L,
>
> That was apparently the whole idea behind the first implementation of
> this character. (Where does it come from? MacWestern? No ISO 8859
> part covers it, AFAIK.)
>
> > and I didn't think about syllabic hyphenation.
>
> You're not supposed to. But people creating encodings should have done
> more than just grab glyphs from assorted text. (Too bad that the few
> people who can do it seriously are not rewarded for it...)
>
> >>> Using this character for Catalan texts additionally causes
> >>> hyphenation problems.
> >
> > So what would be the "hyphenation problems"?
>
> Something happens when the "L·L" coincides with a soft line end. I'm
> no expert in Catalan typesetting but IIRC the dot becomes a hyphen,
> while regular "LL"s cannot be broken.
>
> I could ask about this in Catalonia, as could many of us, but it falls
> outside the scope of Unicode.
>
> > Also what is the normal placement of the middle-dot after an
> > uppercase L letter, doesn't it kern into the space above the
> > horizontal bar?
>
> Kerning is kerning, right. What is the normal placement of a "V" after
> an "A", or a "º" after a "."?... These are separate characters, and
> kerning is not a matter for Unicode.
>
> > If I understand what you say here, that it's not a diacritic that
> > modifies that first L,
>
> Indeed, it is not.
>
> > so that this middle-dot is effectively an orthographic hyphen similar
> > in essence to other orthographic hyphens that are used to create
> > compound words, or to mark the inversion of the verb and pronominal
> > subject
>
> More or less, yes. But while this kind of hyphens and apostrophes
> separate two "words", the Catalan middle dot between two "L"s does not.
>
> > But in that case, is that middle-dot to be considered as a regular
> > punctuation mark in Catalan?
>
> More like a letter, from a typography point of view.

Not really, if it can be freely changed into a regular hyphen at line
breaks; now your comments interestingly make me think about an explicit
and visible syllable break. It is not too far from the hyphen used between
two parts of a compound word (which interestingly tends to disappear in
modern spellings of many compound words, such as "presse-papier" in
French, where the hyphen is needed between what was originally a verb and
a noun to build a single noun, and which some now write as a single word,
"pressepapier", as it simplifies the rule for plural marks; or in
neologisms like "kilo-octet", now more often written "kilooctet" even
though that causes problems for the separate pronunciation of the double
vowel "oo").

I suppose that in Catalan, one could use the middle dot to mark this
syllable break in words like "kilo.octet".

But the question of word breaks is highly context-sensitive and language-
dependent. It's hard to tell from a hyphen such as the one in the previous
line whether it's a word-break hyphen or a compound-word composing hyphen.
Just look at this paragraph and you'll see several hyphens whose meaning
differs even in English here. ;-)
Re: U+0140
On 2004.03.27, 11:12, Philippe Verdy <[EMAIL PROTECTED]> wrote:

>>> This becomes evident when composing with extra-space between
>>> letters: there is no "tie" between the first "L" and the dot.
>
> Interesting comment, because I had always thought that this
> middle-dot was a modifier of the previous L,

That was apparently the whole idea behind the first implementation of this character. (Where does it come from? MacWestern? No ISO 8859 part covers it, AFAIK.)

> and I didn't think about syllabic hyphenation.

You're not supposed to. But people creating encodings should have done more than just grab glyphs from assorted text. (Too bad that the few people who can do it seriously are not rewarded for it...)

>>> Using this character for Catalan texts additionally causes
>>> hyphenation problems.
>
> So what would be the "hyphenation problems"?

Something happens when the "L·L" coincides with a soft line end. I'm no expert in Catalan typesetting but IIRC the dot becomes a hyphen, while regular "LL"s cannot be broken.

I could ask about this in Catalonia, as could many of us, but it falls outside the scope of Unicode.

> Also what is the normal placement of the middle-dot after an
> uppercase L letter, doesn't it kern into the space above the
> horizontal bar?

Kerning is kerning, right. What is the normal placement of a "V" after an "A", or a "º" after a "."?... These are separate characters, and kerning is not a matter for Unicode.

> If I understand what you say here, that it's not a diacritic that
> modifies that first L,

Indeed, it is not.

> so that this middle-dot is effectively an orthographic hyphen similar
> in essence to other orthographic hyphens that are used to create
> compound words, or to mark the inversion of the verb and pronominal
> subject

More or less, yes. But while this kind of hyphens and apostrophes separate two "words", the Catalan middle dot between two "L"s does not.
> But in that case, is that middle-dot to be considered as a regular
> punctuation mark in Catalan?

More like a letter, from a typography point of view.

> Which category would you use to describe this character,
> independently of the current assignment of U+00B7?

Something that does not count "Paral·lel" as two words (while "j'aime" or "it's" may be two words), nor uses the middle dot as a cursor stop point when moving with Ctrl+arrow, etc.

--
António MARTINS-Tuválkin <[EMAIL PROTECTED]>
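The requirement that "Paral·lel" count as one word can be illustrated with a small tokenizer. This is a sketch only: it assumes Python's `re` module, whose `\w` class does not include U+00B7, and the pattern names `NAIVE` and `DOT_AWARE` are made up for the example. The second pattern accepts a middle dot only when it sits between two L's, which is the only place the Catalan dot is said to occur.

```python
import re

# Plain "word characters" only: the middle dot splits the word in two.
NAIVE = re.compile(r"\w+")

# Same, but a middle dot flanked by L on both sides continues the word.
DOT_AWARE = re.compile(r"\w+(?:(?<=[lL])·(?=[lL])\w+)*")

text = "Paral·lel"
naive_words = NAIVE.findall(text)
aware_words = DOT_AWARE.findall(text)

print(naive_words)
print(aware_words)
```

With U+0140 in the text instead of U+006C U+00B7, the naive pattern already returns a single word, since that character is a letter (Ll); the difference only shows up with the sequence that Catalan text actually uses.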
Re: U+0140 (was: "Re: Public Review Issue Update")
> On 2004.03.26, 23:37, Rick McGowan <[EMAIL PROTECTED]> wrote:
> > The Unicode Technical Committee has posted new issues for public
> > review and comment. Details are on the following web page:
>
> I just added the following to the On-Line Report Form:
>
> > U+0140 : LATIN SMALL LETTER L WITH MIDDLE DOT, approx. similar to
> > U+006C U+00B7, is said to be used for Catalan. That is not correct.
> > Catalan usual orthography uses a regular middle dot to separate two
> > "L"s in those cases where they are pronounced as a single one,
> > doubled only for etymological reasons.
> >
> > This dot is not connected to the previous "L" in any way, as if it
> > were some kind of diacritical. It is a standalone character -- akin
> > to the hyphen in French or Portuguese.
> >
> > This becomes evident when composing with extra-space between
> > letters: there is no "tie" between the first "L" and the dot.

Interesting comment, because I had always thought that this middle-dot was a modifier of the previous L, and I didn't think about syllabic hyphenation.

> > Using this character for Catalan texts additionally causes
> > hyphenation problems.

So what would be the "hyphenation problems"? Do you mean that when there's a line break opportunity inside the "L·L" sequence, no additional hyphen mark should be inserted, because the middle-dot is already the appropriate hyphen to mark that the word is not terminated at the line break?

Also, what is the normal placement of the middle-dot after an uppercase L letter: doesn't it kern into the space above the horizontal bar?

If I understand you correctly, it is not a diacritic that modifies the first L, so this middle-dot is effectively an orthographic hyphen, similar in essence to other orthographic hyphens that are used to create compound words, or to mark the inversion of the verb and pronominal subject in French questions (sometimes with an added phonetic "t", as in "pense-t-il?"), or to the apostrophe used to mark an elision of some final letters in many languages ("j'aime", "je t'aime" in French, similar examples in Italian), of leading letters ("it's"), or even of some medial letters ("they aren't" in English).

But in that case, is that middle-dot to be considered as a regular punctuation mark in Catalan? Which category would you use to describe this character, independently of the current assignment of U+00B7?