Re: Mayan numerals (again)

2013-07-02 Thread Szelp, A. Sz.
The question is whether the two versions (horizontal and vertical) are both
warranted or not.
With my limited knowledge of the matter, I would believe only one set to be
encodable, the other being free / stylistic variation.

Sz

Szelp, André Szabolcs

+43 (650) 79 22 400


On Sun, Jun 23, 2013 at 9:19 PM, Jameson Quinn jameson.qu...@gmail.com wrote:

 Last year, I started a discussion about proposing the Mayan numerals for
 inclusion in Unicode. Several people on the list supported this idea, and
 encouraged me to submit a proposal. I did not manage to do so last year,
 but I am ready to now.

 I have access to dozens of different books with their page numbers, tables
 of contents, and publication dates in Mayan numerals. Several of them use
 the numerals in other ways, such as numbered lists or century numbers (i.e.,
 siglo 16, 16th century, with 16 in Mayan numbers). All of these are from
 a single publishing house, and I know of 2 other publishers who use similar
 practices. None of the samples I have are textbooks, and it is common for
 math textbooks here in Guatemala to have a section on Mayan numerals,
 typically with a few simple addition problems or the like.

 The publisher of the books I have is interested, and would probably sign
 on to my proposal, though it would take about a month for them to get full
 consensus on this.

 I can also provide photos of Guatemalan currency notes, which have Mayan
 as well as Arabic numerals on them.

 I'd like to propose 40 glyphs: the vertical and horizontal versions of the
 digits 0-19. The zero glyph would be in its shell form; the several
 minor variants of this form would be considered as the same base glyph.
 This initial proposal would not include head variants or the petroglyphic
 flower zero, nor would it include petroglyphic marginal decorations on
 the glyphs for 1, 6, 11, and 16, as all of those are generally used in a
 context of fully glyphic writing, which has a number of difficult technical
 issues to resolve before it's ready for Unicode. (Although I could provide
 at least one modern example of a glyphic text; this is at least to some
 degree a living art today, though it was dead for centuries.)
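
As an illustration of the repertoire described above (not part of the proposal itself): the digits 0-19 are conventionally drawn with bars worth five and dots worth one, and larger numbers such as page numbers can be written positionally. The following is only a minimal Python sketch, assuming pure base 20 (the calendrical Long Count deviates from this); the helper names `to_base20` and `bars_and_dots` are hypothetical.

```python
def to_base20(n: int) -> list[int]:
    """Split a non-negative integer into base-20 digits, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = [n % 20]
    n //= 20
    while n:
        digits.append(n % 20)
        n //= 20
    return digits[::-1]


def bars_and_dots(digit: int) -> tuple[int, int]:
    """Return (bars, dots) for a single digit 0..19.

    Zero yields (0, 0); in writing it is shown with a separate sign
    (the shell form mentioned above) rather than with bars or dots.
    """
    if not 0 <= digit <= 19:
        raise ValueError("digit must be in 0..19")
    return divmod(digit, 5)


print(to_base20(2013))    # [5, 0, 13]
print(bars_and_dots(16))  # (3, 1)
```

Under these assumptions, "siglo 16" needs a single digit (three bars and one dot), while a year such as 2013 needs three stacked digits.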

 I'd like to know what should be my next step, and if anyone who's more
 experienced with Unicode procedures would like to advise me more closely.

 Sincerely,
 Jameson Quinn



Re: Mayan numerals (again)

2013-07-02 Thread Szelp, A. Sz.
One never stops learning...
I'd be very interested in the examples, especially in how far they are
non-interchangeable.

Thanks

Szelp, André Szabolcs

+43 (650) 79 22 400


On Tue, Jul 2, 2013 at 1:03 PM, Jameson Quinn jameson.qu...@gmail.com wrote:

 2013/7/2, Szelp, A. Sz. a.sz.sz...@gmail.com:
  The question is, whether the two versions (horizontal and vertical) are
  warranted for or not.
  With my limited knowledge of the matter, I would believe only one set to be
  encodable, the other being free / stylistic variation.

 I have examples of printed pages using both forms on the same page
 non-interchangeably, if that helps.



Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

2013-06-19 Thread Szelp, A. Sz.
The COMMA BELOW / CEDILLA problem is typical of those that probably
cannot be solved in Unicode in a way that satisfies every possible aspect.[^1]
These problems are an artifact of the historical development of Unicode,
and for a standard, stability seems to be a high priority: usually a higher
priority than canonical equivalences and NFD, especially as NFC is
the usually recommended form.

Fixing these is probably an item to keep in mind for a hypothetical
*NeoUniCode* standard of the future, like so many other issues. With modern
font technologies capable of language-dependent glyph variants and markup
languages, a unification from the beginning might be one solution; another
would be to disunify both forms (with the drawback of even
more confusables). However, these considerations are pretty academic and
hypothetical from a current Unicode point of view.

The case is similar with CARON / COMMA ABOVE RIGHT in Czech/Slovak, which is
probably an even harder case. Here one might consider, for a hypothetical
*NeoUniCode* standard, encoding them as they canonically appear (with CARON
for uppercase and COMMA ABOVE RIGHT for lowercase) and defining
language-dependent casing behaviour, as is already done with Latin SMALL
LETTER I and SMALL LETTER DOTLESS I / CAPITAL LETTER I and CAPITAL LETTER
I WITH DOT ABOVE for Turkish in the current Unicode standard. (And while at
it, one could consider doing away with separate code points for uppercase
letters altogether and resolving the issue with a mechanism similar to
combining characters or variation selectors.)
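
To make the Turkish precedent concrete, here is a minimal Python sketch of language-dependent casing. It assumes nothing beyond the four letters named above; `upper_tr` and `lower_tr` are hypothetical helper names, and real applications would normally rely on a locale-aware library rather than string replacement.

```python
def upper_tr(text: str) -> str:
    """Uppercase with Turkish rules: i -> U+0130, dotless i (U+0131) -> I."""
    # Map dotted small i first; str.upper() already maps U+0131 to plain I.
    return text.replace("i", "\u0130").upper()


def lower_tr(text: str) -> str:
    """Lowercase with Turkish rules: I -> U+0131, U+0130 -> i."""
    # Handle both capitals explicitly before the language-neutral lower().
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


word = "diyarbak\u0131r"
print(upper_tr(word))            # Turkish casing keeps the dot on capital I
print(word.upper())              # language-neutral casing loses the distinction
print(lower_tr(upper_tr(word)) == word)
```

The point of the sketch is only that the same code points case differently depending on the language, which is the mechanism the paragraph above proposes to reuse.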

/Szabolcs


[^1]: In fact, in languages where both presentations are equally
acceptable, even the (synchronic) identity is hard to determine: is it a
CEDILLA that can take COMMA form as well, or the other way around?

Szelp, André Szabolcs

+43 (650) 79 22 400


On Wed, Jun 19, 2013 at 2:41 PM, Denis Jacquerye moy...@gmail.com wrote:

 On Wed, Jun 19, 2013 at 9:12 AM, Michael Everson ever...@evertype.com
 wrote:
  On 19 Jun 2013, at 07:54, Denis Jacquerye moy...@gmail.com wrote:
  [...]
  How would one rationalize using one diacritic U+0327 with M/m and O/o
 but not with L/l and N/n in Marshallese?
 
  The same way one would rationalize using precomposed ãẽĩñõũỹ (aeinouy
 with tilde) but a necessarily de-composed g̃ (g with tilde) in Guaraní.

 This is wrong: ãẽĩñõũỹ normalize to use U+0303 in NFD, so they
 canonically use the same tilde as g̃.
 The 4 additional non-decomposable Marshallese characters with
 cedilla would not normalize to use the same cedilla as the other
 Marshallese characters with cedilla. They would not canonically use the
 same cedilla.
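
The normalization behaviour described here can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: `nfd_codepoints` is a hypothetical helper, and the results reflect the Unicode data bundled with the Python version in use.

```python
import unicodedata


def nfd_codepoints(s: str) -> list[str]:
    """Return the code points of the NFD form of s as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]


# Precomposed a, e, i, n, o, u, y with tilde all decompose to base + U+0303.
for ch in "\u00e3\u1ebd\u0129\u00f1\u00f5\u0169\u1ef9":
    print(ch, nfd_codepoints(ch))

# g with tilde has no precomposed form, so NFC leaves the sequence unchanged.
print(unicodedata.normalize("NFC", "g\u0303") == "g\u0303")

# The existing l and n with cedilla decompose to base + U+0327, the same
# combining mark used in the sequences for m and o with cedilla.
print(nfd_codepoints("\u013c"), nfd_codepoints("\u0146"))
print(nfd_codepoints("m\u0327"), nfd_codepoints("o\u0327"))
```

A newly encoded letter without a canonical decomposition would come out of `nfd_codepoints` as a single code point, which is the inconsistency raised above.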

  [...]
  It would require fewer new characters to be encoded and would make it
 easier to support in fonts (adding 1 instead of 4).
 
  No! Because if you added a single new character you'd have to make sure
 you had good glyph placement with LlMmNnOo which is eight glyphs.

 Best practice would require adding diacritical mark placement
 wherever necessary, if not on all possible base characters; M/m and O/o
 would still need it either way, and L/l and N/n would need it for other
 combining diacritics either way.
 A modern font already needs to be able to correctly place combining
 diacritics, including cedilla or ogonek.
 Navajo and other languages need other placement of ogonek than that of
 European languages.
 This does not mean it is justified to encode single precomposed Navajo
 ogonek characters.
 The placement of the cedilla is not semantically different, m̧ with
 the cedilla on the left has the same meaning as if the cedilla were
 centered or on the right, even if just one of the two is correct in
 some contexts like in Marshallese.
 This does not mean it is justified to encode m with left cedilla, m
 with centered cedilla or m with right cedilla.
 An additional single combining diacritic would behave the same way.

 On Wed, Jun 19, 2013 at 9:49 AM, Michael Everson ever...@evertype.com
 wrote:
  On 19 Jun 2013, at 09:04, Denis Jacquerye moy...@gmail.com wrote:
 
  Furthermore, the cedilla can also have a proper cedilla form as opposed
 to the Latvian or Livonian comma below form in transliteration systems.
 
  This has nothing to do with the Marshallese/Latvian conflict, though.
 
  ALA-LC romanizations use cedilla with r as they do under c or s.
 
  Does ŗ contrast with r̦ in ALA-LC romanization?

 The same way Marshallese has cedilla letters contrasting with comma
 below letters.
 The only correct form is with cedilla and it doesn't use comma below.

  BGN/PCGN and UNGEGN romanizations use cedilla with d as they do under
 h, s, t or z.
  DIN 1460-2 uses the cedilla under d, k, l, n as it does under c, h, s,
 t and z.
 
  If those things are a problem, then solving this problem for Marshallese
 simply does nothing about that problem. But it solves the problem for
 Marshallese.
 
  If the 4 Marshallese cedilla characters are encoded as single
 characters, does this mean the d, k, l, r 

Re: Greek Astrology

2012-11-01 Thread Szelp, A. Sz.
Is there evidence that these have been used consistently, on most charts of
the time? These could be ad-hoc notations (as given the contemporary
praxis, ligation per se does not make a symbol).

--
Szelp, André Szabolcs

+43 (650) 79 22 400


On Thu, Nov 1, 2012 at 2:38 AM, CE Whitehead cewcat...@hotmail.com wrote:

  Hi.
  From: Raymond Mercier rm459_at_cam.ac.uk

 Date: Mon, 29 Oct 2012 08:52:43 -
   I think I had somehow assumed that the symbols used in Greek Horoscopes
 had already been encoded, but it seems not.
  The four signs used to mark the principal corners (ascendant, etc) of
 the horoscope diagram are shown in the attachment, taken from
  http://www.skyscript.co.uk/greek_horoscope.html

  These four signs should be encoded along with the zodiacal signs U+2648
 to U+2653.
  Perhaps they are already in the pipeline ?
 Perhaps these should be in the pipeline, as the online templates I could
 find for astrological charts do not have them; they have to be added in
 (although it would be possible to have these built into the chart template
 also, as the houses are always in the same place and the ascendant is
 always located between the 12th and the 1st, etc.); see:
 http://www.skyscript.co.uk/charttemp.html

 Similarly Paul Wade's copiable template is void of the symbols

 http://books.google.com/books?id=WY8hjKtSaP0Cpg=PA40lpg=PA40dq=natal+charts+astrological+charts+templatessource=blots=By-xF3UGWBsig=KvomOKgo999CwuJPKaq1LmeoqHchl=frsa=Xei=oMCRUK-wF5Sc8gTWi4DYAgved=0CDQQ6AEwAzgK#v=onepageq=natal%20charts%20astrological%20charts%20templatesf=false

 (I'll try to check an offline guide, too, but the few actual online
 templates, not sample charts, seem void of the symbols for the ascendant,
 midheaven, etc., so they seem to be separate from the actual chart of the
 houses, so go for it. Happy Halloween in any case.)



 Best,

 --C. E. Whitehead
 cewcat...@hotmail.com
  Best wishes
  Raymond Mercier



Re: Greek astrology

2012-10-29 Thread Szelp, A. Sz.
These look as if they were actually ligatures.

Without knowing the Greek words for the principal corners, I'd read them
as a rho-omega-kappa, a pi-upsilon, an alpha (delta?)-upsilon-nu-omega and
a rho-mu ligature. I wouldn't be surprised, if these letters were
abbreviations for some expanded terms for the four principal corners.

On the other hand there do exist ligatures which gained conventional
meaning and are now encoded as their own characters, e.g. ℔, ℅.

Szabolcs


On Mon, Oct 29, 2012 at 9:52 AM, Raymond Mercier rm...@cam.ac.uk wrote:

 I think I had somehow assumed that the symbols used in Greek Horoscopes
 had already been encoded, but it seems not.
 The four signs used to mark the principal corners (ascendant, etc) of the
 horoscope diagram are shown in the attachment, taken from

 http://www.skyscript.co.uk/greek_horoscope.html

 These four signs should be encoded along with the zodiacal signs U+2648 to
 U+2653.
 Perhaps they are already in the pipeline ?
 Best wishes
 Raymond Mercier




Re: Greek astrology

2012-10-29 Thread Szelp, A. Sz.
Oh, actually the *very homepage you linked* makes it quite clear that these
are not symbols but abbreviations (emphasis by color by me; the original
page uses a somewhat unusual transliteration scheme); see the passage quoted
below. These are just completely ordinary late ancient/medieval abbreviations; I
would not think that they are encodable. (Use ZWJ, if you must.)
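
A minimal Python sketch of the ZWJ suggestion: the abbreviation stays ordinary Greek letters, with U+200D ZERO WIDTH JOINER between them as a request for a ligated rendering. Whether anything ligated is actually displayed depends entirely on the font, and `request_ligature` is a hypothetical helper name.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def request_ligature(letters: str) -> str:
    """Join the given letters with ZWJ to ask the renderer for a ligature."""
    return ZWJ.join(letters)


# The midheaven abbreviation: Greek small mu followed by small rho.
mr = request_ligature("\u03bc\u03c1")
print([unicodedata.name(c) for c in mr])

# Removing the joiner recovers the plain letters, so the text stays searchable.
print(mr.replace(ZWJ, "") == "\u03bc\u03c1")
```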

To demonstrate: the name for the midheaven is written in Greek as mesouranhma,
the English transliteration of which is *Mesuranima* and the translation of
which is 'midheaven' or 'middle of the sky'. The equivalent Latin term,
which has remained in use, is *Medium Coeli*. Just as we abbreviate
*M*edium *C*oeli to MC, the Greek word *m*esou*r*anhma is abbreviated to
mr, which is worked into a symbol (see fig.6.D below) by allowing the Greek
letter mu (m) to cut across the down stroke of the Greek letter rho (r).


[Fig. 6, the abbreviations and symbols of the angular house names; images not
reproduced: A) Ascendant, B) IC, C) Descendant, D) MC]



A similar approach is used to generate the symbol for the ascendant. The
Greek word *w**r*os*k*opoV transliterates as *Horoskopos*, which is easily
recognised as meaning 'hour-marker' or 'hour-watcher'. Here the abbreviated
(emboldened) characters are combined so that the down stroke of rho (r)
cuts across omega (w), and rests on top of kappa (k). This is one of only
four symbols which have been noticed in ancient Greek charts. The others
are the glyph for the midheaven which has just been described, and those
for the Sun and Moon which are detailed below. Currently this symbol has
the oldest heritage, appearing without the underlying kappa in a papyrus
from Karanis relating to the year 182. Of course, when the underlying kappa
is removed, the glyph for the ascendant and that of the midheaven appear
very similar, and in some charts the same symbol seems to have been used to
mark either or both the ascendant and midheaven.[5]
(http://www.skyscript.co.uk/greek_horoscope.html#5)



The name of the descendant shown here is not so much a symbol as an
abbreviation with a raised character at the end. This presents the first
four letters of the Greek word dunwn, which transliterates as *dunon* and
translates as 'setting' or 'western' or 'evening' (in the same way that the
word *oriens* can mean 'eastern' 'rising' or 'morning'; all of these words
originating from the same root).


The symbol that we see under the 4th house comprises the first two
characters of the Greek word *u**v*ogeion [!], with pi (v) resting on top
of upsilon (u). The transliteration of this word is *ypogeon* and its
translation 'under-earth' (or 'underground' or 'underworld') presents a
close association with traditional astrological references to the 4th house
as 'under the earth'. Our common abbreviation, I.C., derives from the
Latin *Immum Coeli*, which translates as 'lower heaven', but this older term seems to do
a better job of conveying the underworld mythology that is anciently
associated with the 4th house, and its interpretative role in describing
what lies beneath the surface of the ground.



On Mon, Oct 29, 2012 at 9:52 AM, Raymond Mercier rm...@cam.ac.uk wrote:

 I think I had somehow assumed that the symbols used in Greek Horoscopes
 had already been encoded, but it seems not.
 The four signs used to mark the principal corners (ascendant, etc) of the
 horoscope diagram are shown in the attachment, taken from

 http://www.skyscript.co.uk/greek_horoscope.html

 These four signs should be encoded along with the zodiacal signs U+2648 to
 U+2653.
 Perhaps they are already in the pipeline ?
 Best wishes
 Raymond Mercier




Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

2012-07-11 Thread Szelp, A. Sz.
On Wed, Jul 11, 2012 at 10:30 PM, Richard Wordingham 
richard.wording...@ntlworld.com wrote:

 On Wed, 11 Jul 2012 21:17:08 +0200
 Joó Ádám a...@jooadam.hu wrote:

   To extend the list, the Irish, Scots, English, Scandinavians and
   Poles picked up the Roman heritage without the assistance of being
   physically conquered.  And the Romanians re-established it as an
   expression of non-Slavness.
 
  Well, the official language of Hungary was Latin up until 1844. Does
  that qualify us as the true inheritors of the Roman Empire?

 No.  I wasn't sure how voluntarily Hungary (or rather, its rulers) had
 adopted West European ways, so I didn't add Hungary to the list.


Oh, its rulers adopted West European ways (or rather: the Latin Rite
Church's way as opposed to the Greek Rite Church's ways) quite voluntarily
in the late 10th century...


Re: Mandombe

2012-06-11 Thread Szelp, A. Sz.
On Mon, Jun 11, 2012 at 10:58 AM, Stephan Stiller sstil...@stanford.edu wrote:


 This is interesting only if the encodable elements would be different -
 remember, Unicode is not a font standard.


 +1; rendering can be so much more complex than encoding. I'd really like
 to see a successful renderer for Nastaliq, (vertical) Mongolian, or
 Duployan. (What *are* the hardest writing systems to render?)


Vertical Mongolian does not seem to be harder to render _conceptually_
than, let's say, simple Arabic. It's more the architectural limitations of
rendering engines that seem to limit its availability, and the intermixing
with horizontal text. For Nastaliq, Thomas Milo's DecoType is miraculous:
it's hard, but given the good job they did, obviously not impossible.
Well, I don't know about Duployan.

/Sz


Re: Mandombe

2012-06-09 Thread Szelp, A. Sz.
A very interesting script indeed. (Never heard of it before).
While its shape and the impression it makes are quite intriguing and
fascinating, I'd think that it's actually rather impractical to write. What
are the experiences of the educators in this respect? (Though I understand
that, this being a revealed and thus in many respects sacred script to its
educators and users, accounts of it will probably be biased.)

/Sz

On Fri, Jun 8, 2012 at 10:43 PM, Jean-François Colson j...@colson.eu wrote:

 Hello

 In the French Wikipedia article about Mandombe (
 http://fr.wikipedia.org/wiki/Mandombe)
 I read: “Un dossier de demande d'encodage de l'écriture Mandombe a été
 introduit à l'Unicode au mois de décembre 2010. Ce dossier a été discuté à
 la réunion du Comité technique de l'Unicode au début du mois de février
 2011.” which I would translate as “A Mandombe script encoding request
 dossier was introduced at Unicode in December 2010. That dossier was
 discussed at the Unicode technical committee meeting at the beginning of
 February 2011.”

 Does anybody have information about that “dossier”?

 Is it available anywhere on the web?

 Thanks

 JF





Re: Latin chi and stretched x

2012-06-08 Thread Szelp, A. Sz.
Julian, if you look closely, it is not actually a turned s, but something
created with a turned s in mind. In the very sort of the alphabet, the
regular s has equal (or near-equal) top and bottom bowls. The turned one
has an emphasized upper bowl, which of course stems from the idea of a
turned s (as some fonts have a larger lower bowl of s for balance),
but it is quite clearly not a turned s in identity, but rather something
_inspired_ by a turned s.

On Thu, Jun 7, 2012 at 11:05 PM, Julian Bradfield
jcb+unic...@inf.ed.ac.uk wrote:

 David Starner wrote:
 LATIN SMALL LETTER ROTATED P was used; see
 http://commons.wikimedia.org/wiki/File:BAE-Siouan_Alphabet.png . It
 has caused some whimpering among those trying to transcribe the text.

 Urk! And there's rotated s as well.

 Alright, I take it back. There is no limit to the barminess of script
 inventors.
 Obviously what we need are combining marks whose visual effect
 is reversing/rotating the previous glyph. No, I didn't say that, I
 really didn't say that...

 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.





Re: Latin chi and stretched x

2012-06-08 Thread Szelp, A. Sz.
You are right, the s-acute just below it confused me.

--
Szelp, André Szabolcs

+43 (650) 79 22 400


On Fri, Jun 8, 2012 at 11:32 AM, Julian Bradfield
jcb+unic...@inf.ed.ac.uk wrote:

 Szelp, A. Sz. wrote:
 Julian, if you look closely, it is not actually a turned s, but something
 created with a turned s in mind. In the very sort of the alphabet, the
 regular s has equal (or near-equal) top and bottom bowls. the turned one
 has an emphasized upper bowl, which of course stems from the idea of a
 turned s (as some fonts have a larger bowl lower bowl of s for balance),
 but it is quite clearly not a turned s as identity, but rather something
 _inspired_ by a turned s.

 Quite clearly wrong! I'm afraid you're suffering from optical delusion.
 I actually thought the same when I first looked at it, but it's not
 so.
 Cut out the turned s; then cut out, say, the initial s of
 sonant. Rotate it 180 degrees. They're identical, up to the
 tiny variations due to actual ink from metal type.
 (Beware that the ś immediately below is from a different fount, and
 *does* have more equal bowls. That's what confused me at first.)

 Of course, since this was printed in the age of metal type, it *has*
 to be a turned s. Cutting a special type would cost far more, and as
 David pointed out in his original post, the reason for the absurd
 turned p and turned s was that the publishers weren't willing to cut
 the extra types to match the letters in the original hand-written script.

 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.




Re: Vexillological symbols

2012-06-06 Thread Szelp, A. Sz.
They probably are. They are routinely used in vexillological
literature also in print.

Szabolcs

On Sat, Jun 2, 2012 at 10:17 PM, Jean-François Colson j...@colson.eu wrote:
 While we’re speaking of flags, the study of flags is named vexillology.

 In that discipline, a certain number of symbols are used:
 63 symbols show where the flags are used and a score of additional symbols,
 presented at en.wikipedia.org/wiki/Vexillology , are used to describe the
 flags.
 There could be more symbols, I haven’t investigated that matter yet.
 Are these symbols worth a proposal?

 JF






Re: Latin chi and stretched x

2012-06-06 Thread Szelp, A. Sz.
Unvoiced so far, I had similar reservations re stretched x and Latin chi.

Michael wrote:
As I say, stretched x is in a family of other x's with one or two
long feet, which may have rings or hooks on the end of them. But its
weight is clearly x-like -- by design. Where Teuthonista texts
occasionally used a proper Greek chi it is because of typographic
deficiency.

This family of stretched x's seems to go back to a tradition of using
different font sorts distinctively for sounds, most prominently Greek
letters (this practice also found its way into IPA) and Fraktur. (I
know 19th c. and early 20th c. German tourists' basic Italian guidebooks
that use a vs. Fraktur a to denote different sounds, just as they
use x vs. chi differently.)

The stretched x with one long leg quite probably comes from a Fraktur
(more exactly: Textura) x, as does the stretched x from the chi. Denis
gives good evidence for the stretched x being chi. Adding curls and
modifications to existing (including innovative) signs is common in the
phonetic tradition.

All in all, I also have the impression that while encoding LATIN
CHI as distinct from GREEK CHI was long overdue, there are not enough
grounds to disunify Latin chi from stretched x. There is no contrastive
use and the history points to chi. The only difference (if there is
any; most use italic type) is the stroke weight distribution between the two,
according to Michael, but it is Michael himself who has recognized that
Teuthonista suffers from a good deal of extraordinarily bad
typography, which suggests that the different stroke weight
distribution is actually just bad typography. This is quite
similar to something we've seen with the Cyrillic reform orthographies
(e.g. the gha derived from a handwritten old q, which got encoded
misnamed as OI) of the 1920s-30s and the Chinese tone letters derived
from numbers/Latin/Cyrillic type.


Szabolcs


On Mon, Jun 4, 2012 at 3:10 PM, Denis Jacquerye moy...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 11:38 AM, Michael Everson ever...@evertype.com wrote:
 On 4 Jun 2012, at 10:04, Denis Jacquerye wrote:

 On Mon, Jun 4, 2012 at 10:16 AM, Michael Everson ever...@evertype.com 
 wrote:
 What is your point, though?

 Latin stretched x has been accepted based on examples with an Italic glyph 
 like Lepsius' chi, a glyph like Greek chi and a stretched x taller than 
 x-height (and not below baseline). All these are strictly different glyphs.

 Teuthonista suffers from a good deal of extraordinarily bad typography, and 
 a fair bit of non-typographic handwritten text (which isn't bad). Where it 
 uses Greek sorts it is because that was what they had, but it is clear from 
 the *family* of stretched x's some with rings and curls that it is an x 
 that is being stretched. (And not a chi with

 But Latin chi is being proposed as a different character because IPA has 
 used a different glyph. Why?

 Because all, not some, of the IPA borrowings from Greek were explicitly 
 stated to be designed to be different from Greek and to harmonize with 
 Latin. The persisting unification doesn't make processing multi-script Greek 
 and Latin text any easier, and ultimately is not what was designed. This is 
 very clear in the beta, which now can be disunified because of its capital, 
 but which should never have been unified in the first place.

 Furthermore, the Latin capital Chi is being proposed based on Lepsius' 
 capital Chi which glyphs are strictly different from that one proposed.

 Yes, but it is still essentially a Latin Chi, not a Latin Stretched X. It is 
 clearly not a Greek Chi, because Greek Chi does not use that shape for its 
 capital. Lepsius, and the IPA, explicitly disunified Latin Chi from Greek, 
 and I would say that both Lepsius and IPA glyphs could be taken for glyph 
 variants of Latin Chi. But they are different from what is found in Greek.

 My concern is only with Latin chi being unified with Latin stretched x. The 
 disunification of Latin chi from Greek chi (or the others in the proposal) 
 is a good thing, I just think it has already been done with stretched x 
 given the examples.

 As I say, stretched x is in a family of other x's with one or two long feet, 
 which may have rings or hooks on the end of them. But its weight is clearly 
 x-like -- by design. Where Teuthonista texts occasionally used a proper 
 Greek chi it is because of typographic deficiency.

 How do we move forward?

 Is there evidence IPA Latin chi is any different from Teuthonista's 
 multiple stretched x? Both use the glyph of Greek chi sometimes, and other 
 glyphs other times.

 Stretched x is an x, not anything else. In its origin, they stretched a 
 Latin x. Latin chi is borrowed from Greek chi, but in Lepsius uses a unique 
 capital, and in IPA has a Greek-chi-like weight which differs from the Latin 
 x.

 Lepsius' chi (with a proper Latin glyph) was already in use in
 Lepsius' Standard Alphabet (1855) for a guttural consonant, and chi
 with an acute for a palatal consonant. The 

Re: Exact positioning of Indian Rupee symbol according to Unicode Technical Committee

2012-05-28 Thread Szelp, A. Sz.
Keyboard layouts are, to my best knowledge, not a matter of Unicode.

Szabolcs


On Mon, May 28, 2012 at 10:19 AM, Anand Kumar Sharma aksha...@cdac.in wrote:
 Hi

 I want to know the current exact position of the Indian Rupee symbol on
 the US-English (QWERTY) keyboard.

 I came across a blog showing the Rupee symbol at the extreme left, next to
 the character 1; see
 http://blog.foradian.com/rupee-foradian-keyboard-layout-type-the-india
 (refer to the keyboard picture).

 Another way of typing the Rupee symbol is AltGr+4, which I use most of the
 time on the third layer of the InScript keyboard.

 What will be the position of the Rupee symbol, according to a particular
 STANDARD, on our keyboards when new keyboards with the Rupee symbol come onto
 the market?

 --
 Thanks and Regards
 This mail has came from desk of
 Anand Kumar Sharma
 GIST QA|CDAC-Pune|Ph:020-25503468|http://www.cdac.in
 Before software can be reusable it first has to be usable





Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-23 Thread Szelp, A. Sz.
Andreas, Asmus, let me have my two coins as well...

   The Turks did not present “a new symbol”. They presented a new design
   for an existing symbol (₤) which stands in for an existing currency.

  A new design makes it a new symbol. Especially a radical new design.

 What makes a symbol a symbol?

 If I design a new door handle, is this going to get a new rubrication in 
 household supplier’s catalogues? Or is it still just: a door handle.
 And how would you define or measure the radicalism of the design in question?
 I can’t see any ‘radical new design’.

So far I see the parallel between the new radical symbol and the
encoded lira sign (₤) to be the same relation as the one between the
new (official, prescriptive) Euro symbol [1] (the geometric one,
which, though supposed to be prescriptive, *no-one* uses) and the
actual incarnations of the Euro symbol (€) in different font faces
matching their design (usually their C).

Now, I truly concur with Andreas that as such, the new code position
is _not_ warranted. Of course, if the sign does get encoded, we won't
be able to prove ourselves right, as encoding this design fallacy (thank god
in the case of the Euro common sense and a sense of aesthetics won
over bureaucratic shortsightedness) is on the other hand a
self-fulfilling prophecy. If you have a U+20A4 LIRA SIGN _and_ a
U+20BA TURKISH LIRA SIGN, designers _will_have_to_ make a visual
distinction between them, forcing them to take on the poor
official design, not allowing them to interpret the sign creatively
as they did with the Euro to make it more visually pleasing. I'm
wondering how often we'll see (before the encoding happens! [2]) the
new lira sign surface as ₤, £ or Ł or Ƚ in naive handwritten
typography after the fuss about the new sign settles in 1–2 months.

So I really think that the current situation is this:
   The Turkish government has presented an official Turkish lira
sign (like the official Euro construction). This is probably going
to be printed on banknotes, as it is official. This is fine. So does
the ENB. However, the sign is in fact just a particular design of ₤, used
for official, engraving and minting purposes, and it is as ₤ that it should
be used in text.
Transcribing the particular design of [anchor-lira] 100 on a
hypothetical future banknote in plain text as ₤ 100 is just as valid
as transcribing the [official-geometric-euro-design] 10 of the Euro
banknotes as € 10.

Of course, if you buy too quickly and too cheap, as Andreas put it,
and encode the new glyph variant of the Lira sign which happens to be
the one preferred for future Turkish banknotes and coins, you open up
Pandora's box by forcing a need for distinction, where there is — as
per status-quo — none. You have been warned :-)

My two cents...
Szabolcs

[1] http://en.wikipedia.org/wiki/File:Euro_Construction.svg
[2] of course, once the sign is encoded, it will be used in print and
that will influence handwritten usage. Well, the self-fulfilling
prophecy sets in.




Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-23 Thread Szelp, A. Sz.
Asmus,

most of your letter (and my previous one, as a matter of fact) is
opinion, which is valuable to voice and to be heard, but about which
it's hard to argue, so I won't go into that.

However you write:
 What you and Andreas are advocating, that is not to add a code point, would
 require a wholesale glyph change for U+20A4. All existing fonts would have
 to be tweaked to suddenly have shapes based on a L in a Turkish slipper
 (that's what the times-like example in the proposal document reminds me
 of) instead of a script-like shape (based on £).

And I must reject that. This is not what we are advocating.
While not speaking for Andreas, *my* point is that if we were to
encode that writing on that mug or on the flyer (cf. the proposal you
are referring to), the identification of the incriminated glyph as
U+20A4 would be correct and preferable and right.

Thank god, for fancy flyer designers (who might want to have the
flashy anchor-style, or let's put it that way: technocratic
constructivist style) modern font technologies allow for glyph
variants via stylistic sets or other means (i.e. there might be a
_preferred_ style of U+20A4 in Turkey, as there is a preferred style
for certain italic Cyrillic letters in Serbia distinct from the
Russian [= de facto general] style).

If the usage of the sign develops in a way that a disunification is
warranted, we can do so later. No need to hurry. The Armenian dram
sign was first printed on a banknote in 2003 (in the security strip of
the 10.000 dram banknote). It has been consistently used in newer
coinage and banknotes, appearing on the 1.000 dram in 2011. It was
part of an Armenian national standard. Yet Unicode encoded it only in
2011 in v6.1. That's 8 years. There was obviously no need to hurry to
encode a new-born currency sign. Nor is there any need to hurry here. We
can wait and see whether there is a need or real basis for disunification.

Szabolcs




Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-23 Thread Szelp, A. Sz.
Michael wrote:
  which happens to be the one preferred for future Turkish banknotes and 
  coins, you open up Pandora's box by forcing a need for distinction, where 
  there is — as per status-quo — none. You have been warned :-)

 There is nothing new here.

 2003-02-24 ₲ ₳ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2579.pdf
 2003-10-01 ؋   http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2640.pdf
 2004-04-23 ₴ ₵ http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2743.pdf
 2008-03-06 ₷   http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3390.pdf
 2008-03-06 ₸   http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3392.pdf
 2010-02-10 ֏   ftp://std.dkuug.dk/jtc1/sc2/wg2/docs/n3771.pdf (KP)
 2010-07-19 ₹   http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3862.pdf
 2012-04-17 ₺   http://std.dkuug.dk/jtc1/sc2/wg2/docs/n4258.pdf


Of course there is. These were signs unidentifiable with existing
currency symbols. The new rupee sign is, of course, not identical
with the [Rp] sign, which is quite distinct, only the semantics being
identical (like € being semantically identical to the
four-codepoint string Euro).

Also, all of these (and also the Euro) have a clear description in
terms of constituent letters.
$: an S with a single or a double vertical stroke
¢: a c with a vertical or slanted stroke
£: a fancy L with a horizontal stroke
₤: a fancy L with two horizontal strokes
₪: a SHIN and a HET ligated ( sheqel ḥadash 'new shekel')
€: a C with double strokes ( E)
₲: a G with a vertical stroke
₳: an A with a double crossbar
₴: A DZELO with a double crossbar ( italic minuscule GHE)
₵: A C with a vertical stroke
₷: an S with an m ligated
₸: a T with a second top bar
֏: an Armenian CAPITAL DA with a double crossbar (instead of the
simple right twig)
₹: a stemless R crossed ( RA crossed)
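
For reference, the signs described above can be listed with their code points and character names. This is only a small Python sketch using the standard `unicodedata` module; the variable name `signs` is made up here, and the names printed depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata

# The currency signs from the list above, in the same order.
signs = "$¢£₤₪€₲₳₴₵₷₸֏₹"

for sign in signs:
    # Print code point, the character itself, and its Unicode character name.
    print("U+%04X %s %s" % (ord(sign), sign, unicodedata.name(sign)))
```

All of these carry the general category Sc (currency symbol), whatever their glyphs are built from.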

Quite tellingly, in the case of the EURO it is *not* the very geometric
official glyph of the Euro that is encoded; it is not even chosen
as the representative glyph.
So what is the proposed TURKISH LIRA SIGN, if not a fancy L with two
horizontal strokes?

Szabolcs




Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-23 Thread Szelp, A. Sz.
Philippe,

 In fact I do expect that real world representation of the new sign
 (outside banknotes and preprinted check forms), will be more similar
 to a mirrored capital J, the two strokes will be there but their
 slanting will vary a lot.

so if your assumptions do turn out to be true, then it really will be
an ARMENIAN DRAM rotated by 180°s... ;-)

/Sz




Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-23 Thread Szelp, A. Sz.
On Wed, May 23, 2012 at 2:31 PM, Philippe Verdy verd...@wanadoo.fr wrote:

 so if your assumptions do turn out to be true, then it really will be
 an ARMENIAN DRAM rotated by 180°s... ;-)

 A 180 degrees rotation is really so much significant that there's no
 risk of confusion. Otherwise we would always confuse A and V, 6 and 9,
 L and 7, C and Ɔ, p and d, d and q, and so on.

... unless you are dyslexic ...

Come on, note the ;-), I was not suggesting that this were a problem.

/Sz




Re: Unicode 6.2 to Support the Turkish Lira Sign

2012-05-22 Thread Szelp, A. Sz.
  I always wondered about the strange Drachma glyph in the standard: a
 Latin script D connected to a greek rho.


What you identify as a Latin script D is probably also a Greek script
D. cf. also the Cyrillic script D, which coincides with the Latin, even
though the roman (and even printed cursive!) letters diverge considerably.
Having a script Δρ (in script style) does not seem strange or absurd.

Szabolcs


Re: Unicode, SMS and year 2012

2012-04-29 Thread Szelp, A. Sz.
While the authors of HTML5 had good reasons to ignore SCSU and BOCU-1,
excluding UTF-32, which is the most direct, one-to-one mapping of Unicode
code points to fixed-width code units, seems shortsighted. We are talking
about the whole of Unicode, not just the BMP.
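
To make the one-to-one point concrete, here is a minimal Python sketch. The helper name `utf32_units` and the sample string are made up for illustration; the sketch simply checks that every 32-bit unit of the UTF-32 form equals one code point, including one outside the BMP.

```python
import struct

def utf32_units(text):
    """Return the 32-bit code units of text encoded as big-endian UTF-32."""
    data = text.encode("utf-32-be")  # explicit byte order, so no BOM is added
    return list(struct.unpack(">%dI" % (len(data) // 4), data))

# 'A', LIRA SIGN (BMP), MATHEMATICAL SCRIPT CAPITAL A (outside the BMP)
sample = "A\u20a4\U0001d49c"
units = utf32_units(sample)
print([hex(u) for u in units])

# Each code unit is exactly one code point.
print(units == [ord(ch) for ch in sample])
```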

/Sz



On Sat, Apr 28, 2012 at 21:48, Doug Ewell d...@ewellic.org wrote:

 anbu at peoplestring dot com wrote:

  What are some of the reasons a new encoding will face challenges?


 The main challenge to a new encoding is that UTF-8 is already present in
 numerous applications and operating systems, and that any encoding intended
 to serve as an alternative, let alone a replacement for UTF-8, must be better
 enough to justify re-engineering of these systems.

 Some people are simply opposed to additional encoding schemes. The HTML5
 specification explicitly forbids the use of UTF-32, SCSU, and BOCU-1 (while
 allowing many non-Unicode legacy encodings and quietly mapping others to
 Windows encodings); one committee member was quoted as saying that other
 encodings of Unicode waste developer time.

 Any encoding that does not align code point boundaries along byte
 boundaries will be criticized for requiring excessive processing. The
 argument that I made will be made by others, that if it is necessary to
 process bit-by-bit, one might as well use a general-purpose compression
 algorithm. It is popular to present gzip as the ideal compression approach,
 since it is widely available, especially on Linux-type systems, and
 publicly documented (and not IP-encumbered).

 I may have missed some other objections.


 --
 Doug Ewell | Thornton, Colorado, USA
 http://www.ewellic.org | @DougEwell




Re: Support for non-BMP characters

2012-04-25 Thread Szelp, A. Sz.
Shouldn't it be technically possible to store Supplementary Plane
characters in UTF-16 / UCS-2 as well? Isn't that what Surrogate Pairs are
for?
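
As a rough illustration of what surrogate pairs do, here is a Python sketch of the UTF-16 arithmetic. The function name `to_surrogate_pair` is made up for this example. Note that strictly speaking only UTF-16 defines this mechanism; UCS-2 proper treats every 16-bit unit as a character on its own.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low

# U+1D49C MATHEMATICAL SCRIPT CAPITAL A
high, low = to_surrogate_pair(0x1D49C)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder.
print("\U0001d49c".encode("utf-16-be").hex())
```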

Sz

On Wed, Apr 25, 2012 at 11:09, Marc Durdin marc.dur...@tavultesoft.com wrote:

 Probably the most egregious example I know of is JavaScript.  As far as I
 know, JavaScript still only groks UCS-2.  I'd love to be wrong.

 Marc

 -Original Message-
 From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
 Behalf Of David Starner
 Sent: Wednesday, 25 April 2012 6:32 PM
 To: Unicode Mailing List
 Subject: Support for non-BMP characters

 It's been ten years since the first non-BMP characters were encoded.
 How are they working in your neck of the woods? There's a lot of places
 where they're working just fine, but I was facing MySQL's support. It has
 had support for UCS-2 and UTF-8 limited to the BMP for a long time; now in
 MySQL 5.5 there's utf16, utf32 and utf8mb4. (MySQL
 5.1 and 5.5 are the current stable releases.) But there's enough warnings
 about incompatibilities with utf8mb4 to make me pause before switching my
 private database to it, and I think the net will see MySQL databases with
 utf8 instead of utf8mb4 as long as MySQL exists, unless they decide to push
 people over to it.
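
The utf8 versus utf8mb4 distinction comes down to byte counts: MySQL's older utf8 stores at most three bytes per character, while supplementary-plane characters need four bytes in UTF-8. A small Python sketch (the helper name `utf8_length` is made up here) shows where the boundary falls:

```python
def utf8_length(char):
    """Number of bytes a single character occupies in UTF-8."""
    return len(char.encode("utf-8"))

# ASCII, Latin-1 range, BMP currency sign, supplementary-plane letter
for ch in ("A", "\u00e9", "\u20a4", "\U0001d49c"):
    print("U+%04X needs %d byte(s) in UTF-8" % (ord(ch), utf8_length(ch)))
```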

 (Ada's an issue too, though not one most people will have to deal with.
 While Ada 2005 added a UTF-32 string type, it left the UCS-2 string type as
 is. Again, I suspect a lot of nominally Unicode Ada programs are going to
 be BMP-only. Of course, UTF-8 as an ASCII superset is used, stuffed into
 strings labeled Latin-1; it's technically not conformant with the Ada
 standard but it works so long as you don't need much string processing.)

 In any case, is the use of non-BMP characters still problematic in your
 corner of the computing world or is everything looking fine from where you
 are?

 --
 Kie ekzistas vivo, ekzistas espero.







Re: Support for non-BMP characters

2012-04-25 Thread Szelp, A. Sz.
I'm really not a technical expert, but what you write rather sounds to me
as if JavaScript's UCS-2 implementation were broken...
Thanks for the linked document.
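
To illustrate the mismatch being described, here is a Python sketch that counts the same string both ways. The helper names are made up, and the sketch assumes Python 3.3 or later, where `len()` on a string counts code points; the UTF-16 count corresponds to what a code-unit-based length would report.

```python
def utf16_code_units(text):
    """Number of 16-bit code units in the UTF-16 form of text."""
    return len(text.encode("utf-16-le")) // 2

def code_points(text):
    """Number of code points (Python 3.3+ strings are code point sequences)."""
    return len(text)

sample = "a\U0001d49c"  # 'a' followed by one supplementary-plane character
print(code_points(sample), utf16_code_units(sample))
```

The two counts agree only as long as the text stays within the BMP.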

Sz

On Wed, Apr 25, 2012 at 11:41, Marc Durdin marc.dur...@tavultesoft.com wrote:

  Yes, but this means that regexes with SMP don’t work (e.g. [𝒜-𝒵]),
 character counts return code units, etc.  So you have to reimplement
 string.length, string.charCodeAt, etc, if you don’t want to deal with
 surrogate pairs (I reckon you’ve got better things to be spending your time
 on).

 http://dheeb.files.wordpress.com/2011/07/gbu.pdf “Unicode Support
 Shootout - The Good, the Bad & the (mostly) Ugly”  by Tom Christiansen has
 a great summary of some of the issues with relying on JavaScript’s internal
 string manipulation (unfortunately can’t find a better working link at
 present – the official training.perl.com site seems to be down).
 Actually, that presentation is a fantastic place to start for understanding
 many of the limitations of various programming languages’ support for
 Unicode – if you haven’t read it, I’d urge you to go read it now.

 Marc

 *From:* Szelp, A. Sz. [mailto:a.sz.sz...@gmail.com]
 *Sent:* Wednesday, 25 April 2012 7:28 PM
 *To:* Marc Durdin
 *Cc:* David Starner; Unicode Mailing List
 *Subject:* Re: Support for non-BMP characters

 Shouldn't it be technically possible to store Supplementary Plane
 characters in UTF-16 / UCS-2 as well? Isn't that what Surrogate Pairs are
 for?

 Sz 




Re: Code2000 on SourceForge (was Re: [indic] Re: Lack of Complex script rendering support on Android)

2012-02-04 Thread Szelp, A. Sz.
James,

you might want to review (at least) the OFL:
http://en.wikipedia.org/wiki/SIL_Open_Font_License, a license specifically
created for fonts, created with freedoms in mind. In several respects it
fits fonts much better than GPLv3.

/Sz


On Fri, Feb 3, 2012 at 18:12, James Kass jamesk...@att.net wrote:

 I would rather stick with GPLv3, simply because a more permissive license
 threatens freedom. For example, someone may take over my fonts, develop
 them further, and subsequently change their license to something
 commercial-only. It is what I want to avoid. Just something like stories
 known from MACOS X, initially Berkeley-licensed-software derivative,
 finally commercialized product.


 James Kass



Re: Code2000 on SourceForge (was Re: [indic] Re: Lack of Complex script rendering support on Android)

2012-02-04 Thread Szelp, A. Sz.
Sorry, I was reading my mail threads according to time/date.  I see now
that the same has been proposed on the other thread. I also see you
preferring not to act due to private commitments and time constraints.

Sorry, again, for bringing this up unnecessarily.

All the best for your struggle, and keep it simple!

/Szabolcs

On Sat, Feb 4, 2012 at 10:49, Szelp, A. Sz. a.sz.sz...@gmail.com wrote:

 James,

 you might want to review (at least) the OFL:
 http://en.wikipedia.org/wiki/SIL_Open_Font_License, a license
 specifically created for fonts, created with freedoms in mind. In several
 respects it fits fonts much better than GPLv3.

 /Sz



 On Fri, Feb 3, 2012 at 18:12, James Kass jamesk...@att.net wrote:

 I would rather stick with GPLv3, simply because a more permissive license
 threatens freedom. For example, someone may take over my fonts, develop
 them further, and subsequently change their license to something
 commercial-only. It is what I want to avoid. Just something like stories
 known from MACOS X, initially Berkeley-licensed-software derivative,
 finally commercialized product.


 James Kass





Re: Sorting and Volapük

2012-01-02 Thread Szelp, A. Sz.
Indeed, I can confirm that behaviour for ö and ü. However,
Hungarian does not have ä which is part of Volapük. (And if it's
nevertheless there, e.g. in name-lists containing foreign names, or
Hungarian names of foreign (German) origin, ä is sorted as a).

So Hungarian is not a perfect fit as a substitute locale for Volapük either.
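
Where no suitable locale exists, a custom sort key is one workaround. The following Python sketch assumes the alphabet order a ä b c ... o ö p ... u ü v x y z, i.e. the umlauted vowels as separate letters right after their base vowels; the alphabet string, the names `VOLAPUK_ALPHABET` and `volapuk_key`, and the test strings are all assumptions made for this example, not taken from any collation standard.

```python
# Assumed alphabet order for this sketch.
VOLAPUK_ALPHABET = "aäbcdefghijklmnoöprstuüvxyz"
_RANK = {letter: index for index, letter in enumerate(VOLAPUK_ALPHABET)}

def volapuk_key(word):
    """Sort key: rank of each letter in the assumed alphabet.

    Characters outside the alphabet sort after all known letters,
    ordered among themselves by code point.
    """
    return [(_RANK.get(ch, len(_RANK)), ch) for ch in word.lower()]

# Arbitrary test strings, chosen only to exercise the ordering.
words = ["ob", "äl", "zif", "al", "öm", "ül", "ut"]
print(sorted(words, key=volapuk_key))
```

A plain code point sort would instead push every word starting with ä, ö or ü behind "zif".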

/Szabolcs


On Sun, Jan 1, 2012 at 19:48, Jean-François Colson j...@colson.eu wrote:
 Le 01/01/12 16:27, Michael Everson a écrit :

 IIRC Hungarian does that for ö and ü: they’re separate letters sorted after
 o and u respectively. But OTOH á, é, í, ó, ő, ú and ű are sorted as a, e, i,
 o, ö, u and ü respectively.






Re: Archaic Pashto letter

2011-12-13 Thread Szelp, A. Sz.

 - Is the present hamza convention a development of the two vertical dots
 proposal, or are they unrelated? About a year ago I worked with several
 Afghan expatriates living in Southern California, and in handwriting they
 would typically join two diacritical dots as a squiggle rather than a line
 (which is more common in Arabic). One could see how two vertical dots might
 develop into a vertical squiggle and later into a hamza, especially given
 the note by Vladimir Ivanov cited below. But this is only a conjecture at
 this point.


This sounds quite plausible; at any rate it seems more plausible than an
original hamza. In that case U+0682 would actually be a glyph variant of
U+0681.

Anyway, I'm quite interested in the outcome of your and others'
investigation into that matter.

Szabolcs