Re: Special Type Sorts Tray 2001

2001-09-30 Thread DougEwell2

In a message dated 2001-09-30 9:19:31 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

>  I have been thinking recently that it would be useful to have presentation
>  forms for a ct ligature character and various long s ligatures so that one
>  may transcribe printed works from the 18th century into unicode while
>  keeping the typographic style intact.

As mentioned, this can already be done with ZWJ, although fonts may not be 
able to render it correctly.  (But this is always true for any newly added 
glyph, no matter how encoded.)
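
As a minimal sketch of the ZWJ approach mentioned above, assuming Python 3 and
its standard unicodedata module: the helper name ligature_request is made up
for illustration, and whether a ligature actually appears still depends on the
font and rendering engine.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def ligature_request(*letters: str) -> str:
    """Join letters with ZWJ to ask the renderer for a ligated form.

    Hypothetical helper: it only builds the code point sequence;
    display is entirely up to the font and layout engine.
    """
    return ZWJ.join(letters)

ct = ligature_request("c", "t")
long_s_l = ligature_request("\u017f", "l")  # long s followed by l

for label, text in (("ct", ct), ("long s + l", long_s_l)):
    code_points = [f"U+{ord(ch):04X}" for ch in text]
    names = [unicodedata.name(ch) for ch in text]
    print(label, code_points, names)

# Stripping the joiner recovers the plain spelling, so searching and
# sorting can still work on the underlying letters.
plain = ct.replace(ZWJ, "")
```

Because the ligature is only requested and not encoded as a separate
character, text stored this way degrades to the ordinary letters in a font
that lacks the ligature.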

>  In view of these various situations and possibly various others that people
>  might like to post into this thread, I write to put forward the suggestion
>  that as a discussion on this list various users of the unicode
>  specification might like to agree informally a collection of characters
>  called Special Type Sorts Tray 2001 or STST2001 to be defined in the Private
>  Use Area in, say, the range U+E700 through to U+E7FF in the hope that
>  perhaps by there being some informal agreement perhaps someone with a font
>  generating package might like to add them into a font and maybe various
>  small yet significant benefits to the facilities available for encoding text
>  might be achieved.

You might want to take a look at the ConScript Unicode Registry, which was 
originally intended for "constructed" and artificial scripts, but which could 
also be used for this purpose.

>  Please know that I am specifically suggesting that this be a discussion
>  amongst the user community: I am not suggesting that the Unicode Consortium
>  endorse this suggestion as I am fully aware that the rules for the use of
>  the Private Use Area specifically say that no assignment to a particular set
>  of characters will ever be endorsed by the Unicode Consortium.

OK, then ConScript might be a suitable venue for this proposed encoding after 
all.

>  I declare an interest in the choice of U+E700 to U+E7FF as the range for
>  STST2001 in that I have been defining and publishing,

This range is already taken in ConScript, but several other ranges are 
available, and as David mentioned, you'll probably need a lot more than 256 
code points.
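
For a sense of scale, here is a small sketch (assuming Python 3 and the
standard unicodedata module; the constant and function names are illustrative
only) that checks the proposed range lies in the Private Use Area and counts
how many code points it offers.

```python
import unicodedata

BMP_PUA = range(0xE000, 0xF8FF + 1)    # Private Use Area in the BMP
PROPOSED = range(0xE700, 0xE7FF + 1)   # range suggested for STST2001

def in_private_use(cp: int) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"

proposed_ok = all(in_private_use(cp) for cp in PROPOSED)

print("proposed range is private use:", proposed_ok)
print("code points in proposed range:", len(PROPOSED))
print("code points in the BMP PUA:", len(BMP_PUA))
```

The proposed block covers only a small slice of the area, which is why a
second or third range may be needed if the collection grows.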

ConScript is the work of Michael Everson and John Cowan.  You should check 
with them.

http://www.evertype.com/standards/csur/index.html
http://www.evertype.com/standards/csur/conscript-table.html

-Doug Ewell
 Fullerton, California




Re: Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)

2001-09-30 Thread David Starner

On Sun, Sep 30, 2001 at 04:59:49PM +0100, William Overington wrote:
> In view of these various situations and possibly various others that people
> might like to post into this thread, I write to put forward the suggestion
> that as a discussion on this list various users of the unicode
> specification might like to agree informally a collection of characters
> called Special Type Sorts Tray 2001 or STST2001 to be defined in the Private
> Use Area in, say, the range U+E700 through to U+E7FF in the hope that

All those characters can be encoded in Unicode already. Use a ZWJ for
the ligated characters. And all those characters can be displayed on an
OpenType system - the H with line below and hyphen with diaeresis can be
displayed on my xterm with overprinted combining characters. The rest of
the world has a solution for this; a hacked solution may be usable on
some systems that can't get it right, but there's no need to standardize
it.
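
A minimal sketch of the combining-character spelling, assuming Python 3.
Which combining mark corresponds to the requested "line below" is an
assumption here: U+0331 COMBINING MACRON BELOW is used, though U+0332
COMBINING LOW LINE is another candidate. U+2010 HYPHEN is likewise chosen
for illustration.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW (assumed "line below")
DIAERESIS = "\u0308"     # COMBINING DIAERESIS
HYPHEN = "\u2010"        # HYPHEN (U+002D HYPHEN-MINUS would also work)

capital_h_line_below = "H" + MACRON_BELOW
small_h_line_below = "h" + MACRON_BELOW
hyphen_with_diaeresis = HYPHEN + DIAERESIS

for text in (capital_h_line_below, small_h_line_below, hyphen_with_diaeresis):
    # Each sequence is a base character followed by one combining mark.
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text])
```

Each sequence is two code points, a base followed by a mark with a nonzero
combining class, so no new character assignment is needed to store it.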

Did you notice that all the characters you mentioned are for Latin
scripts? Some other scripts, in normal use, can take more than 256
glyphs to be right - see the Arabic pre-shaped glyphs and the
precomposed Hangul characters for examples.

I bet I can fill that with Latin examples alone. Malay Grammar has a
ligated ng. Lakota has at least a couple dozen non-precomposed letters.
Lithuanian needs its couple dozen. Math books will arbitrarily compose
any letter with any symbol - I can get a couple dozen examples from
what I have on hand. The Fraktur ligations probably add up to a couple
dozen there. I don't think I'd have any problem coming up with 256
examples, all clearly documented as to source with scans, by the end of
the day. 

> Maybe someday some of the characters might be promoted to become regular
> unicode characters by the Unicode Consortium, maybe not.  

Not likely. Unicode refuses to encode more ligatures and precomposed
characters.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I saw a daemon stare into my face, and an angel touch my breast; each 
one softly calls my name . . . the daemon scares me less."
- "Disciple", Stuart Davis




RE: Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)

2001-09-30 Thread Carl W. Brown

William,

It looks like if you really want multilingual support that you need to run
your text through a layout engine.  If that is the case then you can remap
certain characters or character combinations into the U+FDD0 to U+FDEF
Unicode range and use this special non-character area for whatever purpose
the font and layout engine needs.
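
A rough sketch of that remapping idea, assuming Python 3. The table
INTERNAL_GLYPHS and the two function names are hypothetical; a real layout
engine would choose its own assignments, and the internal codes should never
leave the process.

```python
import unicodedata

NONCHARACTERS = range(0xFDD0, 0xFDEF + 1)

# Hypothetical internal table: interchange sequence -> internal code.
INTERNAL_GLYPHS = {
    "c\u200dt": chr(0xFDD0),            # c + ZWJ + t
    "\u017f\u200d\u017f": chr(0xFDD1),  # long s + ZWJ + long s
}

def to_internal(text: str) -> str:
    """Replace known sequences with internal noncharacter codes."""
    for sequence, code in INTERNAL_GLYPHS.items():
        text = text.replace(sequence, code)
    return text

def to_interchange(text: str) -> str:
    """Undo to_internal before the text is stored or transmitted."""
    for sequence, code in INTERNAL_GLYPHS.items():
        text = text.replace(code, sequence)
    return text

sample = "pic\u200dture"
internal = to_internal(sample)
restored = to_interchange(internal)
print(len(NONCHARACTERS), [f"U+{ord(ch):04X}" for ch in internal])
```

Keeping the mapping reversible means the stored text stays in ordinary
Unicode and only the engine's working copy uses the internal codes.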

If the private area becomes standardized then it is no longer a private
area.

Carl


Unicode Conf. game idea: Vowel Karuta

2001-09-30 Thread
You play karuta like normal, but on the cards are IPA vowels. So the guy says 
/i::/ or whatever and you have to pick up that vowel.


じゅういっちゃん (Juuitchan)
Well, I guess what you say is true,
I could never be the right kind of girl for you,
I could never be your woman
  - White Town


Re: Egyptian Transliteration Characters

2001-09-30 Thread William Overington

>
>The missing characters can be characterised as follows:
>
>LATIN CAPITAL LETTER H WITH LINE BELOW
>LATIN SMALL LETTER H WITH LINE BELOW
>

When I saw this I remembered that there is a letter H with a line across it
that is used in Maltese.  I remembered this from seeing it in a catalogue of
metal type which listed the accents needed for various European languages,
not from a linguistic perspective, so I do not know if that letter would be
appropriate for your needs.

My thoughts are that, as the use is for transliteration for study rather
than for transcription as a direct record it might perhaps be a suitable
choice for your use, even if only on a temporary basis, with the big
advantage that the letters are not only already coded in unicode as U+0126
for LATIN CAPITAL LETTER H WITH STROKE and U+0127 for LATIN SMALL LETTER H
WITH STROKE (the 0126 and 0127 being hexadecimal representations) but also
that both are often included in fonts that are available now.  If someone
happens to be using an older version of Word that has not got those
characters available in the font being used then later versions of several
fonts, including Arial and Times New Roman, that do contain the characters
are available free from the
http://www.microsoft.com/typography/fontpack/default.htm webpage.
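
For readers unsure about the hexadecimal notation, a short sketch (assuming
Python 3 and its standard unicodedata module) that looks up the two code
points named above:

```python
import unicodedata

H_STROKE_CAPITAL = 0x0126
H_STROKE_SMALL = 0x0127

for cp in (H_STROKE_CAPITAL, H_STROKE_SMALL):
    ch = chr(cp)
    # Show the U+ notation, the decimal value, and the character name.
    print(f"U+{cp:04X} (decimal {cp}): {unicodedata.name(ch)}")

# The two letters form a case pair, so ordinary case mapping applies.
is_case_pair = chr(H_STROKE_CAPITAL).lower() == chr(H_STROKE_SMALL)
```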

In the Microsoft Word program one simply uses Insert Symbol and then finds
the desired character in the display provided.  One can even set up
shortcuts so that some combination such as Alt + Shift + H gives the one
character and Alt + H gives the other character using text entry using an
ordinary English keyboard.

I do have a further suggestion regarding the use of the Private Use Area,
though as that has a wider context, I will start a new thread for that
suggestion.

William Overington

30 September 2001











Special Type Sorts Tray 2001 (derives from Egyptian Transliteration Characters)

2001-09-30 Thread William Overington

In a recent thread entitled Egyptian Transliteration Characters, a request
was made for various characters including the following.

LATIN CAPITAL LETTER H WITH LINE BELOW
LATIN SMALL LETTER H WITH LINE BELOW

There was also a suggestion from a participant in the thread for a character
HYPHEN WITH DIAERESIS for use in preparing a vocabulary list in German.

I have been thinking recently that it would be useful to have presentation
forms for a ct ligature character and various long s ligatures so that one
may transcribe printed works from the 18th century into unicode while
keeping the typographic style intact.

There is already U+017F LATIN SMALL LETTER LONG S and U+FB05 LATIN SMALL
LIGATURE LONG S T in regular unicode.
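
A brief sketch, assuming Python 3 and the standard unicodedata module, of how
the two existing characters can be inspected. It also shows that the ligature
is a compatibility character, which compatibility normalization folds back to
plain letters.

```python
import unicodedata

LONG_S = "\u017f"
LONG_S_T = "\ufb05"

for ch in (LONG_S, LONG_S_T):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility normalization replaces the ligature (and the long s
# itself) with ordinary letters, losing the typographic distinction.
folded = unicodedata.normalize("NFKC", LONG_S_T)
print(repr(folded))
```

Any transcription that relies on the precomposed ligature therefore has to
avoid compatibility normalization if the 18th-century styling is to survive.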

I am thinking of such characters as LATIN SMALL LIGATURE LONG S LONG S and
LATIN SMALL LIGATURE LONG S L and LATIN SMALL LIGATURE LONG S B and so on.
There are perhaps about a dozen long s ligatures that could usefully be
encoded.

In view of these various situations and possibly various others that people
might like to post into this thread, I write to put forward the suggestion
that as a discussion on this list various users of the unicode
specification might like to agree informally a collection of characters
called Special Type Sorts Tray 2001 or STST2001 to be defined in the Private
Use Area in, say, the range U+E700 through to U+E7FF in the hope that
perhaps by there being some informal agreement perhaps someone with a font
generating package might like to add them into a font and maybe various
small yet significant benefits to the facilities available for encoding text
might be achieved.

Maybe someday some of the characters might be promoted to become regular
unicode characters by the Unicode Consortium, maybe not.  I feel that it is
better to have available soon rather than not to have available some
informal list with some level of agreement amongst users, even if only tacit
agreement, so that it is possible to use unicode to encode the various
characters for the various purposes.

Please know that I am specifically suggesting that this be a discussion
amongst the user community: I am not suggesting that the Unicode Consortium
endorse this suggestion as I am fully aware that the rules for the use of
the Private Use Area specifically say that no assignment to a particular set
of characters will ever be endorsed by the Unicode Consortium.  So, whilst
recognizing that that statement in the specification may not preclude the
Unicode Consortium from saying that some particular usage of the Private Use
Area is wrong in some way, the absence of any encouragement from the Unicode
Consortium over the definition of Special Type Sorts Tray 2001 should not be
seen as in any way an objection to it being defined.

I declare an interest in the choice of U+E700 to U+E7FF as the range for
STST2001 in that I have been defining and publishing, as part of my
research, designations for a number of characters in the Private Use Area
for a specific application area, namely for use in Java programming for the
DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system and
this particular range does not conflict with the codes that I am using in
that project, so the choice of U+E700 to U+E7FF as the range would be
particularly convenient to me.  If anyone is interested to see those
definitions then they are in the DVB-MHP section of
http://www.users.globalnet.co.uk/~ngo which is our family webspace in
England.  There are references in various of the documents, namely the
Contemporary introduction, the document about Sequential text files and
their applications and in the second and third documents about the Astrolabe
Channel numerical pointer.

It is hard to even guess how many characters there are that people might
like to suggest for STST2001 and maybe there will be only a few and sorts
can be added gradually over a number of years, or maybe the tray will be
filled up quickly and starting another tray will need to be considered.
Hopefully STST2001 will be a useful facility and then when someone chooses
to put forward a suggestion for a character to be available then sometimes
adding it to STST2001 will be a suitable solution.  A suitable procedure
might be that someone suggesting a character allows eight days for
discussion; if the suggestion does not conflict with an existing definition
and no good reason has been put forward as to why it should not be included,
the suggestion then becomes part of STST2001.  A good reason might be that,
unknown to the person making the suggestion, the character sort is already
defined in regular unicode.
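
The acceptance rule described above could be sketched roughly as follows,
assuming Python 3. The class names, field names, and the sample character
name are all hypothetical; this is only one possible reading of the informal
procedure, not an agreed mechanism.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

DISCUSSION_PERIOD = timedelta(days=8)
TRAY_RANGE = range(0xE700, 0xE7FF + 1)  # proposed STST2001 range

@dataclass
class Proposal:
    code_point: int
    name: str
    proposed_on: date
    objections: list = field(default_factory=list)

class Tray:
    """Hypothetical registry of accepted sorts, keyed by code point."""

    def __init__(self) -> None:
        self.assigned = {}

    def try_accept(self, proposal: Proposal, today: date) -> bool:
        if proposal.code_point not in TRAY_RANGE:
            return False  # outside the tray
        if proposal.code_point in self.assigned:
            return False  # conflicts with an existing definition
        if proposal.objections:
            return False  # a good reason against has been raised
        if today - proposal.proposed_on < DISCUSSION_PERIOD:
            return False  # discussion period not yet over
        self.assigned[proposal.code_point] = proposal.name
        return True

tray = Tray()
ct = Proposal(0xE700, "LATIN SMALL LIGATURE CT", date(2001, 9, 30))
too_early = tray.try_accept(ct, date(2001, 10, 7))
accepted = tray.try_accept(ct, date(2001, 10, 8))
```

An objection such as "already available in regular unicode" would simply be
appended to the proposal's objections list before the eight days run out.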

I feel that a special type sorts tray within the Private Use Area agreed
informally by people within the user community would be a very useful
facility.

William Overington

30 September 2001










Re: Missing Arabic and Syriac characters in Unicode

2001-09-30 Thread Majid Bhurgri







>From: Philipp Reichmuth <[EMAIL PROTECTED]>
>Reply-To: Philipp Reichmuth <[EMAIL PROTECTED]>
>To: Roozbeh Pournader <[EMAIL PROTECTED]>
>CC: Miikka-Markus Alhonen <[EMAIL PROTECTED]>, Unicode List <[EMAIL PROTECTED]>
>Subject: Re: Missing Arabic and Syriac characters in Unicode
>Date: Sun, 30 Sep 2001 12:54:17 +0200
>
>Hi folks!
>
>RP> At least not in the Korans I've seen. In those, Turned Damma is clearly
>RP> used to mark an /u:/ sound when a Waw is not there (and only that). It is
>RP> not an ornament in any way. I'm talking about Iranian Korans.
>
>It's clearly a character then. It definitely makes sense in an Iranian
>context from the viewpoint of the Persian use of Arabic script.
>
>BTW does it represent /u:/ or /u/? In the Qur'an, /u:/ would probably be
>represented by Waw and would be read by a Persian as /u/, wouldn't it?
>While damma would be read as /o/. Could you point me to a location in
>an Iranian Qur'an where there is one of these?
>
>Greetings
> Philipp mailto:[EMAIL PROTECTED]



Inverted Damma, vertical Kasrah, etc. are accentuated forms of Damma and Kasrah, and these variants (alternate forms) are used extensively in Qurans published in India and Pakistan. In Qurans published in the Middle East, the same sounds are usually represented by Damma followed by a small waw and Kasrah followed by a small yay. If anyone wishes, I could try scanning examples from published copies of the Quran and sending them.

Since these are variants, one could define them as ligatures in fonts (as I have done), so that when one wishes them to appear in a work published for readers in a particular region, all one has to do is change the font. Since these are two different ways of representing the same vowel sounds, separate Unicode positions may not be advisable. In any case, these are not ornamental or decorative marks; rather, they are different ways of representing existing Unicode characters.

 

Regards

 

Abdul-Majid Bhurgri



Re: Turned Damma [was Re: Missing Arabic and Syriac characters in Unicode]

2001-09-30 Thread Roozbeh Pournader

On Sun, 30 Sep 2001, Philipp Reichmuth wrote:

> BTW does it represent /u:/ or /u/? In the Qur'an, /u:/ would probably be
> represented by Waw and would be read by a Persian as /u/, wouldn't it?
> While damma would be read as /o/. Could you point me to a location in
> an Iranian Qur'an where there is one of these?

Well, beginners read the vowels as they would in Persian, but you are
taught the real pronunciation in Iranian high schools. So in short,
Turned Damma is pronounced like a Waw: if you are a beginner, you
pronounce both as /u/; if you are an expert, you pronounce both as /u:/.

I don't have an old Koran handy (the mark is only used in Korans
published before 1980). I will look and tell you when I get home.

roozbeh





Re: Missing Arabic and Syriac characters in Unicode

2001-09-30 Thread Philipp Reichmuth


Hi folks!

RP> At least not in the Korans I've seen. In those, Turned Damma is clearly
RP> used to mark an /u:/ sound when a Waw is not there (and only that). It is
RP> not an ornament in any way. I'm talking about Iranian Korans.

It's clearly a character then. It definitely makes sense in an Iranian
context from the viewpoint of the Persian use of Arabic script.

BTW does it represent /u:/ or /u/? In the Qur'an, /u:/ would probably be
represented by Waw and would be read by a Persian as /u/, wouldn't it?
While damma would be read as /o/. Could you point me to a location in
an Iranian Qur'an where there is one of these?

Greetings
 Philipp  mailto:[EMAIL PROTECTED]
__
Errors have occurred / We won't tell you where or why / Lazy programmers





Re: Missing Arabic and Syriac characters in Unicode

2001-09-30 Thread Roozbeh Pournader

On Sun, 30 Sep 2001, Philipp Reichmuth wrote:

> >> This includes 'Subscript Alef' and 'Turned Damma' (Ulta Pesh), used in
> >> Iran and Pakistan;
>
> MMA> I think these are also used in Arab countries, because even my Arabic teacher
> MMA> who's from Syria referred to this "ulta pesh" as a "Koranic sign".
>
> Hm, as far as I understand it, it is mainly used as a calligraphic
> sign in Arabic-speaking countries and carries no phonetic or
> recitational information of its own. I've checked through my own
> copies of the Qur'an briefly, but as far as I can see, it's used only
> in calligraphic script as an ornamental sign. Since Qur'anic verses
> tend to be rather ornately decorated, the association with the Qur'an
> appears quite straightforward. On the other hand, all of my copies are
> printed (says something already) either in Egypt or Sudan, so they
> need not be representative.

At least not in the Korans I've seen. In those, Turned Damma is clearly
used to mark an /u:/ sound when a Waw is not there (and only that). It is
not an ornament in any way. I'm talking about Iranian Korans.

roozbeh