Re: Copyleft Symbol

2016-02-15 Thread Christopher Fynn
On 15/02/2016, David Faulks  wrote:
> (There was actually a discussion about encoding it back in 2000 on this
> mailing list.)

Presumably that indicates at least 15 years of usage - far longer than
most emoji.

- Chris


Re: "Unicode of Death"

2015-09-04 Thread Christopher Fynn
Perhaps there should be a "tongue in cheek" emoji to indicate this.


On 30 May 2015 at 04:50, Andrew Cunningham  wrote:

> Geez Philippe,
>
> It was tongue in cheek.
>
> A.
>
>
> On Saturday, 30 May 2015, Philippe Verdy  wrote:
> >
> > 2015-05-28 23:36 GMT+02:00 Andrew Cunningham :
> >>
> >> Not the first time Unicode crashes things. There was the Google Chrome
> bug on OS X that crashed the tab for any Syriac text.
> >
> > "Unicode crashes things"? Unicode has nothing to do in those crashes
> caused by bugs in applications that make incorrect assumptions (in fact not
> even related to characters themselves but to the supposed behavior of the
> layout engine. Programmers and designers for example VERY frequently forget
> the constraints for RTL languages and make incorrect assumptions about left
> and right sides when sizing objects, or they don't expect that the cursor
> will advance backward and forget that some measurements can be negative: if
> they use this negative value to compute the size of a bitmap redering
> surface, they'll get out of memory, unchecked null pointers returned, then
> they will crash assuming the buffer was effectively allocated.
> > These are the same kind of bugs as with the too common buffer overruns
> with unchecked assumtions: the code is kept because "it works as is" in
> their limited immediate tests.
> > Producing full coverage tests is a difficult and lengthy task that
> programmers do not always have the time to do, when they are urged to
> produce a workable solution for some clients and then given no time to
> improve the code before the same code is distributed to a wider range of
> clients.
> > Commercial staff do that frequently; they don't even read the technical
> limitations even when they are documented by programmers... in addition,
> commercial staff like selling software that will cause customers to ask
> for support... that will be billed! After that, programmers are
> overwhelmed by bug reports and support requests, and have even less time
> to design the other things that they are working on and still have to
> produce. QA tools may help programmers in this case by providing
> statistics about the effective costs of producing new software with better
> quality, and the cost of supporting it when it contains too many bugs:
> commercial teams like those statistics because they can convert them to
> costs, commercial margins, and billing rates. (When such QA tools are not
> used, programmers will rapidly leave the place; they are fed up with the
> growing pressure to do always more in the same time, with also a growing
> number of "urgent" support requests.)
> > Those who say "Unicode crashes things" do the same thing: they make
> broad unchecked assumptions about how things are really made or how things
> are actually working.
> >
>
> --
> Andrew Cunningham
> Project Manager, Research and Development
> (Social and Digital Inclusion)
> Public Libraries and Community Engagement
> State Library of Victoria
> 328 Swanston Street
> Melbourne VIC 3000
> Australia
>
> Ph: +61-3-8664-7430
> Mobile: 0459 806 589
> Email: acunning...@slv.vic.gov.au
>   lang.supp...@gmail.com
>
> http://www.openroad.net.au/
> http://www.mylanguage.gov.au/
> http://www.slv.vic.gov.au/
>
>


Re: "Unicode of Death"

2015-09-04 Thread Christopher Fynn
On 28 May 2015 at 20:23, Doug Ewell  wrote:


> "Every character you use has a unicode value which tells your phone what
> to display. One of the unicode values is actually never-ending and so
> when the phone tries to read it it goes into an infinite loop which
> crashes it."
>
> I've read TUS Chapter 4 and UTR #23 and I still can't find the
> "never-ending" Unicode property.
>
> Perhaps astonishingly to some, the string displays fine on all my
> Windows devices. Not all apps get the directionality right, but no
> crashes.
>

Well, isn't Apple's street address "Infinite Loop"?


Re: Emoji characters for food allergens

2015-08-17 Thread Christopher Fynn
Surely there is already some international standards body or panel which
deals with food safety and labelling? (maybe ISO 22000 Food Safety
Management Systems)

If there is a real need for characters to represent food allergens,
wouldn't such a body be the right group to come up with appropriate glyphs
and then make a proposal to ISO 10646 / Unicode?

- Chris

On 25 July 2015 at 22:13, William_J_G Overington 
wrote:

> Emoji characters for food allergens
>
> An interesting document entitled
>
> Preliminary proposal to add emoji characters for food allergens
>
> by Hiroyuki Komatsu
>
> was added into the UTC (Unicode Technical Committee) Document Register
> yesterday.
>
> http://www.unicode.org/L2/L2015/15197-emoji-food-allergens.pdf
>
> This is a welcome development.
>
> I suggest that, in view of the importance of precision in conveying
> information about food allergens, the emoji characters for food
> allergens should be separate characters from other emoji characters. That
> is, encoded in a separate quite distinct block of code points far away in
> the character map from other emoji characters, with no dual meanings for
> any of the characters: a character for a food allergen should be quite
> separate and distinct from a character for any other meaning.
>
> I opine that having two separate meanings for the same character, one
> meaning as an everyday jolly good fun meaning in a text message and one
> meaning as a specialist food allergen meaning could be a source of
> confusion. Far better to encode a separate code block with separate
> characters right from the start than risk needless and perhaps medically
> dangerous confusion in the future.
>
> I suggest that for each allergen there be two characters.
>
> The glyph for the first character of the pair goes from baseline to
> ascender.
>
> The glyph for the second character of the pair is a copy of the glyph for
> the first character of the pair augmented with a thick red line from lower
> left descender to higher right a little above the base line, the thick red
> line perhaps being at about thirty degrees from the horizontal. Thus the
> thick red line would go over the allergen part of the glyph yet just by
> clipping it a bit so that clarity is maintained.
>
> The glyphs are thus for the presence of the allergen and the absence of
> the allergen respectively.
>
> It is typical in the United Kingdom to label food packets not only with an
> ingredients list but also with a list of allergens in the food and also
> with a list of allergens not in the food.
>
> For example, a particular food may contain soya yet not gluten.
>
> Thus I opine that two characters are needed for each allergen.
>
> I have deliberately avoided a total strike through at forty-five degrees
> as I opine that that could lead to problems distinguishing clearly the
> glyph for the absence of one allergen from the glyph for the absence of
> another allergen.
>
> I have also wondered whether each glyph for an allergen should include
> within its glyph a number, maybe a three-digit number, so that clarity is
> precise.
>
> I opine that two separate characters for each allergen are desirable rather
> than some solution such as having one character for each allergen and a
> combining strike through character.
>
> The two separate characters approach keeps the system straightforward to
> use with many software packages. The matter of expressing food allergens is
> far too important to become entangled in problems for everyday users.
>
> For gluten, it might be necessary to have three distinct code points.
>
> In the United Kingdom there is a legal difference between "gluten-free"
> and "no gluten-containing ingredients".
>
> To be labelled gluten-free the product must have been tested. This is to
> ensure that there has been no cross-contamination of ingredients. For
> example, rice has no gluten, but was a particular load of rice transported
> in a lorry used for wheat on other days?
>
> Yet testing is not always possible in a restaurant situation.
>
> William Overington
>
> 25 July 2015
>
>


Re: About cultural/languages communities flags

2015-02-10 Thread Christopher Fynn
One area where this would be useful is for indicating national teams
in football (soccer), rugby and other sports where England, Scotland,
Wales and N. Ireland play separately internationally.

On 10 February 2015 at 12:10, Mark Davis ☕️  wrote:
>
> On Tue, Feb 10, 2015 at 12:11 AM, Ken Whistler  wrote:
>>
>> for the full context, and for the current 26x26 letter matrix which is
>> the basis for the flag glyph implementations of regional indicator
>> code pairs on smartphones.
>>
>> SC, SO, ST are already taken, but might I suggest putting in for
>> registering
>> "AB" for Alba? That one is currently unassigned.
>>
>> Yeah, yeah, what is the likelihood of BSI pushing for a Scots two-letter
>> code?! But seriously, if folks are planning ahead for Scots independence
>> or even some kind of greater autonomy, this is an issue that needs to
>> be worked, anyway.
>>
>> In the meantime, let me reiterate that there is *no* formal relationship
>> between TLD's and the regional indicator codes in Unicode (or the
>> implementations
>> built upon them). Well, yes, a bunch of registered TLD's do match the
>> country
>> codes, but there is no two-letter constraint on TLD's. This should already
>> be apparent, as Scotland has registered ".scot" At this point there isn't
>> even
>> a limitation of TLD's to ASCII letters, so there is no way to map them
>> to the limited set of regional indicator codes in the Unicode Standard.
>>
>> Not having a two letter country code for Scotland that matches the
>> four letter TLD for Scotland might indeed be a problem for someone,
>> but I don't see *this* as a problem that the Unicode Standard needs
>> to solve.
>
>
> I want to add to that that there are already a fair number of ISO 2-letter
> codes for regions that are administered as part of another country, like
> Hong Kong. There are also codes for crown possessions like Guernsey. So
> having a code for Scotland (and Wales, and N. Ireland) does not really break
> precedent. But as Ken says, the best mechanism is for the UK to push for a
> code in ISO and the UN.
>
> Mark
>
> — Il meglio è l’inimico del bene —
>



Re: About cultural/languages communities flags

2015-02-09 Thread Christopher Fynn
Using flags to indicate particular languages on websites has plenty of
problems - languages need a better indicator.

Scripts could be indicated by a representative glyph.


Re: N'Ko - which character? 02BC vs. 2019

2015-02-01 Thread Christopher Fynn
If used as characters that are part of a word, especially when they
occur at the beginning or end of a word, ASCII apostrophes and
both right and left quotation marks easily get changed to something
else by the auto-quote features of word processors.


Re: subscribe

2014-11-24 Thread Christopher Fynn
Ben

You can subscribe to the Unicode mailing list online at:
http://unicode.org/mailman/listinfo/unicode

(Not by sending a SUBSCRIBE message to the list)


On 20/11/2014, Bev Corwin  wrote:
> subscribe


Re: Request for Information

2014-07-24 Thread Christopher Fynn
On 24/07/2014, Philippe Verdy  wrote:

> It would be useful to have a sample text of the language, useful to show
> exemplar characters but a bit more, showing the typical layout of words.

https://en.wikipedia.org/wiki/List_of_pangrams#Other_languages


Re: How to set up a Unicode 7-aware OpenType features lookup table for your fonts

2014-07-09 Thread Christopher Fynn
Robert

I suggest you subscribe to the OpenType mailing list  and ask your
questions there.

subscribe: opentype-subscr...@indx.co.uk

good luck with this

- Chris


On 10/07/2014, Robert Wheelock  wrote:
> Hello!
>
> I’ve been editing fonts with FontLab Studio for some time now, but HAVE NOT
> YET really delved into editing OpenType features lookup tables...
>
> (1)  How’d you start?  May I use FontLab Studio to do my OTF tables, or
> MUST I use VOLT?
>
> (2)  In a Unicode 7-capable OT font, what letter/accent and ligature
> combinations should I include?
>
> (3)  How should I use the 20 Stylistic Sets (ss01 through ss20) to my BEST
> advantage?
>
> (4)  What online resources are available to help me create suitable OTF
> lookup tables for my fonts?
>
> Your cooperation would be greatly appreciated.  Thank You!
>
>
> Robert Lloyd Wheelock
> INTERNATIONAL SYMBOLISM RESEARCH INSTITUTE
> Augusta, ME  U.S.A.
>



Re: Thai unalom symbol

2014-07-02 Thread Christopher Fynn
On 02/07/2014, James Clark  wrote:

> The Royal Institute Thai Dictionary (the authoritative dictionary for the
> Thai language) has an entry for unalom showing the symbol:

>   https://pbs.twimg.com/media/BrdB2IsCYAAu4gP.jpg:large

Are there other dictionaries and books which use this symbol in text?

With three or four more examples like this I should think it would
certainly be a good candidate for encoding.

(use in logos is not so persuasive)


Re: Characters that should be displayed?

2014-06-30 Thread Christopher Fynn
On 30/06/2014, David Starner  wrote:

> On Sun, Jun 29, 2014 at 2:02 PM, Jukka K. Korpela 
> wrote:
>> They might be seen as “not displayable by normal rendering”, so yes. On
>> the
>> practical side, although Private Use characters should not be used in
>> public
>> information interchange, they are increasingly popular in “icon font”
>> tricks.

> Since when is HTML necessarily public information interchange? I can't
> imagine where you would better use private use characters than in HTML
> where a font can be named but you don't have enough control over the
> format to enter the data in some other format.

+1

If the font specified in the CSS has glyphs for those characters, they
should be displayed.
There are also some Chinese national standards (do they count as a
"private" agreement?) that make use of PUA and supplementary
PUA characters - and quite a few web pages using them.

- C



Re: Unicode ranges with baseline/x-height/X-height

2014-05-15 Thread Christopher Fynn
Indic scripts generally have a hanging base.


Re: The application of localized read-out labels

2014-04-16 Thread Christopher Fynn
On 16/04/2014, William_J_G Overington  wrote:
>
>> William, the UTC is not in the business of creating file formats for
>> localization data.
>
>> Peter
>
> Thank you for replying.
>
> Feeling that a format for the particular application is important I have now
> produced a format myself and published it.

Whether or not it is important, it is clearly beyond the defined scope
of Unicode and so off-topic here.


Re: Updated emoji working draft

2014-04-16 Thread Christopher Fynn
On 15/04/2014, Peter Constable  wrote:
> William, the UTC is not in the business of creating file formats for
> localization data.
>
> Peter

Yes, a proper understanding of what is within the scope of Unicode - and
what is not - might help.


Re: Singhala script ill defined by OpenType standard

2014-04-04 Thread Christopher Fynn
If you think there is a problem with OpenType and Singhala, the place
to bring that up is on the OpenType list - not the Unicode list.

There are often several different ways of accomplishing things with
OpenType - and how you do them also depends on how you design the
font.

"Creating and Supporting OpenType Fonts for Sinhala Script" is
not part of the OpenType specification; it is just a guideline for
creating OpenType Singhala fonts based on how Microsoft made their
own Singhala font - but some other font maker could do things a
little differently, as long as the text renders correctly with the
font.

If you have problems with the document itself and the use of terms,
you should take that up with Microsoft Typography and give them
suggestions on how to fix it. It was probably written by someone who
makes the OpenType tables for fonts for many different scripts with no
particular knowledge of the Singhala language or of Sanskrit.



On 04/04/2014, Naena Guru  wrote:
> Here is the proof that OpenType standard defined the Singhala script
> wrongly. Also find a BNF grammar that describes it.
> http://ahangama.com/unicode/index.htm
>
> Thanks.
>


Re: FYI: More emoji from Chrome

2014-04-02 Thread Christopher Fynn
On 02/04/2014, Asmus Freytag  wrote:
> On 4/2/2014 1:42 AM, Christopher Fynn wrote:
>> Rather than Emoji it might be better if people learnt Han ideographs
>> which are also compact (and  a far more developed system of
>> communication than emoji). One  CJK character can also easily replace
>> dozens of Latin characters - which is what is being claimed for emoji.
>
> One wonders why the Japanese, who already know Han ideographs, took to
> emoji as they did

Perhaps because emoji are a sort of playful version of a means of
communication they are already used to.


Re: FYI: More emoji from Chrome

2014-04-02 Thread Christopher Fynn
On 02/04/2014, "Martin J. Dürst"  wrote:
> Now that it's no longer April 1st (at least not here in Japan), I can
> add a (moderately) serious comment.

Long past April 1 here too - I'd already forgotten. ;-)

>>> More emoji from Chrome:
>>>
>>> http://chrome.blogspot.ch/2014/04/a-faster-mobiler-web-with-emoji.html
>>>
>>> with video: https://www.youtube.com/watch?v=G3NXNnoGr3Y
>>
>> I do not know… The demos leave me completely unimpressed: emoji — by
>> their nature — require higher resolution than text, so an emoji for
>> “pie” does not save any place comparing to the word itself.  So the
>> impact of this on everyday English-languare communication would not be
>> in any way beneficial.
>
> This is somewhat different for Japanese (and languages with similar
> writing systems) because they have higher line height.
>
> Regards,   Martin.

So CJK glyphs take up similar space to that needed to display an emoji
character. Presumably the individual Han ideographs for "pie",
"dumpling" or "turd" would save as much screen space as using the
corresponding emoji pictographs. Once there were enough emoji to
carry on a conversation above the level of a 4-year-old, they would
also require an IME as complex as that needed for entering CJK text.



Re: FYI: More emoji from Chrome

2014-04-02 Thread Christopher Fynn
Rather than emoji, it might be better if people learnt Han ideographs,
which are also compact (and a far more developed system of
communication than emoji). One CJK character can also easily replace
dozens of Latin characters - which is what is being claimed for emoji.

On 02/04/2014, "Martin J. Dürst"  wrote:
> Now that it's no longer April 1st (at least not here in Japan), I can
> add a (moderately) serious comment.
>
> On 2014/04/02 01:43, Ilya Zakharevich wrote:
>> On Tue, Apr 01, 2014 at 09:01:39AM +0200, Mark Davis ☕️ wrote:
>>> More emoji from Chrome:
>>>
>>> http://chrome.blogspot.ch/2014/04/a-faster-mobiler-web-with-emoji.html
>>>
>>> with video: https://www.youtube.com/watch?v=G3NXNnoGr3Y
>>
>> I do not know… The demos leave me completely unimpressed: emoji — by
>> their nature — require higher resolution than text, so an emoji for
>> “pie” does not save any place comparing to the word itself.  So the
>> impact of this on everyday English-languare communication would not be
>> in any way beneficial.
>
> This is somewhat different for Japanese (and languages with similar
> writing systems) because they have higher line height.
>
> Regards,   Martin.
>



Re: Emoji

2014-04-01 Thread Christopher Fynn
On 02/04/2014, William_J_G Overington  wrote:
> For me, an important aspect of emoji is that they are independent of
> language.

Emoji seem fairly culturally specific. (Maybe the mobile-phone
messaging culture.) They are a kind of shorthand expression which may be
used with several languages - but not independent of language. I suspect
some of them already convey one thing to a Japanese teenager and quite
another to an American. And if you showed these symbols to many people
in other countries, they wouldn't have a clue as to what they are
supposed to mean.


Re: Emoji

2014-04-01 Thread Christopher Fynn
On 02/04/2014, Nicole Selken  wrote:

> I think  Emoji is totally beneficial as a communication form.

A reversion to a crude form of Hieroglyphics?


Re: Pali in Thai Script

2014-03-28 Thread Christopher Fynn
Here the case is a little different as there is no particular script
associated with Pāḷi. People in different Buddhist countries just use
their own script for writing Pāḷi.

A conversion utility, or simple way of letting users choose the script
in which Pāḷi is displayed, would be useful so that there would be no
need to type the same texts in each script.

Sanskrit is strongly associated with the Devanāgarī  script - but it
is sometimes written in nearly all of the widely used scripts of India
and some others such as Tibetan and Latin.

- C



Re: Pali in Thai Script

2014-03-28 Thread Christopher Fynn
On 28/03/2014, Theppitak Karoonboonyanan  wrote:

> The full character chart, demonstrated by a font created by a Thai
> scholar (Facebook login is needed, sorry):
>
> http://www.facebook.com/photo.php?fbid=10201049297248857

Even after  logging into Facebook I only get the message:
 "This content is currently unavailable"
"The page you requested cannot be displayed at the moment. It may be
temporarily unavailable, the link you clicked on may have expired, or
you may not have permission to view this page."

- C


Re: Pali in Thai Script

2014-03-27 Thread Christopher Fynn
On 28/03/2014, Ed Trager  wrote:

> Hi, Chris,

> Besides the scripts you mention, there is also Tai Tham as Richard
> mentioned

Several un-encoded Mon and Shan scripts too - as well as other Indic scripts.
>
> In theory, writing a utility to convert Pali written in any of those
> scripts to any one of the other scripts should not be too difficult but ...

> * Modern phonetically-based Lao lacks some of the traditional letters that
> are still preserved in Thai and other scripts.

Are there old Lao characters (once) used for writing Pāḷi?

Even if there is not a 1-to-1 correspondence - as long as there is
consistency in the way Pāḷi is written in each script, and you know
you are dealing with Pāḷi and not another language written in that
script, it should be possible.

> * At least as far as Tai Tham goes, it seems that Tai Tham spelling is not
> consistent with Central Thai spelling when it comes to Sanskrit and
> Pali-derived words ... I don't really know much about this -- just my own
> limited observations. Probably somebody else here like Richard Wordingham
> or Martin Hosken knows a lot more about this than I do ...

A problem might be if scribal errors have crept in over the centuries
and some of these misspellings have become accepted in one script or
another.

I think there is work going on to make a very carefully edited
critical edition of the Pāḷi Canon - it
would be useful to be able to convert and print this out in the
scripts used in the different countries where Theravāda Buddhism is
popular.

> ... so maybe in reality it is not so simple to do?
>
> - Ed
>
>
> On Thu, Mar 27, 2014 at 3:50 PM, Christopher Fynn
> wrote:
>
>> On 27/03/2014, Richard BUDELBERGER 
>> wrote:
>>
>> > And now, Pali. Not Thai in Pali script, but Pali in Thai script…
>>
>> There is no standard script for Pāḷi - It is often written in
>> Devanagri, Sinhala, Myanmar, Thai, Lao, Khmer, Latin, and several
>> other scripts.
>>
>> I do think there is quite a need for a utility to convert Pāḷi written
>> in any one of these scripts to any of the others,
>>
>> - Chris



Re: Pali in Thai Script

2014-03-27 Thread Christopher Fynn
On 27/03/2014, Richard BUDELBERGER  wrote:

> And now, Pali. Not Thai in Pali script, but Pali in Thai script…

There is no standard script for Pāḷi - it is often written in
Devanagri, Sinhala, Myanmar, Thai, Lao, Khmer, Latin, and several
other scripts.

I do think there is quite a need for a utility to convert Pāḷi written
in any one of these scripts to any of the others.

- Chris



Re: Details, please (was: Re: Romanized Singhala got great reception in Sri Lanka)

2014-03-22 Thread Christopher Fynn
On 18/03/2014, Doug Ewell  wrote:
> I think what some of us would like to see are detailed examples, citing
> specific characters and combinations, rather than general rhetoric, to
> support claims like this:

Yes


Re: Dead and Compose keys (was: Re: Romanized Singhala got great reception in Sri Lanka)

2014-03-18 Thread Christopher Fynn
On 18/03/2014, Naena Guru  wrote:

> Okay, Doug.

> Type this inside the yellow text box in the following page:
> kaaryyaalavala yanþra pañkþi
> http://www.lovatasinhala.com/puvaruva.php

> Please tell me what sequence of Unicode Sinhala codes would produce what
> the text box shows.

Naena Guru

If you want, you should just be able to type it in as you wrote it,
"kaaryyaalavala yanþra pañkþi",
and get Singhala Unicode characters. But to do this you need
something more than a re-mapped keyboard layout made with MSKLC.

So long as the Roman  transliteration system you are using for
Singhala and Pali follows consistent rules, it is possible to write an
input method that parses the Romanized text and
converts it into  Singhala Unicode.

If you care about your language and script, that is the proper way to
do this sort of thing - not by using OpenType lookups to map strings
of Latin characters to Singhala glyphs.
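
As a rough sketch of that parsing step (the mapping table below is a
tiny, made-up fragment purely for illustration, not your actual scheme,
and it ignores inherent vowels and vowel signs), a longest-match
converter in Python might look like this:

    # Longest-match transliteration of romanized input into Sinhala Unicode.
    # The table is a made-up fragment for illustration only; a real input
    # method would carry the full mapping for the chosen scheme and handle
    # inherent vowels, vowel signs, etc.
    ROMAN_TO_SINHALA = {
        "k":  "\u0D9A",  # SINHALA LETTER ALPAPRAANA KAYANNA
        "kh": "\u0D9B",  # SINHALA LETTER MAHAAPRAANA KAYANNA
        "g":  "\u0D9C",  # SINHALA LETTER ALPAPRAANA GAYANNA
        "a":  "\u0D85",  # SINHALA LETTER AYANNA
    }

    def transliterate(text):
        out = []
        i = 0
        longest = max(len(k) for k in ROMAN_TO_SINHALA)
        while i < len(text):
            # Try the longest possible key first, so "kh" wins over "k" + "h".
            for size in range(longest, 0, -1):
                chunk = text[i:i + size]
                if chunk in ROMAN_TO_SINHALA:
                    out.append(ROMAN_TO_SINHALA[chunk])
                    i += size
                    break
            else:
                out.append(text[i])  # pass anything unmapped through unchanged
                i += 1
        return "".join(out)

    print(transliterate("kha ga"))

The same parser, driven by a full mapping table, is what a stand-alone
input method or a Keyman-style layout would run on each keystroke or
word boundary.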

Chris Fynn
Thimphu, Bhutan



Re: Dead and Compose keys (was: Re: Romanized Singhala got great reception in Sri Lanka)

2014-03-18 Thread Christopher Fynn
Hi Andrew

It may be possible with Keyman. I once even wrote a set of MS Word
macros that did the same thing (let users type in Romanized Tibetan
and output Tibetan characters) - however it stopped working when
Microsoft switched from Word Basic to VBA.  :-(

At least Keyman hides all the messy (and poorly documented) details of
Windows system hooks, which is what you have to use if you want to make
a stand-alone utility (did that once too).

If Keyman can call external libraries ~ that's interesting. It is
certainly *far* more sophisticated and flexible than MSKLC and I
shouldn't have lumped the two together.
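
Incidentally, the canonical reordering of diacritics you describe below
is exactly what Unicode normalization already guarantees, so a layout
(or a post-processing step) can lean on that rather than hand-rolling
the rules. A quick illustration in Python:

    import unicodedata

    # A base letter with one mark above and one below, typed "out of order":
    # COMBINING ACUTE ACCENT (ccc 230) entered *before* COMBINING DOT BELOW
    # (ccc 220).
    typed = "e\u0301\u0323"

    # NFD (or NFC) applies the Canonical Ordering Algorithm, so the marks
    # end up sorted by combining class: the dot below now precedes the acute.
    reordered = unicodedata.normalize("NFD", typed)

    print([unicodedata.name(c) for c in typed])
    print([unicodedata.name(c) for c in reordered])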

- Chris

On 18/03/2014, Andrew Cunningham  wrote:
> Chris,
>
> Keyman is capable of doing that and a lot more,  but few keyboard layout
> developers use it to its full potential.
>
> As an example,  I was asked by Harari teachers here in Melbourne to develop
> a set of three keyboard layouts for them and their students.
> The three keyboards were for three different orthographies in the following
> scripts:
> 1) Latin
> 2) Ethiopic
> 3) Arabic
>
> They wanted all three layouts to work identically,  using the keystrokes
> used on the Latin keyboard.
>
> The Ethiopic and Arabic keyboard layouts required extensive remapping of
> key sequences to output.
>
> If I was a programmer I could have done something more elegant by building
> an external library Keyman could call but as it is we could do a lot inside
> the Keyman keyboard layout itself.
>
> For Myanmar script keyboard layouts we allow visual input for the e-vowel
> sign and medial Ra,  with the layout handling reordering.
>
> One of the Latin layouts I use supports combining diacritics and reorders
> sequences of diacritics to their canonical order regardless of order of
> input, assuming a maximum of one diacritic below and two diacritics above
> the base character.
> Analysis and creativity can produce some very effective Keyman layouts.
>
> Andrew
>  On 18/03/2014 7:23 PM, "Christopher Fynn"  wrote:
>
>> MSKLC and KeyMan are fairly crude ways of creating input methods

>> For what you want to - you probably need a memory resident program
>> that traps the Latin input from the keyboard, processes the
>> (transliterated) input strings converting them into unicode Sinhala
>> strings, and then injects these back into the input queue  in place of
>> the Latin characters.

>> There are a couple of utilities that do this for typing
>> transliterated/romanised Tibetan in Windows and getting  Tibetan
>> Unicode output.
>> http://tise.mokhin.org/
>> http://www.thubtenrigzin.fr/denjongtibtype/en.html

>> But I think both of these were written in C as they have to do a lot
>> of processing which is far beyond what can be accomplished with MSKLC
>> and even KeyMan

>> - C


Re: Dead and Compose keys (was: Re: Romanized Singhala got great reception in Sri Lanka)

2014-03-18 Thread Christopher Fynn
MSKLC and KeyMan are fairly crude ways of creating input methods.

For what you want to do, you probably need a memory-resident program
that traps the Latin input from the keyboard, processes the
(transliterated) input strings, converting them into Unicode Sinhala
strings, and then injects these back into the input queue in place of
the Latin characters.

There are a couple of utilities that do this for typing
transliterated/romanised Tibetan in Windows and getting  Tibetan
Unicode output.
http://tise.mokhin.org/
http://www.thubtenrigzin.fr/denjongtibtype/en.html

But I think both of these were written in C as they have to do a lot
of processing which is far beyond what can be accomplished with MSKLC
and even KeyMan.

- C


Re: Websites in Hindi

2014-03-02 Thread Christopher Fynn
I don't know about that particular Serif software, which may have
limitations, but if a site is using Unicode UTF-8, there should be no
problem creating a website in Hindi.

e.g.
http://www.bbc.co.uk/hindi/
https://hi.wikipedia.org/
http://tehelkahindi.com/
http://www.webdunia.com/
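
On the authoring side very little is needed; as a tiny illustration
(the file name and sample text are just an example), any tool that
writes the page out as UTF-8 and declares the encoding will do:

    # Write a minimal Hindi page as UTF-8; current browsers will display the
    # Devanagari fine as long as the encoding is declared and honoured.
    page = """<!DOCTYPE html>
    <html lang="hi">
    <head><meta charset="utf-8"><title>नमस्ते</title></head>
    <body><p>यह पृष्ठ हिन्दी में है।</p></body>
    </html>
    """

    with open("hindi.html", "w", encoding="utf-8") as f:
        f.write(page)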


Re: proposal for new character 'soft/preferred line break'

2014-02-13 Thread Christopher Fynn
On 06/02/2014, Rhavin Grobert  wrote:

> No, you did not understand. <wbr> is like &shy;: it's below the whitespace
> level: if the line is too long, it breaks a word:

Not really alike. <wbr> is an HTML tag, while &shy; is a named
reference for a character.

Unicode has nothing to do with <wbr>, as it is higher-level markup.

- C


Re: _Unicode_code_page_and_?.net

2013-08-06 Thread Christopher Fynn
On 06/08/2013, Whistler, Ken  wrote:

> These kinds of systems are widely deployed, but the endgame we are
> all working towards (and in large part have achieved) consists of
> servers configured in Unicode and clients connections configured in
> Unicode. Conversions still may be going on, but more often of
> the UTF-8 <--> UTF-16 type which preserve all data, instead of
> spitting out multiple instances of uninterpretable "?" characters
> when client and data source don't match.

> --Ken

I wonder why so many servers, database applications, and so on, _still_
don't install with Unicode (in some encoding format) as the *default*
installation option. People still have to configure e.g. Apache, PHP, and
MySQL to use Unicode / UTF-8 - and this is not always straightforward.
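
As one small example of the kind of explicit configuration that is still
required (this sketch assumes the PyMySQL driver and made-up connection
details, purely for illustration):

    import pymysql

    # Server side, the equivalent my.cnf settings would be something like:
    #   [mysqld]
    #   character-set-server = utf8mb4
    #   collation-server     = utf8mb4_unicode_ci

    conn = pymysql.connect(
        host="localhost",
        user="example_user",
        password="example_password",
        database="example_db",
        charset="utf8mb4",  # without this the client may fall back to latin1
    )
    with conn.cursor() as cur:
        # Check what the server and connection actually ended up using.
        cur.execute("SHOW VARIABLES LIKE 'character_set%'")
        for name, value in cur.fetchall():
            print(name, value)

None of it is difficult, but none of it happens by default either.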



Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Christopher Fynn
I think the idea of encoding "regional indicators" instead of actual
flags was to avoid a political minefield, and the fact that flags change
over time. (Afghanistan has had something like 20 different flags.) I also
imagine PR China wouldn't be too happy if someone wanted to encode a
Tibetan flag. You're also right about things like sporting events
where England, Scotland, Wales and Northern Ireland have separate
teams, etc. Still, the use of flags as identifiers is common on the
web, at international conferences to identify delegates, and so on.
With the arrival of colored fonts (supported in Windows 8.1) I suspect
people will inevitably try and make use of these characters - in spite
of all the current limitations you have pointed out.

I guess someone could come up with a private registry, similar to the
ConScript registry, where ways of encoding all kinds of symbols (e.g.
FC logos)  using these identifiers could be listed.


On 06/08/2013, Philippe Verdy  wrote:
> OK I see the point of the PRI. But using joiners in the middle of the same
> flag is worse than just using start/end (which also have a clean way to be
> mapped to glyphs without using complex rendering like ligatures:
> start+RIS...+RIS+end can fully be converted to individual glyphs producing
> a flag showing the region code in the middle (good for simple editors) and
> then ligaturing can be applied if needed on sequences to generate actual
> flags (possibly colorful as emoji icons)
>
> Your PRI does not solve the problem of versioning, notably in ISO 3166 which
> is not stable, e.g. for [CS], but also for changing flags of a country.
> You'll need dates or other specifiers in extensions of the code. The
> start/end solution also ensures stability of the default rendering without
> having to create and maintain any registry for the actual flags (this could
> be made on another project, e.g. by maintainers and participants of the
> Flags of the World on their existing collaborative site, just the same way
> that Unicode does not have to maintain a dictionary of all words of a
> language. The start+RIS+end solution would act like a "word" in its own
> language, using its own orthography, and would be freed from ISO 3166-1
> dependency.
>
> Font creators would immediately be able to provide a font with a reasonable
> default rendering which will be suitable for the default, monochromatic,
> rendering of these "words". It would then be up to other applications to
> decide which word they recognize to replace them by colorful flag icons or
> emojis. The problem is solved once for Unicode and ISO/IEC 10646. The
> Unicode standard just has to say that these "words" can be freely replaced
> by icons showing a flag of the same encoded entity. It does not have to
> specify which ones, just like Unicode does not mandate any typographical
> ligatures (however TUS may specify the internal syntax of these encoded
> flags, to ensure that it would be compatible with ISO 3166 or with some
> other flag libraries like the IOC flags and codes.)
>
> For Unicode however, the codes will be treated as all different : if [FR]
> is used for representing France, [-IOC-FRA] for representing the French
> Olympic team, both could display exactly the same flag (and [MQ] could as
> well display the same flag or the cultural regional flag, because here
> there's no other qualifier to say which one to use, and both are valid ;
> but if only the official national flag used in UN must be used then
> [-UN-MQ] will only display the tricolor flag, and if needed a versioning
> suffix could be used.) The syntax could be similar to the syntax developed
> for language tags (or locale tags).
>
> 2013/8/5 Christopher Fynn 
>
>> On 05/08/2013, Philippe Verdy  wrote:
>>
>> > The way I perceive the regional indicators (in Unicode 6.0), they are
>> > absolutely not used and will never be used at all as long as there are
>> > no
>> > complements such as the minimum brackets I suggest to fix them. The 26
>> > letter-like characters are basically broken in their identity, you
>> > can't
>> > safely align multiple flags or delimit them with break iterators, like
>> you
>> > can break words, paragraphs, syllables (in some languages this is
>> difficult
>> > as it is contextual too, but not impossible, and in many languages you
>> can
>> > find syllable breaks without having to parse backward on indefinite
>> length)
>> > or lines.
>>
>> See:
>>
>> http://www.unicode.org/review/pri215/pri215-background.html
>>
>> http://www.unicode.org/L2/L2012/12284r3-reg-indicator-seg.pdf
>>
>



Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Christopher Fynn
On 05/08/2013, Philippe Verdy  wrote:

> The way I perceive the regional indicators (in Unicode 6.0), they are
> absolutely not used and will never be used at all as long as there are no
> complements such as the minimum brackets I suggest to fix them. The 26
> letter-like characters are basically broken in their identity, you can't
> safely align multiple flags or delimit them with break iterators, like you
> can break words, paragraphs, syllables (in some languages this is difficult
> as it is contextual too, but not impossible, and in many languages you can
> find syllable breaks without having to parse backward on indefinite length)
> or lines.

See:

http://www.unicode.org/review/pri215/pri215-background.html

http://www.unicode.org/L2/L2012/12284r3-reg-indicator-seg.pdf



Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796)

2013-08-05 Thread Christopher Fynn
🇮🇳

http://en.wikipedia.org/wiki/Regional_Indicator_Symbol
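
The mapping is purely mechanical: each letter A-Z corresponds to a
regional indicator symbol in the range U+1F1E6..U+1F1FF, and a pair of
them forms the flag sequence. For example, in Python:

    # Turn a two-letter region code into a pair of Regional Indicator Symbols.
    # Whether the pair is shown as a single flag glyph is entirely up to the
    # font / rendering system.
    def flag(region_code):
        base = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A
        return "".join(chr(base + ord(c) - ord("A"))
                       for c in region_code.upper())

    print(flag("IN"))  # U+1F1EE U+1F1F3
    print(flag("GB"))  # U+1F1EC U+1F1E7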


On 05/08/2013, Michael Everson  wrote:
> Pradeep,
>
> The Unicode Consortium and ISO/IEC JTC1/SC2 defined a set of "Regional
> Indicator symbols" (basically a special coded form of the letters A-Z) and
> when two of those come together like CN or GB, the Chinese or UK flag can be
> displayed, if the font vendor supports a single flag glyph for those
> sequences.
>
> In the case of the WhatsApp implementation, the font vendor is Apple.
>
> On 5 Aug 2013, at 05:44, Pradeep Aluru  wrote:
>
>> Hi,
>>
>> Im not sure if this is the right contact for the request below, if not Im
>> sure it will be directed to the right group.
>>
>> From the below email it is understood that you are the creators of
>> emoticons in the WhatsApp application.
>>
>> So, I'm sure you are the right people to help us with this.
>>
>> Unfortunately there is no flag of India available in the emoticons in
>> WhatsApp. Since Indian Independence Day is close by, I, on behalf of
>> millions of users from India would like to request you all to please
>> design and add a Flag of India, which would be a huge help and GREATLY
>> be appreciated by millions of users.
>>
>> Looking forward to see it happen soon.
>>
>> Thanks in advance,
>> Pradeep.
>
> Michael Everson * http://www.evertype.com/
>
>
>
>




Re: Ways to show Unicode contents on Windows?

2013-07-15 Thread Christopher Fynn
On 15/07/2013, Doug Ewell  wrote:

> This shows that the problem is not that Windows is unable to display
> arbitrary Unicode text, or inherently cannot support font fallback, but
> that:
>
> (a) Complete font fallback in Windows is not automatic, and users or
> developers often must supply additional knowledge of which fonts support
> which scripts (like the "Composite Font Mappings" feature in Andrew
> West's BabelPad).
>
> (b) Some Windows apps apply their own fallback strategy, which may be
> better or worse than the default strategy, depending on the situation.
>
> For a recent project, I was harshly reintroduced to the way MS Word maps
> fonts. I needed to generate Word documents using the PUA, but even
> though I specified a font that covered all my characters, Word silently
> substituted another font that covered none of them (and did not change
> the name of my selected font on the ribbon). I was forced to use
> OpenOffice and generate .odt documents instead, since OO doesn't
> override my font choices in such a destructive way. This was running
> Office Professional Plus 2013 under Windows 8 Pro, so it has nothing to
> do with outdated versions. For someone who regularly defends MS software
> against its detractors, this was tough to swallow.

What MS Office seems to want to do is apply fonts based on the language
being used - the input "language" being determined by the keyboard or
IME currently selected. When using a custom keyboard (e.g. one created
with MSKLC) or IME, MS Office frequently does not accurately determine
the language and consequently overrides your font selection.

In MS Word a work-around is to create a custom character style for
each language/script you use. When defining those styles, choose an
appropriate font for the particular language/script and put the name
of that font in the "Latin text font", "Complex text fonts" and "CJK
fonts" boxes (all three) on the font tab of the style definition dialogue.
Apply this style to any text in that particular script/language.

I've found this to work in most cases - even for PUA characters,
though in some instances I've had to remove particularly pernicious
system fonts - which isn't easy to do.



Re: Ways to show Unicode contents on Windows?

2013-07-12 Thread Christopher Fynn
Peter

I'm wondering how you change the fonts selected by the built-in
font fallback in various versions of Windows? I've found that the
rendering for certain scripts is less than ideal with some of these
fonts. Also, the fallback font sometimes overrides the font selected by
the user in Office and other applications even when the selected font
is available.

The only way round this that I've found is to remove the offending
fallback font from the system (not always easy).

- Chris

On 11/07/2013, Peter Constable  wrote:
...
> For simple scripts that do not require shaping that are not yet supported,
> if you have the font and can select the font in your app, then text in those
> scripts can be displayed. Of course, we don't have built-in font fallback
> for such scripts.
>
>
> Peter



Re: interaction of Arabic ligatures with vowel marks

2013-06-13 Thread Christopher Fynn
Andreas

Have you tried Mihail Bayaryn's Siddhanta font - (or his earlier
Chandas and Uttara fonts)?

http://svayambhava.org/index.php/en/fonts

This font supports many more vertical ligatures for Sanskrit than most
other Devanagri fonts.

- Chris

On 13/06/2013, Andreas Prilop  wrote:
> On Wed, 12 Jun 2013, Richard Wordingham wrote:
>
>> While the same principle applies to Indic scripts (and indeed, to the
>> Roman alphabet), there is only one Indic mark I can think of for which
>> the issue of component association arises, and that is the nukta.
>
> Sanskrit requires "candrabindu" U+0901 inside (or on top of)
> two "La" U+0932.
> See
>  http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0138.html
>
> Instead of
> http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/att-0135/image001.png
> I would like to see the two "La" on top of each other.
>
>



Re: Preconditions for changing a representative glyph?

2013-05-30 Thread Christopher Fynn

On 29/05/2013 21:39, Leo Broukhis wrote:
> In light of recent news about New York adopting a redesigned
> "handicapped" symbol
>
> http://www.disabilityscoop.com/2013/05/28/handicapped-symbol-facelift/18034/
> http://www.cbc.ca/news/yourcommunity/2013/05/revamped-handicapped-icons-coming-to-new-york-city.html
>
> and the new signs starting to appear in public places
>
> https://www.google.com/search?q=new+york+handicapped+sign+redesign
>
> IMO this is just a NYC variant of U+267F.
>
> I'd like to ask: what is supposed to be the trigger condition for the
> UTC to consider changing the representative glyph of
>
> U+267F WHEELCHAIR SYMBOL ♿
>
> to the new design?

Perhaps when most of the rest of the world adopts the new design.


> Thanks,
> Leo




Re: Suggestion for new dingbats/symbols

2013-05-25 Thread Christopher Fynn
The US National Park Service pictograph set might be a good candidate
set as these are widely used on maps and in the text of guidebooks,
etc. - as well as in GIS applications.

http://www.nps.gov/hfc/carto/map-symbols.cfm

http://commons.wikimedia.org/wiki/File:MapSymbols_other_US_NPS.svg

http://commons.wikimedia.org/wiki/File:Map_symbols_US_NPS.svg


These symbols also have been placed in the public domain - so there
should be no copyright issues.

This set of symbols has been around since 1974 and is now well
established. It is somewhat curious that they haven't already found
their way into the Unicode Standard as in general they seem more
useful than many other symbols  which have been encoded.

http://www.thesmartset.com/article/article07140901.aspx

- Chris



Re: Bing now translates to/from Klingon

2013-05-20 Thread Christopher Fynn
I suppose when this gets embedded in some mobile or gaming devices,
there will be new calls to encode Klingon using the precedent of
emoji.



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Christopher Fynn
William

Your "localizable sentences" idea reminds me of telegraph companies
that used to have a number of common sentences that could be
transmitted in Morse code by number. In India you could have telegrams
containing such sentences delivered in any of the major Indian
regional languages.

This was a good idea in the days of the low-bandwidth telegraph - but,
as Ken suggested, with modern technology there are now far more
sophisticated ways of accomplishing the same sort of thing.

regards

- Chris



Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-14 Thread Christopher Fynn
Simply stacking glyphs doesn't really work in Tibetan either. The size
and shape of the consonants need to be adjusted as the stacks get
more and more complex - particularly in fonts, and in situations
where there are vertical constraints. So each Tibetan glyph also needs
to be adjusted depending on what occurs above or below. The descender
of one Tibetan glyph may often occur to the right or left of another
glyph in a stack. Very rarely you find occurrences of vowels written
in the middle of a stack - or even one consonant written horizontally
beside another in the middle of a vertical stack.  I don't think
anything handles these things properly though.



On 14/04/2013, Richard Wordingham  wrote:
> On Sun, 14 Apr 2013 13:44:26 +0600
> Christopher Fynn  wrote:
>
>> In practice, the rendering of Tibetan appears to be far less complex
>> than that of Khmer (with its coeng joiner) or that of Indic.
>
> That's largely because Tibetan puts the consonants in a simple vertical
> stack with the vowels at the top and bottom.  Khmer has to worry about
> subscripts consonants with spacing ascenders to the left and spacing
> ascenders to the right.  Further, the top-to-bottom length of the
> ascenders depends on what is in the stack above the body of such
> subscripts.
>
> Richard.
>



Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-14 Thread Christopher Fynn
Shriramana

It is interesting to compare:

http://skia.googlecode.com/svn/trunk/third_party/harfbuzz/src/harfbuzz-indic.cpp

http://skia.googlecode.com/svn/trunk/third_party/harfbuzz/src/harfbuzz-khmer.c

http://skia.googlecode.com/svn/trunk/third_party/harfbuzz/src/harfbuzz-tibetan.c

In practice, the rendering of Tibetan appears to be far less complex
than that of Khmer (with its coeng joiner) or that of Indic.

Where you do get some complexity in Tibetan script is in collation:
http://developer.mimer.com/charts/tibetan.htm
http://developer.mimer.com/charts/dzongkha.htm

This would have been somewhat simpler if characters like those I
mentioned earlier had been dropped.

Perhaps some of the other Tibetan encoding proposals might have made
Tibetan collation a little simpler - but I think this would have been
at the cost of all kinds of added complexity in rendering and input
methods.

- Chris



Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-14 Thread Christopher Fynn
On 14/04/2013, Shriramana Sharma  wrote:
> On Sun, Apr 14, 2013 at 2:32 AM, Christopher Fynn 
> wrote:

>> The purpose of having most of these characters there was to facilitate
>> conversion between Tibetan and Devanagri scripts.

> Well conversion from Tibetan to Devanagari can easily be done even
> without these characters -- they only facilitate one-to-one mapping.

I agree - but some people thought they should be there for that purpose.

(Mind you I've never encountered anyone with the need to actually do this.)

Most of these characters were in earlier proposals for Tibetan, which
mimicked the encoding of Devanagri and the other ISCII-derived encodings.
They kind of just got left in.

There were also proposals to have an invisible root consonant marker -
to flag the root consonant in a Tibetan tsheg-bar or syllable.

Other proposals wanted to have all the Tibetan consonants encoded +
combining super-added RA LA and SA (ra-mgo la-mgo sa-mgo) + subscript
YA RA and LA. This would mean having to type (or re-order) these head
letters after the base consonant. It might seem to have made Tibetan
collation easier - but to get that to work as intended, it would have
also been necessary to encode a PREFIX-GA, PREFIX-DA, PREFIX-BA,
PREFIX-MA and PREFIX-ACHUNG - all of which would have to logically
occur after the root cluster but be re-ordered before it for rendering.

Anyway all these models completely fall apart as soon as you move away
from standard  orthography for standard Tibetan words. All assumed
Tibetan neatly followed the orthographic rules found in Tibetan
grammar books  - but this is not the case. First there are different
rules as to which letters can be combined with each other when writing
Sanskrit and other Indian languages - but there are still rules for
this. However these break down when transliterating words from other
languages (Chinese, English, etc.) into Tibetan. Next  there are some
unusual combinations required when writing some other Tibeto-Burmese
languages in Tibetan  script. Finally some Tibetan texts are full of
abbreviations which are written in a way which break all the standard
rules of Tibetan orthography and characters are combined in all sorts
of weird and wonderful ways. (See:
http://www.dzongkha.gov.bt/publications/PDF-publications/Duyig.pdf for
some examples.)

The encoding model finally adopted for Tibetan simply follows the way
Tibetans are taught to spell out combinations - and the way, and order
in which, they actually write.  After all we were encoding a script
the way it is *actually* used - not encoding the rules of Tibetan
grammar or rules in books of orthography which tell you how the script
is supposed to be used.

- Chris



Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-13 Thread Christopher Fynn
The main thing is everyone finally agreed to accept the encoding we
have today ~ though there had been objections to a number of earlier
proposals. China actually wanted an encoding that would have ended up
with 6000+ characters - but they finally agreed to this one.

The encoding actually makes a lot of sense to anyone brought up
speaking and reading Tibetan.


Given the encoding model used, there are some characters which are not
strictly necessary - including 0F00, 0F73, 0F75, 0F76, 0F77, 0F78,
and 0F79.

All these, in a way, were leftovers from earlier proposals.

0F7B could have been represented by 0F7A + 0F7A, and 0F7D by 0F7C + 0F7C.

0F43 could have been represented by 0F42 + 0FB7 and 0F93 by 0F92 + 0FB7;
0F57 could have been represented by 0F56 + 0FB7 and 0FA7 by 0FA6 + 0FB7; etc.

Letters like these are spelt as two letters by Tibetans and collate as
such in Tibetan dictionaries so they are not really separate letters
in the Tibetan alphabet as they are in Devanagri.

The purpose of having most of these characters there was to facilitate
conversion between Tibetan and Devanagri scripts.
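
Several of them do in fact carry canonical decompositions in the
standard, so normalization already pulls them apart into the two-letter
spellings; a quick check with Python's unicodedata:

    import unicodedata

    # U+0F43 TIBETAN LETTER GHA and U+0F73 TIBETAN VOWEL SIGN II both have
    # canonical decompositions, so NFD yields the two-character spellings.
    for ch in ("\u0F43", "\u0F73"):
        decomposed = unicodedata.normalize("NFD", ch)
        print(unicodedata.name(ch), "->",
              " + ".join(unicodedata.name(c) for c in decomposed))

(0F7B and 0F7D, on the other hand, were encoded without canonical
decompositions.)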

- Chris



Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-13 Thread Christopher Fynn
On 12/04/2013, "Martin J. Dürst"  wrote:

> On 2013/04/11 16:30, Michael Everson wrote:
>> On 11 Apr 2013, at 00:09, Shriramana Sharma  wrote:
>>
>>> Or was the Khmer model of an invisible joiner a *later* bright idea?
>>
>> Yes.
>
> Later, yes. Bright? Most Kambodian experts disagree.
>
> Regards,   Martin.

At one time there was also a proposal for an "invisible joiner"
character for Tibetan. As far as possible I think "invisible"
characters are best avoided as ordinarily the user can't see them and
doesn't always know if one is there or not.




Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-11 Thread Christopher Fynn
On 11/04/2013, Shriramana Sharma  wrote:

> On Thu, Apr 11, 2013 at 9:54 AM, Christopher Fynn 
> wrote:
>> In Unicode v1 Tibetan was encoded on the Indic model - but in practice
>> there were problems found with this and Tibetan was removed and later
>> re-encoded.
>
> I'd like to know what exact "problems". Often I hear "there are
> problems" in relation to various encoding models adopted for specific
> scripts, but no such problem is presented for examination.

I'm not sure - it was between Unicode v1 and v2.
Tibetan was dropped by v2 and then encoded again much later.
You'd have to go back to the WG2 and UTC documents of the time to find
out the reasons.

I guess it *may* have been due to objections by China ~ but I'm not
really sure.

>>> But even for Devanagari, if it were not for
>>> Sanskrit, a visible virama is almost never used for Hindi, the
>>> prevalent language, and it is only that Devanagari is also heavily
>>> used for Sanskrit and the thing about maintaining uniformity with
>>> other Indic scripts that the visible function and the joining function
>>> were united in a single character.
>>
>> But afaik in Hindi etc. it is legal to use a visible virama instead of
>> joining letters. In Tibetan this is not so (except when writing
>> Sanskrit)
>
> I'm not sure what you mean by it being "legal" to use a visible
> virama. If सुरक्षा (surakṣaa = protection) were written सुरक् षा
> (ignore the space in between) any Hindi reader would say "hey who
> wrote like this". It would be comprehensible, but it would not be
> considered accepted orthography. I am not sure how the Tibetan
> situation would be different and how it would be "illegal" (and what
> exactly "illegal" would mean).

Wrong word - "acceptable" would have been better.

I've read that at one time Devanagri was often written more like
Tibetan (many complex conjuncts stacked vertically) - and have seen
manuscript examples of this. With the advent of metal type printing
this was impractical and a simpler orthography evolved and people are
used to that now.


>> You can look on the Tibetan encoding as a compromise between the two
>> ideas - but it works well and there is no ambiguity.
>
> OK so it's a compromise to satisfy everyone all around? So that means
> its validity as a precedent for other perhaps less-controversial
> scripts diminishes.

I'm not saying that it is a compromise - but you could look upon it that way.

Actually it is a very workable encoding model, and font lookups and
rendering engine complexity seem much less than with the ISCII-derived
Indic model.

- Chris




Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan?

2013-04-10 Thread Christopher Fynn
On 11/04/2013, Shriramana Sharma  wrote:

> Hello people. This is just of academic interest, since the fact is
> that a full series of subjoined characters *have* been encoded and
> *are* being used for Tibetan, and nothing is going to change that, but
> it could have an effect on future proposals for Tibetan-like scripts,
> so I think it is important for this matter to be discussed.

The way Indic is encoded is an inheritance of ISCII - which encoded
Indic scripts in a limited 8-bit code space alongside ASCII.

There was no similar pre-existing standard for Tibetan, the encoding
model used is based on the way Tibetans are taught to spell.

In Unicode v1 Tibetan was encoded on the Indic model - but in practice
there were problems found with this and Tibetan was removed and later
re-encoded.

> The standard says that "there were two main reasons for this choice"
> of choosing to encode separate subjoined characters for Tibetan rather
> than using an Indic-like virama model:
>
> The *second* reason provided is that due to the prevalence of stacking
> in Tibetan, encoding subjoined characters would cause decreased
> storage requirements. Well that's true for any South Indic script --
> Telugu, Kannada, Grantha -- which also regularly uses stacks for
> representing clusters, so this is not something that is unique to
> Tibetan.
>
> The *first* reason stated is that "the virama is not normally used in
> the Tibetan writing system to create letter combinations". But this
> sentence conflates two things, the visible device of a vowel-killer
> virama as part of the attested orthography, and the abstract encoded
> character as part of digital text. Clearly the "is not normally used"
> can refer only to the former, not the latter.

Unlike Indian languages, there are a lot of unvoiced (silent)
consonants (prefixes and some suffixes) in Tibetan.  Other letters are
pronounced with no vowel sound. Both these things depend on the
position of the consonant within the syllable (syllable boundaries
being marked by the tsheg).

> OK fine, so in practice the virama *with a visible form* is never used
> in writing Tibetan.

It is never used when writing the Tibetan language, but it is
sometimes used when writing Sanskrit in Tibetan (though nowhere near as
much as when writing Sanskrit in Devanagari).

> But even for Devanagari, if it were not for
> Sanskrit, a visible virama is almost never used for Hindi, the
> prevalent language, and it is only that Devanagari is also heavily
> used for Sanskrit and the thing about maintaining uniformity with
> other Indic scripts that the visible function and the joining function
> were united in a single character.

But afaik in Hindi etc. it is legal to use a visible virama instead of
joining letters. In Tibetan this is not so (except when writing
Sanskrit)

> So it's not a big deal to separate the two functions, as is done in
> Khmer etc. Hypothetically even in mainland Indic we could have
> separate joiner-virama vs visible-virama characters.
>
> So my point is that even though the visible virama is not used in
> Tibetan (probably because the TSHEG separates syllables making the
> final consonant vowelless) one could very well have gone the Khmer way
> and made a separate character for that (as indeed has been done) but
> still have had a single joiner for causing the stacks.

In Tibetan it is not only the final consonant that may be vowel-less.

There was a proposal to encode Tibetan with an explicit STACK
(invisible-joiner) character - but eventually the model adopted was
preferred.

China wanted to encode every combination of Tibetan characters - which
would have meant 6,000+ characters. (They do have an official national
standard which encodes Tibetan that way using PUA characters for the
combinations. This is in everyday use in China.)

You can look on the Tibetan encoding as a compromise between the two
ideas - but it works well and there is no ambiguity.

> Or was the Khmer model of an invisible joiner a *later* bright idea?
> But really that doesn't hold water (I mean the "later" part) because
> the Indic virama model already existed, and whether or not Tibetan
> used the visible virama heavily need not have prevented from a virama
> character, which would have a visible form in appropriate contexts,
> causing stacking in other contexts.
>
> And even that thing about the contrast between the full-form subjoined
> consonants YA RA VA and half-form ones (I mean the -tags forms) need
> not prevent this, because you could encode a virama and have the
> *regular* (-tags) forms produced by it, and use separately encoded
> subjoined characters for the aberrant forms alone.
>
> As for the RA-MGO thing, I still am not sure how it is advisable to
> have a 0F6A glyphically identical to 0F62 and even if a
> default-ignorable ZWNJ would not have been satisfactory, some
> specialized non-default-ignorable conjoining-form-prevention character
> could be defined, which would then also be used for subjoined
> full-form YA RA VA avoidin

Complex script support on mobile devices

2013-03-15 Thread Christopher Fynn
Anyone know of a web page or resource which lists the level of support
for complex scripts on various mobile devices?

I'd like to know things like:
* Which "smart" phones and tablets have complex script rendering
support - and for which scripts?
* Is that support available system wide or only in the web browser?
* Font support for various scripts (installed or easily downloaded)
* Input method support for different languages/scripts.
* Can a user install additional fonts without "jailbreaking" the device?
* Can a user install additional input methods without "jailbreaking" the device?

From what I can gather:

Apart from a handful of mobile devices running Linux, iPhone and iPad
currently seem to have the best level of complex script support.
Unfortunately these devices are very expensive for many users in
countries that use languages written in a complex script. Outside of
the web browser, support for complex script rendering on most Android
devices is very poor (there are some exceptions, e.g. Sony Ericsson MT &
ST series phones). Windows Phone 8 appears to have decent complex
script rendering support - but seems to lack fonts and input methods
for many languages/scripts.

A few "dumb" handsets and feature phones seem to have have better
support for complex scripts than "smart" phones costing several times
the price.

 - Chris



Re: If X sorts before Y, then XZ sorts before YZ ... example of where that's not true?

2013-01-07 Thread Christopher Fynn
On 07/01/2013, Costello, Roger L.  wrote:
> Hi Folks,
>
> In the book, Unicode Demystified (p. xxii) it says:
>
> An English-speaking  programmer might assume,
> for example, that given the three characters X, Y,
> and Z, that if X sorts before Y, then XZ sorts before
> YZ. This works for English, but fails for many
> languages.
>
> Would you give an example of where character 1 sorts before character 2 but
> character 1, character 3 does not sort before character 2, character 3?
>
> /Roger

Look at the collation for Dzongkha or Tibetan:

http://developer.mimer.com/charts/dzongkha.htm

http://developer.mimer.com/charts/tibetan.htm
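
If you want to see the general phenomenon in a few lines of code, here
is a rough sketch using the PyICU bindings (assuming they are
installed); Slovak is used only because its "ch" contraction is compact
to demonstrate - the Dzongkha and Tibetan charts above show the same
kind of thing on a much larger scale:

    # Contractions break the "if X < Y then XZ < YZ" assumption.
    # In Slovak collation "ch" is a letter that sorts after "h",
    # so "c" < "h" while "chata" > "hora".
    from icu import Collator, Locale

    coll = Collator.createInstance(Locale("sk"))
    print(coll.compare("c", "h"))         # negative: "c" sorts before "h"
    print(coll.compare("chata", "hora"))  # positive: but "ch..." sorts after "h..."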



Re: Mayan numerals

2012-11-14 Thread Christopher Fynn
On 24/08/2012, Michael Everson  wrote:

>> Finland has not decided its position, but I'd personally tend to support
>> Asmus' position.

> Are you suggesting that the UCS ought to have two sets of Mayan numbers
> encoded?

Michael

Someone might argue that we already have multiple sets of the
Indic/Arabic numbers encoded  since nearly all sets of base ten digits
are in a real sense just variant sets of glyphs of the same numbers.

;-)

- Chris



Re: texteditors that can process and save in different encodings

2012-10-17 Thread Christopher Fynn
On 16/10/2012, Jukka K. Korpela  wrote:
> 2012-10-16 13:06, Christopher Fynn wrote:
>
>> On Windows I use Andrew West's Babel Pad
>>
>> http://www.babelstone.co.uk/Software/BabelPad.html
>
> As far as I can see, the “Encoding” menu in “Save As” in BabelPad has
> just a small set of encodings to choose from, basically just UTF-8 and
> UTF-16 and GB18030. There’s UTF-32 and SCSU too… the rest is not
> encodings but file formats (ASCII encoding, with non-ASCII characters
> represented using various escape notations, like \u1234 (“ASCII Plus
> something”). But not even ISO-8859-1.

> BabelPad is great as a Unicode editor, but it’s not particularly
> oriented towards dealing with different encodings. And I think it’s
> really better to use dedicated code converters rather than build a large
> number of character code and encoding conversions into various
> application programs.

Babel Pad opens a lot more encodings than it saves to.
I usually have to convert from legacy encodings - not to them - so it
works for me.

It also has a lot of other features which I find very useful.
I agree that if you have to do a lot of conversions from and to
different encodings then a dedicated code converter is probably
better.




Re: texteditors that can process and save in different encodings

2012-10-16 Thread Christopher Fynn
On Windows I use Andrew West's Babel Pad

http://www.babelstone.co.uk/Software/BabelPad.html



Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

2012-07-10 Thread Christopher Fynn
Satyakam Phukan

Please don't be too concerned about script names or character names -
the names are there simply as unique identifiers for the convenience
of programmers. Users rarely, if ever, see these names. Unicode and
ISO 10646 could have named scripts as "Script AAA", "Script AAB",
"Script AAC" and so on in the order which they were encoded in the
standard - but names like that wouldn't be very easy to remember.

If you don't like the names Unicode has assigned to characters, e.g.
calling character U+099A "BENGALI LETTER CA", then simply call it
"U+099A" instead. Again these names are there simply as unique string
identifiers for the convenience of programmers and nothing to get
worked up about. There are many characters in the standard which have
probably been inappropriately named but due to the Unicode stability
policy names of characters or scripts cannot be changed and that is
something that has to be accepted.

If it really bothers you then, as someone has already suggested, you
could make a formal proposal to have the existing Character names
annotated with informative aliases giving the Assamese names. Or have
the concerned department of Government of Assam make such a proposal.
Such a proposal, made in the proper way, might be accepted.


If there are glyphs for particular characters which are normally
written one way in Bengali and another way in Assamese this can be
handled by putting language specific glyph variants in an OpenType
font. This is however a font issue not a Unicode issue.

Don't worry about the order of characters in the standard. Collation
of any script can be handled by a tailored collation table for your
language. See: http://unicode.org/reports/tr10/
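
As a rough illustration (assuming the PyICU bindings are installed;
the rule below is invented purely for the demo and is not a real
Assamese tailoring):

    # A tailored collation puts letters wherever the language's own
    # dictionary order requires, independently of code point order.
    from icu import RuleBasedCollator

    coll = RuleBasedCollator("&b < \u00E1")   # demo rule: make "á" sort after "b"
    print(sorted(["\u00E1", "b", "a"], key=coll.getSortKey))  # ['a', 'b', 'á']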

I understand these things seem very important to you. In the past
other people have expressed similar concerns with regard to their own
language and the script used to write it. However I believe that, if
you take the time to learn more about Unicode, your concerns will
pretty much disappear.

regards

Chris Fynn



Re: Sorting Pali in Tibetan Script

2012-07-10 Thread Christopher Fynn
On 07/07/2012, Richard Wordingham  wrote:
> Can someone please advise me as to the sorting of Pali as Pali in
> Tibetan script.  I need a prompt response rather than a complete
> treatment.  It is possible that I have been misunderstood what I have
> been able to pull together.

Richard

I'm rather curious as to why you would want to know this, as I have
never encountered Pali written in Tibetan script. The Buddhist Canon
in Tibetan was almost entirely translated from Sanskrit (with a few
sutras translated from Chinese) - as far as I know, no part was
translated from Pali into Tibetan (there is a 20th century translation
of the Dhammapada from Pali but it is not included in the Tibetan
Canon). The small handful of Tibetans or Bhutanese I know of who have
any knowledge of Pali are familiar with reading that language in
Devanagari script or Roman transliteration.

> What I understand is the following:
>
> (a) The retroflex lateral ('LLA' in most Unicode encodings) is written
> , as at
> http://www.tipitaka.org/tibt/ .

In 40 years I have never seen that combination of characters (U+0F63
U+0F39) used in Tibetan, or any other language  normally written in
the Tibetan script. If you really need to collate Pali written in
Tibetan script in the correct order for Pali then you should probably
create a specific tailoring. The current collation table for Tibetan
in CLDR (and that described in Robert Chilton's slides) has errors
even for Tibetan. It cannot be used to collate Sanskrit written in
Tibetan script in the correct order for Sanskrit - though it should
put Sanskrit loan words in the order they are normally found within a
Tibetan dictionary.

- Chris



Re: complex rendering (was: Re: Mandombe)

2012-06-28 Thread Christopher Fynn
On 13/06/2012, Naena Guru  wrote:

> I made the first smartfont for Singhala in 2004. (Perhaps the first ever
> for any written language -- a little bragging there).

Hmm I made a smartfont for Tibetan script in 2000-2001 - and there
were smartfonts for several other complex scripts already available.



Re: Offlist: complex rendering

2012-06-18 Thread Christopher Fynn
Naena Guru 

Naena - If you don't like Unicode, then develop your own character
encoding and try to get your country to adopt it as a national
standard - but please stop trying to abuse Unicode and OpenType by
attempting to warp them to conform to your scheme.

You could also quite easily create an IME that accepts transliterated
Sinhala typed on a QWERTY keyboard using Latin characters and converts
those to proper Unicode Sinhala characters. That would be *much* better
than trying to use complex script rendering to get transliterated
Sinhala.
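
The core of such an IME is little more than a longest-match-first
transliteration table. A toy sketch of the idea (the mapping here is a
deliberately tiny, illustrative subset - not a real transliteration
scheme):

    RULES = {
        "ka": "\u0D9A",        # SINHALA LETTER ALPAPRAANA KAYANNA
        "k":  "\u0D9A\u0DCA",  # KAYANNA + AL-LAKUNA (vowel-killed k)
        "a":  "\u0D85",        # SINHALA LETTER AYANNA
    }

    def transliterate(text: str) -> str:
        out, i = [], 0
        keys = sorted(RULES, key=len, reverse=True)   # longest match first
        while i < len(text):
            for key in keys:
                if text.startswith(key, i):
                    out.append(RULES[key])
                    i += len(key)
                    break
            else:                                     # no rule matched
                out.append(text[i])
                i += 1
        return "".join(out)

    print(transliterate("ka"))   # one Sinhala ka letter, not Latin "ka"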

BTW This kind of idea is not new - about 12 years ago I messed around
with using complex script rendering to display transliterated
(romanized) Tibetan with Tibetan glyphs (it is fairly easy to do using
Graphite instead of OpenType) - however I didn't make the mistake of
actually assigning some of those Tibetan glyphs to Latin code points.



Re: Exact positioning of Indian Rupee symbol according to Unicode Technical Committee

2012-05-29 Thread Christopher Fynn
On 29/05/2012, Pravin Satpute  wrote:
>
> I have not heard any news regarding mapping Rupee symbol on US English
> layout. I think US International keyboard layout is right one for
> discussion. [3]

The US International keyboard layout is for a "102 key"
(International) keyboard. But all the keyboards I've seen sold in
India seem to be the usual US "101 key" type. So putting the Rupee
symbol on a "102 key" type International keyboard would be of little
benefit to the public there, unless hardware suppliers in India can be
persuaded to supply the "102 key" type of keyboard as standard.

The obvious place to locate the Rupee symbol on such a keyboard would
be where the Pound Sterling symbol is located on the standard UK
keyboard which is a "102 key" international type.

- C



Re: Joining Arabic Letters

2012-04-01 Thread Christopher Fynn
On 31/03/2012, Philippe Verdy  wrote:

> This means that even if there's a font change between two letters (for
> example due to a fallback for some letters or diacritics), each letter
> should contonue to adopt its normative joining behavior (i.e.
> displaying their correct joining form).

Using OpenType or something similar there are several ways you can
implement an Arabic script font, including several different ways you
can write the lookup tables - all of which are valid. The same goes
for any other complex script.

Unless you are going to define some rigid way Arabic fonts are
implemented - and a fixed glyph set - there is just no practical way
to get font lookups to work across font change boundaries. Even then
it would require some protocol  allowing the lookups in each font to
interact.



Re: Support for Unicode and CTL on mobile devices.

2012-03-19 Thread Christopher Fynn
Yes, Apple devices now appear to have pretty good support - though they
are not that widely available in India and the price is prohibitive
for most people. Nokia N900 and N9 are also very good - but again the
price is prohibitive and Nokia seem to be abandoning their Linux based
operating systems - and afaik Windows phones don't have this support.
The latest version of Android is supposed to have support for
Devanagari and Tamil - but people report problems with incorrect
character reordering, and other scripts are not supported. It also
seems some manufacturers, including Samsung, have implemented their
own complex script support in a few models sold in some markets - even
on older versions of Android. But it is hard to figure out which
models. Also on some Android phones complex script rendering seems to
work OK in the webkit browser - but not for SMS or any other
application. I guess webkit is doing its own rendering.

In general it seems the support for complex scripts on "smart phones"
is about at the level it was on PCs ten or twelve years ago - except
this time Apple seem to be ahead of Microsoft whereas on PCs it was
the other way round.

It is unfortunate that when making mobile operating systems companies
didn't include complex script rendering support from the beginning.



On 19/03/2012, Tom Gewecke  wrote:
>
> On Mar 19, 2012, at 8:24 AM, Christopher Fynn wrote:
>
>> Has anyone done a survey of which mobile devices support Unicode and
>> complex script rendering?
>
> As far as the iPhone/iPad/iPod Touch are concerned, my understanding is they
> support display of Unicode Devanagari, Gujarati, Gurmukhi, Tamil, Telugu,
> Sinhala, Oriya, Malayalam, Kannada, and Bengali.  But so far only a Hindi
> keyboard has been provided.   There are some apps that will permit input in
> some the other scripts for some purposes.



Support for Unicode and CTL on mobile devices.

2012-03-19 Thread Christopher Fynn
Has anyone done a survey of which mobile devices support Unicode and
complex script rendering?

In India it is easy to buy a very cheap basic mobile phone that
supports Devanagari - and can send and receive SMS messages in Devanagari
script. But it seems, with one or two exceptions, most so-called
"smart phones" available in the market have no working support for
complex script rendering. If you want to send and receive Indic
script text it seems a basic mobile phone costing about 1,000 rupees
is much better than most "smart" phones costing more than 25,000
rupees.

- Chris



Re: Emoji domains

2012-02-29 Thread Christopher Fynn
Apparently Tokelau (.tk) will also register emoji domains.





Re: Emoji domains

2012-02-28 Thread Christopher Fynn
On 28/02/2012, Stephane Bortzmeyer  wrote:
> On Tue, Feb 28, 2012 at 03:47:04AM +0600,
>  Christopher Fynn  wrote
>  a message of 7 lines which said:
>
>> Come to think of it, Unicode could probably fund itself by selling
>> code points for this ;-)
>
> RFC 5241 already explored a similar idea
>
> http://www.ietf.org/rfc/rfc5241.txt

On the Internet, if something can be monetized it will inevitably be
monetized by someone. Perhaps it is better that the money goes to
something useful.



Re: Emoji domains

2012-02-27 Thread Christopher Fynn
On 28/02/2012, Christopher Fynn  wrote:

> Now  isn't everyone going to want their logo encoded so they can have
> a domain like this? ~ The pressure to do so could be enormous.

Come to think of it, Unicode could probably fund itself by selling
code points for this ;-)



Re: Emoji domains

2012-02-27 Thread Christopher Fynn
On 27/02/2012, Jeroen Ruigrok van der Werven  wrote:
> -On [20120226 21:11], Stephane Bortzmeyer (bortzme...@nic.fr) wrote:
>>Note that it is a direct violation of RFC 5892. U+1F4A9, being of
>>category So, should be DISALLOWED. The registry was wrong to accept
>>it.
>
> Oh, this will be fun. So I guess they did not check the codepoint categories
> in their validation step then? (I honestly have no idea how NICs do this
> nowadays, it's been ages since I messed with stuff on that level.)

Now isn't everyone going to want their logo encoded so they can have
a domain like this? ~ The pressure to do so could be enormous.



Unicode on Symbian phones

2012-02-27 Thread Christopher Fynn
An interesting paper:

http://www.panl10n.net/english/LOCALIZATION_OF_MOBILE_PLATFORM.pdf



Re: [indic] Re: Lack of Complex script rendering support on Android

2012-01-31 Thread Christopher Fynn
On 26/11/2011, Siji Sunny  wrote:
...
> Since the source code of samsung android built is not available , am in the
> process of building ICS (Ice Cream Sandwich) on my Pandaboard ,for the
> further experiments.Will soon update you the result.

Any luck with this?

- Chris



Re: Re: Continue: Glaring mistake in the code list for South Asian Script//Reply to Kent Karlsson

2012-01-31 Thread Christopher Fynn
On 04/11/2011, Naena Guru  wrote:

[Snip]

> I do not know about CJKV, but Indic would have been much better off had
> they made their standards within SBCS. I tested this for Sinhala and it is
> a great success.

In the 1990s India had a kind of single-byte national character
encoding standard for Indic scripts called ISCII. Although there was
software and fonts based on ISCII, developed by the Centre for
Development of Advanced Computing (C-DAC) in Pune, ISCII was never that
popular, and most people in India seem to have used a whole variety of
solutions based on various non-standard font-based encodings for Indic
scripts.

When it came to encoding Indic scripts in the Unicode and ISO 10646
standards, as far as I'm aware, India did not participate - though
they certainly could have as they were full members of the ISO in good
standing. Consequently the encoding of the scripts of India in the UCS
took place with little input from India - though it was initially
based on the model of ISCII.

- C



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
Hard to keep track of these things - but shouldn't affect the fact
that one can safely implement OpenType rendering without a "licence"
from Adobe or Microsoft.


On 7 November 2011 13:21, Philippe Verdy  wrote:
> 2011/11/7 Christopher Fynn :
>> I'm sure people like RedHat, Debian, and Sun/Oracle (who use it in
>> OpenOffice) - have satisfied themselves that the open type rendering
>> they use is unencumbered.
>
> Actually now, this (OpenOffice) should no longer be Sun/Oracle but
> Apache. Oracle has donated OpenOffice to Apache that accepted it.
> Since the ecqusiation of Sun by Oracle, some OpenOffice developers
> were unhappy with Oracle and splitted the project in LibreOffice; not
> sure that both projects will merge again now that OpenOffice goes to
> Apache, but Apache has stated that both projects could live now (there
> are some differences in the GUI, but many developers are already
> trying to make modules that works on both projects). For now
> OpenOffice is still branced by Oracle in the current distribution,
> this may change in the next major relase showing the Apache branch,
> once all IPR issues are solved between Oracle and Apache.
>



Re: [indic] Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
On 4 November 2011 22:36, Philippe Verdy  wrote:

> All OS distributors should work on creating a base set of fonts needed
> to support all languages and scripts of the world (not necessarily in
> many styles), with a repository of webfonts that an be synced and
> cached automatically. It is no longer acceptable to see square boxes
> for texts written in modern languages and soon people will want to be
> able to read also technical documents with collections of symbols from
> anywhere too, even if they can't easily input them on their device.

Google's web fonts could be something like this - but when I contacted
them freely offering a Tibetan script font that renders well on small
devices they said the service is limited to Latin script fonts for
now.

> Input methods should also be ported to include also user preferences
> on their layouts for the various device form factors they want to use
> (physical keyboards or virtual on-screen touch keyboards) We should
> also be able to extend any smartphone with additional input devices
> with a Bluetooth or WiFi connection, or by the USB plug.

IMO there should be an easy way for users to map or configure the
keyboard (virtual or hardware) - this could be an XML file as in OSX.

> But the main problem of mobile devices is still their battery: you
> can't fit everything in your device, but you also cannot use mobile
> access networks due to the slow speed and cost of data transfers; if
> those prices were lowered, we could host most CPU/GPU and
> memory/storage capabilities remotely, and save lots of battery life on
> the mobile device (no smartphone can work today at least for 24 hours,
> and finding a place to recharge the device is still difficult)

I don't think this is a big issue - smart phones today contain many
more memory-hungry applications than well-implemented complex script
rendering - and let the user switch it off if they don't want it.

Many Symbian feature phones have complex script rendering through Qt -
and they have decent battery life. Data transmission for Unicode
*text* is not a heavy load. What is heavy is the way Opera Mini renders
complex script web pages. They render the page on their server and
then send it to your phone as graphics.

- C



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
You are free to make use of the information in the spec - though you
might infringe their copyright if you published the same information
that is on the Microsoft and Adobe sites without their permission.
There is no patentable information in the specification - the parts
that are proprietary are the particular implementations used by Adobe
(CoolType) and Microsoft (Uniscribe), and you can't copy those. But
there have been several free and open source implementations of
OpenType layout around for almost 10 years - and newer ones too - which
are freely available and which you can make use of under the
licensing terms of that code. Or you could write your own.

I'm sure people like RedHat, Debian, and Sun/Oracle (who use it in
OpenOffice) - have satisfied themselves that the open type rendering
they use is unencumbered.

- C

On 6 November 2011 08:06,   wrote:

> Dear John and Christopher
> I am happy to read your reassuring statements and I hope you are right.
>
> But...
>
> Can any one point to a public (published ) document PERPETUALLY  freeing up 
> the opentype technology ( open font of ISO patently remains one-way bonded to 
> OpenType) as published by Microsoft+Adobe.
> Is it safe to treat the same 'free' without such assurance ?




Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
On 6 November 2011 07:23, John Hudson  wrote:

> I don't know why OT Layout is not yet implemented in Android phones. I can
> think of a number of possible reasons, a combination of which might apply.
> One is that the developers simply have not done the work yet, but intend to.
> Another is that they have concerns about font size on mobile devices, which
> has delayed support for fonts with large layout tables. Another is that they
> have security concerns about OTL tables in fonts (Google's webfont sanitiser
> was stripping OTL tables from fonts served to Chrome for this reason, as I
> understand; I'm not sure if this has changed yet).

Well, Samsung have implemented it on some phones - and non-Android
Linux phones have no problem rendering complex scripts - so it is not
intrinsically difficult. Even Symbian feature phones sold in India
manage this through Qt ~ and they have much less processing power and
memory capacity than the majority of Android phones. For under Rs
1,500/- new I can buy a small Nokia phone that I can use to send and
display SMS text messages in Devanagari - this is transmitted and
received in Unicode.

  - C



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
On 6 November 2011 15:33, Mahesh T. Pai  wrote:
...
> You can update the firmware yourselves - go the custom ROM
> way. (errr... I am afraid the carriers' representatives may do
> something to me for advising this). :-D
>
> You can continue to be with the existing carrier.
>
> Look around. If all else fails, ask google - the search engine; not
> the customer support. (what a paradox). There are ways to root and
> de-root the phone for warranty purposes.

We can probably all do things to get our own phones to work - but what
is needed is that if *any* person buys *any* Android phone in India it
"just works" with Indic scripts (even if the UI is not localized into
all regional languages)

I don't think it is an unreasonable expectation for consumers in these
countries, when they shell out a lot of money for the latest phone, to
be able to send and receive messages in their script.

Another problem is that reviews of phones, even those published or
broadcast in India, rarely mention whether the phone supports Indic
scripts - so it is difficult for a consumer to take this into account
when deciding which phone to buy. A lot of consumers just expect it to
be there, since even much cheaper phones have offered this for years.

I hope the latest Windows Phone 7 handsets don't have this limitation when we
start to see them.

- C



Re: [indic] Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
Philippe - some Android devices can render Indic scripts in the webkit
browser which seems to have its own complex script rendering module
built in. However outside the webkit browser complex scripts don't
work. I've also heard that some Samsung phones have implemented system
level complex script rendering that Samsung added themselves (i.e. it
is not part of the normal Android code-base modules).

IMO this is a mess. If I send an Indic script SMS, IM or email to
someone with a smart phone, it would be nice to have a reasonable
expectation that they can read my message - without having to know the
particular manufacturer and model number of the phone they have.

However I'd be interested to know if your Samsung S2 properly displays
complex scripts outside of Webkit.

- Chris

On 4 November 2011 20:35, Philippe Verdy  wrote:
> 2011/11/4 Mahesh T. Pai :
>> Christopher Fynn said on Fri, Nov 04, 2011 at 02:27:30PM +0600,:
>>
>>  > Although Adnroid devices "support Unicode" - it seems there is no
>>  > support for complex script rendering on almost all Android devices
>>  > which makes them pretty useless for text communication in all Indian
>>  > languages (and many others too)
>>
>> And strangely, Android uses harfbuss, ICU and pango for text
>> layout. All libraries support Indic; and it seems that at compile
>> time, support for Indic is disabled.
>>
>> Thanks for taking the pain to identify the bug/ issue numbers.
>>
>> As pointed out, some devices sold in India support _some_ Indic
>> scripts. That is it.
>
> It may depend on the device. The manufacturer can compile Android as
> he wants apparently, and suppress/disable some modules. I have no
> problem on my Samsung S 2 that displays properly all languages shown
> on the home page of Wikimedia Commons for example.
>
>
>




Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-06 Thread Christopher Fynn
On 6 November 2011 19:34, Mahesh T. Pai  wrote:

> After looking at the directory structure of my Android device under
> the stock Samsung ROM (both 2.2.2 Froyo and 2.3.3 and 2.3.4
> Gingerbread), as well as CyanogenMod 7.1, (which is very stock AOSP),
> and the files in there, (but not the soruce code or compilation
> process, I fell that this is more a case of not enabling compile time
> switches to support Indic modules.
>
> This is of course, subject to the caveat - I do not understand much of
> coding.

Samsung may have this - but I am not sure that, outside of Webkit,
complex script rendering is part of the standard codebase.

It does not work in Google's ICS (Android 4) emulator.

- C



Re: [indic] Re: Lack of Complex script rendering support on Android

2011-11-05 Thread Christopher Fynn
On 4 November 2011 16:46, jitendra  wrote:
> Dear All,
> I wish to stick my neck out and make some exploratory , though humble
> statements.
> The issue of complex script not being enabled in various OSes and interfaces
> is linked to font. I mean if unicode had been truly font-independent or at
> least if unicode had been dependent on only truly free fonts (of complex
> script), the problem could be resolved.
> Opentype, Open Font(ISO) as also the Harfbuzz all orginate from the same
> proprietary standard .
> Is that the reason why many systems cannot adopt complex (i.e.Indian)
> sctipts readily?
> What is the way out?
> Response from several Indian government bodies (including most importantly
> TDIL in DIT) is lukewarm , so far at least.
>
>
> regards

Jitendra

Unicode, which is a character encoding standard, not a glyph encoding
standard, *is* font independent - and even font-technology independent.
OpenType, AAT, and Graphite fonts can all render Unicode text for
complex scripts.

OpenType is an openly available specification for fonts which anyone
can use without paying a licence to Adobe or Microsoft, who maintain
the specification.

Systems can adopt OpenType readily. All Indian scripts work fine on my
Nokia N900 which runs Linux. For some reason though Google crippled
Android by not including support for OpenType font rendering - or
support for any alternative technology.

I don't know about TDIL - they should be taking the lead in this by
insisting that handsets sold in India support Indic scripts.

- Chris

Thimphu, Bhutan



Lack of Complex script rendering support on Android

2011-11-04 Thread Christopher Fynn
Although Android devices "support Unicode" - it seems there is no
support for complex script rendering on almost all Android devices
which makes them pretty useless for text communication in all Indian
languages (and many others too).

Looking on the Android issues tracker I found over 60 related issues*
 reported since May 2009 - yet nothing seems to be being done to
address this. I'm wondering if there is someone from Google on the
Unicode mailing list who could look into this and let us know if there
are any plans for general support of complex scripts on Android phones
and tablets?  Samsung have apparently included complex script support
on a few devices they sell in India.


* issues: 2600, 3008, 3027, 3029, 4153, 5925, 4153, 6283, 8103, 9045,
9248, 9859, 10685, 10750, 11999, 12674, 12981, 13022, 13967, 14234,
15171, 15895, 16306, 16939, 16144, 17011, 17279, 17291, 17445, 17563,
17573, 17576, 17803, 17850, 17992, 18178, 18235, 18392, 18859, 18936,
18950, 19050, 19352, 19410, 19466, 19470, 19691, 19735, 19946, 19963,
21284, 20141, 20161, 20198, 20485, 20486, 20655, 20744, 20772, 20785,
21196, 21382

- chris



Re: Continue: Glaring mistake in the code list for South Asian Script//Reply to Kent Karlsson

2011-10-22 Thread Christopher Fynn
Delex

Nobody's saying Unicode is perfect, but it works.

Please realize that whatever "mistakes" you find in the standard,
Unicode is not going to change the way it has encoded Indic scripts,
the names it has given these scripts / writing systems, or the names
of individual characters. A Character Encoding Standard would hardly
be a usable standard if these things changed over time.

The time to have suggested things be done differently, or that
different names be used, was many years ago when the Indic scripts
were first being included in the UCS. Why did no authority from India
complain at the time?

If you have real problems with the way Unicode has encoded the
characters in Indic scripts, and you think it can be done better, you
are of course welcome to create your own character encoding where e.g.
each of the letters in all of the 1652+ mother tongues of India is
encoded separately and then try and get people to adopt your "better"
system as standard.

Good luck to you.

- C



Re: RTL PUA?

2011-09-03 Thread Christopher Fynn
> What is needed is a way to specify the properties in a
> platform-independent way, where "platform" means not only "OS" but also
> "font technology."

The font formats used by all "smart font" technologies (OT, AAT,
Graphite) are based on the TrueType font file format, which allows
you to add any number of custom tables. If the people responsible for
the OT, AAT & Graphite specs agreed on it amongst themselves, it might
be possible to specify an embedded table of properties for PUA
characters that all the different rendering engines could read and
make use of.

That might not be completely "font-technology independent" - but pretty close.

 - C



Re: Non-standard Tibetan stacks

2011-09-02 Thread Christopher Fynn
U+034F seems like a reasonable solution to prevent re-ordering.

However we will probably need to include a way to key this character
on Tibetan and Bhutanese keyboards - and find a way of explaining, in
simple terms, to users why (and when) they need to insert this
character.

Look-up tables in Tibetan fonts would also need updating
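
Here is a simplified version of the stack discussed above, checked
with stock Python (the sequence is shortened for clarity):

    import unicodedata

    plain    = "\u0F43\u0FB1\u0F74\u0F71"        # GHA + subjoined YA + vowel u + vowel aa
    with_cgj = "\u0F43\u0FB1\u0F74\u034F\u0F71"  # same, with U+034F CGJ before the aa

    print([f"{ord(c):04X}" for c in unicodedata.normalize("NFC", plain)])
    # ['0F42', '0FB7', '0FB1', '0F71', '0F74']  - 0F43 decomposed, u and aa reordered
    print([f"{ord(c):04X}" for c in unicodedata.normalize("NFC", with_cgj)])
    # ['0F42', '0FB7', '0FB1', '0F74', '034F', '0F71']  - CGJ blocks the reordering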

- C

On 18/08/2011, Richard Wordingham  wrote:
> On Tue, 16 Aug 2011 23:32:51 +0100
> Andrew West  wrote:
>
>> Chris Fynn asked about certain non-standard stacks he was trying to
>> implement in the Tibetan Machine Uni font in an email to the Tibex
>> list on 2006-12-09, but these didn't involve multiple consonant-vowel
>> sequences (one stack sequence was <0F43 0FB1 0FB1 0FB2 0FB2 0F74 0F74
>> 0F71> which would be reordered to <0F42 0FB7 0FB1 0FB1 0FB2 0FB2 0F71
>> 0F74 0F74> by normalization which would display differently).
>
> Isn't the position now that the correct encoding would be <0F43 0FB1
> 0FB1 0FB2 0FB2 0F74 0F74 034F 0F71>?  If U+034F can prevent the
> misordering of hiriq and patah in Hebrew (TUS Version 6.0 Section
> 16.2), then it should be able to sort out the ordering of Tibetan
> vowels.  What does this stack abbreviate?
>
> I think U+034F is also the answer to distinguishing Tibetan  I, U> and  abbreviations of  and  -
> distinguish them as  and .
>
> Richard.
>
>



Re: Fw: Endangered Alphabets [OT]

2011-09-02 Thread Christopher Fynn
How does this differ from what the Script Encoding Initiative
  is already trying to do?

Chris

On 15/08/2011, d...@bisharat.net  wrote:
> Forwarding the following although it's off the list topic since the scripts
> it covers would include at least some that have figured in discussions here
> and/or the work of various list subscribers. (I personally have no
> connection with this project, so plz address any questions to Mr. Brookes,
> cc'd.)
>
> Don
>
>
> --Original Message--
> From: Harold Schiffman
> Sender: lgpolicy-list-bounces+dzo=bisharat@groups.sas.upenn.edu
> To: Language Policy List
> ReplyTo: Language Policy List
> Subject: [lg policy] Endangered Alphabets
> Sent: Aug 5, 2011 09:41
>
> Forwarded From:  linga...@listserv.linguistlist.org
>
>
> Dear ladies and gents,
>
> I suspect--and hope--that you may be interested in my Endangered
> Alphabets Project, which you can find at
> http://www.endangeredalphabets.com. If you like it enough to want to
> help me move it to the next stage, I would be immensely grateful if
> you'd head over to
> http://www.kickstarter.com/projects/1496420787/the-endangered-alphabets-project/
> and see if anything strikes your fancy. Needless to say, I'd also be
> delighted to hear from you, engage in discussion, send you
> high-resolution photos, and so on.
> Tim
>
>
>
> --
> =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
>
>  Harold F. Schiffman
>
> Professor Emeritus of
>  Dravidian Linguistics and Culture
> Dept. of South Asia Studies
> University of Pennsylvania
> Philadelphia, PA 19104-6305
>
> Phone:  (215) 898-7475
> Fax:  (215) 573-2138
>
> Email:  harol...@gmail.com
> http://ccat.sas.upenn.edu/~haroldfs/
>



Re: Non-standard Tibetan stacks (was Re: Sanskrit nasalized L)

2011-09-02 Thread Christopher Fynn
You can find quite a few "non-standard" stacks (those used in Tibetan
abbreviations) in the book བསྡུ་ཡིག་གསེར་གྱི་ཨ་ལོང།  which is freely
available in PDF format from


- Chris

On 17/08/2011, Asmus Freytag  wrote:
> On 8/16/2011 3:32 PM, Andrew West wrote:
>> On 16 August 2011 18:19, Asmus Freytag  wrote:
 "These stacks are highly unusual and are considered beyond the scope
 of plain text rendering. They may be handled by higher-level
 mechanisms".
>>> The question is: have any such "mechanisms" been defined and deployed by
>>> anyone?
>> In my opinion, until someone produces a scan of a Tibetan text with
>> multiple consonant-vowel sequences, and asks how they can represent it
>> in plain Unicode text there is no question to be answered.
>
> Thank you Andrew - that clarifies the issue for the non-specialist.
>
> A./
>
>>
>> Chris Fynn asked about certain non-standard stacks he was trying to
>> implement in the Tibetan Machine Uni font in an email to the Tibex
>> list on 2006-12-09, but these didn't involve multiple consonant-vowel
>> sequences (one stack sequence was<0F43 0FB1 0FB1 0FB2 0FB2 0F74 0F74
>> 0F71>  which would be reordered to<0F42 0FB7 0FB1 0FB1 0FB2 0FB2 0F71
>> 0F74 0F74>  by normalization which would display differently).
>>
>> Other non-standard stacks that I have seen involve horizontal
>> progression within the vertical stack (e.g. yang written horizontally
>> in a vertical stack).
>>
>> More recently, the user community needed help digitizing Tibetan texts
>> that used the superfixed letters U+0F88 and U+0F89 within non-standard
>> stacks, resulting in a proposal to encode additional letters
>> (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3568.pdf).
>>
>> None of these non-standard stack use cases involved multiple
>> consonant-vowel sequences, and I'm not sure whether I have ever seen
>> an example of such a sequence.  I have learnt that there is little
>> point discussing a solution for a hypothetical problem, because when
>> the real problems arise they likely to be something different.
>>
>> Andrew
>>
>
>
>




Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-18 Thread Christopher Fynn
On 15/07/2011, Karl Pentzlin  wrote:

> In WG2 N4085 "Further proposed additions to ISO/IEC 10646 and comments to 
> other proposals" (2011‐ 05‐25), the German NB had requested re WG2 N4022 
> "Proposal to add Wingdings and Webdings Symbols" besides other points:
>>   "Also, in doing this work, other fonts widespread on the computers of 
>> leading manufacturers (e.g.  Apple) shall be included, thus avoiding the 
>> impression that Unicode or SC2/WG2 favor a single  manufacturer."

In regard to getting their "standard" symbol / dingbats fonts encoded,
isn't Apple way ahead of Microsoft? Didn't the original dingbats
symbols in Unicode get encoded mostly because the ITC Zapf Dingbats
font was built into the Apple LaserWriter?




Re: Telugu vs Kannada confusables

2010-11-28 Thread Christopher Fynn
On 27/11/2010, Shriramana Sharma  wrote:
> On Sat, Nov 27, 2010 at 5:29 PM, Christopher Fynn 
> wrote:
>> I wonder, in a case like this, which of the two scripts takes precedence?
>
> Where's the question of precedence? As I understand it, confusable
> mappings go from higher codepoint to lower codepoint, so it's just a
> question of folding -- something like case folding.
>
> అరగ ಅರಗ అರగ ಅరಗ (you'll have to look at that via UniView to get the
> difference) will all fold to అరగ (all Telugu) if I am not mistaken
> (provided the appropriate Confusables.txt entries are present) and
> given that mixed script domain names are (almost) prohibited, whoever
> registers whichever first -- whether all Kannada or all Telugu -- will
> get precedence, even though in internal processing the Kannada
> codepoints will fold to Telugu.
>
> @ the techies here: I hope I got that right...
>
> Shriramana Sharma.

Do the folded domain names get displayed?

If so, doesn't this have the potential to break some rendering
systems? OpenType rendering engines usually do not apply the glyph
substitutions and positioning necessary to form the conjuncts in Indic
scripts across script boundaries (anyway the glyphs for Kannada and
Telugu are likely to be in separate fonts). If you have to display
mixed Kannada and Telugu characters in a name the results might look
pretty odd.
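
For what it's worth, the folding Shriramana describes amounts to
something like the sketch below (the real mapping data is
confusables.txt from UTS #39; the three pairs here are hand-picked
purely for illustration):

    # Map a few Kannada letters to the lower-codepoint Telugu letters they
    # are confusable with, so spoof spellings compare equal after folding.
    FOLD = {
        "\u0C85": "\u0C05",  # KANNADA LETTER A  -> TELUGU LETTER A
        "\u0CB0": "\u0C30",  # KANNADA LETTER RA -> TELUGU LETTER RA
        "\u0C97": "\u0C17",  # KANNADA LETTER GA -> TELUGU LETTER GA
    }

    def skeleton(s: str) -> str:
        return "".join(FOLD.get(c, c) for c in s)

    kannada = "\u0C85\u0CB0\u0C97"   # all-Kannada spelling
    telugu  = "\u0C05\u0C30\u0C17"   # all-Telugu spelling
    print(skeleton(kannada) == skeleton(telugu))   # True: both fold to the Telugu form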

- C




Re: Telugu vs Kannada confusables

2010-11-27 Thread Christopher Fynn
This is going to cause some fun.

I wonder, in a case like this, which of the two scripts takes precedence?

- C

On 25/11/2010, Shriramana Sharma  wrote:

> Hello. Here's a Telugu vs Kannada confusables list I cooked up right
> now. As this is an important security issue, I post to all the lists
> so that people may contribute. Also, some of this is probably already
> there but I'm going for completeness:

> ANUSVARA
> VISARGA
> LETTER A
...

...
> DIGIT NINE

> That makes sixty two characters in all including the ones marked ?.
> Even *without* the ones marked ? (which I did because I suspected
> others may contest these cases) it comes to fifty two.

> Now to count the characters NOT common or confusable (obviously much
> lesser):
>
> LETTER U
> LETTER UU
...

...
> DIGIT SEVEN

> That comes to sixteen.

> I left out the LLLA of Kannada and the fractions of Telugu which are
> *not* present (as of Unicode 6.0) in the other script because
> obviously there can be no comparison on those.

> So there are at least *thrice* (or at most *four times*) as many
> confusable characters between Kannada and Telugu than there are
> NON-confusables.

> Now can you beat that! Speaking of scripts with a common origin and
> causing potential confusion in IDNs, *I* say Kannada and Telugu takes
> the cake!

> Shriramana Sharma.



Re: Best smart phones & apps for diverse scripts?

2010-11-26 Thread Christopher Fynn
On 29/10/2010, Don Osborn  wrote:
> What do users of this list find to be the most Unicode friendly smart
> phones? Apps for those phones? Best input systems for texting beyond ASCII
> (and potentially multiscriptly)?

I use a Nokia N900 which runs a pretty well full-fledged version of
Linux (Debian based) - it works fine with complex scripts - though
you do have to install fonts for some scripts.

Anyway, at the time I bought it, it was the only phone I found that
worked with Tibetan without some serious hacking.

> Thanks in advance for any feedback. I'm back in the US and in the market for
> a new phone, and if I pay for high-end, don't want to be limited to ASCII.
>
>
>
> Don
>
>



Re: Tibetan question

2010-07-16 Thread Christopher Fynn
For the names leave out the initial shad  (ཧར་རྗེར།  ཀསཏར་མན། )

In cases like this even the single shad at the end is optional. Some
dictionaries use it at the end of each headword and some don't - it is
punctuation, doesn't form part of the name, and in running text it
wouldn't be there.


For the book title use a single terminal shad (ཏིན་ཏིན་བོད་ལ་ཕྱིན་པ།)

"༄༅།།"  is  punctuation - not properly part of the title

- c


On 06/06/2010, Αλέξανδρος Διαμαντίδης  wrote:
> Hello,
>
> I don't know Tibetan, but I'd like to add the Tibetan edition of "Tintin
> in Tibet" to the Grand Comics Database (http://www.comics.org/). Can
> someone please help a bit?
>
> First of all, there's an article about the book, including a hi-res scan
> of the cover, in the Tibetan Wikipedia:
>
> http://bo.wikipedia.org/wiki/%E0%BD%8F%E0%BD%B2%E0%BD%93%E0%BC%8B%E0%BD%8F%E0%BD%B2%E0%BD%93%E0%BC%8B%E0%BD%96%E0%BD%BC%E0%BD%91%E0%BC%8B%E0%BD%A3%E0%BC%8B%E0%BD%95%E0%BE%B1%E0%BD%B2%E0%BD%93%E0%BC%8B%E0%BD%94%E0%BC%8D
>
> Now, the cover text is: (taken mostly from the article - please point
> out any mistakes)
>
> །ཧར་རྗེར། (Author - "Hergé")
> ༄༅།། ཏིན་ཏིན་གྱི་དཔའ་རྩལ། (Series title - "Brave Tintin"?)
> ཏིན་ཏིན་བོད་ལ་ཕྱིན་པ།། (Book title - "Tintin went to Tibet"?)
> །ཀསཏར་མན། (Publisher - "Casterman")
>
> My main question is, should I enter the above in the database exactly
> like this? I'm unsure, because I noticed some differences between what's
> on the cover and the Wikipedia entries.
>
> For example, the entry for the book title has a single shad. Should the
> book be indexed like this, or with two shad as shown on the cover? And
> should they be input as two U+0F0D "།" characters, or as a single U+0F0E
> "༎"?
>
> The author and publisher names as shown on the cover also have a shad in
> front, but the corresponding Wikipedia article for Hergé doesn't - it's
> under "ཧར་རྗེར།".
>
> Finally, what does the "༄༅།།" sign mean? Is it part of the series name
> or should it be left out?
>
> I tried searching the web for the transliterations of the series and
> book title, to see what's their literal meaning in English - are the
> translations above correct?
>
> Thanks!
>
> Alexandros
>
>




Re: ISO 10646 compliance and EU law

2004-12-25 Thread Christopher Fynn
Philippe Verdy wrote:
> If such a rule exists, it means that the only supported character
> *repertoire* is ISO10646.
> [etc.]

Like you said "If" - but no one has produced any evidence that such a 
rule exists in fact.

merry xmas
- chris



Re: OpenType vs TrueType (was current version of unicode-font)

2004-12-03 Thread Christopher Fynn
Gary P. Grosso wrote:
> Hi Antoine, others,
>
> Questions about OpenType vs TrueType come up often in my work, so
> perhaps the list will suffer a couple of questions in that regard.
>
> First, I see an "O" icon, not an "OT" icon in Windows' "Fonts folder"
> for some fonts and a "TT" icon for others.  Nothing looks like "OT" to me,
> so are we talking about the same thing?

Hi Gary
The "O" icon simply indicates the font has been digitally signed. Though 
the digital signature field is defined in the OpenType specification
the presence of a digital signature in a font does not necessarily 
indicate that the font has any other OpenType features. Many OpenType 
fonts with advanced features have not been digitally signed and 
consequently do not display the "O" icon in Windows.

> Next, if I double-click on one of the "fonts" (files), I get a window
> which shows a sample of the font, at the top of which is the font name,
> followed by either "(OpenType)" or "(TrueType)".  Can I believe what
> that says as indicative of whether this is truly OpenType or TrueType?

OpenType is a superset of TrueType - so all Windows fonts which conform 
to the TrueType specification could also be called OpenType. If it says
OpenType in the sample window it doesn't mean very much.

If you want to be able to find out more useful information about Windows
fonts use Microsoft's Font Properties Extension:
<http://www.microsoft.com/typography/TrueTypeProperty21.mspx>
If the Font Properties Extension is installed you can then R-click on a 
font file in Windows Explorer and bring up a "Properties" dialog - in 
this dialog there is a "Features" panel which will tell you whether or 
not there are any OpenType GSUB and GPOS tables in the font.
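
If you prefer a scriptable check, something like this does the same
job (assuming the fontTools library is installed; the file name is
just an example):

    from fontTools.ttLib import TTFont

    def has_otl_tables(path):
        font = TTFont(path, lazy=True)
        # GSUB = glyph substitution, GPOS = glyph positioning
        return {tag: tag in font for tag in ("GSUB", "GPOS")}

    print(has_otl_tables("SomeFont.ttf"))   # e.g. {'GSUB': True, 'GPOS': True}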

> Mostly how this comes up is we have customers ask if we support OpenType
> fonts, to which I reply with some variation of "it depends".  I usually
> say the OpenType spec is complex, but we handle all the commonly-used fonts
> we know of, and follow it by saying that they can look in their Fonts folder
> (at the icon) to see some examples of OpenType fonts.
> So that is the background for my questions.

When people ask whether your application supports OpenType fonts, what I 
expect they mean is "Does your application make use of the GSUB and
GPOS lookups in OpenType fonts?". Supporting OT GSUB and GPOS lookups is 
*necessary* for proper display of Unicode data for complex scripts 
(Arabic, Devanagari, Bengali, Tamil, Tibetan, Khmer, Sinhala etc.) in
Windows (and many Linux) applications.

If your application supports TrueType but does not support the OpenType 
lookups you will still see some glyphs using the OpenType font but these 
will probably not be the correct ones as your application won't be 
showing the correct contextual forms necessary for languages written in 
these scripts.

Large "Pan-Unicode" fonts like "Arial Unicode MS" usually do not contain
proper OpenType tables and ligatures for *all* the scripts the font 
covers. For example "Arial Unicode MS" and "Code 2000" contain glyphs 
for Tibetan script but they *do not* contain the OpenType GSUB and GPOS 
lookups necessary to display Tibetan correctly.

If a Windows application needs to properly display Unicode text for 
languages such as Hindi, Tamil, Bengali, Nepali, Sinhala, Arabic, Urdu 
and so on then it probably needs to support OpenType GSUB and GPOS lookups.

For Latin script, OpenType lookups are mainly used to place combining 
diacritics properly and for advanced typographic features such as true 
small caps, swashes, automatic ligatures, old-style figures and so on.

If you have more questions about OpenType, then the OpenType
mailing list <[EMAIL PROTECTED]> may be a more
appropriate forum to ask those questions.
Regards
Chris
==
Christopher Fynn



Re: Ezra

2004-11-22 Thread Christopher Fynn
Peter Kirk wrote:
> Chris, this may be true for those of who are still using pre-Unicode
> applications and code pages. But for those of us using Unicode
> applications on Unicode-based OSs it is the PUA characters which are
> stored. This point caused no end of problems when Word 97 was
> introduced, and documents such as legacy Hebrew were converted to the
> new format either according to the code page or into the PUA symbol area
> according to certain details in the font which at that time few of us
> understood. But in the past 7 years we have come to work mostly with
> Unicode applications and so have almost forgotten about such pains.

You're right! - it's been so many years since I used anything that used a
Windows symbol font encoding hack that I hadn't noticed the
change. Most of the more recent font-hack encodings seem to use
"Windows ANSI" for the font encoding.

- Chris




Re: Ezra

2004-11-22 Thread Christopher Fynn
Peter Kirk wrote:
> Perhaps I should clarify further. SIL Ezra was designed to use "legacy
> Latin-1 override or similar hacks". For example, in its Windows
> Character Set table it uses 0x41/0x61 for "a" sounds, 0x42/0x62 for "b"
> sounds etc - although it goes beyond Latin-1 in using nearly every code
> point from 32 to 255. But for various technical reasons connected with
> Windows, it is encoded as a Windows Symbol font, which means that its
> Unicode tables are mapped not to U+0020 to U+00FF, but to U+F020 to
> U+F0FF, following Windows Symbol conventions. This makes it rather less
> of hack in that PUA characters are used rather than regular Unicode code
> points being reused. And it can only be called a hack in retrospect, for
> it was designed at a time when full Unicode Hebrew was barely defined
> and certainly not widely implemented.

It doesn't really make it "less of a hack" since Windows just maps the
glyphs encoded from F020 to F0FF in the cmap of "Windows Symbol"
fonts to characters x20-xFF in the Windows code page for your locale 
(normally "Windows ANSI" if you are in the US or UK). You still type in 
non PUA characters, and those non PUA characters are what gets stored in 
your files - *not* PUA characters.
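
You can see that mapping directly in such a font with the fontTools
library (a sketch; "Ezra.ttf" is just a placeholder file name):

    from fontTools.ttLib import TTFont

    font = TTFont("Ezra.ttf")
    for table in font["cmap"].tables:
        if table.platformID == 3 and table.platEncID == 0:   # Windows "Symbol" cmap
            codes = sorted(table.cmap)
            print(hex(codes[0]), "-", hex(codes[-1]))        # typically 0xf020 - 0xf0ff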

- Chris





Exporting Unicode UTF-8 from Word (was: Re: utf-8 and unicode fonts on LINUX)

2004-11-22 Thread Christopher Fynn
Cristian Secară wrote:
> On Mon, 22 Nov 2004 13:38:04 +0100, kefas wrote:
>
>> I tried UTF-8 export to send an e-mail that contained
>> several scattered unicode codepoints from the full
>> 16-bit range from   to  from XP+Word [...]
>
> Just curious - how do you export UTF-8 from MS Word ? AFAIK, the only
> way to do that is to copy from Word & paste to Notepad, then save as
> UTF-8.
> Alternatively, copy from Word & paste to e-mail recipient, whose
> encoding is set to UTF-8.
> Hm ?
> Cristi

If you have a Word document with Unicode characters,
choose File, Save As, and set "Save as type" to Plain Text,
then enter a file name and click "Save".
This should bring up a "File Conversion" dialog
which says: "Warning: Saving as a text file will cause all formatting, 
pictures and objects in your file to be lost"
Under this it should say:
"Text Encoding:" followed by three radio buttons
o Windows (default) o MS-DOS o Other encoding

Select: Other Encoding
This should activate a list box at the right containing names of 
numerous code pages and character encodings. Scroll right down and you 
will find:

Unicode
Unicode (Big Endian)
Unicode (UTF-7)
Unicode (UTF-8)
Select "Unicode (UTF-8)" and Click OK
- Chris



Re: [even more increasingly OT-- into Sunday morning] Re: Unicode HTML, download

2004-11-21 Thread Christopher Fynn
Thanks Michael
This is useful information. Unfortunately I usually need to use static 
HTML - so I can't use the ASP parts. It would be nice to see something 
like this working on UTF-8 encoded web pages where lang is defined. In 
most cases, knowing the text is in a specific language and knowing the page 
is Unicode would let you know which script is being used.
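
A rough sketch of the kind of inference I mean, in Python - the table 
below is deliberately tiny and purely illustrative, not a complete 
language-to-script mapping:

# Hypothetical helper: infer the likely script from an HTML lang value.
LANG_TO_SCRIPT = {
    "he": "Hebrew",
    "ar": "Arabic",
    "hi": "Devanagari",
    "ta": "Tamil",
    "bo": "Tibetan",
    "en": "Latin",
}

def script_for_lang(lang: str) -> str:
    primary = lang.split("-")[0].lower()   # "en-GB" -> "en"
    return LANG_TO_SCRIPT.get(primary, "Unknown")

print(script_for_lang("bo"))      # Tibetan
print(script_for_lang("en-GB"))   # Latin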

I'd also like to figure out a way to trigger this kind of behavior in 
other browsers as well as in IE (using JavaScript or Java rather than 
VB), as not quite everyone uses IE - (but I guess you are not going to 
give me any more clues on how to do that :-) )

regards
- Chris

Michael (michka) Kaplan wrote:
From: "Stefan Persson" <[EMAIL PROTECTED]>

I haven't used M$ IE for many years, though, and my
memory might be wrong.

Blinded by the misspelling of the product name, maybe? :-)

See http://msdn.microsoft.com/msdnmag/issues/0700/localize/ and the section
entitled "Choosing Character Sets" for info on what is going on here,
particularly figures 3 and 4 for info on how to script the behavior for the
UTF-8 case

MichKa [MS]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Windows International Division




Re: [increasingly OT--but it's Saturday night] Re: Unicode HTML, download

2004-11-21 Thread Christopher Fynn
Doug Ewell wrote:
Beyond that, you might want to specify a font family using CSS (doesn't
have to be in a separate CSS file, either) to improve the odds that the
reader will see Hebrew instead of hollow boxes, but this is optional.
While we are on the (off) topic of HTML, browsers etc. 
I've noticed that, with Windows and IE, when going to a page with 
characters from a script for which fonts are not installed on my system, IE 
will sometimes ask whether or not I want to download & install fonts for 
that script from Microsoft's web site.
This only happens in some cases - even where the same script is 
involved. I've looked at the source of some of these pages but I've never 
been able to identify just what triggers this. Does anyone know?

- Chris



Re: Eudora 6.2 has been released

2004-11-17 Thread Christopher Fynn
Michael Everson wrote:
It still has no Unicode support.

Isn't that disappointing.
Hi Michael
You've been complaining about this for years - maybe time to switch to 
something else?

The Mozilla Thunderbird mail client works very well with Unicode.

Thunderbird & Eudora both use the same "mbox" format to store messages.
On PCs, MS Outlook Express also works very well with Unicode - though 
some people have security concerns about it.

- Chris



Re: Opinions on this Java URL?

2004-11-15 Thread Christopher Fynn
Isn't it already deprecated?  The URL that started this thread

is marked as part of the "Deprecated API"
- Chris
Norbert Lindenberg wrote:
Theodore,
Thank you for your feedback. Adding a warning to the description in  
DataInput sounds like a good idea. In the meantime, if somebody wants  
to use modified UTF-8 outside the Java context, please point them to
http://java.sun.com/developer/technicalArticles/Intl/Supplementary/index.html#Modified_UTF-8

Unfortunately, since this encoding is widely used within the Java  
context, we cannot deprecate it.

Best regards,
Norbert
Java Internationalization
java.sun.com/j2se/corejava/intl
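
For anyone curious what "modified UTF-8" actually changes, here is a 
small illustrative encoder in Python - not Sun's code, just a sketch of 
the two documented differences: U+0000 is written as the two-byte 
sequence C0 80, and supplementary characters are written as two 3-byte 
encoded UTF-16 surrogates instead of one 4-byte sequence.

# Sketch of Java's "modified UTF-8" (as used by DataInput/DataOutput),
# illustrating how it differs from standard UTF-8.
def modified_utf8(s: str) -> bytes:
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp == 0x0000:
            out += b"\xc0\x80"            # NUL as an overlong 2-byte form
        elif cp <= 0xFFFF:
            out += ch.encode("utf-8")     # BMP characters: same as standard UTF-8
        else:
            cp -= 0x10000                 # supplementary characters: each UTF-16
            hi = 0xD800 + (cp >> 10)      # surrogate gets its own 3-byte sequence
            lo = 0xDC00 + (cp & 0x3FF)
            out += chr(hi).encode("utf-8", "surrogatepass")
            out += chr(lo).encode("utf-8", "surrogatepass")
    return bytes(out)

sample = "A\x00\U0001D11E"                # 'A', NUL, MUSICAL SYMBOL G CLEF
print(modified_utf8(sample).hex())        # '41c080eda0b4edb49e'
print(sample.encode("utf-8").hex())       # '4100f09d849e'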



Re: [africa] Unicode & IDNs

2004-11-11 Thread Christopher Fynn
Peter Kirk wrote:

Not in the DNS server. The problem was that the browser was looking for 
http://www.%c9%99%c9%9b.net/ rather than http://www.əɛ.net/, in other 
words exactly the same problem as Otto found with Netscape 6.2. The 
clipboard contained http://www.%c9%99%c9%9b.net/, as both Unicode text 
and basic plain text. There was obviously a problem in how Mozilla 
copied this address to the clipboard.
That's odd, because if I copy http://www.əɛ.net/ from Mozilla's
navigation bar to the clipboard and paste it into MS Word or Windows 
Notepad it remains as http://www.əɛ.net/, and if I look in the Windows XP 
Clipbook viewer it is also correct - so Mozilla, at least the version 
I'm using, seems to be copying to the clipboard OK.

I *do* get problems if I paste or type this URL into the IE 6 navigation
bar. The text of the URL displays OK in the navigation bar but, almost
immediately, Internet Explorer just comes up with "Cannot find server. The
page cannot be displayed. The page you are looking for is currently
unavailable" etc. - mind you, I don't have the Verisign add-in installed.
[Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7) Gecko/20040616]
[Internet Explorer Version 6.0.2900.2180.xpsp_sp2_rtm.040803-2158]
[Win XP Version 5.1 (Build 2600.xpsp_sp2_rtm.040803-2158:Service Pack 2]
===
Subsequent to writing the above I've installed the Mozilla 1.7.3 en-GB build
and still have no problem going to http://www.əɛ.net/ or copying the URL 
to the clipboard. Exactly the same results as with v1.7.
[Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.3) Gecko/20040910]

Also no problems with Firefox 1.0
[Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) 
Gecko/20041107 Firefox/1.0]

- Chris






Re: [africa] Unicode & IDNs

2004-11-10 Thread Christopher Fynn
Peter Kirk wrote:
Strangely enough, it works today, after rebooting and restarting 
Mozilla. Perhaps Mozilla has picked up the Verisign plug-in for IE. Or 
perhaps there is some other subtle setting which has changed itself.
I don't have the VeriSign plug-in for IE installed, so it can't be that.
Maybe it was a glitch in your DNS server?
Well, Mozilla certainly shouldn't rely on people running IE first!




Re: [africa] Unicode & IDNs

2004-11-09 Thread Christopher Fynn
Peter Kirk wrote:
On 09/11/2004 22:43, Karl Pentzlin wrote:

www.əɛ.net - the site name between "www." and ".net" is U+0259 U+025B
(The site was set up for test purposes only and contains no real 
information.
It can be reached by www.xn--snae.net also.)

Doesn't work in Mozilla 1.7.3 on Windows XP. Does anyone know, will it 
be supported?
Hmmm, this is odd since it works fine for me with Mozilla 1.7 on XP.
If I click on the www.əɛ.net URL in the email message in Thunderbird,
Mozilla is launched and goes straight to the page - though the URL 
somehow gets translated to www.xn--snae.net.

If I restart Mozilla and then paste www.əɛ.net in, it works as well. Only 
this time the URL is neither translated to www.xn--snae.net nor 
converted to http://www.%c9%99%c9%9b.net/ as you describe in your other 
message - it stays as www.əɛ.net.
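
The two forms above are easy to reproduce. A quick sketch in Python: 
the IDNA/punycode machinery is what browsers are supposed to use for 
the DNS lookup, and the percent-encoded form is just the raw UTF-8 
bytes escaped, which is the wrong thing to send to DNS:

# Sketch: the ACE (punycode) form of the test label versus the
# percent-encoded UTF-8 bytes the misbehaving browser produced.
from urllib.parse import quote

label = "\u0259\u025b"                 # ə ɛ - the label in www.əɛ.net

print(label.encode("idna"))            # b'xn--snae' - the ACE form DNS should see
print(quote(label.encode("utf-8")))    # '%C9%99%C9%9B' - the broken form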

 - Chris


