Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
Well, it's not that complicated. Ligatures in German must not happen at compound break points, while they can be applied to ordinary break points. Consider the word `Dorfladen' (village shop). Using `=' to indicate a compound break point and `-' for normal ones, the proper break points are `Dorf=la-den', which means no `fl' ligature. Note that `Fladen' means `cow dung', so having a ligature there is really bad.

But "Dorfladen" is not ambiguous. Asmus was referring to ambiguous cases created by the way compound words are spelled in German. For those, some user interaction is necessary, and it's my view that there are unobtrusive ways of interacting with the user about this.

(But then it needs to be acknowledged that ambiguous cases probably exist or can be constructed in a lot of languages. And the frequency of such ambiguity occurring in actual German text isn't that high, even more so if one takes into account the orthographic recommendation to use an explicit hyphen in ambiguous cases. But of course these cases, if they occur, need to be handled nevertheless.)

Stephan
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
>> Certain layout processes, in certain cases, in certain languages, simply can't be fully automated.
>
> And interestingly, there is a crucial difference between ligatures and hyphenation in this regard: While a conservative processor could simply omit hyphenation in ambiguous cases (potentially leading to suboptimal linebreaking though), a decision ought to be made for ligatures if one uses a font requiring them. But then, although getting ligatures wrong in this case is categorically somehow "worse" than too-wide inter-word spacing, who knows which visual effect actually has more adverse effect on the reading process ...

Well, it's not that complicated. Ligatures in German must not happen at compound break points, while they can be applied to ordinary break points. Consider the word `Dorfladen' (village shop). Using `=' to indicate a compound break point and `-' for normal ones, the proper break points are `Dorf=la-den', which means no `fl' ligature. Note that `Fladen' means `cow dung', so having a ligature there is really bad.

On the other hand, consider `Löffel' (spoon). In spite of the hyphenation `Löf-fel', a ligature looks good.

Werner
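One way the distinction Werner describes can be recorded in plain text is with U+200C ZERO WIDTH NON-JOINER, which the Unicode Standard provides for inhibiting ligation. The following is a minimal Python sketch of that idea, not anyone's actual implementation; the compound-boundary dictionary is a hypothetical stand-in for real German morphological analysis.

    # Insert U+200C ZERO WIDTH NON-JOINER at a compound boundary so that a
    # renderer will not form a ligature across it.  COMPOUND_BOUNDARY is a
    # made-up placeholder for real morphological data.
    ZWNJ = "\u200C"

    COMPOUND_BOUNDARY = {
        "Dorfladen": 4,   # Dorf=la-den: the f-l pair straddles the "=" boundary
        "Löffel": None,   # Löf-fel is an ordinary break, so the ff ligature stays
    }

    def suppress_compound_ligatures(word):
        """Return `word` with ZWNJ inserted at its compound boundary, if any."""
        pos = COMPOUND_BOUNDARY.get(word)
        if pos is None:
            return word
        return word[:pos] + ZWNJ + word[pos:]

    print(repr(suppress_compound_ligatures("Dorfladen")))  # 'Dorf\u200claden'
    print(repr(suppress_compound_ligatures("Löffel")))     # 'Löffel' (unchanged)

Whether a renderer actually drops the fl ligature then depends on the font and shaping engine honouring ZWNJ, but that is the mechanism the standard documents for controlling ligature formation.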
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
From my background I never perceived a need, but I guess I (and most people??) wouldn't really mind the tradition coming back (in Germany) if things are designed well (which is the job of the font designer) and for the user everything is handled automatically in the background by the available technology ...

Which cannot happen for German, as it is one of the languages where the same letter pair may or may not have a ligature based on the *meaning* of the word - something that you can't automate.

You are absolutely right! Certain layout processes, in certain cases, in certain languages, simply can't be fully automated.

*Actually*, the emphasis here is on the word "fully". Writing a (language-specific) tool (or word-processor plugin) for semi-automated processing would be so easy - something that walks you through all cases of ambiguous hyphenation and ligatures (if the font so requires). An unobtrusive way of doing this would be for the word processor to simply put a purple squiggly line under each word needing closer inspection, for right-click fixing. I'm really wondering why such tools are not employed or - if they are - why I haven't heard of them ...

Stephan
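As a rough illustration of the semi-automated pass Stephan sketches (purely a sketch; the ambiguity list and the "interaction" below are placeholders, not a real word-processor feature), such a checker could be as small as this Python fragment:

    # Flag words whose ligature/hyphenation treatment is ambiguous and needs a
    # human decision - the places a word processor would mark with a squiggly
    # underline.  AMBIGUOUS is a toy placeholder dictionary.
    import re

    AMBIGUOUS = {
        "Wachstube": ["Wach-stube (guard room, st ligature allowed)",
                      "Wachs-tube (tube of wax, no st ligature)"],
    }

    def flag_ambiguous_words(text):
        """Yield (word, offset, possible readings) for words needing review."""
        for m in re.finditer(r"\w+", text):
            readings = AMBIGUOUS.get(m.group())
            if readings:
                yield m.group(), m.start(), readings

    for word, offset, readings in flag_ambiguous_words("Die Wachstube war leer."):
        print(word, "at offset", offset, "- choose one of:", readings)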
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
From my background I never perceived a need, but I guess I (and most people??) wouldn't really mind the tradition coming back (in Germany) if things are designed well (which is the job of the font designer) and for the user everything is handled automatically in the background by the available technology ...

Which cannot happen for German, as it is one of the languages where the same letter pair may or may not have a ligature based on the *meaning* of the word - something that you can't automate.

You are absolutely right!

We had famous discussions on this list on this subject. Take an "st" ligature. There are two meanings for the German word "Wachstube", only one allows the st ligature. A human would have to decide when the ligature is appropriate. (Incidentally, the same goes for hyphenation for this word: one meaning allows a hyphen after the "s", the other does not.)

Certain layout processes, in certain cases, in certain languages, simply can't be fully automated.

And interestingly, there is a crucial difference between ligatures and hyphenation in this regard: While a conservative processor could simply omit hyphenation in ambiguous cases (potentially leading to suboptimal linebreaking though), a decision ought to be made for ligatures if one uses a font requiring them. But then, although getting ligatures wrong in this case is categorically somehow "worse" than too-wide inter-word spacing, who knows which visual effect actually has the more adverse effect on the reading process ...

There are two ways of generalizing from a situation where a locale tends to preferably use fonts without (and not necessitating) ligatures. If fonts with ligatures are introduced ...

(1) [generalizing: "we're not using ligatures"] ... the community is going to find it distracting because it is not used to the ligatures, plus there may be inherent problems with this for the respective locale anyway.

(2) [generalizing: "presently used fonts don't use ligatures"] ... the community won't find it distracting because good fonts will do ligatures well (while the great majority of laymen might neither notice nor care ...).

This theoretical ambiguity in generalizing simply arises from the fact that "not using ligatures" is equivalent to "not using fonts having/necessitating ligatures" in Germany. Lots of the English-language discussion of ligatures I've seen tacitly assumes that "good" typesetting with "good" fonts "should" use ligatures in certain cases, and I just disagree with this assumption. Well, forgive me if maybe I'm just getting the wrong impression, being a layman on this matter.

Stephan
RE: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
There are certainly monospaced fonts that support Arabic. For instance, the Windows fonts Courier New and Simplified Arabic Fixed support Arabic. Devanagari is a different matter.

Peter

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Richard Wordingham
Sent: Sunday, September 11, 2011 5:19 PM
To: Unicode Discussion
Subject: Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

On Sun, 11 Sep 2011 23:14:04 +0200 Kent Karlsson wrote:

> Den 2011-09-11 18:53, skrev "Peter Constable" :
>> Hence, in a monospaced font, FB01 certainly should look different from <0066, 0069>, regardless of whether ligature glyphs are used in either case.
>
> If "monospace" is interpreted that rigidly, then it is much better *not* to have any glyph at all for FB01 (and other characters like it) in a "monospace" font.

Aesthetically you're correct, but U+FB01 and U+00E6 LATIN SMALL LETTER AE both have the ID_Start property, and the latter is definitely allowed in C identifiers. While U+00E6 is much more secure as a character, it too tends to be quite ugly in monospaced fonts. (Courier can be quite useful for setting off text as computer code, especially variable and function names.)

Incidentally, are there working definitions of monospace for Arabic and Devanagari?

Richard.
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
On Sun, 11 Sep 2011 23:14:04 +0200 Kent Karlsson wrote:

> Den 2011-09-11 18:53, skrev "Peter Constable" :
>> Hence, in a monospaced font, FB01 certainly should look different from <0066, 0069>, regardless of whether ligature glyphs are used in either case.
>
> If "monospace" is interpreted that rigidly, then it is much better *not* to have any glyph at all for FB01 (and other characters like it) in a "monospace" font.

Aesthetically you're correct, but U+FB01 and U+00E6 LATIN SMALL LETTER AE both have the ID_Start property, and the latter is definitely allowed in C identifiers. While U+00E6 is much more secure as a character, it too tends to be quite ugly in monospaced fonts. (Courier can be quite useful for setting off text as computer code, especially variable and function names.)

Incidentally, are there working definitions of monospace for Arabic and Devanagari?

Richard.
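A short Python snippet (a sketch, nothing more) makes Richard's contrast concrete: both characters are lowercase letters, but only U+FB01 carries a compatibility decomposition, and acceptability as an identifier start can simply be queried rather than asserted, since it depends on the XID_Start data of the Unicode version in use.

    # Compare U+FB01 and U+00E6: name, general category, compatibility
    # decomposition, and whether Python (whose identifiers are based on
    # XID_Start with NFKC normalization) accepts each as an identifier.
    import unicodedata

    for ch in ("\uFB01", "\u00E6"):
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch)),
              "  category:", unicodedata.category(ch),
              "  decomposition:", unicodedata.decomposition(ch) or "(none)",
              "  isidentifier():", ch.isidentifier())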
Don't be evil - Unicode Inc President
The subject was before "Re: Mail filtering, and Tulasi - (was) Re: Everson's Ahom proposal" Changed it to "Don't be evil - Unicode Inc President" "Shall Mark Davis continue to encoding any letter/symbol used in scripture like Koran?". The "unicode hot f**k" portal link/passage portion is omitted from the email ~mark (Mark E. Shoulson) cited and used as reference to compose the reply (appended herewith). It was Everson/Magda who omitted that portion before the email was delivered to unicode forum. So ~mark did not read the original email: http://www.mail-archive.com/unicode@unicode.org/msg28840.html ~mark am I correct? Fyi, Magda recently moved that "unicode hot f**k" portal to "under cover" state. > You're really out of line making cracks about how much money > Mark Davis is or should be making; it's (a) not your business and > (b) not relevant to the discussion at hand. To experiment one shall post a message to a worldwide Islamic forum. In the message s(he) shall mention facts like: Mark Davis, President of Unicode Inc, has supervised encoding some letters/symbols used in "Koran". google upper-deck executives orchestrate google policies. google advertises in "hot f**k" portal for revenue. CNN and other news say google profited from add in "prescription drug abuse". > Might as well get this out into the open. Portion of Mark Davis (Unicode Inc President) income come from "hot f**k" portal revenue and "prescription drug abuse". So shall Mark Davis continue to encoding any letter/symbol used in scripture like "Koran"? Google add on hot f**k portal: http://techstack.com/forum/apache/70011-hi-im-16-hot-f*u*c*k-me-night-free.html http://www.xred2.com/Amazing_hot_amateur_on_Fucking_Machines___Hardcore_sex_video Prescription drug abuse: http://articles.cnn.com/2011-06-22/us/google.drug.ads_1_prescription-drug-abuse-google-advertising-internet-search-giant-google/2?_s=PM:US http://www.ktbs.com/news/28409624/detail.html Who is Magda Danish? Administrator of unicode @ unicode.org forum http://www.jigsaw.com/scid11549476/magda_danish.xhtml?ver=5 http://unicode.org/consortium/directors.html http://unicode.org/consortium/img/magda.jpg > ~mark > (NOT employed by any large corporation, not making money off > advertisements, etc. That good enough?) Shall be good enough if not from add in hot f**k portal / drug abuse as well as other low-moral conscious-deficient activity. Is this response good enough? Tulasi From: Mark E. Shoulson Date: Wed, Jun 29, 2011 at 6:29 PM Subject: Re: Mail filtering, and Tulasi - (was) Re: Everson's Ahom proposal To: tulasi Cc: Unicode Discussion On 06/29/2011 02:58 PM, tulasi wrote: > > This unicode @ googlegroups is a property of Google Inc. > Do you know that 97% of google revenue comes from advertisement? > http://gigaom.com/2009/07/17/where-does-google-get-97-of-its-revenue/ > > It seems Mark Davis is upper-deck executive at Google Inc. > So part of his living comes from the revenue that Google Inc earns from > advertisements through protocol like unicode @ googlegroups > > My suggestion to Mark:- > Ask Google Inc to clean-up all such protocol and take pay-cut - > fyi academia in California has been living with at least 15% pay-cut for a > while. > Shall Mark do so it shall elevate Unicode Inc moral/consciousness. > I hope so! > Look, this keeps going on and you really should stop it. First of all, saying that Google gets 97% of its income from advertising is like saying that doctors get 97% of their income from patients. 
That's the business they're in, were you expecting something else? You seem to have some kind of axe to grind against Unicode for being a commercial entity. Might as well get this out into the open. Standards committees generally are part of the commercial sector, because businesses are a big part of what will use and be affected by the standards. I'm not sure how you propose to do what needs to be done to create a standard like this without business involvement. There are occasional complaints and resentment (including from me) about perceived overstrong influence of this or that company in some of the decision-making, but that's to be expected, and I don't think anyone (except you) really believes that some companies are out there maliciously trying to tweak Unicode in order somehow to make them money unfairly. You're really out of line making cracks about how much money Mark Davis is or should be making; it's (a) not your business and (b) not relevant to the discussion at hand. So, out with it: what is it that you suspect Unicode Inc and Google are doing that is so underhanded and unfair that you need to make these insinuations? Let's at least set that to rest so you can talk about the *actual* business of Unicode without such distractions. ~mark (NOT employed by any large corporation, not making money off advertisements, etc. That good enough?)
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
Den 2011-09-11 18:53, skrev "Peter Constable" :

> There's no requirement that the width of glyphs in a monospaced font be 1 em. I would agree, though, that if a monospaced font forms a ligature of a pair like <0066, 0069>, then it should be twice the width (not necessarily 2 em) of single-character glyphs.

That's fine (assuming the ligature is well designed, in the case of a monospace font connecting the bar of the f to the top serif of the i and only that).

> In a monospace font, nothing prevents the glyph for FB01 being a ligature, and some monospaced fonts do have a ligature glyph for that character.

Fine too. But see below.

> Of course, in a monospaced font, the glyph for that character should be the same width as all other glyphs. So if it's not a ligature, then the "f" and "i" elements still need to be narrower than the glyphs for 0066 and 0069.
>
> Hence, in a monospaced font, FB01 certainly should look different from <0066, 0069>, regardless of whether ligature glyphs are used in either case.

If "monospace" is interpreted that rigidly, then it is much better *not* to have any glyph at all for FB01 (and other characters like it) in a "monospace" font.

/Kent K

> Peter
>
> -----Original Message-----
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Philippe Verdy
> Sent: Saturday, September 10, 2011 10:33 PM
> To: Michael Everson
> Cc: unicode Unicode Discussion
> Subject: Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
>
> 2011/9/11 Michael Everson :
>> On 11 Sep 2011, at 00:23, Richard Wordingham wrote:
>>
>>> A font need not support such ligation, but a glyph for U+FB01 must ligate the letters - otherwise it's not U+FB01!
>>
>> Not in monowidth, it doesn't.
>
> I also agree, a monospaced font can perfectly show the dot and ligate the letters, using a "double-width" (2 em) ligature without any problem, or simply not map it at all, or choose to just map a composite glyph made of the 1 em-width glyphs assigned to the two letters f and (dotted) i without showing any visible ligation between those glyphs (this being consistent with monospaced fonts that remove all ligations, variable advances and kernings between letters).
>
> You could as well have a font design in which all pairs of Latin letters are joined, including in a monospaced font, in which case you should not see any difference between FB01 and the pair of Basic Latin letters. Joining letters is fully independent of the fact that the upper part of letter f may or may not interact graphically with the presence of a dot. If the style of letter glyphs does not cause any interaction, there's no reason to remove the dot over i or j in the "ligature" or joined letters.
>
> You should not be limited by the common style used in modern Times-like fonts (notably in italic styles, where the letter f overhangs the nearby letters). Other font styles also exist that do not require adjustment to remove the dot, or merge it with a graphic feature of the preceding letter f which is specific to some fonts.
>
> As the pair of letters f and (dotted) i is perfectly valid in Turkish, there's absolutely no reason why the fi ligature would be invalid in Turkish. But given that this character is just provided for compatibility with legacy encodings, I would still not recommend it for Turkish or for any other language, including English. This FB01 character is not necessary to any orthography and if possible, should be replaced by the pair of Basic Latin letters (and in fact I don't see any reason why a font would not choose to do this everywhere).
>
> -- Philippe.
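Whether a given monospaced font takes Kent's route (no glyph for FB01 at all) or Peter's (a same-width glyph) is easy to inspect. Here is a minimal sketch using fontTools; the font path is a placeholder assumption.

    # Does this monospaced font map U+FB01, and if so, does the glyph share
    # the single advance width used by everything else?  "SomeMono.ttf" is a
    # hypothetical path.
    from fontTools.ttLib import TTFont

    font = TTFont("SomeMono.ttf")
    cmap = font["cmap"].getBestCmap()   # code point -> glyph name
    metrics = font["hmtx"].metrics      # glyph name -> (advance width, lsb)

    # In a strictly monospaced font every visible glyph shares one advance.
    advances = {adv for adv, _lsb in metrics.values() if adv > 0}
    print("distinct nonzero advance widths:", sorted(advances))

    glyph = cmap.get(0xFB01)
    if glyph is None:
        print("U+FB01 is not mapped (the option Kent prefers)")
    else:
        print("U+FB01 -> %s, advance %d" % (glyph, metrics[glyph][0]))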
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
An old acquaintance of mine, many years ago, pointed out two cases in Dutch: a hunter of kiwi birds, kiwijager, cannot use the customary ij ligature. And as for parsing ambiguities, he observed that there were three different ways of understanding the word "kwartslagen", depending on whether it was read "kwart-slagen", "kwarts-lagen", or "kwart-sla-gen".

Peter Ingerman

On 2011-09-11 00:42, Asmus Freytag wrote:

On 9/9/2011 8:12 PM, Stephan Stiller wrote:

Dear Martin,

Thanks for alerting me to the issue of causal direction of aesthetic preference - it's been on my mind, but your reply helps me sort out some details. When I first encountered text (outside of the German language locale) with ample use of ligatures in modern printed text, I definitely found the ligatures a bit distracting, but partly just because I wasn't used to them. I also perceived them as a solution to what (in Germany) appeared to me to be a real non-issue.

Put simply, there is a conflict between full flexibility for font designs and the burden imposed by sophisticated ligatures and kerning tables.

From my background I never perceived a need, but I guess I (and most people??) wouldn't really mind the tradition coming back (in Germany) if things are designed well (which is the job of the font designer) and for the user everything is handled automatically in the background by the available technology ...

Which cannot happen for German, as it is one of the languages where the same letter pair may or may not have a ligature based on the *meaning* of the word - something that you can't automate.

We had famous discussions on this list on this subject. Take an "st" ligature. There are two meanings for the German word "Wachstube", only one allows the st ligature. A human would have to decide when the ligature is appropriate. (Incidentally, the same goes for hyphenation for this word, one meaning allows a hyphen after the "s" the other does not).

Certain layout processes, in certain cases, in certain languages, simply can't be fully automated.

A./

Stephan
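The "kwartslagen" case is exactly the segmentation ambiguity that dictionary lookup alone cannot resolve. A toy Python sketch (with a made-up lexicon fragment, purely for illustration) shows all three readings falling out of the same spelling:

    # Enumerate every way of splitting a word into entries of a toy lexicon.
    # LEXICON is an invented fragment, not real Dutch dictionary data.
    LEXICON = {"kwart", "kwarts", "slagen", "lagen", "sla", "gen"}

    def segmentations(word, lexicon=LEXICON):
        """Return every split of `word` into lexicon entries."""
        if not word:
            return [[]]
        results = []
        for i in range(1, len(word) + 1):
            head = word[:i]
            if head in lexicon:
                for rest in segmentations(word[i:], lexicon):
                    results.append([head] + rest)
        return results

    for parts in segmentations("kwartslagen"):
        print("-".join(parts))
    # Prints, in some order: kwart-sla-gen, kwart-slagen, kwarts-lagen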
Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
On 9/9/2011 8:12 PM, Stephan Stiller wrote:

Dear Martin,

Thanks for alerting me to the issue of causal direction of aesthetic preference - it's been on my mind, but your reply helps me sort out some details. When I first encountered text (outside of the German language locale) with ample use of ligatures in modern printed text, I definitely found the ligatures a bit distracting, but partly just because I wasn't used to them. I also perceived them as a solution to what (in Germany) appeared to me to be a real non-issue.

Put simply, there is a conflict between full flexibility for font designs and the burden imposed by sophisticated ligatures and kerning tables.

From my background I never perceived a need, but I guess I (and most people??) wouldn't really mind the tradition coming back (in Germany) if things are designed well (which is the job of the font designer) and for the user everything is handled automatically in the background by the available technology ...

Which cannot happen for German, as it is one of the languages where the same letter pair may or may not have a ligature based on the *meaning* of the word - something that you can't automate.

We had famous discussions on this list on this subject. Take an "st" ligature. There are two meanings for the German word "Wachstube", only one allows the st ligature. A human would have to decide when the ligature is appropriate. (Incidentally, the same goes for hyphenation for this word, one meaning allows a hyphen after the "s" the other does not).

Certain layout processes, in certain cases, in certain languages, simply can't be fully automated.

A./

Stephan
RE: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?
There's no requirement that the width of glyphs in a monospaced font be 1 em. I would agree, though, that if a monospaced font forms a ligature of a pair like <0066, 0069>, then it should be twice the width (not necessarily 2 em) of single-character glyphs.

In a monospace font, nothing prevents the glyph for FB01 being a ligature, and some monospaced fonts do have a ligature glyph for that character. Of course, in a monospaced font, the glyph for that character should be the same width as all other glyphs. So if it's not a ligature, then the "f" and "i" elements still need to be narrower than the glyphs for 0066 and 0069.

Hence, in a monospaced font, FB01 certainly should look different from <0066, 0069>, regardless of whether ligature glyphs are used in either case.

Peter

-----Original Message-----
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of Philippe Verdy
Sent: Saturday, September 10, 2011 10:33 PM
To: Michael Everson
Cc: unicode Unicode Discussion
Subject: Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011/9/11 Michael Everson :
> On 11 Sep 2011, at 00:23, Richard Wordingham wrote:
>
>> A font need not support such ligation, but a glyph for U+FB01 must ligate the letters - otherwise it's not U+FB01!
>
> Not in monowidth, it doesn't.

I also agree, a monospaced font can perfectly show the dot and ligate the letters, using a "double-width" (2 em) ligature without any problem, or simply not map it at all, or choose to just map a composite glyph made of the 1 em-width glyphs assigned to the two letters f and (dotted) i without showing any visible ligation between those glyphs (this being consistent with monospaced fonts that remove all ligations, variable advances and kernings between letters).

You could as well have a font design in which all pairs of Latin letters are joined, including in a monospaced font, in which case you should not see any difference between FB01 and the pair of Basic Latin letters. Joining letters is fully independent of the fact that the upper part of letter f may or may not interact graphically with the presence of a dot. If the style of letter glyphs does not cause any interaction, there's no reason to remove the dot over i or j in the "ligature" or joined letters.

You should not be limited by the common style used in modern Times-like fonts (notably in italic styles, where the letter f overhangs the nearby letters). Other font styles also exist that do not require adjustment to remove the dot, or merge it with a graphic feature of the preceding letter f which is specific to some fonts.

As the pair of letters f and (dotted) i is perfectly valid in Turkish, there's absolutely no reason why the fi ligature would be invalid in Turkish. But given that this character is just provided for compatibility with legacy encodings, I would still not recommend it for Turkish or for any other language, including English. This FB01 character is not necessary to any orthography and if possible, should be replaced by the pair of Basic Latin letters (and in fact I don't see any reason why a font would not choose to do this everywhere).

-- Philippe.
UAX #14 (UCA): Derived primary weight ranges
I think the UCA forgets to specify which primary weights are valid when inferred from the default rules used in the current DUCET.

# Derived weight ranges:        FB40..FBFF
# [Hani] core primaries:        FB40..FB41 (2)
    U+4E00..U+9FFF              FB40..FB41 (2)
    U+F900..U+FAFF              FB41 (1)
# [Hani] extended primaries:    FB80..FB9D (30)
    U+3400..U+4DBF              FB80 (1)
    U+2..U+E                    FB84..FB9D (29)
# Other primaries:              FBC0..FBE1 (34)
    U+..U+E                     FBC0..FBDD (30)
    U+F..U+10                   FBDE..FBE1 (4)
# Trailing weights:             FC00.. (1024)

This clearly shows that the currently assigned ranges of primary weights are far larger than needed:

- Sinograms can fully be assigned a first primary weight within a set of only 32 values, instead of the 128 assigned.

- This leaves enough room to separate the primary weights used by PUA blocks (both in the BMP and in planes 15 and 16), which requires just 1 primary weight for the PUAs in the BMP and 4 primary weights for the last two planes (if some other future PUA ranges are assigned, for example for RTL PUAs, we could imagine that this count of 5 weights would be extended to

- All other primaries will never be assigned to anything outside planes 0 to 14, and only for unassigned code points (whose primary weight value should probably lie between the first derived primary weights for sinograms and those for the PUA), so they'll never need more than 30 primary weights.

Couldn't we remap these default bases for derived primary weights like this, and keep more space for the rest:

# Derived weight ranges:        FBB0..FBFF (80)
# [Hani] core primaries:        FBB0..FBB1 (2)
    U+4E00..U+9FFF              FBB0 (1)   (using base=U+2000 for the 2nd primary weight)
    U+F900..U+FAFF              FBB1 (1)   (using base=U+A000 for the 2nd primary weight)
# [Hani] extended primaries:    FBB2..FBCF (30)
    U+3400..U+4DBF              FBB2 (1)   (using base=U+2000 for the 2nd primary weight)
    reserved                    FBB3 (1)
    U+2..U+E                    FBB4..FBCF (26)   (using base=U+n or U+n8000 for the 2nd primary weight)
# Other non-PUA primaries:      FBD0..FBEF (32)
    U+..U+E                     FBD0..FBED (30)   (using base=U+n or U+n8000 for the 2nd primary weight)
    reserved                    FBEE..FBEF (2)
# PUA primaries:                FBF0..FBFF (16)
    U+D800..U+DFFF              FBF0 (1)   (using base=U+n8000 for the 2nd primary weight)
    reserved                    FBF1..FBFB (11)
    U+F..U+10                   FBFC..FBFF (4)   (using base=U+n or U+n8000 for the 2nd primary weight)
# Trailing weights:             FC00.. (1024)

This scheme completely frees the range FB40..FBAF, while reducing the gaps currently left, which will never have any use.

(In this scheme, I have no opinion on which range is best for code points assigned to non-characters, but they could all map to FBFF, used here for PUA, with the second primary weight at the end of the encoding space 8000.. moved to 4000..BFFF, so that the second primary weight for non-characters goes easily into C000..)

This way, we would keep ranges available for future large non-sinographic scripts (pictographic, non-Han ideographic) that would probably use only derived weights, or for a refined DUCET containing more precise levels or gaps facilitating some derived collation tables (for example in CLDR). And all PUAs would clearly sort within dedicated ranges of primary weights, with a guarantee that they all sort at the end, after all scripts.

-- Philippe.
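For reference, the derived primaries in the first table follow the implicit-weight computation of UTS #10: AAAA = base + (CP >> 15) and BBBB = (CP & 0x7FFF) | 0x8000, with base FB40, FB80 or FBC0 depending on the code point. A minimal Python sketch, with a deliberately simplified block classification, reproduces the figures quoted above:

    # UCA implicit (derived) primary weights.  The range test is simplified;
    # a real implementation consults the Unified_Ideograph property and the
    # exact CJK extension blocks.
    def implicit_primary_weights(cp):
        """Return the derived primary weight pair (AAAA, BBBB) for code point cp."""
        if 0x4E00 <= cp <= 0x9FFF or 0xF900 <= cp <= 0xFAFF:
            base = 0xFB40        # [Hani] core primaries
        elif 0x3400 <= cp <= 0x4DBF or 0x20000 <= cp <= 0x2A6DF:
            base = 0xFB80        # [Hani] extended primaries (simplified ranges)
        else:
            base = 0xFBC0        # any other code point
        return base + (cp >> 15), (cp & 0x7FFF) | 0x8000

    for cp in (0x4E00, 0x9FFF, 0xF900, 0x3400, 0x20000, 0xE000, 0x10FFFD):
        aaaa, bbbb = implicit_primary_weights(cp)
        print("U+%04X -> [%04X, %04X]" % (cp, aaaa, bbbb))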